Problem reading Latin Extended-A text with PHP

4

I have a problem with the script below. The function finds the hash signs (#), that is, the hashtags, in a message. The problem is that it does not recognize Latin Extended-A text (UTF-8). I want it to work, but I do not know how to change the script or which function to use.

#ŞŞŞ this kind of text does not work
#SSS but this one does

Example of the script:

function Hashtags($str)
{
  // Match the hashtags
  preg_match_all('/(^|[^a-z0-9_])#([a-z0-9_]+)/i', $str, $matchedHashtags);
  $hashtag = '';
  // For each hashtag, strip all characters but alnum
  if(!empty($matchedHashtags[0])) {
      foreach($matchedHashtags[0] as $match) {
          $hashtag .= preg_replace("/[^a-z0-9]+/i", "", $match).',';
      }
  }
  // Remove the trailing comma from the string
  return rtrim($hashtag, ',');
}
    
asked by sode 26.04.2018 at 05:11

1 answer

4

You just have to adapt the pattern a bit to allow or restrict certain characters.

In the foreach, instead of iterating over key 0, which contains the full matches including the # sign, iterate over key 1, which holds the captured hashtag without it.
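To see what each key holds, here is a small illustrative sketch (the sample string and the variable name $matches are made up for this example and are not part of the original script):

```php
<?php
// Illustrative only: inspect what preg_match_all() stores under each key.
preg_match_all('/#(\S+)/', 'one #foo two #bar', $matches);

// Key 0: the full matches, including the leading '#'.
print_r($matches[0]); // ['#foo', '#bar']

// Key 1: the first capture group, without the '#'.
print_r($matches[1]); // ['foo', 'bar']
```

Since key 1 already excludes the # sign, no extra preg_replace is needed to clean each match.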

function Hashtags($str)
{
    // Match the hashtags
    preg_match_all('/#([\S]+)/', $str, $matchedHashtags);
    $hashtag='';
    // Collect each captured hashtag (key 1 holds it without the leading #)
    if(!empty($matchedHashtags[0])) {
        foreach($matchedHashtags[1] as $match) {
            $hashtag.=$match . ',';
        }
    }

    //to remove last comma in a string
    return rtrim($hashtag, ',');
}

echo Hashtags('#SSS pero esta si');
# out: SSS

echo Hashtags('#ŞŞŞ no funciona este tipo de texto');
# out: ŞŞŞ

echo Hashtags('Con #SSŞŞŞSS$$$ no funciona este tipo de texto #SSS pero esta si');
# out: SSŞŞŞSS$$$,SSS

If you want to restrict the allowed characters, put them inside the capture group of the pattern. Examples:

/#([\w]+)/ == [A-Za-z0-9_]
/#([\d]+)/ == [0-9]
/#([A-Za-z0-9Ş]+)/u (the u modifier makes Ş count as a single UTF-8 character)

Unless I made a mistake writing it, the function should produce the outputs shown in the comments.
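If the goal is to accept letters from any script (such as Ş) while still rejecting symbols like $, one option is a Unicode-aware pattern. The following is only a minimal sketch and not part of the original answer: the function name hashtagsUnicode is hypothetical, and it assumes the input is valid UTF-8. It combines the u modifier with \p{L} (any letter), \p{N} (any number) and the underscore.

```php
<?php
// Hypothetical helper (not from the original answer): extracts hashtags made of
// Unicode letters, Unicode numbers and underscores. Assumes $str is valid UTF-8.
function hashtagsUnicode(string $str): string
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8,
    // so \p{L} matches letters such as "Ş" as whole characters.
    $count = preg_match_all('/#([\p{L}\p{N}_]+)/u', $str, $matches);

    // preg_match_all() returns false on error (e.g. invalid UTF-8) and 0 on no match.
    if (!$count) {
        return '';
    }

    // Key 1 holds the captured tags without the leading '#'.
    return implode(',', $matches[1]);
}

echo hashtagsUnicode('Con #SSŞŞŞSS$$$ no funciona #SSS pero esta si'), PHP_EOL;
// out: SSŞŞŞSS,SSS
```

Compared with \S, this variant stops at the first character that is not a letter, number or underscore, so the trailing $$$ is left out of the tag.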

  

I edited the pattern following @Xerif's suggestion.

    
answered by 26.04.2018 / 18:15