Get the first 3 words of a sentence in PHP

4

I have a question about how to get the first 2 or 3 words of a sentence. For example:

Example_1:

$text_1 = "Chaleco azul con tejido algodón";
Result => Chaleco azul

Example_2:

$text_2 = "Lampara de lectura para la cama";
Result => Lampara de lectura

Example_3:

$text_3 = "Suave oso de peluche de color morado";
Result => Suave oso peluche

Notice that sometimes the "DE" is kept, as in example 2, whereas in example 3 it is dropped. I do not know how to single out the word "DE".

My thought: if the next word has more than 4 characters, add it; otherwise do not, and drop the remaining words...

Thanks in advance

Sorry for having explained myself so badly. Here is the idea in more detail and with more examples, and thanks for the quick answers. ;)

The idea is to always get the first 3 words, except when the 3rd position holds a "DE", in which case we would also include the word in the 4th position.

$Ejemplo_1= "zapatillas de fútbol para correr";

Result_1 => zapatillas de fútbol

As you can see in Result_1, we always keep the first 3 words. In this case the "DE" is not in the 3rd position, so the result is OK.

$Ejemplo_2= "15 metros de lona para el restaurante";

Result_2 => 15 metros de lona

In Result_2, the 3rd position holds a "DE", so the condition mentioned above is met. Therefore we also take the next word, the one in the 4th position.

$Ejemplo_3= "Tarjeta memoria kodak 15 fotografías HD";

Result_3 => Tarjeta memoria kodak

In Result_3, only the first 3 words are taken, since the 3rd position does not hold a "DE".

    
asked by Fumatamax 15.02.2018 at 13:17

4 answers

8

I have done a somewhat generic method for what you want:

function cortarFrase($frase, $maxPalabras = 3, $noTerminales = ["de"]) {
  $palabras = explode(" ", $frase);
  $numPalabras = count($palabras);
  if ($numPalabras > $maxPalabras) {
     $offset = $maxPalabras - 1;
     // check the bound first, so $palabras[$offset] is never read past the end
     while ($offset < $numPalabras - 1 && in_array($palabras[$offset], $noTerminales)) { $offset++; }
     return implode(" ", array_slice($palabras, 0, $offset + 1));
  }
  return $frase;
}

This method accepts up to three parameters:

  • $frase: the sentence to be cut.
  • $maxPalabras: the number of words you want to keep. Optional parameter with default value 3.
  • $noTerminales: a list of words that the cut sentence must not end with. Optional parameter with default value ["de"].

What it does is split the string into an array of words (with explode). If the string has 3 words or fewer, it is returned unchanged; if it has more than 3 words, the function checks that the last kept word is not in the list of words not allowed as terminals, adding one more word to the result for as long as it is.

For example, using that function in the following code:

<?php

$frase1 = "zapatillas de fútbol para correr";
$frase2 = "15 metros de lona para el restaurante";
$frase3 = "Tarjeta memoria kodak 15 fotografías HD";

function cortarFrase($frase, $maxPalabras = 3, $noTerminales = ["de"]) {
    $palabras = explode(" ", $frase);
    $numPalabras = count($palabras);
    if ($numPalabras > $maxPalabras) {
        $offset = $maxPalabras - 1;
        // check the bound first, so $palabras[$offset] is never read past the end
        while ($offset < $numPalabras - 1 && in_array($palabras[$offset], $noTerminales)) { $offset++; }
        return implode(" ", array_slice($palabras, 0, $offset+1));
    }
    return $frase;
}


echo cortarFrase($frase1) . "\n";
echo cortarFrase($frase2) . "\n";
echo cortarFrase($frase3) . "\n";
// quedarse con 4 palabras en lugar de 3
echo cortarFrase($frase1, 4) . "\n";
// quedarse con 3 palabras, pero no puede terminar en "de" o "lona"
echo cortarFrase($frase2, 3, ["de", "lona"]) . "\n";

you will get as a result:

zapatillas de fútbol
15 metros de lona
Tarjeta memoria kodak
zapatillas de fútbol para
15 metros de lona para
    
answered 15.02.2018 at 17:36
2
  

Always get the first 3 words, except when the 3rd position holds a "DE", in which case we would include the word in the 4th position.

We can use the regex

/^\W*(?:\w+\W+){2}(?:de\W+)?\w+/iu

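It matches the first 2 words with ^\W*(?:\w+\W+){2}, then an optional "de" with (?:de\W+)?, and finally the last word with \w+. The i modifier makes "de" case-insensitive, and the u modifier treats the pattern and subject as UTF-8, so accented letters count as word characters.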

Code:

if (preg_match( '/^\W*(?:\w+\W+){2}(?:de\W+)?\w+/iu', $texto, $match)) {
    $resultado = $match[0];
}
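
As a sketch, the same pattern can be wrapped in a small function that falls back to the original text when there are fewer than three words, and that accepts a list of connector words instead of a hard-coded "de". The names primerasPalabras and $conectores are hypothetical, not part of the answer above.

```php
<?php
// Hypothetical helper: same pattern as above, with a configurable connector list.
function primerasPalabras(string $texto, array $conectores = ["de"]): string {
    // Escape each connector so it is matched literally inside the pattern.
    $alternativas = implode("|", array_map(
        function ($c) { return preg_quote($c, "/"); },
        $conectores
    ));
    $patron = '/^\W*(?:\w+\W+){2}(?:(?:' . $alternativas . ')\W+)?\w+/iu';
    if (preg_match($patron, $texto, $match)) {
        return $match[0];
    }
    // Fewer than three words (or no match at all): return the text unchanged.
    return $texto;
}

echo primerasPalabras("Suave oso de peluche de color morado"), "\n"; // Suave oso de peluche
echo primerasPalabras("Hola Caracola"), "\n";                        // Hola Caracola
echo primerasPalabras("Vestido rojo con lunares blancos", ["de", "con"]), "\n"; // Vestido rojo con lunares
```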


Demo:

$pruebas = [
    "Chaleco azul con tejido algodón",
    "Lampara de lectura para la cama",
    "Suave oso de peluche de color morado",
    "zapatillas de fútbol para correr",
    "15 metros de lona para el restaurante",
    "Tarjeta memoria kodak 15 fotografías HD"
];

foreach ($pruebas as $texto) {
    if (preg_match( '/^\W*(?:\w+\W+){2}(?:de\W+)?\w+/iu', $texto, $match)) {
        $resultado =  $match[0];

        //imprimimos
        echo $texto . "\t\t=>\t" . $resultado . "\n";
    }
}

Result:

Chaleco azul con tejido algodón         =>  Chaleco azul con
Lampara de lectura para la cama         =>  Lampara de lectura
Suave oso de peluche de color morado    =>  Suave oso de peluche
zapatillas de fútbol para correr        =>  zapatillas de fútbol
15 metros de lona para el restaurante   =>  15 metros de lona
Tarjeta memoria kodak 15 fotografías HD =>  Tarjeta memoria kodak
    
answered 16.02.2018 at 01:46
1

You can use the PHP explode function as follows:

<?php
    $texto = "Chaleco azul con tejido algodón";
    $palabras = explode(" ", $texto);
?>

This way you have an array in $palabras that contains the words of the phrase; you can then use in_array to look for the ones you want. To get the first 3 you would do:

<?php
    if (count($palabras) >= 3) {
        for ($i = 0; $i <= 2; $i++) {
            // print each word followed by a space
            echo $palabras[$i] . " ";
        }
    }
?>

But from what you describe, I think what you really want is to parse the phrase and keep the subject part. In that case, it is too complex to answer here. Maybe with AI techniques, neural networks and such...
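
A shorter route to the first three words, as a sketch: array_slice returns at most the requested number of elements, so sentences with fewer than 3 words need no special case, and implode joins the pieces back with spaces. The variable name $primeras is only illustrative.

```php
<?php
$texto = "Chaleco azul con tejido algodón";
$palabras = explode(" ", $texto);

// Keep at most the first 3 words and join them back into one string.
$primeras = implode(" ", array_slice($palabras, 0, 3));
echo $primeras, "\n"; // Chaleco azul con

// in_array tells whether a given word (here "de") appears in the phrase.
var_dump(in_array("de", $palabras, true)); // bool(false)
```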

    
answered 15.02.2018 at 13:37
1

Another possibility would be to use regular expressions. The idea is to obtain the substring made up of palabra1-espacio-palabra2-espacio-palabra3, where palabra3 cannot be "de"; if it is "de", then the next space and the next word are taken as well.

The regular expression for that would be:

/^((?:\S+\s+){2}((de\s\S+)|(\S+))).*/

Note: this regular expression looks not only for blank spaces but for any kind of whitespace separator (tabs, for example, would also be valid).

It does what was described above: it takes the first two words ( (?:\S+\s+){2} ) and then one of these two: "de" followed by a separator and another word ( (de\s\S+) ), or any other word ( (\S+) ) up to the next separator.

Then in PHP you can run that regular expression and return the substring that matches it and, if none is found, return the entire string (because that means it has fewer than 3 words). The code would be something like this:

<?php

$frase1 = "zapatillas de fútbol para correr";
$frase2 = "15 metros de lona para el restaurante";
$frase3 = "Tarjeta memoria kodak 15 fotografías HD";
$frase4 = "Hola Caracola";

function cortarFrase($frase) {

    preg_match('/^((?:\S+\s+){2}((de\s\S+)|(\S+))).*/', $frase, $matches);
    if (isset($matches[1])) { return $matches[1]; }

    return $frase;
}

echo cortarFrase($frase1) . "\n";
echo cortarFrase($frase2) . "\n";
echo cortarFrase($frase3) . "\n";
echo cortarFrase($frase4) . "\n";

And the result you get would be this:

zapatillas de fútbol
15 metros de lona
Tarjeta memoria kodak
Hola Caracola
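
To see which branch of the alternation fired for a given phrase, the capture groups can be inspected. This is only an illustrative sketch; the helper name ramaUsada and its return strings are hypothetical.

```php
<?php
// Hypothetical helper: reports which alternative of the pattern matched.
function ramaUsada(string $frase): string {
    if (!preg_match('/^((?:\S+\s+){2}((de\s\S+)|(\S+))).*/', $frase, $m)) {
        return "no match";
    }
    // Group 3 is non-empty only when the third word is "de";
    // otherwise group 4 holds the third word.
    return !empty($m[3]) ? "de + word: " . $m[3] : "word: " . $m[4];
}

echo ramaUsada("15 metros de lona para el restaurante"), "\n"; // de + word: de lona
echo ramaUsada("zapatillas de fútbol para correr"), "\n";      // word: fútbol
echo ramaUsada("Hola Caracola"), "\n";                         // no match
```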

    
answered 15.02.2018 at 23:41