I wrote this program to identify proper names in a text.
This is my text:
Vine con Pablo a la casa.
Pedro me lo dijo.
Fui con Mariano García a la cena.
Cristina Maña no come.
No me cuentes con el AGG.
Ay que ver con Ana García Villa.
Soraya Puerto de Santamaría no es Ok.
For now, I only want to extract the proper names that appear within the sentence.
My code is this:
#!/usr/bin/perl
use strict;
use warnings;

my $texto = "Corpus.txt";
my @array;

# Read the whole corpus into @array, one line per element.
open(INFILE, "<", $texto) or die "Can't open < $texto: $!";
while (my $row = <INFILE>)
{
    #chomp $row;
    push @array, $row;
    #print "$row\n";
}
close(INFILE);

foreach my $linea (@array) {
    # A single-token named entity inside the sentence. Example: Vine con Pablo a la casa.
    $linea =~ m/\s([A-Z][a-z]+)\s/;
    my $pablo = $1;
    print("$pablo\n");
}
What I do not understand is why, when I print $pablo, I get this output:
Pablo
Pablo
Mariano
Mariano
Mariano
Ana
Puerto
Puerto
Why does the program seem to evaluate the first line more than once, while line 6, which contains the name Ana, produces only one result?
I have only been learning to program for a few weeks, so the program is clearly doing something other than what I think it should do. I would appreciate it if someone could point out where the "fundamental error" is.
Thank you very much.
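
A minimal sketch of one likely explanation, offered under stated assumptions rather than as a definitive diagnosis: Perl only sets $1 when a match succeeds, so on a line where the pattern fails, $1 still holds the capture from the last successful match. The repeated names would then come from non-matching lines (for example, a name at the start of the line has no whitespace before it, and [a-z] does not cover accented letters or ñ). The helper first_inner_name below is hypothetical and not part of the original program, and the corpus is inlined so the sketch runs without Corpus.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Sample lines copied from the question, inlined so the sketch is self-contained.
my @lines = (
    "Vine con Pablo a la casa.",
    "Pedro me lo dijo.",
    "Fui con Mariano García a la cena.",
    "Cristina Maña no come.",
    "No me cuentes con el AGG.",
    "Ay que ver con Ana García Villa.",
    "Soraya Puerto de Santamaría no es Ok.",
);

# Hypothetical helper: returns the captured name, or undef when the
# pattern does not match, instead of reading $1 unconditionally.
sub first_inner_name {
    my ($line) = @_;
    if ($line =~ m/\s([A-Z][a-z]+)\s/) {
        return $1;    # $1 is only trusted after a successful match
    }
    return;           # undef in scalar context: no match on this line
}

my @found;
for my $line (@lines) {
    my $name = first_inner_name($line);
    push @found, defined $name ? $name : '(no match)';
}

print "$_\n" for @found;
```

With the match result checked, the lines that previously repeated Pablo or Mariano are reported as "(no match)", which makes it visible which sentences the pattern never matched in the first place. A trailing empty line in the file would explain the second Puerto in the same way, although that is a guess about the contents of Corpus.txt.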