Extract words from a pattern in C#


I was wondering how I could extract words from a string that follows a known pattern.

For example, given the following strings, extract the word inside the quotes:

Palabra 1: "Raqueta" end:
Palabra 2: "Motocicleta" end:

That is just a simple example; the real case would be something like this:

<span wordnr="1" class="">casa</span> 
<span wordnr="2" class="">mejor</span> 
<span wordnr="3" class="">todo</span>

Now, without doing any web scraping, just pasting the text into a textbox, how could I get the words out of the string in C#?

In PHP (which I have not touched for a long time), I know there is the preg_match function, and that with regular expressions (in the case of curl and web scraping) the value could be extracted with (.*).

In C#, however, the only thing that occurs to me is to split every time I encounter:

<span wordnr="1" class="">

and

</span>

to make a list with the words. The problem is the "1", which changes from one word to the next.

Any ideas? Thanks in advance.

    
asked by Georgia Fernández 17.12.2018 at 01:52

1 answer


For this case, a DOM parser would be overkill; you can extract the words from simple strings like these with regular expressions. For example, to capture the text between the characters "> and </:

using System;
using System.Text.RegularExpressions;

string str = "<span wordnr=\"1\" class=\"\">casa</span>";
str += "<span wordnr = \"2\" class=\"\">mejor</span>";
str += "<span wordnr = \"3\" class=\"\">todo</span>";

Regex r = new Regex(@""">(.+?)</"); // lazily capture the words between "> and </
MatchCollection palabras = r.Matches(str);

foreach (Match p in palabras)
{
    Console.WriteLine(p.Groups[1].Value);
}

// casa
// mejor
// todo
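
If the number itself is also needed, or the spacing around the attributes varies, a somewhat stricter pattern may help. The following is only a minimal sketch under those assumptions: the names WordExtractor, ExtractSpans and ExtractQuoted are hypothetical helpers invented for illustration, and the second method targets the simpler Palabra N: "..." end: format from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from the original answer.
public static class WordExtractor
{
    // Returns (wordnr, word) pairs for each <span wordnr="N" ...>word</span>.
    public static List<KeyValuePair<int, string>> ExtractSpans(string html)
    {
        // \s*=\s*  tolerates spaces around '='
        // (\d+)    captures the number that changes for every word
        // [^>]*    skips any remaining attributes (e.g. class="")
        // (.*?)    lazily captures the text up to the closing </span>
        var pattern = new Regex(
            @"<span\s+wordnr\s*=\s*""(\d+)""[^>]*>(.*?)</span>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        var result = new List<KeyValuePair<int, string>>();
        foreach (Match m in pattern.Matches(html))
        {
            int number = int.Parse(m.Groups[1].Value);
            string word = m.Groups[2].Value.Trim();
            result.Add(new KeyValuePair<int, string>(number, word));
        }
        return result;
    }

    // Returns the quoted word from each: Palabra N: "word" end:
    public static List<string> ExtractQuoted(string text)
    {
        // ([^""]*) captures everything between the two double quotes.
        var pattern = new Regex(@"Palabra\s+\d+:\s*""([^""]*)""\s*end:");

        var result = new List<string>();
        foreach (Match m in pattern.Matches(text))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

Calling WordExtractor.ExtractSpans(str) on the string built above should give the pairs (1, casa), (2, mejor), (3, todo), and the text of a textbox can be passed in the same way. For markup more irregular than this (nested tags, attributes in a different order), a regular expression becomes fragile and a parser is the safer choice.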
    
answered by 17.12.2018 / 11:51