I am scraping a website whose data is in a table with the following structure:
<tbody>
<tr class='Leaguestitle'>
<td>...</td>
<td>...</td>
</tr>
<tr id='tr1_abababa'>
<td>...</td>
<td>...</td>
</tr>
<tr id='tr2_abababa'>..</tr>
.
.
<tr id='tr1_acacaca'>..</tr>
<tr id='tr2_acacaca'>..</tr>
<tr align='center'>..</tr>
.
.
<tr id='tr1_cbcbcbc'>..</tr>
<tr id='tr2_cbcbcbc'>..</tr>
</tbody>
This structure repeats periodically. I am interested in three kinds of rows: the row with the class attribute (class='Leaguestitle'), which gives me a header; the rows whose id attribute contains tr1; and the row with the align attribute (align='center'), which marks the end of the data I am interested in. To collect them, I build a list with these three types of nodes like this:
allrows = table.find_elements_by_xpath(".//tr[@class='Leaguestitle' or contains(@id,'tr1') or @align='center']")
I want to iterate over this list: if the node has the class attribute, it goes into one sublist; if it has the id attribute, it goes into another; and if it is the node with the align attribute, the program stops.
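To make the intent concrete, the classification I have in mind would look roughly like the sketch below. It assumes each row object exposes a get_attribute(name) method, as Selenium's WebElement does; classify_rows, headers, and data_rows are hypothetical names used only for illustration, not part of my actual code.

```python
def classify_rows(rows):
    """Split rows into header rows and data rows, stopping at the end marker.

    Assumes every item in `rows` has a get_attribute(name) method that
    returns the attribute value, or None when the attribute is absent.
    """
    headers, data_rows = [], []
    for row in rows:
        # Normalise missing attributes to empty strings so comparisons are safe.
        row_class = row.get_attribute("class") or ""
        row_id = row.get_attribute("id") or ""
        row_align = row.get_attribute("align") or ""

        if row_align == "center":
            break  # end-of-data marker: stop processing
        if row_class == "Leaguestitle":
            headers.append(row)
        elif "tr1" in row_id:
            data_rows.append(row)
    return headers, data_rows
```

It would be called as headers, data_rows = classify_rows(allrows), so that everything after the first align='center' row is ignored.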
The problem is that the selected tr nodes do not have the structure shown at the beginning, that is:
<tr id='tr1_abababa'>
<td>...</td>
<td>...</td>
</tr>
Instead, I only get:
<td>...</td>
<td>...</td>
Since the id attribute of the tr node is not present, and neither are the class or align attributes, I cannot route the node to one list or the other.
How could I do this classification in a Pythonic way?