I'm scraping a web page using C # and an HtmlAgilityPack framework
I have done well, the problem is when I want to get data from a link%% of%
I do not know how to scrap a link in any idea using HtmlAgilityPack
Code
private void btnBuscar_Click(object sender, EventArgs e)
{
var web = new HtmlAgilityPack.HtmlWeb();
var doc =
web.Load(
"https://dxxxxxxxxxxxxxxxxxx/facturacion-internet/consultas/publico/xxx-datos2.jspa?ruc=" +
txtRuc.Text);
var razonSocialNodes = doc.DocumentNode.SelectNodes("//*[@id=\"contenido\"]/form/table//td");
var innerTexts = razonSocialNodes.Select(node => node.InnerText).ToArray();
txtRazonSocial.Text = innerTexts[0];
txtNombreComercial.Text = innerTexts[5];
txtEstado.Text = innerTexts[7];
txtClaseContribuyente.Text = innerTexts[9];
txtTipoContribuyente.Text = innerTexts[11];
txtObligadoContabilidad.Text = innerTexts[13];
txtActividadEconomica.Text = innerTexts[15];
txtFechaInicio.Text = innerTexts[17];
//Aca obtengo los hrefs
var hrefs = doc.DocumentNode.Descendants("a").Select(node => node.GetAttributeValue("href", "")).ToArray();
// Aca obtengo el href que necesito lo obtengo por posicion pero seria mejor
// ponerle el nombre de la pagina: ruc-establec.jspa
var localizar = hrefs[3];
}
Here is where the a href is
In this image I need to navigate to where the address is and take it.
It seemed to me that with this code you could do
static void Main()
{
var doc = new HtmlDocument();
doc.Load("test.html");
var anchor = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'url-a')]");
if (anchor != null)
{
var id = anchor.ParentNode.SelectSingleNode("following-sibling::td/a");
if (id != null)
{
Console.WriteLine(id.InnerHtml);
var img = id.ParentNode.SelectSingleNode("following-sibling::td/a");
if (img != null)
{
Console.WriteLine(img.InnerHtml);
}
}
}
}