How to get data from a linked page when scraping with HtmlAgilityPack

I'm scraping a web page using C# and the HtmlAgilityPack library.

That part works well. The problem is when I want to get data from a page that is reached through a link on the first page.

I don't know how to follow a link and scrape the linked page with HtmlAgilityPack. Any ideas?

Code

// Requires: using System; using System.Linq;
private void btnBuscar_Click(object sender, EventArgs e)
{
    var web = new HtmlAgilityPack.HtmlWeb();

    // Escape the RUC typed by the user before appending it to the query string
    var doc =
        web.Load(
            "https://dxxxxxxxxxxxxxxxxxx/facturacion-internet/consultas/publico/xxx-datos2.jspa?ruc=" +
            Uri.EscapeDataString(txtRuc.Text));

    var razonSocialNodes = doc.DocumentNode.SelectNodes("//*[@id=\"contenido\"]/form/table//td");
    // SelectNodes returns null (not an empty collection) when nothing matches
    if (razonSocialNodes == null)
        return;

    var innerTexts = razonSocialNodes.Select(node => node.InnerText.Trim()).ToArray();

    txtRazonSocial.Text = innerTexts[0];
    txtNombreComercial.Text = innerTexts[5];
    txtEstado.Text = innerTexts[7];
    txtClaseContribuyente.Text = innerTexts[9];
    txtTipoContribuyente.Text = innerTexts[11];
    txtObligadoContabilidad.Text = innerTexts[13];
    txtActividadEconomica.Text = innerTexts[15];
    txtFechaInicio.Text = innerTexts[17];

    // Here I get the hrefs
    var hrefs = doc.DocumentNode.Descendants("a").Select(node => node.GetAttributeValue("href", "")).ToArray();
    // Here I get the href I need. I take it by position, but it would be better
    // to select it by the page name: ruc-establec.jspa
    var localizar = hrefs[3];
}
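One way to act on the comment above (select the link by page name instead of by position) is sketched below. It assumes the anchor's href contains `ruc-establec.jspa` and may be relative to the page it was loaded from. The class `LinkFinder` and method `FindLinkByPage` are hypothetical names for illustration; the HtmlAgilityPack calls (`SelectSingleNode`, `GetAttributeValue`, `HtmlEntity.DeEntitize`) and `System.Uri` are real.

```csharp
using System;
using HtmlAgilityPack;

public static class LinkFinder
{
    // Returns the absolute URL of the first <a> whose href contains pageName,
    // or null when no such link exists. baseUrl is the URL the document was
    // loaded from; a relative href is resolved against it.
    public static string FindLinkByPage(HtmlDocument doc, string baseUrl, string pageName)
    {
        // contains() matches the page name anywhere in the href, so a query
        // string such as "?ruc=..." after it does not matter.
        // pageName is pasted into the XPath, so it must not contain a quote.
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode(
            "//a[contains(@href, '" + pageName + "')]");
        if (anchor == null)
            return null;

        // Decode HTML entities such as &amp; before treating the value as a URL.
        string href = HtmlEntity.DeEntitize(anchor.GetAttributeValue("href", ""));

        // Combine the (possibly relative) href with the page's own URL.
        return new Uri(new Uri(baseUrl), href).AbsoluteUri;
    }
}
```

In the click handler, `localizar` could then come from `LinkFinder.FindLinkByPage(doc, urlUsedForLoad, "ruc-establec.jspa")`, where `urlUsedForLoad` (a hypothetical variable) holds the same string passed to `web.Load`; the returned URL can be handed to another `web.Load` call to fetch the linked page.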

This is where the link (a href) is located.

In this image, I need to navigate to the page that shows the address and extract it.

I thought this could be done with the following code:

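using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("test.html");
        // First link whose href contains 'url-a'
        var anchor = doc.DocumentNode.SelectSingleNode("//a[contains(@href, 'url-a')]");
        if (anchor != null)
        {
            // First <a> inside a <td> that follows the anchor's parent (assumed to be a <td>)
            var id = anchor.ParentNode.SelectSingleNode("following-sibling::td/a");
            if (id != null)
            {
                Console.WriteLine(id.InnerHtml);
                // Repeat the same step from that link's cell to reach the next link
                var img = id.ParentNode.SelectSingleNode("following-sibling::td/a");
                if (img != null)
                {
                    Console.WriteLine(img.InnerHtml);
                }
            }
        }
    }
}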
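Once the linked page's absolute URL is known, the same `following-sibling` axis can pick the value that sits next to a label. The sketch below assumes, hypothetically, that the linked page shows the address in a table row such as `<tr><td>Dirección</td><td>...</td></tr>`; the real page structure was not shown, so the label text and XPath will likely need adjusting. `AddressScraper`, `LoadValue`, and `ExtractValueByLabel` are illustrative names, not part of HtmlAgilityPack.

```csharp
using HtmlAgilityPack;

public static class AddressScraper
{
    // Downloads the linked page (for example the absolute URL of
    // ruc-establec.jspa) and returns the value shown next to the given label.
    public static string LoadValue(string absoluteUrl, string label)
    {
        var web = new HtmlWeb();
        HtmlDocument page = web.Load(absoluteUrl);
        return ExtractValueByLabel(page, label);
    }

    // Assumed layout (hypothetical): <tr><td>Dirección</td><td>value</td></tr>.
    // Finds the first <td> or <th> whose text contains the label and returns the
    // trimmed text of the next <td> in the same row, or null when either is missing.
    public static string ExtractValueByLabel(HtmlDocument doc, string label)
    {
        HtmlNode valueCell = doc.DocumentNode.SelectSingleNode(
            "//*[self::td or self::th][contains(normalize-space(.), '" + label + "')]" +
            "/following-sibling::td[1]");
        if (valueCell == null)
            return null;

        // InnerText may still contain entities (&amp;, &oacute;) and padding.
        return HtmlEntity.DeEntitize(valueCell.InnerText).Trim();
    }
}
```

Matching by label rather than by cell index (as `innerTexts[5]`, `innerTexts[7]` do) keeps working if the site adds or reorders rows, at the cost of depending on the exact label wording.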
    
asked by Pedro Ávila 02.01.2019 at 01:35