Learn from Proficient ASP.NET Web Developers How to Extract Values from HTML

This article was written by experienced ASP.NET web developers to help others in the community extract values from HTML using HtmlAgilityPack. It walks through the technique step by step, so read each point carefully.

This article shows how to extract values from HTML using the HTML Agility Pack with C#.

We will create a simple Windows application as an example of how to extract values from HTML using the Agility Pack.


To begin, create an application and add a reference to the required DLL, HtmlAgilityPack.dll, in the Bin folder. Download the DLL, then simply reference it in the application.

Your application should now look something like this.



Next, create a form and name it “URL”.


Now choose View Code to open the url.cs page, and add the required namespace (using HtmlAgilityPack).

 using System;  
 using System.Collections.Generic;  
 using System.ComponentModel;  
 using System.Data;  
 using System.Data.SqlClient;  
 using System.Drawing;  
 using System.Linq;  
 using System.Net;  
 using System.Text;  
 using System.Threading.Tasks;  
 using System.Windows.Forms;  
 using HtmlAgilityPack;  
 namespace ExtractValuesHTMLAgilitypack  
 {  
   public partial class URLPage : Form  
   {  
     public URLPage()  
     {  
       InitializeComponent();  
     }  
   }  
 }  

Your HTML design should look something like this.



I have an HTML page that contains the following snippet, which we will parse with the HTML Agility Pack in C#.

 <table class="tableFile" summary="Document Format Files">  
         <tr>  
           <th scope="col" style="width: 5%;">  
           <acronym title="Sequence Number">Seq</acronym></th>  
           <th scope="col" style="width: 40%;">Description</th>  
           <th scope="col" style="width: 20%;">Document</th>  
           <th scope="col" style="width: 10%;">Type</th>  
           <th scope="col">Size</th>  
         </tr>  
         <tr>  
           <td scope="row">1</td>  
           <td scope="row">DEF 14A</td>  
           <td scope="row">  
           <a href="/a2226041zdef14a.htm">a2226041zdef14a.htm</a></td>  
           <td scope="row">DEF 14A</td>  
           <td scope="row">562684</td>  
         </tr>  
         <tr class="blueRow">  
           <td scope="row">2</td>  
           <td scope="row">G191503.JPG</td>  
           <td scope="row"><a href="/g191503.jpg">g191503.jpg</a></td>  
           <td scope="row">GRAPHIC</td>  
           <td scope="row">6864</td>  
         </tr>  
         <tr>  
           <td scope="row">3</td>  
           <td scope="row">G441495.JPG</td>  
           <td scope="row"><a href="/g441495.jpg">g441495.jpg</a></td>  
           <td scope="row">GRAPHIC</td>  
           <td scope="row">5702</td>  
         </tr>  
         <tr class="blueRow">  
           <td scope="row">4</td>  
           <td scope="row">G198472.JPG</td>  
           <td scope="row"><a href="/g198472.jpg">g198472.jpg</a></td>  
           <td scope="row">GRAPHIC</td>  
           <td scope="row">4863</td>  
         </tr>  
         <tr>  
           <td scope="row">5</td>  
           <td scope="row">G198572BEI001.GIF</td>  
           <td scope="row">  
           <a href="/g198572bei001.gif">g198572bei001.gif</a></td>  
           <td scope="row">GRAPHIC</td>  
           <td scope="row">72642</td>  
         </tr>  
         <tr class="blueRow">  
           <td scope="row">6</td>  
           <td scope="row">G198572BEI002.GIF</td>  
           <td scope="row">  
           <a href="/g198572bei002.gif">g198572bei002.gif</a></td>  
           <td scope="row">GRAPHIC</td>  
           <td scope="row">53233</td>  
         </tr>  
         <tr>  
           <td scope="row">&nbsp;</td>  
           <td scope="row">Complete submission text file</td>  
           <td scope="row">  
           <a href="/0001047469-15-007604.txt">0001047469-15-007604.txt</a></td>  
           <td scope="row">&nbsp;</td>  
           <td scope="row">761602</td>  
         </tr>  
    </table>  

Once the reference is in place, you can use the HTML Agility Pack to parse the page and extract data from it.

To find a specific element, use an XPath expression. Here the path syntax is "//table[@class='tableFile']", meaning: "get any table element whose class attribute is equal to 'tableFile'". Add the URL and the class name, and the code looks something like this.

   public partial class URLPage : Form
   {
     public URLPage()
     {
       InitializeComponent();
       // Download the raw HTML of the page ("URLNAME" is a placeholder for the real address).
       WebClient webClient = new WebClient();
       string page = webClient.DownloadString("URLNAME");
       // Fully qualified because System.Windows.Forms also defines an HtmlDocument type.
       HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
       doc.LoadHtml(page);
       // Select the table, skip the header row, and collect each row's cells as strings.
       List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='tableFile']")
             .Descendants("tr")
             .Skip(1)
             .Where(tr => tr.Elements("td").Count() > 1)
             .Select(tr => tr.Elements("td").Select(td => td.InnerHtml.Trim()).ToList())
             .ToList();
     }
   }
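
To see what the XPath expressions select in isolation, here is a minimal sketch that runs against an HTML string instead of a downloaded page. The class and method names (XPathDemo, CountRows, LinkTargets) are made up for this illustration, and the class name passed in is assumed to be a simple word with no quote characters.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Counts the <tr> rows of the first table whose class attribute equals
    // tableClass. Returns -1 when no such table exists.
    public static int CountRows(string html, string tableClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches the expression.
        HtmlNode table = doc.DocumentNode.SelectSingleNode(
            "//table[@class='" + tableClass + "']");
        if (table == null) return -1;

        // SelectNodes also returns null (not an empty collection) on no match.
        HtmlNodeCollection rows = table.SelectNodes(".//tr");
        return rows == null ? 0 : rows.Count;
    }

    // Collects the href value of every <a> tag that has an href attribute
    // and sits inside the matching table.
    public static List<string> LinkTargets(string html, string tableClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var targets = new List<string>();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(
            "//table[@class='" + tableClass + "']//a[@href]");
        if (links == null) return targets;

        foreach (HtmlNode a in links)
        {
            targets.Add(a.GetAttributeValue("href", ""));
        }
        return targets;
    }
}
```

Checking for null before chaining further calls matters in practice: if the page has no table with the expected class, the constructor code would otherwise fail with a NullReferenceException.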

Next, loop over the rows extracted from the downloaded page. The code looks something like this.

 namespace ExtractValuesHTMLAgilitypack  
 {  
   public partial class URLPage : Form  
   {  
     public URLPage()  
     {  
       InitializeComponent();  
       WebClient webClient = new WebClient();  
       string page = webClient.DownloadString("URLNAME");  
       HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();  
       doc.LoadHtml(page);  
       List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='tableFile']")  
             .Descendants("tr")  
             .Skip(1)  
             .Where(tr => tr.Elements("td").Count() > 1)  
             .Select(tr => tr.Elements("td").Select(td => td.InnerHtml.Trim()).ToList())  
             .ToList();  
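       // Each entry in table is one row; value[0] to value[4] are its cells.
       foreach (var value in table)
       {
         try
         {
           // Process the row here (for example, write it to a database).
         }
         catch (Exception ee)
         {
           // Report the actual error instead of a fixed string.
           Console.Write("Exception: " + ee.Message);
         }
       }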
     }  
   }  
 }  
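
Note that the query stores each cell's InnerHtml, so the Document column keeps its <a> markup and empty cells keep the &nbsp; entity. If plain text is what you want, a variant along the following lines may help. It is only a sketch: TableFileReader and ReadRows are hypothetical names, and it works on an HTML string so it can be tried without a network call.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableFileReader
{
    // Returns the data rows of the table with class 'tableFile' as plain text.
    // An empty list is returned when the table is not present.
    public static List<List<string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[@class='tableFile']");
        if (table == null) return new List<List<string>>();

        return table.Descendants("tr")
            .Skip(1)                                      // skip the header row
            .Where(tr => tr.Elements("td").Count() > 1)   // keep rows with data cells
            .Select(tr => tr.Elements("td")
                // InnerText drops the tags; DeEntitize turns entities such as
                // &nbsp; into characters, and Trim then removes the whitespace.
                .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                .ToList())
            .ToList();
    }
}
```

With this variant, a cell such as <td><a href="/a.htm">a.htm</a></td> yields the link text only, so the choice between InnerHtml and InnerText depends on whether the markup itself needs to be stored.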

Now we move on to inserting the HTML table into a database table, which is the .cs coding part. The LINQ query returns a list of rows that you can loop over to insert the values from the HTML into the database. The code looks something like this.

 namespace ExtractValuesHTMLAgilitypack  
 {  
   public partial class URLPage : Form  
   {  
     public URLPage()  
     {  
       InitializeComponent();  
       WebClient webClient = new WebClient();  
       string page = webClient.DownloadString("URLNAME");  
       HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();  
       doc.LoadHtml(page);  
       List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='tableFile']")  
             .Descendants("tr")  
             .Skip(1)  
             .Where(tr => tr.Elements("td").Count() > 1)  
             .Select(tr => tr.Elements("td").Select(td => td.InnerHtml.Trim()).ToList())  
             .ToList();  
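       foreach (var value in table)
       {
         try
         {
           // using disposes the connection and command even if the insert throws.
           using (SqlConnection conn = new SqlConnection(@"Data Source=SourceName;initial catalog=Test;persist security info=True;user id=sa;password=11@23;"))
           using (SqlCommand cmd = new SqlCommand("INSERT INTO HtmlvalueGet(Seq,Description,[Document],Type,Size,Datetime) VALUES(@seq,@description,@document,@type,@size,@datetime)", conn))
           {
             // Parameters stop quotes in the scraped HTML from breaking, or injecting into, the SQL.
             cmd.Parameters.AddWithValue("@seq", value[0]);
             cmd.Parameters.AddWithValue("@description", value[1]);
             cmd.Parameters.AddWithValue("@document", value[2]);
             cmd.Parameters.AddWithValue("@type", value[3]);
             cmd.Parameters.AddWithValue("@size", value[4]);
             cmd.Parameters.AddWithValue("@datetime", DateTime.Now);
             conn.Open();
             cmd.ExecuteNonQuery();
           }
         }
         catch (Exception ee)
         {
           // Report the actual error instead of a fixed string.
           Console.Write("Exception: " + ee.Message);
         }
       }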
     }  
   }  
 }  
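
The insert passes every cell to the database as a string. The article does not state the column types of HtmlvalueGet, so the following is only a sketch under the assumption that Seq and Size are meant to be numeric: it maps one row of cells onto named, typed fields before anything is written. DocumentRow and RowMapper are hypothetical names introduced for this example.

```csharp
using System.Collections.Generic;
using System.Globalization;

// One row of the 'tableFile' table with named fields (hypothetical type).
public sealed class DocumentRow
{
    public int? Seq { get; set; }          // null when the cell is empty or "&nbsp;"
    public string Description { get; set; }
    public string Document { get; set; }
    public string Type { get; set; }
    public long? Size { get; set; }        // null when the cell is not a number
}

public static class RowMapper
{
    // Converts the five cells of a row into a DocumentRow.
    // Returns null when the row has fewer than five cells.
    public static DocumentRow FromCells(IReadOnlyList<string> cells)
    {
        if (cells == null || cells.Count < 5) return null;

        return new DocumentRow
        {
            Seq = ParseInt(cells[0]),
            Description = cells[1],
            Document = cells[2],
            Type = cells[3],
            Size = ParseLong(cells[4])
        };
    }

    // TryParse avoids an exception for cells such as "&nbsp;".
    private static int? ParseInt(string s)
    {
        int v;
        return int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out v)
            ? v : (int?)null;
    }

    private static long? ParseLong(string s)
    {
        long v;
        return long.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out v)
            ? v : (long?)null;
    }
}
```

Inside the foreach loop, a call such as RowMapper.FromCells(value) would give named fields to work with, and a null Seq or Size can be stored as a database NULL rather than as the literal text of the cell.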

After the HTML table has been inserted into the database table, the result will look something like this image.



Our ASP.NET web development team has shared this method with the .NET community to help with extracting values from HTML using HtmlAgilityPack. Follow each step discussed in the post to get correct results. Don’t forget to review the article. We also welcome queries related to ASP.NET development.