How to Get HTML Page Source Using C#

Web scraping and content analysis both start with fetching the HTML of a page. In this blog post, we’ll walk through a simple C# program that retrieves the HTML from a specified URL and resolves a hostname to its associated IP addresses.

1. Fetching HTML Content

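The first part of our program focuses on fetching HTML content from a given URL. We use the HttpWebRequest and HttpWebResponse classes to send an HTTP request and receive the server’s response; the request itself is created by a helper method, GenerateHttpWebRequest. The response stream is read as UTF-8 text and returned as a string. If anything goes wrong, the function returns an error message in place of the HTML.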

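// Namespaces used by the snippets in this post
using System;
using System.IO;
using System.Net;
using System.Text;

static string GetHtmlFromUrl(string url)
{
    // Check if the URL is provided
    if (string.IsNullOrEmpty(url))
        throw new ArgumentNullException(nameof(url), "Parameter is null or empty");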

    // Initialize HTML content
    string html = "";

    try
    {
        // Generate and configure the HTTP request
        HttpWebRequest request = GenerateHttpWebRequest(url);

        // Get the response from the server
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Get the response stream
            using (Stream responseStream = response.GetResponseStream())
            {
                // Read the HTML content from the response stream
                using (StreamReader reader = new StreamReader(responseStream, Encoding.UTF8))
                {
                    html = reader.ReadToEnd();
                }
            }
        }
    }
    catch (Exception ex)
    {
        // Handle exceptions and provide error messages
        html = $"Error retrieving HTML content: {ex.Message}";
    }

    return html;
}
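
GetHtmlFromUrl calls GenerateHttpWebRequest, but the listing above does not include that helper. Here is a minimal sketch of what it could look like. The wrapper class name, the timeout, the user agent string, and the decompression setting are all assumptions for illustration, not taken from the original program.

```csharp
using System;
using System.Net;

static class RequestFactory
{
    // Hypothetical stand-in for the GenerateHttpWebRequest helper used above.
    public static HttpWebRequest GenerateHttpWebRequest(string url)
    {
        // WebRequest.Create returns an HttpWebRequest for http:// and https:// URIs
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(new Uri(url));

        request.Method = "GET";
        request.Timeout = 10000; // milliseconds (assumed value)
        request.UserAgent = "Mozilla/5.0 (compatible; HtmlFetcher/1.0)"; // assumed value

        // Let the framework transparently unpack gzip/deflate responses
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        return request;
    }
}
```

In the program from this post, the method would sit in the same class as GetHtmlFromUrl so that the unqualified call resolves. Creating the request does not contact the server; that only happens when GetResponse is called.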

2. Converting Hostname to IP Addresses

The second part of our program demonstrates how to convert a hostname to its associated IP addresses using the Dns.GetHostEntry method. This method returns an IPHostEntry object whose AddressList property contains the IP addresses for the given hostname. Note that it expects a bare hostname such as “www.google.com”, not a full URL with a scheme.

static string HostNameToIP(string hostname)
{
    try
    {
        // Resolve the hostname into an IPHostEntry using the Dns class
        IPHostEntry iphost = System.Net.Dns.GetHostEntry(hostname);

        // Get all possible IP addresses for this hostname
        IPAddress[] addresses = iphost.AddressList;

        // Build a text representation of the IP addresses
        StringBuilder addressList = new StringBuilder();

        // Iterate through each IP address
        foreach (IPAddress address in addresses)
        {
            addressList.AppendFormat("IP Address: {0};", address.ToString());
        }

        return addressList.ToString();
    }
    catch (Exception ex)
    {
        // Handle exceptions and provide error messages
        return $"Error resolving hostname to IP: {ex.Message}";
    }
}
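
Because Dns.GetHostEntry works on hostnames rather than URLs, it can help to strip the scheme, port, and path before calling HostNameToIP. The following is a small sketch of that step using the Uri class; the class and method names (HostHelper, ExtractHost) are hypothetical and not part of the original program.

```csharp
using System;

static class HostHelper
{
    // Hypothetical helper: returns the host part of an absolute http/https URL,
    // or the input unchanged when it is not such a URL (for example, already a hostname).
    public static string ExtractHost(string urlOrHost)
    {
        if (Uri.TryCreate(urlOrHost, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
        {
            return uri.Host;
        }

        return urlOrHost;
    }
}
```

With this in place, a call such as HostNameToIP(HostHelper.ExtractHost(someUrl)) would accept either a URL or a plain hostname.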

3. Putting It All Together

In the Main method, we use both functions: we fetch the HTML content from “http://www.google.com” and then resolve that site’s hostname to its IP addresses.

static void Main(string[] args)
{
    string targetUrl = "http://www.google.com";

    // Get and print the HTML content from the specified URL
    string htmlContent = GetHtmlFromUrl(targetUrl);
    Console.WriteLine(htmlContent);

    // Dns.GetHostEntry expects a bare hostname, not a URL,
    // so extract the host part before resolving it to IP addresses
    string ipAddresses = HostNameToIP(new Uri(targetUrl).Host);
    Console.WriteLine(ipAddresses);

    Console.Read();
}
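
HttpWebRequest is marked obsolete in .NET 6 and later, so on newer runtimes the same fetch could be written with HttpClient instead. The sketch below is one possible alternative, not the approach used in the program above; the class name, method name, and 10-second timeout are assumptions. Unlike GetHtmlFromUrl, it lets exceptions propagate rather than returning the error text as if it were HTML.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class HtmlFetcher
{
    // A single HttpClient instance is shared across all requests
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(10) // assumed value
    };

    // Hypothetical async counterpart to GetHtmlFromUrl
    public static async Task<string> GetHtmlAsync(string url)
    {
        if (string.IsNullOrEmpty(url))
            throw new ArgumentException("URL must not be null or empty", nameof(url));

        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            // Throws HttpRequestException for non-success status codes
            response.EnsureSuccessStatusCode();

            // Decodes the body using the charset from the response headers when present
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

From a synchronous Main, the result could be obtained with HtmlFetcher.GetHtmlAsync(targetUrl).GetAwaiter().GetResult(), or Main could be declared as static async Task Main and use await directly.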

This simple C# program provides a foundation for more advanced web scraping and content analysis tasks. Feel free to customize and expand upon it based on your specific requirements.
