Title, HTML

HTML documents have title elements. The data in title elements is important, and it can be extracted with some C# code in many cases.

This simple method extracts the contents of the first TITLE element from an HTML document. It uses the Regex.Match method and looks for the literal start and end tags in the HTML.

Example

We can extract the contents of the TITLE element from HTML. This is useful for checking that a page has the title you expect. After the code, we look at cases where the Regex does not work.

Note This console application gets the first TITLE element from an HTML string.

Then The program prints the title to the console. Here the HTML is a string literal; for a file on disk, the text could first be read with File.ReadAllText.

Tip The Regex looks for a start tag and an end tag. It skips whitespace between the tags and the title text, so the captured string has no leading or trailing whitespace.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Get an HTML string.
        string html = "<html><title>Example.</title><body><p>...</p></body></html>";

        // Get the title of the HTML.
        Console.WriteLine(GetTitle(html));
    }

    /// <summary>
    /// Get title from an HTML string.
    /// </summary>
    static string GetTitle(string file)
    {
        Match match = Regex.Match(file, @"<title>\s*(.+?)\s*</title>");
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return "";
        }
    }
}
Example.
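
The same pattern can be applied to HTML read from disk. The following is a minimal sketch, not part of the original example: the file name example.html and the class name FileTitleProgram are assumptions made for illustration, and the program writes its own sample file so that it runs on its own.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

class FileTitleProgram
{
    static void Main()
    {
        // Hypothetical file name. A sample file is written first so the sketch is self-contained.
        string path = "example.html";
        File.WriteAllText(path, "<html><title>  Example.  </title><body></body></html>");

        // Read the whole file into a string, then apply the same pattern as before.
        string html = File.ReadAllText(path);
        Console.WriteLine(GetTitle(html));
    }

    // Returns the text of the first lowercase <title> element, or "" if none is found.
    public static string GetTitle(string html)
    {
        Match match = Regex.Match(html, @"<title>\s*(.+?)\s*</title>");
        return match.Success ? match.Groups[1].Value : "";
    }
}
```

The sample title is padded with spaces on purpose. The \s* parts of the pattern consume that padding, so the program prints "Example." without it.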

Errors

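This code is not flexible enough for some HTML documents. It does not match a TITLE start tag that has attributes, such as one with a lang attribute, because the pattern expects the exact text of the start tag.

Also The pattern assumes the tags are lowercase, although this can be changed by passing RegexOptions.IgnoreCase to Regex.Match.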
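
One possible way to relax both limitations is sketched below. This is an illustration under stated assumptions, not the original method: the name GetTitleFlexible and the sample HTML are hypothetical. It still uses a regular expression, so it remains unsuitable for arbitrary or malformed HTML.

```csharp
using System;
using System.Text.RegularExpressions;

class FlexibleTitleProgram
{
    static void Main()
    {
        // Uppercase tags, an attribute on the start tag, and newlines around the text.
        string html = "<HTML><TITLE lang=\"en\">\n  Example.\n</TITLE><BODY></BODY></HTML>";
        Console.WriteLine(GetTitleFlexible(html));
    }

    // Hypothetical variant of GetTitle that tolerates attributes and mixed case.
    public static string GetTitleFlexible(string html)
    {
        // <title\b[^>]*> accepts any attributes inside the start tag.
        // IgnoreCase matches TITLE or Title; Singleline lets "." match newlines inside the title.
        Match match = Regex.Match(
            html,
            @"<title\b[^>]*>\s*(.+?)\s*</title\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : "";
    }
}
```

For the sample string above, this prints "Example." as well. The \b after "title" keeps the pattern from matching a different element whose name merely starts with the same letters.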

We can capture the contents of the TITLE element from HTML documents using the C# language. The regular expression can be hard to read, but it works for simple, well-formed HTML.

Sam Allen is passionate about computer languages, and he maintains 100% of the material available on this website. He hopes it makes the world a nicer place.

This page was last updated on Nov 7, 2023 (simplify).
