Paragraph, HTML. HTML pages have paragraphs in them. In a C# program, we can match these paragraphs with Regex. This is useful for extracting summaries from many pages or articles.
C# method info. This simple method extracts and matches the first paragraph element in an HTML document. This function uses the regular expression library included in .NET.
Example. We scan an entire HTML file and extract text in between a paragraph opening tag and closing tag. You can put this method, GetFirstParagraph, in a utility class.
Step 1 We specify some HTML text as a string literal. Then we pass this string to the GetFirstParagraph method.
Step 2 GetFirstParagraph() uses the static Regex.Match method declared in the System.Text.RegularExpressions namespace.
Info The Regex looks for the literal opening tag <p>: angle brackets with the letter "p" between them. It then skips zero or more whitespace characters after the opening tag and before the closing tag.
Finally The Regex captures the minimum number of characters between the start tag and end tag, because the "+?" quantifier is lazy. Both tags must be found for the match to succeed.
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Step 1: call method with html text.
        string html = "<html><title>...</title><body><p>Result.</p></body></html>";
        Console.WriteLine(GetFirstParagraph(html));
    }

    /// <summary>
    /// Get first paragraph between P tags.
    /// </summary>
    static string GetFirstParagraph(string file)
    {
        // Step 2: use Regex to match a paragraph.
        Match match = Regex.Match(file, @"<p>\s*(.+?)\s*</p>");
        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return "";
        }
    }
}

Result.
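Lazy matching. The "+?" in the pattern matters when a page has more than one paragraph. The sketch below compares the article's lazy pattern with a greedy variant on the same input. The LazyDemo class and its FirstGroup helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyDemo
{
    // Hypothetical helper: returns capture group 1 of the first match, or "" if none.
    public static string FirstGroup(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : "";
    }
}

class Program
{
    static void Main()
    {
        string html = "<p>One.</p><p>Two.</p>";

        // Lazy (.+?): the capture stops at the first closing tag.
        Console.WriteLine(LazyDemo.FirstGroup(html, @"<p>\s*(.+?)\s*</p>"));

        // Greedy (.+): the capture runs on to the last closing tag.
        Console.WriteLine(LazyDemo.FirstGroup(html, @"<p>\s*(.+)\s*</p>"));
    }
}
```

The first call prints "One." and the second prints "One.</p><p>Two.", which is why the lazy form is the one to use for a first-paragraph match.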
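A discussion. Understanding regular expressions can be difficult, but this one is fairly simple. The method is not flexible: the pattern matches only a lowercase <p> tag with no attributes, and because "." does not match a newline by default, the whole paragraph must sit on one line. It is hard to parse HTML correctly all the time without an HTML parser.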
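A looser pattern. As a rough sketch, and not the article's method, the pattern can be relaxed to accept attributes, uppercase tags and paragraphs that span several lines. The ParagraphHelper class and the GetFirstParagraphFlexible name are hypothetical. This variant still fails on nested markup, comments and unclosed tags, so an HTML parser remains the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

static class ParagraphHelper
{
    // Hypothetical variant of GetFirstParagraph, shown only as an illustration.
    public static string GetFirstParagraphFlexible(string html)
    {
        // <p\b[^>]*> accepts attributes but rejects other tags such as <pre>.
        // IgnoreCase accepts <P>; Singleline lets "." match newline characters.
        Match match = Regex.Match(
            html,
            @"<p\b[^>]*>\s*(.+?)\s*</p>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : "";
    }
}

class Program
{
    static void Main()
    {
        string html = "<body><P class=\"intro\">\nFirst line.\nSecond line.\n</P></body>";
        Console.WriteLine(ParagraphHelper.GetFirstParagraphFlexible(html));
    }
}
```

With this input the program prints "First line." and "Second line." on two lines. The original pattern returns an empty string for the same text, because the tag is uppercase, carries an attribute and spans several lines.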
Summary. We looked at how you can match the paragraph element in your HTML files. This is useful code that I run several times a day, and it functions correctly.
Dot Net Perls is a collection of pages with code examples, which are updated to stay current. Programming is an art, and it can be learned from examples.
Sam Allen is passionate about computer languages, and he maintains 100% of the material available on this website. He hopes it makes the world a nicer place.