Paragraph, HTML. VB.NET programs sometimes need to do simple HTML processing, such as finding paragraphs. In certain cases, this can be done with regular expressions.
With a regular expression, we can extract text from between paragraph "P" tags. And by using the Groups property, we can get this value as a String.
Example. As we begin, please notice that we import the System.Text.RegularExpressions namespace with the Imports keyword. Without it, the Regex and Match types would have to be written with their fully qualified names.
Start We specify an HTML string and pass it to GetFirstParagraph. In real programs, we might read in a file with File.ReadAllText.
Next In GetFirstParagraph, we have some complex regular expression logic. We specify some Kleene closures to access data within paragraph tags.
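Tip The star character, meaning zero or more repeats, is a Kleene closure. Here it is applied to \s to skip optional whitespace around the text. The inner value itself is captured by the group (.+?), a lazy match of one or more characters.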
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim html As String = "<html><title>...</title><body><p>Result.</p></body></html>"
        Console.WriteLine(GetFirstParagraph(html))
    End Sub

    Function GetFirstParagraph(value As String) As String
        ' Use regular expression to match a paragraph.
        ' Group 1 captures the text between the tags, without surrounding whitespace.
        Dim match As Match = Regex.Match(value, "<p>\s*(.+?)\s*</p>")
        If match.Success Then
            Return match.Groups(1).Value
        Else
            ' No paragraph was found.
            Return ""
        End If
    End Function

End Module

Result.
Some notes. When using the Groups property on a Match result from Regex.Match, it is important to access element 1 for the first captured group. The collection is zero-based, but element 0 always holds the entire matched text, so the captured groups start at index 1.
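To make the indexing concrete, here is a minimal sketch that prints both elements for the same pattern. The module name GroupsDemo and the sample input are made up for this illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupsDemo
    Sub Main()
        ' Same pattern as the example, with extra spaces inside the tags.
        Dim m As Match = Regex.Match("<p> Result. </p>", "<p>\s*(.+?)\s*</p>")
        If m.Success Then
            ' Element 0 is the whole match, including the tags.
            Console.WriteLine("[" & m.Groups(0).Value & "]")
            ' Element 1 is the first captured group: the inner text only.
            Console.WriteLine("[" & m.Groups(1).Value & "]")
        End If
    End Sub
End Module
```

The first line printed is the full match with its tags, [<p> Result. </p>], and the second is just the captured text, [Result.].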
Summary. Accessing inner text values from within HTML strings can be difficult. And regular expressions are not the best solution in all cases, but they can work on simple HTML pages.
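As one illustration of where the simple pattern stops working: it requires a lowercase tag with no attributes and inner text on a single line. The sketch below loosens those three restrictions. GetFirstParagraphTolerant is a hypothetical helper, not part of the original example, and it still assumes the paragraph contains no nested paragraph tags.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TolerantParagraphDemo
    ' Hypothetical helper: a looser variant of GetFirstParagraph.
    Function GetFirstParagraphTolerant(value As String) As String
        ' IgnoreCase accepts <P> as well as <p>.
        ' Singleline lets the dot match newline characters.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        ' <p\b[^>]*> accepts attributes but does not match tags such as <pre>.
        Dim m As Match = Regex.Match(value, "<p\b[^>]*>\s*(.+?)\s*</p>", options)
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return ""
    End Function

    Sub Main()
        ' Uppercase tag, an attribute, and text on its own line.
        Dim html As String = "<body><P class=""intro"">" & vbLf & "Result." & vbLf & "</P></body>"
        Console.WriteLine(GetFirstParagraphTolerant(html))
    End Sub
End Module
```

This prints Result. for the sample input. Comments, scripts, or malformed markup can still defeat the pattern, which is why a real HTML parser is usually the safer choice for anything beyond simple pages.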
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.