VB.NET programs often need to perform simple processing of HTML, such as finding paragraphs. In certain cases, this can be done with regular expressions.
With a regular expression, we can extract the text between paragraph "P" tags. And by using the Groups property, we can get this value as a String.
As we begin, please notice that we import the System.Text.RegularExpressions namespace with the Imports keyword. Without it, the program would not compile unless the Regex and Match types were fully qualified.

In Main, we declare an HTML string and pass it to GetFirstParagraph. In real programs, we might read in a file with File.ReadAllText.

In GetFirstParagraph, we have some regular expression logic. The pattern uses the star quantifier (a Kleene closure) to skip whitespace, and a lazy plus quantifier inside a capturing group to access data within paragraph tags.

    Imports System.Text.RegularExpressions

    Module Module1
        Sub Main()
            Dim html As String = "<html><title>...</title><body><p>Result.</p></body></html>"
            Console.WriteLine(GetFirstParagraph(html))
        End Sub

        Function GetFirstParagraph(value As String) As String
            ' Use regular expression to match a paragraph.
            Dim match As Match = Regex.Match(value, "<p>\s*(.+?)\s*</p>")
            If match.Success Then
                ' Group 1 is the text captured between the tags.
                Return match.Groups(1).Value
            Else
                Return ""
            End If
        End Function
    End Module

Output:

    Result.
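The file-based variant mentioned above could look roughly like the following sketch. The temporary file and its contents are hypothetical and exist only so the program is self-contained; a real program would pass its own path to File.ReadAllText. GetFirstParagraph is repeated here so the example compiles on its own.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Hypothetical setup: write a small HTML file so there is something to read.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllText(tempFile, "<html><body><p>From a file.</p></body></html>")

        ' Read the entire file into one string, then extract the paragraph.
        Dim html As String = File.ReadAllText(tempFile)
        Console.WriteLine(GetFirstParagraph(html))

        ' Clean up the temporary file.
        File.Delete(tempFile)
    End Sub

    Function GetFirstParagraph(value As String) As String
        ' Same pattern as in the main example.
        Dim m As Match = Regex.Match(value, "<p>\s*(.+?)\s*</p>")
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return ""
    End Function
End Module
```

This should print "From a file." because the paragraph text sits directly between the two tags.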
When using the Groups property on a Match result from Regex.Match, it is important to access element 1 for the first captured group. The collection is zero-based, but element 0 always holds the entire match, so the captured groups start at index 1.
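A minimal sketch of the difference between the two indexes, using the same pattern on a short input string chosen for illustration:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim m As Match = Regex.Match("<p> Result. </p>", "<p>\s*(.+?)\s*</p>")
        If m.Success Then
            ' The collection holds the whole match plus one captured group.
            Console.WriteLine(m.Groups.Count)     ' 2
            ' Index 0 is the entire matched text, tags and spaces included.
            Console.WriteLine(m.Groups(0).Value)  ' <p> Result. </p>
            ' Index 1 is the first parenthesized group only.
            Console.WriteLine(m.Groups(1).Value)  ' Result.
        End If
    End Sub
End Module
```

The surrounding spaces are consumed by the two \s* parts of the pattern, which is why they appear in element 0 but not in element 1.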
Accessing inner text values from within HTML strings can be difficult. Regular expressions are not the best solution in all cases, because they cannot reliably handle nested or malformed markup, but they can work on simple HTML pages.
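To illustrate where the simple pattern stops working, here is a hedged sketch with a paragraph that has an attribute, uppercase tag names, and a line break inside. The more tolerant pattern and the sample input are assumptions made for this illustration, not part of the original program, and the result is still not a substitute for a real HTML parser.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Illustrative input: attribute on the tag, uppercase P, text spanning two lines.
        Dim html As String = "<body><P class=""intro"">First line" &
            Environment.NewLine & "second line.</P></body>"

        ' The original pattern expects a bare, lowercase <p> tag, so it finds nothing.
        Console.WriteLine(Regex.IsMatch(html, "<p>\s*(.+?)\s*</p>"))  ' False

        ' Assumed variant: allow attributes, ignore case, and let "." match newlines.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Dim m As Match = Regex.Match(html, "<p\b[^>]*>\s*(.+?)\s*</p>", options)
        Console.WriteLine(m.Success)  ' True
        If m.Success Then
            ' Prints the two lines of paragraph text.
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

Here [^>]* skips any attributes, \b keeps the pattern from matching tags such as "pre", and RegexOptions.Singleline lets the dot cross line breaks.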