Remove HTML
Often we encounter strings that contain HTML markup. It is possible to remove this markup with a custom VB.NET Function.
We develop a custom Function based on the Regex type. It uses a regular expression to strip HTML markup tags. This works on many, but not all, source strings.
To begin, this program introduces the StripTags Function, which performs the HTML removal. This calls the Regex.Replace function.
In StripTags, all text that begins with a tag start character (<) and ends with the next tag end character (>) is replaced with an empty string.
In Main, we declare a String literal that contains HTML markup. Next, the StripTags function is invoked with that String as the argument. Finally, we confirm that the resulting string has no HTML markup remaining by printing it to the Console.

```vb
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Input.
        Dim html As String = "<p>There was a <b>.NET</b> programmer " &
            "and he stripped the <i>HTML</i> tags.</p>"
        ' Call Function.
        Dim res As String = StripTags(html)
        ' Write.
        Console.WriteLine(res)
    End Sub

    ''' <summary>
    ''' Strip HTML tags.
    ''' </summary>
    Function StripTags(ByVal html As String) As String
        ' Remove HTML tags: a "<", then as few characters as possible, then ">".
        Return Regex.Replace(html, "<.*?>", "")
    End Function
End Module
```

Output:
There was a .NET programmer and he stripped the HTML tags.
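The "?" after "*" in the pattern matters: it makes the quantifier lazy, so each match stops at the first ">" after a "<". This small sketch (the module name is only illustrative) compares the lazy pattern with the greedy pattern "<.*>" on the same input.

```vb
Imports System
Imports System.Text.RegularExpressions

Module LazyVersusGreedy
    Sub Main()
        Dim html As String = "<b>bold</b> and <i>italic</i>"

        ' Lazy: ".*?" stops at the first ">" after each "<", so each tag is removed on its own.
        Console.WriteLine(Regex.Replace(html, "<.*?>", ""))   ' bold and italic

        ' Greedy: ".*" runs to the last ">" on the line, so the text between tags is removed too.
        Console.WriteLine(Regex.Replace(html, "<.*>", ""))    ' (empty line)
    End Sub
End Module
```

With the greedy pattern the whole input, from the first "<" to the last ">", is one match, so nothing is left.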
If your HTML markup is malformed in any way, or contains comments, this method may not work correctly: a ">" inside a comment or an attribute value ends the match early. You may wish to validate the markup first.
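To make these failure cases concrete, here is a sketch that runs the same pattern on a few problem inputs. The module name is illustrative. It also shows one assumed remedy for tags split across lines: by default "." does not match a newline, and RegexOptions.Singleline changes that. Singleline does not help with a ">" inside a comment or attribute value.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module StripTagsLimitations
    Function StripTags(ByVal html As String) As String
        Return Regex.Replace(html, "<.*?>", "")
    End Function

    Sub Main()
        ' A ">" inside a comment ends the match early; part of the comment is left behind.
        Console.WriteLine(StripTags("x<!-- a > b -->y"))               ' x b -->y

        ' A ">" inside a quoted attribute value has the same effect.
        Console.WriteLine(StripTags("<a title=""1 > 0"">link</a>"))    '  0">link

        ' "." does not match a newline by default, so a tag split across lines survives.
        Dim multiLine As String = "<a" & vbLf & "href=""#"">link</a>"
        Console.WriteLine(StripTags(multiLine))

        ' RegexOptions.Singleline lets "." match newlines, so the split tag is removed.
        Console.WriteLine(Regex.Replace(multiLine, "<.*?>", "", RegexOptions.Singleline))   ' link
    End Sub
End Module
```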
The easiest way to strip HTML tags is to use the Regex type. Other methods that scan the String and use Char arrays can be more efficient, but they are also more complicated.
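As a rough illustration of the Char-array approach, here is a minimal sketch. StripTagsCharArray is a hypothetical name, and the code assumes that every "<" opens a tag and every ">" closes one. It makes a single pass over the input and copies only the characters outside tags into a buffer, without using the regex engine.

```vb
Imports System

Module CharArrayStrip
    ' Hypothetical helper: assumes every "<" starts a tag and every ">" ends one.
    Function StripTagsCharArray(ByVal source As String) As String
        ' The output can never be longer than the input.
        Dim buffer(source.Length - 1) As Char
        Dim count As Integer = 0
        Dim insideTag As Boolean = False

        For Each c As Char In source
            If c = "<"c Then
                insideTag = True
            ElseIf c = ">"c Then
                insideTag = False
            ElseIf Not insideTag Then
                ' Keep only characters that are outside any tag.
                buffer(count) = c
                count += 1
            End If
        Next

        Return New String(buffer, 0, count)
    End Function

    Sub Main()
        Dim html As String = "<p>There was a <b>.NET</b> programmer " &
            "and he stripped the <i>HTML</i> tags.</p>"
        Console.WriteLine(StripTagsCharArray(html))
    End Sub
End Module
```

On the sample input this prints the same line as the Regex version. It differs on edge cases: it removes a stray ">" that appears outside any tag, and because it does not rely on ".", a tag split across lines is still removed. It has the same problem with ">" inside comments and attribute values.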