Remove HTML
Suppose a string in a Go program contains HTML markup, but we do not want to keep the markup. We can remove the tags with a for-loop.
With a rune slice, we can build up the runes for the result. We detect markup by testing each rune in the original string for angle brackets.
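As a small sketch of why runes (rather than bytes) are tested, the example below ranges over a string and counts runes and angle brackets. The helper name countAngleBrackets and the sample input are assumptions made for illustration; they are not part of the original program.

```go
package main

import "fmt"

// countAngleBrackets is a hypothetical helper: it ranges over the string,
// which yields one rune per iteration, and counts runes and angle brackets.
func countAngleBrackets(s string) (runes int, brackets int) {
	for _, c := range s {
		runes++
		if c == '<' || c == '>' {
			brackets++
		}
	}
	return runes, brackets
}

func main() {
	// The accented letter occupies 2 bytes in UTF-8 but is a single rune.
	s := "<b>caf\u00e9</b>"
	runes, brackets := countAngleBrackets(s)
	fmt.Println(len(s), runes, brackets) // prints "12 11 4"
}
```

Because the range loop decodes UTF-8, a multi-byte character is handled as one unit, so appending runes to a rune slice should not split it.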
We introduce the stripHtml function, which receives a string and returns another string. We call stripHtml in the main() function to test it.
Step 1: We loop over the string with a for-range loop. This gives us each individual rune in the string.
Step 2: We append each rune that is not inside a tag to the data rune slice. At this point, we have skipped past runes including and surrounded by angle brackets.
Step 3: We convert the rune slice back into a string. This string now contains all non-markup runes.

package main

import (
	"fmt"
)

func stripHtml(source string) string {
	data := []rune{}
	inside := false
	// Step 1: loop over string with range loop.
	for _, c := range source {
		if c == '<' {
			inside = true
			continue
		}
		if c == '>' {
			inside = false
			continue
		}
		// Step 2: append chars not inside markup tags starting and ending with brackets.
		if !inside {
			data = append(data, c)
		}
	}
	// Step 3: return string based on the rune slice.
	return string(data)
}

func main() {
	// Call the stripHtml function.
	input := "<p>Hello <b>world</b>!</p>"
	result := stripHtml(input)
	fmt.Println(input)
	fmt.Println(result)
}

<p>Hello <b>world</b>!</p>
Hello world!
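In the results, we can see that the "p" and "b" tags were removed from the markup. Note that this function can fail for HTML comments: a comment that contains a ">" character ends the skipped region early, so the rest of the comment leaks into the output. A more powerful parser would be needed for such input.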
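To make the comment limitation concrete, here is a minimal sketch that repeats the stripHtml function and runs it on a comment containing a ">" character. The input string is an assumption chosen to trigger the problem.

```go
package main

import "fmt"

// stripHtml is the same function as in the main example, repeated here
// so that this sketch is self-contained.
func stripHtml(source string) string {
	data := []rune{}
	inside := false
	for _, c := range source {
		if c == '<' {
			inside = true
			continue
		}
		if c == '>' {
			inside = false
			continue
		}
		if !inside {
			data = append(data, c)
		}
	}
	return string(data)
}

func main() {
	// Assumed input: the ">" inside the comment is treated as the end of a tag.
	input := "<!-- a > b --><p>Hi</p>"
	fmt.Printf("%q\n", stripHtml(input)) // prints " b --Hi"
}
```

The first ">" switches the loop back to "outside" mode, so the remaining comment text " b --" is appended to the result along with "Hi".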
It is possible to use regular expressions to remove markup from strings, but this offers little advantage over a for-loop. And it is usually slower.
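For comparison, a regular expression version might look like the sketch below. The pattern and the name stripHtmlRegexp are assumptions for illustration, not something the original program defines; the calls used are regexp.MustCompile and ReplaceAllString from the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is an assumed pattern: "<", then any run of characters
// other than ">", then ">".
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// stripHtmlRegexp is a hypothetical regexp-based counterpart to stripHtml.
func stripHtmlRegexp(source string) string {
	return tagPattern.ReplaceAllString(source, "")
}

func main() {
	input := "<p>Hello <b>world</b>!</p>"
	fmt.Println(stripHtmlRegexp(input)) // prints "Hello world!"
}
```

The two approaches agree on well-formed tags but can differ on malformed input. For example, with an unclosed "<" this pattern leaves the text untouched, while the loop version drops everything after the bracket.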