String Remove HTML Tags
This page was last reviewed on Aug 27, 2023.
Dot Net Perls
Remove HTML. Sometimes a string contains HTML markup that is not wanted, perhaps because it was entered by mistake. A short Rust function can remove common HTML tags.
By checking each character for angle brackets, we can detect where markup starts and ends. Then we skip those characters when building a copy of the string.
To begin, we introduce a strip_html function that receives a str reference, and returns a new string. The result of this function is a string with no HTML markup.
Step 1 We use a for-loop over the chars in the string. The chars() method returns an iterator over the individual characters.
Step 2 We detect the angle brackets and set a local flag variable to true or false based on whether we are inside a markup region.
Step 3 Here we reach characters that are not part of a markup region, so we add them to our resulting string.
Step 4 We return the string. This contains all characters in the source string excluding markup regions.
fn strip_html(source: &str) -> String {
    let mut data = String::new();
    let mut inside = false;
    // Step 1: loop over string chars.
    for c in source.chars() {
        // Step 2: detect markup start and end, and skip over markup chars.
        if c == '<' {
            inside = true;
            continue;
        }
        if c == '>' {
            inside = false;
            continue;
        }
        if !inside {
            // Step 3: push other characters to the result string.
            data.push(c);
        }
    }
    // Step 4: return string.
    return data;
}

fn main() {
    // Use the strip html function to remove markup.
    let input = "<p>Hello <b>world</b>!</p>";
    let result = strip_html(input);
    println!("{input}");
    println!("{result}");
}

<p>Hello <b>world</b>!</p>
Hello world!
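As a quick check of this behavior, the sketch below repeats the same character-flag logic as strip_html (written with a match, which should behave identically) and runs it on a few extra inputs. The inputs are made up for illustration and are not from the original example. The last one shows that plain text containing a bare < or > is treated as markup too.

```rust
// Same character-flag approach as the strip_html function on this page.
fn strip_html(source: &str) -> String {
    let mut data = String::new();
    let mut inside = false;
    for c in source.chars() {
        match c {
            '<' => inside = true,
            '>' => inside = false,
            // Keep only characters outside of a markup region.
            _ if !inside => data.push(c),
            _ => {}
        }
    }
    data
}

fn main() {
    // Illustrative inputs (assumptions, not from the original page).
    let inputs = [
        "<p>Hello <b>world</b>!</p>",
        "no markup here",
        "<a href=\"page.html\">link</a> text",
        "1 < 2 and 3 > 2",
    ];
    for input in inputs {
        // Debug formatting adds quotes, so leading or doubled spaces are visible.
        println!("{:?} -> {:?}", input, strip_html(input));
    }
}
```

Because the function only tracks angle brackets, everything between the < and > in the last input is dropped, so it may not suit text that uses those characters as comparison operators.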
Results. The output shows that the function correctly removes simple HTML tags. A problem would be tags inside of HTML comments: the first > character ends the markup region, so the rest of the comment leaks into the result. A more complex function would be needed to support this.
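One possible way to handle comments is sketched below. The function name strip_html_with_comments is hypothetical, and this is only a sketch under simple assumptions: a comment starts with <!-- and ends at the next -->, and any other tag ends at the next > character. It does not try to handle scripts, attribute values containing >, or other parts of HTML.

```rust
// Hypothetical comment-aware variant; not the function from the original page.
fn strip_html_with_comments(source: &str) -> String {
    let mut data = String::new();
    let mut rest = source;
    while let Some(start) = rest.find('<') {
        // Keep the text that comes before the tag or comment.
        data.push_str(&rest[..start]);
        let after = &rest[start..];
        // Comments end at "-->", ordinary tags end at the first '>'.
        let end = if after.starts_with("<!--") {
            after[4..].find("-->").map(|i| i + 4 + 3)
        } else {
            after.find('>').map(|i| i + 1)
        };
        match end {
            // Continue scanning after the closing delimiter.
            Some(end) => rest = &after[end..],
            // Unterminated markup: drop the remainder of the string.
            None => return data,
        }
    }
    // No more markup, so the remaining text is kept as-is.
    data.push_str(rest);
    data
}

fn main() {
    // Illustrative input with a '>' and a tag inside a comment.
    let input = "<p>Hi<!-- a > <b>c</b> --> there</p>";
    let result = strip_html_with_comments(input);
    println!("{input}");
    println!("{result}");
}
```

This version searches for whole delimiters with find instead of flagging single characters, which is what lets it skip a > inside a comment. One difference from the simple function: a > that appears outside of any tag is kept in the result rather than removed.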
Summary. It is possible to remove HTML tags from a string in Rust, and regular expressions are not needed. The code is small and easy to maintain.
This page was last updated on Aug 27, 2023 (new).
© 2007-2024 Sam Allen.