Remove HTML. Sometimes a string contains HTML markup that is not wanted, perhaps because it was entered by mistake. It is straightforward to remove simple HTML tags with a Rust function.
By checking each character for the angle brackets, we can detect when markup starts and ends. Then we can avoid adding those characters to a string copy.
To begin, we introduce a strip_html function that receives a string slice (&str) and returns a new String. The result of this function is a string with no HTML markup.
Step 1 We use a for-loop over the chars in the string. The chars() method returns an iterator over the individual characters.
Step 2 We detect the angle brackets and set a local flag variable to true or false based on whether we are inside a markup region.
Step 3 Here we reach characters that are not part of a markup region, so we add them to our resulting string.
Step 4 We return the string. This contains all characters in the source string excluding markup regions.
fn strip_html(source: &str) -> String {
    let mut data = String::new();
    let mut inside = false;
    // Step 1: loop over string chars.
    for c in source.chars() {
        // Step 2: detect markup start and end, and skip over markup chars.
        if c == '<' {
            inside = true;
            continue;
        }
        if c == '>' {
            inside = false;
            continue;
        }
        if !inside {
            // Step 3: push other characters to the result string.
            data.push(c);
        }
    }
    // Step 4: return string.
    data
}

fn main() {
    // Use the strip html function to remove markup.
    let input = "<p>Hello <b>world</b>!</p>";
    let result = strip_html(input);
    println!("{input}");
    println!("{result}");
}

<p>Hello <b>world</b>!</p>
Hello world!
Results. The output shows that the function correctly removes simple HTML tags. HTML comments are a problem: the first ">" inside a comment ends the markup region, so tags or text inside the comment can leak into the result. A more complex function would be needed to support this.
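As a minimal sketch of what that more complex function might look like, the version below skips a whole comment at once by searching for its closing "-->". The name strip_html_skip_comments is hypothetical and not part of the original example, and the function scans with find() on string slices rather than with a flag. One difference in behavior is that a stray ">" outside any tag is kept here, where the char loop above drops it.

```rust
// Hypothetical variant of strip_html that also skips HTML comments.
fn strip_html_skip_comments(source: &str) -> String {
    let mut data = String::new();
    let mut rest = source;
    // Find the next '<' and copy the plain text that comes before it.
    while let Some(start) = rest.find('<') {
        data.push_str(&rest[..start]);
        let markup = &rest[start..];
        // A comment ends at "-->"; any other tag ends at the next '>'.
        let end = if markup.starts_with("<!--") {
            markup.find("-->").map(|i| i + 3)
        } else {
            markup.find('>').map(|i| i + 1)
        };
        match end {
            // Continue scanning after the closing delimiter.
            Some(i) => rest = &markup[i..],
            // Unterminated markup: drop the remainder of the string.
            None => return data,
        }
    }
    // No more markup: copy the remaining text.
    data.push_str(rest);
    data
}

fn main() {
    let input = "<p>Hello <!-- <b>hidden</b> --><b>world</b>!</p>";
    println!("{}", strip_html_skip_comments(input));
}
```

For this input the program prints "Hello world!", because the comment and the tag inside it are removed together. This is still not a full HTML parser: a ">" inside a quoted attribute value, for example, would end a tag early.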
Summary. It is possible to remove HTML tags from a string in Rust, and regular expressions are not needed. The code is small and easy to maintain.