two important methods. Both methods work on simple HTML sources. On comments and unusual markup, they may (and often will) fail.
public class Program {

    public static String stripHtmlRegex(String source) {
        // Replace each tag (shortest match from '<' to the next '>')
        // with an empty string.
        return source.replaceAll("<.*?>", "");
    }

    public static String stripTagsCharArray(String source) {
        // Create char array to store our result.
        char[] array = new char[source.length()];
        int arrayIndex = 0;
        boolean inside = false;
        // Loop over characters and append when not inside a tag.
        for (int i = 0; i < source.length(); i++) {
            char let = source.charAt(i);
            if (let == '<') {
                inside = true;
                continue;
            }
            if (let == '>') {
                inside = false;
                continue;
            }
            if (!inside) {
                array[arrayIndex] = let;
                arrayIndex++;
            }
        }
        // ... Return written data.
        return new String(array, 0, arrayIndex);
    }

    public static void main(String[] args) {
        final String html =
            "<p id=x>Sometimes, <b>simpler</b> is better, "
            + "but <i>not</i> always.</p>";
        System.out.println(html);
        String test = stripHtmlRegex(html);
        System.out.println(test);
        String test2 = stripTagsCharArray(html);
        System.out.println(test2);
    }
}
<p id=x>Sometimes, <b>simpler</b> is better, but <i>not</i> always.</p>
Sometimes, simpler is better, but not always.
Sometimes, simpler is better, but not always.
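The failure cases mentioned earlier can be seen in a small sketch. The class name StripFailureDemo and the two sample strings are made up for illustration; the two methods repeat the logic of the program above (the char-array version uses a StringBuilder here only for brevity). The first input has a comment containing a '>' character, and the second has a tag split across two lines.

```java
public class StripFailureDemo {

    // Same regex approach as stripHtmlRegex in the program above.
    public static String stripHtmlRegex(String source) {
        return source.replaceAll("<.*?>", "");
    }

    // Same state-machine approach as stripTagsCharArray above,
    // written with a StringBuilder instead of a char array.
    public static String stripTagsCharArray(String source) {
        StringBuilder result = new StringBuilder(source.length());
        boolean inside = false;
        for (int i = 0; i < source.length(); i++) {
            char let = source.charAt(i);
            if (let == '<') {
                inside = true;
            } else if (let == '>') {
                inside = false;
            } else if (!inside) {
                result.append(let);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: a comment that contains a '>' character.
        String comment = "<p>Hello<!-- a > b --> world</p>";
        // Both methods treat the first '>' as the end of the comment,
        // so part of the comment leaks into the result.
        System.out.println(stripHtmlRegex(comment));     // Hello b --> world
        System.out.println(stripTagsCharArray(comment)); // Hello b -- world

        // Hypothetical input: a tag that spans two lines.
        String multiline = "<a\nhref=x>link</a>";
        // By default '.' does not match a line terminator, so the regex
        // leaves the opening tag in place. The char loop is unaffected.
        System.out.println(stripHtmlRegex(multiline));     // <a (newline) href=x>link
        System.out.println(stripTagsCharArray(multiline)); // link
    }
}
```

Neither result for the comment input is the "Hello world" a reader would expect. For input like this, a real HTML parser is the safer choice.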