You need to convert plain text into HTML-encoded text. The text may come from an XML file or from user input, and it will be written into an HTML page. If you skip the encoding step, characters such as & and > can break the markup and open the page to injection attacks. Here we look at a simple way to encode HTML in ASP.NET with the C# programming language.
Input:  Not HTML encoded
        Contents: You & me > them
Output: Is HTML encoded
        Contents: You &amp; me &gt; them
First, the .NET Framework has good built-in methods for this. You are likely using ASP.NET, but you can call these methods even if you are not. You could develop your own encoder, but that has some pitfalls. Here we look at some example C# code.
=== ASPX code-behind file that encodes HTML (C#) ===
using System;
using System.IO;
using System.Web;
using System.Web.UI;

public partial class _Default : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // This could mess up HTML.
        string text = "you & me > them"; // 1

        // Replace & with &amp; and > with &gt;
        string htmlEncoded = Server.HtmlEncode(text); // 2

        // Now has the & and > again.
        string original = Server.HtmlDecode(htmlEncoded); // 3

        // This is how you can access the Server in any class.
        string alsoEncoded = HttpContext.Current.Server.HtmlEncode(text); // 4

        StringWriter stringWriter = new StringWriter();
        using (HtmlTextWriter writer = new HtmlTextWriter(stringWriter))
        {
            // Write a DIV with encoded text.
            writer.RenderBeginTag(HtmlTextWriterTag.Div);
            writer.WriteEncodedText(text);
            writer.RenderEndTag();
        }
        string html = stringWriter.ToString(); // 5
    }
}
=== Notes on the code ===
Step 1: Before encoding has occurred.
        String: you & me > them
Step 2: The string is encoded for HTML.
        String: you &amp; me &gt; them
Step 3: String is converted back from HTML.
        String: you & me > them
Step 4: The string is encoded for HTML again.
        String: you &amp; me &gt; them
Step 5: The HTML string is written into a DIV.
        Text:   <div>you &amp; me &gt; them</div>

Description. In the above code example, you will see three different methods. The first two, HtmlEncode and HtmlDecode, just return an encoded or decoded string. The HtmlTextWriter uses an interesting method called WriteEncodedText. This has the potential to be more efficient, because it writes the encoded text straight to the output and could avoid a string copy. I checked the value at each numbered step with breakpoints.
In my brief benchmarks, I found Server.HtmlEncode and Server.HtmlDecode to be much faster than my home-grown version that used StringBuilder. So unless you want to put lots of effort into a better implementation, it is best to use these framework methods.
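The home-grown version mentioned above is not shown here. To give a rough idea of what such an encoder might look like, here is a minimal sketch that uses StringBuilder. The class name NaiveHtmlEncoder and its Encode method are hypothetical, and the sketch only handles five characters, so treat it as an illustration and not as a replacement for the framework methods.

```csharp
using System;
using System.Text;

// Hypothetical hand-rolled encoder, for illustration only.
public static class NaiveHtmlEncoder
{
    public static string Encode(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        // Reserve a little extra room, since entities are longer than the characters they replace.
        StringBuilder builder = new StringBuilder(text.Length + 16);
        foreach (char c in text)
        {
            switch (c)
            {
                case '&': builder.Append("&amp;"); break;
                case '<': builder.Append("&lt;"); break;
                case '>': builder.Append("&gt;"); break;
                case '"': builder.Append("&quot;"); break;
                case '\'': builder.Append("&#39;"); break;
                default: builder.Append(c); break;
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Encode("you & me > them")); // you &amp; me &gt; them
    }
}
```

One pitfall shows up as soon as you need the reverse direction: a decoder has to recognize named and numeric entities, which takes much more code than this loop.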
Here we note that the HttpUtility class in System.Web is actually a better way to encode HTML and URLs in programs written in the C# language. Its methods are static, so you do not need a Page or the current HttpContext to call them. You will want to call HttpUtility.HtmlEncode and HttpUtility.HtmlDecode on your strings. This site has a detailed example of these methods.
(See HttpUtility.HtmlEncode Methods.)
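As a quick sketch of the HttpUtility calls, here is a small console program. It assumes your project can reference System.Web (or the System.Web.HttpUtility assembly on newer .NET versions). The class name HttpUtilityExample and the helper EncodeForHtml are made up for this example.

```csharp
using System;
using System.Web;

public static class HttpUtilityExample
{
    // Hypothetical wrapper; it just forwards to the static framework method.
    public static string EncodeForHtml(string text)
    {
        return HttpUtility.HtmlEncode(text);
    }

    public static void Main()
    {
        string text = "you & me > them";

        // No Page or HttpContext is needed for these static calls.
        string encoded = EncodeForHtml(text);
        string decoded = HttpUtility.HtmlDecode(encoded);

        Console.WriteLine(encoded); // you &amp; me &gt; them
        Console.WriteLine(decoded); // you & me > them
    }
}
```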
Here we saw ways you can encode HTML strings in C# using the HtmlEncode method. Always encode strings before you display them in an HTML page, and decode them only when you need the original text back. Security nightmares and injection attacks are possible otherwise. Use the ASP.NET methods shown here for a fast and reliable approach.
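To make the injection risk concrete, here is a small sketch that wraps user input in a paragraph tag, once without encoding and once with it. The class name CommentRenderer, its two methods, and the sample input are all hypothetical, and the sketch again assumes System.Web is available to your project.

```csharp
using System;
using System.Web;

public static class CommentRenderer
{
    // Unsafe: the input is pasted into the markup as-is.
    public static string RenderUnsafe(string userInput)
    {
        return "<p>" + userInput + "</p>";
    }

    // Safer: the input is HTML-encoded first, so it is shown as text.
    public static string RenderEncoded(string userInput)
    {
        return "<p>" + HttpUtility.HtmlEncode(userInput) + "</p>";
    }

    public static void Main()
    {
        string userInput = "<script>alert(1)</script>";

        // The browser would run this script.
        Console.WriteLine(RenderUnsafe(userInput));  // <p><script>alert(1)</script></p>

        // The browser would display the script as plain text.
        Console.WriteLine(RenderEncoded(userInput)); // <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
    }
}
```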