Normalize
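The Normalize method changes Unicode character sequences. A string's buffer is stored as UTF-16 code units, and the same visible text can often be encoded in more than one way. An accented letter such as á can be a single precomposed code point, or a base letter followed by a combining accent. Normalize converts a string to one consistent form: it composes or decomposes characters and puts combining marks in a standard order.
We explore how the representations of string data change. This method is not needed in many C# programs, but when it is needed, it is important.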
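To see what the two representations look like, the following sketch prints the UTF-16 code units of á before and after decomposition. It assumes a program using top-level statements (C# 9 or later). The local function Hex is a hypothetical name used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// "á" as one precomposed code point (U+00E1).
string composed = "\u00E1";

// FormD splits it into "a" (U+0061) plus a combining acute accent (U+0301).
string decomposed = composed.Normalize(NormalizationForm.FormD);

// Hypothetical helper: format each UTF-16 code unit as four hex digits.
static string Hex(string s) =>
    string.Join(" ", s.Select(c => ((int)c).ToString("X4")));

Console.WriteLine(Hex(composed));   // 00E1
Console.WriteLine(Hex(decomposed)); // 0061 0301
```

Both strings display as the same letter, but their contents differ.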
This program introduces a string with an accent on the lowercase a. We call Normalize with no parameters, and then Normalize with the parameters NormalizationForm.FormD, FormKC, and FormKD.
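We write the resulting strings to the screen with Console.WriteLine as we go along. Normalize with no parameters uses the NormalizationForm.FormC enum in its implementation. This detail can be seen in IL Disassembler.
With FormD and FormKD, the letter is decomposed: a combining acute accent (U+0301) follows the base letter a. Some consoles display this accent as a separate mark after the letter.

using System;
using System.Text;

// "á" written as a single precomposed code point (U+00E1).
const string input = "\u00E1";

// No argument: normalizes to FormC.
string val2 = input.Normalize();
Console.WriteLine(val2);

// FormD: canonical decomposition (base letter + combining accent).
string val3 = input.Normalize(NormalizationForm.FormD);
Console.WriteLine(val3);

// FormKC: compatibility decomposition, then canonical composition.
string val4 = input.Normalize(NormalizationForm.FormKC);
Console.WriteLine(val4);

// FormKD: compatibility decomposition.
string val5 = input.Normalize(NormalizationForm.FormKD);
Console.WriteLine(val5);

Output:
á
a '
á
a '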
IsNormalized
In Unicode strings, there are different normalization forms. With the IsNormalized method you can test for normalized character data.
This program uses a string that has an accent in it. With Normalize and IsNormalized, only non-ASCII characters are affected; ASCII text is already in every normalization form. With no argument, IsNormalized returns true if the string is normalized to FormC. It returns false if the string is in FormD.
You can also pass a NormalizationForm argument to IsNormalized. In this case, that specific normalization form is checked.

using System;
using System.Text;

// "á" as a single precomposed code point (U+00E1).
const string input = "\u00E1";
string val2 = input.Normalize();
string val3 = input.Normalize(NormalizationForm.FormD);

// No argument: checks for FormC.
Console.WriteLine(input.IsNormalized());
Console.WriteLine(val2.IsNormalized());
Console.WriteLine(val3.IsNormalized());

// With an argument: checks for that specific form.
Console.WriteLine(val3.IsNormalized(NormalizationForm.FormD));

Output:
True
True
False
True
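The statement that ASCII text is unaffected can be checked directly. This sketch, assuming a program using top-level statements, tests an ASCII string and the accented string against all four NormalizationForm values.

```csharp
using System;
using System.Text;

NormalizationForm[] forms =
{
    NormalizationForm.FormC, NormalizationForm.FormD,
    NormalizationForm.FormKC, NormalizationForm.FormKD
};

string ascii = "Hello, world!";
string accented = "\u00E1"; // precomposed "á"

foreach (NormalizationForm form in forms)
{
    // ASCII is normalized in every form; the precomposed letter
    // only passes the composed forms (FormC and FormKC).
    Console.WriteLine($"{form}: {ascii.IsNormalized(form)} {accented.IsNormalized(form)}");
}
// FormC: True True
// FormD: True False
// FormKC: True True
// FormKD: True False
```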
Mainly, the Normalize method is useful for interoperability purposes. If you have to interact with another program that uses Unicode, it can be important to call Normalize so that both sides use the same form.
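One common interoperability problem is comparison: two systems may send text that looks identical but uses different forms. The sketch below assumes two hypothetical inputs, one precomposed and one decomposed, and compares them with ordinal equality before and after normalizing both to FormC.

```csharp
using System;
using System.Text;

// Hypothetical inputs: the same visible "á" from two different sources.
string fromSourceA = "\u00E1";  // precomposed
string fromSourceB = "a\u0301"; // "a" + combining acute accent

// Ordinal comparison looks at code units, so these differ.
Console.WriteLine(string.Equals(fromSourceA, fromSourceB, StringComparison.Ordinal)); // False

// Normalizing both to the same form makes them compare equal.
string a = fromSourceA.Normalize(NormalizationForm.FormC);
string b = fromSourceB.Normalize(NormalizationForm.FormC);
Console.WriteLine(string.Equals(a, b, StringComparison.Ordinal)); // True
```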
You do not need to call Normalize if you are just using ASCII, or if you are not interoperating with a system that uses a different normalization form. IsNormalized addresses the need to determine the normalization status of a string. Normalization is often necessary when interoperating with other systems.
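Normalization also changes the bytes that end up in files or network messages. The following sketch encodes both forms of á as UTF-8, then uses a hypothetical helper, ToFormC, that checks IsNormalized before calling Normalize.

```csharp
using System;
using System.Text;

string composed = "\u00E1";
string decomposed = "a\u0301";

// The same visible letter produces different UTF-8 bytes.
byte[] composedBytes = Encoding.UTF8.GetBytes(composed);
byte[] decomposedBytes = Encoding.UTF8.GetBytes(decomposed);

Console.WriteLine(BitConverter.ToString(composedBytes));   // C3-A1
Console.WriteLine(BitConverter.ToString(decomposedBytes)); // 61-CC-81

// Hypothetical helper: normalize to FormC only when the string is not already in it.
static string ToFormC(string s) => s.IsNormalized() ? s : s.Normalize();

Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(ToFormC(decomposed)))); // C3-A1
```

Choosing FormC as the target is an assumption here; the right form depends on what the other system expects.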
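In most programs, you can ignore Normalize and IsNormalized and just leave strings in whatever normalization form they already have.
Normalize() provides interoperation with other systems. It is not a commonly needed string method. But it reveals an important detail of the string implementation: the same text can be stored as different sequences of UTF-16 code units.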
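One consequence of that detail is that Length counts UTF-16 code units, not visible characters. As a sketch, StringInfo.LengthInTextElements from System.Globalization reports the same count for both forms of á, while Length does not.

```csharp
using System;
using System.Globalization;
using System.Text;

string composed = "\u00E1";
string decomposed = composed.Normalize(NormalizationForm.FormD);

// Length counts UTF-16 code units, so it depends on the normalization form.
Console.WriteLine(composed.Length);   // 1
Console.WriteLine(decomposed.Length); // 2

// StringInfo counts text elements: what a reader sees as one character.
Console.WriteLine(new StringInfo(composed).LengthInTextElements);   // 1
Console.WriteLine(new StringInfo(decomposed).LengthInTextElements); // 1
```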