Dot Net Perls
C#

Unsafe Optimization for Parsing int

by Sam Allen

Problem

Use the unsafe keyword to optimize an integer parsing routine. Benchmark the improvement, and examine properties of unsafe code in C#. Carefully compare the unsafe code to the managed framework code, and draw conclusions about when popular programs use unsafe, and why they do.

Solution: C#

Safety is critical in life, but sometimes we can be daredevils and use C#'s unsafe keyword to greatly optimize code. The situation we examine here is integer parsing (int and ushort) in C#. The framework provides the static methods int.Parse and ushort.Parse to convert strings into integers. Here is how they are used, next to the ParseUnsafe replacement.

{
    // Take the string |line| and convert it to a number with the framework.
    int number = int.Parse(line);

    // Take the string |singleNums[v]| and convert it to an unsigned short.
    ushort shortNumber = ushort.Parse(singleNums[v]);

    // The same conversions using the ParseUnsafe method from below.
    int number2 = ParseUnsafe(line);
    ushort shortNumber2 = (ushort)ParseUnsafe(singleNums[v]);
}

Why should we rewrite int.Parse?

Its performance may be non-optimal. Think of how many different things it must deal with: negative numbers, decimals, null characters, letters, linebreaks, spaces, colons and different locales. What if we just need to parse a series of digit characters and store the result in an int or ushort? We can probably optimize it greatly.

How do we use unsafe?

First, go to Project > Properties... in Visual Studio and select "Allow unsafe code". The unsafe keyword allows us to use pointers as we can in C and C++. In popular desktop programs like Paint.NET, unsafe is used frequently on operations that involve single bits or chars. Here is a parse method that uses unsafe pointers.

/// <summary>
/// Convert a series of characters to a number using pointers.
/// </summary>
/// <param name="value">String that must contain only digits.</param>
/// <returns>Integer represented by the input string.</returns>
unsafe static int ParseUnsafe(string value)
{
    int result = 0;
    fixed (char* valuePointer = value)
    {
        char* str = valuePointer;
        while (*str != '\0')
        {
            result = 10 * result + (*str - 48);
            str++;
        }
    }
    return result;
}
  1. Method receives a string
    ParseUnsafe above receives a C# string and returns the C# integer that the string represents. It has useful XML comments, but it does no validation: the input must be non-null and contain only digits.
  2. Uses fixed syntax
    Look at the keyword fixed used above. It pins the string in memory so the garbage collector cannot move it while the unsafe code runs. You must use this syntax to obtain a char* pointer to the string's characters.
  3. Assigns another pointer
    You must assign a new char* from the fixed char pointer. This is because a pointer declared in a fixed statement is read-only: it cannot be incremented, so we copy it to a local pointer that can be.
  4. Increments pointer
    We increment the |str| pointer and walk through each character until we reach the terminating '\0'.
  5. Converts ASCII to int
    The line that updates |result| uses a classic C operation to convert a character like '1' to the integer 1. The ASCII digits '0' through '9' have the consecutive codes 48 through 57, so subtracting 48 gives the value of the digit.

What does the multiplication do?

Here is what happens: As we read the string from the left to the right, each new digit pushes the digits already read one decimal place to the left, which makes them worth ten times more. Therefore we multiply the running result by 10 and then add the new digit. This is classic C and blazingly fast. The 48 I subtract is the ASCII code for '0', so it simply shifts ASCII digits to ints.
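The accumulation can be traced by hand. The sketch below repeats the same arithmetic with an ordinary string indexer instead of a pointer, so it compiles without the unsafe option. The names AccumulateDemo and ParseDigits are made up for this illustration and are not part of the program described here.

```csharp
using System;

// Hypothetical demo class, not part of the original program.
static class AccumulateDemo
{
    // Same arithmetic as ParseUnsafe, but with a managed indexer instead of a pointer.
    public static int ParseDigits(string value)
    {
        int result = 0;
        for (int i = 0; i < value.Length; i++)
        {
            // Shift the digits read so far one decimal place left, then add the new digit.
            result = 10 * result + (value[i] - 48);
        }
        return result;
    }

    static void Main()
    {
        // For "123" the running result goes 1, then 12, then 123.
        Console.WriteLine(ParseDigits("1"));   // 1
        Console.WriteLine(ParseDigits("12"));  // 12
        Console.WriteLine(ParseDigits("123")); // 123
    }
}
```

Each call shows the state after one more character: 10 * 0 + 1 = 1, then 10 * 1 + 2 = 12, then 10 * 12 + 3 = 123.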

Is it faster?

Yes, and by quite a lot. For a couple of programs I developed, I was parsing about 175,000 integers (both ints and ushorts) at startup. This didn't take a long time, but imagine if you parsed even more or were running on a different device like a phone. Here I compare the startup time of my program using int.Parse and using ParseUnsafe.

Version                     Time required for startup (ms),
                            average of 100 runs
int.Parse & ushort.Parse    89.85
ParseUnsafe                 56.78
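
The timing code behind these figures is not shown. As a rough sketch of how such a comparison might be measured, the example below times two parsers over the same inputs with Stopwatch. Everything in it is an assumption made for illustration: the class ParseBenchmark, the helper TimeParser, the generated input values, and the use of a managed digit loop (ParseDigits) as a stand-in for ParseUnsafe so that it compiles without the unsafe option. It measures only parsing, not program startup, so its numbers will not match the table.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness, not the timing code used for the table.
static class ParseBenchmark
{
    // Managed stand-in for ParseUnsafe (assumption: input contains only digits).
    public static int ParseDigits(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            result = 10 * result + (c - '0');
        }
        return result;
    }

    // Runs one parser over every input and returns the elapsed milliseconds.
    public static double TimeParser(Func<string, int> parser, string[] inputs, out long checksum)
    {
        checksum = 0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < inputs.Length; i++)
        {
            checksum += parser(inputs[i]); // keep the result in use so the work is not skipped
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    static void Main()
    {
        // 175,000 inputs to match the count mentioned in the text; the values are made up.
        string[] inputs = new string[175000];
        for (int i = 0; i < inputs.Length; i++)
        {
            inputs[i] = (i % 60000).ToString();
        }

        long sumFramework, sumCustom;
        double msFramework = TimeParser(s => int.Parse(s), inputs, out sumFramework);
        double msCustom = TimeParser(s => ParseDigits(s), inputs, out sumCustom);

        Console.WriteLine("int.Parse:   {0:F2} ms", msFramework);
        Console.WriteLine("ParseDigits: {0:F2} ms", msCustom);
        Console.WriteLine("Results agree: {0}", sumFramework == sumCustom);
    }
}
```

Comparing the two checksums is a cheap way to confirm that both parsers produced the same numbers before trusting the timings. Averaging many runs, as the table does, smooths out JIT warm-up and other noise.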

There are drawbacks

You sacrifice the benefits of managed code and any error-checking whatsoever. A letter, a minus sign or a number too large for an int produces a wrong result instead of an exception. This erases a lot of the point of .NET. However, in my programs, my requirements were so narrow that this was appropriate. The data was static and would never change. Saving about 37% of startup time (89.85 ms down to 56.78 ms) was worth it.
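
To make the missing error-checking concrete, the sketch below runs the same arithmetic on bad input using a managed loop, so it compiles without the unsafe option. It also shows one possible guarded variant. The names DrawbackDemo, ParseDigitsUnchecked and TryParseDigits are hypothetical and are not part of the program described here; the guard is only an illustration of the kind of checks that are being given up.

```csharp
using System;

// Hypothetical demo class, not part of the original program.
static class DrawbackDemo
{
    // Same unchecked arithmetic as ParseUnsafe, written with a managed loop.
    public static int ParseDigitsUnchecked(string value)
    {
        int result = 0;
        foreach (char c in value)
        {
            result = unchecked(10 * result + (c - 48));
        }
        return result;
    }

    // One possible guard: reject empty input, non-digits and overflow.
    public static bool TryParseDigits(string value, out int result)
    {
        result = 0;
        if (string.IsNullOrEmpty(value)) return false;
        long total = 0;
        foreach (char c in value)
        {
            if (c < '0' || c > '9') return false;
            total = 10 * total + (c - '0');
            if (total > int.MaxValue) return false;
        }
        result = (int)total;
        return true;
    }

    static void Main()
    {
        Console.WriteLine(ParseDigitsUnchecked("12a")); // 169, because 'a' has code 97
        Console.WriteLine(ParseDigitsUnchecked("-5"));  // -25, because '-' has code 45

        int parsed;
        Console.WriteLine(TryParseDigits("12a", out parsed));  // False
        Console.WriteLine(TryParseDigits("1750", out parsed)); // True
    }
}
```

The extra comparisons in the guarded version cost some of the speed gained, which is the trade-off at stake: the fast loop is only appropriate when the input is known to be clean.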

I hate C!

Many .NET developers do, and I have had my moments. However, knowing C is important, and being able to deal with characters and pointers is a great skill. As I said, popular desktop programs like Paint.NET use C-style pointers in C# extensively. You get excellent performance. Isolate your unsafe code and be careful.

Conclusion

Unsafe code can greatly speed up and even simplify your logic. It has many risks and should only be used by developers who grasp C and C++. As an experienced programmer, you should know how to use pointers, and that knowledge is relevant even with C# and .NET. When your input is guaranteed to contain only digits, parse integers with the method here for a big performance improvement.

© 2008 Sam Allen. All rights reserved.
