SIMD. Modern processors have special instructions that can act upon more than one unit of data at once. These are SIMD (single instruction, multiple data) features, and we can access them in .NET with System.Numerics.
By placing data in types like Vector2, Vector3, and Vector4, we let the runtime use SIMD instructions where the hardware supports them. Vector2 stores 2 float values, and Vector3 and Vector4 store 3 and 4.
Example. To begin, we need to include the System.Numerics namespace at the top of the program. Then we can easily access Vector2, Vector3 and Vector4.
Part 1 The key to using types like Vector2 is that we can use operators like multiply on entire vectors at once.
Part 2 We can use static methods like Vector3.Abs() to convert each element in a vector into its absolute value.
Part 3 We use Vector4 with multiply and addition. Each operation that acts upon 2 Vector4 instances can be replaced with a SIMD instruction.
Part 4 To read individual elements within the vector, we can use an index, as with an array or List.
using System;
using System.Numerics;
// Part 1: create a Vector2 and multiply it by itself.
var v0 = new Vector2(1, 2);
var result0 = v0 * v0;
Console.WriteLine(result0);
// Part 2: create a Vector3 instance and use Abs on it.
var v1 = new Vector3(1, -2, 3);
var result1 = Vector3.Abs(v1);
Console.WriteLine(result1);
// Part 3: create 2 Vector4 instances, then multiply and add another Vector4.
var v3 = new Vector4(10, 20, 30, 40);
var v4 = new Vector4(2, 3, 4, 5);
var result2 = v3 * v4;
result2 += new Vector4(1, 1, 1, 1);
// Part 4: sum up all the result floats and print the total.
var sum = result2[0] + result2[1] + result2[2] + result2[3];
Console.WriteLine(sum);

<1, 4>
<1, 2, 3>
404
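As a variation on Part 4, the same total can be reached without indexes. The sketch below is illustrative only: it reuses the values from the example, and the names sumFields and sumDot are not part of the original program. It reads the elements through the X, Y, Z and W fields, and it also uses Vector4.Dot with Vector4.One, which multiplies each element by 1 and adds the products.

```csharp
using System;
using System.Numerics;

// Same values as the example above.
var v3 = new Vector4(10, 20, 30, 40);
var v4 = new Vector4(2, 3, 4, 5);
var result2 = v3 * v4 + Vector4.One;

// The named fields X, Y, Z, W hold the same elements as indexes 0 through 3.
var sumFields = result2.X + result2.Y + result2.Z + result2.W;

// A dot product with (1, 1, 1, 1) adds up all 4 elements.
var sumDot = Vector4.Dot(result2, Vector4.One);

Console.WriteLine(sumFields);
Console.WriteLine(sumDot);
```

Both lines print 404 for these inputs. Whether the dot product is faster than adding the fields by hand depends on the runtime and CPU, so it is worth measuring before relying on it.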
Benchmark. Does the Vector4 type actually improve performance? This benchmark is designed to test Vector4 and SIMD-optimized code against array-based code that uses for-loops.
Version 1 The SIMD code uses 3 Vector4 instances and multiplies and adds them. It then sums up individual elements.
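Version 2 The array-based, looping code reuses 3 arrays, so the time required for allocation and garbage collection is not counted.
Result Each version of the code comes up with the same sum (404), but the Vector4-using code is about 3 times faster in this test. Note that Vector4 holds float values while the arrays hold int values, so the two versions do not perform identical arithmetic.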
#define VERSION1
using System;
using System.Diagnostics;
using System.Numerics;
class Program
{
    public static void Main()
    {
        // For the array version.
        var array0 = new int[]{ 10, 20, 30, 40 };
        var array1 = new int[]{ 2, 3, 4, 5 };
        var arrayResult = new int[4];
        const int _max = 10000000;
        var s1 = Stopwatch.StartNew();
        for (int i = 0; i < _max; i++)
        {
#if VERSION1
            // Version 1: uses vectors.
            var v0 = new Vector4(10, 20, 30, 40);
            var v1 = new Vector4(2, 3, 4, 5);
            var result = v0 * v1;
            result += new Vector4(1, 1, 1, 1);
            var sum = result[0] + result[1] + result[2] + result[3];
            if (sum != 404)
            {
                throw new Exception();
            }
#endif
#if VERSION2
            // Version 2: uses arrays.
            Array.Clear(arrayResult);
            for (int x = 0; x < array0.Length; x++)
            {
                arrayResult[x] = array0[x] * array1[x];
            }
            for (int x = 0; x < arrayResult.Length; x++)
            {
                arrayResult[x] += 1;
            }
            var sum = arrayResult[0] + arrayResult[1] + arrayResult[2] + arrayResult[3];
            if (sum != 404)
            {
                throw new Exception();
            }
#endif
        }
        s1.Stop();
        Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
    }
}

5.28 ns    Vector4
16.33 ns    int[], for
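The benchmark works on a single group of 4 values. For longer float arrays, one possible approach is to load 4 elements per iteration into a Vector4 and finish any leftover elements with a plain loop. This is a minimal sketch under assumptions: the arrays a, b and output are hypothetical sample data, and it has not been benchmarked here.

```csharp
using System;
using System.Numerics;

// Hypothetical input data; the names a, b and output are illustrative only.
float[] a = { 10, 20, 30, 40, 50, 60 };
float[] b = { 2, 3, 4, 5, 6, 7 };
float[] output = new float[a.Length];

int i = 0;
// Process 4 elements per iteration with Vector4.
for (; i <= a.Length - 4; i += 4)
{
    var va = new Vector4(a[i], a[i + 1], a[i + 2], a[i + 3]);
    var vb = new Vector4(b[i], b[i + 1], b[i + 2], b[i + 3]);
    var r = va * vb + Vector4.One;
    // Write the 4 results back into the output array starting at index i.
    r.CopyTo(output, i);
}
// Handle any leftover elements one at a time.
for (; i < a.Length; i++)
{
    output[i] = a[i] * b[i] + 1;
}
Console.WriteLine(string.Join(", ", output));
```

With the sample data this prints 21, 61, 121, 201, 301, 421. The first 4 results come from the Vector4 path and the last 2 from the scalar loop.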
For a program that performs many numeric computations, the System.Numerics namespace and the Vector2, Vector3 and Vector4 types are a valuable optimization. Results will vary based on the CPU.
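Because results depend on the CPU, it can help to ask the runtime what it reports before measuring. The sketch below uses Vector.IsHardwareAccelerated and Vector<float>.Count from System.Numerics. These belong to the generic Vector<T> API, which the examples above do not use, so treat the values as a rough hint about SIMD support rather than a guarantee about how Vector4 code is compiled.

```csharp
using System;
using System.Numerics;

// True when the JIT reports hardware acceleration for vector operations.
bool accelerated = Vector.IsHardwareAccelerated;

// Number of float elements that fit in one Vector<float> on this machine.
int floatLanes = Vector<float>.Count;

Console.WriteLine(accelerated);
Console.WriteLine(floatLanes);
```

The printed values differ between machines and runtimes, so no expected output is shown.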
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.