Modern processors have special instructions that can act upon more than one unit of data at once. These are SIMD features, and we can access them in .NET with System.Numerics.
By placing data in types like Vector2, Vector3, and Vector4, we can access SIMD in a reliable way. Vector2 stores 2 values, and Vector3 and Vector4 store 3 and 4.
To begin, we need to include the System.Numerics namespace at the top of the program. Then we can easily access Vector2, Vector3 and Vector4.
Part 1: An advantage of Vector2 is that we can use operators like multiply on entire vectors at once.
Part 2: We can call static methods like Vector3.Abs() to convert each element in a vector into its absolute value.
Part 3: We use Vector4 with multiply and addition. Each operation that acts upon 2 Vector4 instances can be replaced with a SIMD instruction.
Part 4: We can read the individual elements of a vector with an indexer, as we would with an array or List.

    using System;
    using System.Numerics;

    // Part 1: create a Vector2 and multiply it by itself.
    var v0 = new Vector2(1, 2);
    var result0 = v0 * v0;
    Console.WriteLine(result0);

    // Part 2: create a Vector3 instance and use Abs on it.
    var v1 = new Vector3(1, -2, 3);
    var result1 = Vector3.Abs(v1);
    Console.WriteLine(result1);

    // Part 3: create 2 Vector4 instances, then multiply and add another Vector4.
    var v3 = new Vector4(10, 20, 30, 40);
    var v4 = new Vector4(2, 3, 4, 5);
    var result2 = v3 * v4;
    result2 += new Vector4(1, 1, 1, 1);

    // Part 4: sum up all the result elements (floats) and print the total.
    var sum = result2[0] + result2[1] + result2[2] + result2[3];
    Console.WriteLine(sum);

    <1, 4>
    <1, 2, 3>
    404
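Part 4 sums the elements one at a time with the indexer. As an alternative, here is a minimal sketch that sums them with a single dot product against Vector4.One, which is (1, 1, 1, 1). The class name VectorSum and the method SumElements are hypothetical names used only for illustration; Vector4.Dot and Vector4.One are part of System.Numerics.

```csharp
using System;
using System.Numerics;

public static class VectorSum
{
    // Hypothetical helper: sums the 4 elements of a Vector4.
    // A dot product with (1, 1, 1, 1) is x*1 + y*1 + z*1 + w*1.
    public static float SumElements(Vector4 v)
    {
        return Vector4.Dot(v, Vector4.One);
    }

    public static void Main()
    {
        // Same values as Part 3: multiply 2 vectors, then add 1 to each element.
        var result = new Vector4(10, 20, 30, 40) * new Vector4(2, 3, 4, 5);
        result += Vector4.One;

        // Prints 404 (21 + 61 + 121 + 201).
        Console.WriteLine(SumElements(result));
    }
}
```

Whether this is faster than 4 indexer reads depends on the runtime and CPU, so it is worth measuring before relying on it.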
Does the Vector4 type actually improve performance? This benchmark is designed to test Vector4 and SIMD-optimized code against array-based code that uses for-loops.
Version 1: This version of the code creates Vector4 instances and multiplies and adds them. It then sums up individual elements.
Version 2: This version performs the same multiply and add on int arrays with for-loops, then sums up the elements.
Result: The Vector4-using code is about 3 times faster.

    #define VERSION1
    using System;
    using System.Diagnostics;
    using System.Numerics;

    class Program
    {
        public static void Main()
        {
            // For the array version.
            var array0 = new int[] { 10, 20, 30, 40 };
            var array1 = new int[] { 2, 3, 4, 5 };
            var arrayResult = new int[4];

            const int _max = 10000000;
            var s1 = Stopwatch.StartNew();
            for (int i = 0; i < _max; i++)
            {
    #if VERSION1
                // Version 1: uses vectors.
                var v0 = new Vector4(10, 20, 30, 40);
                var v1 = new Vector4(2, 3, 4, 5);
                var result = v0 * v1;
                result += new Vector4(1, 1, 1, 1);
                var sum = result[0] + result[1] + result[2] + result[3];
                if (sum != 404)
                {
                    throw new Exception();
                }
    #endif
    #if VERSION2
                // Version 2: uses arrays.
                Array.Clear(arrayResult);
                for (int x = 0; x < array0.Length; x++)
                {
                    arrayResult[x] = array0[x] * array1[x];
                }
                for (int x = 0; x < arrayResult.Length; x++)
                {
                    arrayResult[x] += 1;
                }
                var sum = arrayResult[0] + arrayResult[1] + arrayResult[2] + arrayResult[3];
                if (sum != 404)
                {
                    throw new Exception();
                }
    #endif
            }
            s1.Stop();
            // Print the average time per iteration in nanoseconds.
            Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
        }
    }

    5.28 ns    Vector4
    16.33 ns    int[], for
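The benchmark above works on exactly 4 values. For longer arrays, one possible approach is to process 4 floats per iteration with Vector4 and finish any leftover elements with a scalar loop. The sketch below assumes float arrays of equal length; the class ChunkedMultiplyAdd and the method MultiplyAddOne are hypothetical names, not part of System.Numerics.

```csharp
using System;
using System.Numerics;

public static class ChunkedMultiplyAdd
{
    // Hypothetical helper: computes result[i] = a[i] * b[i] + 1 for every element.
    // Assumes a, b and result all have the same length.
    public static void MultiplyAddOne(float[] a, float[] b, float[] result)
    {
        int i = 0;

        // Main loop: handle 4 floats per iteration with Vector4.
        for (; i <= a.Length - 4; i += 4)
        {
            var va = new Vector4(a[i], a[i + 1], a[i + 2], a[i + 3]);
            var vb = new Vector4(b[i], b[i + 1], b[i + 2], b[i + 3]);
            var vr = va * vb + Vector4.One;
            vr.CopyTo(result, i); // Writes 4 floats starting at index i.
        }

        // Tail loop: handle the remaining 0 to 3 elements one at a time.
        for (; i < a.Length; i++)
        {
            result[i] = a[i] * b[i] + 1;
        }
    }

    public static void Main()
    {
        var a = new float[] { 10, 20, 30, 40, 50, 60 };
        var b = new float[] { 2, 3, 4, 5, 6, 7 };
        var result = new float[a.Length];
        MultiplyAddOne(a, b, result);

        // Prints 21, 61, 121, 201, 301, 421
        Console.WriteLine(string.Join(", ", result));
    }
}
```

The first 4 results come from the Vector4 loop and the last 2 from the tail loop. This is a sketch rather than a tuned implementation, so any speedup should be confirmed with a benchmark like the one above.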
For a program that performs many numeric computations, the System.Numerics namespace and the Vector2, Vector3 and Vector4 types are a valuable optimization. Results will vary based on the CPU.
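Because results depend on the CPU, it can help to check whether the runtime reports SIMD support on the current machine. A minimal sketch, using the Vector.IsHardwareAccelerated property from System.Numerics; the class SimdInfo and the method Describe are hypothetical names used for illustration.

```csharp
using System;
using System.Numerics;

public static class SimdInfo
{
    // Hypothetical helper: builds a short report line.
    // Vector.IsHardwareAccelerated is true when vector operations
    // are subject to hardware acceleration through JIT support.
    public static string Describe()
    {
        return "Hardware accelerated: " + Vector.IsHardwareAccelerated;
    }

    public static void Main()
    {
        // The printed value depends on the machine and runtime.
        Console.WriteLine(Describe());
    }
}
```

If this reports False, the vector types still produce correct results, but a speedup like the one measured above should not be expected.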