Virtual function calls carry a significant performance penalty, as the Dispatch row below shows. To overcome this, we map .NET generics to C++ templates. Because the concrete type is then known at compile time, the generated code can be inlined, so the flexibility of objects is retained without giving up performance.
| Expm1 | GFLOPS | GCFLOPS | usage |
|---|---|---|---|
| Local | 975 | 538 | 92% |
| Dispatch | 478 | 263 | 45% |
| peak | 1174 | 587 | – |
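To make the difference concrete, here is a minimal C++ sketch contrasting the two dispatch styles. All names (`IArray`, `VirtualArray`, `PlainArray`, `sum_virtual`, `sum_static`) are hypothetical and chosen for illustration only; this is not the code the tool generates, and it makes no claim about the measured figures above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; none of these names come from the article.

// Dynamic dispatch: every element access goes through a virtual call,
// which the compiler usually cannot inline when only the base type is known.
struct IArray {
    virtual ~IArray() = default;
    virtual double get(std::size_t i) const = 0;
};

struct VirtualArray : IArray {
    std::vector<double> data;
    double get(std::size_t i) const override { return data[i]; }
};

double sum_virtual(const IArray& a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a.get(i);  // indirect call per element
    return s;
}

// Static dispatch: the concrete type is a template parameter,
// so get() is resolved at compile time and is a candidate for inlining.
struct PlainArray {
    std::vector<double> data;
    double get(std::size_t i) const { return data[i]; }
};

template <typename Array>
double sum_static(const Array& a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a.get(i);  // direct call per element
    return s;
}
```

Both functions compute the same result; only the way the call to `get` is resolved differs.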
In C++ templates, the requirements on a type parameter (its concept) are traditionally not stated explicitly: the compiler simply reports whether a given type is compliant or not. In .NET, the concept is expressed by constraints on the generic type. The following example illustrates this.
[HybridTemplateConcept]
public interface IMyArray {
    double this[int index] { get; set; }
}

[HybridRegisterTemplate(Specialize=typeof(MyAlgorithm<MyArray>))]
public struct MyArray : IMyArray
{
    double[] _data;

    [Kernel] public double this[int index] {
        get { return _data[index]; }
        set { _data[index] = value; }
    }
}

public class MyAlgorithm<T> where T : struct, IMyArray
{
    T a, b;

    [Kernel] public void Add(int n) {
        for (int k = threadIdx.x + blockDim.x * blockIdx.x;
             k < n; k += blockDim.x * gridDim.x)
            a[k] += b[k];
    }
}
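For readers more familiar with C++, the generic class above corresponds roughly to the template sketched below. This is a host-only approximation written under assumptions, not the code actually generated: the `thread_id` and `thread_count` parameters are hypothetical stand-ins for the CUDA thread indices, and the `static_assert` is one possible way to state the requirement that the C# `where` clause expresses.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical C++ counterpart of the C# MyArray struct.
struct MyArray {
    std::vector<double> data;
    double& operator[](std::size_t i) { return data[i]; }
    double operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical C++ counterpart of MyAlgorithm<T>.
template <typename T>
struct MyAlgorithm {
    // Explicit check standing in for the C# constraint "where T : struct, IMyArray":
    // T must be indexable and yield something convertible to double.
    static_assert(
        std::is_convertible<decltype(std::declval<T&>()[std::size_t(0)]), double>::value,
        "T must provide operator[] returning a value convertible to double");

    T a, b;

    // Strided loop with the same shape as the kernel above.
    // thread_id / thread_count are assumptions replacing
    // threadIdx.x + blockDim.x * blockIdx.x and blockDim.x * gridDim.x.
    void Add(std::size_t n, std::size_t thread_id = 0, std::size_t thread_count = 1) {
        for (std::size_t k = thread_id; k < n; k += thread_count)
            a[k] += b[k];  // operator[] is resolved statically and can be inlined
    }
};
```

Calling `Add(n)` with the default arguments processes every element in a single pass; calling it once per `thread_id` in `0 .. thread_count - 1` covers the same elements in interleaved strides, as the threads of a grid would.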
With this approach, performance returns to a level very close to what we obtain without any polymorphism.
| Expm1 | GFLOPS | GCFLOPS | usage |
|---|---|---|---|
| Local | 975 | 538 | 92% |
| Dispatch | 478 | 263 | 45% |
| Generics | 985 | 544 | 93% |
| peak | 1174 | 587 | – |