Debugging and Profiling

If debug information is available in the input binary (Java or .NET), it is re-inserted into the generated source code, so that users can debug directly on the target environment. The accompanying illustration is a snapshot of a debugging session in Nsight Visual Studio Edition.

The generated code also integrates with profilers such as Nsight Visual Studio Edition or Intel VTune Amplifier.

Hybridizer HOWTO — Generics

Virtual functions come with a significant performance penalty, because a call resolved at run time cannot be inlined. To overcome this, we map generics to templates. The generated source code can then be inlined, so the flexibility of objects is kept without giving up performance.

Expm1      GFLOPS   GCFLOPS   usage
Local         975       538     92%
Dispatch      478       263     45%
peak         1174       587

In C++, template concepts are traditionally not expressed explicitly: the compiler determines whether a type is compliant when the template is instantiated. In .NET, the concept is expressed by constraints on the generic type, as the following example illustrates.


[HybridTemplateConcept]
public interface IMyArray {
    double this[int index] { get; set; }
}

[HybridRegisterTemplate(Specialize=typeof(MyAlgorithm<MyArray>))]
public struct MyArray : IMyArray
{
    double[] _data;
    [Kernel] public double this[int index] {
        get { return _data[index]; }
        set { _data[index] = value; }
    }
}

public class MyAlgorithm<T> where T : struct, IMyArray
{
    T a, b;
    [Kernel] public void Add(int n) {
        for (int k = threadIdx.x + blockDim.x * blockIdx.x;
            k < n; k += blockDim.x * gridDim.x)
            a[k] += b[k];
    }
}
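
To make the mapping from generics to templates more concrete, here is a minimal host-side C++ sketch of what a template counterpart of MyAlgorithm<MyArray> could look like. It is not Hybridizer's actual generated code: the Dim3 struct standing in for threadIdx, blockIdx, blockDim and gridDim, and the helpers launch and add_on_simulated_grid, are assumptions made for illustration, and the "threads" are simulated by ordinary loops on the CPU. It shows that the element access is resolved at compile time, with no virtual call, and that the grid-stride loop of Add visits every index below n exactly once.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the CUDA built-ins (threadIdx, blockIdx, ...).
struct Dim3 { int x; };

// Counterpart of the C# struct MyArray: a thin view over a buffer.
struct MyArray {
    double* data;
    double& operator[](int index) const { return data[index]; }
};

// Counterpart of MyAlgorithm<T>. No constraint is written on T: the compiler
// checks at instantiation time that T provides operator[].
template <typename T>
struct MyAlgorithm {
    T a, b;

    // Work done by one simulated thread: a grid-stride loop over [0, n).
    void Add(int n, Dim3 threadIdx, Dim3 blockIdx, Dim3 blockDim, Dim3 gridDim) {
        for (int k = threadIdx.x + blockDim.x * blockIdx.x; k < n;
             k += blockDim.x * gridDim.x)
            a[k] += b[k];
    }
};

// Runs Add once per simulated thread, sequentially on the host.
template <typename T>
void launch(MyAlgorithm<T>& algo, int n, int grid, int block) {
    assert(grid > 0 && block > 0);
    for (int bx = 0; bx < grid; ++bx)
        for (int tx = 0; tx < block; ++tx)
            algo.Add(n, Dim3{tx}, Dim3{bx}, Dim3{block}, Dim3{grid});
}

// Adds b into a using the simulated launch and returns the result.
std::vector<double> add_on_simulated_grid(std::vector<double> a,
                                          std::vector<double> b,
                                          int grid, int block) {
    assert(a.size() == b.size());
    MyAlgorithm<MyArray> algo{MyArray{a.data()}, MyArray{b.data()}};
    launch(algo, static_cast<int>(a.size()), grid, block);
    return a;
}
```

With 2 blocks of 2 threads and 7 elements, the stride is 4: the first simulated thread handles indices 0 and 4, the second 1 and 5, and so on, so no element is added twice or skipped.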

With this approach, performance returns to a level very close to what we obtain without any polymorphism: in the table below, the Generics row reaches 985 GFLOPS, against 975 for Local and 478 for Dispatch.

Expm1      GFLOPS   GCFLOPS   usage
Local         975       538     92%
Dispatch      478       263     45%
Generics      985       544     93%
peak         1174       587
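
The usage column appears to be the measured GCFLOPS divided by the peak GCFLOPS. This reading is an inference from the numbers, not something the table states. A short C++ check, using a hypothetical helper named usage_percent, reproduces the three values under that assumption:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: percentage of peak, rounded to the nearest integer.
int usage_percent(double measured, double peak) {
    assert(peak > 0.0);
    return static_cast<int>(std::lround(100.0 * measured / peak));
}
```

usage_percent(538, 587), usage_percent(263, 587) and usage_percent(544, 587) give 92, 45 and 93, matching the Local, Dispatch and Generics rows. By the same table, the Generics row delivers roughly twice the GFLOPS of the Dispatch row (985 against 478).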

Hybridizer HOWTO — Virtual Functions

Hybridizer supports virtual functions. If an implementation or override of a virtual function needs to be available on the GPU, it has to be flagged with the [Kernel] attribute, as in the two implementations of ISimple below.


public interface ISimple
{
    int f();
}

public class Answer : ISimple
{
    [Kernel]
    public int f()
    {
        return 42;
    }
}

public class Other : ISimple
{
    [Kernel]
    public int f()
    {
        return 12;
    }
}
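
For comparison with the generics approach, the same hierarchy can be sketched in host-side C++. This is an illustration of dynamic dispatch through an interface, not the code Hybridizer emits; the helpers call_f and sum_f are hypothetical names. The callee is chosen at run time from the dynamic type of the object, which is what prevents the call from being inlined.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Counterpart of the C# interface ISimple.
struct ISimple {
    virtual ~ISimple() = default;
    virtual int f() const = 0;
};

struct Answer : ISimple {
    int f() const override { return 42; }
};

struct Other : ISimple {
    int f() const override { return 12; }
};

// Hypothetical helper: the callee is resolved at run time through the vtable.
int call_f(const ISimple& s) { return s.f(); }

// Sums f() over a mixed collection of implementations.
int sum_f(const std::vector<std::unique_ptr<ISimple>>& items) {
    int total = 0;
    for (const auto& item : items) {
        assert(item != nullptr);
        total += call_f(*item);
    }
    return total;
}
```

The same call site in call_f serves both Answer and Other, at the price of an indirect call for each invocation.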

Hybridizer HOWTO — Resident Memory

From one kernel call to the next, we may want some data to stay resident on the device (this mainly applies when device memory is physically separate from host memory). This is done with an interface, IResidentArray, and a few attributes. Memory copies between host and device can then be dramatically reduced, down to the minimum required, while memory management remains automated.
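
The idea can be sketched independently of Hybridizer. The C++ class below is a hypothetical illustration and not the IResidentArray interface, whose members are not described here: ResidentBuffer, its methods and its copy counter are invented names, and the "device" is simulated by a second host vector. The class tracks which side holds the valid copy and transfers data only when the other side actually needs it, so consecutive kernel calls reuse the device copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical resident buffer; the "device" is simulated by a second vector.
class ResidentBuffer {
public:
    explicit ResidentBuffer(std::size_t n) : host_(n, 0.0), device_(n, 0.0) {}

    // Host-side write access: marks the device copy as stale.
    std::vector<double>& host_write() {
        sync_to_host();
        device_valid_ = false;
        return host_;
    }

    // Host-side read access: copies back only if the device copy is newer.
    const std::vector<double>& host_read() {
        sync_to_host();
        return host_;
    }

    // Device-side access for a kernel: copies in only if the host copy is newer.
    std::vector<double>& device_access() {
        assert(host_valid_ || device_valid_);  // one side is always up to date
        if (!device_valid_) {
            device_ = host_;       // stands in for a host-to-device memcpy
            ++copies_;
            device_valid_ = true;
        }
        host_valid_ = false;       // the kernel may modify the data
        return device_;
    }

    int copies() const { return copies_; }

private:
    void sync_to_host() {
        if (!host_valid_) {
            host_ = device_;       // stands in for a device-to-host memcpy
            ++copies_;
            host_valid_ = true;
        }
    }

    std::vector<double> host_, device_;
    bool host_valid_ = true;
    bool device_valid_ = false;
    int copies_ = 0;
};

// Simulated kernel: increments every element of the device copy.
void increment_kernel(ResidentBuffer& buf) {
    for (double& x : buf.device_access()) x += 1.0;
}
```

In this sketch, three consecutive calls to increment_kernel trigger a single host-to-device copy, and a later host_read adds one device-to-host copy, instead of a round trip per kernel call.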