Altimesh - TALK

Poster GTC 2017 – Hybrid Vector Library—From Memory Bound to Compute Bound with NVVM

When submitting small tasks to the GPU, grid scheduling and synchronization costs may be much higher than the cost of the computations themselves, even on a CPU. In this case, the benefit of GPU computing is lost. Leveraging runtime compilation, we illustrate an approach that generates source code to replace a list of library API calls with a single kernel […]

Talk GTC Europe 2016 – How Pascal And Power 8 Will Accelerate Counterparty Risk Calculations at BNP Paribas

Since the financial crisis of 2008, regulators have been increasingly demanding in terms of risk analysis and stress scenario simulations. In this talk, we present an approach for counterparty risk calculations based on Directed Acyclic Graphs. Calculations are arranged in a tree, where nodes are simulation parts. Nodes hold temporary data that may be reused […]

Tags: NVLink, Pascal, Power 8

Talk GTC 2016 – Java Image Processing: How Runtime Compilation Transforms Memory-Bound into Compute-Bound

A wide variety of image processing algorithms are typically parallel. However, depending on filter size or neighborhood search pattern, memory access is critical for performance. We’ll show how loop reordering and memory locality fine-tuning help achieve the best performance. Using Hybridizer to automate Java byte-code transformation to CUDA source code, and using the new CUDA feature Run Time […]

Tags: CUDA, Image Processing, Java

Poster GTC 2016 – Using CLANG/LLVM Vectorization to Generate Mixed Precision Source Code

At Supercomputing 2015, NVIDIA announced Jetson TX1. This platform is the first available to natively expose mixed-precision instructions. However, this instruction set requires operations on 16-bit floating-point values to be performed in pairs, using the half2 type, which packs two values into a single register. See it at GTC On-Demand - […]