For some time now we have been playing ‘limbo’ with execution times on the Chain Ladder algorithm. Last time we put it through Julia and with some tinkering got it down to about 25 microseconds. This time we are claiming less than 2 microseconds by using Julia’s ccall()
function that allows programmers to call C and Fortran libraries.
The code in this blog entry is run/compiled on Ubuntu 14.04, core i7 processor having 16 GB RAM, using julia version 0.2.1 and gcc version 4.8.2 compiler. The scripts are available on Github.
As we mentioned previously, Julia’s C++ api is still in its infancy, so we decided to go with pure C with this one. To pass a 2D array of floats to ccall()
you need to convert it to an array of pointers to 1D floats in Julia before sending to C. We did not like the idea of having to construct an array of pointers in Julia - because it would mean doing more tasks which may slow down the algorithm. After a bit of head-scrathing we decided to unfold the claims triangle into a 1D array, send it using a single pointer to be processed in C and then reshape that array back into a 2D array holding the complete claims triangle.
The C code functions for calculating the “age-to-age” factors and chain ladder are given below:
/* File: speed-cl.c */
#include <stdio.h>;
/* Function to calculate the age-to-age factors*/
void get_factor(double *vec, double *fact, int p)
{
int j, m;
double tmp[2];
for(j = 0; j < p-1; ++j)
{
tmp[0] = 0.; tmp[1] = 0.;
for(m = 0; m < (p - (j + 1)); ++m)
{
tmp[0] += vec[(m + j*p)];
tmp[1] += vec[(m + (1 + j)*p)];
}
fact[j] = tmp[1]/tmp[0];
}
}
/* Chain Ladder function */
void cl(double *vec, double *fact, int p)
{
int i, j, k;
for(i = p - 1; i > 0; --i)
{
for(j = 0; j < p - i; ++j)
{
k = p*i + (p - i) + j*p;
vec[k] = vec[k - p]*fact[i+j-1];
}
}
}
# Chain Ladder function
function chain_ladder(tri)
p = size(tri)[1]
tri = reshape(tri, (p*p, 1))
fact = zeros(p)
ccall((:get_factor, "speed-cl"), Void, (Ptr{Float64}, Ptr{Float64}, Int32), pointer(tri), pointer(fact), p)
ccall((:cl, "speed-cl"), Void, (Ptr{Float64}, Ptr{Float64}, Int32), pointer(tri), pointer(fact), p)
return reshape(tri, (p, p))
end
function bench(n::Int)
output = [@elapsed chain_ladder(x) for i = 1:n]
output = output*1E6 # microseconds
return DataFrames.DataFrame(min = quantile(output, 0), lq = quantile(output, 0.25), median = quantile(output, 0.5),
uq = quantile(output, 0.75), max = quantile(output, 1), neval = n)
end
Benchmarks
bench(1000)
1x6 DataFrame:
min lq median uq max neval
[1,] 1.586 1.604 1.614 1.68475 8.172 1000
That’s a median time of less than 2 microseconds.
Sometimes imposing restrictions on the way you write code can lead to some unforseen benefits, in this case we became more creative with the way we carried out the chain ladder calculation which led to a very nice uplift in performance. What restrictions can you impose on the way you work to gain efficiency benefits from increased performance? Be careful not to deprive yourself of tools that you need to maintain your effectiveness.