Speed Chain Ladder analysis with Rcpp

Active Analytics Ltd: posted 23 June 2014 by Chibisi Chima-Okereke

Introduction

Some time ago we compared the use of R and Python for statistical applications. In that blog entry we also discussed the ease of integrating C++ code with each programming application. In this blog entry we present a simple example of calling C++ Chain Ladder code with R using the Rcpp and RcppArmadillo packages. If you are looking to integrate C++ libraries with R, Rcpp is by far the easiest way to do this, RcppArmadillo is a C++ wrapper for the linear algebra library Armadillo that provides an interface between Rcpp and Armadillo objects.

Some time ago, we looked at Chain Ladder calculations in R for large data sets. This time round we will compare what happens to the calculation time when we carry out the chain ladder matrix calculations in C++ and call these functions from R using Rcpp.

Preparations

The C++ code we would like to call sits in a file called cL.cpp, on the R script we load the Rcpp package.

require(Rcpp)

In the cL.cpp file, we load the libraries

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>  // most of the algorithms 
#include <numeric> // some numeric algorithm

using namespace std;
using namespace arma;
using namespace Rcpp;

Simulating the claims traingle

To simulate the claims traingle we will use the same R code as in our previous blog repeated here:

# Age-To-Age Factors
ageFact <- seq(1.9, 1, by = -.1)

# Inflation Rate
infRate <- 1.02

# Function to reverse matrix columns
revCols <- function(x){
  x[,ncol(x):1]
}

# Similar to jitter
shake <- function(vec, sigmaScale = 100)
{
  rnorm(n = length(vec), mean = vec, sd = vec/sigmaScale)
}

# Row generation funtion
GenerateRow <- function(iDev, dFactors = cumprod(ageFact), dInflationRate = 1.02, initClaim = 154)
{
  shake(initClaim)*shake(c(1, dFactors))*(dInflationRate^iDev)
}

# Function to generate a claims matrix
GenerateTriangle <- function(iSize, ...)
{
  indices = 1:iSize
  mClaimTri = t(sapply(indices, GenerateRow, ...))
  # Reverse columns to get the claims triangle
  mClaimTri = revCols(mClaimTri)
  # Assign nan to lower triangle
  mClaimTri[lower.tri(mClaimTri)] = NA
  mClaimTri = revCols(mClaimTri)
  return(mClaimTri)
}

R Chain Ladder R code

The code for Chain Ladder analysis on the R side is is essentially the same as in the previous blog but is repeated here for your convenience. One of the functions GetFactorR() is used to calculate the age-to-age factors and the other GetChainSquareR() is used to complete the traingle to a "square".

# Get claims factor at a particular column index
GetFactorR <- function(index, mTri)
{
  fact = matrix(mTri[-c((nrow(mTri) - index + 1):nrow(mTri)), index:(index + 1)], ncol = 2)
  fact = c(sum(fact[,1]), sum(fact[,2]))  
  return(fact[2]/fact[1])
}

# Function to carry out Chain Ladder on a claims triangle
GetChainSquareR <- function(mClaimTri)
{
  nCols <- ncol(mClaimTri)
  dFactors = sapply(1:(nCols - 1), GetFactorR, mTri = mClaimTri)
  dAntiDiag = diag(revCols(mClaimTri))[2:nCols]
  for(index in 1:length(dAntiDiag))
  {
	mClaimTri[index + 1, (nCols - index + 1):nCols] = dAntiDiag[index]*cumprod(dFactors[(nCols - index):(nCols - 1)])
  }
  mClaimTri
}

R Chain Ladder C++ code

There are four C++ functions, the first calculates the age-to-age factor:

double GetFactor(int index, mat mTri)
{
	int nRow = mTri.n_rows, nCol = mTri.n_cols;
	mat subMat = mTri.submat(0, index, nRow - (index + 2), index + 1);
	rowvec colSums = arma::sum(subMat, 0);
	double inFact = colSums(1)/colSums(0);
	return inFact;
}

The second calculated the factors over the whole matrix

vec GetFactors(mat mTri)
{
	int nRow = mTri.n_rows, nCol = mTri.n_cols;
	vec dFactors(nCol - 1);
	for(int i=0; i < nCol - 1; ++i)
	{
		dFactors(i) = GetFactor(i, mTri);
	}
	return dFactors;
}

The third calculates the cumulative product over a vector

// cummulative product
vec cumprod(vec mvec)
{
	int nElem = mvec.n_elem;
	double cprod = mvec(0);
	for(int i = 1; i < nElem; ++i)
	{
		cprod *= mvec(i);
		mvec(i) = cprod;
	}
	return mvec;
}

The fourth function completes our claims triangle

// [[Rcpp::export]]
SEXP GetChainSquareCpp(SEXP mClaimTri)
{
	NumericMatrix nMat(mClaimTri);
	int nRow = nMat.nrow(), nCol = nMat.ncol();
	mat armMat(nMat.begin(), nRow, nCol, FALSE);
	
	vec dFactors = GetFactors(armMat);
	mat revMat = fliplr(armMat);
	vec dAntiDiag = diagvec(revMat);
	dAntiDiag = dAntiDiag.subvec(1, nCol - 1);
	double dMult;
	vec prodVec;
	for(int index = 0; index < dAntiDiag.n_elem; ++index)
	{
		dMult = dAntiDiag(index);
		prodVec = dFactors.subvec(nCol - index - 2, nCol - 2);
		prodVec = cumprod(prodVec);
		armMat(index + 1, span(nCol - index - 1, nCol - 1)) = dMult*prodVec.st();
	}
	return wrap(armMat);
}

R vs. C++ speed test

We now compile the C++ code and incorporate the functions in our R environment and load the microbenchmarks package:

sourceCpp("cL.cpp")
require(microbenchmark)

We generate a triangle and run the benchmark:

x <- GenerateTriangle(11)
microbenchmark(GetChainSquareR(x), GetChainSquareCpp(x), times = 10000L)
Unit: microseconds
                 expr     min      lq   median      uq      max neval
   GetChainSquareR(x) 177.885 187.923 193.4030 205.044 3087.584 10000
 GetChainSquareCpp(x)   5.121   6.045   8.3935   9.128  179.754 10000

Evidently a significant speed-up is achived by calling C++ from R.

Conclusion

There are times when not much needs to be said about the benefits of using certain packages/techniques. It is common to use C/C++ to speed up calculations or to call other C++ libraries for various applications from some domain specific language. Rcpp is fantastic for this purpose. It provides an easy to use interface that allows seamless integration between R and C++. It gives you the benefits of C++ speed but makes it easy to integrate this with R.

Data Science Consulting & Software Training

Active Analytics Ltd. is a data science consultancy, and Open Source Statistical Software Training company. Please contact us for more details or to comment on the blog.

Dr. Chibisi Chima-Okereke, R Training, Statistics and Data Analysis.