Some time ago we compared the use of R and Python for statistical applications. In that blog entry we also discussed the ease of integrating C++ code with each programming application. In this blog entry we present a simple example of calling C++ Chain Ladder code with R using the Rcpp and RcppArmadillo packages. If you are looking to integrate C++ libraries with R, Rcpp is by far the easiest way to do this, RcppArmadillo is a C++ wrapper for the linear algebra library Armadillo that provides an interface between Rcpp and Armadillo objects.
Some time ago, we looked at Chain Ladder calculations in R for large data sets. This time round we will compare what happens to the calculation time when we carry out the chain ladder matrix calculations in C++ and call these functions from R using Rcpp.
The C++ code we would like to call sits in a file called cL.cpp
, on the R script we load the Rcpp package.
require(Rcpp)
In the cL.cpp
file, we load the libraries
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm> // most of the algorithms
#include <numeric> // some numeric algorithm
using namespace std;
using namespace arma;
using namespace Rcpp;
To simulate the claims traingle we will use the same R code as in our previous blog repeated here:
# Age-To-Age Factors
ageFact <- seq(1.9, 1, by = -.1)
# Inflation Rate
infRate <- 1.02
# Function to reverse matrix columns
revCols <- function(x){
x[,ncol(x):1]
}
# Similar to jitter
shake <- function(vec, sigmaScale = 100)
{
rnorm(n = length(vec), mean = vec, sd = vec/sigmaScale)
}
# Row generation funtion
GenerateRow <- function(iDev, dFactors = cumprod(ageFact), dInflationRate = 1.02, initClaim = 154)
{
shake(initClaim)*shake(c(1, dFactors))*(dInflationRate^iDev)
}
# Function to generate a claims matrix
GenerateTriangle <- function(iSize, ...)
{
indices = 1:iSize
mClaimTri = t(sapply(indices, GenerateRow, ...))
# Reverse columns to get the claims triangle
mClaimTri = revCols(mClaimTri)
# Assign nan to lower triangle
mClaimTri[lower.tri(mClaimTri)] = NA
mClaimTri = revCols(mClaimTri)
return(mClaimTri)
}
The code for Chain Ladder analysis on the R side is is essentially the same as in the previous blog but is repeated here for your convenience. One of the functions GetFactorR()
is used to calculate the age-to-age factors and the other GetChainSquareR()
is used to complete the traingle to a “square”.
# Get claims factor at a particular column index
GetFactorR <- function(index, mTri)
{
fact = matrix(mTri[-c((nrow(mTri) - index + 1):nrow(mTri)), index:(index + 1)], ncol = 2)
fact = c(sum(fact[,1]), sum(fact[,2]))
return(fact[2]/fact[1])
}
# Function to carry out Chain Ladder on a claims triangle
GetChainSquareR <- function(mClaimTri)
{
nCols <- ncol(mClaimTri)
dFactors = sapply(1:(nCols - 1), GetFactorR, mTri = mClaimTri)
dAntiDiag = diag(revCols(mClaimTri))[2:nCols]
for(index in 1:length(dAntiDiag))
{
mClaimTri[index + 1, (nCols - index + 1):nCols] = dAntiDiag[index]*cumprod(dFactors[(nCols - index):(nCols - 1)])
}
mClaimTri
}
There are four C++ functions, the first calculates the age-to-age factor:
double GetFactor(int index, mat mTri)
{
int nRow = mTri.n_rows, nCol = mTri.n_cols;
mat subMat = mTri.submat(0, index, nRow - (index + 2), index + 1);
rowvec colSums = arma::sum(subMat, 0);
double inFact = colSums(1)/colSums(0);
return inFact;
}
The second calculated the factors over the whole matrix
vec GetFactors(mat mTri)
{
int nRow = mTri.n_rows, nCol = mTri.n_cols;
vec dFactors(nCol - 1);
for(int i=0; i < nCol - 1; ++i)
{
dFactors(i) = GetFactor(i, mTri);
}
return dFactors;
}
The third calculates the cumulative product over a vector
// cummulative product
vec cumprod(vec mvec)
{
int nElem = mvec.n_elem;
double cprod = mvec(0);
for(int i = 1; i < nElem; ++i)
{
cprod *= mvec(i);
mvec(i) = cprod;
}
return mvec;
}
The fourth function completes our claims triangle
// [[Rcpp::export]]
SEXP GetChainSquareCpp(SEXP mClaimTri)
{
NumericMatrix nMat(mClaimTri);
int nRow = nMat.nrow(), nCol = nMat.ncol();
mat armMat(nMat.begin(), nRow, nCol, FALSE);
vec dFactors = GetFactors(armMat);
mat revMat = fliplr(armMat);
vec dAntiDiag = diagvec(revMat);
dAntiDiag = dAntiDiag.subvec(1, nCol - 1);
double dMult;
vec prodVec;
for(int index = 0; index < dAntiDiag.n_elem; ++index)
{
dMult = dAntiDiag(index);
prodVec = dFactors.subvec(nCol - index - 2, nCol - 2);
prodVec = cumprod(prodVec);
armMat(index + 1, span(nCol - index - 1, nCol - 1)) = dMult*prodVec.st();
}
return wrap(armMat);
}
We now compile the C++ code and incorporate the functions in our R environment and load the microbenchmarks package:
sourceCpp("cL.cpp")
require(microbenchmark)
We generate a triangle and run the benchmark:
x <- GenerateTriangle(11)
microbenchmark(GetChainSquareR(x), GetChainSquareCpp(x), times = 10000L)
Unit: microseconds
expr min lq median uq max neval
GetChainSquareR(x) 177.885 187.923 193.4030 205.044 3087.584 10000
GetChainSquareCpp(x) 5.121 6.045 8.3935 9.128 179.754 10000
Evidently a significant speed-up is achived by calling C++ from R.
There are times when not much needs to be said about the benefits of using certain packages/techniques. It is common to use C/C++ to speed up calculations or to call other C++ libraries for various applications from some domain specific language. Rcpp is fantastic for this purpose. It provides an easy to use interface that allows seamless integration between R and C++. It gives you the benefits of C++ speed but makes it easy to integrate this with R.