## Big data Chain Ladder analysis using activeH5

### Introduction

Active Analytics Ltd has recently released an R package called **activeH5** for big data storage and access. It allows **very fast access to data frames and matrices on disk and in memory**; more details can be
found on the product launch blog. Today we compare the performance of the **activeH5** package with the results from two previous posts on Chain Ladder analysis in **R** and **Python**, which used the rhdf5 and h5py
packages. If you have read the previous blogs, note that the triangle simulators have been updated, so their code and times have changed.

### Creating the claims triangles

This is the code to simulate the claims triangles. Feel free to skip to the times below the code block.

```
# Load the package
library(activeH5)
# Age-to-age factors
ageFact <- seq(1.9, 1, by = -.1)
# Inflation rate
infRate <- 1.02
# Reverse the columns of a matrix
revCols <- function(x)
{
  x[, ncol(x):1]
}
# This shake function is faster than jitter and equivalent to my Jitter() function in Python
shake <- function(vec, sigmaScale = 100)
{
  rnorm(n = length(vec), mean = vec, sd = vec/sigmaScale)
}
# Alternative row generation function
GenerateRow <- function(iDev, dFactors = cumprod(ageFact), dInflationRate = 1.02, initClaim = 154)
{
  shake(initClaim)*shake(c(1, dFactors))*(dInflationRate^iDev)
}
# Function to generate a claims matrix
GenerateTriangle <- function(iSize, ...)
{
  indices = 1:iSize
  mClaimTri = t(sapply(indices, GenerateRow, ...))
  # Reverse columns so that the unobserved cells form the lower triangle
  mClaimTri = revCols(mClaimTri)
  # Assign NA to the lower triangle
  mClaimTri[lower.tri(mClaimTri)] = NA
  # Reverse columns again to restore the original column order
  mClaimTri = revCols(mClaimTri)
  return(mClaimTri)
}
# Function to simulate a claims matrix and write it to file
WriteToH5File <- function(dSName, filePath)
{
  mClaims = GenerateTriangle(11)
  h5WriteDoubleMat(dSName, mClaims, dim(mClaims), filePath)
  return(invisible())
}
# The H5 file
claimsH5 = "data/ClaimsTri.h5"
# Create the H5 file
h5CreateFile(claimsH5, 1)
# The matrix names
matNames <- paste("ch", 1:2000, sep = "")
# Simulation time and time taken to write to file
system.time(sapply(matNames, WriteToH5File, filePath = claimsH5))
```

```
# This is the time taken to run
# user system elapsed
# 0.850 0.039 0.888
```

Compare this with the time obtained using rhdf5 (**7.795 s**) and with Python's h5py package (**1.108 s**). The performance of **activeH5** and Python's h5py is comparable;
in this case the **activeH5** package was a little faster (**0.888 s**).
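
To make the shape of the simulated data concrete, here is a minimal, noise-free sketch of the same idea in base R. It is not part of the benchmark and does not use **activeH5**; the names `devFactors`, `makeRow` and `tri` are hypothetical, and the random `shake()` step is deliberately left out so that the result is reproducible.

```
# Hypothetical, noise-free variant of the simulator above (base R only)
ageFact <- seq(1.9, 1, by = -0.1)
# Cumulative development pattern: 1 followed by the running product of the factors
devFactors <- c(1, cumprod(ageFact))

# Deterministic row: initial claim grown by the development pattern and inflation
makeRow <- function(iDev, initClaim = 154, inflation = 1.02)
{
  initClaim * devFactors * inflation^iDev
}

n <- length(devFactors)
tri <- t(sapply(seq_len(n), makeRow))
# Blank out the cells below the anti-diagonal, which have not been observed yet
tri[row(tri) + col(tri) > n + 1] <- NA

dim(tri)               # 11 x 11
rowSums(!is.na(tri))   # 11 observed cells in the first row, down to 1 in the last
```

Each later row has one fewer observed development period, which is the triangular pattern that the chain ladder method then fills in.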

### Analysing the claims triangles

Here we analyse the claims triangles stored on disk and write the results to another file using the **activeH5** package. Again, feel free to skip to the time benchmark below the code block.

```
# Function for calculating age-to-age factors
GetFactor <- function(index, mTri)
{
  # Keep only the rows observed in both columns index and index + 1
  fact = matrix(mTri[-c((nrow(mTri) - index + 1):nrow(mTri)), index:(index + 1)], ncol = 2)
  fact = c(sum(fact[,1]), sum(fact[,2]))
  return(fact[2]/fact[1])
}
# Function to complete a claims triangle into a square
GetChainSquare <- function(mClaimTri)
{
  nCols <- ncol(mClaimTri)
  dFactors = sapply(1:(nCols - 1), GetFactor, mTri = mClaimTri)
  # Latest observed claim in each row after the first
  dAntiDiag = diag(revCols(mClaimTri))[2:nCols]
  for(index in 1:length(dAntiDiag))
  {
    mClaimTri[index + 1, (nCols - index + 1):nCols] = dAntiDiag[index]*cumprod(dFactors[(nCols - index):(nCols - 1)])
  }
  mClaimTri
}
# Function to read a triangle, complete it and write the square to file
WriteClaimsSquare <- function(dSName)
{
  cSquare <- GetChainSquare(h5ReadDoubleMat(dSName, claimsH5))
  h5WriteDoubleMat(dSName, cSquare, dim(cSquare), squareH5)
}
# The H5 file for the claims squares
squareH5 = "data/ClaimsSquare.h5"
# Create the H5 file
h5CreateFile(squareH5, 1)
# Analysis time and time taken to write to file
system.time(sapply(matNames, WriteClaimsSquare))
```

```
# This is the time taken to run
# user system elapsed
# 1.229 0.068 1.299
```

Here we see that the time taken is **1.299 s**, compared with **12.986 s** for rhdf5 (an order of magnitude slower) and **1.224 s** for h5py (similar performance, if a little faster).
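
For readers who want to check the chain ladder step by hand, the following is a small worked sketch in base R on a made-up 3 x 3 triangle. The numbers and the names `tri`, `factors` and `square` are illustrative assumptions, not data from the benchmark, and the code is a simplified restatement of the approach above rather than the exact functions used for the timings.

```
# Made-up cumulative claims triangle: rows are origin years, columns are development periods
tri <- matrix(c(100, 150, 165,
                110, 165,  NA,
                120,  NA,  NA), nrow = 3, byrow = TRUE)
n <- ncol(tri)

# Age-to-age factor for column j: ratio of column sums over the rows observed in both columns
factors <- sapply(seq_len(n - 1), function(j) {
  rows <- seq_len(n - j)
  sum(tri[rows, j + 1]) / sum(tri[rows, j])
})

# Project each incomplete row forward from its latest observed value
square <- tri
for (i in 2:n)
{
  last <- n - i + 1   # last observed development column in row i
  square[i, (last + 1):n] <- square[i, last] * cumprod(factors[last:(n - 1)])
}

factors   # 1.5 1.1
square    # row 2 ends at 181.5; row 3 becomes 120, 180, 198
```

The first factor is (150 + 165)/(100 + 110) = 1.5 and the second is 165/150 = 1.1, so the last row develops as 120, 120 x 1.5 = 180 and 180 x 1.1 = 198.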

### Summary

This blog post barely scratches the surface of the features available in the **activeH5** package. Its purpose is to give an idea of the performance of the package relative to other big data equivalents out there.
More blogs and demonstrations of the capabilities of this package will follow. Active Analytics has a roadmap for building and releasing a series of packages for big data analysis. **activeH5** will be the bedrock
of these packages, and its purpose will be to provide the various data structures and the storage and access performance needed for these packages.

Thank you

### Data Science Consulting & Software Training

Active Analytics Ltd. is a data science consultancy, and Open Source Statistical Software Training company. Please contact us for more details or to comment on the blog.

**Dr. Chibisi Chima-Okereke, R Training, Statistics and Data Analysis.**