When we are young, we are taught to count from one … 1, 2, 3, … and so on. In mathematics the position of items in a vector or matrix start with number one and not zero. When implementing algorithms from mathematical equations in a language like C where the position identifier starts from the integer zero things can get a little confusing. It sounds like a trivial issue but a lot of effort can be spent making sure that algorithms are correct and bugs easily occur requiring re-writing of algorithms or resulting in release of buggy software. Python and C/C++ share the same numbering system - both start from zero while R and Julia programming languages number vectors from one.
To illustrate this issue, let’s take a simple example. The equation below calculates the mean squared error of the reserves from the Mack Chain Ladder paper (top of page 220).
$$ \begin{equation} \hat{mse(\hat{R})} = \sum_{i=2}^{I}{{(mse(\hat{R})) + \hat{C}_{iI}(\sum_{j=i+1}^{I-1}{\hat{{C}}_{jI})} \sum_{k=I+1-i}^{I-1}{\frac{2\hat{\sigma}_{k}^{2} \hat{f}_{k}^{2}}{\sum^{I-k}_{n=1}{C_{nk}}}}}} \qquad \ldots (1) \end{equation} $$
Consider an implementation in Julia below:
for i = 2:I
term1 = term2 = 0
for j = (i + 1):I
term1 += ciI[j]
end
term1 *= ciI[i]
for k = (I + 1 - i):(I - 1)
term2 += (2*sigmas[k]/(f[k]^2))/ECnk[k]
end
tmseR += mseR[i] + term1*term2
end
Each of the items ciI, sigmas
\( (\sigma^2) \), f, ECnk, and mseR
are vectors. The sum \(\sum_{n=1}^{I-k}{C_{nk}}\) is pre-summed over n
to give ECnk
and so is also a vector in index k
. You can see that this implementation is a natural interpretation of the equation. The equivalent implementation in C is this:
int i, j, k;
double term1, term2;
for(i = 1; i < I; ++i)
{
term1 = term2 = 0;
for(j = i + 1; j < I; ++j)
{
term1 += ciI[j];
}
term1 *= ciI[i];
for(k = I - 1 - i; k < I - 1; ++k)
{
term2 += (2.*sigmas[k]/pow(fact[k], 2))/ECnk[k];
}
tmseR += mseR[i] + term1*term2;
}
There are three potential points of confusion:
"<"
or ">"
sign as the limit, you could use "<="
or ">="
. You learn from books to use the former set and so when you stare at the for loop you have to “remind” yourself that less than the limit is okay - you don’t want to hit the limit because you started numbering from zero and to do so will result in undefined behaviour.++i
that when you should write --i
you may forget and create a loop that doesn’t end - oops!i
from one not two to get the correct vector position - fine. In the first inner loop the numbering of j
starts from (i + 1)
just as in the paper which is fine since we have already started numbering i
from one. The correct starting point for the second inner loop is not immediately obvious but if you get out a pen and paper, you’ll find that the correct starting point is (I - 1 - i)
quite different from the paper which sports (I + 1 - i)
.With more complex algorithms this can get worse. So at Active Analytics we like counting from one like R and Julia - it saves us alot of time. The advantage with Julia is that since for
loops are fast, you can be very literal with your algorithm in relation to equations which means much faster development times. In R though the literal numbering is nice, there are fewer chances to make full use of it since for
loops are slow, see a previous blog article for more details.