2021-05-18
D has always had a close association with the C programming language, and it is relatively straightforward to call C functions from D. Recently, a pull request has been initiated on the GitHub repository of the D reference compiler DMD by Walter Bright the originator of the language titled add ImportC compiler
. This PR would allow D programmers to import and compile C files directly with D code, enabling a frictionless interface calling C from D. This is significant for many applications that integrate or are looking to integrate C libraries including the area of numerical computing. Many fundamental algorithms from BLAS, LAPACK, to FFTW have well supported open source C implementations, and the proposed changes would ease integration and allow programmers to keep up with the latest changes and adopt new features easily. In addition, such a feature would ease the transition of prospective C programmers considering using D as a safe and more productive language.
2021-01-06
A previous article looked at Kernel Matrix calculations in Nim. In this article, we look at Kernel Matrix calculations in the Chapel programming language, and offer some tips on performance optimization. Chapel is a static compiled programming language developed by Cray and is aimed at high performance programming for both multicore computers and clusters with an emphasis on data and task parallelism in scientific computing. It is heavily influenced by FORTRAN, in that it has matrix/tensor style arrays with which the user can run element-element arithmetic operations. Chapel has both records (value types similar to C/C++ structs) and single dispatch reference classes. More information on the Chapel programming language is given in the official website. The particulars of Kernel Matrix calculations are discussed in the previous article and will not be repeated here.
2020-12-18
Bidiagonalization is used in various applications in linear algebra. It transforms a matrix A in R^{m x n}, m ≥ n into a bidiagonal matrix, described as having non-zero values only in the diagonal and above or below the diagonal with zero elsewhere. A previous article implemented the QR decomposition by using Householder transformations to induce zeros in particular parts of the matrix, resulting in the upper triangular matrix R along with the orthogonal transformation matrix Q. Householder bidiagonalization uses Householder transformations to transform the matrix A to a bidiagonal matrix B R^{m imes n}, m ≥ n where the bottom m - n rows are zero. This is done by generating the appropriate orthogonal transformations such that A = U B V^{T} where U and V are orthogonal matrices. The matrices U and V are the accumulated Householder transformations. This article is a short note presenting the implementation of bidiagonalization in the D programming language. Householder Bidiagonalization is also known as the Golub-Kahan Bidiagonalization and these names will be used interchangeably in the text to refer to the same transformation.
2020-12-10
This article presents a basic technique for singular value decomposition (SVD). It starts by presenting the One-Sided-Jacobi-Rotation for SVD of square (but not necessarily symmetric) matrices. It then shows that this method can be used to obtain the SVD of rectangular matrices if a QR decomposition is carried out before the one sided Jacobi method is used to calculate the SVD of the R part of the QR decomposition. The article finishes by presenting some numerical outputs.
2020-12-03
This article presents various methods for calculating eigendecompositions. These methods are Power, Inverse, ‘Pure’ QR, QR with Shifts, and the Jacobi methods, and they will all be implemented in the D programming language. Previous articles have discussed different matrix factorization techniques. These factorizations transform the matrix into a more useful form that allows us to solve various problems. Eigendecomposition is no different, we transform a symmetric matrix (A
) into a diagonal matrix by carrying out transformations on A
. The methods we present mostly rely on inducing zeros in the matrix by rotations or reflections. They use Givens rotations and Householder reflections (that we implemented in previous articles), and as some of the method names suggest we also use the QR method to carry out some of the transformations (which are themselves often rotations and reflections). These transformations are accumulated as they are applied to the matrix. The accumulated transformations are the eigenvectors in our decomposition, while the diagonalized matrix are the eigenvalues. The eigenvalue diagonal matrix is therefore a transformed version of the initial matrix (A
). This means that if we apply a function to the eigenvalues, and carry out a reverse transformation using the eigenvector matrix, we end up with a matrix that has been altered in such a way that it is as if we applied the function directly to the initial matrix itself. For example, a matrix inverse can be obtained by reverse transforming inverted eigenvalues.
2020-11-26
This article focuses on the implementation of LU and Cholesky decomposition using the D programming language. The main purpose of these matrix factorizations is to convert linear equations in the form Ax = b
, where A in R^{n x n}
, x in R^{n}
, and b in R^{n}
into triangular systems that can be solved by either back or forward substitution. In the case of an LU decomposition solver (Gaussian Elimination), the only constraint is that the matrix A
be of full rank, though the decomposition itself can be applied to any full rank matrix of dimension A in R^{m x n}
. Cholesky decomposition is constrained to factor square symmetric matrices only.
2020-11-18
In a previous article, Householder and Gram-Schmidt QR decompositions were implemented in D. This article presents the implementation of Givens QR decomposition. QR decomposition A = QR
is used in innumerable applications. One fundamental use is to solve the linear equation Ax = b
(for x
), by transforming the matrix A
into an upper triangular matrix R
which we know how to solve for (using back substitution). This is done by re-contextualizing the problem as Rx = Q^{T}b
which can be written as Rx = y
. Since y
is known, it is easy to solve for x
.
2020-11-10
Nim is a static programming language that produces either C, or C++ code for further compilation or Javascript code for interpretation. It is relatively new (created in 2008), its syntax is inspired by languages such as Python, Pascal, and Ada, it has strong metaprogramming capabilities, and even support for multiple dispatch. Nim not only supports UFCS (universal function call syntax) but also allows functions to be called with or without brackets giving the user a lot of flexibility in their programming style. This article takes a quick look at the language by implementing a Kernel matrix calculation function that allows a variety of kernels to be passed to the main calculating function.
2020-11-04
This article presents techniques in linear algebra and nonlinear optimization, it pulls on some Active Analytics articles presented in the past which implement various techniques in D and uses them to construct a linear solve, and Newton and Quasi-Newton optimizers. It begins by first using techniques created in a previous article on QR decomposition A = QR and converts these algorithms into functions for solving a linear system for x where Ax = b. We then take this solver and use it while implementing gradient and Newton-based optimization techniques for minimizing a function f(x) …
2020-10-28
This article presents the implementation of QR decomposition in D. Some background implementation details will be left out, for example implementation of the matrix structure used, and elements of common linear algebra algorithms. The gemm
BLAS algorithm will be called from the C Openblas library. The article focuses on the implementation of the Gram-Schmidt, its modified from, and the Householder QR decomposition, the referred texts give a good theoretical and intuitive account for the QR decomposition of a matrix (A = QR).
2020-10-28
The mapply
function in R takes in a varying number of arguments each of which is an R (atomic) vector
or a list
. It also takes in a function as one of its arguments. The idea is that a single element is taken from each argument and represents the input to the function. For instance the first element is taken from each of the vectors and this collection of elements becomes the input to the function which returns an item. This process is repeated over the vector(s) and results in a vector or list of output elements.
2020-10-24
This article is a follow-up to a previous article on bracketing functions in numeric optimization problems, in which two techniques for creating initial brackets or sections from a point were presented including several methods for refining the brackets to give progressively narrower sections. In this article, we take this one step further and show how to incorporate interpolation methods discussed in the previous article to line search methods.
2020-10-16
Numerical integration is an important tool in many applications, this article is an introduction to implementing several integration algorithms for univariable functions. The article finishes with a description of how to implement the Gauss Quadrature in D, which is the standard method used to carry out numerical integration. C’s LAPACKE library is used to carry out eigenvalue decomposition in order to evaluate the coefficients and points of the quadrature. In the process the reader will observe that calling C libraries from D is simple. As usual, the article focuses on the implementation details, we do not compare the accuracy of the various methods, only show their outputs because it is apparent that as the standard method for integration, Gauss Quadrature is the most accurate.
2020-10-12
Bracketing (also known as sectioning) is a technique used in numerical optimization procedures to obtain an interval containing a local monotonic minimum (maximum) point. This technique is used in functions containing only one variable. The application of this technique in the wider optimization context is in optimizing a line search parameter used in multivariable optimization problems. In this article we will focus on techniques for bracketing functions with a single variable, and we show how these techniques are implemented in the D language.
2020-10-03
The D programming language semantics lends itself to building a wide range of application types, one of these is technical and scientific computing. However, the number of packages available for numerical computing is low. This article demonstrates how to create a basic numerical differentiation scheme in D using finite differences to calculate first and second partial derivatives when given an expression. We use template functions and some basic template declarations in this article, so if you are unfamiliar with these concepts please visit a previous article that covers them.
2020-09-24
Working with modules is a basic and fundamental part of programming, and provides the programmer with an valuable source of predefined functionality. The import
directive in D provides an easy way of accessing these resources, and this article outlines the various ways of using it.
2020-09-19
In a similar way to C and C++ where one dimensional arrays are a block of memory traversed by a pointer, in D, arrays are defined by a block of memory, a pointer, but they also have a length which make them easier to safely traverse. In languages such as C, C++, and D, the arrays that are built into the language are not the same as arrays in languages such as R, Numpy, or Julia, which are composed of a single block of memory addressed in different ways to create the effect of being multidimensional arrays. In the aforementioned native languages, multidimensional arrays are effectively nested arrays, for instance in D, a two dimensional array is an array where each item is itself, an array of elements (though packages like Mir implement single memory block multidimensional arrays, and Julia differentiates nested from multidimensional arrays).
2020-09-11
Tuples in D are a useful flexible alternative to structs and classes, some might say that they are a “poor man’s object”, but they are more than that. They are a convenient and easy to use container for items that are not atomic in type, however, they have two important issues that the programmer needs to be aware of, or else tuples can seem difficult to work with. The first issue is that tuples can not be returned by functions. The second is that they can not be iterated over like a regular array, because they are not arrays. The items in a tuple are not atomic and so can have unequal sizes, therefore their indexing access patterns are different from arrays. This article shows how to construct and iterate over tuples in a trouble free manner.
2020-09-03
The ability to have generality and specificity, and to generate code, provides the programmer with power and brevity. Here, we focus on generic and metaprogramming, both of which are resolved at compile time. In the sections dedicated to generic programming, we show how templates with type and value parameters, can increase the flexibility and reusability of code, how they can be made to have generality, and be used to target specific conditions. In the sections dedicated to metaprogramming, we introduce an array of tools that allows us to generate and manipulate code, which can be executed not only at run time, but also at compile time.
2020-08-21
IDX files store multi-dimensional array data where each element could be one of a specific range of types. The user does not necessarily know the type of elements in the array. Static programming languages require type information for all variables at compile time. Usually some kind of interface that effectively masks the type to ANY is used to get around this problem meaning the array has to be interacted with using dynamic methods which are slow. This article shows how this limitation can be circumvented in D by reading the data file at compile time, and using D’s generic programming techniques to obtain the data type and return the data itself.
2017-03-13
C and FORTRAN have widely used numeric libraries that are heavily relied upon. D’s generic and meta-programming features, as well as multi-paradigm nature and clear syntax makes it a great choice for various applications. In addition D has easy interoperability with both C and FORTRAN. This article outlines the basics of how to call C and FORTRAN from D.
2017-01-15
D is a static programming language that some regard as an upgrade to C++, D is also influenced by Java and various other programming languages. It aims to have the performance and safety of compiled languages while allowing the user the expressiveness of a dynamic programming language like Python. In this article we cover some basic D programming techniques.
2014-12-15
This article outlines a simple scheme for computing generalized linear models for large datasets. It works by doing a map-reduce style set of matrix operations over the rows of the X and Y matrices, allowing the algorithm to be parallelized over row-wise blocks of the matrix operations finally aggregating the outputs to form the grammian which is then inverted.
2014-12-07
In this article, we create a fast algorithm in C++ for reading count data for differential expression and call it from R using the popular RCpp interface package. It is then benchmarked against R’s read.delim function and is shown to be markedly (~ x3) faster.
2014-12-05
Here we create a simple utility for searching for items in files from R itself. The example introduces use of the grep function in R and shows how the lapply function is used to poweful effect.
2014-12-04
Short article about the performance profiling package profr which creates nicely formatted performance information in comparison to R’s native Rprof function and allows the call stack to be represented graphically using it’s own plot method.
2014-12-01
This article investigates one of the sharp edges of R, the sapply() function which can cause problems if used in production code because it produces unexpected outputs. With lapply you are sure the output will be a list of some kind, with vapply the user specified the nature of each output element but even then the nature of the output object is unknown.
2014-11-20
Evolve Analytics are retail energy data experts that provide software services. I worked with Evolve Analytics to assist in delivering user interfaces for specified design requirements, and to create a tool for identifying out of trend meter readings, which are important in determining accurate Annual Quantity calculations.
2014-09-05
activeH5 is a package that eases the difficulty of I/O of larger amounts of data in R and is built on HDF5. It allows the user to do I/O of data.frames and matrices to/from file and memory.
2014-08-14
Short benchmark that shows that Julia 0.3 has improved performance on ChainLadder calculations from version 0.2.
2014-08-13
Griping about counting from 0 rather than 1 … When we are young, we are taught to count from one … 1, 2, 3, … and so on. In mathematics the position of items in a vector or matrix start with number one and not zero. When implementing algorithms from mathematical equations in a language like C where the position identifier starts from the integer zero things can get a little confusing.
2014-08-11
This article presents and example of using Julia’s ccall function that allows a user to call c code from Julia. It presents a fast implementation for ChainLadder in C and shows how it is called from Julia. For the specific data size, the process runs in 2 microseconds.
2014-07-24
R said to be a functional programming language but it has no native chain/pipe operator. The magrittr package provides this functionality. This article looks under the hood of operators in R and does a simple implementation of the magrittr operator and also takes a quick look at object orientated programming in R.
2014-07-23
This article compared the performance of random number generators in R and Python.
2014-07-06
This article compares performance of R, Julia and C++ called from R using the RCpp interface package using the Chain Ladder actuarial calculation. It shows that Julia as faster performance than R (~ x7) but is still (~ x3) slower than the C++ implementation.
2014-06-23
RCpp is a convenient user friendly tool that allows C++ code to interoperate with R. This article compares the performance of R to C++ running the Chain Ladder calculation using the RCpp interface and the C++ Linear Algebra Armadillio library.
2014-06-08
Data injest can be a slow process particularly when your data loads are strickly serial. Modern hard disks for example SSDs allow parallell data reads. This article presents a simple tutorial of the foreach and doParallel package in parallelising text file reads.
2014-05-30
Can we create more value by combining different tools and programming languages? This is a fun speculative approach to that process.
2014-05-24
Want to call R code from PHP or Python? This article shows how to build a simple server in R using ZMQ and then how to use this server to call R functions from PHP and Python.
2014-05-18
Want to create web graphics from outputs from R analysis? This article shows how to bring R, Javascript, PHP, and MySQL together to create static graphics using R’s ggplot2 package and dynamic graphics using Javascript’s D3.
2014-03-23
activeReg is an R package for running GLMs for data sets that are too large to fit into memory and for using memory more efficiently by parallelising the calculations over blocks of the design X and target Y. A demo of basic multiple regression and logistic regression is presented.
2014-03-22
Generating graphical plots for differential expression of Drosophila development.
2014-03-22
Simple tutorial for multinomial regression analysis in R using data for suicide ideation as part of work done for charity.
2014-03-22
Mining tweets and plotting word clouds in Python using the twitter package, and pytagcloud libraries.
2014-03-22
Data science programming language wars, R vs Python - why at this point in time R still beats python in statistical computing.
2014-01-26
Using the activeH5 package to speed up I/O for large datasets in R.
2014-01-26
This article demonstrates how to process and store large number of calculations in HDF5 format in R using the rhdf5 package. The calculation used is the Chain Ladder algorithm.
2014-01-26
This article demonstrates how to execute and store a large number of Chain Ladder calculations in Python using the h5py package.
2013-07-25
This week I will present some material on using the Metropolis-Hastings algorithm to carry out GLM-like actuarial pricing in R. We use the motorins data set from the faraway package and compare the output with using a standard glm() function in R. Monte Carlo methods are pretty popular nowadays particularly for analytically intractable problems however we will present a simple model. The paper by Chib and Greenberg outlines the Metropolis-Hastings method.
2013-07-10
Review of the book “R in a Nutshell: A Desktop Quick Reference, by Joseph Adler” published by O’Reilly. I have found this book very useful over the years. I still have it sitting on my shelf and reference it even now.
2013-07-04
The ChainLadder R package provides traditional tools for projecting the ultimate loss of a book of business. Here we outline outputs from a nonlinear regression model and a state space model. The data used in this presentation was obtained from the auto claims in the ChainLadder package.
2013-07-03
The Rcpp package allows R programmers to easily integrate C++ code and libraries with R. This allows C++ functionality to be called from R, which is useful for writing high performance R code or interfacing various libraries available in C++ with R. At Active Analytics we like this package a lot; it is very well maintained and its frequent improvement updates increases its ease of use.
2013-05-31
Using R for creating 3D plots from laser profiling videos.
2013-05-22
This article outlines some exploratory plots that a pricing analyst might use when looking at data. These are plots allow claims frequency and severity to be viewed in one and two-way type plots, providing some insight to potentially effective rating factors before statistical tests and modelling process.
2013-05-22
I/O benchmarks in R for raw binary, RData and CSV files.
2013-05-22
Showing off ggplot2 charts with UK Crime data.
2013-05-22
Active Analytics Ltd has just done some charity work for Peer Corps Trust Fund based in Tanzanian to move some research from Stata to R. The organization is migrating from proprietary software to take advantage of cost benefits of open source solutions. The research concerned suicide expression among school-attending adolescents in a middle-income sub-Saharan country using multinomial regression. A blog explaining the R code with the work done on the piece is located here.
2013-05-22
There are many approaches to handling errors while writing code, if you are creating a library that will be used by many users or you simply want the code you have created not to stop processing when it hits an error, then the try() and tryCatch() functions in R are your friends. In this blog entry, I will be focusing on the tryCatch() function even though what I present can be done using the try() function with subsequent conditionals.
2013-05-22
Pulling data from Yahoo Finance using the CSV API and plotting the data in R using ggplot2.