Implementing LU and Cholesky Decomposition in D

Author: Dr Chibisi Chima-Okereke Created: November 26, 2020 00:24:47 GMT Published: November 26, 2020 00:24:47 GMT

Introduction

This article focuses on the implementation of LU and Cholesky decomposition using the D programming language. The main purpose of these matrix factorizations is to convert linear equations in the form \(Ax = b\), where \(A \in \mathbb{R}^{n \times n}\), \(x \in \mathbb{R}^{n}\), and \(b \in \mathbb{R}^{n}\) into triangular systems that can be solved by either back or forward substitution. In the case of an LU decomposition solver (Gaussian Elimination), the only constraint is that the matrix \(A\) be of full rank, though the decomposition itself can be applied to any full rank matrix of dimension \(A \in \mathbb{R}^{m \times n}\). Cholesky decomposition is constrained to factor square symmetric matrices only.

References:

Gaussian Elimination

LU decomposition is a matrix formalization of Gaussian Elimination, that operates on an augmented matrix, a column-wise concatenation \(A|b\). It creates an upper triangle in the \(A\) part of the augmented matrix by repeatedly multiplying specific rows by appropriate factors and subtracting them from other rows. This eliminates coefficients in \(A\) in such a way as to end up having the appropriate upper triangular structure. At the end of the elimination process, the linear equation becomes \(Ux = c\), where \(U\) is an upper triangular matrix (formed from \(A\)) and \(c\) is the result of \(b\) having taken part in the elimination process.

A linear system of equations can be represented by:

$$ \begin{matrix} a_{11}x_{1} & + & a_{12}x_{2} & + \ldots & + & a_{1n}x_{n} & = & b_{1} \\
a_{21}x_{1} & + & a_{22}x_{2} & + \ldots & + & a_{2n}x_{n} & = & b_{2} \\
\vdots & + & \vdots & + \ldots & + & \vdots & = & \vdots \\
a_{n1}x_{1} & + & a_{n2}x_{2} & + \ldots & + & a_{nn}x_{n} & = & b_{n} \\
\end{matrix} $$

Since \(a_{1,*}\) will end up as the only row with complete coefficients, this row is used to eliminate the first coefficients in the other rows. Iterating over the rows \(k = 2, \ldots, n\), we multiply row \(1\) by \(\frac{a_{k,1}}{a_{1,1}}\), this makes the first coefficient of the altered first row equal to \(a_{k,1}\). This altered row is subtracted from \(k^{th}\) row inducing a zero in the first position of that row. We then take the original row \(1\) and multiply it by \(\frac{a_{k + 1, 1}}{a_{1, 1}}\) and subtract it from row \(k + 1\), continuing to do this all the way down the rows till row \(n\). Row \(2\) then becomes the factor row and we multiply it by \(\frac{a_{k,2}^{2}}{a_{2,2}^{2}}\), and carry out the subtractions as before for \(k = 3, \ldots, n\), this process is repeated over and over again until we get an upper triangular matrix \(U\) and the associated altered vector \(c\). Which can be solved by back substitution, the topic and implementation is covered in another article. The \(L\) portion of LU decomposition is located in the subdiagonal of the matrix \(A^{(n)}\) (\(k = n\) at the end of the algorithm). Gaussian Elimination is the solver and the associated decomposition is LU: \(A = LU\).

In this article we use an alternative name of the aforementioned backsubstitution implementation. There is also a function for the accompanying forward substitution. In the code below, only the signatures are shown here since the implementation for forward substitution is simply the converse of back substitution. Instead of naming the solvers back and forward substitution, the naming has been based on solving upper/lower triangular matrices respectively.

enum UPLO{Upper, Lower}
alias Upper = UPLO.Upper;
alias Lower = UPLO.Lower;

T[] substitution(UPLO uplo, T)(Matrix!(T) U, T[] y)
if(uplo == Upper)
{
  //...
}

T[] substitution(UPLO uplo, T)(Matrix!(T) L, T[] y)
if(uplo == Lower)
{
  //...
}

Partial pivoting

At the \(k^{th}\) step of Gaussian Elimination, equation \(k\) must be divided by \(a_{k, k}^{(k)}\), and if \(a_{k, k}^{(k)}\) is zero, Gaussian Elimination will fail and if it is close to zero it will produce numerical errors and cause the algorithm to become unstable. To mitigate this, at the beginning of each set of elimination for a column, we swap the current \(k^{th}\) row with a row \(l\) such that \(l \in {k, k + 1, \ldots, n}\) with \(|a_{l,k}| = max|a_{i, k}^{(k)}|\), meaning select the row having the largest absolute value in the column \(k\). Row \(l\) is swapped with row \(k\), which obviously changes the structure of the matrix \(A^{(k)}\). To account for this, a permutation matrix \(P\) is used to record the row swaps. It starts off as the identity matrix, but each time rows in \(A^{(k)}\) are swapped, the corresponding rows in \(P\) are also swapped. In practice the permutation swaps are recorded in a vector of indices which can be used to permute \(A\) as required. The LU decomposition then becomes \(PA = LU\).

Cholesky Decomposition

Cholesky decomposition is related to LU decomposition. Now \(A = LU\), and \(U\) is an upper triangular matrix that includes the diagonal (\(L\) is subdiagonal). For a symmetric matrix \(A\), \(U = DL^{T}\), and \(A = LDL^{T}\), where \(D\) is a diagonal matrix. The Cholesky decomposition is then \(A = (LD^{1/2})(LD^{1/2})^{T} = \tilde{L}\tilde{L}^{T}\). More formally \(A = LL^{T}\) (obviously not the same \(L\) as in \(LU\)). The standard Cholesky decomposition requires using a square root for \(D^{1/2}\), meaning that all its elements must be positive - the matrix \(A\) should be positive definite. An alternative decomposition is \(A = LDL\) which does not require using a square root but means that all the elements of \(D\) must be non-zero since they divide elements.

The implementations of the matrix factorizations described above now follows. As usual, these algorithms are for study purposes not for use in real world applications.

Implementing LU Decomposition

The implementation of LU decomposition as a solver is given below. It gives the user the option of overwriting \(A\) and \(b\) matrix with \(U\) and \(c\).

auto luSolve(bool overwrite = false, T)(Matrix!(T) A, T[] b)
{
  immutable auto n = A.ncol;
  
  //Overwrite options
  static if(overwrite)
  {
    Matrix!(T) U = A;
    T[] c = b;
  }else{
    Matrix!(T) U = A.dup;
    T[] c = b.dup;
  }
  //Calculating U and c
  foreach(k; 0..(n - 1))
  {
    immutable T d = 1/U[k, k];
    foreach(i; (k + 1)..n)
    {
      U[i, k] = U[i, k]*d;
      c[i] = c[i] - c[k]*U[i, k];
      foreach(j; (k + 1)..n)
      {
        U[i, j] -= U[i, k]*U[k, j];
      }
    }
  }
  //Obtaining L
  auto L = eye!(T)(n);
  for(long i = 0; i < n; ++i)
  {
    for(long j = i; j < n; ++j)
    {
      if(i == j)
        continue;
      L[j, i] = U[j, i];
      //Setting subdiagonals to zero
      U[j, i] = 0;
    }
  }
  "U:\n".writeln(U);
  "L:\n".writeln(L);
  //Back substitution solver
  return substitution!(Upper)(U, c);
}

LU decomposition with partial pivoting is given below:

auto lu(LUAlgorithm method, bool overwrite = false, T)(
          ref Matrix!(T) Ap)
{
  Matrix!(T) A;
  static if(overwrite)
  {
    A = Ap;
  }else{
    A = Ap.dup;
  }
  immutable auto m = A.nrow;
  immutable auto n = A.ncol;
  auto L = eye!(T)(m);
  auto U = zeros!(T)(m, n);
  /* 
    Within the algorithm, 
    pivots are stored in a vector
  */
  auto p = seq!(size_t)(0, m - 1, 1);
  long r = 0;
  for(long k = 0; k < n; ++k)
  {
    /*
      Getting the index for the appropriate row
      to be pivoted.

      The which function returns the index of the 
      item using pairwise comparisons
    */
    auto mu = which!((a, b) => abs(a) > abs(b))(A[k][r..$]);
    mu += r;
    if(A[mu, k] != 0)
    {
      //Row swaps
      auto tmp = A[mu, k..$];
      A[mu, k..$] = A[r, k..$];
      A[r, k..$] = tmp;
      if(r > 0)
      {
        tmp = L[mu, 0..r];
        L[mu, 0..r] = L[r, 0..r];
        L[r, 0..r] = tmp;
      }
      
      //Swaps for pivot vector
      auto swp = p[mu];
      p[mu] = p[r];
      p[r] = swp;
      
      //Calculations for L and U
      L[(r + 1)..$, r] = A[(r + 1)..$, k]/A[r, k];
      U[r, k..$] = A[r, k..$];
      for(long i = r + 1; i < m; ++i)
      {
        for(long j = k + 1; j < n; ++j)
        {
          A[i, j] -= L[i, r]*U[r, j];
        }
      }
      r += 1;
    }
  }
  /* 
    Recovering the permutation matrix from 
    the vector
  */
  auto P = zeros!(T)(m, m);
  for(long i = 0; i < m; ++i)
  {
    auto row = new T[m];
    row[] = 0;
    row[p[i]] = 1;
    P[i, 0..$] = matrix(row, 1, m);
  }
  return LU!(T)(P, L, U);
}

The object LU in the above code is simply:

struct LU(T)
if(isFloatingPoint!(T))
{
  Matrix!(T) P;
  Matrix!(T) L;
  Matrix!(T) U;
}

Implementation of Cholesky Decomposition

Below is an enumeration that denotes the two Cholesky decomposition variants that are implemented:

enum Cholesky{Vanilla, LDL}
alias Vanilla = Cholesky.Vanilla;
alias LDL = Cholesky.LDL;

The basic (Vanilla) cholesky decomposition is given below. It is written in such a way that \(L\) is written in the lower triangle of the matrix.

auto cholesky(Cholesky method, bool overwrite = false, T)(Matrix!(T) Ap)
if(method == Vanilla)
{
  immutable auto n = Ap.ncol;
  static if(overwrite)
  {
    Matrix!(T) A = Ap;
  }else{
    Matrix!(T) A = Ap.dup;
  }
  for(long j = 0; j < n; ++j)
  {
    for(long k = 0; k < j; ++k)
    {
      A[j, j] -= A[j, k]*A[j, k];
    }
    // Square root potentially causing issues
    // as discussed above
    A[j, j] = sqrt(A[j, j]);
    for(long i = (j + 1); i < n; ++i)
    {
      for(long k = 0; k < j; ++k)
      {
        A[i, j] -= A[i, k]*A[j, k];
      }
      A[i, j] /= A[j, j];
    }
  }
  //Collecting L for result
  Matrix!(T) L = eye!(T)(n);
  foreach(i; 0..n)
  {
    foreach(j; i..n)
    {
      L[j, i] = A[j, i];
    }
  }
  return L;
}

Below is the LDL variant - adapted from the Gaussian Elimination code:

auto cholesky(Cholesky method, bool overwrite = false, T)(Matrix!(T) A)
if(method == LDL)
{
  immutable auto n = A.ncol;
  static if(overwrite)
  {
    Matrix!(T) U = A;
  }else{
    Matrix!(T) U = A.dup;
  }
  T[] d = new T[n];
  // Similar to code in luSolve
  foreach(k; 0..(n - 1))
  {
    // Collect the diagonal
    d[k] = U[k, k];
    foreach(i; (k + 1)..n)
    {
      U[i, k] = U[i, k]/d[k];
      foreach(j; (k + 1)..n)
      {
        U[i, j] -= U[i, k]*U[k, j];
      }
    }
  }
  d[$ - 1] = U[$ - 1, $ - 1];
  // Gather elements for L
  auto L = eye!(T)(n);
  for(long i = 0; i < n; ++i)
  {
    for(long j = i; j < n; ++j)
    {
      if(i == j)
        continue;
      L[j, i] = U[j, i];
    }
  }
  
  /*
    This is a "voldermort type" in D. It is not available for
    construction outside this function.
  */
  struct Cholesky(T)
  {
    Matrix!(T) L;
    T[] d;
  }
  return Cholesky!(T)(L, d);
}

Some outputs

Here some numbers are put through the algorithms written for demonstration purposes. Random generated data:

auto A = createRandomMatrix!(double)(7, 7);
auto x = createRandomMatrix!(double)(7, 1);
auto b = gemm(A, x).array;
"A:\n".writeln(A);
"x:\n".writeln(x);
"b:\n".writeln(b);

output:

A:
Matrix(7 x 7)
0.78760895  0.16791495  0.51998612  0.90592727  0.35697508  0.89517028  0.63818117  
0.94644681  0.65230649  0.70400738  0.74370928  0.93061186  0.95699091  0.51571648  
0.7791607   0.021011612 0.95554588  0.88994508  0.54347486  0.20869079  0.67902032  
0.30326116  0.76147071  0.22517496  0.36843291  0.9923756   0.13921349  0.30278154  
0.85278273  0.15511374  0.053926307 0.6661747   0.94426026  0.8441975   0.68269395  
0.082958874 0.17310216  0.39019568  0.82435547  0.93347347  0.094591039 0.95515174  
0.25554523  0.69263844  0.52331036  0.32446879  0.45709359  0.32742983  0.49883607  

x:
Matrix(7 x 1)
0.40713286  
0.45848681  
0.79254078  
0.84800791  
0.17773008  
0.59358732  
0.060270934 

b:
[2.21126, 2.63757, 2.10024, 1.24075, 1.73605, 1.40107, 1.41717]

Next solve using Gaussian Elimination:

auto xOut = luSolve(A, b);
"x:\n".writeln(x.array);
"Gaussian Elimination Solver:\n".writeln(xOut);

output:

U:
Matrix(7 x 7)
0.7876     0.1679     0.52       0.9059     0.357      0.8952     0.6382     
0          0.4505     0.07916    -0.3449    0.5016     -0.1187    -0.2512    
0          0          0.4666     -0.1174    0.3519     -0.7151    -0.03321   
0          0          0          0.5286     0.1525     -0.1712    0.4386     
0          0          0          0          1.101      -1.055     0.3243     
0          0          0          0          0          1.027      0.1629     
0          0          0          0          0          0          0.3834     

L:
Matrix(7 x 7)
1          0          0          0          0          0          0          
1.202      1          0          0          0          0          0          
0.9893     -0.3221    1          0          0          0          0          
0.385      1.547      -0.2089    1          0          0          0          
1.083      -0.05925   -1.081     -0.8741    1          0          0          
0.1053     0.345      0.6603     1.751      0.2029     1          0          
0.3245     1.416      0.5196     1.097      -0.6534    0.07343    1          

x:
[0.407133, 0.458487, 0.792541, 0.848008, 0.17773, 0.593587, 0.0602709]
Gaussian Elimination Solver:
[0.407133, 0.458487, 0.792541, 0.848008, 0.17773, 0.593587, 0.0602709]

\(PA = LU\) decomposition:

auto plu = A.lu!(OuterProduct);
"PLU decomp: ".writeln(plu);
import std.algorithm: reduce;
//Comparing elements of PA with LU
"PLU error: ".writeln(
    mapply!((a1, a2) => pow(a1 - a2, 2.0))(
         gemm(plu.L, plu.U).array, 
         gemm(plu.P, A).array)
         .reduce!((a1, a2) => a1 + a2)
         .sqrt);

output:

PLU decomp: LU!double(Matrix(7 x 7)
0          1          0          0          0          0          0          
0          0          0          1          0          0          0          
0          0          0          0          1          0          0          
0          0          0          0          0          1          0          
1          0          0          0          0          0          0          
0          0          1          0          0          0          0          
0          0          0          0          0          0          1          
, Matrix(7 x 7)
1          0          0          0          0          0          0          
0.3204     1          0          0          0          0          0          
0.901      -0.7831    1          0          0          0          0          
0.08765    0.2098     -0.5658    1          0          0          0          
0.8322     -0.6786    0.1139     0.4626     1          0          0          
0.8232     -0.934     -0.6468    0.5876     -0.4156    1          0          
0.27       0.9349     -0.5745    0.07403    0.2895     -0.1709    1          
, Matrix(7 x 7)
0.9464     0.6523     0.704      0.7437     0.9306     0.957      0.5157     
0          0.5525     -0.0004036 0.1301     0.6942     -0.1674    0.1375     
0          0          -0.5807    0.09797    0.6494     -0.1492    0.3257     
0          0          0          0.7873     1.074      -0.03858   1.065      
0          0          0          0          -0.517     0.02       -0.2276    
0          0          0          0          0          -0.801     -0.127     
0          0          0          0          0          0          0.3834     
)
PLU error: 3.55513e-16

Outputs for Cholesky, firstly the “vanilla” solver:

A = createRandomMatrix!(double)(7, 7);
//Make the random matrix symmetric
A = gemm!(CblasTrans, CblasNoTrans)(A, A);
"A: \n".writeln(A);
auto L = cholesky!(Vanilla)(A);
"Cholesky: \n".writeln(L);
auto calcA = gemm!(CblasNoTrans, CblasTrans)(L, L);
"LL^T:\n".writeln(calcA);
//Comparing elements of A with LL^T
"Cholesky Error (Vanilla): ".writeln(
    map!((x1, x2) => abs(x1 - x2))(A.array, calcA.array)
    .reduce!((x1, x2) => x1 + x2)
);

output:

Cholesky: 
Matrix(7 x 7)
1.73       0          0          0          0          0          0          
1.327      0.3089     0          0          0          0          0          
1.239      -0.1535    0.5974     0          0          0          0          
1.804      0.4317     0.1312     0.6613     0          0          0          
1.1        0.1185     0.4336     -0.3596    0.3032     0          0          
1.529      0.2632     0.1535     0.5298     0.1423     0.3863     0          
1.501      0.4592     -0.1566    0.5664     0.4249     -0.3999    0.1552     

LL^T:
Matrix(7 x 7)
2.9932443   2.2959181   2.1435358   3.120934    1.9031914   2.6449977   2.5977037   
2.2959181   1.8564814   1.5967375   2.5272337   1.4964281   2.1101046   2.1343782   
2.1435358   1.5967375   1.9155117   2.2470581   1.6037649   1.9454282   1.696221    
3.120934    2.5272337   2.2470581   3.8950476   1.8545822   3.2419384   3.260778    
1.9031914   1.4964281   1.6037649   1.8545822   1.6334361   1.6321453   1.5633707   
2.6449977   2.1101046   1.9454282   3.2419384   1.6321453   2.880207    2.598331    
2.5977037   2.1343782   1.696221    3.260778    1.5633707   2.598331    3.1751308   

Cholesky Error (Vanilla): 2.44249e-15

then the LDL solver:

//We get the same A matrix from calculating LDL^T
"LDL:\n".writeln(A.cholesky!(LDL));

output:

A = LDL^T:
Matrix(7 x 7)
2.9932443   2.2959181   2.1435358   3.120934    1.9031914   2.6449977   2.5977037   
2.2959181   1.8564814   1.5967375   2.5272337   1.4964281   2.1101046   2.1343782   
2.1435358   1.5967375   1.9155117   2.2470581   1.6037649   1.9454282   1.696221    
3.120934    2.5272337   2.2470581   3.8950476   1.8545822   3.2419384   3.260778    
1.9031914   1.4964281   1.6037649   1.8545822   1.6334361   1.6321453   1.5633707   
2.6449977   2.1101046   1.9454282   3.2419384   1.6321453   2.880207    2.598331    
2.5977037   2.1343782   1.696221    3.260778    1.5633707   2.598331    3.1751308   

//This is the LDL object holding the lower matrix and d diagonal
LDL:
Cholesky!double(Matrix(7 x 7)
1          0          0          0          0          0          0          
0.767      1          0          0          0          0          0          
0.7161     -0.4969    1          0          0          0          0          
1.043      1.398      0.2196     1          0          0          0          
0.6358     0.3837     0.7258     -0.5438    1          0          0          
0.8837     0.8519     0.2569     0.801      0.4695     1          0          
0.8679     1.486      -0.2622    0.8564     1.401      -1.035     1          
, [2.99324, 0.0954357, 0.356905, 0.437381, 0.0919369, 0.149217, 0.0240929])

That’s it. Thank you for reading!