Sparse Matrices

This chapter describes functions for the construction and manipulation of sparse matrices, matrices which are populated primarily with zeros and contain only a few non-zero elements. Sparse matrices often appear in the solution of partial differential equations. It is beneficial to use specialized data structures and algorithms for storing and working with sparse matrices, since dense matrix structures reserve memory for every element, zero or not, and dense algorithms spend most of their time operating on zeros.

The header file gsl_spmatrix.h contains the prototypes for the sparse matrix functions and related declarations.

Overview

These routines provide support for constructing and manipulating sparse matrices in GSL, using an API similar to gsl_matrix. The basic structure is called gsl_spmatrix. There are three supported storage formats for sparse matrices: the triplet, compressed column storage (CCS) and compressed row storage (CRS) formats. The triplet format stores a triplet (i,j,x) for each non-zero element of the matrix, meaning that the (i,j) element of the matrix A is A_{ij} = x. Compressed column storage stores the non-zero values of each column in a contiguous memory block, keeping pointers to the beginning of each column in that memory block, and storing the row index of each non-zero element. Compressed row storage stores the non-zero values of each row in a contiguous memory block, keeping pointers to the beginning of each row in the block and storing the column index of each non-zero element. The triplet format is ideal for adding elements to the sparse matrix structure while it is being constructed, while the compressed storage formats are better suited for matrix-matrix multiplication or linear solvers.

gsl_spmatrix

This structure is defined as:

typedef struct
{
  size_t size1;
  size_t size2;
  size_t *i;
  double *data;
  size_t *p;
  size_t nzmax;
  size_t nz;
  gsl_spmatrix_tree *tree_data;
  void *work;
  size_t sptype;
} gsl_spmatrix;

This defines a size1-by-size2 sparse matrix. The number of non-zero elements currently in the matrix is given by nz. For the triplet representation, i, p, and data are arrays of size nz which contain the row indices, column indices, and element value, respectively. So if data[k] = A(i,j), then i = i[k] and j = p[k].

For compressed column storage, i and data are arrays of size nz containing the row indices and element values, identical to the triplet case. p is an array of size size2 + 1 where p[j] points to the index in data of the start of column j. Thus, if data[k] = A(i,j), then i = i[k] and p[j] <= k < p[j+1].

For compressed row storage, i and data are arrays of size nz containing the column indices and element values. Note that i holds column indices here, whereas it holds row indices in the triplet and compressed column cases. p is an array of size size1 + 1 where p[i] points to the index in data of the start of row i. Thus, if data[k] = A(i,j), then j = i[k] and p[i] <= k < p[i+1].

The parameter tree_data is a binary tree structure used in the triplet representation, specifically a balanced AVL tree. This speeds up element searches and duplicate detection during the matrix assembly process. The parameter work is additional workspace needed for various operations like converting from triplet to compressed storage. sptype indicates the type of storage format being used (triplet, CCS or CRS).

The compressed storage format defined above makes it very simple to interface with sophisticated external linear solver libraries which accept compressed storage input. The user can simply pass the arrays i, p, and data as the inputs to external libraries.

Allocation

The functions for allocating memory for a sparse matrix follow the style of malloc() and free(). They also perform their own error checking. If there is insufficient memory available to allocate a matrix then the functions call the GSL error handler with an error code of GSL_ENOMEM in addition to returning a null pointer.

gsl_spmatrix * gsl_spmatrix_alloc(const size_t n1, const size_t n2)

This function allocates a sparse matrix of size n1-by-n2 and initializes it to all zeros. If the size of the matrix is not known at allocation time, both n1 and n2 may be set to 1, and they will automatically grow as elements are added to the matrix. This function sets the matrix to the triplet representation, which is the easiest for adding and accessing matrix elements. This function tries to make a reasonable guess for the number of non-zero elements (nzmax) which will be added to the matrix by assuming a sparse density of 10%. The function gsl_spmatrix_alloc_nzmax() can be used if this number is known more accurately. The workspace is of size O(nzmax).

gsl_spmatrix * gsl_spmatrix_alloc_nzmax(const size_t n1, const size_t n2, const size_t nzmax, const size_t sptype)

This function allocates a sparse matrix of size n1-by-n2 and initializes it to all zeros. If the size of the matrix is not known at allocation time, both n1 and n2 may be set to 1, and they will automatically grow as elements are added to the matrix. The parameter nzmax specifies the maximum number of non-zero elements which will be added to the matrix. It does not need to be precisely known in advance, since storage space will automatically grow using gsl_spmatrix_realloc() if nzmax is not large enough. Accurate knowledge of this parameter reduces the number of reallocation calls required. The parameter sptype specifies the storage format of the sparse matrix. Possible values are

GSL_SPMATRIX_TRIPLET

This flag specifies triplet storage.

GSL_SPMATRIX_CCS

This flag specifies compressed column storage.

GSL_SPMATRIX_CRS

This flag specifies compressed row storage.

The allocated gsl_spmatrix structure is of size O(nzmax).

int gsl_spmatrix_realloc(const size_t nzmax, gsl_spmatrix * m)

This function reallocates the storage space for m to accommodate nzmax non-zero elements. It is typically called internally by gsl_spmatrix_set() if the user wants to add more elements to the sparse matrix than the previously specified nzmax.

void gsl_spmatrix_free(gsl_spmatrix * m)

This function frees the memory associated with the sparse matrix m.

Accessing Matrix Elements

double gsl_spmatrix_get(const gsl_spmatrix * m, const size_t i, const size_t j)

This function returns element (i, j) of the matrix m. The matrix may be in triplet or compressed format.

int gsl_spmatrix_set(gsl_spmatrix * m, const size_t i, const size_t j, const double x)

This function sets element (i, j) of the matrix m to the value x. The matrix must be in triplet representation.

double * gsl_spmatrix_ptr(gsl_spmatrix * m, const size_t i, const size_t j)

This function returns a pointer to the (i, j) element of the matrix m. If the (i, j) element is not explicitly stored in the matrix, a null pointer is returned.

Initializing Matrix Elements

Since the sparse matrix format only stores the non-zero elements, it is automatically initialized to zero upon allocation. The function gsl_spmatrix_set_zero() may be used to re-initialize a matrix to zero after elements have been added to it.

int gsl_spmatrix_set_zero(gsl_spmatrix * m)

This function sets (or resets) all the elements of the matrix m to zero.

Reading and Writing Matrices

int gsl_spmatrix_fwrite(FILE * stream, const gsl_spmatrix * m)

This function writes the elements of the matrix m to the stream stream in binary format. The return value is 0 for success and GSL_EFAILED if there was a problem writing to the file. Since the data is written in the native binary format it may not be portable between different architectures.

int gsl_spmatrix_fread(FILE * stream, gsl_spmatrix * m)

This function reads into the matrix m from the open stream stream in binary format. The matrix m must be preallocated with the correct storage format and dimensions, and must have a sufficiently large nzmax to hold all matrix elements; otherwise GSL_EBADLEN is returned. The return value is 0 for success and GSL_EFAILED if there was a problem reading from the file. The data is assumed to have been written in the native binary format on the same architecture.

int gsl_spmatrix_fprintf(FILE * stream, const gsl_spmatrix * m, const char * format)

This function writes the elements of the matrix m line-by-line to the stream stream using the format specifier format, which should be one of the %g, %e or %f formats for floating point numbers. The function returns 0 for success and GSL_EFAILED if there was a problem writing to the file. The input matrix m may be in any storage format, and the output file will be written in MatrixMarket format.

gsl_spmatrix * gsl_spmatrix_fscanf(FILE * stream)

This function reads sparse matrix data in the MatrixMarket format from the stream stream and stores it in a newly allocated matrix, which is returned in triplet format. A null pointer is returned if there was a problem reading from the file. The user should free the returned matrix when it is no longer needed.

Copying Matrices

int gsl_spmatrix_memcpy(gsl_spmatrix * dest, const gsl_spmatrix * src)

This function copies the elements of the sparse matrix src into dest. The two matrices must have the same dimensions and be in the same storage format.

Exchanging Rows and Columns

int gsl_spmatrix_transpose_memcpy(gsl_spmatrix * dest, const gsl_spmatrix * src)

This function copies the transpose of the sparse matrix src into dest. The dimensions of dest must match the transpose of the matrix src. Also, both matrices must use the same sparse storage format.

int gsl_spmatrix_transpose(gsl_spmatrix * m)

This function replaces the matrix m by its transpose, preserving the storage format of the input matrix. Currently, only triplet matrix inputs are supported.

int gsl_spmatrix_transpose2(gsl_spmatrix * m)

This function replaces the matrix m by its transpose, but changes the storage format for compressed matrix inputs. Since compressed column storage is the transpose of compressed row storage, this function simply converts a CCS matrix to CRS and vice versa. This is the most efficient way to transpose a compressed storage matrix, but the user should note that the storage format of their compressed matrix will change on output. For triplet matrices, the output matrix is also in triplet storage.

Matrix Operations

int gsl_spmatrix_add(gsl_spmatrix * c, const gsl_spmatrix * a, const gsl_spmatrix * b)

This function computes the sum c = a + b. The three matrices must have the same dimensions and be stored in a compressed format.

int gsl_spmatrix_scale(gsl_spmatrix * m, const double x)

This function scales all elements of the matrix m by the constant factor x. The result m(i,j) \leftarrow x m(i,j) is stored in m.

Matrix Properties

size_t gsl_spmatrix_nnz(const gsl_spmatrix * m)

This function returns the number of non-zero elements in m.

int gsl_spmatrix_equal(const gsl_spmatrix * a, const gsl_spmatrix * b)

This function returns 1 if the matrices a and b are equal (by comparison of element values) and 0 otherwise. The matrices a and b must be in the same sparse storage format for comparison.

Finding Maximum and Minimum Elements

int gsl_spmatrix_minmax(const gsl_spmatrix * m, double * min_out, double * max_out)

This function returns the minimum and maximum elements of the matrix m, storing them in min_out and max_out, and searching only the non-zero values.

Compressed Format

GSL supports compressed column storage (CCS) and compressed row storage (CRS) formats.

gsl_spmatrix * gsl_spmatrix_ccs(const gsl_spmatrix * T)

This function creates a sparse matrix in compressed column format from the input sparse matrix T which must be in triplet format. A pointer to a newly allocated matrix is returned. The calling function should free the newly allocated matrix when it is no longer needed.

gsl_spmatrix * gsl_spmatrix_crs(const gsl_spmatrix * T)

This function creates a sparse matrix in compressed row format from the input sparse matrix T which must be in triplet format. A pointer to a newly allocated matrix is returned. The calling function should free the newly allocated matrix when it is no longer needed.

Conversion Between Sparse and Dense Matrices

The gsl_spmatrix structure can be converted into the dense gsl_matrix format and vice versa with the following routines.

int gsl_spmatrix_d2sp(gsl_spmatrix * S, const gsl_matrix * A)

This function converts the dense matrix A into sparse triplet format and stores the result in S.

int gsl_spmatrix_sp2d(gsl_matrix * A, const gsl_spmatrix * S)

This function converts the sparse matrix S into a dense matrix and stores the result in A. S must be in triplet format.

Examples

The following example program builds a 5-by-4 sparse matrix and prints it in triplet, compressed column, and compressed row format. The matrix which is constructed is

\left(
  \begin{array}{cccc}
    0 & 0 & 3.1 & 4.6 \\
    1 & 0 & 7.2 & 0 \\
    0 & 0 & 0 & 0 \\
    2.1 & 2.9 & 0 & 8.5 \\
    4.1 & 0 & 0 & 0
  \end{array}
\right)

The output of the program is:

printing all matrix elements:
A(0,0) = 0
A(0,1) = 0
A(0,2) = 3.1
A(0,3) = 4.6
A(1,0) = 1
.
.
.
A(4,0) = 4.1
A(4,1) = 0
A(4,2) = 0
A(4,3) = 0
matrix in triplet format (i,j,Aij):
(0, 2, 3.1)
(0, 3, 4.6)
(1, 0, 1.0)
(1, 2, 7.2)
(3, 0, 2.1)
(3, 1, 2.9)
(3, 3, 8.5)
(4, 0, 4.1)
matrix in compressed column format:
i = [ 1, 3, 4, 3, 0, 1, 0, 3, ]
p = [ 0, 3, 4, 6, 8, ]
d = [ 1, 2.1, 4.1, 2.9, 3.1, 7.2, 4.6, 8.5, ]
matrix in compressed row format:
i = [ 2, 3, 0, 2, 0, 1, 3, 0, ]
p = [ 0, 2, 4, 4, 7, 8, ]
d = [ 3.1, 4.6, 1, 7.2, 2.1, 2.9, 8.5, 4.1, ]

In the compressed column output, the data array stores each column contiguously, the array i stores the row index of the corresponding data element, and the array p stores the index of the start of each column in the data array. Similarly, in the compressed row output, the data array stores each row contiguously, the array i stores the column index of the corresponding data element, and the array p stores the index of the start of each row in the data array. The repeated value 4 in the compressed row array p marks row 2 as empty, since p[2] = p[3].

#include <stdio.h>
#include <stdlib.h>

#include <gsl/gsl_spmatrix.h>

int
main()
{
  gsl_spmatrix *A = gsl_spmatrix_alloc(5, 4); /* triplet format */
  gsl_spmatrix *B, *C;
  size_t i, j;

  /* build the sparse matrix */
  gsl_spmatrix_set(A, 0, 2, 3.1);
  gsl_spmatrix_set(A, 0, 3, 4.6);
  gsl_spmatrix_set(A, 1, 0, 1.0);
  gsl_spmatrix_set(A, 1, 2, 7.2);
  gsl_spmatrix_set(A, 3, 0, 2.1);
  gsl_spmatrix_set(A, 3, 1, 2.9);
  gsl_spmatrix_set(A, 3, 3, 8.5);
  gsl_spmatrix_set(A, 4, 0, 4.1);

  printf("printing all matrix elements:\n");
  for (i = 0; i < 5; ++i)
    for (j = 0; j < 4; ++j)
      printf("A(%zu,%zu) = %g\n", i, j,
             gsl_spmatrix_get(A, i, j));

  /* print out elements in triplet format */
  printf("matrix in triplet format (i,j,Aij):\n");
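  /* triplet storage: i[] holds row indices, p[] holds column indices */
  for (i = 0; i < A->nz; ++i)
    printf("(%zu, %zu, %.1f)\n", A->i[i], A->p[i], A->data[i]);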

  /* convert to compressed column format */
  B = gsl_spmatrix_ccs(A);

  printf("matrix in compressed column format:\n");
  printf("i = [ ");
  for (i = 0; i < B->nz; ++i)
    printf("%zu, ", B->i[i]);
  printf("]\n");

  printf("p = [ ");
  for (i = 0; i < B->size2 + 1; ++i)
    printf("%zu, ", B->p[i]);
  printf("]\n");

  printf("d = [ ");
  for (i = 0; i < B->nz; ++i)
    printf("%g, ", B->data[i]);
  printf("]\n");

  /* convert to compressed row format */
  C = gsl_spmatrix_crs(A);

  printf("matrix in compressed row format:\n");
  printf("i = [ ");
  for (i = 0; i < C->nz; ++i)
    printf("%zu, ", C->i[i]);
  printf("]\n");

  printf("p = [ ");
  for (i = 0; i < C->size1 + 1; ++i)
    printf("%zu, ", C->p[i]);
  printf("]\n");

  printf("d = [ ");
  for (i = 0; i < C->nz; ++i)
    printf("%g, ", C->data[i]);
  printf("]\n");

  gsl_spmatrix_free(A);
  gsl_spmatrix_free(B);
  gsl_spmatrix_free(C);

  return 0;
}

References and Further Reading

The algorithms used by these functions are described in the following sources,