Sparse Matrices

This chapter describes functions for the construction and manipulation of sparse matrices, matrices which are populated primarily with zeros and contain only a few non-zero elements. Sparse matrices often appear in the solution of partial differential equations. It is beneficial to use specialized data structures and algorithms for storing and working with sparse matrices, since dense matrix algorithms and structures can be prohibitively slow and use huge amounts of memory when applied to sparse matrices.

The header file gsl_spmatrix.h contains the prototypes for the sparse matrix functions and related declarations.

Data types

All the functions are available for each of the standard data-types. The versions for double have the prefix gsl_spmatrix, Similarly the versions for single-precision float arrays have the prefix gsl_spmatrix_float. The full list of available types is given below,

Prefix

Type

gsl_spmatrix

double

gsl_spmatrix_float

float

gsl_spmatrix_long_double

long double

gsl_spmatrix_int

int

gsl_spmatrix_uint

unsigned int

gsl_spmatrix_long

long

gsl_spmatrix_ulong

unsigned long

gsl_spmatrix_short

short

gsl_spmatrix_ushort

unsigned short

gsl_spmatrix_char

char

gsl_spmatrix_uchar

unsigned char

gsl_spmatrix_complex

complex double

gsl_spmatrix_complex_float

complex float

gsl_spmatrix_complex_long_double

complex long double

Sparse Matrix Storage Formats

GSL currently supports three storage formats for sparse matrices: the coordinate (COO) representation, compressed sparse column (CSC) and compressed sparse row (CSR) formats. These are discussed in more detail below. In order to illustrate the different storage formats, the following sections will reference this M-by-N sparse matrix, with M=4 and N=5:

\begin{pmatrix}
  9 & 0 & 0 & 0 & -3 \\
  4 & 7 & 0 & 0 & 0 \\
  0 & 8 & -1 & 8 & 0 \\
  4 & 0 & 5 & 6 & 0
\end{pmatrix}

The number of non-zero elements in the matrix, also abbreviated as nnz is equal to 10 in this case.

Coordinate Storage (COO)

The coordinate storage format, also known as triplet format, stores triplets (i,j,x) for each non-zero element of the matrix. This notation means that the (i,j) element of the matrix A is A_{ij} = x. The matrix is stored using three arrays of the same length, representing the row indices, column indices, and matrix data. For the reference matrix above, one possible storage format is:

data

9

7

4

8

-3

-1

8

5

6

4

row

0

1

1

2

0

2

2

3

3

3

col

0

1

0

1

4

2

3

2

3

0

Note that this representation is not unique - the coordinate triplets may appear in any ordering and would still represent the same sparse matrix. The length of the three arrays is equal to the number of non-zero elements in the matrix, nnz, which in this case is 10. The coordinate format is extremely convenient for sparse matrix assembly, the process of adding new elements, or changing existing elements, in a sparse matrix. However, it is generally not suitable for the efficient implementation of matrix-matrix products, or matrix factorization algorithms. For these applications it is better to use one of the compressed formats discussed below.

In order to faciliate efficient sparse matrix assembly, GSL stores the coordinate data in a balanced binary search tree, specifically an AVL tree, in addition to the three arrays discussed above. This allows GSL to efficiently determine whether an entry (i,j) already exists in the matrix, and to replace an existing matrix entry with a new value, without needing to search unsorted arrays.

Compressed Sparse Column (CSC)

Compressed sparse column storage stores each column of non-zero values in the sparse matrix in a continuous memory block, keeping pointers to the beginning of each column in that memory block, and storing the row indices of each non-zero element. For the reference matrix above, these arrays look like

data

9

4

4

7

8

-1

5

8

6

-3

row

0

1

3

1

2

2

3

2

3

0

col_ptr

0

3

5

7

9

10

The data and row arrays are of length nnz and are the same as the COO storage format. The col_ptr array has length N+1, and col_ptr[j] gives the index in data of the start of column j. Therefore, the j-th column of the matrix is stored in data[col_ptr[j]], data[col_ptr[j] + 1], …, data[col_ptr[j+1] - 1]. The last element of col_ptr is nnz.

Compressed Sparse Row (CSR)

Compressed row storage stores each row of non-zero values in a continuous memory block, keeping pointers to the beginning of each row in the block and storing the column indices of each non-zero element. For the reference matrix above, these arrays look like

data

9

-3

4

7

8

-1

8

4

5

6

col

0

4

0

1

1

2

3

0

2

3

row_ptr

0

2

4

7

10

The data and col arrays are of length nnz and are the same as the COO storage format. The row_ptr array has length M+1, and row_ptr[i] gives the index in data of the start of row i. Therefore, the i-th row of the matrix is stored in data[row_ptr[i]], data[row_ptr[i] + 1], …, data[row_ptr[i+1] - 1]. The last element of row_ptr is nnz.

Overview

These routines provide support for constructing and manipulating sparse matrices in GSL, using an API similar to gsl_matrix. The basic structure is called gsl_spmatrix.

type gsl_spmatrix

This structure is defined as:

typedef struct
{
  size_t size1;
  size_t size2;
  int *i;
  double *data;
  int *p;
  size_t nzmax;
  size_t nz;
  [ ... variables for binary tree and memory management ... ]
  size_t sptype;
} gsl_spmatrix;

This defines a size1-by-size2 sparse matrix. The number of non-zero elements currently in the matrix is given by nz. For the triplet representation, i, p, and data are arrays of size nz which contain the row indices, column indices, and element value, respectively. So if data[k] = A(i,j), then i = i[k] and j = p[k].

For compressed column storage, i and data are arrays of size nz containing the row indices and element values, identical to the triplet case. p is an array of size size2 + 1 where p[j] points to the index in data of the start of column j. Thus, if data[k] = A(i,j), then i = i[k] and p[j] <= k < p[j+1].

For compressed row storage, i and data are arrays of size nz containing the column indices and element values, identical to the triplet case. p is an array of size size1 + 1 where p[i] points to the index in data of the start of row i. Thus, if data[k] = A(i,j), then j = i[k] and p[i] <= k < p[i+1].

There are additional variables in the gsl_spmatrix structure related to binary tree storage and memory management. The GSL implementation of sparse matrices uses balanced AVL trees to sort matrix elements in the triplet representation. This speeds up element searches and duplicate detection during the matrix assembly process. The gsl_spmatrix structure also contains additional workspace variables needed for various operations like converting from triplet to compressed storage. sptype indicates the type of storage format being used (COO, CSC or CSR).

The compressed storage format defined above makes it very simple to interface with sophisticated external linear solver libraries which accept compressed storage input. The user can simply pass the arrays i, p, and data as the inputs to external libraries.

Allocation

The functions for allocating memory for a sparse matrix follow the style of malloc() and free(). They also perform their own error checking. If there is insufficient memory available to allocate a matrix then the functions call the GSL error handler with an error code of GSL_ENOMEM in addition to returning a null pointer.

gsl_spmatrix *gsl_spmatrix_alloc(const size_t n1, const size_t n2)

This function allocates a sparse matrix of size n1-by-n2 and initializes it to all zeros. If the size of the matrix is not known at allocation time, both n1 and n2 may be set to 1, and they will automatically grow as elements are added to the matrix. This function sets the matrix to the triplet representation, which is the easiest for adding and accessing matrix elements. This function tries to make a reasonable guess for the number of non-zero elements (nzmax) which will be added to the matrix by assuming a sparse density of 10\%. The function gsl_spmatrix_alloc_nzmax() can be used if this number is known more accurately. The workspace is of size O(nzmax).

gsl_spmatrix *gsl_spmatrix_alloc_nzmax(const size_t n1, const size_t n2, const size_t nzmax, const size_t sptype)

This function allocates a sparse matrix of size n1-by-n2 and initializes it to all zeros. If the size of the matrix is not known at allocation time, both n1 and n2 may be set to 1, and they will automatically grow as elements are added to the matrix. The parameter nzmax specifies the maximum number of non-zero elements which will be added to the matrix. It does not need to be precisely known in advance, since storage space will automatically grow using gsl_spmatrix_realloc() if nzmax is not large enough. Accurate knowledge of this parameter reduces the number of reallocation calls required. The parameter sptype specifies the storage format of the sparse matrix. Possible values are

GSL_SPMATRIX_COO

This flag specifies coordinate (triplet) storage.

GSL_SPMATRIX_CSC

This flag specifies compressed sparse column storage.

GSL_SPMATRIX_CSR

This flag specifies compressed sparse row storage.

The allocated gsl_spmatrix structure is of size O(nzmax).

int gsl_spmatrix_realloc(const size_t nzmax, gsl_spmatrix *m)

This function reallocates the storage space for m to accomodate nzmax non-zero elements. It is typically called internally by gsl_spmatrix_set() if the user wants to add more elements to the sparse matrix than the previously specified nzmax.

Input matrix formats supported: COO, CSC, CSR

void gsl_spmatrix_free(gsl_spmatrix *m)

This function frees the memory associated with the sparse matrix m.

Input matrix formats supported: COO, CSC, CSR

Accessing Matrix Elements

double gsl_spmatrix_get(const gsl_spmatrix *m, const size_t i, const size_t j)

This function returns element (i, j) of the matrix m.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_set(gsl_spmatrix *m, const size_t i, const size_t j, const double x)

This function sets element (i, j) of the matrix m to the value x.

Input matrix formats supported: COO

double *gsl_spmatrix_ptr(gsl_spmatrix *m, const size_t i, const size_t j)

This function returns a pointer to the (i, j) element of the matrix m. If the (i, j) element is not explicitly stored in the matrix, a null pointer is returned.

Input matrix formats supported: COO, CSC, CSR

Initializing Matrix Elements

Since the sparse matrix format only stores the non-zero elements, it is automatically initialized to zero upon allocation. The function gsl_spmatrix_set_zero() may be used to re-initialize a matrix to zero after elements have been added to it.

int gsl_spmatrix_set_zero(gsl_spmatrix *m)

This function sets (or resets) all the elements of the matrix m to zero. For CSC and CSR matrices, the cost of this operation is O(1). For COO matrices, the binary tree structure must be dismantled, so the cost is O(nz).

Input matrix formats supported: COO, CSC, CSR

Reading and Writing Matrices

int gsl_spmatrix_fwrite(FILE *stream, const gsl_spmatrix *m)

This function writes the elements of the matrix m to the stream stream in binary format. The return value is 0 for success and GSL_EFAILED if there was a problem writing to the file. Since the data is written in the native binary format it may not be portable between different architectures.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_fread(FILE *stream, gsl_spmatrix *m)

This function reads into the matrix m from the open stream stream in binary format. The matrix m must be preallocated with the correct storage format, dimensions and have a sufficiently large nzmax in order to read in all matrix elements, otherwise GSL_EBADLEN is returned. The return value is 0 for success and GSL_EFAILED if there was a problem reading from the file. The data is assumed to have been written in the native binary format on the same architecture.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_fprintf(FILE *stream, const gsl_spmatrix *m, const char *format)

This function writes the elements of the matrix m line-by-line to the stream stream using the format specifier format, which should be one of the %g, %e or %f formats for floating point numbers. The function returns 0 for success and GSL_EFAILED if there was a problem writing to the file. The input matrix m may be in any storage format, and the output file will be written in MatrixMarket format.

Input matrix formats supported: COO, CSC, CSR

gsl_spmatrix *gsl_spmatrix_fscanf(FILE *stream)

This function reads sparse matrix data in the MatrixMarket format from the stream stream and stores it in a newly allocated matrix which is returned in COO format. The function returns 0 for success and GSL_EFAILED if there was a problem reading from the file. The user should free the returned matrix when it is no longer needed.

Copying Matrices

int gsl_spmatrix_memcpy(gsl_spmatrix *dest, const gsl_spmatrix *src)

This function copies the elements of the sparse matrix src into dest. The two matrices must have the same dimensions and be in the same storage format.

Input matrix formats supported: COO, CSC, CSR

Exchanging Rows and Columns

int gsl_spmatrix_transpose_memcpy(gsl_spmatrix *dest, const gsl_spmatrix *src)

This function copies the transpose of the sparse matrix src into dest. The dimensions of dest must match the transpose of the matrix src. Also, both matrices must use the same sparse storage format.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_transpose(gsl_spmatrix *m)

This function replaces the matrix m by its transpose, but changes the storage format for compressed matrix inputs. Since compressed column storage is the transpose of compressed row storage, this function simply converts a CSC matrix to CSR and vice versa. This is the most efficient way to transpose a compressed storage matrix, but the user should note that the storage format of their compressed matrix will change on output. For COO matrix inputs, the output matrix is also in COO storage.

Input matrix formats supported: COO, CSC, CSR

Matrix Operations

int gsl_spmatrix_scale(gsl_spmatrix *m, const double x)

This function scales all elements of the matrix m by the constant factor x. The result m(i,j) \leftarrow x m(i,j) is stored in m.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_scale_columns(gsl_spmatrix *A, const gsl_vector *x)

This function scales the columns of the M-by-N sparse matrix A by the elements of the vector x, of length N. The j-th column of A is multiplied by x[j]. This is equivalent to forming

A \rightarrow A X

where X = \textrm{diag}(x).

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_scale_rows(gsl_spmatrix *A, const gsl_vector *x)

This function scales the rows of the M-by-N sparse matrix A by the elements of the vector x, of length M. The i-th row of A is multiplied by x[i]. This is equivalent to forming

A \rightarrow X A

where X = \textrm{diag}(x).

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_add(gsl_spmatrix *c, const gsl_spmatrix *a, const gsl_spmatrix *b)

This function computes the sum c = a + b. The three matrices must have the same dimensions.

Input matrix formats supported: CSC, CSR

int gsl_spmatrix_dense_add(gsl_matrix *a, const gsl_spmatrix *b)

This function adds the elements of the sparse matrix b to the elements of the dense matrix a. The result a(i,j) \leftarrow a(i,j) + b(i,j) is stored in a and b remains unchanged. The two matrices must have the same dimensions.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_dense_sub(gsl_matrix *a, const gsl_spmatrix *b)

This function subtracts the elements of the sparse matrix b from the elements of the dense matrix a. The result a(i,j) \leftarrow a(i,j) - b(i,j) is stored in a and b remains unchanged. The two matrices must have the same dimensions.

Input matrix formats supported: COO, CSC, CSR

Matrix Properties

const char *gsl_spmatrix_type(const gsl_spmatrix *m)

This function returns a string describing the sparse storage format of the matrix m. For example:

printf ("matrix is '%s' format.\n", gsl_spmatrix_type (m));

would print something like:

matrix is 'CSR' format.

Input matrix formats supported: COO, CSC, CSR

size_t gsl_spmatrix_nnz(const gsl_spmatrix *m)

This function returns the number of non-zero elements in m.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_equal(const gsl_spmatrix *a, const gsl_spmatrix *b)

This function returns 1 if the matrices a and b are equal (by comparison of element values) and 0 otherwise. The matrices a and b must be in the same sparse storage format for comparison.

Input matrix formats supported: COO, CSC, CSR

double gsl_spmatrix_norm1(const gsl_spmatrix *A)

This function returns the 1-norm of the m-by-n matrix A, defined as the maximum column sum,

||A||_1 = \textrm{max}_{1 \le j \le n} \sum_{i=1}^m |A_{ij}|

Input matrix formats supported: COO, CSC, CSR

Finding Maximum and Minimum Elements

int gsl_spmatrix_minmax(const gsl_spmatrix *m, double *min_out, double *max_out)

This function returns the minimum and maximum elements of the matrix m, storing them in min_out and max_out, and searching only the non-zero values.

Input matrix formats supported: COO, CSC, CSR

int gsl_spmatrix_min_index(const gsl_spmatrix *m, size_t *imin, size_t *jmin)

This function returns the indices of the minimum value in the matrix m, searching only the non-zero values, and storing them in imin and jmin. When there are several equal minimum elements then the first element found is returned.

Input matrix formats supported: COO, CSC, CSR

Compressed Format

These routines calculate a compressed matrix from a coordinate representation.

int gsl_spmatrix_csc(gsl_spmatrix *dest, const gsl_spmatrix *src)

This function creates a sparse matrix in compressed sparse column format from the input sparse matrix src which must be in COO format. The compressed matrix is stored in dest.

Input matrix formats supported: COO

int gsl_spmatrix_csr(gsl_spmatrix *dest, const gsl_spmatrix *src)

This function creates a sparse matrix in compressed sparse row format from the input sparse matrix src which must be in COO format. The compressed matrix is stored in dest.

Input matrix formats supported: COO

gsl_spmatrix *gsl_spmatrix_compress(const gsl_spmatrix *src, const int sptype)

This function allocates a new sparse matrix, and stores src into it using the format specified by sptype. The input sptype can be one of GSL_SPMATRIX_COO, GSL_SPMATRIX_CSC, or GSL_SPMATRIX_CSR. A pointer to the newly allocated matrix is returned, and must be freed by the caller when no longer needed.

Conversion Between Sparse and Dense Matrices

The gsl_spmatrix structure can be converted into the dense gsl_matrix format and vice versa with the following routines.

int gsl_spmatrix_d2sp(gsl_spmatrix *S, const gsl_matrix *A)

This function converts the dense matrix A into sparse COO format and stores the result in S.

Input matrix formats supported: COO

int gsl_spmatrix_sp2d(gsl_matrix *A, const gsl_spmatrix *S)

This function converts the sparse matrix S into a dense matrix and stores the result in A.

Input matrix formats supported: COO, CSC, CSR

Examples

The following example program builds a 5-by-4 sparse matrix and prints it in coordinate, compressed column, and compressed row format. The matrix which is constructed is

\left(
  \begin{array}{cccc}
    0 & 0 & 3.1 & 4.6 \\
    1 & 0 & 7.2 & 0 \\
    0 & 0 & 0 & 0 \\
    2.1 & 2.9 & 0 & 8.5 \\
    4.1 & 0 & 0 & 0
  \end{array}
\right)

The output of the program is:

printing all matrix elements:
A(0,0) = 0
A(0,1) = 0
A(0,2) = 3.1
A(0,3) = 4.6
A(1,0) = 1
.
.
.
A(4,0) = 4.1
A(4,1) = 0
A(4,2) = 0
A(4,3) = 0
matrix in triplet format (i,j,Aij):
(0, 2, 3.1)
(0, 3, 4.6)
(1, 0, 1.0)
(1, 2, 7.2)
(3, 0, 2.1)
(3, 1, 2.9)
(3, 3, 8.5)
(4, 0, 4.1)
matrix in compressed column format:
i = [ 1, 3, 4, 3, 0, 1, 0, 3, ]
p = [ 0, 3, 4, 6, 8, ]
d = [ 1, 2.1, 4.1, 2.9, 3.1, 7.2, 4.6, 8.5, ]
matrix in compressed row format:
i = [ 2, 3, 0, 2, 0, 1, 3, 0, ]
p = [ 0, 2, 4, 4, 7, 8, ]
d = [ 3.1, 4.6, 1, 7.2, 2.1, 2.9, 8.5, 4.1, ]

We see in the compressed column output, the data array stores each column contiguously, the array i stores the row index of the corresponding data element, and the array p stores the index of the start of each column in the data array. Similarly, for the compressed row output, the data array stores each row contiguously, the array i stores the column index of the corresponding data element, and the p array stores the index of the start of each row in the data array.

#include <stdio.h>
#include <stdlib.h>

#include <gsl/gsl_spmatrix.h>

int
main()
{
  gsl_spmatrix *A = gsl_spmatrix_alloc(5, 4); /* triplet format */
  gsl_spmatrix *B, *C;
  size_t i, j;

  /* build the sparse matrix */
  gsl_spmatrix_set(A, 0, 2, 3.1);
  gsl_spmatrix_set(A, 0, 3, 4.6);
  gsl_spmatrix_set(A, 1, 0, 1.0);
  gsl_spmatrix_set(A, 1, 2, 7.2);
  gsl_spmatrix_set(A, 3, 0, 2.1);
  gsl_spmatrix_set(A, 3, 1, 2.9);
  gsl_spmatrix_set(A, 3, 3, 8.5);
  gsl_spmatrix_set(A, 4, 0, 4.1);

  printf("printing all matrix elements:\n");
  for (i = 0; i < 5; ++i)
    for (j = 0; j < 4; ++j)
      printf("A(%zu,%zu) = %g\n", i, j,
             gsl_spmatrix_get(A, i, j));

  /* print out elements in triplet format */
  printf("matrix in triplet format (i,j,Aij):\n");
  gsl_spmatrix_fprintf(stdout, A, "%.1f");

  /* convert to compressed column format */
  B = gsl_spmatrix_ccs(A);

  printf("matrix in compressed column format:\n");
  printf("i = [ ");
  for (i = 0; i < B->nz; ++i)
    printf("%d, ", B->i[i]);
  printf("]\n");

  printf("p = [ ");
  for (i = 0; i < B->size2 + 1; ++i)
    printf("%d, ", B->p[i]);
  printf("]\n");

  printf("d = [ ");
  for (i = 0; i < B->nz; ++i)
    printf("%g, ", B->data[i]);
  printf("]\n");

  /* convert to compressed row format */
  C = gsl_spmatrix_crs(A);

  printf("matrix in compressed row format:\n");
  printf("i = [ ");
  for (i = 0; i < C->nz; ++i)
    printf("%d, ", C->i[i]);
  printf("]\n");

  printf("p = [ ");
  for (i = 0; i < C->size1 + 1; ++i)
    printf("%d, ", C->p[i]);
  printf("]\n");

  printf("d = [ ");
  for (i = 0; i < C->nz; ++i)
    printf("%g, ", C->data[i]);
  printf("]\n");

  gsl_spmatrix_free(A);
  gsl_spmatrix_free(B);
  gsl_spmatrix_free(C);

  return 0;
}

References and Further Reading

The algorithms used by these functions are described in the following sources,