Efficient sparse storage of matrices

History

Traditionally a complete data structure (full matrices) has been used for the matrices appearing in Hartree-Fock and DFT implementations. However, by using this approach linear scaling can never be achieved since the storage grows quadratically with system size.

Challacombe presented a new data structure where the matrices are divided into atom blocks and sparsity is considered on the block level (Comp. Phys. Comm. 128, 93 (2000), J. Chem. Phys. 110, 2332 (1999)). This requires that all basis functions on a single atom are grouped together into a single block. In this way, the access overhead is reduced since one can address entire blocks rather than single elements. Another advantage is the ability to use the highly optimized Basic Linear Algebra Subprograms (BLAS) for basic submatrix manipulations rather than to operate on single matrix elements. Later on, it was observed that the submatrices coming from single-atom blocks would be too small to reach peak performance of BLAS routines. Therefore a multi-atom blocked matrix data structure was proposed and the performance increased (J. Comp. Chem. 24, 618 (2003)). It should be noted that this method exposes a trade-off. On one hand, one wants to have as large blocks as possible to reduce the overhead, on the other -- since the screening is done on a per-block basis -- large blocks will store more redundant zero elements.

Hierarchic data structure

In all reports above the common compressed sparse row (CSR) representation was used as underlying data structure for the blocked sparse representation. However, the optimal blocksize for the submatrices is usually large enough for the overhead in the data structure to become of secondary importance. Therefore it is possible to use a simpler underlying data structure than the CSR to make the implementation of matrix manipulations significantly faster, easier and more maintainable, without sacrificing performance.

In the hierarchic data structure we use a simple column-wise representation for the storage of the submatrices. This data structure can also be used to obtain several levels in the hierarchic representation. This is achieved by the use of a generic programming approach in the C++ programming language where the type of the matrix elements is left open until compile time. This means that a matrix can contain matrices instead of real numbers and an arbitrary number of levels in the hierarchy can be obtained. In the figure, for example, a three level hierarchy is illustrated with a matrix which contains matrices which contains matrices which contains real numbers.

The point with the introduction of several levels is to reduce the overhead since the use of a single level would result in a cubically scaling overhead for matrix matrix multiplication. The gain comes from that zero matrices at higher levels do not need to be referenced in algorithms since their value already is known.

Sparsity


A sparse matrix is a matrix with enough zero entries that it is worth using an algorithm that avoids storing or operating on the zero entries.

J. Demmel Applied Numerical Linear Algebra (1997)


The question is if there are enough zero entries in the matrices occurring in Hartree-Fock and DFT implementations that it is worth using a sparse representation. For small systems the answer is no since all parts of the system interact. As the system size increases the sparsity grows and it becomes beneficial to use a sparse data structure.