The University of South Carolina is offering licensing opportunities for a double-precision floating-point stream accumulator.

This invention is a sparse matrix-vector multiplication (SpMV) architecture built around a novel streaming reduction circuit and a specialized on-chip cache optimized for data in compressed sparse row (CSR) format. The architecture is implemented on the Convey HC-1, a self-contained heterogeneous system that pairs a Xeon-based host with an FPGA-based coprocessor board carrying four user-programmable Virtex-5 LX330 FPGAs. A CSR sparse matrix-vector multiplier was configured on the HC-1 and its implementation was analyzed. Test results show performance exceeding that of an NVIDIA Tesla GPU.


Sparse matrix-vector multiplication (SpMV) computes the product y = Ax, where x and y are vectors and A is a large matrix whose entries are mostly zero. Because the matrix is so sparse, storing every entry in a traditional dense representation is usually impractical, so compressed sparse formats such as compressed sparse row (CSR) are used to hold the matrix in memory.
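A minimal Python sketch of the CSR layout and the row-by-row SpMV loop it enables follows. It assumes zero-based indexing and the conventional three CSR arrays (here named `values`, `col_idx`, and `row_ptr`); the function names `dense_to_csr` and `csr_spmv` are illustrative and are not part of the invention.

```python
def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)     # store only the nonzero entries
                col_idx.append(j)    # and the column each one came from
        row_ptr.append(len(values))  # row_ptr[i+1] marks the end of row i
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # Row i's nonzeros occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # data-dependent read of x
        y.append(acc)
    return y


A = [[4.0, 0.0, 1.0],
     [0.0, 3.0, 0.0],
     [2.0, 0.0, 5.0]]
vals, cols, ptr = dense_to_csr(A)
print(vals, cols, ptr)                            # [4.0, 1.0, 3.0, 2.0, 5.0] [0, 2, 1, 0, 2] [0, 2, 3, 5]
print(csr_spmv(vals, cols, ptr, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 17.0]
```

The inner loop is a floating-point accumulation whose length varies from row to row, and the reads of `x` follow `col_idx` rather than a fixed stride; these are the two access patterns that hardware for CSR SpMV has to handle at stream rate.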

SpMV is used frequently in scientific and engineering applications and is the core kernel of iterative linear system solvers such as the conjugate gradient method.
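To show why SpMV dominates such solvers, here is a textbook conjugate gradient loop in Python. It assumes a symmetric positive-definite matrix stored in CSR form; each iteration performs exactly one SpMV, and the rest is inexpensive vector arithmetic. The helper names (`csr_spmv`, `dot`, `conjugate_gradient`) and the tolerance defaults are illustrative choices, not taken from the invention.

```python
import math


def csr_spmv(values, col_idx, row_ptr, x):
    """y = A x for A stored in CSR form."""
    return [sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
            for i in range(len(row_ptr) - 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A stored in CSR form."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = csr_spmv(values, col_idx, row_ptr, p)  # the one SpMV per iteration
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# SPD test matrix [[4, 1, 0], [1, 3, 0], [0, 0, 2]] in CSR form
vals, cols, ptr = [4.0, 1.0, 1.0, 3.0, 2.0], [0, 1, 0, 1, 2], [0, 2, 4, 5]
x = conjugate_gradient(vals, cols, ptr, [1.0, 2.0, 3.0])
print([round(v, 4) for v in x])  # [0.0909, 0.6364, 1.5]
```

For large sparse systems the SpMV call is where most of the memory traffic and floating-point work goes, which is why accelerating it speeds up the whole solver.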

Key features:
  1. A novel streaming reduction architecture for floating-point accumulation.
  2. A novel on-chip cache design optimized for streaming compressed sparse row (CSR) matrices.
  3. End-to-end integration with the HC-1 system, programming model, and runtime environment.
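The problem a streaming reduction circuit solves can be seen in a small cycle-level Python model. A pipelined floating-point adder with a latency of several cycles cannot feed each result straight back into the next addition without stalling, because a row's running sum is not available until it leaves the pipeline; a naive accumulator therefore manages roughly one addition per `latency` cycles. The sketch below instead keeps several partial sums per row and pairs any two partial values of the same row as soon as both are available. It is a generic illustration under assumed parameters (one adder, one input value per cycle, a hypothetical `latency` of 4 cycles, inputs tagged as `(row_id, value, is_last)`, and an unbounded buffer), not the patented circuit; a hardware design must also bound the buffer, which this model does not attempt.

```python
from collections import defaultdict, deque


def stream_reduce(stream, latency=4):
    """Cycle-level model: reduce (row_id, value, is_last) tuples to per-row sums.

    Assumes one pipelined adder that accepts one addition per cycle and returns
    the result `latency` cycles later, one input value per cycle, rows arriving
    contiguously, and an unbounded partial-sum buffer (a real circuit must bound it).
    """
    pipeline = deque([None] * latency)   # in-flight additions: (row_id, sum) or None
    pending = defaultdict(list)          # row_id -> partial values awaiting an add
    in_flight = defaultdict(int)         # row_id -> additions currently in the pipeline
    closed = set()                       # rows whose last input value has arrived
    inputs = deque(stream)
    results, cycles = {}, 0
    while inputs or pending:
        cycles += 1
        # 1. Retire the addition that leaves the pipeline this cycle.
        done = pipeline.popleft()
        if done is not None:
            rid, partial = done
            in_flight[rid] -= 1
            pending[rid].append(partial)
        # 2. Accept one new input value.
        if inputs:
            rid, value, is_last = inputs.popleft()
            pending[rid].append(value)
            if is_last:
                closed.add(rid)
        # 3. Issue at most one addition, pairing two partial values of one row.
        issued = None
        for rid, vals in pending.items():
            if len(vals) >= 2:
                issued = (rid, vals.pop() + vals.pop())
                in_flight[rid] += 1
                break
        pipeline.append(issued)
        # 4. A closed row with one value left and nothing in flight is finished.
        for rid in [r for r in closed if in_flight[r] == 0 and len(pending[r]) == 1]:
            results[rid] = pending.pop(rid)[0]
            closed.discard(rid)
    return results, cycles


stream = [(0, 1.0, False), (0, 2.0, False), (0, 3.0, True),
          (1, 4.0, True), (2, 5.0, False), (2, 6.0, True)]
sums, cycles = stream_reduce(stream)
print(sums[0], sums[1], sums[2], cycles)  # 6.0 4.0 11.0 11
```

In this run, row 0's final addition is still in the pipeline when row 2's values arrive, so additions from different rows interleave in the adder. Managing that interleaving with a small, fixed amount of buffering and control logic is the part a hardware reduction circuit has to get right.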

Title: System and Method for Sparse Matrix Vector Multiplication Processing
Application Type: Utility
Country: United States
Serial No.: 13/456,657
Patent No.: 8,862,653
Filing Date: 4/26/2012
Issue Date: 10/14/2014
Expiration Date: 4/24/2033
