Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations

Haonan Ji, Shibo Lu, Kaixi Hou, Hao Wang, Zhou Jin*, Weifeng Liu, Brian Vinter

*Corresponding author for this work

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaperJournal articleResearchpeer-review

7 Citations (Scopus)

Abstract

Segmented operations, such as segmented sum, segmented scan and segmented sort, are important building blocks for parallel irregular algorithms. We in this work propose a new parallel primitive called segmented merge. Its function is in parallel merging q sub-segments to p segments, both of possibly nonuniform lengths which easily cause the load balancing and the vectorization problems on massively parallel processors, such as GPUs. Our algorithm resolves these problems by first recording the boundaries of segments and sub-segments, then assigning roughly the same number of elements for GPU threads, and finally iteratively merging the sub-segments in each segment in the form of binary tree until there is only one sub-segment in each segment. We implement the segmented merge primitive on GPUs and demonstrate its efficiency on parallel sparse matrix transposition (SpTRANS) and sparse matrix–matrix multiplication (SpGEMM) operations. We conduct a comparative experiment with NVIDIA vendor library on two GPUs. The experimental results show that our algorithm achieve on average 3.94× (up to 13.09×) and 2.89× (up to 109.15×) speedup on SpTRANS and SpGEMM, respectively.

Original languageEnglish
JournalInternational Journal of Parallel Programming
Volume49
Issue5
Pages (from-to)732-744
Number of pages13
ISSN0885-7458
DOIs
Publication statusPublished - Oct 2021

Keywords

  • GPU
  • Parallel computing
  • Segmented merge
  • Sparse matrix

Fingerprint

Dive into the research topics of 'Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations'. Together they form a unique fingerprint.

Cite this