A distributed approach for accelerating sparse-matrix arithmetic operations for high-dimensional feature selection

Abstract

Matrix computations are both fundamental and ubiquitous in computational science, and as a result, they are frequently used in numerous disciplines of scientific computing and engineering. Due to the high computational complexity of matrix operations, which makes them critical to the performance of a large number of applications, their efficient execution in distributed environments becomes a crucial issue. This work proposes a novel approach for distributing sparse matrix arithmetic operations on computer clusters aiming at speeding-up the processing of high-dimensional matrices. The approach focuses on how to split such operations into independent parallel tasks by considering the intrinsic characteristics that distinguish each type of operation and the particular matrices involved. The approach was applied to the most commonly used arithmetic operations between matrices. The performance of the presented approach was evaluated considering a high-dimensional text feature selection approach and two real-world datasets. Experimental evaluation showed that the proposed approach helped to significantly reduce the computing times of big-scale matrix operations, when compared to serial and multi-thread implementations as well as several linear algebra software libraries.

Publication
Knowledge and Information Systems