Laboratory for Internet and Innovative Technologies

The dense matrix-matrix multiplication algorithm is widely used in large scientific applications and is often an important factor in the overall performance of the application. Therefore, optimizing this algorithm for both parallel and serial execution can improve overall application performance. In this paper we give an overview of the most commonly used dense matrix multiplication optimization techniques applicable to multicore processors. These methods can speed up parallel execution on multicore processors by reducing the number of memory accesses and by adapting the algorithm to the hardware architecture and organization.


Nenad Anchev, Marjan Gusev, Sasko Ristov, and Blagoj Atanasovski


HPC, CPU, Cache, Performance

Full Paper

The paper is published in the Proceedings of the 35th International Conference on Information Technology Interfaces (ITI 2013), pp. 71-76, IEEE, 2013. ISSN: 1330-1012, ISBN: 978-953-7138-30-1, IEEE Catalog Number: CFP13498-PRT.