Laboratory for Internet and Innovative Technologies

Introducing multilevel cache memory narrows the gap between the CPU and main memory and speeds up program execution. According to Gustafson's law, the speedup on modern multiprocessors can scale up to linear speedup. In multi-core processor architectures, each CPU core usually has private L1 and L2 caches and shares the L3 cache. Whether a cache is private or shared can also significantly affect an algorithm's performance in a parallel implementation. Private caches increase the total cache capacity used during execution, whereas a shared cache reduces cache misses when all CPU cores use the same data. In this paper we analyze the performance of sequential and parallel implementations of the matrix-vector multiplication algorithm on a multi-chip multi-core multiprocessor in order to determine the CPU affinity that provides the best performance. We also perform a theoretical analysis to determine the problem-size regions in which selecting an appropriate CPU affinity yields the best performance using the same resources.
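A minimal Python sketch of the ideas above, not the authors' implementation: `matvec` computes y = Ax with the rows of A split into contiguous blocks (one block per core in a parallel run), and `region` roughly estimates which cache level holds the working set for an N x N problem. The cache sizes (32 KiB L1 and 256 KiB L2 per core, 8 MiB L3 shared per chip) and all function names are hypothetical assumptions, not the paper's test platform. The estimate checks the per-core footprint against the private L1/L2 and the per-chip footprint against the shared L3, so spreading the same cores over more chips enlarges the total L3 in use.

```python
# Minimal sketch (hypothetical names and cache sizes, not the paper's code):
# row-block matrix-vector multiplication plus a rough estimate of which cache
# level holds the working set for a given problem size N.

DOUBLE_BYTES = 8                # one double-precision element
L1_BYTES = 32 * 1024            # assumed private L1 data cache per core
L2_BYTES = 256 * 1024           # assumed private L2 cache per core
L3_BYTES = 8 * 1024 * 1024      # assumed L3 cache shared by the cores of one chip


def matvec_rows(a, x, row_start, row_end):
    """Return y[i] = sum_j a[i][j] * x[j] for rows row_start .. row_end - 1."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(a[i], x))
            for i in range(row_start, row_end)]


def matvec(a, x, parts=1):
    """Compute y = A x by splitting the rows of A into `parts` contiguous blocks.

    In a parallel implementation each block would be computed by its own core;
    here the blocks are simply evaluated one after another.
    """
    n = len(a)
    bounds = [n * k // parts for k in range(parts + 1)]
    y = []
    for k in range(parts):
        y.extend(matvec_rows(a, x, bounds[k], bounds[k + 1]))
    return y


def working_set_bytes(n, blocks=1):
    """Footprint of one row block: its rows of A, the whole vector x, its slice of y."""
    rows = -(-n // blocks)      # ceiling of n / blocks
    return (rows * n + n + rows) * DOUBLE_BYTES


def region(n, cores=1, chips=1):
    """Estimate the smallest cache level that holds the working set of an N x N run.

    Assumes the cores are spread evenly over `chips` chips: each core's block is
    checked against its private L1/L2, and each chip's share of the rows
    (x is stored once per chip) against that chip's shared L3.
    """
    chips = min(chips, cores)   # a chip with no assigned core adds no L3
    per_core = working_set_bytes(n, cores)
    per_chip = working_set_bytes(n, chips)
    if per_core <= L1_BYTES:
        return "L1"
    if per_core <= L2_BYTES:
        return "L2"
    if per_chip <= L3_BYTES:
        return "L3"
    return "RAM"
```

With these assumed sizes, `region(1200, cores=4, chips=1)` returns `"RAM"`, while `region(1200, cores=4, chips=2)` returns `"L3"`: the same four cores spread over two chips keep the matrix in cache. Python itself is far too slow to measure such effects; on Linux, an actual measurement would pin each worker to a chosen core (for example with `os.sched_setaffinity`) and time a compiled kernel.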


Goran Velkoski, Sasko Ristov, and Marjan Gusev


CPU Cache, Matrix-Vector Multiplication, Performance, Speed

Full Paper

The paper was published in Proceedings of the ITI 2013 35th International Conference on Information Technology Interfaces (ITI), pp. 95-100, IEEE, 2013. ISSN: 1330-1012, ISBN: 978-953-7138-30-1, IEEE Catalog Number: CFP13498-PRT.