Laboratory for Internet and Innovative Technologies

Cache memory speeds up memory access for memory-demanding and cache-intensive algorithms. Introducing additional cache levels and larger cache sizes in modern multiprocessor architectures reduces cache misses. Both matrix-matrix and matrix-vector multiplication are computation-intensive, cache-intensive, and memory-demanding algorithms, and their execution time depends directly on CPU cache architecture and organization. This paper focuses on the performance of matrix-vector multiplication executed on a modern shared-memory multiprocessor with a shared last-level L3 cache and dedicated L1 and L2 caches for each CPU core. Our goal is to analyze the behavior of the dense matrix-vector algorithm and to test the hypothesis that today's virtualized servers, whether organized in private data centers or in cloud computing, usually perform worse than traditional servers running a host operating system without virtualization. Both sequential and parallel executions are implemented in the traditional and cloud environments, using the same platform and hardware infrastructure for each test case.
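
For orientation, the dense matrix-vector product y = A x in its sequential and parallel forms might be sketched as follows. This is a minimal illustration and not the implementation measured in the paper: the function names, the row-block partitioning, and the use of Python threads are all assumptions made for the example. CPython threads share one interpreter lock, so the sketch shows only how the work could be split across cores, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec_rows(A, x, start, stop):
    """Return y[start:stop] for y = A x, reading each row of A left to right."""
    return [sum(a * b for a, b in zip(A[i], x)) for i in range(start, stop)]


def matvec_sequential(A, x):
    """Sequential dense matrix-vector product over all rows of A."""
    return matvec_rows(A, x, 0, len(A))


def matvec_parallel(A, x, workers=4):
    """Split the rows of A into contiguous blocks, one block per worker.

    Hypothetical partitioning scheme: every worker reads the whole vector x
    (shared data) but only its own block of rows of A (private data).
    """
    n = len(A)
    if n == 0:
        return []
    workers = max(1, min(workers, n))
    chunk = (n + workers - 1) // workers  # ceiling division
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the blocks, so the parts
        # can be concatenated directly into the result vector.
        parts = pool.map(lambda b: matvec_rows(A, x, *b), bounds)
        return [value for part in parts for value in part]
```

In this sketch each element of A is read exactly once while x is reused for every row, which is why the vector is the natural candidate to stay resident in cache.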


Sasko Ristov, Marjan Gusev, and Goran Velkoski


Cloud Computing, HPC, Performance, Speedup

Full Paper

The technical report is published as a paper in IEEE ICIT 2013, Jordan, 8-10 May 2013.