WAYS TO ACHIEVE INTERNAL PARALLELISM OF TASKS IN MULTITHREADED COMPUTING
DOI: https://doi.org/10.32782/IT/2022-3-4
Keywords: computing performance; parallelizing compiler; internal parallelism; loop optimization
Abstract
The article analyzes the evolution of computer system architecture and identifies a way to improve the performance of computations: the use of multithreaded technologies as a combination of hardware, software and methodological solutions. Multi-core processors are considered as the object of research. Each core has its own cache memory, and at this level the cores can be regarded as computing devices with distributed memory. At the same time, for large volumes of computation this memory is insufficient, so shared memory is used and the cores effectively become devices with common memory. In terms of memory access, multi-core processors can therefore be considered hybrid computing devices. The goal of the work is to analyze approaches to optimizing transformations of computations expressed as iterative loops and then to verify their effectiveness experimentally on computers with multi-core processors. The novelty of the approach is that optimization problems have traditionally addressed arithmetic loops, which is a narrower problem. The role of the compiler in code optimization through the search for internal parallelism is highlighted. Particular attention is paid to loops as the most promising object for parallelization. The indicator «Depth of Inter-Step Communication in the Loop» (DISCL) is proposed, and conditions for loop parallelization are formulated. The problem of load balancing during parallelization is considered. The results of experiments on the use of MPI technology for parallelizing several computational tasks built on iterative loops (sequences and numerical integration) are presented. As a criterion of computing performance that is invariant to the characteristics of different computer systems, the indicator λ is proposed; it denotes the relative share of parallel computation in the total volume of computation.
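To make the experimental setting concrete: the tasks mentioned in the abstract (sequences and numerical integration) are loops without inter-step dependencies, so each MPI process can handle a disjoint subset of iterations and the partial results can be combined with a single reduction. The sketch below is only an illustration of that pattern under assumed data (integrand sin x on [0, π], one million steps), not the authors' experimental program. If λ is read as the parallel fraction in Amdahl's sense, the attainable speedup on p cores is bounded by 1/((1 − λ) + λ/p).

    /* Minimal sketch of MPI-based numerical integration (midpoint rule).
     * Illustrative only: the integrand f(), the interval [a, b] and the
     * step count n are arbitrary assumptions, not values from the article. */
    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    static double f(double x) { return sin(x); }   /* sample integrand */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const double a = 0.0, b = 3.141592653589793; /* integration interval */
        const long   n = 1000000;                    /* total number of steps */
        const double h = (b - a) / n;

        /* Each process sums its own subset of iterations: the loop body has
         * no inter-step dependencies, so no communication is needed inside it. */
        double local = 0.0;
        for (long i = rank; i < n; i += size)
            local += f(a + (i + 0.5) * h) * h;       /* midpoint of step i */

        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("integral ~ %.6f\n", total);

        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and launched with mpirun -np p, the cyclic distribution of iterations (i = rank, rank + size, ...) also gives an even load across processes without an explicit balancing step.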