Adjusting Thread Parallelism Dynamically to Accelerate Dynamic Programming with Irregular Workload Distribution on GPGPUs
Author(s): Chao-Chin Wu (National Changhua University of Education, Changhua, Taiwan), Jenn-Yang Ke (Tatung University, Taipei, Taiwan), Heshan Lin (Virginia Tech, Blacksburg, VA, USA), and Syun-Sheng Jhan (Ling Tung University, Taichung, Taiwan)
Copyright: 2014
Volume: 6
Issue: 1
Pages: 20
Source title:
International Journal of Grid and High Performance Computing (IJGHPC)
Editor(s)-in-Chief: Emmanuel Udoh (Sullivan University, USA) and Ching-Hsien Hsu (Asia University, Taiwan)
DOI: 10.4018/ijghpc.2014010101
Abstract
Dynamic Programming (DP) is an important and popular method for solving a wide variety of discrete optimization problems, such as scheduling, string editing, packaging, and inventory management. DP breaks a problem into simpler subproblems and combines their solutions into a solution to the original problem. This paper focuses on one type of dynamic programming called Nonserial Polyadic Dynamic Programming (NPDP). To run NPDP applications efficiently on an emerging General-Purpose Graphics Processing Unit (GPGPU), the authors have to exploit more parallelism to fully utilize the computing power of the hundreds of processing units in it. However, the degree of parallelism varies significantly across the different phases of an NPDP application. To address this problem, the authors propose a method that adjusts thread-level parallelism to provide a sufficient and steadier degree of parallelism across phases: if a phase has insufficient parallelism, threads are split into subthreads; conversely, the total number of threads in a phase can be limited by merging threads. The authors also examine the difference between the conventional problem of finding a minimum on a GPU and the NPDP-specific problem of finding the minimums of many independent sets on a GPU. Finally, the authors examine how to design a data structure that allows the memory-coalescing optimization technique to be applied. The experimental results demonstrate that the method achieves a speedup of up to 13.40 over a previously published algorithm.
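The irregular workload the abstract describes can be seen in any NPDP recurrence evaluated phase by phase. The sketch below (not the paper's implementation) uses matrix-chain multiplication, a classic NPDP example: all cells on one diagonal are mutually independent and could each be assigned to a GPU thread, but the number of such cells shrinks from n − 1 down to 1 as the phases advance, which is exactly the shrinking parallelism that motivates splitting and merging threads.

```python
# Hedged sketch: matrix-chain multiplication as a representative NPDP
# recurrence, evaluated diagonal by diagonal ("phase by phase").
# p is the dimension list: matrix A_i has dimensions p[i] x p[i+1].
def matrix_chain_phases(p):
    n = len(p) - 1                       # number of matrices
    m = [[0] * n for _ in range(n)]      # m[i][j]: min scalar multiplications
    phase_widths = []                    # independent cells per phase
    for d in range(1, n):                # phase d = diagonal offset
        phase_widths.append(n - d)       # cells on this diagonal are independent
        for i in range(n - d):           # each cell could be one GPU thread
            j = i + d
            m[i][j] = min(
                m[i][k] + m[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                for k in range(i, j)
            )
    return m[0][n - 1], phase_widths
```

For four matrices, for instance, the phases offer 3, 2, and then only 1 independent cell, so a fixed thread count either starves early phases or idles in late ones.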