Navigating the Landscape of Distributed Computing Frameworks for Machine and Deep Learning: Overcoming Challenges and Finding Solutions
Abstract
Distributed computing is crucial to machine learning and deep learning models for several reasons. First, it makes it possible to train large models that do not fit in a single machine's memory. Second, by distributing the workload over several machines, it speeds up the training process. Third, it enables the management of vast amounts of data that may be dispersed across multiple devices or stored remotely. Distributed computing also improves fault tolerance, because the system can continue processing data even if one machine fails. This chapter summarizes the major frameworks (TensorFlow, PyTorch, Apache Spark, Hadoop, and Horovod) that enable developers to design and implement distributed computing models using large datasets. It also covers the challenges these models face, namely communication overhead, fault tolerance, load balancing, scalability, and security, and proposes solutions to overcome them.