Can a Student Large Language Model Perform as Well as Its Teacher?
Abstract
The burgeoning complexity of contemporary deep learning models, while achieving unparalleled accuracy, has introduced deployment challenges in resource-constrained environments. Knowledge distillation, in which a compact student model is trained to reproduce the behavior of a larger teacher model, offers a way to ease this tension. Through meticulous examination, the authors elucidate the critical determinants of successful distillation, including the architecture of the student model, the caliber of the teacher, and the delicate balance of hyperparameters. While acknowledging its profound advantages, they also delve into the complexities and challenges inherent in the process. The exploration underscores knowledge distillation's potential as a pivotal technique for optimizing the trade-off between model performance and deployment efficiency.
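To make the hyperparameter balance concrete, the sketch below shows a standard knowledge-distillation objective of the kind the abstract alludes to: a temperature-softened KL term that pushes the student toward the teacher's output distribution, mixed with ordinary cross-entropy on the ground-truth labels. This is a minimal illustration assuming a PyTorch setup; the function name and the temperature and alpha values are illustrative, not taken from the chapter.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened
    # teacher and student distributions, scaled by T^2 as in standard KD.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard-target term: ordinary cross-entropy against the true labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Alpha balances imitating the teacher against fitting the labels.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

Raising the temperature smooths the teacher's distribution so the student also learns from the relative probabilities of incorrect classes, while alpha controls how much the student relies on the teacher versus the labeled data.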