Dynamic Deep Learning Compression and Configurable Efficiency
Author(s)
Date Issued
2025
Date Available
2025-10-29T10:25:09Z
Abstract
With the rapid advancement of AI technology, ever-growing model sizes have made efficiency a critical consideration in system design. Traditional approaches to efficiency optimization typically remove non-essential parts of a model, but doing so inevitably reduces its effective capacity and constrains its performance. Moreover, these techniques often produce a model with fixed efficiency, leaving it unable to adapt to the varying needs of different real-world environments. Although some methods introduce dynamic model architectures to address this issue, they usually require specialized network designs and dedicated training methods that complicate development and struggle to match the performance of regular models, which limits their practical utility. To overcome these challenges, this thesis proposes a novel approach to deep learning efficiency that combines the ideas of Early Exit and Mixture-of-Experts. Specifically, we organize multiple independent models of varying scales under the same task and route each input sample to an appropriately sized model based on its difficulty, enabling a dynamic reduction in computational cost. The proposed approach also allows the system's overall efficiency preference to be adjusted after training, or even at runtime, with a simple configuration change, so the system can adapt to changing requirements in real time. Furthermore, each model within the system can be developed, replaced, or added independently, enabling the integration of multiple advanced models of different sizes. This flexibility supports a broad range of efficiency adjustments while significantly reducing system construction and upgrade costs. Experimental results show that our framework matches or even surpasses the efficiency of highly optimized state-of-the-art models.
Building upon a conventional Early Exit system, this work gradually breaks down the key design considerations and reorganizes them in a modular manner to create a flexible and scalable framework. This framework simultaneously supports Dynamic Compression and Configurable Efficiency, offering a practical reference for implementing these features in real-world AI systems.
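The routing idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not say how input difficulty is measured, so the sketch assumes the common Early Exit heuristic of comparing a model's top class probability to a threshold. The names `cascade_predict`, `small_model`, `large_model`, and `threshold` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a model maps an input to a sequence of class probabilities.
Model = Callable[[object], Sequence[float]]


def cascade_predict(x: object, models: List[Model], threshold: float) -> Tuple[int, int]:
    """Try models from smallest to largest and stop at the first confident one.

    Assumption: "difficulty" is approximated by the top class probability.
    Returns (predicted_class, index_of_model_used).
    """
    if not models:
        raise ValueError("at least one model is required")
    last = len(models) - 1
    for i, model in enumerate(models):
        probs = list(model(x))
        confidence = max(probs)
        # The largest model always answers, whatever its confidence.
        if confidence >= threshold or i == last:
            return probs.index(confidence), i
    raise AssertionError("unreachable: the last model always returns")


# Toy stand-ins for independently trained models of different sizes.
def small_model(x: object) -> Sequence[float]:
    return [0.9, 0.1] if x == "easy" else [0.55, 0.45]


def large_model(x: object) -> Sequence[float]:
    return [0.2, 0.8]


models = [small_model, large_model]
easy_result = cascade_predict("easy", models, threshold=0.8)
hard_result = cascade_predict("hard", models, threshold=0.8)
# Lowering the threshold at runtime shifts more inputs to the small model.
cheap_result = cascade_predict("hard", models, threshold=0.5)
```

In this sketch the threshold is the single configuration value: raising it sends more inputs to the larger model, lowering it favors the smaller one, and neither change requires retraining. Because the models are only called through a common interface, any entry in the list can be swapped for another model independently.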
Type of Material
Doctoral Thesis
Qualification Name
Doctor of Philosophy (Ph.D.)
Publisher
University College Dublin. School of Electrical and Electronic Engineering
Copyright (Published Version)
2025 the Author
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License
File(s)
Name
PhD_Thesis_Final.pdf
Size
7.6 MB
Format
Adobe PDF
Checksum (MD5)
3fa52cee6aa8ab2e7573207d1d08626b
Owning collection