The Internet of Things (IoT) is increasingly used in a wide range of fields, including smart cities and autonomous driving. Despite its numerous benefits, IoT still faces significant challenges in terms of communication, storage, and computational resources. These challenges become even more critical for time-critical artificial intelligence (AI)-based IoT applications that require high service reliability and accuracy.
The aim of this PhD dissertation is to design novel parallelization strategies and offloading schemes that accelerate deep neural network (DNN) inference in mobile edge computing (MEC) networks, thereby meeting the deadlines and ensuring the inference accuracy of time-critical IoT applications.
The first part of the dissertation focuses on horizontal collaboration of DNN inference tasks by designing novel parallelization strategies to accelerate distributed inference in MEC networks. First, a seamless layer-wise parallelization scheme, HALP, is proposed to maximize the overlap between communication and computation, reducing the overall completion time and achieving around 2x inference acceleration for VGG-16. Subsequently, a dynamic programming-based fused-layer parallelization scheme, DPFP, is proposed to select the optimal subset of collaborative edge servers (ESs) for distributed CNN inference and to optimally partition a CNN model into multiple fused blocks, effectively reducing computation time and communication overhead and accelerating inference by 71% and 73% for pre-trained ResNet-50 and VGG-16 models, respectively. Additionally, both schemes are shown to perform inference faster than state-of-the-art works and to achieve high service reliability under stochastic wireless channels and time-varying image sizes, highlighting their effectiveness in meeting strict service deadlines.
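To give a rough flavor of the fused-block partitioning idea behind DPFP, the following minimal sketch uses dynamic programming to split a linear chain of CNN layers into contiguous fused blocks so that the sum of per-block computation cost and inter-block communication cost is minimized. The cost model and the names `comp` and `comm` are illustrative assumptions, not the dissertation's actual formulation, which additionally selects the collaborating ESs.

```python
# A minimal dynamic-programming sketch of fused-layer partitioning.
# Assumptions (not from the dissertation): a linear chain of layers,
# a cost comp(i, j) for computing fused layers i..j as one block, and
# a cost comm(j) for transmitting layer j's output to the next block.

from functools import lru_cache

def partition(num_layers, comp, comm):
    """Split layers 0..num_layers-1 into contiguous fused blocks,
    minimizing total computation + inter-block communication cost.
    Returns (min_cost, list of (start, end) blocks)."""

    @lru_cache(maxsize=None)
    def best(i):
        # Optimal cost of processing layers i..num_layers-1.
        if i == num_layers:
            return 0.0, ()
        best_cost, best_blocks = float("inf"), ()
        for j in range(i, num_layers):
            tail_cost, tail_blocks = best(j + 1)
            # Communication is paid only between blocks, not after the last.
            link = comm(j) if j + 1 < num_layers else 0.0
            cost = comp(i, j) + link + tail_cost
            if cost < best_cost:
                best_cost, best_blocks = cost, ((i, j),) + tail_blocks
        return best_cost, best_blocks

    cost, blocks = best(0)
    return cost, list(blocks)

# Toy usage: 5 layers with unit compute plus a fixed fusion overhead,
# and communication cost proportional to a made-up feature-map size.
sizes = [8.0, 6.0, 4.0, 4.0, 2.0]
cost, blocks = partition(
    5,
    comp=lambda i, j: (j - i + 1) + 0.5,  # fused-block compute cost
    comm=lambda j: 0.3 * sizes[j],        # output transfer cost
)
print(cost, blocks)
```

Fusing layers into blocks trades redundant boundary computation for fewer and smaller intermediate transmissions, which is why the optimal split depends on both the network bandwidth and the per-layer output sizes.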
The second part of the dissertation focuses on vertical collaboration of DNN inference tasks by fully offloading the computational task in dynamic multi-access MEC networks with uncertain communication times and time-varying available computational capacities of ESs. A dynamic inference method, the early-exit mechanism, is employed to terminate inference early so as to meet the deadline requirements of time-critical IoT applications. Then, a graph reinforcement learning-based early-exit scheme, GRLE, is proposed to effectively learn a graph-like representation of the MEC information and make offloading decisions. GRLE is shown to achieve higher inference accuracy than the state-of-the-art work under different dynamic scenarios, demonstrating its effectiveness for offloading decision-making in dynamic MEC networks.
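To make the early-exit mechanism concrete, here is a minimal PyTorch-style sketch in the spirit of branchy architectures: a backbone is split into stages, each followed by a lightweight classifier head, and inference stops at the first exit whose softmax confidence clears a threshold. The stage design, head design, and the 0.9 threshold are illustrative assumptions, not GRLE's actual architecture.

```python
# A minimal early-exit sketch: inference terminates at the first
# sufficiently confident intermediate classifier, trading a little
# accuracy for lower latency. All sizes/thresholds are illustrative.

import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()),
        ])
        # One classifier head ("exit") per stage.
        self.exits = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(c, num_classes))
            for c in (16, 32, 64)
        ])

    @torch.no_grad()
    def forward(self, x):
        # Assumes batch size 1, as in single-request edge inference.
        for k, (stage, head) in enumerate(zip(self.stages, self.exits)):
            x = stage(x)
            logits = head(x)
            conf = logits.softmax(dim=1).max(dim=1).values
            # Exit early once confident enough; the last exit is forced.
            if conf.item() >= self.threshold or k == len(self.stages) - 1:
                return logits, k  # prediction and the exit taken

model = EarlyExitNet().eval()
logits, exit_idx = model(torch.randn(1, 3, 32, 32))
print(exit_idx, logits.argmax(dim=1).item())
```

Under a deadline, the exit point can also be chosen from the remaining time budget rather than from confidence alone, which is the kind of decision GRLE learns from the MEC network state.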
The third part of the dissertation focuses on vertical collaboration of DNN inference tasks in a device-edge co-inference system by designing novel semantic compression schemes for visual data to accelerate inference. First, a novel autoencoder-based CNN architecture (AECNN) is proposed to effectively extract the meaningful information and thereby compress the intermediate data in a device-edge co-inference system. AECNN is shown to compress the intermediate data by more than 256x with only about 4% accuracy loss, outperforming the state-of-the-art work. Additionally, AECNN performs inference faster than full offloading, especially under poor wireless channel conditions, showing its effectiveness in ensuring higher accuracy within time constraints. Subsequently, a spatiotemporal attention-based autoencoder architecture (STAE) is proposed to compress raw video data by extracting the informative frames and the meaningful pixels in each frame for the video action recognition task. STAE is shown to compress video data by 104x with only 5% accuracy loss on a vision transformer (ViT)-Base model, and to achieve faster inference and higher inference accuracy than the state-of-the-art work under time-varying wireless channel conditions, highlighting its effectiveness in guaranteeing higher accuracy within time constraints.
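The sketch below illustrates the general device-edge co-inference pattern behind AECNN: the device runs the head of a CNN plus a small encoder that shrinks the intermediate feature map before transmission, and the edge server decodes and finishes inference. The split point, channel counts, and bottleneck width are hypothetical, and this simplification omits AECNN's attention-based channel selection and STAE's frame/pixel selection entirely.

```python
# A minimal sketch of autoencoder-based intermediate-feature compression
# for device-edge co-inference. All layer sizes are illustrative
# assumptions, not the dissertation's AECNN architecture.

import torch
import torch.nn as nn

head = nn.Sequential(                        # runs on the device
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)
encoder = nn.Conv2d(64, 4, 1)                # device: 64 -> 4 channels
decoder = nn.Conv2d(4, 64, 1)                # edge server: 4 -> 64 channels
tail = nn.Sequential(                        # runs on the edge server
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 10),
)

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    features = head(x)              # intermediate feature map on device
    compressed = encoder(features)  # what is actually transmitted
    # 64 -> 4 channels cuts the payload by 16x before any quantization;
    # quantizing float32 activations to int8 would add another 4x.
    restored = decoder(compressed)  # reconstructed on the edge server
    logits = tail(restored)

ratio = features.numel() / compressed.numel()
print(f"compression ratio: {ratio:.0f}x, logits: {tuple(logits.shape)}")
```

The design choice is to spend a small amount of device computation (the encoder) to save transmission time, which is exactly the regime where poor wireless channels make co-inference beat full offloading.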