Publication: Contribution to book/anthology/report/proceedings › Conference article in proceedings › Research › peer review
Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing. / Li, Nan; Iosifidis, Alexandros; Zhang, Qi.
ICC 2022 - IEEE International Conference on Communications. IEEE, 2022. pp. 3667-3672.
TY - GEN
T1 - Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing
AU - Li, Nan
AU - Iosifidis, Alexandros
AU - Zhang, Qi
PY - 2022/5
Y1 - 2022/5
N2 - This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing. To preserve inference accuracy when partitioning an inference task, we account for the receptive field when performing segment-based partitioning. To maximize the parallelization between the communication and computing processes, and thereby minimize the total inference time of an inference task, we design a novel task collaboration scheme, named HALP, in which the overlapping zones of the sub-tasks assigned to secondary edge servers (ESs) are executed on the host ES. We further extend HALP to the scenario of multiple tasks. Experimental results show that HALP can accelerate CNN inference of VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on a GTX 1080 Ti and a Jetson AGX Xavier, outperforming the state-of-the-art scheme MoDNN. Moreover, we evaluate service reliability under a time-variant channel, which shows that HALP is an effective solution for ensuring high service reliability under a strict service deadline.
AB - This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing. To preserve inference accuracy when partitioning an inference task, we account for the receptive field when performing segment-based partitioning. To maximize the parallelization between the communication and computing processes, and thereby minimize the total inference time of an inference task, we design a novel task collaboration scheme, named HALP, in which the overlapping zones of the sub-tasks assigned to secondary edge servers (ESs) are executed on the host ES. We further extend HALP to the scenario of multiple tasks. Experimental results show that HALP can accelerate CNN inference of VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on a GTX 1080 Ti and a Jetson AGX Xavier, outperforming the state-of-the-art scheme MoDNN. Moreover, we evaluate service reliability under a time-variant channel, which shows that HALP is an effective solution for ensuring high service reliability under a strict service deadline.
KW - Delay constraint
KW - Distributed CNNs
KW - Edge computing
KW - Inference acceleration
KW - Receptive field
KW - Service reliability
U2 - 10.1109/ICC45855.2022.9839083
DO - 10.1109/ICC45855.2022.9839083
M3 - Article in proceedings
SP - 3667
EP - 3672
BT - ICC 2022 - IEEE International Conference on Communications
PB - IEEE
Y2 - 16 May 2022 through 20 May 2022
ER -