Qi Zhang

Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing

Publication: Contribution to book/anthology/report/proceedings › Article in proceedings › Research › peer review

Standard

Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing. / Li, Nan; Iosifidis, Alexandros; Zhang, Qi.

ICC 2022 - IEEE International Conference on Communications. IEEE, 2022. pp. 3667-3672.

Harvard

Li, N, Iosifidis, A & Zhang, Q 2022, Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing. in ICC 2022 - IEEE International Conference on Communications. IEEE, pp. 3667-3672, IEEE International Conference on Communications, Seoul, South Korea, 16/05/2022. https://doi.org/10.1109/ICC45855.2022.9839083

Vancouver

Li N, Iosifidis A, Zhang Q. Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing. In ICC 2022 - IEEE International Conference on Communications. IEEE. 2022. p. 3667-3672. doi: 10.1109/ICC45855.2022.9839083

Bibtex

@inproceedings{f98961b568904b65a4118c278700e3b4,
title = "Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing",
abstract = "This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing. To ensure inference accuracy in inference task partitioning, we consider the receptive-field when performing segment-based partitioning. To maximize the parallelization between the communication and computing processes, thereby minimizing the total inference time of an inference task, we design a novel task collaboration scheme in which the overlapping zone of the sub-tasks on secondary edge servers (ESs) is executed on the host ES, named as HALP. We further extend HALP to the scenario of multiple tasks. Experimental results show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on GTX 1080TI and JETSON AGX Xavier, which outperforms the state-of-the-art work MoDNN. Moreover, we evaluate the service reliability under time-variant channel, which shows that HALP is an effective solution to ensure high service reliability with strict service deadline.",
keywords = "Delay constraint, Distributed CNNs, Edge computing, Inference acceleration, Receptive-field, Service reliability",
author = "Nan Li and Alexandros Iosifidis and Qi Zhang",
year = "2022",
month = may,
doi = "10.1109/ICC45855.2022.9839083",
language = "English",
pages = "3667--3672",
booktitle = "ICC 2022 - IEEE International Conference on Communications",
publisher = "IEEE",
note = "null ; Conference date: 16-05-2022 Through 20-05-2022",
url = "https://icc2022.ieee-icc.org/",

}
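
The abstract describes two technical ingredients: receptive-field-aware, segment-based partitioning of the input, and executing the overlapping (halo) zones of the sub-tasks on the host ES. Below is a minimal sketch of the partitioning idea only, assuming a toy layer stack rather than the paper's VGG-16 pipeline; function names and layer specs are illustrative and not the authors' HALP implementation.

# Sketch only: receptive-field-aware segment partitioning.
# Assumed toy conv stack; not the authors' HALP code.

def receptive_field(layers):
    """Receptive field size of a stack of (kernel, stride) conv/pool layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field
        jump *= stride              # cumulative stride between input positions
    return rf

def partition_with_overlap(height, num_segments, rf):
    """Split input rows into segments, extending each by the halo
    (rf // 2 rows per side) needed for exact outputs at segment borders."""
    halo, base = rf // 2, height // num_segments
    segments = []
    for i in range(num_segments):
        start = i * base
        end = (i + 1) * base if i < num_segments - 1 else height
        segments.append((max(0, start - halo), min(height, end + halo)))
    return segments

# Example: two 3x3 convs (stride 1) and a 2x2 pool (stride 2) give rf = 6,
# so each of 3 segments of a 224-row input is padded by 3 boundary rows.
print(partition_with_overlap(224, 3, receptive_field([(3, 1), (3, 1), (2, 2)])))

Per the abstract, HALP then assigns these overlapping zones to the host ES rather than to the secondary ESs, which is what lets communication and computation proceed in parallel.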

RIS

TY - GEN

T1 - Distributed Deep Learning Inference Acceleration using Seamless Collaboration in Edge Computing

AU - Li, Nan

AU - Iosifidis, Alexandros

AU - Zhang, Qi

PY - 2022/5

Y1 - 2022/5

N2 - This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing. To ensure inference accuracy when partitioning an inference task, we consider the receptive field when performing segment-based partitioning. To maximize the parallelization between the communication and computing processes, and thereby minimize the total inference time of an inference task, we design a novel task collaboration scheme, named HALP, in which the overlapping zone of the sub-tasks on secondary edge servers (ESs) is executed on the host ES. We further extend HALP to the scenario of multiple tasks. Experimental results show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on GTX 1080TI and JETSON AGX Xavier, outperforming the state-of-the-art MoDNN. Moreover, we evaluate the service reliability under a time-variant channel, which shows that HALP is an effective solution for ensuring high service reliability under a strict service deadline.

AB - This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing. To ensure inference accuracy when partitioning an inference task, we consider the receptive field when performing segment-based partitioning. To maximize the parallelization between the communication and computing processes, and thereby minimize the total inference time of an inference task, we design a novel task collaboration scheme, named HALP, in which the overlapping zone of the sub-tasks on secondary edge servers (ESs) is executed on the host ES. We further extend HALP to the scenario of multiple tasks. Experimental results show that HALP can accelerate CNN inference in VGG-16 by 1.7-2.0x for a single task and 1.7-1.8x for 4 tasks per batch on GTX 1080TI and JETSON AGX Xavier, outperforming the state-of-the-art MoDNN. Moreover, we evaluate the service reliability under a time-variant channel, which shows that HALP is an effective solution for ensuring high service reliability under a strict service deadline.

KW - Delay constraint

KW - Distributed CNNs

KW - Edge computing

KW - Inference acceleration

KW - Receptive-field

KW - Service reliability

U2 - 10.1109/ICC45855.2022.9839083

DO - 10.1109/ICC45855.2022.9839083

M3 - Article in proceedings

SP - 3667

EP - 3672

BT - ICC 2022 - IEEE International Conference on Communications

PB - IEEE

Y2 - 16 May 2022 through 20 May 2022

ER -