From October 17 to 21, 2022, the 28th ACM Annual International Conference on Mobile Computing and Networking (MobiCom 2022, a CCF Class A conference) was held in Sydney, Australia. The paper "Mandheling: Mixed-precision On-Device DNN Training with DSP Offloading", by the system software team of the School of Computer Science, on using heterogeneous computing resources for on-device in-situ training in ubiquitous computing environments, was presented online and received great attention and extensive discussion from the participants.
Illustration of the paper
In ubiquitous computing environments, intelligence has become one of the fundamental features of system software, and the main way to achieve it is to train high-quality machine learning models.
In recent years, training deep neural network (DNN) models locally on mobile devices, that is, on-device training on end or edge devices, has attracted attention from academia and industry as an alternative to large-scale model training in cloud data centers.
This in-situ training mode offers particular advantages for intelligent tasks in scenarios with data security and privacy requirements, unstable network connections, or harsh physical environments, and mainstream deep learning frameworks such as Google's TensorFlow and Alibaba's MNN already provide support for on-device DNN training.
One of the main challenges of on-device in-situ training is the limited resource capacity of end devices.
Unlike traditional approaches, which rely mainly on CPU and GPU computing resources, this paper proposes Mandheling, a software-defined offloading technique and system that targets the digital signal processor (DSP), an important but underused heterogeneous computing resource on mobile devices, to support mixed-precision training. On the one hand, traditional DNN training is performed mostly in the FP32 data format, but recent studies have found that if the weights and activations produced by certain mixed-precision algorithms are expressed in low-precision formats such as INT8 and INT16, training time and resource costs can be reduced substantially while convergence accuracy is preserved.
DSPs, on the other hand, are particularly well suited to integer operations such as INT8 matrix multiplication; the Hexagon 698 DSP, for example, can perform 128 INT8 operations in a single cycle.
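The article does not spell out Mandheling's quantization scheme, so the snippet below is only a minimal NumPy sketch of the general idea of expressing FP32 weights and activations in INT8 with a per-tensor scale and accumulating the integer matrix product in INT32; the function names quantize_int8 and int8_matmul are illustrative, not taken from the paper's code.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map FP32 values to INT8 plus a scale."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a_q, a_scale, b_q, b_scale):
    """Integer matrix multiply accumulated in INT32, then dequantized to FP32."""
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)
    return acc.astype(np.float32) * (a_scale * b_scale)

# Example: one layer's forward pass computed in INT8 instead of FP32.
weights = np.random.randn(64, 64).astype(np.float32)
activations = np.random.randn(32, 64).astype(np.float32)
w_q, w_s = quantize_int8(weights)
a_q, a_s = quantize_int8(activations)
approx = int8_matmul(a_q, a_s, w_q.T, w_s)   # low-precision result
exact = activations @ weights.T              # FP32 reference
print("max abs error:", np.abs(approx - exact).max())
```

The integer multiply-accumulate in the middle is exactly the kind of workload a vector DSP executes far more cheaply than FP32 arithmetic, which is what makes the offloading attractive.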
Based on the idea of "software definition", Mandheling proposes four techniques that exploit the DSP's strength in integer computation to support efficient in-situ training of mixed-precision models: (1) a CPU-DSP collaborative scheduling strategy that reduces the cost of operators that are not DSP-friendly; (2) an adaptive rescaling algorithm that reduces the overhead of dynamic rescaling during backpropagation; (3) a batch-splitting algorithm that improves DSP cache efficiency; and (4) a DSP compute-subgraph reuse mechanism that eliminates repeated preparation overhead on the DSP.
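As a rough illustration of technique (3) only, the sketch below splits a batch into micro-batches sized to a cache budget and accumulates gradients so that one update is still applied per original batch. The helper grad_fn, the byte accounting, and all parameter names are hypothetical; Mandheling's real policy is DSP-aware and considerably more involved.

```python
import numpy as np

def split_batch_for_cache(batch, bytes_per_sample, cache_budget_bytes):
    """Split a batch into micro-batches whose working set fits a cache budget.

    Illustrative only: the real scheduler would also account for DSP vector
    width and operator shapes, which are not modeled here.
    """
    per_micro = max(1, cache_budget_bytes // bytes_per_sample)
    return [batch[i:i + per_micro] for i in range(0, len(batch), per_micro)]

def train_step(batch, params, grad_fn, bytes_per_sample, cache_budget_bytes, lr=0.01):
    """Accumulate gradients over cache-sized micro-batches, then apply a single
    update, so the result matches one full-batch step (grad_fn is hypothetical)."""
    grads = np.zeros_like(params)
    for micro in split_batch_for_cache(batch, bytes_per_sample, cache_budget_bytes):
        grads += grad_fn(params, micro) * (len(micro) / len(batch))
    return params - lr * grads
```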
Experimental results show that, compared with the on-device DNN training engines TFLite and MNN, Mandheling reduces per-batch training time by 5.5 times and energy consumption by 8.9 times on average.
In end-to-end training tasks, compared with an FP32-precision baseline, Mandheling shortens convergence time by 10.7 times and reduces energy consumption by 13.1 times, while losing only 2% of model accuracy.
The first author of the paper is Xu Daliang, a doctoral student who enrolled in 2021, supervised by researcher Liu Weizhe.
It is worth mentioning that this research group is one of the earliest teams worldwide to study in-situ training intelligent systems for ubiquitous computing environments. Its results have been published continuously at top conferences including MobiCom 2018, WWW 2019 (the first WWW Best Paper Award won by Chinese scholars), MobiSys 2020, WWW 2021, and MobiSys 2022, and have produced positive application effects and impact in collaborations with State Grid, China Railways, and KikaTech.
ACM MobiCom is a top academic conference on mobile computing and networked systems and a Class A conference recommended by CCF.
MobiCom 2022 received 314 submissions and accepted 56 papers, an acceptance rate of about 18%.