TY - JOUR
T1 - A Parallel Digital VLSI Architecture for Integrated Support Vector Machine Training and Classification
AU - Wang, Qian
AU - Li, Peng
AU - Kim, Yongtae
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/8/1
Y1 - 2015/8/1
AB - This paper presents a parallel digital VLSI architecture for combined support vector machine (SVM) training and classification. For the first time, cascade SVM, a powerful training algorithm, is leveraged to significantly improve the scalability of hardware-based SVM training and to develop an efficient parallel VLSI architecture. The presented architecture achieves excellent scalability by spreading the training workload of a given data set over multiple SVM processing units with minimal communication overhead. A hardware-friendly implementation of the cascade algorithm keeps hardware overhead low and allows training over data sets of variable size. In the proposed parallel cascade architecture, a multilayer system bus and multiple distributed memories are used to fully exploit parallelism. In addition, the proposed architecture is flexible and can be tailored to combine hardware parallel processing with temporal reuse of processing resources, leading to good trade-offs among throughput, silicon overhead, and power dissipation. Several parallel cascade SVM processors have been designed in a commercial 90-nm CMOS technology, providing up to a 561× training-time speedup and an estimated 21,859× energy reduction compared with the software SVM algorithm running on a 45-nm commercial general-purpose CPU.
KW - Digital integrated circuits
KW - multicore processing
KW - parallel architectures
KW - support vector machines
KW - system buses
UR - http://www.scopus.com/inward/record.url?scp=85028228408&partnerID=8YFLogxK
DO - 10.1109/TVLSI.2014.2343231
M3 - Article
AN - SCOPUS:85028228408
SN - 1063-8210
VL - 23
SP - 1471
EP - 1484
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 8
M1 - 6876212
ER -
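For orientation, the cascade SVM scheme that the abstract parallelizes in silicon can be summarized in a few lines of software. The sketch below is a minimal single-pass software analogue, assuming scikit-learn's SVC; the function name, partition count, and parameters are illustrative choices of this note, not taken from the paper, whose hardware additionally exploits the multilayer bus and distributed memories described above.

import numpy as np
from sklearn.svm import SVC

def cascade_svm_train(X, y, num_units=4, kernel="rbf", C=1.0):
    """One feed-forward pass of cascade SVM training (illustrative sketch).

    Layer 1 trains `num_units` SVMs in parallel on disjoint partitions;
    each later layer merges support-vector sets pairwise and retrains,
    halving the unit count until a single SVM remains.
    """
    # Layer 1: independent training per partition. Assumes every
    # partition contains both classes; shuffle the data first if needed.
    parts = []
    for Xi, yi in zip(np.array_split(X, num_units),
                      np.array_split(y, num_units)):
        svm = SVC(kernel=kernel, C=C).fit(Xi, yi)
        # Forward only the support vectors to the next cascade layer.
        parts.append((Xi[svm.support_], yi[svm.support_]))

    # Later layers: pairwise merge of support-vector sets, then retrain.
    while len(parts) > 1:
        merged = []
        for i in range(0, len(parts), 2):
            Xi, yi = parts[i]
            if i + 1 < len(parts):
                Xi = np.vstack((Xi, parts[i + 1][0]))
                yi = np.concatenate((yi, parts[i + 1][1]))
            svm = SVC(kernel=kernel, C=C).fit(Xi, yi)
            merged.append((Xi[svm.support_], yi[svm.support_]))
        parts = merged

    return svm  # SVM from the final cascade layer

A full cascade implementation additionally feeds the final support vectors back into the first layer and iterates until the global optimality conditions are satisfied; the single pass above illustrates only the feed-forward step, which is what each cascade layer of processing units executes in parallel.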