New Filter2D Accelerator on the Versal Platform Powered by the AI Engine

Zhang, Wenbo; Wang, Tianshuo; Liu, Yiqi; Li, Yiming; Bao, Zhenshan

doi:10.1007/978-981-99-7872-4_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14103)

Included in the following conference series:

International Symposium on Advanced Parallel Processing Technologies

Abstract

Filter2D is a fundamental operator of convolutional neural networks (CNNs), and optimizing and accelerating it matters greatly for computer vision (CV) applications; for this reason it was chosen as the task for the CV track of the CCFSys-CCC2023 competition. Targeting the Versal ACAP architecture designated by the competition, we propose an AI Engine (AIE) kernel and AIE graph design scheme and restructure the programmable logic (PL) and processing system (PS) accordingly. Results show that, compared with the PS-only scheme, our design achieves a speedup of about 104.51× to 139.41× on the specified Versal ACAP platform. It outperformed the entries of more than 50 other groups and won the CCC2023 championship.
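For readers unfamiliar with the operator, the following is a minimal scalar sketch of what a Filter2D computes: each output pixel is the weighted sum of the input pixels under a small kernel window. It is a plain reference illustration only, not the AIE kernel or graph design described in the abstract. The function name `filter2d`, the use of correlation (no kernel flip), and the replicate-border handling are assumptions made for this example; the page does not state which border mode the competition task uses.

```python
def filter2d(image, kernel):
    """Correlate a 2D image with a small kernel.

    Border pixels are replicated (an assumption for this sketch).
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2  # kernel anchor at its center
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    # Clamp coordinates so out-of-range reads reuse the edge pixel.
                    rr = min(max(r + i - r_off, 0), rows - 1)
                    cc = min(max(c + j - c_off, 0), cols - 1)
                    acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
identity = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
box = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]

same = filter2d(image, identity)  # identity kernel leaves the image unchanged
summed = filter2d(image, box)     # box kernel sums each 3x3 neighborhood
```

The four nested loops make the cost per output pixel proportional to the kernel area. The work for each output pixel is independent of the others, which is what makes this operator a natural candidate for vector and multi-core acceleration.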

Supported by AMD University Program.

Acknowledgement

We thank AMD/Xilinx for the board and software donation and for support from the AMD/Xilinx Heterogeneous Accelerated Compute Cluster at NUS. We thank all the reviewers and the CCFSys-CCC committee for their valuable feedback.

Author information

Authors and Affiliations

Faculty of Information Technology, Beijing University of Technology, Beijing, China
Wenbo Zhang, Tianshuo Wang, Yiqi Liu, Yiming Li & Zhenshan Bao

Corresponding author

Correspondence to Wenbo Zhang.

Editor information

Editors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Chao Li
Tsinghua University, Beijing, Beijing, China
Zhenhua Li
National University of Defense Technology, Nanjing, China
Li Shen
Shanghai Jiao Tong University, Shanghai, China
Fan Wu
Nankai University, Tianjin, China
Xiaoli Gong

Cite this paper

Zhang, W., Wang, T., Liu, Y., Li, Y., Bao, Z. (2024). New Filter2D Accelerator on the Versal Platform Powered by the AI Engine. In: Li, C., Li, Z., Shen, L., Wu, F., Gong, X. (eds) Advanced Parallel Processing Technologies. APPT 2023. Lecture Notes in Computer Science, vol 14103. Springer, Singapore. https://doi.org/10.1007/978-981-99-7872-4_24

DOI: https://doi.org/10.1007/978-981-99-7872-4_24
Published: 08 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7871-7
Online ISBN: 978-981-99-7872-4
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science
