Abstract
Filter2D, a fundamental operator in CNNs, is an important target for optimization and acceleration in computer vision (CV) applications, which is why it was chosen as the task of the CV track of the CCFSys-CCC2023 competition. Targeting the Versal ACAP architecture designated by the CCC2023 competition, we propose an AI Engine (AIE) kernel and AIE graph design and restructure the programmable logic (PL) and processing system (PS) accordingly. Results show that, compared with a PS-only implementation, our design achieves a speedup of about 104.51× to 139.41× on the specified Versal ACAP platform, outperforming all 50+ other teams and winning the CCC2023 championship.
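To make the operator itself concrete, here is a minimal scalar Filter2D sketch for illustration only. It assumes a single-channel integer image, an odd-sized square kernel applied as a correlation (no kernel flip), and zero padding outside the image borders; the function name `filter2d_ref` is hypothetical, and this is not the AIE kernel, AIE graph, or PS code described in the paper.

```python
from typing import List

Matrix = List[List[int]]


def filter2d_ref(img: Matrix, kernel: Matrix) -> Matrix:
    """Hypothetical scalar Filter2D reference.

    Correlates `img` with an odd-sized square `kernel` (the kernel is not
    flipped) and treats pixels outside the image as zero (assumed border rule).
    """
    rows, cols = len(img), len(img[0])
    k = len(kernel)
    half = k // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0
            for kr in range(k):
                for kc in range(k):
                    rr, cc = r + kr - half, c + kc - half
                    # Zero padding: skip taps that fall outside the image.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += img[rr][cc] * kernel[kr][kc]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    box = [[1] * 3 for _ in range(3)]  # 3x3 box kernel of ones
    for row in filter2d_ref(image, box):
        print(row)
```

Every output pixel needs k×k multiply-accumulates over a sliding window, and windows for neighboring pixels overlap heavily. That combination of regular arithmetic and data reuse is what makes the operator a good fit for vectorized hardware such as the AI Engine array.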
Supported by the AMD University Program.
Acknowledgement
We thank AMD/Xilinx for the board and software donations and for support from the AMD/Xilinx Heterogeneous Accelerated Compute Cluster at NUS. We thank all the reviewers and the CCFSys-CCC committee for their valuable feedback.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, W., Wang, T., Liu, Y., Li, Y., Bao, Z. (2024). New Filter2D Accelerator on the Versal Platform Powered by the AI Engine. In: Li, C., Li, Z., Shen, L., Wu, F., Gong, X. (eds) Advanced Parallel Processing Technologies. APPT 2023. Lecture Notes in Computer Science, vol 14103. Springer, Singapore. https://doi.org/10.1007/978-981-99-7872-4_24
DOI: https://doi.org/10.1007/978-981-99-7872-4_24
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7871-7
Online ISBN: 978-981-99-7872-4
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science

