
New Filter2D Accelerator on the Versal Platform Powered by the AI Engine

  • Conference paper
  • Advanced Parallel Processing Technologies (APPT 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14103)


Abstract

Filter2D, a fundamental operator in convolutional neural networks (CNNs), is an important target for optimization and acceleration in computer vision (CV) applications, which is why it was chosen as the CV track of the CCFSys-CCC2023 competition. Targeting the Versal ACAP architecture designated by CCC2023, we propose an AI Engine (AIE) kernel and AIE graph design and restructure the programmable logic (PL) and processing system (PS) accordingly. Results show that, compared with a PS-only implementation, our design achieves a speedup of about 104.51× to 139.41× on the specified Versal ACAP platform, outperforming the more than 50 other participating teams and winning the CCC2023 championship.
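To make the operator concrete, here is a minimal Python sketch of a scalar Filter2D reference (the kind of computation a PS-only baseline performs) together with a row-tiled variant that splits the image into strips carrying a halo of r extra rows above and below. Tiling with a halo is the general pattern used when a large image must be streamed through compute units with small local data memories, such as AIE tiles. Everything here is an assumption rather than the authors' design: the correlation form (kernel not flipped), an odd square kernel, replicate border handling, row-strip tiling, and all function names are hypothetical, and the sketch does not reproduce the paper's AIE kernel, AIE graph, or PL/PS structure.

```python
from typing import List

# Hypothetical image/kernel representation: row-major lists of integers.
Image = List[List[int]]


def pad_replicate(img: Image, r: int) -> Image:
    """Pad by r pixels on every side, replicating the nearest edge pixel (assumed border mode)."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
             for x in range(-r, w + r)]
            for y in range(-r, h + r)]


def filter2d_valid(padded: Image, kernel: Image) -> Image:
    """'Valid' 2D correlation: out[y][x] = sum_{i,j} kernel[i][j] * padded[y+i][x+j]."""
    k = len(kernel)
    out_h = len(padded) - k + 1
    out_w = len(padded[0]) - k + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0  # multiply-accumulate over the k x k window
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * padded[y + i][x + j]
            row.append(acc)
        out.append(row)
    return out


def filter2d_reference(img: Image, kernel: Image) -> Image:
    """Whole-image Filter2D: pad once, then filter; output has the input's shape."""
    r = len(kernel) // 2
    return filter2d_valid(pad_replicate(img, r), kernel)


def filter2d_tiled(img: Image, kernel: Image, tile_h: int) -> Image:
    """Same result computed strip by strip, each strip with an r-row halo on both sides."""
    r = len(kernel) // 2
    padded = pad_replicate(img, r)
    h = len(img)
    out: Image = []
    for y0 in range(0, h, tile_h):
        rows = min(tile_h, h - y0)  # last strip may be shorter
        # Padded rows y0 .. y0+rows+2r-1 cover `rows` output rows plus the halo.
        tile = padded[y0:y0 + rows + 2 * r]
        out.extend(filter2d_valid(tile, kernel))
    return out


if __name__ == "__main__":
    img = [[(x * 7 + y * 3) % 256 for x in range(16)] for y in range(12)]
    sharpen = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]  # illustrative kernel only
    same = filter2d_tiled(img, sharpen, tile_h=5) == filter2d_reference(img, sharpen)
    print("tiled matches reference:", same)
```

Because each strip carries its own halo rows, strips can be processed independently (for example, on different compute units) and simply concatenated; the overlap costs 2r redundant input rows per strip in exchange for removing any dependence between neighbouring strips.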

Supported by the AMD University Program.



Acknowledgement

We thank AMD/Xilinx for the board and software donations, and the AMD/Xilinx Heterogeneous Accelerated Compute Cluster at NUS for its support. We also thank all the reviewers and the CCFSys-CCC committee for their valuable feedback.

Author information


Corresponding author

Correspondence to Wenbo Zhang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhang, W., Wang, T., Liu, Y., Li, Y., Bao, Z. (2024). New Filter2D Accelerator on the Versal Platform Powered by the AI Engine. In: Li, C., Li, Z., Shen, L., Wu, F., Gong, X. (eds) Advanced Parallel Processing Technologies. APPT 2023. Lecture Notes in Computer Science, vol 14103. Springer, Singapore. https://doi.org/10.1007/978-981-99-7872-4_24

