LogCTBL: a hybrid deep learning model for log-based anomaly detection

The Journal of Supercomputing

Abstract

System logs record a system's operational status and significant events, and anomaly detection on these logs allows system faults to be identified rapidly and accurately. However, existing anomaly detection methods struggle with features that exhibit complex relationships, which limits detection accuracy. Furthermore, most methods depend on supervised learning, which hinders the detection of abnormal logs in large, unlabeled datasets. To address these limitations, this paper proposes a novel semi-supervised log anomaly detection model, termed LogCTBL (CNN-TCN-Bi-LSTM). First, the model parses raw logs using the Drain3 tool. Second, it applies BERT for semantic embedding, which addresses the discreteness of log statements. Third, it employs the HDBSCAN (hierarchical density-based spatial clustering of applications with noise) algorithm to estimate pseudo-labels for unlabeled data in the training set, which addresses the shortage of labeled data. Finally, the hybrid model is applied to anomaly detection, and the proposed method is evaluated on the BGL and Thunderbird datasets. The results show that the proposed method outperforms alternative approaches, attaining F1 scores of 99.87% and 99.78% on the two datasets, respectively.
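Two steps of the pipeline described in the abstract can be illustrated with a minimal sketch: estimating labels (the "dummy tags", i.e. pseudo-labels) for unlabeled training samples from cluster assignments, and computing the F1 score used for evaluation. The abstract does not say how cluster membership is turned into a label, so the majority vote below is an assumption, not the authors' method. The function names `pseudo_label` and `f1_score` and the toy inputs are hypothetical, and the cluster ids are assumed to come from an HDBSCAN-style clustering in which -1 marks noise.

```python
from collections import Counter, defaultdict

NOISE = -1  # HDBSCAN-style id for points that belong to no cluster


def pseudo_label(cluster_ids, known_labels):
    """Estimate a label for each sample from the labeled members of its cluster.

    cluster_ids:  one cluster id per sample (NOISE for outliers).
    known_labels: one entry per sample: 0 (normal), 1 (anomalous), None (unlabeled).
    Returns a list of 0, 1, or None (None when no label can be inferred).
    Assumption: unlabeled samples take the majority label of their cluster.
    """
    votes = defaultdict(Counter)
    for cid, label in zip(cluster_ids, known_labels):
        if cid != NOISE and label is not None:
            votes[cid][label] += 1

    result = []
    for cid, label in zip(cluster_ids, known_labels):
        if label is not None:
            result.append(label)  # keep the ground-truth label
        elif cid in votes:
            result.append(votes[cid].most_common(1)[0][0])  # majority vote
        else:
            result.append(None)  # noise, or a cluster with no labeled member
    return result


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the anomalous class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up inputs: six samples, two clusters, one noise point.
    clusters = [0, 0, 0, 1, 1, NOISE]
    labels = [0, None, None, 1, None, None]
    print(pseudo_label(clusters, labels))  # [0, 0, 0, 1, 1, None]
    print(f1_score([0, 0, 1, 1, 1], [0, 1, 1, 1, 0]))
```

In this sketch, samples that end up as noise or in a cluster without any labeled member stay unlabeled. A real pipeline would have to decide whether to discard them or treat them as anomalous.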

Funding

This research was funded by the Key Laboratory Project of Enterprise Informatization and IoT Measurement and Control Technology for Universities in Sichuan Province (No. 2022WYJ03), the Central Guidance for Local Science and Technology Development Fund Project (No. 2024ZYD0266), and the Tibet Science and Technology Program (No. XZ202401YD0023).

Author information

Contributions

The authors confirm their contributions to the paper as follows: Hong Huang supervised the project. Hong Huang and Wengang Luo conceived and designed the research. Hong Huang, Wengang Luo, and Yunfei Wang wrote the initial draft. Hong Huang, Wengang Luo, Yinghang Zhou, and Weitao Huang reviewed and edited the manuscript. Wengang Luo, Yinghang Zhou, and Weitao Huang collected and organized the experimental data. Yunfei Wang analyzed and interpreted the experimental results. Yinghang Zhou and Weitao Huang designed the figures. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Wengang Luo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, H., Luo, W., Wang, Y. et al. LogCTBL: a hybrid deep learning model for log-based anomaly detection. J Supercomput 81, 448 (2025). https://doi.org/10.1007/s11227-025-06926-3

  • DOI: https://doi.org/10.1007/s11227-025-06926-3
