LogCTBL: a hybrid deep learning model for log-based anomaly detection

Huang, Hong; Luo, Wengang; Wang, Yunfei; Zhou, Yinghang; Huang, Weitao

doi:10.1007/s11227-025-06926-3


Published: 30 January 2025

Volume 81, article number 448 (2025)

The Journal of Supercomputing

Hong Huang¹,², Wengang Luo¹, Yunfei Wang¹, Yinghang Zhou¹ & Weitao Huang¹


Abstract

System logs record a system's operational status and significant events, and anomaly detection on these logs allows system faults to be identified quickly and accurately. However, existing anomaly detection methods struggle with features that have complex relationships, which limits their detection accuracy. Furthermore, most methods rely on supervised learning, which makes it difficult to detect abnormal logs in large, unlabeled datasets. To address these limitations, this paper proposes a novel semi-supervised log anomaly detection model, termed LogCTBL (CNN-TCN-Bi-LSTM). First, the model parses raw logs using the Drain3 tool. Second, it applies BERT to produce semantic embeddings, addressing the discreteness of log statements. Third, it uses the HDBSCAN (hierarchical density-based spatial clustering of applications with noise) algorithm to estimate pseudo-labels for the unlabeled data in the training set, addressing the shortage of labeled data. Finally, the hybrid CNN-TCN-Bi-LSTM model performs the anomaly detection. The method is evaluated on the BGL and Thunderbird datasets, where it outperforms the compared approaches, attaining F1 scores of 99.87% and 99.78%, respectively.
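The abstract names a TCN stage but does not give its configuration. As a rough illustration of the operation temporal convolutional networks are generally built from, here is a minimal pure-Python sketch of a single-channel causal dilated convolution and a small stack of such layers with exponentially growing dilation. The kernel size, the dilations (1, 2, 4), the ReLU between layers, and the function names `causal_dilated_conv1d`, `tcn_stack` and `receptive_field` are illustrative assumptions, not LogCTBL's actual layers. Residual connections, multiple channels, and the CNN and Bi-LSTM stages are omitted.

```python
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], w: Sequence[float],
                          dilation: int = 1) -> List[float]:
    """Single-channel causal dilated 1-D convolution without bias.

    y[t] = sum_i w[i] * x[t - (k - 1 - i) * dilation], treating x[j] = 0 for j < 0.
    w[-1] weights the current step, so y[t] never depends on x[t + 1:].
    """
    k = len(w)
    y: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i in range(k):
            j = t - (k - 1 - i) * dilation  # input index read by tap i
            if j >= 0:                      # implicit zero left-padding
                acc += w[i] * x[j]
        y.append(acc)
    return y


def tcn_stack(x: Sequence[float], kernels: Sequence[Sequence[float]],
              dilations: Sequence[int]) -> List[float]:
    """Apply causal dilated convolutions layer by layer with a ReLU after each.

    Hypothetical simplification: residual connections and multiple channels,
    common in TCNs, are left out.
    """
    h = list(x)
    for w, d in zip(kernels, dilations):
        h = [max(0.0, v) for v in causal_dilated_conv1d(h, w, d)]
    return h


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of time steps (including the current one) a stacked output can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


impulse = [1.0] + [0.0] * 7
print(causal_dilated_conv1d(impulse, [1.0, 2.0, 3.0], dilation=2))
# [3.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(receptive_field(3, [1, 2, 4]))
# 15
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth: three kernel-size-3 layers already cover 15 consecutive time steps (for example, 15 embedded log events), while each output still depends only on past inputs.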



Funding

This research was funded by Key Laboratory Project of Enterprise Informatization and IoT Measurement and Control Technology for Universities in Sichuan Province (NO: 2022WYJ03), Central Guidance for Local Science and Technology Development Fund Project (NO: 2024ZYD0266), Tibet Science and Technology Program (NO: XZ202401YD0023).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sichuan University of Science & Engineering, Yibin, 644000, Sichuan, China
Hong Huang, Wengang Luo, Yunfei Wang, Yinghang Zhou & Weitao Huang
Key Laboratory of Enterprise Informatization and IoT Measurement and Control Technology for Universities in Sichuan Province, Yibin, 644000, Sichuan, China
Hong Huang


Contributions

The authors confirm their contributions to the paper as follows: Hong Huang supervised the project. Hong Huang and Wengang Luo conceived and designed the research. Hong Huang, Wengang Luo, and Yunfei Wang wrote the initial draft. Hong Huang, Wengang Luo, Yinghang Zhou, and Weitao Huang reviewed and edited the manuscript. Wengang Luo, Yinghang Zhou, and Weitao Huang collected and organized the experimental data. Yunfei Wang analyzed and interpreted the experimental results. Yinghang Zhou and Weitao Huang designed the graphs. All authors reviewed the results and approved the final version of the manuscript.

Corresponding author

Correspondence to Wengang Luo.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Huang, H., Luo, W., Wang, Y. et al. LogCTBL: a hybrid deep learning model for log-based anomaly detection. J Supercomput 81, 448 (2025). https://doi.org/10.1007/s11227-025-06926-3


Accepted: 07 January 2025
Published: 30 January 2025
Version of record: 30 January 2025
DOI: https://doi.org/10.1007/s11227-025-06926-3
