MedSparse: A Medical Large Model for Efficient Inference and Chain-of-Thought Generation

Zhu, Yue; Deng, Dengke; Li, Ya; Li, Xiao’er; Li, Zhuo; Luo, Pengcheng

doi:10.1007/978-981-95-5631-1_6


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 16285)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)


Abstract

In the medical field, HuatuoGPT-o1, the first medical large language model (LLM) capable of complex reasoning, has demonstrated outstanding performance on multiple medical datasets that require reasoning. However, its chain-of-thought (CoT) process generates thousands of tokens, which demands substantial computational resources and inference time. Our analysis shows that these tokens contribute unequally to the final answer, which motivates the MedSparse method. MedSparse focuses on the key steps in the CoT process, helping HuatuoGPT-o1 better recognize how much each step matters to the reasoning. Whereas HuatuoGPT-o1 relies on the final answer as its supervisory signal, MedSparse uses the key CoT steps as the supervisory signal, so the model learns more effectively what role these steps play in reasoning. MedSparse compresses the CoT in a controlled manner along three aspects: case description, background information, and logical reasoning. Experimental results show that MedSparse substantially reduces token usage while maintaining strong reasoning performance. Specifically, when the token count is halved, reasoning speed increases 1.76-fold while performance remains at 93% of the original. Compared with various 7B-scale general-purpose and medical models, MedSparse consistently outperforms them on multiple medical datasets.
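The abstract does not say how MedSparse scores CoT steps or enforces a target length. As a rough illustration only, and not the authors' method, the sketch below shows one generic form of controlled, step-level CoT compression: greedily keep the most important steps until a token budget of `ratio` times the original length is filled, then restore the original order. `Step`, `compress_cot`, the per-step importance scores, and the whitespace token count are all hypothetical assumptions. TokenSkip (Xia et al., 2025) applies a related ratio-controlled idea at the token level.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One CoT step with a hypothetical importance score in [0, 1]."""
    text: str
    importance: float


def n_tokens(text: str) -> int:
    # Whitespace split stands in for a real tokenizer (an assumption).
    return len(text.split())


def compress_cot(steps: List[Step], ratio: float) -> List[Step]:
    """Greedily keep the most important steps whose total length fits in
    ratio * (original length), then return them in their original order."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    budget = ratio * sum(n_tokens(s.text) for s in steps)
    # Rank steps by importance (highest first), keeping original indices.
    ranked = sorted(enumerate(steps), key=lambda pair: pair[1].importance,
                    reverse=True)
    kept: List[int] = []
    used = 0
    for idx, step in ranked:
        cost = n_tokens(step.text)
        if used + cost <= budget:  # skip steps that would overflow the budget
            kept.append(idx)
            used += cost
    # Restore the original order so the compressed CoT still reads in sequence.
    return [steps[i] for i in sorted(kept)]


# Toy CoT (hypothetical): case description, generic background, key finding, answer.
steps = [
    Step("54-year-old man with chest pain", 0.90),
    Step("Chest pain has many possible causes and a careful history is "
         "always important in general practice for every adult patient", 0.20),
    Step("ST elevation in II III aVF suggests inferior MI", 0.95),
    Step("Answer: inferior myocardial infarction", 0.85),
]

compressed = compress_cot(steps, ratio=0.5)
for s in compressed:
    print(s.text)
```

With `ratio=0.5` the toy chain has 38 whitespace tokens, so the budget is 19; the sketch keeps the case description, the key finding, and the answer (18 tokens) and drops the low-importance background sentence. The greedy loop skips a step that would overflow the budget but keeps scanning, so a shorter, less important step can still use the remaining room.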

Y. Zhu and D. Deng—These authors contributed equally to this work.



Acknowledgement

This work was supported by the “Science and Technology Innovation Yongjiang 2035” Major Application Demonstration Plan Project in Ningbo (Grant No. 2024Z005).

Author information

Authors and Affiliations

Ningbo Artificial Intelligence Institute, Shanghai Jiao Tong University, Ningbo, China
Yue Zhu, Dengke Deng, Ya Li, Xiao’er Li, Zhuo Li & Pengcheng Luo
School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China
Yue Zhu, Dengke Deng & Pengcheng Luo


Corresponding author

Correspondence to Pengcheng Luo.

Editor information

Editors and Affiliations

University of Surrey, Guildford, UK
Josef Kittler
Shanghai Jiao Tong University, Shanghai, China
Hongkai Xiong
Nanjing University of Science and Technology, Nanjing, Jiangsu, China
Jian Yang
Chinese Academy of Sciences, Beijing, China
Xilin Chen
Tsinghua University, Beijing, China
Jiwen Lu
Shanghai Jiao Tong University, Shanghai, China
Weiyao Lin
ShanghaiTech University, Shanghai, China
Jingyi Yu
Sun Yat-sen University, Guangzhou, China
Weishi Zheng


About this paper

Cite this paper

Zhu, Y., Deng, D., Li, Y., Li, X., Li, Z., Luo, P. (2026). MedSparse: A Medical Large Model for Efficient Inference and Chain-of-Thought Generation. In: Kittler, J., et al. Pattern Recognition and Computer Vision. PRCV 2025. Lecture Notes in Computer Science, vol 16285. Springer, Singapore. https://doi.org/10.1007/978-981-95-5631-1_6


DOI: https://doi.org/10.1007/978-981-95-5631-1_6
Published: 28 January 2026
Publisher Name: Springer, Singapore
Print ISBN: 978-981-95-5630-4
Online ISBN: 978-981-95-5631-1
eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science
