Abstract
In the medical field, HuatuoGPT-o1, the first medical large language model (LLM) capable of complex reasoning, has demonstrated outstanding performance on multiple medical datasets that require reasoning. However, its chain-of-thought (CoT) process generates thousands of tokens, which demands substantial computational resources and time. Our analysis shows that these tokens contribute unequally to the final answer, which motivates the proposed MedSparse method. MedSparse focuses on the key steps in the CoT process, enabling HuatuoGPT-o1 to better recognize the importance of these steps in reasoning. Whereas HuatuoGPT-o1 relies on the final answer as its supervisory signal, MedSparse uses the key steps in the CoT as the supervisory signal, allowing the model to learn the role of these steps in reasoning more effectively. MedSparse compresses the CoT in a controlled manner, optimizing it in three aspects: case description, background information, and logical reasoning. Experimental results show that MedSparse significantly reduces token usage while maintaining strong reasoning performance. Specifically, when the token count is halved, reasoning speed increases by a factor of 1.76 while performance remains at 93% of the original. Compared with various 7B-scale general and medical models, MedSparse consistently outperforms them on multiple medical datasets.
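The idea of compressing a CoT "in a controlled manner" can be illustrated with a small token-level sketch. This is not MedSparse's actual procedure: it follows the general idea behind TokenSkip (Xia et al., 2025), keeping only the most important CoT tokens up to a target ratio. The function name `compress_cot`, the per-token `scores`, and the example values are hypothetical; in practice the scores would come from an external importance estimator, which is assumed here rather than implemented.

```python
import math
from typing import List, Sequence


def compress_cot(tokens: Sequence[str], scores: Sequence[float], ratio: float) -> List[str]:
    """Keep the highest-scoring fraction `ratio` of CoT tokens, in original order.

    Hypothetical helper: `scores` stand in for per-token importance estimates
    produced by some external scorer (assumed, not implemented here).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not tokens:
        return []
    # Number of tokens to retain; always keep at least one.
    k = max(1, math.ceil(ratio * len(tokens)))
    # Rank positions by importance (highest first); ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    # Restore the original order of the retained positions.
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


# Illustrative (made-up) reasoning fragment with made-up importance scores.
example_tokens = ["Patient", "has", "fever", "and", "a", "rash", ";", "so", "suspect", "measles"]
example_scores = [0.9, 0.1, 0.8, 0.05, 0.02, 0.7, 0.03, 0.2, 0.6, 0.95]

if __name__ == "__main__":
    print(compress_cot(example_tokens, example_scores, 0.5))
```

With `ratio=0.5` the example keeps 5 of its 10 tokens (`Patient fever rash suspect measles`). The single `ratio` argument is what makes the compression controllable, and restoring the original token order after ranking keeps the compressed chain readable.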
Y. Zhu and D. Deng contributed equally to this work.
Acknowledgement
This work was supported by the “Science and Technology Innovation Yongjiang 2035” Major Application Demonstration Plan Project in Ningbo (Grant No. 2024Z005).
Copyright information
© 2026 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhu, Y., Deng, D., Li, Y., Li, X., Li, Z., Luo, P. (2026). MedSparse: A Medical Large Model for Efficient Inference and Chain-of-Thought Generation. In: Kittler, J., et al. Pattern Recognition and Computer Vision. PRCV 2025. Lecture Notes in Computer Science, vol 16285. Springer, Singapore. https://doi.org/10.1007/978-981-95-5631-1_6
DOI: https://doi.org/10.1007/978-981-95-5631-1_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-95-5630-4
Online ISBN: 978-981-95-5631-1
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science

