Two Applications of Statistical Modelling to Natural Language Processing

  • Chapter
Learning from Data

Part of the book series: Lecture Notes in Statistics (LNS, volume 112)

Abstract

Each week the Columbia-Presbyterian Medical Center collects several megabytes of English text transcribed from radiologists’ dictation and notes of their interpretations of medical diagnostic x-rays. The project aims to automate the extraction of diagnoses from these natural language reports. This paper reports on two aspects of the project that require advanced statistical methods. First, the identification of pairs of words and phrases that tend to appear together (collocate) uses a hierarchical Bayesian model that adjusts to different word and word-pair distributions in different bodies of text. Second, we present an analysis of data from experiments comparing the performance of the computer diagnostic program with that of a panel of physician and lay readers of randomly sampled texts. A measure of inter-subject distance with respect to the diagnoses is defined, for which estimated variances and covariances are easily computed. This permits statistical conclusions about the similarities and dissimilarities among the diagnoses made by the various programs and experts.
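The abstract does not spell out the hierarchical Bayesian collocation model, so the following is only a rough illustration of the underlying task, not the authors' method. It swaps in a simpler, widely used technique: the log-likelihood ratio statistic (G²) computed on a 2x2 table of adjacent word-pair counts. The function names `g2` and `score_bigrams` and the sample sentence are hypothetical, chosen for this sketch.

```python
import math
from collections import Counter


def g2(k11, k12, k21, k22):
    """Log-likelihood ratio statistic (G^2) for a 2x2 contingency table.

    k11: count of (word A, word B) adjacent pairs
    k12: count of (word A, other word) pairs
    k21: count of (other word, word B) pairs
    k22: count of pairs involving neither A first nor B second
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                stat += o * math.log(o / expected)
    return 2.0 * stat


def score_bigrams(tokens):
    """Score every adjacent word pair in a token list by G^2.

    NOTE: a stand-in for illustration only; it does not pool information
    across bodies of text the way a hierarchical model would.
    """
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    second_counts = Counter(b for _, b in pairs)
    n = len(pairs)
    scores = {}
    for (a, b), k11 in pair_counts.items():
        k12 = first_counts[a] - k11
        k21 = second_counts[b] - k11
        k22 = n - k11 - k12 - k21
        scores[(a, b)] = g2(k11, k12, k21, k22)
    return scores


# Hypothetical sample text, not taken from the radiology reports.
tokens = "pleural effusion noted no pleural effusion seen".split()
scores = score_bigrams(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here `ranked` lists word pairs from most to least strongly associated. G² measures departure from independence in either direction, so a production system would also check that the observed pair count exceeds its expected value before calling a pair a collocation.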

Copyright information

© 1996 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

DuMouchel, W., Friedman, C., Hripcsak, G., Johnson, S.B., Clayton, P.D. (1996). Two Applications of Statistical Modelling to Natural Language Processing. In: Fisher, D., Lenz, HJ. (eds) Learning from Data. Lecture Notes in Statistics, vol 112. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2404-4_39

  • DOI: https://doi.org/10.1007/978-1-4612-2404-4_39

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-94736-5

  • Online ISBN: 978-1-4612-2404-4

  • eBook Packages: Springer Book Archive
