Applying natural language processing to clinical information retrieval
Author(s)
Date Issued
2014
Date Available
2015-08-14T15:22:20Z
Abstract
Medical literature, such as medical health records, is increasingly digitised. As with any large growth of digital data, methods must be developed to manage data as well as to extract any important information. Information Retrieval (IR) techniques, for instance search engines, provide an intuitive medium for locating important information among large volumes of data. With more and more patient records being digitised, the use of search engines in a healthcare setting provides a highly promising method for efficiently overcoming the problem of information overload.
Traditional IR approaches often perform retrieval based solely on term frequency counts, known as a 'bag-of-words' approach. While these approaches are effective in certain settings, they fail to account for more complex semantic relationships that are often more prevalent in medical literature, such as negation (e.g. 'absence of palpitations'), temporality (e.g. 'previous admission for fracture') or attribution (e.g. 'Father is diabetic'), or even term dependencies (e.g. 'colon cancer'). Furthermore, the high level of linguistic variation and synonymy found in clinical reports gives rise to issues of vocabulary mismatch, whereby concepts in a document and query may be the same but, given differences in their textual representation, relevant documents are missed, e.g. hypertension and HNT. Given the high cost associated with errors in the medical domain, precise retrieval and reduction of errors is imperative.
Given the growing number of shared tasks in the domain of Clinical Natural Language Processing (NLP), this thesis investigates how best to integrate Clinical NLP technologies into a Clinical Information Retrieval workflow in order to enhance the search engine experience of healthcare professionals. To determine this we apply three current directions in Clinical NLP research to the retrieval task. First, we integrate a Medical Entity Recognition system, developed and evaluated on I2B2 datasets, achieving an f-score of 0.85. The second technique clarifies the Assertion Status of medical conditions by determining who is the actual experiencer of the medical condition in the report, its negation and its temporality. Standalone evaluations on I2B2 datasets have seen the system achieve a micro f-score of 0.91. The final NLP technique applied is that of Concept Normalisation, whereby textual concepts are mapped to concepts in an ontology in order to avoid problems of vocabulary mismatch. While evaluation scores on the CLEF evaluation corpus are 0.509, this concept normalisation approach is shown in the thesis to be the most effective NLP approach of the three explored in aiding Clinical IR performance.
Type of Material
Doctoral Thesis
Publisher
University College Dublin. School of Computer Science and Informatics
Qualification Name
Ph.D.
Copyright (Published Version)
2014 the author
Subject – LCSH
Medical records--Data processing
Information storage and retrieval systems--Medical care
Natural language processing (Computer science)
Computational linguistics
Language
English
Status of Item
Peer reviewed
This item is made available under a Creative Commons License