Applying natural language processing to clinical information retrieval

Files in This Item:
File: Cogley_ucd_5090D_10016.pdf
Size: 3.25 MB
Format: Adobe PDF
Title: Applying natural language processing to clinical information retrieval
Authors: Cogley, James
Advisor: Carthy, Joe
Stokes, Nicola
Permanent link:
Date: 2014
Abstract: Medical literature, such as medical health records, is increasingly digitised. As with any large growth of digital data, methods must be developed to manage data as well as to extract any important information. Information Retrieval (IR) techniques, for instance search engines, provide an intuitive medium for locating important information among large volumes of data. With more and more patient records being digitised, the use of search engines in a healthcare setting provides a highly promising method for efficiently overcoming the problem of information overload.
Traditional IR approaches often perform retrieval based solely on term frequency counts, known as a 'bag-of-words' approach. While these approaches are effective in certain settings, they fail to account for more complex semantic relationships that are often more prevalent in medical literature, such as negation (e.g. 'absence of palpitations'), temporality (e.g. 'previous admission for fracture') or attribution (e.g. 'Father is diabetic'), or even term dependencies (e.g. 'colon cancer'). Furthermore, the high level of linguistic variation and synonymy found in clinical reports gives rise to issues of vocabulary mismatch, whereby concepts in a document and query may be the same but, given differences in their textual representation, relevant documents are missed (e.g. hypertension and HNT). Given the high cost associated with errors in the medical domain, precise retrieval and reduction of errors are imperative.
Given the growing number of shared tasks in the domain of Clinical Natural Language Processing (NLP), this thesis investigates how best to integrate Clinical NLP technologies into a Clinical Information Retrieval workflow in order to enhance the search engine experience of healthcare professionals. To determine this, we apply three current directions in Clinical NLP research to the retrieval task.
First, we integrate a Medical Entity Recognition system, developed and evaluated on I2B2 datasets, achieving an f-score of 0.85. The second technique clarifies the Assertion Status of medical conditions by determining who is the actual experiencer of the medical condition in the report, whether the condition is negated, and its temporality. In standalone evaluations on I2B2 datasets, the system achieved a micro f-score of 0.91. The final NLP technique applied is that of Concept Normalisation, whereby textual concepts are mapped to concepts in an ontology in order to avoid problems of vocabulary mismatch. Although the evaluation score on the CLEF evaluation corpus is 0.509, this concept normalisation approach is shown in the thesis to be the most effective NLP approach of the three explored in aiding Clinical IR performance.
Type of material: Doctoral Thesis
Publisher: University College Dublin. School of Computer Science and Informatics
Qualification Name: Ph.D.
Copyright (published version): 2014 the author
Subject LCSH: Medical records--Data processing
Information storage and retrieval systems--Medical care
Natural language processing (Computer science)
Computational linguistics
Language: en
Status of Item: Peer reviewed
Appears in Collections: Computer Science Theses

This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. For other possible restrictions on use, please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.