Natural language processing (NLP) applied to medical speech and text, also known as Clinical Language Understanding (CLU), is a hot topic. It promises to improve EHR user experience and extract valuable clinical knowledge from free text about patients. In keeping with this blog’s theme, NLP/CLU can improve EHR workflow and sometimes uses sophisticated workflow technology to span between users and systems.
Here’s Dr. Nick without that pesky YouTube play button
(see below) that turns everyone into an “Arrow Head”!
Therefore I was so delighted to Skype an interview with Dr. Nick van Terheyden, Chief Medical Information Officer at Nuance Communications. Warning: several of my questions were a trifle long…sorry! But I did want to explain where I was coming from in several instances. In every case Dr. Nick, as he is known, broke down complicated NLP/CLU ideas to their essentials and explained how and why they are important to healthcare.
I’ll start with the tenth question first, since, well, I promised Dr. Nick I’d do so.
10. Most of my previous questions are pretty “geeky.” So, to compensate, from the point of view of a current or potential EHR user, what’s the most important advice you can give them?
To me, the core issue is usability and interface design. The interface and technology have struggled to take off in part because the technology has been complex, hard to master and in many instances has required extensive and repeated training to use. The combination of SR [speech recognition] and CLU technology offers the opportunity to bridge the complexity chasm, removing the major barriers to adoption by making the technology intuitive and “friendly”. We can achieve this with intelligent design that capitalizes on the power of speech as a tool to remove the need to remember gateway commands and menu trees, and that doesn’t just convert what you say to text but actually understands the intent and applies the context of the EMR to the interaction. We have seen the early stages of this with the Siri tool that offers a new way of interacting with our mobile phones, using the context of your calendar, the day and date, location and other information to create a more human-like technology interface that is intuitive and less intimidating.
1. Dr. van Terheyden, I see references to you as Dr. Nick. How do you prefer to be addressed?
Thanks for asking – Nick is fine but many folks refer to me as Dr. Nick…so much easier than “van Terheyden.”
2. Could you tell us a bit about your education, what you do now, and how you came to be doing it?
That’s a long story that started over 25 years ago, after I qualified at the tender age of 22 as a doctor in England. I practiced for a while in the UK and also in Australia but decided I wanted to try other things. My first step into the technology world was unrelated to medicine when I worked at Shell International as a computer programmer in their finance division on IBM mainframes programming in COBOL, JCL, CICS, DB2, TSO, CLIST and REXX, to name a few, for the financial returns for Shell operating companies. Check out the IBM Terminal behind my desk….!
I then used the skills I acquired in these roles when I transitioned my focus to some early development and incubator companies that emerged onto the scene with electronic medical record (EMR) type functions in Europe. During this time, I had the fortune of working at a greenfield site in Glasgow, Scotland that built one of the first paperless medical records – many of the discoveries and concepts developed there remain applicable today.
Dr. Nick and Holly (his Golden Lab!) on YouTube
(Note: Video interview contains different, and
funnier, content than these ten questions and answers.)
Then, my career took me to the Middle East, working in Saudi Arabia, and then on to the US where I have had the fortune of working in New York, California and Maryland with a number of companies in the healthcare technology space, including most of the clinical documentation providers and speech technology vendors.
3. Are there any stereotypes about speech recognition in general and medical speech recognition in particular? What is a more accurate or useful way to think about this technology?
Speech has been available commercially for a number of years in the healthcare space and to general consumers. For a number of years it struggled to deliver value – the technology suffered from general hardware challenges and some of the challenges that exist in consumer implementations with noisy and challenging environments (your car, for instance, is a difficult environment with many and varied background noises and poor quality audio recording). In the clinical setting we have similar problems with noise, but with the added complexity of clinical workflow and of fitting into a busy and complex clinical setting. In some respects, we suffered a Hollywood effect where the industry painted a picture of speech recognition that was much more than it was capable of – not just recognizing words – but understanding them.
Speech in the consumer world has moved on from a pure recognition tool to integrating some Artificial Intelligence (AI) that not only understands what is said but puts this into context. For example, some of the Nuance commercial telephony solutions include voice analysis for stress, anger and other emotions as indicators to help manage and route calls more appropriately. In healthcare, we don’t just apply voice recognition but layer on clinical language understanding that not only grasps the meaning but also tags the information, turning clinical notes and documentation into medical intelligence and making this data truly semantically interoperable.
4. What are typical medical speech recognition error rates? Cause for concern or manageable?
That’s a common question and one that a single digit or rate does not really answer. That being said, for many people out-of-the-box recognition is in the high 90’s and in many cases in excess of 98-99 percent. Even for those who do not achieve that accuracy out-of-the-box, it is possible for most speakers to attain a high level of accuracy with training. With more dictation and correction to build a personalized or speaker-dependent model, you can even customize the engine to individual speaking and pronunciation style.
Siri has achieved a level of accuracy and comprehension without having an individual voice and audio profile for the user. In the current version anyone can pick up an iPhone and interact with Siri. Siri has helped paint a clearer picture of what is possible with speaker independent (in other words speech recognition that does not require training) recognition that has achieved a very high rate of success and acceptability.
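For readers wondering what a figure like “98-99 percent” might actually measure: one common way to score a recognizer is word error rate (WER), the word-level edit distance between what was said and what was recognized, divided by the number of words said. Accuracy is then roughly 1 minus WER. Dr. Nick did not say how Nuance computes its numbers, so the sketch below is only an assumption about the general technique, and the function name and sample sentences are my own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical dictation: one wrong word out of eight.
said = "patient denies chest pain or shortness of breath"
heard = "patient denies chest pain or shortness of breadth"
wer = word_error_rate(said, heard)
print(f"WER = {wer:.3f}, accuracy = {1 - wer:.1%}")
```

One misheard word in an eight-word sentence already drops accuracy to 87.5 percent, which gives a feel for how demanding “high 90’s” is, and why a single number hides a lot (which word was wrong matters clinically).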
5. While attending the North American Association for Computational Linguistics meeting recently in Montreal (see my blog post) I noticed that Nuance Communications was a Platinum Sponsor, a higher level of commitment than even Google (Gold) or Microsoft (Silver). While NAACL2012 included some speech recognition research presentations, most presentations dealt with natural language processing further along the so-called NLP “pipeline”: morphology, syntax, semantics, pragmatics, and discourse. Where do you see Nuance going in these areas?
Nuance has a serious investment in Research and Development spread across many industries and part of the value we derive as an organization is from the cross-fertilization of these efforts to different areas and verticals. The learning we derive from understanding a driver in their noisy car environment can be applied to the physician and their noisy Emergency Room department.
Applying understanding to the voice interaction opens up many avenues and we have seen this outside of healthcare (Dragon Go! for example). These principles have tremendous potential to simplify the physician interaction with technology and the complex systems they must master and use on a daily basis. I talked about this recently at a presentation I gave to the Boston AVIOS Chapter.
6. At the keynote for a workshop in biomedical NLP at NAACL2012, the concluding slide included the bullet: “NLP has potential to extend value of narrative clinical reports.” In light of your recent comments on the EMR and EHR blog I’m sure you’d agree. But, could you expand on those comments?
Nuance began investing in Clinical Language Understanding (CLU) technology, a clinical-specific form of NLP, over two years ago.
CLU is a foundational technology being applied to the other areas and applications (DM360 MD Assist and DM360 Analytics) that offers the ability to understand the free-form narrative, extract discrete data, tag it, and link it to a number of structured medical vocabularies.
7. The first question asked after the keynote was “Will meaningful use drive [need for/use of] clinical natural language processing?” Is SR/NLP more important for some MU measures than others? If so, which ones and why?
In my mind, the answer lies in the source of the data and how it is captured, not in the specific data elements.
So take one measure as an example:
Is an ACE inhibitor prescribed for a patient who is suspected of having, or is suffering from, a heart attack?
This data will come from different sources in different facilities. If you have a CPOE and prescribing system then that data already exists in the system in digital form as structured data – you can answer that question (and guide the clinicians to make sure they comply with best practices and high quality care) with the existing structured data.
However, if this is a patient being seen in the ED and the clinicians dictate their notes, then that information will be locked away in their documentation (unless they use a digital structured form to create the medical history). NLP is then needed to extract the information, and SR may be needed to facilitate the creation of the note efficiently.
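To make the distinction in this answer concrete, here is a minimal sketch of the two paths: first look for the answer in structured order data, and only fall back to the narrative note if it is not there. Everything in it is hypothetical and mine, not Nuance’s: the `Encounter` class, the `ace_inhibitor_source` function, and the short, incomplete drug list. The narrative step is plain keyword matching standing in for real clinical NLP, which would also have to handle negation, context, brand names and misspellings.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative, deliberately incomplete list of ACE inhibitor generic names.
ACE_INHIBITORS = {"lisinopril", "enalapril", "ramipril", "captopril", "benazepril"}


@dataclass
class Encounter:
    """Hypothetical record: coded orders plus a dictated free-text note."""
    structured_orders: List[str] = field(default_factory=list)
    narrative_note: str = ""


def ace_inhibitor_source(encounter: Encounter) -> Optional[str]:
    """Return where evidence of an ACE inhibitor was found, if anywhere."""
    # Path 1: CPOE / prescribing data already exists in structured form.
    ordered = {drug.lower() for drug in encounter.structured_orders}
    if ordered & ACE_INHIBITORS:
        return "structured"

    # Path 2: the information is locked in the narrative. Keyword matching
    # is a crude stand-in for NLP: it would wrongly count "lisinopril held".
    words = set(re.findall(r"[a-z]+", encounter.narrative_note.lower()))
    if words & ACE_INHIBITORS:
        return "narrative"

    return None


cpoe_visit = Encounter(structured_orders=["Lisinopril", "Aspirin"])
ed_visit = Encounter(narrative_note="Chest pain, suspected MI. Started on ramipril 2.5 mg daily.")

print(ace_inhibitor_source(cpoe_visit))  # structured
print(ace_inhibitor_source(ed_visit))    # narrative
```

The point of the sketch is the branching, not the matching: the same quality measure is trivial to answer in one facility and needs language understanding in another, depending only on how the data was captured.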
8. At a recent workshop on clinical decision support and natural language processing at the National Library of Medicine, clinical NLP researchers cited several concerns. Point-and-click interfaces threaten to reduce the amount of free text in EHRs. Since modern computational linguistics algorithms rely on machine learning against large amounts of free text, this is a potential obstacle. Is there a “race” between data input modalities? If so, who’s winning? If not, why not?
True. But this battle has been going on for years. I have referred to Henry VIII’s medical record many times as a great example of why structured data entry will never fulfill the requirements:
Relying on structured data entry presupposes that we know everything we need to know; if we did, a structured form and selecting from a list would allow you to capture everything you need in the medical record. However, we do not know everything we need to know. By way of an example, ground glass opacities appeared in medical notes in narrative form before we knew what this radiological finding meant. If we had not captured this in the narrative there would have been no record of these findings, since the finding was new at the time. We know now that these findings are linked to a number of diseases including pulmonary edema, ARDS, and viral, mycoplasmal, and pneumocystis pneumonias.
The narrative is, and will always remain, essential to a complete medical record – NLP will bridge the gap between the narrative and the structured data necessary to create semantically interoperable records, and it just keeps getting better.
9. Much of the success of the automated language processing that we take for granted today (for example, in Google and Apple products) is due to access to lots of annotated free text, called “treebanks” (“tree” for syntax tree). However, HIPAA requirements make sharing marked-up clinical text problematic. As Nuance moves from speech recognition to natural language processing, how will you deal with this constraint?
We take this issue very seriously. As the largest provider of medical transcription in the world, we have a good understanding of the issues and the importance of securing the data and ensuring patient confidentiality. I don’t have specific answers relative to how we handle this but our development and engineering teams have incorporated these concerns into our discussions and designs as we move this technology forward.
End of Interview.
What a great written interview! (Dr. Nick’s answers, not my questions.) We got into the weeds at the end, though. Please watch the video to compensate. If only to hear Holly’s “bark-on”.
On one hand there are lots of marketing-oriented white papers about clinical natural language processing. On the other hand there are lots of arcane academic papers about computational linguistics and medicine. Dr. Nick drove the ball right down the middle. People like Dr. Nick — clinician, programmer, journalist, entrepreneur, social media personality, and Golden Lab owner — are valuable bridges between communities who need to work together to digitize medicine.