Here’s a fantastic video and written interview with Juergen Fritsch, Ph.D. of M*Modal. M*Modal is a leading provider of clinical transcription services, clinical documentation workflow solutions, advanced cloud-based speech understanding technology and advanced unstructured data analytics. Juergen is Chief Scientist at M*Modal. (Sounds fun!)
Juergen’s insightful responses, to my in-the-weeds questions about language technology and healthcare workflow, hit so many nails on so many heads that I run out of metaphors. And his heartfelt thoughts on starting a company make me re-appreciate how lucky I am to live and work in the good-ol’ USA!
One-Minute Interview:
Juergen Fritsch on Founding of M*Modal and US Innovation
[Nota bene! I captured this video at two frames a second using Skype. Its quality, or lack thereof, is my sole responsibility!]
- Who is Juergen Fritsch?
- Structured vs. Unstructured Data
- Closed loop documentation with automated feedback
- “M*Modal”? “Multi-modal”?
- Juergen’s Ph.D. Thesis
- Does firing linguists improve speech recognition?
- Starting a company in the US
- The workflow tech/language tech connection
- Moving to pragmatics & discourse processing
- Medical equivalent of the HAL computer?
(By the way, while I used Skype for the One-Minute Interview segment of this post, at #HIMSS13 I’ll be out and about with my Infamous HatCam. Tweet me at if you’d like a shot at almost real-time stardom. I’ll record, edit, get your OK, upload to YouTube, and tweet your booth number and #HIMSS13 hashtag, all literally on the spot! Here’s an example from the #HIMSS12 exhibit floor.)
QUESTIONS – Answers from Juergen Fritsch, Chief Scientist, M*Modal
1. Who are you? Where do you work? What is your role?
I’m Juergen Fritsch and I serve as Chief Scientist at M*Modal. I’m responsible for all innovation activities around M*Modal’s speech understanding and clinical documentation workflow solutions.
2. In your recent AHIMA presentation, your closing slide included the following bullets:
- Unstructured documentation not sufficient
- Structured data entry via EHRs not sufficient
What do you mean by this?
Re: second point: The government is pushing for structured data entry, but that EHR paradigm is not sufficient, because it forces physicians to abbreviate and be minimalistic in their approach to clinical documentation. Physicians no longer have time to tell the full patient story, leaving quality on the table and creating substandard clinical documentation.
Re: first point: Unstructured documentation is not sufficient because it’s a blob of text. Although very valuable for the physician to read, it doesn’t allow a computer to read it and then drive action. The clinically relevant facts are hidden in the unstructured blob of text and are not actionable as a result.
3. With respect to the same slide, what do you mean by “closed loop documentation with automated feedback”?
Closed loop means bidirectional. As a physician, when I create documentation I do not want to just provide input; I need to get feedback, to hear back from the system, as in “yes, this is sufficient,” and to give all the details I need. In most cases there’s a lack of specificity in the documentation. Here’s an example:
If I’m documenting a patient with a hand fracture, I need to comply with ICD-10 and be able to provide detail as to which fingers, which arm, and whether it is healing or not. That’s a lot of detail from a billing perspective that physicians may not even think to provide on their own. Closed loop documentation helps with that by constantly observing the information being provided and prompting the physician to fill in missing information or address a lack of specificity, so that at the end you get the best possible documentation with the least amount of effort.
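To make the closed-loop idea concrete, here is a minimal sketch of how a system might check a fracture note for required specificity and prompt for the missing detail. This is my own illustration, not M*Modal’s implementation; the field names and prompts are hypothetical.

```python
# Hypothetical specificity check for a fracture note.
# Field names and prompts are illustrative, not from any real ICD-10 tooling.

REQUIRED_FRACTURE_FIELDS = {
    "laterality": "Which hand, left or right?",
    "digits": "Which finger(s) are involved?",
    "healing_status": "Is the fracture healing normally, delayed, or nonunion?",
}

def specificity_prompts(note_fields):
    """Return feedback prompts for any required detail the note is missing."""
    return [prompt
            for field, prompt in REQUIRED_FRACTURE_FIELDS.items()
            if not note_fields.get(field)]

note = {"diagnosis": "hand fracture", "laterality": "left"}
print(specificity_prompts(note))  # prompts for digits and healing status
```

In a real closed-loop system these prompts would surface to the physician while dictating, which is the “automated feedback” half of the loop.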
4. When I look at “M*Modal”, I think “multimodal”. Right or wrong? How did you (or whoever) come up with the name? What does the “multi” refer to?
Great question. Yes, M*Modal refers to multimodal, and means that only one way of doing things is not sufficient. In other words, different physicians have different approaches, preferences and needs. For example, an ER physician may not have a hands-free environment, and may want to use a microphone when creating documentation, while a primary care physician can be in front of a computer and enter things right then and there with the patient. At M*Modal, we want to be multimodal and not force physicians into one way of doing things, but accommodate them, their needs and different ways of completing documentation.
5. The title of your Ph.D. thesis was “Hierarchical Connectionist Acoustic Modeling for Domain-Adaptive Large Vocabulary Speech Recognition.” In basic, non-mathematical terms, could you explain your research? Is it still relevant? How has speech recognition evolved since?
That thesis was about using artificial neural networks to do speech recognition. At the time there was not much research along those lines; the prevailing way of doing things used statistical methods. I tried to apply artificial neural networks and was quite successful. Interestingly, there has just recently been a renaissance of that idea, and people have picked it up again with a slight twist. It has evolved, and I can provide more details via our video interview if you are interested. What’s exciting to me is that, after five to seven years in which almost no one pursued it, people are now doing it again.
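For readers unfamiliar with connectionist (neural-network) acoustic modeling, the core idea is that a network maps each short acoustic frame to a probability distribution over phone classes, which can then replace or rescale the emission probabilities of a statistical system. Here is a toy sketch of just the forward pass, with made-up dimensions and random weights purely for illustration — not Juergen’s actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 13 cepstral features per frame, 16 hidden units, 4 phone classes.
n_features, n_hidden, n_phones = 13, 16, 4
W1 = rng.normal(size=(n_features, n_hidden))   # untrained weights, for shape only
W2 = rng.normal(size=(n_hidden, n_phones))

def phone_posteriors(frame):
    """Map one acoustic frame to a probability distribution over phone classes."""
    h = np.tanh(frame @ W1)              # hidden layer
    logits = h @ W2
    e = np.exp(logits - logits.max())    # numerically stable softmax
    return e / e.sum()

frame = rng.normal(size=n_features)      # stand-in for a real feature vector
p = phone_posteriors(frame)
print(p, p.sum())                        # posteriors sum to 1
```

In a real recognizer the weights are trained on labeled frames and the per-frame posteriors feed a search over word sequences; this sketch only shows the frame-to-phone-distribution mapping that defines the approach.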
6. I studied computational linguistics back when one took courses in linguistics and GOFAI (Good Old Fashioned Artificial Intelligence). Statistical and machine learning approaches superseded that kind of NLP with considerable success. Will the pendulum continue to swing? In which direction? In other words, is “Every time I fire a linguist, the performance of our speech recognition system goes up.” still true?
Unfortunately, yes, this is still true – mostly because we have so much data available to us. Statistical methods have been so successful in replacing the old-school linguistic approaches because of good, plentiful data. The enormous amount of data available makes those methods very difficult to replace.
http://en.wikiquote.org/wiki/Fred_Jelinek
7. You performed original research, founded a company, and continue to evolve those ideas and that product. This must be personally satisfying. Could you share some thoughts about science, innovation, jobs, and economic progress?
I found it extremely gratifying coming to the U.S. as a student, not having been born or raised here, and being able to work on challenging, cutting-edge research problems. Then getting the opportunity to form a company and how relatively easy it was to get started, and how much people gave a very small company with only about 10 people and not much revenue a chance, was also very gratifying. I would never have been able to do this in my home country – there would have been too many obstacles and people would not have been ready to bet on a startup as much as they do here. The American culture of giving the underdog a chance to try out new ideas as long as they are perceived to be valuable is very rewarding. I would encourage students of various disciplines to try the same things.
8. I write and tweet a lot about workflow management systems and business process management systems in healthcare. These include, at the very least, workflow engines and process definitions. To me, there do seem to be some similarities, or at least a complementary fit, between language technology and workflow technology. For example, on the M*Modal website is a short page where “workflow” is mentioned nine times, along with “workflow orchestration” and a “workflow management module.”
http://mmodal.com/products/mmodal-fluency/fluency-for-transcription
From your unique perspective, what is the connection between language tech and workflow tech?
This is an absolute dead-on question – I’m so happy you asked it. The important connection is this: if we just did speech-to-text transcription, we wouldn’t affect anything. We’d just be creating a piece of text, without being able to drive actions. Ultimately we want to drive action in the workflow – for example, have a physician create that order for a new medication. We want to make sure follow-up happens and facilitate the workflow that enables that process from beginning to end. Also, healthcare is all about collaboration among providers. There is a lot of patient handoff, and effective coordination of care doesn’t happen nearly as much as it should; it only happens if proper workflow processes are in place. If we don’t get involved in that process and drive more effective workflow, we’re not succeeding in effecting change.
9. As you know, computational linguistics, the science behind the NLP engineering, is about more than sound (phonetics and phonology), sentence structure (syntax), or even meaning (semantics). It’s also conversation (discourse) and achieving goals (pragmatics). Where do you see medical language technology going in this regard?
Again, you hit it dead-on – in the past, people have ignored the pragmatics aspect. At M*Modal we have been focused on pragmatics since the very beginning. Where it’s all going is being able to understand the content of speech, using semantics and syntax to understand what people are really talking about. You are absolutely right that without pragmatics we’d never be able to accomplish what we’re trying to do with NLP technology.
10. How many years until we have the medical equivalent of the HAL computer depicted in the movie 2001: A Space Odyssey?
Hopefully never! ☺ We are getting there in a different way but I don’t think that the computer will ever replace humans. The computer will provide information, guide and educate the user, but not replace the human decision-making process.