Cloud Computing and Big Data: Looking for Healthcare Workflow

Short Link: http://ehr.bz/nistcbd

Last week I attended the Cloud Computing and Big Data Forum & Workshop at the National Institute for Standards and Technology (NIST) in Gaithersburg, Maryland. While conference was not about healthcare or workflow per se, cloud computing and big data are obviously relevant to healthcare. I tweeted generally about cloud computing and big data while noting material of specific relevance to healthcare and workflow. I collected some of the tweets and wrapped them in further comments to write this blog post.

By the way, you may also be interested in my complementary blog post: 2012 Amazon Web Services (Health) User Conference Trip Report. Specifically relevant subsections include…

Is AWS Secure Enough for Patient-Identifiable Data?
What Does AWS Bring to the EHR and Health IT Party?
Chuck: Your Blog is Called EHR Workflow Management Systems…Well?

…back to the NIST Cloud Computing and Big Data Forum & Workshop!

Attending Computing & Forum January 15-17 Hashtag is

— Charles Webster, MD ()

I (like the workshop) will start with cloud, move to big data, and then big data in the cloud.

Cloud Computing

NIST is in the business of coming up with helpful definitions. For example, prior to the conference I wondered out loud…

How many ‘cloud-based’ EHRs meet all five NIST’s essential characteristics? @

— Charles Webster, MD ()

Cloud Computing: NIST Definition Measured network access 2 on-demand self-service w/resource pooling & rapid elasticity

— Charles Webster, MD ()

NIST defines cloud computing to have five essential characteristics. They’re worth reviewing, so as to build upon when we cover federated cloud computing in the next few tweets.

“On-demand self-service. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
Broad network access. Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).
Resource pooling. The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth.
Rapid elasticity. Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.
Measured service. Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.”

Federated Community Cloud Definition

— Charles Webster, MD ()

Federate Community Cloud definition:

A cloud in which resources are provisioned for use by a community or by multiple communities of consumers from multiple organizations using methods that address shared needs or concerns.

Building on the idea of federated community clouds are virtual organizations. I’ve heard and read the phrase “virtual organization” for decades. What appears different here is that NIST is using it in a more formally defined manner.

Virtual Organization: multiple orgs using federation to share computing resources (8 example capabilities)

— Charles Webster, MD ()

“[Virtual organizations] are defined as multi-organization constructs that use federation to share access to computing resources.”

… and include one or more of the following capabilities:

privacy and security
compliance adherence
trust infrastructure
membership
internal organization roles
common governance
policies and procedures
private communication

So, a cloud is made up of measured network access to on-demand self-service with resource pooling and rapid elasticity. A federated cloud serves multiple communities coming together as virtual organization. If these definitions stick, then we’ll be speaking of federated clouds to create virtual healthcare organizations.

Federated Community Cloud Scenarios: specialized remote medical care & catastrophic dynamic event response..

— Charles Webster, MD ()

Two-fifths of listed federated community cloud scenarios listed on this slide involve healthcare:

Catastrophic dynamic event response
Specialized remote medical care

These particular scenarios make complete sense, of course, given the definition of cloud as being remote, on-demand, and scalable. But I am certain we’ll see more mundane examples too, as everyday healthcare organizations (hospitals, medical offices, laboratories, etc.) move to the cloud.

Workflow sighting! (Fifth bullet down)

“Workflow” mention at > needs to be part of data science educational curricula

— Charles Webster, MD ()

Chris Davis, from Verizon, spoke in the Progress on Standards for Interoperability between Clouds session about healthcare. I wasn’t close enough to get good pictures of slides. But here’s a tweet to give the flavor. I sure my health IT readers would have found what he said familiar, about healthcare’s unique needs, etc.

CD: Focusing now on healthcare organizations; not just networking, not just computing, but also authentication, records mgmt etc.

— Kenna Shaw ()

The special treat of the NIST workshop was to be sitting right up front while Vint Cert (of Inventing-The-Internet fame) gave the keynote.

Vint Cerf speaking at about “loose coupling” (earthquake analogy) in computing to achieve

— Charles Webster, MD ()

There was an interesting question from the audience about Google Health, which was discontinued last year.

Question 2 Vint Cerf healthcare data & Google’s discontinued initiative: acuteness effects privacy, virtual clinical trials, etc.

— Charles Webster, MD ()

VC said their basic problem was getting data from the physicians, by which I assume he means from their EHRs and patients.

Question 2 Vint Cerf healthcare data & Google’s discontinued initiative: couldn’t get medical folks to contribute enough data…

— Charles Webster, MD ()

VC went on to say that we need to make physicians lives easier with tech, not harder. Since surely no one would, or could, disagree with this sentiment, it’s a remarkable statement to have to make!

Vint Cerf on healthcare: “disappointed” but need to create tools “at the edge” to make doctor’s life easier rather than harder!

— Charles Webster, MD ()

But then it got even better than that! Vint Cerf came right down into the audience, Oprah-style, and took questions. He stood a couple feet from me. I caught on smartphone video this funny, interesting, and informative story about Google’s self-driving cars. Aside from the fun and interesting part, he had a point. We’re entering the age of machines talking to machines. Not only will cars negotiate with each other at stop signed four way intersections, but they will have access to everything each other has ever seen at that intersection. “That” never moves. It’s a tree. “That” is moving thing wasn’t there before. Don’t run over it.

I uploaded 5-minute YouTube video of Vint Cerf telling funny! story about Google’s Self-Driving Cars 1/16/13

— Charles Webster, MD ()

I could go on about relevance of self-driving cars to public health (epidemiology of traffic accidents) or aging-in-place (keeping seniors as independent as possible as long as possible), but VC’s story was just plain fun!

Big Data

The second day began mixing big data into the cloud, including some interesting healthcare and biomedical case studies.

Peter Levin Chief Technology Officer and Senior Advisor to the Secretary, Department of Veterans Affairs, talked about the VA’s EHR data and some mind-blowing statistics. Click on the photo of the slide to zoom into a readable rendition.

EHRs Matter at the VA @ shows amazing stats: patient volume, quality, cost, evidence, accuracy, rates

— Charles Webster, MD ()

From a later slide I tweeted…

Proteomic snapshot/person/day for one year=4600 million terabytes (total 2010 Internet data = 250 million terabytes)

— Charles Webster, MD ()

An interesting aspect of live-tweeting a conference is that people who aren’t present can chime it. And then, after the conference is over, a speaker can chime right back, such as below. For example, Peter Levin showed a slide with funny ICD-10 codes and I tweeted….

Funny @ “some codes downright insulting” “bizarre personal appearance” “walked into lamppost, subsequent encounter”

— Charles Webster, MD ()

…triggering the following three-party conversation:

@ @ only humor. Support .I was just pointing out that there are a LOT of them.I mean . . . chicken coops?

— Peter L. Levin ()

Next is a fascinating series of slides, about text mining EHRs to predict probability of patient recovery, presented by Ram Akella of UC-Santa Cruz in the Big Data Analytics , Processing and Interaction: Measurement Science, Benchmarking and Evaluation Challenges session. I dug around on the Web to try to find more details, but this appears to be a currently ongoing project. Prof. Akella was making larger points about relevance and big data, so he flew through these slides. I present them below without much comment (“Looks like…”, “I surmise…”). The point being: EHR data is indeed part of big data. (Though if I find out more info at a later date I’ll update this post.)

Slide 1/6. Info extraction from

— Charles Webster, MD ()

If you click through several times on the image you can get to a more readable version of the slide. It shows free text from the major content subsections of an EHR: medical history, physical examination, treatment, etc.

Slide 2/6. data set R Akella

— Charles Webster, MD ()

Here it looks like the data that has been extracted from an patient’s electronic record.

Slide 3/6 Experimental setting R Akella

— Charles Webster, MD ()

This slide describes the kind of available data available for each patient.

Slide 4/6. Methodology Ram Akella

— Charles Webster, MD ()

This looks like an influence diagram. The stuff on the left influences the stuff on the right.

Slide 5/6 Most Discriminant Terms Ram Akella

— Charles Webster, MD ()

I surmise that the “most discriminant terms” are the most valuable terms, information-wise, to predict patient recovery.

Slide 6/6 Results: Prob of Recovery Akella

— Charles Webster, MD ()

I surmise (again, he flew through these slides) that this shows probability of recovery when a particular term (“unstable blood pressure”) is present in the patient’s chart.

To return to a tweet about Peter Levin’s presentation….

Healthcare is truly data management challenge & opportunity -Peter Levin, VA CTO

— MJ Schmitt ()

…Healthcare truly is a data management challenge and opportunity

Cloud Computing, Big Data and Wrap-Up

What about big data’s relation to cloud computing?

Yesterday’s about computing, today’s about (BigData/Cloud intersection) stay tuned 4 tweets, links & photos!

— Charles Webster, MD ()

One of the folks from NIST joked that while we know what “cloud computing” is (see earlier section regarding NIST’s role in helping to define it) we don’t know what “big data” is. But we do know it is here and important. So, we all need to help define big data.

Pat Gallagher, Under Sec NIST opens with “We haven’t defined a definition for big data but whatever it is, it is here”

— Bernadette Hyland ()

So, what are the properties of “big data”?

The ‘V’s of volume, velocity, variety, veracity, value

— Charles Webster, MD ()

One way (the only way?) to describe something is in terms of its properties. For example, see my Clinical Groupware: A Definition. One of the NIST reps listed five Big Data properties (see below), but did not define them. I’d heard of Volume, Velocity, and Variety, but not Veracity and Value. I cobbled together the following outline from various venues (VVeeee!).

Apparently the 3D — Volume, Velocity, Variety — distinction goes all the way back to 2001. Wikipedia references that and includes a recent update:

In a 2001 research report[20] and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this “3Vs” model for describing big data.[21] In 2012, Gartner updated its definition as follows: “Big data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” (my emphasis) Wikipedia on Big Data

Volume: “amount of data” (Healthcare example: A single medical sensor emits an extraordinary amount of data per unit time. Now multiply!)
Velocity: “speed of data in and out” (Healthcare example: Reacting in the right way to the right patient event within seconds can save a life)
Variety: “range of data types and sources” (Healthcare example: Structured data such as blood pressures and clinical codes versus unstructured free text dictated by clinicians)
Veracity: “uncertain or imprecise data” (Healthcare example: Natural language descriptions of clinical phenomena is notoriously imprecise and even structured clinical data is almost always suspect)
Value: “positive net value to be retained in an accessible fashion” (Healthcare example: See my closing remarks…)

Relative to adding veracity to big data’s trivium there’s a succinct blog post on Data Science Central. And here are a couple of thumbnails to encourage you to read it. I like the first infographic: lots of dots (volume), moving dots (velocity), different colored dots (variety), and probability distributions around dots (veracity)…

…and a clever Venn diagram of Volume, Velocity, and Variety:

Big Data value? I’ll close with that, connecting it back to healthcare IT.

I’d like to return to the big three 3D of big data, as they are most famously and frequently invoked.

I found this great Big Data Consumer Guide from the Open Data Center Alliance. They only mention healthcare in passing. However their definitions are so accessible, I’ll include them here.

“Volume. As the name Big Data suggests, its volume can take up terabytes and petabytes of storage space. It has arisen as a result of an increasing enterprise demand to use and analyze more types of structured and unstructured data that do not fit into existing operational and analytic business systems. Data is growing at an exponential rate, so much that 90 percent of the data in the world today has been created in the last two years alone.
Velocity. Increasingly, enterprises need answers not next week or next month, but right now. Nightly batch loading is poorly suited for e-commerce, multimedia content delivery, ad targeting, and other real-time applications. This puts pressure on accelerating data loading at the same time that data volumes are skyrocketing. Data streaming, complex event processing, and related technologies, once mostly prevalent in financial services and government, are now emerging as enterprise data architecture requirements in multiple industries. Likewise, as more enterprises engage in social media and the Web, responding in real-time or in near real-time becomes significantly more necessary.
Variety. Variety relates to the complexity of data types and data sources. Also, much of today’s data is unstructured or semi-unstructured. This means that it doesn’t fit into neat rows and columns of the traditional relational database management systems (DBMS).”

We usually want to do things to data, such as gather, clean, transform, report, etc. We used to do this on mainframes or desktop computers. Now we upload to the cloud. Different clouds may specialize in different data processing services. How do we move data around? You can’t, practically, download petabytes of data from one cloud in order to upload it to another. In other words, how to you move data through the “inter-cloud” (a phrase I frequently heard).

The next couple of slides addressed this issue, which is, essentially, about data workflow.

Nice synthesis of workflow: collect, reorder, join, identify, model, validate, apply, publish

— Charles Webster, MD ()

That’s proposed abstraction of data workflow. Here is another, in the biomedical domain.

Biomedical Pipeline: data, transmit, store, process, analysis, results

— Charles Webster, MD ()

How much does it cost to move data through the cloud? How about 600 picocents a bit? (A completely out of context factoid, I admit!).

600 picocents to move a bit of data over the cloud ‘sGetSmall!

— Charles Webster, MD ()

And cost of data is only half the equation. Data has to have a positive net value to be retained in an accessible fashion. Otherwise it’s “archived”, in which case, quipped a presenter ( of ), it takes an act of God or an order from government to retrieve.

ROB Return On Byte

— Charles Webster, MD ()

And, keep in mind, cost and price are not the same thing.

OH “Cost & price are not the same thing” Me: otherwise no profit, therefore unsustainable, healthcare shld learn this

— Charles Webster, MD ()

One person’s cost (the buyer) is another person’s price (the seller). Folks call for price transparency. Unfortunately, most healthcare organizations don’t know their true costs, so they don’t know how to correctly price. (Hey! I was a pre-med accountancy major!) In order for external price transparency to be helpful it has to reflect internal cost transparency.

Ideally, in a free market, microeconomics operates to drive prices down to costs. This is because if there is any excess profit, sellers will enter the market to compete on price. Once price equals cost there is no more lure. However, if you don’t know your costs, you don’t know what price will allow you to continue in business in a sustainable manner. Price transparency without reflecting true costs is unhelpful to the healthcare system as a whole.

I hope that the combination of cloud computing (with its metered service costs) and big data (about costs!) will help make a dent in healthcare’s price/cost transparency quandary. But that is really another entire blog post. And it is time to wrap up this one!

Question: How make sure standards don’t hobble innovation? (reminds me of healthcare debate) “strong tension” slow standard APIs

— Charles Webster, MD ()

I’ll close this blog post with the question from the audience that I tweeted above. How do we make sure that standards (after all we are here at the National Institute of Standards and Technology) don’t prevent future innovation? Sound familiar? (If you are from Health IT, it must!). No, I didn’t hear any better answer to this question than I’ve heard in debates about healthcare standards. Oh well. These people are really smart. Makes me feel better.

That said, the NIST Cloud Computing and Big Data Forum & Workshop was truly a remarkable convocation of smart people (I heard about a thousand physically attended, not counting those watching the web cast). I learned a lot about cloud computing, big data, and big data in the cloud, relevant to where healthcare informatics is and is going. I hope you enjoyed my account of the experience, and that the social media angle contributed by my, and others’, tweets, gave it an extra dimension and degree of immediacy.

Cloud Computing

Big Data

Cloud Computing, Big Data and Wrap-Up

Share this:

Related

Leave a Reply Cancel reply