DIGITALIZATION OF HEALTH CARE INFORMATION: HADOOP

A major push is under way among medical service providers and their technology outsourcing companies to digitize medical records and the attendant Personal Health Information (PHI) that is generated every time we receive medical care. In fact, digitizing healthcare records is widely regarded as one of the best available avenues for achieving savings and holding back the United States’ rapidly ballooning healthcare costs. Beyond the immediate savings, however, digitized medical records hold out the promise of running analytics on those records, and thus uncovering valuable new information about the trends hidden in our PHI.

One domain where this effort is starting to bear fruit is the analysis of episodes of care. Let us say that State X collects all of the instances where its citizens were treated for a particular disease. After isolating those cases, it normalizes for age, race, gender and other demographic variables, and then analyzes the data for cost per patient for the treatment of that disease. Once that’s done, the state can rank all the medical providers treating its population for that disease, from least expensive to most expensive. It could then provide a financial reward to, say, the five medical providers who deliver this treatment at the lowest cost, while penalizing those providers who cost their patients the most money for the same treatment. Or it could pick out the best and the worst providers and try to find out why one group performs so differently from the other.
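The steps in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented, the field layout is hypothetical, and the normalization shown (dividing each episode's cost by the average cost for patients of the same age band and gender) is just one simple way to adjust for demographics, not necessarily the method any state actually uses.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, made-up records: (provider, disease, age_band, gender, cost_usd)
EPISODES = [
    ("Provider A", "asthma", "18-39", "F", 900.0),
    ("Provider A", "asthma", "65+", "M", 2100.0),
    ("Provider B", "asthma", "18-39", "F", 1100.0),
    ("Provider B", "asthma", "65+", "M", 1900.0),
    ("Provider C", "asthma", "18-39", "F", 1000.0),
    ("Provider C", "asthma", "65+", "M", 2600.0),
    ("Provider C", "diabetes", "65+", "M", 5000.0),
]


def rank_providers(episodes, disease):
    """Return (provider, score) pairs sorted from least to most expensive.

    A score below 1.0 means the provider costs less than the average for
    demographically similar patients; above 1.0 means it costs more.
    """
    # Step 1: isolate the episodes of care for the disease of interest.
    cases = [e for e in episodes if e[1] == disease]

    # Step 2: compute the average cost within each demographic group.
    costs_by_group = defaultdict(list)
    for _, _, age_band, gender, cost in cases:
        costs_by_group[(age_band, gender)].append(cost)
    expected = {group: mean(costs) for group, costs in costs_by_group.items()}

    # Step 3: express each episode as a ratio to its group's average cost.
    ratios_by_provider = defaultdict(list)
    for provider, _, age_band, gender, cost in cases:
        ratios_by_provider[provider].append(cost / expected[(age_band, gender)])

    # Step 4: average the ratios per provider and rank, cheapest first.
    scores = {p: mean(r) for p, r in ratios_by_provider.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for provider, score in rank_providers(EPISODES, "asthma"):
        print(f"{provider}: {score:.2f}")
```

With the sample data above, Provider A ranks as least expensive and Provider C as most expensive. The head and tail of the ranked list are the groups a state might reward, penalize, or study further.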

Two recent trends in modern healthcare have increased the potential for deriving greater benefits from PHI analytics. One is the emergence of Cloud technology. With budgets ever tightening, states are finding it increasingly convenient to move their healthcare operations and applications from dedicated data centers to the Cloud environment. The more security-conscious of these entities are opting for some variation of a private cloud. Either way, they are depending on the protection provided by the breach requirements of the Health Insurance Portability and Accountability Act (HIPAA), which hold business associates, and not just covered entities, directly liable for breach incidents.

The second trend worth paying attention to is the emergence of Hadoop. Hadoop is an open-source software framework developed under the non-profit Apache Software Foundation. Because it is open source, it can be used without incurring any commercial licensing costs. Hadoop allows users to process very large data sets on a cluster of computers by splitting up the data, sending the pieces to different computers to be processed in parallel, and then combining the partial results into a single answer. This allows it to produce results similar to what a supercomputer would achieve, but using a group of less powerful computers.
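The split, process and combine pattern described above can be illustrated with a small Python sketch. This is not Hadoop's API: a thread pool on one machine stands in for the cluster, and the function names and sample records are hypothetical. It only shows the shape of the computation, here totaling treatment cost per provider.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, n_chunks):
    """Split the records into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Process one piece independently: total cost per provider in the chunk."""
    totals = Counter()
    for provider, cost in chunk:
        totals[provider] += cost
    return totals


def reduce_totals(partials):
    """Combine the partial results from every worker into one answer."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # adds the per-provider totals together
    return dict(combined)


def total_cost_by_provider(records, n_workers=3):
    chunks = split(records, n_workers)
    # Each worker thread plays the role of one computer in the cluster.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_totals(partials)


if __name__ == "__main__":
    # Hypothetical (provider, cost) records.
    records = [("A", 100), ("B", 250), ("A", 50), ("C", 75), ("B", 25)]
    print(total_cost_by_provider(records))
```

Because each chunk is processed without reference to the others, the same work can be spread across as many machines as are available, which is what lets a group of modest computers handle a data set too large for any one of them.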

Entities using the Cloud are becoming increasingly comfortable with allowing their data to reside on external servers, which in turn allows that data to be analyzed by analytics suites running on Hadoop. The result is increasing clarity and decreasing cost. This happy convergence of more powerful analytics and cloud computing should ultimately bring greater benefit to the healthcare community and the patients it serves.