
The Image Cataloging Nightmare

5 Steps to Sleeping at Night

A few weeks ago I was walking the aisles at a small, yet important, healthcare technology conference. I innocently struck up a conversation with a guy who turned out to be a rather big wheel in an even bigger organization. My blazer and exceptional grooming convinced him that I knew perhaps more than I really did, and he presented me with his challenge of the moment.

Three Blind Doctors

The outfit he was with had acquired a vast assortment of medical image files spanning four to five decades, a collection comprising several terabytes of storage. Pretty impressive.

“So what’s the problem?” I mused, expecting that I might already know the answer.

“We can’t find anything,” came his agonized response.

And he meant it. Nothing. Nada. The big donut. Image files of valuable medical research, conditions, syndromes, and more, all of it valuable in the right physician’s hands for education and diagnosis, and you couldn’t search for file content any better than Jacques Cousteau questing for narwhals in the Arctic Ocean blue. The medical community was blind to this data treasure.

Nothing was tagged with any meta information, so all the files were just a big, garbled mess of consumed storage capacity. My new friend had himself the mother of all image file reclamation projects on his hands. The positive part of the whole deal was, he was and is determined to make these files useful for the benefit of humankind.

“Don’t you have some sort of scanning technology that will determine what is in each image and spit out useful meta tags for all of them on a mouse click?”

I liked his creative imagination, but even though I know there are some bleeding edge technologies out there that try to solve this problem, I hadn’t heard of any that would get to the granular level of utility his physician friends needed. He was coming to me because he knew my firm specializes in organizational collaboration and bringing value from big data. I had to give him something.

Set the Go-Forward Governance and Process

After cogitating for a few minutes that must have seemed like hours to him, here is what I came up with. There was no way to avoid the digital tagging endeavor we had on our hands for historical data, but first, patch the hole in the sinking boat today so no more water gets in. Establish the governance and metadata tagging structure for all new image data, and firm it up with a business process, backed by a degree of automation.

  1. Identify all the major categories of metadata relevant to the images. This was health care, so categories might range from type of disease or condition, variations within each, and root cause of a particular malady, to type of patient (no personally identifiable information, please!), and other factors.
  2. Identify the more mundane but important attributes that still add value, like date, location of incident, etc.
  3. Create lists and sub-lists of this descriptive data, each sub-list a child of its particular parent list.
  4. Use that information to have a smart form application developed (perhaps by a very smart company…) for the semi-automatic cataloging of all future images stored, with the form responses tagging each image at time of storage.
  5. Tag on an approval workflow with edit rights to validate images.

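The steps above can be sketched in code. Here is a minimal, illustrative take on steps 3 and 4: hierarchical tag lists with each sub-list tied to its parent category, and a form-validation function that tags an image at the time of storage. The category names, fields, and approval flag are my own assumptions for illustration, not a real clinical taxonomy or the smart form product itself.

```python
from dataclasses import dataclass
from datetime import date

# Parent list -> child sub-lists: each sub-list is valid only under its
# particular parent category (illustrative values, not a clinical taxonomy).
CATEGORY_TREE = {
    "burn": ["1st degree", "2nd degree", "3rd degree"],
    "fracture": ["simple", "compound", "stress"],
    "lesion": ["benign", "malignant", "undetermined"],
}

@dataclass
class ImageRecord:
    filename: str
    category: str            # value from the parent list
    subcategory: str         # must belong to that category's sub-list
    captured_on: date        # the "mundane but important" attributes
    location: str
    approved: bool = False   # flipped by the approval workflow (step 5)

def tag_image(filename, category, subcategory, captured_on, location):
    """Validate smart-form responses against the lists before storing."""
    if category not in CATEGORY_TREE:
        raise ValueError(f"Unknown category: {category!r}")
    if subcategory not in CATEGORY_TREE[category]:
        raise ValueError(f"{subcategory!r} is not a sub-type of {category!r}")
    return ImageRecord(filename, category, subcategory, captured_on, location)

record = tag_image("img_0042.dcm", "burn", "3rd degree",
                   date(2023, 5, 1), "Clinic A")
```

Because the form rejects any sub-tag that doesn't belong to its parent category, every person uploading images is held to the same standard, which is the standardization point made above.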
Oh, trust me, it’s not as simple as it sounds, but if my wannabe client gets this far, he’s set up a nice little process that gets all present and future images stored in a way that can be retrieved by docs and nurses throughout the organizational health care universe. It’s a process that not only saves time, ultimately, on data entry, but ensures standardization regardless of the person uploading the images. The smart form solution and workflows guarantee it.

The Historical Image Nightmare

I know what you’re thinking. We still have a very, very large chunk of image data out there that is virtually useless to users. Well folks, sometimes there just ain’t no putting lipstick on a pig. BUT…we can dress it up a bit and streamline the process to take much of the sting out of this arduous effort. I wish I could tell you the solution was all technology, but really, it is technology facilitating a higher level of effectiveness in human processes.

  1. Identify Tier 1 metadata taggers. These are people with, in this case, enough clinical background to know basically what an image is (a burn) but not at a high enough level to sub-classify it (3rd degree burn, from an explosion). These people are not subject matter experts in discrete fields but data generalists.
  2. Identify Tier 2 metadata taggers. These are the specialists. In this example, they are physicians with specific clinical backgrounds.
  3. Build a two-stage workflow for segmenting all image data in several phases, for instance…
    • Using the same smart forms as for newly created images, have a slew of Tier 1 data taggers begin the major, or level 1, general classification of each image (it’s a burn). Start in reverse chronological order for lack of a better approach, or, if the file names give you some kind of information about the image, sort by order of importance. Newer images will likely be clearer and more currently relevant.
    • Design the workflow so that the output of the level 1 classifications is segmented into queues for the Tier 2 specialists. In all likelihood, these folks will be working on something else full time and will be doing this in their spare time, so each has a dashboard that alerts them (with email as well) when there is a slug of images to review for level 2 classification. These folks serially classify each item over time.
    • An option on the previous step is to have each level 2 classification performed by two experts, each of whom must classify the item. This improves accuracy. When the expert classifications agree, final tagging happens automatically and either serves as approval, OR goes into an approval workflow that gives someone in organizational leadership or administration final say. If the two expert tags disagree, this would launch another review workflow and those images would be placed automatically into an exception file, by level 1 classification, for further disposition and agreement.
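The two-stage workflow above, including the dual-expert option, can be sketched as queues and an exception file. Everything here is a simplification under stated assumptions: the labels, file names, and the in-memory queues stand in for whatever workflow engine or dashboard actually drives the process, and I'm finalizing on agreement rather than modeling the optional leadership approval step.

```python
from collections import defaultdict

tier2_queues = defaultdict(list)  # level-1 label -> images awaiting specialists
exceptions = defaultdict(list)    # level-1 label -> disputed images
finalized = {}                    # image -> agreed level-2 tag

def tier1_classify(image, level1_label):
    """Tier 1 generalist assigns the broad label (it's a burn);
    the image lands in the matching specialist queue."""
    tier2_queues[level1_label].append(image)

def tier2_classify(image, level1_label, expert_a_tag, expert_b_tag):
    """Two Tier 2 experts classify independently. Agreement finalizes
    the tag; disagreement routes the image to the exception file,
    keyed by its level-1 classification, for further disposition."""
    tier2_queues[level1_label].remove(image)
    if expert_a_tag == expert_b_tag:
        finalized[image] = expert_a_tag        # could also require approval
    else:
        exceptions[level1_label].append(image)  # launch review workflow

# Agreement: tag is finalized automatically.
tier1_classify("img_0042.dcm", "burn")
tier2_classify("img_0042.dcm", "burn", "3rd degree", "3rd degree")

# Disagreement: image goes to the "burn" exception file.
tier1_classify("img_0043.dcm", "burn")
tier2_classify("img_0043.dcm", "burn", "2nd degree", "3rd degree")
```

The design point is the segmentation of labor: cheap generalist passes fill the queues, and the scarce specialists only ever see images pre-sorted into their own field.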

There are many variations to play on this process, and the devil is in the details of establishing the workflows to use technology to better the outcomes. Essentially what we’ve done is to segment labor to reduce cost and improve overall efficiency, particularly for the Tier 2 experts. The workflows, dashboards and exception reports make the process systematic, consistent, and easier to navigate. With a major reclamation project like this, the historical archiving is still going to feel like forcing a basketball through a garden hose. This process makes it seem like there is some grease on the basketball.