Data Detectives: Investigating What is, and What is Not, Measured
By Michael Clarke. Originally published in the Scholarly Kitchen on February 27, 2014
“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay.”
– Sherlock Holmes, from The Adventure of the Copper Beeches, 1892
Well over a century after Sir Arthur Conan Doyle’s famous detective lamented his lack of data, we live amidst a superabundance: Big Data, the Quantified Self, Evidence-Based Medicine, Article Level Metrics, and Fourth Paradigm Science, to name but a few of the data-related buzzwords that are (over)used today. Which is all to say, we sure do like to measure things. And for the most part all this measuring is a good thing. I’ll take medicine based on evidence rather than anecdote. I’m glad we have computers capable of processing the vast and complex datasets associated with climate modeling and particle physics. And I do like to know how far and how fast I go when I get on my bike, except at the beginning of the season when that information is typically a bit demoralizing.
Businesses are using more data than ever to inform decision making, though the truly large Big Data in the business world is limited to companies like Google, Facebook, Amazon and the like that have largely online products and services with user bases in the hundreds of millions. When you try to track all the ways those hundreds of millions of users interact with your products, services, and each other, it creates a lot of data. But as Richard Padley notes in a thoughtful recent article on the hype and substance of Big Data in STM and scholarly publishing, most businesses (and certainly most publishers) do not have to contend with the scale and complexity of truly Big Data.
While the technical challenges may be less daunting with smaller data sets, there remain challenges in interpreting data and in using it to make informed decisions. Perhaps the most daunting challenge is in understanding the limitations of the dataset: What is being measured and, just as importantly, what is not being measured? What inferences and conclusions can be drawn and what is mere conjecture? Where are the bricks and mortar solid and where does the foundation give way beneath our feet?
And while many organizations have created the position of Data Scientist to help answer these questions, I sometimes think that “Data Detective” may be the better term. In homage to Doyle, who gave us the greatest of detectives, I explore these issues through three cases, demonstrating how slippery and confounding data can be.