The uncertain uncertainty with Data

How managing uncertainty can look different with Data

Michelle-Joy Low
reecetech

--

“When will we have our <insert data-backed solution here> deliverable?”

Many of us thrive on certainty; in most, if not all, aspects of life, having a view of some foreseeable future invokes a sense of tranquility and blissful order. When it comes to technology, bringing control into uncertain environments is the raison d'être of planning frameworks, and through the years businesses have embraced Waterfall, Agile, Lean, Lean-Agile, Lean-Waterfall (I jest)…the list goes on.

Motivating these frameworks is the need to demonstrate progress at regular, predictable intervals while pursuing a goal for which the path remains uncertain; however, our grasp of uncertainty and how to manage it is constantly shifting. In the world of Waterfall, it was believed uncertainty could be planned away, resulting in beautiful linear roadmaps rendered obsolete by new knowledge even as they were being drawn. As an industry, our solution was to embrace the comfort of Agile, where progress would be portioned into bite-sized iterations, and plans re-drawn at the end of each cycle.

Delivering in a data landscape confronts these paradigms. At Reece, we’re committed to becoming a data-forward organisation, where data is not a by-product of our operations, but a first class consideration woven into every corner of our business. In working towards this goal, we’ve faced the question: why is delivering in data so hard?

Source: https://xkcd.com/2370/

The certainty conundrum in data

In most places, everyone knows with near absolute certainty the value of using data:

  • Descriptive: Imagine what we could do if we had a real-time feed of metrics from across the business, enacting the interventions needed to keep an experience seamless, such as deploying a back-up delivery vehicle when a breakdown occurs.
  • Predictive: How great would the capability be to predict demand and adjust stock and inventory levels accordingly, across 150,000 product SKUs, in an always-on fashion?
  • 𝙼̶𝚘̶𝚘̶𝚗̶𝚜̶𝚑̶𝚘̶𝚝̶𝚜̶ Prescriptive: What if we could do what DeepMind did with their data centres and have AI optimise our pricing, logistics, or any other part of the business — better than humans can?

But while the value of using data seems obvious, the path to delivering that value is woefully uncertain. This stands in stark contrast to the more typical product delivery dilemma, where the value proposition (or product-market fit) is at least as uncertain as the delivery itself.

This conundrum is not well understood; many assume that if the value of a data initiative is certain, its delivery should be equally predictable. But more often than not, stakeholders struggle to see the imagined value materialise in the way originally envisioned, and so the industry is stocking up on frameworks for controlling data delivery uncertainty. Underpinning these developments is the nature of the uncertainty that working with data entails.

Sources of variance

There are three main sources of variance in outcomes when working with data. Two of these, engineering feasibility and product-user fit, are familiar to most technologists but take an adapted form in data. The third, whether the data can actually deliver the envisioned value (call it data-usage fit), is unique to data.

  1. Engineering feasibility is the least menacing of the three. Much of the difficulty with building data products arises because the enabling technologies are relatively new. As a community, practitioners have yet to settle on a standard toolkit for tackling the myriad data problems to be solved. While this is a challenge for the present, in time I expect the industry will converge on a few tried and tested platforms, and the uncertainty arising from engineering feasibility will become manageable.
  2. Finding product-user fit is a source of variance due to the difficulty of identifying a user/customer’s actual needs. Encapsulating this concisely is the quote dubiously attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses”. Developing truly innovative products requires teams to understand customer problems better than the customers themselves. It makes sense, therefore, that this variability can and should be managed through the rigour we apply to defining, measuring, and testing user experiences.
  3. Consumption-related uncertainty is where data introduces a twist on the familiar theme of product-user fit: the certainty (see above) that future customers, users, and stakeholders have about what they want, and about what they believe the data will tell them. For instance, when developing a data solution that provides decision support, like the ubiquitous customer churn model, certainty of the model’s value evaporates when customers realise that the data and model tell them nothing of how to prevent churn.

This third source, whether the available data is fit for consumption, is I think the hardest to manage. Let’s delve into two aspects.

How data is decisioned upon: a shifting target

Compounding the issue is that the value of data is inextricably linked to how decisions are made, and decision paths are almost always subjective. In our customer churn example, the success of a model-backed decision hinges on multiple layers of complexity: the model must back-test well, produce an actionable interpretation, and inform mitigation actions that produce the intended results. Lord forbid the model identifies ‘gender’ as a critical ‘driver’ and, combined with an under-considered decisioning process, ends up triggering a retention campaign that discriminates against a specific group of customers.
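To make that guard-rail concrete, here is a minimal sketch of checking a churn model’s top ‘drivers’ for sensitive attributes before they feed a decisioning process. The attribute names, importance scores, and the notion of a top-N cut-off are all hypothetical, illustrative choices rather than any team’s actual process:

```python
# Hypothetical guard-rail: before a model's output feeds a retention
# campaign, flag any sensitive attribute that ranks among its top 'drivers'.
SENSITIVE_ATTRIBUTES = {"gender", "age", "postcode"}

def flag_sensitive_drivers(importances, top_n=5):
    """Return sensitive features appearing in the model's top-N drivers.

    `importances` maps feature name -> importance score (any scale).
    """
    top = sorted(importances, key=importances.get, reverse=True)[:top_n]
    return [feature for feature in top if feature in SENSITIVE_ATTRIBUTES]

# Illustrative importances for a churn model (made-up numbers)
importances = {"tenure": 0.31, "gender": 0.22, "spend": 0.18,
               "support_calls": 0.15, "region": 0.14}
print(flag_sensitive_drivers(importances))  # ['gender']
```

A cheap check like this is no substitute for a considered decisioning process, but it shows how some of that subjectivity can be surfaced early rather than discovered in production.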

This subjectivity applies to simple and complex scenarios alike; be it the use of static dashboards or the deployment of automated intelligent technologies, the uncertainty of decisions taken is rarely controllable at the outset.

Doing everything right but achieving nothing

Also peculiar about data is that it is wholly possible to build to specification yet end up with no results; we are entirely beholden to what’s actually in the data. Unlike software engineering, where obeying the SOLID principles dependably leads to manageable, modularised code, no framework can ever guarantee that a given dataset will yield valuable insight. For example, it is a common occurrence in rules mining that the existing datasets simply do not contain any differentiating information, and that, through no fault of the analyst, the only learning by the end of it is that we’ve not learned much at all.
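As an illustration of a dataset carrying no differentiating information, here is a small sketch that computes empirical mutual information between a feature and an outcome from scratch (the data is invented; in practice you would reach for a library rather than hand-roll this):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete series."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # joint counts
    px, py = Counter(xs), Counter(ys)   # marginal counts
    return sum(c / n * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A feature that is independent of the outcome carries ~0 bits of signal,
# no matter how diligently it is mined:
feature = ["A", "B", "A", "B"] * 50
outcome = ["churn", "churn", "stay", "stay"] * 50
print(round(mutual_information(feature, outcome), 6))  # 0.0
```

When a check like this comes back near zero across the board, no amount of modelling rigour will conjure insight out of the dataset; the honest learning is that there is nothing there to learn.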

What mining for insights can feel like. Photo by Riho Kroll on Unsplash

Add probabilistic thinking to lean-agile

At the centre of many recommended data-centric delivery approaches, I believe, is the adoption of a probabilistic mindset. Instead of focusing on how to iterate faster, a probabilistic mindset fosters the discipline of checking whether each iteration improves your likelihood of arriving at your intended destination. Two concepts are particularly helpful:

  • Bayesian thinking: using all relevant prior information to evaluate and prioritise work each sprint. But ‘prior information’, in the context of data-driven decisioning, is, as we have discussed, subjective, and can lead to vastly different results. For instance, developers who take a ‘gimme-requirements-once-only’ approach tend, in my experience, to underperform those who consult with re-litigating discipline to understand which decisioning patterns reflect true need. Similarly, teams who omit spikes and research from their ML release plans almost never find themselves delivering on schedule or scope. So when it comes to data-intensive solutions, how a team shapes their ‘priors’ matters.
  • Asymmetry is also of note, notably optimism bias. It is not uncommon for practitioners, caught between the rock of uncertainty and the hard place of stakeholders hunting down a model ETA, to acquiesce to a hopeful estimate, only to disappoint everyone later when the model reveals unactionable inference. It would help, therefore, to reduce such asymmetries in the first place, be it through deeper engagement to refine hypotheses, or adapted risk mitigation strategies. Ultimately, controlling the impacts of asymmetries boils down to having the restraint to stick to uncertainty management disciplines regardless of how ‘agile’ a project feels.
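The Bayesian point above can be sketched as a toy Beta-Bernoulli update, where each sprint’s spike outcome nudges the team’s belief that the dataset holds usable signal. The prior and the sprint outcomes here are invented purely for illustration:

```python
def update_belief(alpha, beta, success):
    """Beta-Bernoulli update: fold one sprint's evidence into the prior."""
    return (alpha + 1, beta) if success else (alpha, beta + 1)

# Start mildly optimistic that the dataset holds usable signal: Beta(2, 2)
alpha, beta = 2, 2
sprint_outcomes = [True, False, True, True]  # hypothetical spike results

for success in sprint_outcomes:
    alpha, beta = update_belief(alpha, beta, success)
    print(f"P(signal exists) ~ {alpha / (alpha + beta):.2f}")
```

The mechanics are trivial; the discipline is in actually revising the estimate every sprint instead of defending the original roadmap.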

Modern iterative delivery methods were promulgated to help teams find product-user fit. But these practices must continue evolving: teams working with data must reorient their practices to account for the additional uncertainty, or struggle to remain commercially relevant.

But even as the need for change looms, probabilistic thinking helps organisations see uncertainty for what it really is: a risk that carries potential not just for disappointment but also for exceeding expectations, and, importantly, a call to proactively shift the odds in our favour. So, beyond shaping ways of working, our team at reecetech is architecting whole systems to make data fit-for-use. How are we doing this, you ask? Drop me a message, say hi and find out.

--

Michelle-Joy Low
reecetech

Econometrician, always curious, loves growing people, and helping businesses use data.