The Most Expensive Question in Data Projects

The meeting always starts the same way. New project, data needed, tight deadline. Someone asks: "What data do we already have?" Within minutes, the room is problem-solving. "I think marketing has something in their system." "I can pull some numbers from our CRM." "Let me send over a spreadsheet." Everyone wants to deliver. The project already has a dozen dependencies, the timeline is tight, and nobody wants data to be the thing that adds another delay.

So the team works with what's available. And from that moment, the most expensive decision in the project has already been made.

I've watched this pattern for nearly twenty years. The scope gets shaped around whatever data is accessible. Data that almost fits the need gets forced into service. Workarounds fill the gaps. Nothing is as permanent as a temporary solution, and these workarounds become fixtures. And the pattern repeats, because incentives reward it. Every project is measured on its own budget and timeline. Fixing a predecessor's workaround is always more expensive than creating a new one. So every project adds to the stack of ad hoc solutions rather than reducing it. The result is the opposite of a data flywheel: more projects don't produce better data. They produce more dysfunction. Rational at the project level. Destructive at the organizational level.

This isn't an argument for letting perfect be the enemy of good. Good enough matters enormously. The question is whether you defined what "good enough" actually looks like before you started building.

Think of it like a construction project. Grab whatever materials are in the barn and start hammering, and you'll get a structure. But without a blueprint, every wall constrains the next decision. A foundation poured in the wrong spot doesn't just limit this build. It limits every future addition. Draw up the design first, and you can still start with materials on hand. The difference is that every early decision is a deliberate trade-off, not an accidental commitment.

The fix isn't to stop starting quickly. It's to understand what you need before deciding where to start.

In practice, this means mapping the full conceptual data requirements before touching a single dataset. What information does this initiative actually need to succeed? What quality levels does each element require? What does "fit for purpose" mean here? These are not technical questions. They're business questions that determine whether the data work delivers lasting value or creates expensive rework.

Once you have that map, you can sequence the work in three steps. First, capture immediate value: data that's already available and fit for purpose. This is your quick win, and on the surface it looks identical to the "just start" approach. The difference is that you know it's a deliberate first move rather than the entire plan, one that won't create dependencies or hidden costs further down the road.

Second, invest in data that needs work. Some of what you need exists but isn't fit for purpose: quality gaps, missing attributes, inconsistent definitions. Some doesn't exist yet and needs to be sourced. A simple matrix of availability against fitness for purpose shows exactly what each data element requires.
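
To make that matrix concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the element names, the fields, and the action labels are illustrations of the idea, not a prescribed implementation. The point is only that each quadrant of availability against fitness maps to one of the sequencing steps.

```python
# Hypothetical sketch of the availability-vs-fitness matrix.
# Names and sample elements are illustrative, not a real schema.

from dataclasses import dataclass

@dataclass
class DataElement:
    name: str
    available: bool        # does the data exist somewhere we can reach?
    fit_for_purpose: bool  # does it (or a known source) meet the quality bar defined upfront?

def next_action(element: DataElement) -> str:
    """Map each quadrant of the matrix to a step in the sequence."""
    if element.available and element.fit_for_purpose:
        return "use now (step 1: capture immediate value)"
    if element.available:
        return "invest (step 2: close quality gaps, fix definitions)"
    if element.fit_for_purpose:
        return "source it (step 2: a known source meets the bar; acquire it)"
    return "define, then source (step 2: specify quality before acquiring)"

# Hypothetical data elements for a pricing project.
elements = [
    DataElement("customer_id", available=True, fit_for_purpose=True),
    DataElement("order_margin", available=True, fit_for_purpose=False),
    DataElement("competitor_prices", available=False, fit_for_purpose=False),
]

for element in elements:
    print(f"{element.name}: {next_action(element)}")
```

The output is just the matrix read aloud: each element lands in one quadrant, and the quadrant dictates whether it belongs in the quick win or in the investment backlog.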

Third, plan for the stretch. Where can this data create value beyond the current project? What additional requirements does broader reuse introduce? This is where the compounding effect lives. Every dataset built to a reusable standard adds depth and breadth to the organization's data foundation. It doesn't just serve one project. It expands what becomes possible for the next one.

Mapping requirements upfront doesn't slow the project down. Not when it's immediately followed by a pragmatic first step. It does ensure that what you build first doesn't make what comes next more expensive.

Back to that first meeting. Same room, same deadline pressure, same desire to deliver. But one different question changes the entire trajectory. Not "what data do we have?" but "what data do we need?" That question requires a data leader in the room (I explored why that role just became the most strategic seat at the table in a previous essay). Not to discuss tables and pipelines, but to bridge business objectives and data requirements before the conversation turns technical. Same pragmatic start. Fundamentally different destination. One optimizes for getting going. The other optimizes for getting there.

The most expensive question in data projects isn't about technology, tooling, or talent. It's the one asked in that first meeting. Get it right, and pragmatism stops being the enemy of quality. It becomes the vehicle for it.
