Download the DORA AI Capabilities report.
Capabilities

Healthy data ecosystems AI

The foundation for AI success

For years, organizations have understood that data is valuable. But in the era of AI-assisted software development, the quality of your data ecosystem has graduated from “valuable” to “existential.” DORA’s research into the State of AI-assisted Software Development reveals a critical truth: AI is an amplifier. It magnifies the strengths of high-performing organizations and the dysfunctions of struggling ones.

A healthy data ecosystem, defined by internal data that is high-quality, easily accessible, and unified, is a foundational capability that significantly amplifies the positive influence of AI adoption on organizational performance.

To unlock the full potential of AI, leaders must shift their perspective: data is no longer just a by-product of work; it is a strategic asset that must be curated, governed, and treated as a product.

DORA research found with a high degree of certainty that the positive benefits of AI adoption depend on organizations having healthy data ecosystems. When data health is high, AI’s positive influence on organizational performance is significantly amplified.

The AI Angle: Why data health matters now

The adage “garbage in, garbage out” has never been more relevant. Generative AI tools are only as effective as the context they are given.

In a healthy data ecosystem, internal data is robust and trustworthy, making it suitable for sophisticated AI tasks like fine-tuning foundational models or implementing Retrieval-augmented Generation (RAG) systems. When an AI tool can access high-quality, unified data, it moves from being a generic coding assistant to a specialized expert that understands your organization’s unique architecture and business logic.

Conversely, if your data environment is fragmented or low-quality, AI can accelerate the generation of incorrect or irrelevant outputs. As one research participant noted in the 2025 DORA report, many companies “aren’t even at a stage where they have their data properly organized… it’s like, spread out all across the company.” An AI connected to bad data will only produce bad answers, faster.

How to implement a healthy data ecosystem

Improving your data ecosystem requires a strategic investment akin to improving core technical capabilities like continuous integration. It is a journey of continuous improvement that combines technical implementation with cultural shifts.

Treat data as a product, not a by-product

The most significant shift is cultural. Transition from treating data as “exhaust” from applications to seeing it as a product with consumers. This means assigning ownership. Establish clear owners and stewards for critical data domains (for example, “customer account data”) who are responsible for its accuracy, metadata, and access policies.

Prioritize a “single source of truth”

Siloed data is a major blocker for AI. Teams should work to identify key data sources and consolidate or federate them to create a unified view. This might involve building workflows that automatically tag and surface data context during development, reducing the time developers—and AI agents—spend searching for information.

Implement data quality frameworks

Aligning with the principle of “building quality in,” organizations should introduce automated checks and monitoring for data. Just as you have automated tests for your code, you need automated validation for your data to ensure accuracy, completeness, and timeliness.

Democratize access (with governance)

High-quality data is useless if it is locked away. Use platforms that make it easy for teams and tools to discover and access the data they need. This directly supports the creation of AI-accessible internal data, allowing tools to provide company-specific context. However, this must be balanced with governance: create a “paved road” of secure, well-documented data pathways.

Document locally

Ideally, documentation and metadata should live alongside the data and be treated as a code artifact. For example, teams can maintain a “data” section in an application’s README.md that describes what data is created or consumed. This keeps documentation versioned and available to all consuming applications.

Common pitfalls

  • The “tool is the silo” trap: A mature data platform must manage different sources and schemas. Avoid letting specific tools dictate your data architecture. You must be willing to break with opinionated approaches from specific tools to meet a common ground for integration.
  • Data as a by-product: Failing to assign ownership or strategy to data results in “data swamps” that confuse both humans and AI. If no one owns the data quality, the data is effectively garbage.
  • Boiling the ocean: Don’t try to fix all data at once. Start with a pilot project bound to a specific high-value service or application.

Why this deserves investment

Investing in our data ecosystem is no longer just about ‘good hygiene’, it is the prerequisite for ROI on our AI investments. DORA research shows that AI is an amplifier; if we apply AI to our current fragmented data, we will simply scale our confusion. By treating our internal data as a product, ensuring it is accurate, accessible, and unified, we turn AI from a generic tool into a specialized expert that understands our business. This doesn’t just improve individual developer effectiveness; it is statistically proven to amplify organizational performance.

How to measure

Progress in data health should be assessed through a combination of perceptual measures and objective metrics.

Survey questions to ask your teams

  • Access: How easily can you access and analyze the internal data sources you need to complete your work?
  • Silos: How often are you unable to use important internal data because the data is siloed?
  • Quality: How would you rate the overall quality of the data you typically rely on for your work?
  • Searchability: If you need a particular data point, how likely is it that you can get a definitive answer within 1 hour of searching?

System metrics to track

  • Timeliness of data: The elapsed time it takes for a developer to acquire access to a necessary dataset.
  • Data incidents: The number of bugs or production incidents traced back to poor data quality.
  • Data freshness: Metadata analysis of “last updated” timestamps across key datasets.

More from DORA

Read more about healthy data ecosystems in the following sources:

What’s next?

  • Take the DORA quick check to get a baseline of your team’s software delivery performance.
Last updated: December 8, 2025