Opinion Piece

Data Commons: The Missing Infrastructure for Public Interest Artificial Intelligence

The data winter isn’t just a technical glitch. It’s a structural failure. What we urgently need is new infrastructure: data commons.

Published April 29, 2025
Source: ODPL · The GovLab

Stefaan Verhulst (Co-Founder of The GovLab and The Data Tank, Research Prof. NYU Tandon School of Engineering)

Burton Davis (Vice President and Deputy General Counsel, Intellectual Property Group at Microsoft)

Andrew Schroeder (Vice President of Research and Analysis for Direct Relief)

Artificial intelligence is celebrated as the defining technology of our time. From ChatGPT to Copilot and beyond, generative AI systems are reshaping how we work, learn, and govern. But behind the headline-grabbing breakthroughs lies a fundamental problem: the data these systems need to produce useful results that serve the public interest is increasingly out of reach.

Without access to diverse, high-quality datasets, AI models risk reinforcing bias, deepening inequality, and returning less accurate results. Yet access to data remains fragmented, siloed, and increasingly enclosed. What was once open, such as government records, scientific research, and public media, is now locked away by proprietary terms, outdated policies, or simple neglect. We are entering a data winter just as AI's influence over public life is heating up.

This isn’t just a technical glitch. It’s a structural failure. What we urgently need is new infrastructure: data commons.

A data commons is a shared pool of data resources—responsibly governed, managed using participatory approaches, and made available for reuse in the public interest. Done correctly, commons can ensure that communities and other networks have a say in how their data is used, that public interest organizations can access the data they need, and that the benefits of AI can be applied to meet societal challenges.

Commons offer a practical response to the paradox of data scarcity amid abundance. By pooling datasets across organizations—governments, universities, libraries, and more—they match data supply with real-world demand, making it easier to build AI that responds to public needs.

We’re already seeing early signs of what this future might look like. Projects like Common Corpus, MLCommons, and Harvard’s Institutional Data Initiative show how diverse institutions can collaborate to make data both accessible and accountable. These initiatives emphasize open standards, participatory governance, and responsible reuse. They challenge the idea that data must be either locked up or left unprotected, offering a third way rooted in shared value and public purpose.

But the pace of progress isn’t matching the urgency of the moment. While policymakers debate AI regulation, they often ignore the infrastructure that makes public interest applications possible in the first place. Without better access to high-quality, responsibly governed data, AI for the common good will remain more aspiration than reality.

That’s why we’re launching The New Commons Challenge, a call to action for universities, libraries, civil society, and technologists to build data ecosystems that fuel public-interest AI. The initiative will fund two winning projects with $100,000 each to prototype commons focused on critical areas such as disaster response and local decision-making. It has secured support from leaders in these fields: Direct Relief/CrisisReady and the Institutional Data Initiative at the Harvard Law School Library are serving as partners, and UNESCO is serving as an observer.

The New Commons Challenge aims to put this structural, community-guided approach at the forefront. Imagine AI tools that can help cities plan for floods, improve crisis response, or offer contextual guidance to public interest organizations, not just in well-off capitals but in underserved regions around the world. These applications are possible, but only if the right data is available to train and sustain them.

That’s why data commons, and The New Commons Challenge, are so essential.

If we want AI that works in the public’s interest, we need to invest in the infrastructure that makes it possible. A data commons is not a utopian idea. It’s a practical foundation for innovation that reflects—and serves—the diversity of human experience. The commons won’t build itself. But together, we can build it in time.