Sample Program
Below is a sample set of modules for the New Commons Incubator program. The modules are designed to be delivered over six months. Each module combines expert-led seminars with structured teamwork sessions, and the program culminates in a fundable proposal and operational roadmap. Module content will be adapted to the needs and composition of each cohort, in coordination with our steering committee of Indigenous language experts.
Framing the Opportunity/Challenge
Goal: Define the specific AI use case, establish the case for a data commons, and ground the initiative in Indigenous Data Sovereignty from day one.
Key Topics:
- Data commons in the AI era: why they matter for low-resource Indigenous languages
- What AI applications look like for Indigenous languages: speech recognition (ASR), text-to-speech (TTS), machine translation, LLM fine-tuning, and search, along with the data and governance each one demands
- Problem definition and question fluency: moving from a broad challenge to tractable priorities
- Indigenous Data Sovereignty and the CARE principles (Collective Benefit, Authority to Control, Responsibility, Ethics)
- Understanding Indigenous needs, values, and histories as a foundation, not an afterthought
Establishing Community Benefit and Social License
Goal: Operationalize the CARE principle of Collective Benefit before data and technical decisions are made, ensuring that AI applications serve Indigenous communities rather than extract from them.
Key Topics:
- How AI applications can reinforce or undermine Indigenous community goals: concrete examples and red lines
- Community engagement mechanisms and consultation protocols appropriate to different Indigenous contexts
- Social licensing for data reuse and AI development: what meaningful, ongoing consent looks like in practice
- Self-determination, economic benefit, and community ownership models for AI-generated value
- Contextual factors: land rights, intergenerational trauma, language politics, and institutional trust
Mapping the Data Landscape: Supply, Risk, and Due Diligence
Goal: Identify and assess the data needed to support the commons, with rigorous attention to risks, legal exposure, and the due diligence required before any data is ingested or shared.
Key Topics:
- What AI applications actually require: types of language data (audio, text, metadata, annotations), minimum volumes, quality thresholds, and format standards for Indigenous language AI
- FAIR-R principles and AI readiness for low-resource language data
- Risk assessment: identifying cultural, legal, reputational, and technical risks associated with specific datasets, partners, and AI use cases
- Due diligence for data partnerships: evaluating potential data holders on provenance, consent history, community authorization, and alignment with Indigenous Data Sovereignty (IDS) principles
- Legal and governance barriers: copyright, community IP, institutional gatekeeping, and cross-border data considerations
- Minimum viable data approaches: how to build useful AI tools without waiting for perfect datasets
Designing Governance and Accountability
Goal: Design governance models that embed Indigenous Data Sovereignty, define who controls which AI is built, and ensure trusted, long-term collaboration.
Key Topics:
- AI governance as Indigenous governance: who decides which models are trained, on what data, and for which purposes, as well as who can revoke access
- Collective governance models for data commons: structures, decision rights, and accountability mechanisms
- Data stewardship roles and responsibilities across the data lifecycle
- Authority, legitimacy, and control: tiered access models, community veto rights, and sunset clauses
- Embedding risk management in governance: translating the risk register from the data-mapping stage into enforceable policies, audit rights, and remediation processes
Technical Architecture and AI Application Design
Goal: Design the technical infrastructure needed to build, govern, and operate the commons, and to power AI applications in ways communities can control and sustain.
Key Topics:
- AI application types for Indigenous languages: ASR, TTS, machine translation, LLM fine-tuning, and retrieval-augmented generation (RAG), with the technical requirements and community implications of each
- Technical architecture options: centralized, federated, and on-premises models aligned with sovereignty principles
- Data access controls that encode governance decisions: community-controlled licensing, tiered access APIs, and audit logs
- Interoperable standards, schemas, and shared vocabularies for Indigenous language data
- Open-source tooling and frameworks: what is available, what gaps remain, and how to build capacity locally
Sustainability, Funding, and Implementation
Goal: Build a viable, fundable pathway to long-term operation, grounded in institutional reality rather than short-term pilot logic.
Key Topics:
- Turning the incubator work into a proposal: what funders need to see (problem clarity, community legitimacy, technical viability, governance rigor, and sustainability)
- Funding landscape: philanthropic, public sector, multilateral (e.g., UNESCO and other UN bodies), and revenue-generating models for Indigenous language and culture data commons
- Sustainability models: institutional hosting, community ownership, and earned revenue from AI applications
- Institutional incentives and barriers: what motivates universities, governments, and tech organizations to participate, and what holds them back
- Scaling and replication: designing from the start for broader uptake across language communities
Build your fundable proposal.
The six-month arc moves from framing the opportunity to a fundable proposal and operational roadmap. Apply to bring your data commons idea into the cohort.