Creating a Cartridge
This guide walks you through creating a knowledge cartridge from scratch. By the end, you'll have a cartridge with extracted, reconciled, and curated knowledge nodes ready for Rai to use.
Prerequisites
- A corpus of markdown documents (the knowledge you want to teach Rai)
- An LLM API key (OpenRouter recommended — supports 200+ models at low cost)
- RaiSE CLI installed (pip install raise-cli)
Step 1: Initialize the Cartridge
Create the cartridge directory and manifest:
mkdir -p .raise/cartridges/my-cartridge/extractors
mkdir -p .raise/cartridges/my-cartridge/instances
Create CARTRIDGE.yaml:
name: my-cartridge
display_name: "My Project Knowledge"
version: "1.0.0"
author: "Your Name"
license: "Apache-2.0"
tier: open
description: >
  Domain knowledge for my project.
schema:
  module: raise_core.graph.models
  class_name: GraphNode
corpus:
  - ../../docs/**/*.md
requires:
  llm: any
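Before moving on, it can help to sanity-check the manifest. The sketch below is a hypothetical helper, not part of the RaiSE CLI: `check_manifest` takes the manifest as an already-parsed dictionary and assumes that every top-level key in the example above is required, which this guide does not state.

```python
import re

# Top-level keys copied from the example manifest. Treating all of them
# as required is an assumption made for this sketch.
EXPECTED_KEYS = (
    "name", "display_name", "version", "author", "license",
    "tier", "description", "schema", "corpus", "requires",
)
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in manifest]
    version = manifest.get("version")
    if version is not None and not SEMVER.match(str(version)):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")
    corpus = manifest.get("corpus")
    if corpus is not None and not (isinstance(corpus, list) and corpus):
        problems.append("corpus should be a non-empty list of glob patterns")
    return problems


# The example manifest from this step, written as a dictionary.
manifest = {
    "name": "my-cartridge",
    "display_name": "My Project Knowledge",
    "version": "1.0.0",
    "author": "Your Name",
    "license": "Apache-2.0",
    "tier": "open",
    "description": "Domain knowledge for my project.\n",
    "schema": {"module": "raise_core.graph.models", "class_name": "GraphNode"},
    "corpus": ["../../docs/**/*.md"],
    "requires": {"llm": "any"},
}
print(check_manifest(manifest))
```

Keeping the version quoted in the YAML matters here: an unquoted value such as 1.0 would be parsed as a number rather than a string.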
Step 2: Organize the Corpus
Your corpus is the collection of markdown documents that contain the knowledge. Good corpus documents:
- Cover one topic per document
- Use heading structure (h2 sections become extraction chunks)
- Contain concrete definitions, rules, and relationships — not just prose
- Are the source of truth (not copies of something maintained elsewhere)
Typical corpus organization:
docs/
  concepts/    # what things are and why they matter
  guides/      # how to do things
  reference/   # precise specifications
  decisions/   # architectural decisions (ADRs)
Step 3: Configure Extractors
Create extractors/config.yaml to tell the LLM extractor which files to process:
extractors:
  - name: core-concepts
    type: llm
    sources:
      - ../../docs/concepts/*.md
    node_type: knowledge
  - name: guides
    type: llm
    sources:
      - ../../docs/guides/*.md
    node_type: practice
Each extractor entry defines:
- name — identifier for this extraction run
- type — llm for LLM-powered extraction, yaml or markdown for structured file extraction
- sources — glob patterns for source files (relative to the cartridge directory)
- node_type — default type for extracted nodes (the LLM may assign more specific types like concept, guardrail, workflow)
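Because the source patterns are relative to the cartridge directory, it is worth confirming which files they reach before spending API calls. The following is a minimal sketch using only the Python standard library; `resolve_sources` is a hypothetical helper, and the assumption that the extractor expands patterns the same way as `glob` with `recursive=True` is not confirmed by this guide.

```python
import glob
import os


def resolve_sources(cartridge_dir: str, patterns: list[str]) -> list[str]:
    """Expand glob patterns relative to cartridge_dir into sorted, normalized file paths."""
    matches: set[str] = set()
    for pattern in patterns:
        # Join first so that leading ".." segments are resolved from the cartridge directory.
        full_pattern = os.path.join(cartridge_dir, pattern)
        for path in glob.glob(full_pattern, recursive=True):
            if os.path.isfile(path):
                # normpath collapses the ".." segments and lets duplicates merge.
                matches.add(os.path.normpath(path))
    return sorted(matches)


# Prints one path per matched file; prints nothing if the patterns match no files.
for path in resolve_sources(".raise/cartridges/my-cartridge", ["../../docs/concepts/*.md"]):
    print(path)
```

If the list comes back empty, check how many `..` segments the pattern needs to climb from the cartridge directory to your docs folder.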
Step 4: Set Up Your LLM Provider
The LLMExtractor works with any OpenAI-compatible API. Set your API key:
# OpenRouter (recommended — cheapest for batch extraction)
export OPENROUTER_API_KEY="your-key"
# Or OpenAI directly
export OPENAI_API_KEY="your-key"
# Or any compatible provider
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://your-provider/v1"
Default model: google/gemini-2.0-flash-lite-001 via OpenRouter (~$0.01 per cartridge extraction).
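The three configurations above can be read as a simple lookup over environment variables. This sketch shows one plausible way to turn them into an endpoint and key. The precedence (OpenRouter first, then OpenAI or a compatible provider) and the `pick_provider` helper are assumptions for illustration, not the documented behaviour of LLMExtractor.

```python
from typing import Mapping, NamedTuple


class Provider(NamedTuple):
    base_url: str
    api_key: str


# Public API base URLs for the two named providers.
OPENROUTER_URL = "https://openrouter.ai/api/v1"
OPENAI_URL = "https://api.openai.com/v1"


def pick_provider(env: Mapping[str, str]) -> Provider:
    """Choose an endpoint and key from environment-style variables.

    Assumed precedence (not confirmed by the guide): OPENROUTER_API_KEY wins,
    otherwise OPENAI_API_KEY with an optional OPENAI_BASE_URL override.
    """
    if env.get("OPENROUTER_API_KEY"):
        return Provider(OPENROUTER_URL, env["OPENROUTER_API_KEY"])
    if env.get("OPENAI_API_KEY"):
        return Provider(env.get("OPENAI_BASE_URL", OPENAI_URL), env["OPENAI_API_KEY"])
    raise RuntimeError("set OPENROUTER_API_KEY or OPENAI_API_KEY")


# A compatible provider, mirroring the last pair of exports above.
print(pick_provider({
    "OPENAI_API_KEY": "your-key",
    "OPENAI_BASE_URL": "https://your-provider/v1",
}))
```

In a real script you would pass `os.environ` instead of a literal dictionary.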
Step 5: Extract
Run the extraction:
This processes each source file through the pipeline:
- Chunk: split by h2 headings (configurable via heading_level in extractor config)
- Prompt: send each chunk to the LLM with extraction instructions
- Parse: validate the JSON response against the GraphNode Pydantic model
- Collect: write all nodes to instances/{extractor-name}.json
Monitor the output for warnings about failed chunks or validation errors.
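To see why heading structure affects extraction, here is a minimal sketch of the Chunk step. It is not the RaiSE chunker: `chunk_by_heading` is a hypothetical function, and the choices to end a chunk at the next heading of the same or a higher level and to drop text before the first matching heading are assumptions.

```python
import re


def chunk_by_heading(markdown: str, heading_level: int = 2) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) chunks at headings of exactly heading_level.

    A chunk ends at the next heading of the same or a higher level. Deeper
    headings stay inside their parent chunk. Headings inside fenced code
    blocks are not special-cased in this sketch.
    """
    # Match ATX headings from level 1 up to heading_level, but nothing deeper.
    pattern = re.compile(
        rf"^(#{{1,{heading_level}}})(?!#)[ \t]+(.+?)[ \t]*$", re.MULTILINE
    )
    matches = list(pattern.finditer(markdown))
    chunks = []
    for index, match in enumerate(matches):
        if len(match.group(1)) != heading_level:
            continue  # a higher-level heading only acts as a boundary
        end = matches[index + 1].start() if index + 1 < len(matches) else len(markdown)
        chunks.append((match.group(2), markdown[match.end():end].strip()))
    return chunks


doc = """# Cartridges

Intro text.

## Manifest

Describes the cartridge.

### Fields

name, version

## Corpus

Source documents.
"""

for title, body in chunk_by_heading(doc):
    print(title, "->", len(body), "characters")
```

With the default level, the h3 section "Fields" travels inside the "Manifest" chunk; with `heading_level=3` it becomes the only chunk, which is why the setting should match how your docs are actually structured.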
Step 6: Reconcile
Check the extracted nodes for structural issues:
The reconciler detects:
- Phantom targets — relationships pointing to nodes that don't exist
- Orphan nodes — nodes with no incoming or outgoing relationships
- Cross-category edges — relationships between nodes in different categories (may indicate misclassification)
Address issues by re-extracting problematic files or proceeding to curation.
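The three checks can be illustrated on a toy graph. This is a sketch under stated assumptions rather than the reconciler itself: the node shape (an `id`, a `category`, and a `relationships` list whose entries carry a `target`) is hypothetical, and a relationship to a missing target is assumed not to count as a connection when looking for orphans.

```python
def reconcile(nodes: list[dict]) -> dict[str, list]:
    """Report phantom targets, orphan nodes, and cross-category edges."""
    by_id = {node["id"]: node for node in nodes}
    connected: set[str] = set()
    phantom_targets = []
    cross_category_edges = []
    for node in nodes:
        for rel in node.get("relationships", []):
            target = by_id.get(rel["target"])
            if target is None:
                # The edge points at a node that was never extracted.
                phantom_targets.append((node["id"], rel["target"]))
                continue
            connected.update((node["id"], target["id"]))
            if node["category"] != target["category"]:
                cross_category_edges.append((node["id"], target["id"]))
    orphan_nodes = [node["id"] for node in nodes if node["id"] not in connected]
    return {
        "phantom_targets": phantom_targets,
        "orphan_nodes": orphan_nodes,
        "cross_category_edges": cross_category_edges,
    }


nodes = [
    {"id": "cartridge", "category": "concept",
     "relationships": [{"target": "manifest"}, {"target": "registry"}]},
    {"id": "manifest", "category": "concept", "relationships": []},
    {"id": "curate-first", "category": "practice",
     "relationships": [{"target": "cartridge"}]},
    {"id": "changelog", "category": "concept", "relationships": []},
]
report = reconcile(nodes)
for issue, items in report.items():
    print(issue, items)
```

In this toy graph, "registry" is a phantom target, "changelog" is an orphan, and the edge from "curate-first" to "cartridge" crosses categories.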
Step 7: Curate
Review extracted nodes with the HITL curation workflow:
# Start a curation session
rai cartridge curate my-cartridge start
# Check progress
rai cartridge curate my-cartridge status
# Review current node — then accept, reject, or skip
rai cartridge curate my-cartridge accept --reason "accurate extraction"
rai cartridge curate my-cartridge reject --reason "changelog noise, not a real entity"
rai cartridge curate my-cartridge skip
# Write curated output
rai cartridge curate my-cartridge write
Curation state persists to disk — you can close your session and resume later. The curation session tracks:
- Which nodes you've reviewed
- Your decision for each (accepted, rejected, edited)
- A hash of the original node (detects if re-extraction changed it)
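The hash is what lets a session notice that a node changed after you reviewed it. A minimal sketch of the idea follows, assuming SHA-256 over a canonical JSON form. The actual hashing scheme and state file layout are not described in this guide, and `node_hash`, `record_decision`, and `needs_review` are hypothetical names.

```python
import hashlib
import json


def node_hash(node: dict) -> str:
    """Hash a node's canonical JSON form so that key order does not matter."""
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_decision(state: dict, node: dict, decision: str, reason: str = "") -> None:
    """Store the decision together with the hash of the node as it was reviewed."""
    state[node["id"]] = {"decision": decision, "reason": reason, "hash": node_hash(node)}


def needs_review(state: dict, node: dict) -> bool:
    """True if the node was never reviewed or has changed since its review."""
    entry = state.get(node["id"])
    return entry is None or entry["hash"] != node_hash(node)


state: dict = {}
node = {"id": "cartridge", "type": "concept", "content": "A packaged set of knowledge nodes."}
record_decision(state, node, "accepted", "accurate extraction")

# Simulate a re-extraction that rewrote the node's content.
re_extracted = dict(node, content="A versioned package of curated knowledge nodes.")
print(needs_review(state, node), needs_review(state, re_extracted))
```

Writing `state` to disk with `json.dump` would be enough to resume a session later under this scheme.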
Step 8: Verify and Ship
After curation, verify the final cartridge:
Your cartridge is now ready. The instances/ directory contains the curated knowledge nodes that Rai loads into the knowledge graph at session start.
Tips
- Start small: extract a few files first, review the quality, and iterate on the corpus before extracting everything
- Heading structure matters: the chunker splits on h2 by default. If your docs use h3 as primary sections, set heading_level: 3 in the extractor config
- One concept per section: sections that cover multiple topics produce lower-quality extractions
- Skip noise: the LLM prompt already filters changelogs, config blocks, and boilerplate. If you see noise in the extraction, improve the source document rather than post-filtering
- Re-extract freely: extraction costs ~$0.01 per cartridge with Gemini Flash Lite. Iterate until the quality is right