Creating a Cartridge

This guide walks you through creating a knowledge cartridge from scratch. By the end, you'll have a cartridge with extracted, reconciled, and curated knowledge nodes ready for Rai to use.

Prerequisites¶

A corpus of markdown documents (the knowledge you want to teach Rai)
An LLM API key (OpenRouter recommended — supports 200+ models at low cost)
RaiSE CLI installed (pip install raise-cli)

Step 1: Initialize the Cartridge¶

Create the cartridge directory and manifest:

mkdir -p .raise/cartridges/my-cartridge/extractors
mkdir -p .raise/cartridges/my-cartridge/instances

Create CARTRIDGE.yaml:

name: my-cartridge
display_name: "My Project Knowledge"
version: "1.0.0"
author: "Your Name"
license: "Apache-2.0"
tier: open
description: >
  Domain knowledge for my project.

schema:
  module: raise_core.graph.models
  class_name: GraphNode

corpus:
  - ../../docs/**/*.md

requires:
  llm: any

Step 2: Organize the Corpus¶

Your corpus is the collection of markdown documents that contain the knowledge. Good corpus documents:

Cover one topic per document
Use heading structure (h2 sections become extraction chunks)
Contain concrete definitions, rules, and relationships — not just prose
Are the source of truth (not copies of something maintained elsewhere)

Typical corpus organization:

docs/
  concepts/       # what things are and why they matter
  guides/         # how to do things
  reference/      # precise specifications
  decisions/      # architectural decisions (ADRs)

Step 3: Configure Extractors¶

Create extractors/config.yaml to tell the LLM extractor which files to process:

extractors:
  - name: core-concepts
    type: llm
    sources:
      - ../../docs/concepts/*.md
    node_type: knowledge

  - name: guides
    type: llm
    sources:
      - ../../docs/guides/*.md
    node_type: practice

Each extractor entry defines: - name — identifier for this extraction run - type — llm for LLM-powered extraction, yaml or markdown for structured file extraction - sources — glob patterns for source files (relative to the cartridge directory) - node_type — default type for extracted nodes (the LLM may assign more specific types like concept, guardrail, workflow)

Step 4: Set Up Your LLM Provider¶

The LLMExtractor works with any OpenAI-compatible API. Set your API key:

# OpenRouter (recommended — cheapest for batch extraction)
export OPENROUTER_API_KEY="your-key"

# Or OpenAI directly
export OPENAI_API_KEY="your-key"

# Or any compatible provider
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://your-provider/v1"

Default model: google/gemini-2.5-flash-lite via OpenRouter (~$0.01 per cartridge extraction).

Step 5: Extract¶

Run the extraction:

rai cartridge extract my-cartridge

This processes each source file through the pipeline:

Chunk — split by h2 headings (configurable via heading_level in extractor config)
Prompt — send each chunk to the LLM with extraction instructions
Parse — validate the JSON response against the GraphNode Pydantic model
Collect — write all nodes to instances/{extractor-name}.json

Monitor the output for warnings about failed chunks or validation errors.

Step 6: Reconcile¶

Check the extracted nodes for structural issues:

rai cartridge validate my-cartridge

The reconciler detects:

Phantom targets — relationships pointing to nodes that don't exist
Orphan nodes — nodes with no incoming or outgoing relationships
Cross-category edges — relationships between nodes in different categories (may indicate misclassification)

Address issues by re-extracting problematic files or proceeding to curation.

Step 7: Curate¶

Review extracted nodes with the HITL curation workflow:

# Start a curation session
rai cartridge curate my-cartridge start

# Check progress
rai cartridge curate my-cartridge status

# Review current node — then accept, reject, or skip
rai cartridge curate my-cartridge accept --reason "accurate extraction"
rai cartridge curate my-cartridge reject --reason "changelog noise, not a real entity"
rai cartridge curate my-cartridge skip

# Write curated output
rai cartridge curate my-cartridge write

Curation state persists to disk — you can close your session and resume later. The curation session tracks:

Which nodes you've reviewed
Your decision for each (accepted, rejected, edited)
A hash of the original node (detects if re-extraction changed it)

Step 8: Verify and Ship¶

After curation, verify the final cartridge:

rai cartridge validate my-cartridge
rai cartridge list

Your cartridge is now ready. The instances/ directory contains the curated knowledge nodes that Rai loads into the knowledge graph at session start.

Tips¶

Start small — extract a few files first, review quality, iterate the corpus before extracting everything
Heading structure matters — the chunker splits on h2 by default. If your docs use h3 as primary sections, set heading_level: 3 in the extractor config
One concept per section — sections that cover multiple topics produce lower-quality extractions
Skip noise — the LLM prompt already filters changelogs, config blocks, and boilerplate. If you see noise in extraction, improve the source document rather than post-filtering
Re-extract freely — extraction costs ~$0.01 per cartridge with Gemini Flash Lite. Iterate until quality is right