
Creating a Cartridge

This guide walks you through creating a knowledge cartridge from scratch. By the end, you'll have a cartridge with extracted, reconciled, and curated knowledge nodes ready for Rai to use.

Prerequisites

  • A corpus of markdown documents (the knowledge you want to teach Rai)
  • An LLM API key (OpenRouter recommended — supports 200+ models at low cost)
  • RaiSE CLI installed (pip install raise-cli)

Step 1: Initialize the Cartridge

Create the cartridge directory and manifest:

mkdir -p .raise/cartridges/my-cartridge/extractors
mkdir -p .raise/cartridges/my-cartridge/instances

Create CARTRIDGE.yaml in the cartridge directory (.raise/cartridges/my-cartridge/):

name: my-cartridge
display_name: "My Project Knowledge"
version: "1.0.0"
author: "Your Name"
license: "Apache-2.0"
tier: open
description: >
  Domain knowledge for my project.

schema:
  module: raise_core.graph.models
  class_name: GraphNode

corpus:
  - ../../docs/**/*.md

requires:
  llm: any
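Before extracting, it can help to sanity-check the manifest. The sketch below is hypothetical. It assumes the YAML has already been loaded into a dict (for example with a YAML library). The set of required keys and the MAJOR.MINOR.PATCH version check are assumptions for illustration, not the RaiSE CLI's actual validation rules.

```python
import re

# Hypothetical rules for this sketch; the RaiSE CLI's real checks may differ.
REQUIRED_KEYS = ("name", "version", "schema", "corpus")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_manifest(manifest: dict) -> list[str]:
    """Return problems found in a CARTRIDGE.yaml that has already been parsed to a dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]

    if "version" in manifest and not SEMVER.match(str(manifest["version"])):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {manifest['version']!r}")

    if "schema" in manifest:
        schema = manifest["schema"] or {}
        problems += [f"missing key: schema.{k}" for k in ("module", "class_name") if k not in schema]

    if "corpus" in manifest and not manifest["corpus"]:
        problems.append("corpus lists no glob patterns")
    return problems


# The manifest from this step, as a YAML library would load it.
manifest = {
    "name": "my-cartridge",
    "display_name": "My Project Knowledge",
    "version": "1.0.0",
    "tier": "open",
    "schema": {"module": "raise_core.graph.models", "class_name": "GraphNode"},
    "corpus": ["../../docs/**/*.md"],
    "requires": {"llm": "any"},
}
print(check_manifest(manifest))  # → []
```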

Step 2: Organize the Corpus

Your corpus is the collection of markdown documents that contain the knowledge. Good corpus documents:

  • Cover one topic per document
  • Use heading structure (h2 sections become extraction chunks)
  • Contain concrete definitions, rules, and relationships — not just prose
  • Are the source of truth (not copies of something maintained elsewhere)
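To spot documents that break the heading guideline before you spend an extraction run on them, a rough heuristic is enough. This minimal sketch assumes ATX-style headings ("## Title") and the default h2 split. The lint_corpus_doc helper is hypothetical and not part of the RaiSE CLI.

```python
import re

# Matches "## Title" but not "#" or "###" headings.
H2 = re.compile(r"^## +(.+?)\s*$")


def lint_corpus_doc(text: str) -> list[str]:
    """Flag heading-structure problems in one corpus document.

    Heuristic only: '##' lines inside fenced code blocks are not special-cased.
    """
    sections: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        match = H2.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)

    if not sections:
        return ["no h2 headings found (nothing for an h2 split to work with)"]
    return [
        f"empty section: {title}"
        for title, body in sections
        if not any(line.strip() for line in body)
    ]


sample = "# Caching\nOverview.\n\n## TTL\nEntries expire after 5 minutes.\n\n## Eviction\n"
print(lint_corpus_doc(sample))  # → ['empty section: Eviction']
```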

Typical corpus organization:

docs/
  concepts/       # what things are and why they matter
  guides/         # how to do things
  reference/      # precise specifications
  decisions/      # architectural decisions (ADRs)

Step 3: Configure Extractors

Create extractors/config.yaml to tell the LLM extractor which files to process:

extractors:
  - name: core-concepts
    type: llm
    sources:
      - ../../docs/concepts/*.md
    node_type: knowledge

  - name: guides
    type: llm
    sources:
      - ../../docs/guides/*.md
    node_type: practice

Each extractor entry defines:

  • name: identifier for this extraction run
  • type: llm for LLM-powered extraction; yaml or markdown for structured file extraction
  • sources: glob patterns for source files (relative to the cartridge directory)
  • node_type: default type for extracted nodes (the LLM may assign more specific types such as concept, guardrail, or workflow)
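These fields map naturally onto a small typed record. The sketch is illustrative only. The ExtractorEntry class and its checks are hypothetical, it assumes the config has already been parsed from YAML, and it enforces only what this step describes: the three type values and a non-empty sources list. The heading_level field is included because Step 5 and the Tips describe it as an extractor config option with an h2 default.

```python
from dataclasses import dataclass

# The extractor types described above.
VALID_TYPES = {"llm", "yaml", "markdown"}


@dataclass
class ExtractorEntry:
    """Hypothetical typed view of one entry in extractors/config.yaml."""
    name: str
    type: str
    sources: list[str]  # globs, relative to the cartridge directory
    node_type: str
    heading_level: int = 2  # chunker split level; h2 by default

    def __post_init__(self) -> None:
        if self.type not in VALID_TYPES:
            raise ValueError(f"{self.name}: unknown type {self.type!r}")
        if not self.sources:
            raise ValueError(f"{self.name}: sources must list at least one glob")


def load_entries(config: dict) -> list[ExtractorEntry]:
    """Build typed entries from the parsed config; unknown keys raise TypeError."""
    return [ExtractorEntry(**raw) for raw in config["extractors"]]


config = {
    "extractors": [
        {"name": "core-concepts", "type": "llm",
         "sources": ["../../docs/concepts/*.md"], "node_type": "knowledge"},
        {"name": "guides", "type": "llm",
         "sources": ["../../docs/guides/*.md"], "node_type": "practice"},
    ]
}
entries = load_entries(config)
print([(e.name, e.node_type) for e in entries])
# → [('core-concepts', 'knowledge'), ('guides', 'practice')]
```

Failing fast on an unknown type catches typos such as "lmm" before any API calls are made.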

Step 4: Set Up Your LLM Provider

The LLMExtractor works with any OpenAI-compatible API. Set your API key:

# OpenRouter (recommended — cheapest for batch extraction)
export OPENROUTER_API_KEY="your-key"

# Or OpenAI directly
export OPENAI_API_KEY="your-key"

# Or any compatible provider
export OPENAI_API_KEY="your-key"
export OPENAI_BASE_URL="https://your-provider/v1"

Default model: google/gemini-2.0-flash-lite-001 via OpenRouter (~$0.01 per cartridge extraction).
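This guide does not specify how the extractor chooses between these variables when more than one is set. The sketch below shows one plausible resolution order and assumes OpenRouter takes precedence when its key is present. The resolve_provider function is hypothetical. The two default base URLs are the public OpenAI-compatible endpoints of OpenRouter and OpenAI.

```python
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OPENAI_BASE_URL = "https://api.openai.com/v1"


def resolve_provider(env: dict[str, str]) -> tuple[str, str]:
    """Return (base_url, api_key) from environment variables.

    Assumed precedence: OPENROUTER_API_KEY first, then OPENAI_API_KEY with an
    optional OPENAI_BASE_URL override for other compatible providers.
    """
    if env.get("OPENROUTER_API_KEY"):
        return OPENROUTER_BASE_URL, env["OPENROUTER_API_KEY"]
    if env.get("OPENAI_API_KEY"):
        return env.get("OPENAI_BASE_URL", OPENAI_BASE_URL), env["OPENAI_API_KEY"]
    raise RuntimeError("set OPENROUTER_API_KEY or OPENAI_API_KEY")


# With the real environment: resolve_provider(dict(os.environ))
print(resolve_provider({"OPENAI_API_KEY": "k", "OPENAI_BASE_URL": "https://your-provider/v1"}))
# → ('https://your-provider/v1', 'k')
```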

Step 5: Extract

Run the extraction:

rai cartridge extract my-cartridge

This processes each source file through the pipeline:

  1. Chunk — split by h2 headings (configurable via heading_level in extractor config)
  2. Prompt — send each chunk to the LLM with extraction instructions
  3. Parse — validate the JSON response against the GraphNode Pydantic model
  4. Collect — write all nodes to instances/{extractor-name}.json
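The Chunk stage can be pictured as follows. This is a minimal sketch, not the LLMExtractor's code. It assumes ATX headings, returns any text before the first heading at the chosen level as a chunk with an empty title, and keeps headings at other levels inside the current chunk. The name chunk_markdown is hypothetical.

```python
import re


def chunk_markdown(text: str, heading_level: int = 2) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) chunks at the given heading level.

    Headings at other levels (e.g. h3 when splitting on h2) stay in the chunk.
    Text before the first matching heading is returned under the title "".
    Chunks with a blank body are dropped. '#' lines in code fences are not special-cased.
    """
    pattern = re.compile(rf"^{'#' * heading_level} +(.+?)\s*$")
    chunks: list[tuple[str, list[str]]] = [("", [])]
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            chunks.append((match.group(1), []))
        else:
            chunks[-1][1].append(line)
    joined = [(title, "\n".join(body).strip()) for title, body in chunks]
    return [(title, body) for title, body in joined if body]


doc = (
    "# Guide\nIntro text.\n\n"
    "## Install\nRun pip.\n\n### Notes\nUse a venv.\n\n"
    "## Configure\nEdit the YAML.\n"
)
print([title for title, _ in chunk_markdown(doc)])                   # → ['', 'Install', 'Configure']
print([title for title, _ in chunk_markdown(doc, heading_level=3)])  # → ['', 'Notes']
```

Switching heading_level to 3 changes the chunk boundaries entirely, which is why the Tips recommend matching it to your documents' primary section level.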

Monitor the output for warnings about failed chunks or validation errors.
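Most of those warnings originate in the Parse stage. As an illustration rather than the actual implementation, the sketch below parses one chunk's LLM response, checks each node against a simplified stand-in for GraphNode, and records failures as warnings instead of aborting the run. The stand-in's fields (id, type, content) are assumptions. The real model is a Pydantic class in raise_core.graph.models and validates far more than field presence.

```python
import json
from dataclasses import dataclass


@dataclass
class Node:
    """Simplified stand-in for GraphNode; field names are assumptions."""
    id: str
    type: str
    content: str


def parse_chunk_response(raw: str, chunk_label: str) -> tuple[list[Node], list[str]]:
    """Parse one LLM response into nodes, collecting warnings instead of raising."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"{chunk_label}: response is not valid JSON ({exc.msg})"]
    if not isinstance(payload, list):
        return [], [f"{chunk_label}: expected a JSON array of nodes"]

    nodes: list[Node] = []
    warnings: list[str] = []
    for i, item in enumerate(payload):
        try:
            nodes.append(Node(**item))
        except TypeError as exc:  # missing/unexpected fields, or item is not an object
            warnings.append(f"{chunk_label}[{i}]: invalid node ({exc})")
    return nodes, warnings


raw = ('[{"id": "cache-ttl", "type": "concept", "content": "Entries expire after 5 minutes."},'
       ' {"id": "orphaned-fragment"}]')
nodes, warnings = parse_chunk_response(raw, "guides.md#Caching")
print([n.id for n in nodes], len(warnings))  # → ['cache-ttl'] 1
```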

Step 6: Reconcile

Check the extracted nodes for structural issues:

rai cartridge validate my-cartridge

The reconciler detects:

  • Phantom targets — relationships pointing to nodes that don't exist
  • Orphan nodes — nodes with no incoming or outgoing relationships
  • Cross-category edges — relationships between nodes in different categories (may indicate misclassification)
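Each of these three checks is a single pass over the nodes. This minimal sketch assumes each node is a dict with an id, a category, and a relationships list of target ids. These field names are hypothetical and are not the GraphNode schema. A node whose only edge points to a phantom target still counts as having an outgoing relationship, so it is reported as a phantom source rather than as an orphan.

```python
def reconcile(nodes: list[dict]) -> dict[str, list]:
    """Report phantom targets, orphan nodes, and cross-category edges."""
    by_id = {n["id"]: n for n in nodes}
    phantoms: list[tuple[str, str]] = []
    cross: list[tuple[str, str]] = []
    linked: set[str] = set()  # ids with at least one incoming or outgoing edge

    for node in nodes:
        rels = node.get("relationships", [])
        if rels:
            linked.add(node["id"])
        for target in rels:
            if target not in by_id:
                phantoms.append((node["id"], target))
                continue
            linked.add(target)
            if by_id[target]["category"] != node["category"]:
                cross.append((node["id"], target))

    orphans = [n["id"] for n in nodes if n["id"] not in linked]
    return {"phantom_targets": phantoms, "orphans": orphans, "cross_category": cross}


nodes = [
    {"id": "cartridge", "category": "concepts", "relationships": ["extractor"]},
    {"id": "extractor", "category": "concepts", "relationships": ["llm-provider"]},
    {"id": "curation", "category": "guides", "relationships": ["cartridge"]},
    {"id": "changelog-v2", "category": "guides", "relationships": []},
]
print(reconcile(nodes))
# → {'phantom_targets': [('extractor', 'llm-provider')], 'orphans': ['changelog-v2'], 'cross_category': [('curation', 'cartridge')]}
```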

Fix these issues by re-extracting the affected files, or proceed to curation and resolve them there.

Step 7: Curate

Review extracted nodes with the HITL curation workflow:

# Start a curation session
rai cartridge curate my-cartridge start

# Check progress
rai cartridge curate my-cartridge status

# Review current node — then accept, reject, or skip
rai cartridge curate my-cartridge accept --reason "accurate extraction"
rai cartridge curate my-cartridge reject --reason "changelog noise, not a real entity"
rai cartridge curate my-cartridge skip

# Write curated output
rai cartridge curate my-cartridge write

Curation state persists to disk — you can close your session and resume later. The curation session tracks:

  • Which nodes you've reviewed
  • Your decision for each (accepted, rejected, edited)
  • A hash of the original node (detects if re-extraction changed it)
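One way to implement the change-detection hash is to fingerprint a canonical JSON serialization of each node. This sketch assumes a hypothetical state layout: a dict keyed by node id that holds the decision, the reason, and a SHA-256 hash, saved as a JSON file. The real session file format and file name may differ.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def node_hash(node: dict) -> str:
    """SHA-256 of a canonical JSON form; sorted keys make it key-order independent."""
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_decision(state: dict, node: dict, decision: str, reason: str = "") -> None:
    """Store a decision together with the hash of the node as it was reviewed."""
    state[node["id"]] = {"decision": decision, "reason": reason, "hash": node_hash(node)}


def stale_reviews(state: dict, current_nodes: list[dict]) -> list[str]:
    """Ids reviewed earlier whose node content has changed since (e.g. after re-extraction)."""
    return [n["id"] for n in current_nodes
            if n["id"] in state and state[n["id"]]["hash"] != node_hash(n)]


def save_state(state: dict, path: Path) -> None:
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def load_state(path: Path) -> dict:
    """Resume a session; a missing file means nothing has been reviewed yet."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


node = {"id": "cache-ttl", "type": "concept", "content": "Entries expire after 5 minutes."}
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp, "curation-state.json")  # hypothetical file name
    state = load_state(state_file)
    record_decision(state, node, "accepted", "accurate extraction")
    save_state(state, state_file)

    resumed = load_state(state_file)  # a later session picks up the saved decisions
    re_extracted = {**node, "content": "Entries expire after 10 minutes."}
    print(stale_reviews(resumed, [node]), stale_reviews(resumed, [re_extracted]))
    # → [] ['cache-ttl']
```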

Step 8: Verify and Ship

After curation, verify the final cartridge:

rai cartridge validate my-cartridge
rai cartridge list

Your cartridge is now ready. The instances/ directory contains the curated knowledge nodes that Rai loads into the knowledge graph at session start.
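This guide does not describe how Rai merges the files, but conceptually, loading a cartridge means reading every JSON file in instances/ and combining the nodes into one graph. The sketch assumes each file holds a JSON array of node objects with an id field, and that on a duplicate id the later file (by name) wins with a warning. Both the file format and the merge policy are assumptions, and load_instances is not a RaiSE API.

```python
import json
import tempfile
from pathlib import Path


def load_instances(cartridge_dir: Path) -> tuple[dict[str, dict], list[str]]:
    """Merge nodes from every instances/*.json file into a single id -> node map.

    Assumes each file holds a JSON array of node objects with an "id" field.
    Files are read in name order; on a duplicate id the later file wins.
    """
    graph: dict[str, dict] = {}
    warnings: list[str] = []
    for path in sorted((cartridge_dir / "instances").glob("*.json")):
        for node in json.loads(path.read_text(encoding="utf-8")):
            if node["id"] in graph:
                warnings.append(f"{path.name}: duplicate id {node['id']!r} replaces an earlier node")
            graph[node["id"]] = node
    return graph, warnings


# Demo with a throwaway cartridge; in a project the argument would be
# Path(".raise/cartridges/my-cartridge").
with tempfile.TemporaryDirectory() as tmp:
    instances = Path(tmp, "instances")
    instances.mkdir()
    (instances / "core-concepts.json").write_text(json.dumps([{"id": "cartridge"}, {"id": "extractor"}]))
    (instances / "guides.json").write_text(json.dumps([{"id": "curation"}, {"id": "cartridge"}]))
    graph, warnings = load_instances(Path(tmp))

print(sorted(graph), warnings)
# → ['cartridge', 'curation', 'extractor'] ["guides.json: duplicate id 'cartridge' replaces an earlier node"]
```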

Tips

  • Start small: extract a few files first, review the quality, and iterate on the corpus before extracting everything
  • Heading structure matters — the chunker splits on h2 by default. If your docs use h3 as primary sections, set heading_level: 3 in the extractor config
  • One concept per section — sections that cover multiple topics produce lower-quality extractions
  • Skip noise — the LLM prompt already filters changelogs, config blocks, and boilerplate. If you see noise in extraction, improve the source document rather than post-filtering
  • Re-extract freely — extraction costs ~$0.01 per cartridge with Gemini Flash Lite. Iterate until quality is right