Claude Code as Phase 1 Scaffold for Domain Ontologies

A methodology for using Claude Code configuration files as the first implementation layer of a typed reasoning architecture.

Who This Is For

You have (or want to build) a domain ontology — a typed knowledge representation with nodes, edges, and invariants — for a complex domain (clinical medicine, legal reasoning, engineering design, scientific research). You also use Claude Code as your development environment.

This guide shows how to map your ontology onto Claude Code’s configuration layers so that every session operates within your architectural constraints, and your configuration files serve as the scaffolding for a future typed system rather than just a project helper.

1. The Core Insight

Claude Code has four configuration layers that naturally map to components of a reasoning architecture:

Claude Code Layer	Architecture Function	Loaded When
CLAUDE.md	Mode selection rules, invariant enforcement policies	Every interaction
`.claude/rules/*.md`	Constraint library, ontology schema, method decision trees	Every interaction (auto-discovered, all `.md` files including subdirectories)
memory/	Evidence catalog, assumption registry, learned benchmarks	MEMORY.md every session (first 200 lines); topic files on demand
Notebook/output convention	Processing pipeline template	When producing work products

What Claude Code Auto-Loads (verified mechanisms only)

./CLAUDE.md (or ./.claude/CLAUDE.md) — project instructions
./.claude/rules/*.md — all markdown files, recursively auto-discovered
~/.claude/CLAUDE.md — personal user preferences
./CLAUDE.local.md — personal project-local preferences
MEMORY.md — first 200 lines only

Warning: .claude/context/ is NOT a Claude Code feature. Files placed there will not be auto-loaded. Use .claude/rules/ as the extraction target.
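As a quick sanity check on the list above, the following minimal sketch walks a project directory and reports which files fall under the auto-load locations this guide names. The function name `autoload_inventory` and the shape of its return value are assumptions made for illustration; MEMORY.md is left out because the guide does not state where it lives on disk.

```python
from pathlib import Path


def autoload_inventory(project: Path, home: Path) -> dict:
    """List files this guide describes as auto-loaded, plus any warnings."""
    candidates = [
        project / "CLAUDE.md",
        project / ".claude" / "CLAUDE.md",
        home / ".claude" / "CLAUDE.md",
        project / "CLAUDE.local.md",
    ]
    loaded = [p for p in candidates if p.is_file()]

    rules_dir = project / ".claude" / "rules"
    if rules_dir.is_dir():
        # rglob walks subdirectories, matching the recursive discovery described above.
        loaded.extend(sorted(rules_dir.rglob("*.md")))

    warnings = []
    if (project / ".claude" / "context").exists():
        warnings.append(
            ".claude/context/ is not auto-loaded; move its files to .claude/rules/"
        )
    return {"loaded": loaded, "warnings": warnings}
```

Calling `autoload_inventory(Path.cwd(), Path.home())` from the project root gives a list you can compare against what you expect each session to see.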

Most projects dump everything into CLAUDE.md. This guide shows how to distribute content across layers so that each layer serves a specific architectural function.

Why This Matters

If you plan to eventually build a typed reasoning system, the work you do now in Claude Code configuration is not throwaway scaffolding — it becomes the knowledge base, constraint library, and validation logic that the typed system operationalizes. Architectural shortcuts in configuration become technical debt in the final system.

2. Defining Your Ontology for Claude Code

2.1 Identify Your Node Types

Every domain ontology has typed entities. Common patterns:

Generic Node	Description	Examples
Task	The user’s goal or query	GitHub issues, tickets, research questions
Claim	An asserted statement requiring evidence	Hypotheses, conclusions, recommendations
Evidence	Measurements, observations, results	Metrics, study results, test outputs
Constraint	Invariants that bound the reasoning space	Domain laws, regulations, best practices
Method	Procedures, tests, analytical approaches	Algorithms, protocols, experiments
Assumption	Implicit beliefs that reasoning depends on	Data quality assumptions, scope limitations
Validation	Pass/fail criteria for claims	Acceptance tests, review criteria, thresholds
Risk	Safety, cost, and reversibility concerns	Side effects, failure modes, costs
Resource	External knowledge sources with quality metadata	Papers, guidelines, manuals, datasets

Exercise: List the node types relevant to your domain. For each, identify where they currently live in your Claude Code configuration (CLAUDE.md? memory? nowhere?).

2.2 Identify Your Edge Types

Edges define relationships between nodes. Common patterns:

Edge	From → To	Semantics
`supports`	Evidence → Claim	Evidence increases confidence in claim
`refutes`	Evidence → Claim	Evidence decreases confidence in claim
`requires`	Claim → Evidence	Claim depends on this evidence
`derives_from`	Claim → Constraint	First-principles derivation
`gates`	Claim → Validation	Claim must pass validation before becoming actionable
`assumes`	Claim → Assumption	Claim depends on this assumption holding
`has_risk`	Method → Risk	Method carries this risk

2.3 Identify Your Reasoning Modes

Most domains have at least two reasoning modes with different rigor requirements:

Mode	Trigger	Invariants	Output Ceiling
Synthesis	“What does the evidence say?”	Cite sources, note quality	Claims are provisional
Creation	“What should we do?”	Full validation, risk assessment, assumptions enumerated	Claims can be actionable after gate passage

Critical design decision: When uncertain which mode applies, default to the more rigorous one. The cost of under-reasoning exceeds the cost of over-reasoning in any domain where decisions have consequences.
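The fail-safe default can be sketched as a small classifier in which only an unambiguous Synthesis match routes to the lighter mode. The cue phrases beyond the two triggers quoted in the table are hypothetical placeholders; a real project would replace them with its own task types.

```python
from enum import Enum


class Mode(Enum):
    SYNTHESIS = "synthesis"
    CREATION = "creation"


# The first cue in each tuple comes from the table above; the rest are hypothetical.
SYNTHESIS_CUES = ("what does the evidence say", "summarize")
CREATION_CUES = ("what should we do", "recommend")


def select_mode(task_text: str) -> Mode:
    """Route a task to a reasoning mode, defaulting to the more rigorous one."""
    text = task_text.lower()
    wants_synthesis = any(cue in text for cue in SYNTHESIS_CUES)
    wants_creation = any(cue in text for cue in CREATION_CUES)
    # Mixed or missing signals count as ambiguous and fall through to Creation.
    if wants_synthesis and not wants_creation:
        return Mode.SYNTHESIS
    return Mode.CREATION
```

Note that the default branch is Creation, so adding a new task type without a cue never silently lowers rigor.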

3. Mapping Ontology to Claude Code Layers

3.1 CLAUDE.md — The Deterministic Classifier (~200-300 lines)

CLAUDE.md is loaded on every interaction. It should contain only decision-making rules — things that apply to every task. Think of it as your system’s routing logic and invariant declarations.

What belongs here:

# Project — Claude Code Directives

## 1. Work Prioritization
[Who is the stakeholder? What is the source of truth for tasks?
What are the escalation rules?]

## 2. Mode Selection
### Task Classification
- [Task type A] → Synthesis mode
  - Ceiling: claims are provisional
- [Task type B] → Creation mode
  - Required: validation gates, assumption enumeration
- Ambiguous → Creation mode (fail-safe)

### Escalation Triggers
- [Condition] → promote from Synthesis to Creation

## 3. LLM Boundary
### The LLM MAY
- Propose analysis approaches, generate code, interpret results
- Suggest next steps, generate formatted outputs

### The LLM SHOULD NOT
- Claim any output is "validated" without human review
- Skip validation checks
- Assign quality scores to sources (use deterministic formula)

### The LLM MUST
- Report deterministic metrics without editorializing sufficiency
- Enumerate assumptions for every analytical claim
- Document decision provenance (llm_proposed / user_directed / convention)

## 4. Workflow Convention
[Mandatory steps: issue reference, approach documentation, etc.]

## 5. Output Convention
[Required sections, templates, quality standards]
[Pointers to .claude/rules/ for detailed reference material]

What does NOT belong here:
- Data dictionaries (move to rules/)
- Code recipes and patterns (move to rules/)
- Checklists and reference tables (move to rules/)
- Learned lessons (move to memory/)

Rule of thumb: If you’d Ctrl-F for it occasionally rather than reading it every time, it belongs in rules/, not CLAUDE.md.

3.2 `.claude/rules/` — The Knowledge Base

All .md files in .claude/rules/ (including subdirectories) are auto-discovered and auto-loaded every session. No import directives needed — just drop a markdown file in the directory and it loads. This is the correct extraction target for reference material from CLAUDE.md.

Important: .claude/context/ does NOT exist as a Claude Code feature. Files placed there will NOT be auto-loaded. Use .claude/rules/ only.

Map each rules file to an ontology function:

File	Ontology Function	Contains
`ontology-schema.md`	Type definitions	Node types, edge types, field specifications
`domain-constraints.md`	Constraint library	Domain invariants with durability ratings
`assumption-registry.md`	Assumption catalog	Known assumptions, sensitivity ratings, test methods
`authority-scoring.md`	Evidence quality formula	Deterministic scoring: how to rate sources
`method-decision-tree.md`	Method selection logic	When to use which approach, prerequisites
`validation-gates.md`	Validation criteria	Structured review checklist with pass/fail states
`data-dictionary.md`	Evidence structure	Dataset schemas, field definitions
`code-patterns.md`	Method implementation	Reusable code recipes, package conventions

Creating the ontology schema file:

# Ontology Schema

## Node Types

### Claim
- statement: string (the asserted claim)
- confidence: { epistemic, contextual, temporal } — each [0,1]
- provenance: weighted vector over {first_principles, empirical, consensus, heuristic, anecdotal}
- actionability: enum(informational | provisional | validated | actionable)
- explainability: string (plain-language version for non-specialists)

### Assumption
- statement: string
- type: enum(behavioral | access | state | preference | methodological)
- sensitivity: enum(low | moderate | high | critical)
- testable: bool
- test_method: optional reference to Method

[... additional node types ...]

## Edge Types
[Table of edges with From → To → Semantics]

## Invariants
- Every Claim in Creation mode with actionability >= provisional
  MUST enumerate critical assumptions
- Every Method MUST document risks
- Every Resource MUST have a deterministic authority score
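To show how the schema above might translate into typed code later, here is a minimal sketch of the Claim and Assumption nodes and the first invariant. Two things are assumptions: the ordering of the actionability values (taken from the order they are listed in) and the reading of "MUST enumerate critical assumptions" as "the assumptions list must not be empty".

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Actionability(IntEnum):
    # Ordering is an assumption taken from the order the schema lists the values in.
    INFORMATIONAL = 0
    PROVISIONAL = 1
    VALIDATED = 2
    ACTIONABLE = 3


@dataclass
class Assumption:
    statement: str
    type: str          # behavioral | access | state | preference | methodological
    sensitivity: str   # low | moderate | high | critical
    testable: bool = False


@dataclass
class Claim:
    statement: str
    actionability: Actionability
    assumptions: list[Assumption] = field(default_factory=list)


def violates_assumption_invariant(claim: Claim, mode: str) -> bool:
    """True when a Creation-mode claim at provisional or above lists no assumptions."""
    in_scope = mode == "creation" and claim.actionability >= Actionability.PROVISIONAL
    return in_scope and not claim.assumptions
```

The confidence and provenance fields are omitted here to keep the sketch short; they would follow the same pattern.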

Creating the constraint library:

# Domain Constraints

## Established Constraints (will not change)
| ID | Constraint | Source | Last Reviewed |
|----|-----------|--------|---------------|
| C-001 | [Fundamental domain law] | [Authoritative source] | [Date] |

## Stable Constraints (change on multi-year timescale)
| ID | Constraint | Version | Source | Next Review |
|----|-----------|---------|--------|-------------|
| C-010 | [Current best practice] | v2.3 | [Guideline body] | [Date] |

## Emerging Constraints (has an expiration date)
| ID | Constraint | Source | Confidence | Expiration |
|----|-----------|--------|-----------|-----------|
| C-020 | [Recent finding] | [Preprint/trial] | [Rating] | [Date] |
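The three durability tiers imply different maintenance rules: established constraints are never flagged, stable ones come due at their next review date, and emerging ones lapse at their expiration date. A minimal sketch of that check follows; the `Constraint` record and the `needs_attention` helper are hypothetical names, and treating a missing date as "needs attention" is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Constraint:
    id: str
    text: str
    durability: str              # "established" | "stable" | "emerging"
    due: Optional[date] = None   # next review (stable) or expiration (emerging)


def needs_attention(constraints: list[Constraint], today: date) -> list[str]:
    """Return IDs of stable or emerging constraints that are due, expired, or undated."""
    flagged = []
    for constraint in constraints:
        if constraint.durability == "established":
            continue  # established constraints are not expected to change
        if constraint.due is None or constraint.due <= today:
            flagged.append(constraint.id)
    return flagged
```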

Creating the assumption registry:

# Assumption Registry

## Domain-Specific Assumptions
| ID | Assumption | Type | Default | Sensitivity | Testable | Test Method |
|----|-----------|------|---------|-------------|----------|-------------|
| AS-001 | [Assumption statement] | [Category] | [true/false/unknown] | [critical/high/moderate/low] | [yes/no] | [How to verify] |

## Analytical Assumption Template
For every analysis, enumerate assumptions in this format:
| ID | Assumption | Type | Sensitivity | Status |
|----|-----------|------|-------------|--------|
| [auto] | [Statement] | methodological | [Rating] | [untested/confirmed/violated] |

3.3 memory/ — Persistent State

Memory files persist across sessions. They accumulate evidence about what works, what fails, and what the system has learned.

File	Ontology Function	Content
`MEMORY.md`	Index + high-priority state	Links to topic files, key learnings (<200 lines)
`model-benchmarks.md`	Evidence catalog	Known performance baselines for comparison
`error-patterns.md`	Violated assumptions	Bugs mapped to the assumptions they violated
`domain-topic.md`	Specialized knowledge	Deep dives on specific domain areas

Key principle: MEMORY.md is loaded every session, so keep it concise. Detailed content goes in topic files referenced from MEMORY.md.

3.4 Output Convention — The Processing Pipeline

Map your work products to a processing pipeline. Every output should follow a consistent section structure that mirrors the reasoning stages:

Pipeline Stage	Output Section	What It Produces
INGEST	Task Reference + Data Loading	What was asked + what data is available
ROUTE	Approach Documentation	Which reasoning mode was selected and why
EXPAND	Analysis / Investigation	Claims generated, evidence gathered
GATE	Validation	Which claims pass which gates, which don’t
DECIDE	Recommendations	Method selection with risk assessment
UPDATE	Limitations & Degradation	What remains uncertain, what assumptions are unverified
OUTPUT	Summary	Audience-appropriate rendering of claims with explainability
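One way to make the pipeline table checkable is to verify that a work product contains the expected sections in pipeline order. The sketch below assumes outputs are markdown documents whose section headings start with `#` and begin with the titles from the table; `section_problems` is a hypothetical helper name.

```python
# (pipeline stage, leading words of the expected section title), from the table above.
PIPELINE_SECTIONS = [
    ("INGEST", "Task Reference"),
    ("ROUTE", "Approach Documentation"),
    ("EXPAND", "Analysis"),
    ("GATE", "Validation"),
    ("DECIDE", "Recommendations"),
    ("UPDATE", "Limitations & Degradation"),
    ("OUTPUT", "Summary"),
]


def section_problems(document: str) -> list[str]:
    """Return stages whose section is missing or appears before an earlier stage."""
    headings = [
        line.lstrip("#").strip()
        for line in document.splitlines()
        if line.startswith("#")
    ]
    problems = []
    last_index = -1
    for stage, title in PIPELINE_SECTIONS:
        index = next(
            (i for i, heading in enumerate(headings) if heading.startswith(title)),
            None,
        )
        if index is None:
            problems.append(f"{stage}: missing '{title}'")
        elif index < last_index:
            problems.append(f"{stage}: '{title}' out of order")
        else:
            last_index = index
    return problems
```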

4. The LLM/Deterministic Boundary

This is the most important architectural decision for LLM-assisted reasoning systems.

4.1 The Problem

LLMs produce plausible-sounding reasoning traces that are not structurally verifiable. In any high-stakes domain, a plausible but incorrect claim can lead to harmful decisions. The solution is to separate what the LLM does from what deterministic logic does.

4.2 The Boundary

The LLM MAY:
- Propose new nodes (Claims, Evidence, Assumptions)
- Identify relationships between Evidence and Claims
- Suggest evidence quality inputs (study design, population, recency)
- Generate natural-language explanations
- Propose method ordering rationale

The LLM MAY NOT:
- Directly assign confidence values
- Bypass validation gates
- Mark claims as actionable without gate passage
- Override deterministic routing decisions
- Assign authority scores to resources

The DETERMINISTIC SYSTEM:
- Computes confidence from evidence quality inputs using defined formulas
- Evaluates validation gate passage
- Computes method priority ordering
- Enforces all invariants

4.3 Phase 1 Implementation (Convention-Based)

In Phase 1, you enforce this boundary through CLAUDE.md conventions:

## LLM Boundary
Claude SHOULD NOT claim any analysis is "validated" or "actionable"
without human review. All claims remain `provisional` until a human
reviewer confirms gate passage.

Claude MUST report deterministic metrics (computed values, test results)
without editorializing on their sufficiency. Let the validation gates
determine pass/fail.

4.4 Phase 2 Implementation (Code-Based)

In Phase 2, enforce programmatically:
- Validation scripts check that outputs meet gate criteria
- Authority scores computed by deterministic functions, not LLM judgment
- Audit trail records mutation_source: enum(llm_proposed | deterministic | user_input)
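A Phase 2 guard for the boundary might look like the sketch below: every change to a node carries a `mutation_source`, and LLM-proposed changes to protected fields are rejected. The protected field names and the `apply_mutation` function are illustrative assumptions; only the three `mutation_source` values come from the text above. Allowing `user_input` to set protected fields is also an assumption, based on the human-review rule in Section 4.3.

```python
from enum import Enum


class MutationSource(Enum):
    LLM_PROPOSED = "llm_proposed"
    DETERMINISTIC = "deterministic"
    USER_INPUT = "user_input"


# Fields Section 4.2 reserves for deterministic logic (field names are illustrative).
PROTECTED_FIELDS = frozenset({"confidence", "actionability", "authority_score"})


class BoundaryViolation(Exception):
    """Raised when an LLM-proposed mutation targets a protected field."""


def apply_mutation(node: dict, field: str, value, source: MutationSource) -> dict:
    """Return an updated copy of node with an audit entry, or raise BoundaryViolation."""
    if source is MutationSource.LLM_PROPOSED and field in PROTECTED_FIELDS:
        raise BoundaryViolation(f"LLM may not set '{field}'")
    updated = dict(node)
    updated[field] = value
    # Append to a copy of the audit trail so the original node is left untouched.
    updated["audit"] = list(node.get("audit", [])) + [
        {"field": field, "mutation_source": source.value}
    ]
    return updated
```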

5. Gap Analysis Template

When evaluating your current configuration against your target ontology, use this template for each component:

## Gap: [Component Name]

**Current State:** [What exists now — be specific about files and locations]
**Required State:** [What the ontology specifies]
**Gap Severity:** Critical | High | Medium | Low
**Migration Path:**
1. [Specific step]
2. [Specific step]
**Dependencies:** [What else must change first]

5.1 Common Gaps

These gaps appear in almost every project that hasn’t done this mapping:

Gap	Typical Severity	Symptom
No assumption tracking	Critical	Errors are misdiagnosed as “the model is wrong” when the data doesn’t match what was assumed
No mode selection	High	Simple lookups get the same heavyweight process as complex reasoning tasks
No LLM boundary	Critical	LLM assigns confidence, evaluates its own output, and claims are “validated” without human review
CLAUDE.md too large	Medium	Context window consumed by reference material irrelevant to current task
No validation structure	High	Review is voluntary and binary (done/not done) rather than typed criteria with states
No evidence quality scoring	High	A case report and a meta-analysis have equal visual weight in citations
No risk tracking	Medium	Risks of recommended approaches are implicit, not explicitly documented
Assumptions invisible	Critical	The most dangerous gap — implicit assumptions are the primary source of silent reasoning errors

6. Implementation Roadmap

Phase 1a: Extract and Organize (2-3 hours)

The quickest improvement: separate rules from reference material.

Create .claude/rules/ directory
Move data dictionaries, code recipes, and checklists from CLAUDE.md to appropriately named rules files
Slim CLAUDE.md to ~200-300 lines of decisions and conventions
Add pointers from CLAUDE.md to rules files
Clean stale entries from settings files

Success criterion: CLAUDE.md contains only rules that apply to every interaction. Reference material is in rules files.

Phase 1b: Ontology Scaffolding (4-8 hours)

Create the architectural rules files:

rules/ontology-schema.md — node/edge type definitions
rules/domain-constraints.md — constraint library with durability tags
rules/assumption-registry.md — known assumptions + template
rules/authority-scoring.md — deterministic source quality formula
rules/method-decision-tree.md — approach selection logic
rules/validation-gates.md — structured review criteria
Add mode selection and LLM boundary sections to CLAUDE.md

Success criterion: Every rules file maps to a specific ontology function. A new contributor can understand the reasoning architecture from the rules files alone.

Phase 1c: Convention Updates (1-2 hours)

Update work product templates:

Add mandatory Assumptions section to output convention
Add Limitations & Degradation section
Add authority_score to bibliography convention
Add decision provenance metadata (llm_proposed / user_directed / convention)

Success criterion: Every work product follows the processing pipeline structure and documents its assumptions.

Phase 2: Enforcement (future)

Build validation scripts that check gate passage programmatically
Implement authority score computation as deterministic code
Create domain-specific Claude Code skill with full pipeline template
Add audit trail to every graph mutation

Phase 3+: Graph Runtime (future)

Define graph database or in-memory representation
Implement typed graph mutation API
Build deterministic mode classifier as code
Implement audience-switching output renderer

7. Architectural Validation Test

Choose the hardest decision your domain needs to support. This decision should integrate every capability: risk assessment, evidence synthesis, constraint application, assumption tracking, multi-stakeholder output, and time-sensitivity.

Walk through how your proposed scaffold would represent this decision:

Task: What is the question? What mode does it route to?
Constraints activated: Which domain constraints apply?
Assumptions required: What must be true for the reasoning to hold?
Validation gates: What must be verified before a claim becomes actionable?
Risk assessment: What are the risks, in both technical and lay terms?
What works: What can the scaffold handle today?
What doesn’t: What requires Phase 2+ infrastructure?

If the scaffold cannot represent the decision at all, your ontology is missing node types or edges. If it can represent it but not enforce it, that’s expected for Phase 1 — convention-based enforcement is the goal.

8. Principles

Progressive Disclosure

CLAUDE.md (always loaded)
  → Decisions, conventions, priorities
  → Short (200-300 lines)

.claude/rules/ (loaded after CLAUDE.md)
  → Reference material, schemas, constraint libraries
  → Loaded automatically but lower priority

memory/ (persistent across sessions)
  → Evidence, benchmarks, learned patterns
  → MEMORY.md always loaded; topic files on demand

Separation of Concerns

Layer	Contains	Does NOT Contain
CLAUDE.md	Rules, mode selection, invariants	Code blocks, data schemas, checklists
rules/	Reference material, type definitions	Decision-making rules
memory/	Learnings, benchmarks, evidence	Instructions (those go in CLAUDE.md)
settings	Permissions (allow/deny)	Instructions

Fail-Safe Toward Rigor

When the mode classifier is uncertain, route to the more rigorous mode. Over-reasoning costs some wasted compute; under-reasoning in a consequential domain lets an unvalidated claim drive a real decision.

Halt on Contradiction

When the user’s request implies something exists (a file has content, a variable is defined, a service is running) and reality contradicts that expectation (file is empty, variable is missing, service is down), this is a blocking contradiction — not a minor inconvenience to work around.

Protocol:

Detect: Before acting on any referenced resource, verify it contains what the user’s request implies it contains. An empty file when content is expected is a contradiction. A missing column when a query references it is a contradiction.
Halt: Stop all dependent work immediately. Do not proceed with assumptions, workarounds, or fallbacks. The user’s mental model and reality have diverged — any work built on the wrong model is wasted.
Report: State the contradiction clearly and specifically:
- What the user expected (inferred from their request)
- What was actually found
- Why this blocks progress
Wait: Do not ask soft questions (“should I continue anyway?”). State the problem and wait for the user to resolve the discrepancy.

This applies in interactive conversations. In batch/automated workflows where the user cannot respond, log the contradiction and skip the dependent work rather than guessing.

Why this is a principle, not a preference: The cost of halting is one round-trip of clarification. The cost of proceeding is an entire chain of work built on a false premise — work that must be discarded and redone once the contradiction surfaces later (and it always surfaces later).

Examples:
- User says “use the descriptions in report.txt” → file is empty → HALT
- User says “update the function’s error handling” → function has no error handling → HALT (they may mean “add” not “update”, or they’re looking at a different version)
- User says “fix the failing test” → test is passing → HALT
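The Detect, Halt, Report steps can be sketched for the empty-file case as a precondition check that raises instead of falling back. `BlockingContradiction` and `require_content` are hypothetical names; the three fields on the exception mirror the three items the Report step asks for.

```python
from pathlib import Path


class BlockingContradiction(Exception):
    """The request implies something that inspection shows is not true."""

    def __init__(self, expected: str, found: str, why: str):
        super().__init__(f"Expected: {expected}. Found: {found}. Blocks: {why}.")
        self.expected = expected
        self.found = found
        self.why = why


def require_content(path: Path, purpose: str) -> str:
    """Return the file's text, or halt if the file is missing or empty."""
    if not path.is_file():
        raise BlockingContradiction(
            f"{path.name} exists", "no such file", f"{purpose} reads from it"
        )
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        raise BlockingContradiction(
            f"{path.name} has content", "file is empty", f"{purpose} needs its content"
        )
    return text
```

In a batch workflow, the caller would catch `BlockingContradiction`, log it, and skip the dependent step, in line with the rule for automated runs.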

Convention Before Code

Phase 1 enforces invariants through CLAUDE.md conventions (natural-language rules). Phase 2 enforces through code (validation scripts, structured output constraints). Convention-based enforcement is imperfect but ships immediately. Code-based enforcement is reliable but requires implementation.

Start with conventions. Upgrade to code when the conventions are stable and proven.

9. Checklist: Is Your Configuration Architecture-Ready?

Question	If No
Is CLAUDE.md under 300 lines?	Extract reference material to `.claude/rules/`
Does CLAUDE.md define reasoning modes?	Add mode selection section
Is there an explicit LLM boundary?	Add LLM MAY / MAY NOT section
Do you have a constraint library?	Create `rules/domain-constraints.md`
Are assumptions tracked?	Create `rules/assumption-registry.md`
Is evidence quality scored?	Create `rules/authority-scoring.md`
Do outputs follow a pipeline structure?	Define section convention mapping to INGEST→OUTPUT
Are validation criteria typed?	Restructure checklist in `rules/validation-gates.md`
Is memory structured?	Organize into MEMORY.md + topic files
Can you walk through your hardest decision?	Your ontology is missing components — add them
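Several rows of the checklist are mechanical enough to script. The sketch below covers the line-count row, two section-presence rows, and four of the rules-file rows. Detecting sections by searching for the phrases "mode selection" and "llm boundary" is a rough assumption, and `readiness_report` is a hypothetical helper name.

```python
from pathlib import Path

# Rules files named in the "If No" column of the checklist above.
REQUIRED_RULES = [
    "domain-constraints.md",
    "assumption-registry.md",
    "authority-scoring.md",
    "validation-gates.md",
]


def readiness_report(project: Path) -> dict[str, bool]:
    """Map each scripted checklist row to True (passes) or False (needs work)."""
    claude_md = project / "CLAUDE.md"
    lines = (
        claude_md.read_text(encoding="utf-8").splitlines()
        if claude_md.is_file()
        else []
    )
    text = "\n".join(lines).lower()
    report = {
        "CLAUDE.md under 300 lines": claude_md.is_file() and len(lines) < 300,
        "mode selection section": "mode selection" in text,
        "LLM boundary section": "llm boundary" in text,
    }
    rules_dir = project / ".claude" / "rules"
    for name in REQUIRED_RULES:
        report[f"rules/{name}"] = (rules_dir / name).is_file()
    return report
```

The remaining rows, such as walking through the hardest decision, still need human judgment.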

10. Tiered Framework: Three Levels of Ontology Architecture

The generic scaffold described in Sections 1-9 is one way to use this methodology, but not the only way. Projects vary in how deeply the ontology penetrates the actual work. This section describes three levels, each building on the previous.

Level 1: Organizational Reasoning

Purpose: Impose reasoning discipline on project work. The ontology organizes how Claude thinks about your domain, but the domain objects themselves are outside the ontology.

Node types: Generic — Task, Claim, Evidence, Constraint, Assumption, Method, Validation, Risk, Resource. These are reasoning primitives that apply to any domain.

Validation model: 6-state gate system (unvalidated → validated). Claims pass through gates before becoming actionable.

Method selection: Decision tree routing tasks to approaches. “When should I use tool A vs tool B?”

Output: Narrative documents with structured sections (Task Reference, Constraints Applied, Assumptions, Analysis, Validation, Limitations, Provenance).

When to use Level 1:
- Project management and workflow organization
- Infrastructure and DevOps tooling
- General-purpose data analysis
- Any project where the ontology serves the process, not the product

Examples:
- Azure DevOps project (/projects/azure/): organizes API access, repo management, and deployment workflows
- A clinical research project (Level 1 scaffold): organizes clinical research analysis with R/tidymodels

Templates: templates/ontology-scaffold/ provides ready-to-customize files for Level 1.

Level 2: Domain-Native Ontology

Purpose: The ontology IS the domain model. Node types are domain objects, not abstract reasoning primitives. The .claude/rules/ files define the actual knowledge representation, not just how to think about it.

Node types: Domain-specific — replace Task/Claim/Evidence with the real entities in your domain. The node types ARE the things you extract, discover, or build.

Validation model: Empirical metrics replace abstract gates. Instead of “did this claim pass Gate 3?”, validation is “what is the precision/recall of this extractor against a gold standard?” The 6-state model may still exist but is secondary to quantitative evaluation.

Method selection: Becomes a computational algorithm, not a workflow router. Instead of “use tool A when condition B”, it’s “spawn child agents where parent agents found signal, prune branches where they didn’t.”

Output: Structured data (graphs, JSON, typed records) rather than narrative documents. The output IS the domain artifact.

What changes from Level 1:

Component	Level 1	Level 2
Node types	Generic reasoning primitives	Domain objects (Concept, Span, Document, Agent)
Edge types	supports/refutes/requires	Domain relations (measurement_of, treatment_for, caused_by)
Constraints	Project policies and platform limits	Domain-specific quality rules (precision over recall, negation awareness)
Assumptions	Access/state/methodological beliefs	Empirical — tested by running the system, not by checking permissions
Authority scoring	Source quality formula	Per-entity confidence scores from the system itself
Validation	Gate passage (6-state)	Gold standard comparison (precision, recall, F1 per entity)
Method tree	Workflow routing	Computational algorithm (hierarchical spawning, pruning)
Output format	Narrative with sections	Structured data (graph, JSON)
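The gold-standard comparison in the Validation row can be sketched as per-entity precision, recall, and F1 over exact-match spans. Representing each extraction as an `(entity_type, start, end)` tuple and requiring exact span matches are assumptions made for illustration; the guide does not prescribe a matching rule.

```python
def prf1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 for two sets of exact-match items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def per_entity_scores(predicted: set, gold: set) -> dict:
    """Score each entity type separately; items are (entity_type, start, end)."""
    entity_types = sorted({item[0] for item in predicted | gold})
    return {
        entity_type: prf1(
            {item for item in predicted if item[0] == entity_type},
            {item for item in gold if item[0] == entity_type},
        )
        for entity_type in entity_types
    }
```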

When to use Level 2:
- NLP / information extraction systems
- Knowledge graph construction
- Ontology development (where the ontology is the product)
- Any project where the domain has a formal knowledge representation (UMLS, SNOMED, FHIR, schema.org, etc.)

Example: Clinical NLP project (/projects/clinical-nlp/) — UMLS concepts are the node types, extraction agents are the methods, and inter-agent negotiation replaces gate validation.

Key insight: At Level 2, you don’t need the generic scaffold templates. The ontology schema file becomes the domain model, not a reasoning framework. You’re building the ontology, not using one to organize your thinking.

Level 3: Multi-Agent Systems with Negotiation

Purpose: Multiple autonomous agents operate on the domain model, producing claims that must be reconciled through formal negotiation protocols. The architecture defines not just what the agents know, but how they interact when they disagree.

What’s new beyond Level 2:

Agent as first-class node type. Each agent has identity, scope, performance metrics, and negotiation standing. Agents are not just tools — they are participants with documented capabilities and limitations.
Negotiation protocol. When agents produce conflicting claims over the same evidence (e.g., two agents claim the same text span), a formal resolution process determines the outcome:
- Dominance: One agent’s claim subsumes another (hierarchy check)
- Relation discovery: Conflict reveals a relationship between claims (not contradiction, but structure)
- Escalation: Unresolvable conflicts flagged for human review

Hierarchical spawning. Agents are not all deployed at once. Parent agents scan first; child agents spawn only where parents found signal. This is a computational scaling strategy:

Level 0: Broad category agents (10-15 total)
   ↓ spawn only where parent found something
Level 1: Subcategory agents
   ↓ spawn only where parent found something
Level 2: Specific entity agents
   ↓ spawn only where parent found something
Level 3: Leaf-level agents (most specific)

Without hierarchy: O(N × L) where N = all possible agents. With hierarchy: most branches pruned at Level 0-1.

Negotiation transcript as artifact. The output includes not just the resolved graph, but the negotiation history: which agents claimed what, how overlaps were resolved, what remains unresolved. This provides interpretability that monolithic systems lack.
Hybrid modes. Combine hierarchical agents (broad discovery) with focused agents (high-priority precision). Focused agents take priority in negotiation when they overlap with hierarchical agents.
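The hierarchical spawning item above can be sketched as a depth-first walk in which a child agent runs only when its parent found signal. `AgentSpec`, its `detect` callable, and `run_hierarchy` are hypothetical names; real agents would be far richer than a single boolean detector.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentSpec:
    name: str
    detect: Callable[[str], bool]  # returns True when the agent finds signal
    children: list["AgentSpec"] = field(default_factory=list)


def run_hierarchy(roots: list[AgentSpec], text: str) -> tuple[list[str], list[str]]:
    """Return (agents that ran, agents that found signal) for one document."""
    ran: list[str] = []
    hits: list[str] = []

    def visit(agent: AgentSpec) -> None:
        ran.append(agent.name)
        if agent.detect(text):
            hits.append(agent.name)
            # Children spawn only below a parent that found something.
            for child in agent.children:
                visit(child)

    for root in roots:
        visit(root)
    return ran, hits
```

Comparing the length of `ran` against the total number of agents in the tree shows how much work pruning saved on a given document.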

Rules files at Level 3:

File	Level 1 Equivalent	Level 3 Version
`ontology-schema.md`	Generic node/edge types	Domain entities + Agent node type with performance fields
`domain-constraints.md`	Project policies	Domain quality rules (precision over recall, negation awareness, section awareness)
`method-decision-tree.md`	Workflow routing	Agent spawning algorithm with hierarchy pruning
`validation-gates.md`	6-state gate system	Per-entity metrics + negotiation resolution rates
— (new)	—	`agent-architecture.md` — agent lifecycle, negotiation protocol, scaling strategy
— (new)	—	`domain-reference.md` — external knowledge system integration (UMLS, SNOMED, etc.)

When to use Level 3:
- Information extraction at scale (clinical NLP, legal document analysis)
- Multi-model ensembles where outputs must be reconciled
- Any system where different specialized components operate on the same evidence and may disagree

Example: Clinical NLP project (/projects/clinical-nlp/) — concept agents scan clinical notes in parallel, negotiate overlapping span claims using UMLS hierarchy for dominance, and discover relations between proximate concepts.
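The three negotiation outcomes (dominance, relation discovery, escalation) can be sketched for a pair of overlapping span claims. Everything concrete here is an assumption: the `ancestors` dictionary is a plain stand-in for a hierarchy lookup (no real UMLS API is used), the `related` set is a hypothetical table of known concept pairs, and reporting the agent whose concept subsumes the other says nothing about which claim should ultimately be kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanClaim:
    agent: str
    concept: str
    start: int
    end: int


def overlaps(a: SpanClaim, b: SpanClaim) -> bool:
    """True when the two half-open spans [start, end) share any character."""
    return a.start < b.end and b.start < a.end


def negotiate(
    a: SpanClaim,
    b: SpanClaim,
    ancestors: dict[str, set[str]],
    related: set[tuple[str, str]],
) -> tuple[str, Optional[str]]:
    """Return (outcome, subsuming agent or None) for two claims."""
    if not overlaps(a, b):
        return ("no_conflict", None)
    # Dominance: one concept is a recorded ancestor of the other.
    if a.concept in ancestors.get(b.concept, set()):
        return ("dominance", a.agent)
    if b.concept in ancestors.get(a.concept, set()):
        return ("dominance", b.agent)
    # Relation discovery: the pair appears in the table of known relations.
    if (a.concept, b.concept) in related or (b.concept, a.concept) in related:
        return ("relation", None)
    # Nothing resolves the overlap, so flag it for human review.
    return ("escalate", None)
```

Logging each returned tuple alongside the two claims would give a simple form of the negotiation transcript described earlier.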

Choosing Your Level

Does your project need reasoning discipline
for tasks and workflows?
├── Yes → Level 1 (Organizational Reasoning)
│         Use templates/ontology-scaffold/
│
│   Are domain objects the primary product,
│   not just the subject of reasoning?
│   ├── Yes → Level 2 (Domain-Native Ontology)
│   │         Build ontology-schema.md from domain model
│   │
│   │   Do multiple agents/extractors operate on
│   │   the same evidence and need reconciliation?
│   │   ├── Yes → Level 3 (Multi-Agent Negotiation)
│   │   │         Add agent-architecture.md, negotiation protocol
│   │   └── No → Stay at Level 2
│   │
│   └── No → Stay at Level 1
│
└── No → You may not need this framework yet
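The decision tree above reduces to three nested yes/no questions, sketched here as a function. The parameter names are paraphrases of the questions in the tree and `choose_level` is a hypothetical helper; `None` stands for "you may not need this framework yet".

```python
from typing import Optional


def choose_level(
    needs_reasoning_discipline: bool,
    domain_objects_are_product: bool,
    agents_need_reconciliation: bool,
) -> Optional[int]:
    """Return 1, 2, or 3 following the tree above, or None if no level applies."""
    if not needs_reasoning_discipline:
        return None
    if not domain_objects_are_product:
        return 1
    if not agents_need_reconciliation:
        return 2
    return 3
```

Because the questions are nested, the third answer is ignored unless the first two are yes, which matches the tree's structure.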

Progression Between Levels

Levels are not exclusive: a project can use Level 1 for its workflow organization while building a Level 2 or 3 system as its product. A clinical research project might use Level 1 to organize analysis notebooks; a clinical-NLP project might build a Level 3 extraction system as its product and still rely on Level 1 conventions for its own workflow.

Projects may also evolve between levels:
- Start at Level 1 to organize initial exploration
- Promote to Level 2 when the domain model becomes the primary artifact
- Extend to Level 3 when multiple extraction/analysis agents need reconciliation

The generic scaffold (Level 1) is always a valid starting point. You can replace generic node types with domain-specific ones as the project matures, keeping the .claude/rules/ structure and progressive disclosure architecture intact.

This guide is domain-agnostic at Level 1, and provides the architectural framework for Levels 2 and 3. For worked examples: see the Knowledge Vault Guide (Level 1), the Azure DevOps project (Level 1), a clinical research project (Level 1), and a clinical-NLP project (Level 3).

11. Generic Templates (Level 1)

Ready-to-use templates for bootstrapping a Level 1 project are in templates/ontology-scaffold/. Copy the templates, replace {PLACEHOLDERS}, and you have a working Phase 1 scaffold without needing to reverse-engineer an existing implementation.

Template	Target	Purpose
`CLAUDE.md.template`	`CLAUDE.md`	Decision rules (~150 lines)
`rules/ontology-schema.md`	`.claude/rules/`	Node/edge types, invariants
`rules/domain-constraints.md`	`.claude/rules/`	Constraint library with durability ratings
`rules/assumption-registry.md`	`.claude/rules/`	Known assumptions with test methods
`rules/authority-scoring.md`	`.claude/rules/`	Deterministic source quality formula
`rules/method-decision-tree.md`	`.claude/rules/`	Approach selection logic
`rules/validation-gates.md`	`.claude/rules/`	6-state gate system
`rules/output-conventions.md`	`.claude/rules/`	Processing pipeline section template

See templates/ontology-scaffold/README.md for usage instructions and placeholder reference.