TxtArchive Skill
Usage: Load this file into Claude Code (via CLAUDE.md pointer) or paste into an LLM session for assistance with creating, unpacking, and working with txtarchive code archives.
TxtArchive creates text-based archives of codebases optimized for LLM analysis. Its key value: it flattens Jupyter notebooks into simple text so LLMs can read, modify, and create notebooks without dealing with .ipynb JSON markup.
1) Why TxtArchive Exists
The Problem
- Jupyter notebooks are JSON files with deeply nested structure. LLMs struggle to produce valid
.ipynbJSON reliably. - Sharing a directory of analysis files with an LLM requires flattening multiple files into a single context.
- Transferring notebooks between environments (local, remote servers, HealtheDataLabs) is friction-heavy.
The Solution
TxtArchive converts notebooks and code files into a simple text format that LLMs can easily read and write. The round-trip works:
.ipynb / .py / .qmd --> txtarchive archive --> plain text (paste to LLM)
|
LLM creates/modifies plain text --> txtarchive extract-notebooks --> .ipynb
Key insight: An LLM can produce a txtarchive-formatted text file containing notebook cells, and extract-notebooks reconstructs valid .ipynb files from it – no JSON authoring required.
2) Two Archive Formats
| Format | Flag | Header Pattern | Best For |
|---|---|---|---|
| Standard | (default) | ---\nFilename: path/file\n--- |
Exact file reconstruction |
| LLM-Friendly | --llm-friendly |
# FILE N: path/file\n###...### |
AI analysis, notebook round-trips |
Always use --llm-friendly when working with LLMs. The standard format is for backup/transfer only.
3) LLM-Friendly Format Structure
# Archive created on: 2026-02-06 10:30:00
# LLM-FRIENDLY CODE ARCHIVE
# Generated from: /path/to/project
# Date: 2026-02-06 10:30:00
# TABLE OF CONTENTS
1. 010-Data-Prep.py
2. 310-Baseline-Model.ipynb
3. utils.R
################################################################################
# FILE 1: 010-Data-Prep.py
################################################################################
import pandas as pd
df = pd.read_csv("data.csv")
################################################################################
# FILE 2: 310-Baseline-Model.ipynb
################################################################################
# Markdown Cell 1
"""
# Baseline Random Forest Model
This notebook builds the initial RF classifier.
"""
# Cell 2
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# Markdown Cell 3
"""
## Load Data
"""
# Cell 4
df = pd.read_parquet("cohort.parquet")
print(f"Shape: {df.shape}")
################################################################################
# FILE 3: utils.R
################################################################################
library(tidymodels)
# ... R code ...
Notebook Cell Conventions
| Marker | Meaning |
|---|---|
# Cell N |
Code cell (Python/R) |
# Markdown Cell N |
Markdown cell (wrapped in triple quotes """...""") |
- Cell numbers are sequential within each notebook file.
- Outputs are stripped – only source code and markdown are preserved.
- On extraction, cells are reconstructed into valid
.ipynbJSON with proper metadata.
4) Core Commands
Archive: Create an Archive
# Archive a project directory (LLM-friendly, code only)
python -m txtarchive archive myproject/ output.txt \
--file_types .py .ipynb .qmd .R \
--llm-friendly \
--extract-code-only
# Archive specific files only (faster, precise)
python -m txtarchive archive myproject/ output.txt \
--explicit-files 310-Baseline.ipynb 320-Tuned.ipynb utils.py \
--llm-friendly
# Archive with exclusions
python -m txtarchive archive myproject/ output.txt \
--file_types .py .ipynb .qmd \
--exclude-dirs .venv __pycache__ .git _targets \
--llm-friendly --extract-code-only
# Include root-level files regardless of type
python -m txtarchive archive myproject/ output.txt \
--file_types .py .ipynb \
--root-files README.md pyproject.toml \
--llm-friendly
# Split large archives by token count
python -m txtarchive archive myproject/ output.txt \
--file_types .py .ipynb \
--llm-friendly --extract-code-only \
--split-output --max-tokens 75000Key flags:
| Flag | Purpose |
|---|---|
--llm-friendly |
Use LLM-friendly format (always use for AI work) |
--extract-code-only |
Strip notebook outputs (smaller, cleaner) |
--explicit-files |
Archive only these specific files |
--file_types |
Filter by extension (default: .yaml .py .r .ipynb .qmd) |
--exclude-dirs |
Skip these directories |
--root-files |
Include these root files regardless of --file_types |
--include-subdirs |
Only include these subdirectories |
--no-subdirectories |
Don’t recurse into subdirectories |
--split-output |
Split into chunks by token count |
--max-tokens N |
Max tokens per chunk (default: 100000) |
Unpack: Extract Files from an Archive
# Auto-detects format (standard or LLM-friendly)
python -m txtarchive unpack archive.txt output_dir/
# Overwrite existing files
python -m txtarchive unpack archive.txt output_dir/ --replace_existingExtract Notebooks: Reconstruct .ipynb Files
# Extract only Jupyter notebooks from an LLM-friendly archive
python -m txtarchive extract-notebooks archive.txt output_dir/
# Extract notebooks AND generate Quarto (.qmd) files
python -m txtarchive extract-notebooks-and-quarto archive.txt output_dir/
# Overwrite existing
python -m txtarchive extract-notebooks archive.txt output_dir/ --replace_existingThis is the key command for the LLM-creates-notebooks workflow. The LLM writes cell markers in plain text, and extract-notebooks produces valid .ipynb files.
5) Workflow: LLM Creates a Jupyter Notebook
This is the primary use case. An LLM produces analysis code in txtarchive format, and you convert it to a runnable notebook.
Step 1: Ask the LLM to Write in TxtArchive Format
Prompt the LLM with something like:
Create a Jupyter notebook for EDA of the cohort data. Write it in txtarchive
LLM-friendly format. Use "# Cell N" for code cells and "# Markdown Cell N"
(with triple-quoted content) for markdown cells. Number cells sequentially.
The file should be named 110-EDA.ipynb.
Step 2: LLM Produces Plain Text
################################################################################
# FILE 1: 110-EDA.ipynb
################################################################################
# Markdown Cell 1
"""
# Exploratory Data Analysis
Initial look at the cohort demographics and outcome distribution.
"""
# Cell 2
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_parquet("cohort.parquet")
print(f"Cohort size: {len(df)}")
# Markdown Cell 3
"""
## Demographics
"""
# Cell 4
df[["age", "gender", "race"]].describe()
# Cell 5
df["death_flag"].value_counts().plot(kind="bar", title="Outcome Distribution")
plt.tight_layout()
plt.show()
Step 3: Save and Extract
# Save the LLM output to a .txt file
# (copy-paste or redirect)
# Extract to a Jupyter notebook
python -m txtarchive extract-notebooks llm-output.txt ./
# Result: 110-EDA.ipynb is created as a valid Jupyter notebookStep 4: Run the Notebook
Open 110-EDA.ipynb in Jupyter, run cells, iterate.
6) Workflow: Transfer Notebooks to a Remote Environment
For copying notebooks to HealtheDataLabs or other remote Jupyter servers where you can’t easily git clone.
Outbound (Local to Remote)
# Archive notebooks you want to transfer
python -m txtarchive archive paper/mortality/ transfer.txt \
--file_types .ipynb .py \
--llm-friendly --extract-code-only
# Copy transfer.txt to the remote environment (paste, upload, scp, etc.)On the Remote Machine
# If txtarchive is installed:
python -m txtarchive extract-notebooks transfer.txt ./
# If txtarchive is NOT installed, paste the archive into an LLM and ask:
# "Extract the notebook files from this archive and give me the .ipynb JSON"Inbound (Remote to Local)
On the remote machine, archive the work:
python -m txtarchive archive . results.txt \
--explicit-files 310-Model.ipynb 320-Tuned.ipynb \
--llm-friendly --extract-code-onlyCopy results.txt back locally and unpack.
7) Workflow: Archive for LLM Code Review
Send an existing codebase to an LLM for review, refactoring, or extension.
# Archive the analysis directory
python -m txtarchive archive paper/mortality/ review.txt \
--file_types .py .ipynb .qmd .R \
--llm-friendly --extract-code-only
# Paste review.txt into the LLM session (or attach it)
# Ask: "Review this analysis. Suggest improvements to the feature engineering
# in 210-Features.ipynb and the model tuning in 320-Tuned.ipynb."The LLM can respond with modified files in txtarchive format. Save and extract.
8) Workflow: Ask Sage Integration
# Archive and ingest in one step
python -m txtarchive archive-and-ingest myproject/ archive.txt \
--file_types .py .ipynb \
--llm-friendly --extract-code-only \
--ingestion-method auto
# Ingest an existing file
python -m txtarchive ingest --file archive.txt
# Test which endpoints work
python -m txtarchive ingest --file archive.txt --test-endpointsRequires ACCESS_TOKEN environment variable for Ask Sage API authentication.
9) Word Document Conversion
# Convert a Word doc to Markdown
python -m txtarchive convert-word --input_path paper.docx --output_path paper.md
# Methods (in order of quality): mammoth, pandoc, python-docx, basic
python -m txtarchive convert-word --input_path paper.docx --output_path paper.md \
--method mammoth11) Programmatic API
from txtarchive import (
archive_files,
unpack_files,
extract_notebooks_to_ipynb,
run_extract_notebooks,
)
# Create an archive
archive_files(
directory="myproject/",
output_file="archive.txt",
file_types=[".py", ".ipynb"],
llm_friendly=True,
extract_code_only=True,
)
# Unpack (auto-detects format)
from txtarchive.packunpack import unpack_files_auto
unpack_files_auto("archive.txt", "output_dir/")
# Extract notebooks only
extract_notebooks_to_ipynb("archive.txt", "output_dir/")12) Installation
pip install txtarchive
# Or from source
pip install -e /path/to/txtarchiveRequires Python >= 3.7.6. Optional dependencies: mammoth, python-docx, pypandoc (for Word conversion).
13) Common Patterns
Archive a Single Notebook for LLM Editing
python -m txtarchive archive . edit.txt \
--explicit-files 310-Baseline.ipynb \
--llm-friendly --extract-code-onlyPaste edit.txt into the LLM. Get modified text back. Save and extract.
Archive an Entire Analysis Pipeline
python -m txtarchive archive paper/mortality/ pipeline.txt \
--file_types .py .ipynb .qmd .R \
--exclude-dirs __pycache__ .ipynb_checkpoints _targets \
--llm-friendly --extract-code-onlyCreate a Notebook from Scratch via LLM
- Describe the analysis you want.
- Tell the LLM to use txtarchive format.
- Save the output to a
.txtfile. - Run
python -m txtarchive extract-notebooks output.txt ./ - Open the resulting
.ipynbin Jupyter.
Split a Large Codebase for Context-Limited LLMs
python -m txtarchive archive . archive.txt \
--file_types .py .ipynb \
--llm-friendly --extract-code-only \
--split-output --max-tokens 75000
# Creates: archive_part1.txt, archive_part2.txt, ...14) Troubleshooting
| Problem | Solution |
|---|---|
| Notebook won’t open after extraction | Check cell markers: # Cell N (not # cell N). Case matters. |
| Archive too large for LLM | Use --split-output --max-tokens 75000 |
| Missing files in archive | Check --file_types includes the extension |
| Encoding errors | txtarchive uses errors="replace" – check for binary files in the archive |
extract-notebooks finds no notebooks |
Ensure archive uses LLM-friendly format (--llm-friendly) |
| Outputs cluttering the archive | Add --extract-code-only to strip outputs |