{PROJECT_NAME}
{ONE_PARAGRAPH_DESCRIPTION}
Data Sources
| Table | Description | Key Columns |
|---|---|---|
| {TABLE_1} | {DESCRIPTION} | {COLUMNS} |
| {TABLE_2} | {DESCRIPTION} | {COLUMNS} |
Key Outcome
{OUTCOME_DESCRIPTION}
Variable: {OUTCOME_COLUMN} ({ENCODING_DETAILS})
Analysis Progression
| File | Description | Status |
|---|---|---|
| 010-{NAME} | Data extraction and QC | {STATUS} |
| 110-{NAME} | Exploratory data analysis | {STATUS} |
| 210-{NAME} | Feature engineering | {STATUS} |
| 310-{NAME} | Baseline model | {STATUS} |
| 320-{NAME} | Tuned model | {STATUS} |
Required Packages
```r
pacman::p_load(
  {R_PACKAGES}
)
```

```python
# {PYTHON_PACKAGES}
```

Known Gotchas
- {GOTCHA_1}
- {GOTCHA_2}
Data Profiling Protocol
Before writing any analytical code, profile every dataset:
- Schema: `glimpse(data)` / `data.info()` to verify column names and types
- Categoricals: `count(field)` to inspect actual values (never assume types)
- Numerics: `summary()` to check ranges, zeros, and NAs
- Dates: `class()` + `range()` to confirm the correct class
- Join keys: verify type compatibility across tables
- Code fields: test regex patterns against real data before building analysis
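The protocol names R helpers (`glimpse()`, `count()`, `summary()`) and the pandas `data.info()`. On the Python side, the same checks could be bundled into a single profiling cell roughly as follows. This is a minimal sketch that assumes pandas. The functions `profile_dataframe`, `join_key_report`, and `regex_match_rate`, the example tables, and the code-field pattern are all hypothetical and are not taken from the data-profiling skill.

```python
# Sketch only: assumes pandas; function names and example data are hypothetical.
import pandas as pd
from pandas.api import types as ptypes


def profile_dataframe(df: pd.DataFrame, max_levels: int = 20) -> dict:
    """Schema, categorical, numeric, and date checks for one table."""
    report = {
        "dtypes": df.dtypes.astype(str).to_dict(),  # schema, as data.info() shows
        "categoricals": {},
        "numerics": {},
        "dates": {},
    }
    for col in df.columns:
        s = df[col]
        if ptypes.is_datetime64_any_dtype(s):
            # Dates: only columns that really are datetime land here
            report["dates"][col] = (s.min(), s.max())
        elif ptypes.is_numeric_dtype(s) and not ptypes.is_bool_dtype(s):
            # Numerics: range, zeros, and missing values
            report["numerics"][col] = {
                "min": s.min(),
                "max": s.max(),
                "n_zero": int((s == 0).sum()),
                "n_na": int(s.isna().sum()),
            }
        else:
            # Categoricals (and dates stored as text): the values actually present
            report["categoricals"][col] = (
                s.value_counts(dropna=False).head(max_levels).to_dict()
            )
    return report


def join_key_report(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict:
    """Join keys: compare dtypes and count left keys with no match on the right."""
    lk, rk = left[key], right[key]
    return {
        "left_dtype": str(lk.dtype),
        "right_dtype": str(rk.dtype),
        "dtypes_match": str(lk.dtype) == str(rk.dtype),
        "left_keys_unmatched": int((~lk.isin(rk)).sum()),
    }


def regex_match_rate(s: pd.Series, pattern: str) -> float:
    """Code fields: share of non-missing values that fully match a pattern."""
    values = s.dropna().astype(str)
    if values.empty:
        return 0.0
    return float(values.str.fullmatch(pattern).mean())


# Made-up example tables for illustration
orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "amount": [10.0, 0.0, None],
    "status": ["paid", "Paid", "refunded"],
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-15"]),
    "code": ["E11.9", "I10", "e11"],
})
customers = pd.DataFrame({"customer_id": ["1", "2", "4"]})  # keys stored as text

report = profile_dataframe(orders)
keys = join_key_report(orders, customers, "customer_id")
# Assumed code format: one uppercase letter, two digits, optional ".digits"
code_rate = regex_match_rate(orders["code"], r"[A-Z]\d{2}(?:\.\d+)?")
```

These made-up tables surface two problems the protocol warns about. First, `status` holds both `paid` and `Paid`. Second, the integer `customer_id` in `orders` matches none of the text keys in `customers`, so the keys need converting to a common type before any join.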
Add a profiling cell as Cell 2 in every notebook (after setup, before analysis).
Full reference: see the data-profiling skill in `llmcheatsheets/skills/`.
Review Checklist
Before committing: