Critical rule: NEVER embed Python functions that process erk data or encode business logic.
Why This Matters
Embedded Python code in documentation:
- Is NOT under test - it silently goes stale
- Causes bugs when agents copy outdated patterns
- Encodes business assumptions (field names, conventions) that can change
- Creates maintenance burden requiring docs-code sync
PR #2681 demonstrated this: an agent copied incorrect scratch directory paths from documentation because the docs hadn't been updated when the implementation changed.
The "Simple Function" Trap
Even "simple" functions are dangerous. A one-liner like:
```python
# DANGEROUS - encodes agent- prefix convention
files = [f for f in project_dir.glob("*.jsonl") if not f.name.startswith("agent-")]
```
This embeds a naming convention that could change. When it does, the docs become a source of bugs.
What to REMOVE (Aggressive Stance)
Remove ALL Python `def` functions that:
- Process session logs, JSONL, or erk data
- Encode path patterns or naming conventions
- Filter, parse, or transform erk-specific data
- Implement algorithms that exist (or could exist) in production
- Show "how to" implementation patterns for erk internals
Even if the function doesn't exist in production today, it could be added later, creating divergence.
Class Templates and File Listings
Also REMOVE:
Class templates:
- Full class definitions showing "how to implement X"
- Method implementations from base classes
- Example classes that duplicate actual implementations
Directory/file listings with counts:
- Lines like `├── skills/ (2 files)`, whose counts go stale
- Exhaustive file listings enumerating every file
Replace with:
- Source references: "See `path/to/example.py` for the pattern"
- Structural trees without counts, showing organization
- CLI commands: "Use `ls dir/` to see current files"
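When converting an existing tree by hand, dropping the counts is mechanical. The sketch below works only on documentation text, never on erk data, and assumes counts always appear as a trailing `(N file)` or `(N files)` suffix; `strip_tree_counts` and `COUNT_SUFFIX` are hypothetical names.

```python
import re

# Hypothetical pattern: a trailing count such as "(1 file)" or "(2 files)".
COUNT_SUFFIX = re.compile(r"\s*\(\d+\s+files?\)\s*$")


def strip_tree_counts(tree: str) -> str:
    """Remove trailing file counts from each line, keeping the tree structure."""
    return "\n".join(COUNT_SUFFIX.sub("", line) for line in tree.splitlines())


before = "docs/\n├── skills/ (2 files)\n└── guides/ (1 file)"
print(strip_tree_counts(before))
```

Other count styles (for example "2 items") would need their own patterns.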
What to KEEP (Narrow Exceptions)
- JSON/YAML format examples: Showing data structure, not processing code
- External library patterns: Click commands, pytest fixtures, Rich tables (teaching third-party APIs)
- Anti-pattern demonstrations: Code explicitly marked "WRONG" or "DON'T DO THIS"
- Shell/bash commands: `ls`, `jq`, `grep` for operational tasks
- Type definitions: Dataclass/TypedDict showing structure (not methods)
Decision Test
Before keeping a Python code block, ask:
- Does it contain a `def` statement? → Probably REMOVE
- Does it process erk-specific data? → REMOVE
- Does it encode a convention (field name, path pattern, prefix)? → REMOVE
- Is it teaching a third-party API (Click, pytest, Rich)? → KEEP
- Is it showing data FORMAT (not processing)? → KEEP
- Does it show a class template? → REMOVE, reference source
- Does it list files with counts? → REMOVE counts, use structural tree
Replacement Format
When removing code, replace with:
- Prose description of what the operation does
- Source pointer to canonical implementation
- CLI command if one exists for agents to use
Before (BAD):
````markdown
```python
def find_session_logs(project_dir: Path) -> list[Path]:
"""Find all main session logs (exclude agent logs)."""
return [
f for f in project_dir.glob("*.jsonl")
if f.is_file() and not f.name.startswith("agent-")
]
```
````
After (GOOD):
```markdown
Main session logs are `.jsonl` files whose names don't start with `agent-`.
Agent subprocess logs follow the `agent-` prefix naming convention.
To list sessions for a project, use:
`erk exec list-sessions`
See `preprocess_session.py` for the canonical implementation.
```
Source Pointer Rules
- Point to the source file path, e.g. `package/module.py`
- NEVER include line numbers (they go stale)
- Use backticks for file paths
- Prefer CLI commands over source pointers when available
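Pointers that include line numbers are easy to find with a pattern match when auditing existing docs. A minimal sketch, assuming pointers take the form `path.py:42` or `path.py#L42` (other styles would need extra patterns); `find_stale_pointers` and `LINE_NUMBER_POINTER` are hypothetical names.

```python
import re

# Hypothetical pattern: a .py path followed by ":<line>" or "#L<line>".
LINE_NUMBER_POINTER = re.compile(r"[\w./-]+\.py(?::\d+|#L\d+)")


def find_stale_pointers(markdown: str) -> list[str]:
    """Return every source pointer in the text that includes a line number."""
    return LINE_NUMBER_POINTER.findall(markdown)


doc = "See `package/module.py` for the pattern, not `package/module.py:42`."
print(find_stale_pointers(doc))
```

Only the second pointer is flagged; a bare file path passes.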