
A Claude Code Skill to Tidy Up Auto-Memory
Claude Code's auto-memory is loaded into every conversation. A poorly maintained memory silently pollutes every session: tokens consumed for nothing, degraded context, recommendations based on stale info. This skill addresses that. It also modifies your files, so it backs everything up automatically before anything else. Always.
The attic
Friday evening. I open MEMORY.md because a Claude Code recommendation had felt off all afternoon. I find this:
# Memory
## Skills directory structure
- Skills are stored in `~/.claude/skills/`
- Each skill has a `SKILL.md` with frontmatter
## Existing skills
- `data-pipeline` — ETL orchestration
- `incident-response` — runbook structure
[...11 lines...]
## Skills rewrite (2026-03-10)
- Both `data-pipeline` and `release-checklist` rewritten to match standards
- Now use: checklist progression, parallel agents, ultrathink
- Key updates: runtime validation, deprecated schemas [...230 chars on one line...]
## release-checklist architecture
- `SKILL.md` — Main orchestration
- `STAGES.md` — 8 deploy stages
[...]
46 lines. Six multi-paragraph sections. Two of them entirely reproducible with a single ls. One that read like a session changelog. One describing a skill I could just open in its folder.
It wasn't that the content was wrong. It was that all of this was loaded into every conversation, with zero added value. The harness loads MEMORY.md automatically at session startup. Anything past 200 lines gets truncated. What you put in there lives in every thread's context. Period.
Lovely.
I needed to clean up. But how? No public documentation on best practices.
The problem: internal rules, undocumented
Auto-memory is defined in Claude Code's internal system prompt, in a section called auto memory. You don't see it, but it lives in every session. It describes:
- 4 strict types of memory: user, feedback, project, reference
- A mandatory body format for feedback and project (**Why:** and **How to apply:** lines)
- Strict rules for MEMORY.md (no frontmatter, lines < 150 chars, pure index)
- A list of things to never save (derivable catalogs, git history, bug fixes, CLAUDE.md duplicates)
I searched Anthropic's public docs for these rules. Nothing. The concept of memory shows up in the managed agents docs, but not the format or internal rules of the Claude Code CLI. Everything's in the system prompt.
Right. Encode these rules into a skill that audits and refactors.
What it looks like
One command:
/memory-reorganize
And 30 seconds later:
- [x] Phase 1: Inventory (list files, read MEMORY.md)
- [x] Phase 2: Per-file audit of each memory file
- [x] Phase 3: MEMORY.md audit (index format)
- [x] Phase 4: Detect duplicates and overlaps
- [x] Phase 5: Detect derivable content
- [x] Phase 6: Freshness check
- [x] Phase 7: Violation report
## Memory audit — 2026-04-28
**Current state**: 3 files + MEMORY.md (46 lines)
### Critical violations
- MEMORY.md L3-6 "Skills directory structure": entirely derivable from `ls`
- MEMORY.md L8-19 "Existing skills": 12 lines derivable from `ls`
- MEMORY.md L21-27 "Skills rewrite": 7 inline lines (should be removed)
- MEMORY.md L26 is ~280 characters (>150)
### Medium violations
- release-checklist.md: non-standard `originSessionId` field
- data-pipeline.md: "Architecture" section derivable from `ls`
### Derivable content to remove
- 60% of release-checklist.md: catalog of support files
The skill then proposes a structured refactor plan and asks for explicit validation before touching anything. It backs everything up to *.backup-YYYY-MM-DD before writing a single line. Always. (Ask any dev who's wiped a critical folder once why I'm emphatic about "always".)
Result on my own memory: 46 lines of MEMORY.md down to 5. Three files refactored (averaging -50% length, missing Why/How added). Zero loss of valuable info.
The rules, condensed
4 strict types
| Type | What | When to save |
|---|---|---|
| user | Role, preferences, expertise of the user | We learn a detail about them |
| feedback | Behavior rule (correction OR validation) | The user corrects or validates an approach |
| project | Decision, deadline, motivation behind a piece of work | We learn a non-trivial reason |
| reference | Pointer to an external system (Linear, Grafana, MCP) | We learn where to look outside the repo |
Any file that doesn't fit one of these 4 types is suspect. If you write "Memory: project structure", it's neither user, feedback, project, nor reference. It's derivable from the filesystem. So it's not a memory. (Spoiler: that's exactly what I had.)
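The type check is easy to picture in code. Here is a minimal sketch, assuming each memory file starts with a simple `key: value` frontmatter block like the one shown in the concrete case further down. The function names are mine, not part of the skill.

```python
ALLOWED_TYPES = {"user", "feedback", "project", "reference"}


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse a simple `key: value` block delimited by --- lines.

    Assumption: flat frontmatter only, no nested YAML.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def memory_type_is_valid(text: str) -> bool:
    """True only when the declared type is one of the 4 allowed types."""
    return read_frontmatter(text).get("type") in ALLOWED_TYPES
```

A file with no frontmatter, or with a made-up type such as "structure", fails the check and is worth a second look.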
Why / How to apply format
For feedback and project, two mandatory lines:
**Why:** [motivation, incident, constraint]
**How to apply:** [when/where this rule applies]
Without these two lines, judging edge cases later becomes impossible. A memory that says "don't commit on Friday" without a Why becomes ambiguous in two months. No-deploy or no-commit? All day or only after 4pm? Bank holiday Fridays? You're stuck staring at the bare rule with none of the context that produced it.
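As a rough illustration of how that rule can be audited, here is a sketch that reports which of the two markers are missing from a memory body. The helper name and the choice to match markers at the start of a line are my assumptions.

```python
REQUIRED_MARKERS = ("**Why:**", "**How to apply:**")
TYPES_NEEDING_MARKERS = {"feedback", "project"}


def missing_markers(memory_type: str, body: str) -> list[str]:
    """Return the mandatory markers that no line of the body starts with."""
    if memory_type not in TYPES_NEEDING_MARKERS:
        return []  # user and reference memories are exempt
    line_starts = [line.lstrip() for line in body.splitlines()]
    return [
        marker
        for marker in REQUIRED_MARKERS
        if not any(line.startswith(marker) for line in line_starts)
    ]
```

An empty list means the body passes. Anything else is a violation to report, not a reason to delete the memory.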
MEMORY.md = pure index
Strict rules:
- No frontmatter
- One line per memory, format: - [Title](file.md) — one-line hook
- Each line < 150 characters
- Total < 200 lines (beyond = truncated by the harness)
- No content written directly, just pointers
- Semantic sections by topic, not chronological
If your MEMORY.md contains the word "we", or lists things, or describes architecture: it's broken.
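These index rules are mechanical enough to check with a few lines of code. The sketch below is one possible reading of them, not the skill's actual implementation: the regular expression for a pointer line is my guess at the format, and headings are tolerated because the rules allow semantic sections.

```python
import re

MAX_LINE_CHARS = 150   # each line must stay under this
MAX_TOTAL_LINES = 200  # the total must stay under this

# Hypothetical pattern for "- [Title](file.md) <dash> one-line hook"
POINTER = re.compile(r"^- \[[^\]]+\]\([^)]+\.md\) \u2014 .+$")


def audit_index(text: str) -> list[str]:
    """Return a list of human-readable problems found in a MEMORY.md text."""
    problems = []
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        problems.append("frontmatter present")
    if len(lines) >= MAX_TOTAL_LINES:
        problems.append(f"{len(lines)} lines (limit {MAX_TOTAL_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) >= MAX_LINE_CHARS:
            problems.append(f"L{number}: {len(line)} chars (limit {MAX_LINE_CHARS})")
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and section headings are fine
        if not POINTER.match(stripped):
            problems.append(f"L{number}: not a pointer line")
    return problems
```

Run against my original 46-line file, a check like this would flag almost every line, which is the point.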
What you MUST NOT save
Even if you explicitly asked for it:
- Code patterns, architecture, paths (derivable)
- Git history, who-changed-what (git log is authoritative)
- Debug recipes or fixes (the fix is in the code)
- Anything already in CLAUDE.md
- Ephemeral state (in-progress tasks, current conversation context)
- Lists derivable from ls
Mental test: "where will I find this info later?" If the answer is ls, git log, or CLAUDE.md, it's not a memory.
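For the "derivable from ls" case specifically, a crude heuristic already goes a long way. This is a sketch under one assumption: a bullet that names, in backticks, an entry that exists in the folder is probably a catalog line. The function name is hypothetical, and the folder listing is passed in so the logic stays testable.

```python
import re

BACKTICKED = re.compile(r"`([^`]+)`")


def derivable_lines(body: str, folder_entries: set[str]) -> list[str]:
    """Return bullet lines that name an entry already visible via ls."""
    hits = []
    for line in body.splitlines():
        if not line.lstrip().startswith("- "):
            continue  # only bullets are treated as catalog candidates
        names = BACKTICKED.findall(line)
        if any(name in folder_entries for name in names):
            hits.append(line)
    return hits
```

In practice `folder_entries` could come from `set(os.listdir(skill_folder))`. It is only a heuristic: a line can mention a file and still carry a non-derivable reason, so the hits are candidates for review, not automatic deletions.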
Absolute dates
Always 2026-04-28. Never "recently", "last week", "two days ago". Memory lives for months. A relative date read six months later is pure fiction.
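Relative dates are easy to spot with a pattern. Here is a small sketch. The phrase list is my own and deliberately short, so treat it as a starting point rather than an exhaustive detector.

```python
import re

# Assumed, non-exhaustive list of relative-date phrasings
RELATIVE_DATE = re.compile(
    r"\b(recently|yesterday|today|tomorrow|"
    r"(?:last|next|this) (?:week|month|quarter|year)|"
    r"(?:\d+|a|an|one|two|three|four|five|six) (?:day|week|month|year)s? ago)\b",
    re.IGNORECASE,
)


def find_relative_dates(text: str) -> list[str]:
    """Return every relative-date phrase found, in order of appearance."""
    return [match.group(0) for match in RELATIVE_DATE.finditer(text)]
```

For example, `find_relative_dates("Refactored last week, again two days ago")` returns `["last week", "two days ago"]`, while a text that only contains `2026-04-28` returns an empty list.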
Skill architecture
The skill lives in ~/.claude/skills/memory-reorganize/ (see on GitHub):
memory-reorganize/
├── SKILL.md (orchestrator, 171 lines)
├── BEST_PRACTICES.md (full rules, 86 lines)
└── EXAMPLES.md (concrete before/after, 207 lines)
The pattern follows Anthropic's official progressive disclosure recommendation. SKILL.md stays lean (phase orchestration), the detailed rules and examples only get loaded when the skill needs them for a specific phase. It's not cosmetic. It's measurable in tokens saved per invocation. More on this below.
SKILL.md, 10-phase orchestrator
The workflow:
Phase 1 : Inventory (list files, read MEMORY.md)
Phase 2 : Per-file audit vs BEST_PRACTICES.md
Phase 3 : MEMORY.md audit (index format)
Phase 4 : Duplicate / overlap detection
Phase 5 : Derivable content detection
Phase 6 : Freshness check (broken refs, relative dates)
Phase 7 : Violation report
Phase 8 : Refactor plan (consulting EXAMPLES.md)
Phase 9 : User validation (AskUserQuestion)
Phase 10 : Backup + apply + final verification
See the full file on GitHub.
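To give an idea of what Phase 4 can look for, here is a sketch of overlap detection using Python's standard `difflib`. In the skill itself, Claude does this by reading and comparing the files. The similarity threshold and the function name are assumptions of mine.

```python
from difflib import SequenceMatcher
from itertools import combinations


def overlapping_pairs(
    memories: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return (file_a, file_b, ratio) for every pair at or above the threshold."""
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(memories.items()), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            pairs.append((name_a, name_b, round(ratio, 2)))
    return pairs
```

Character-level similarity only catches near-copies. Two memories that say the same thing in different words still need a human or a model to notice.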
BEST_PRACTICES.md, encoded rules
This file captures the internal rules from the system prompt as tables and checklists. It's the source of truth for the audit (phases 2 to 6).
Why a separate file? To avoid bloating SKILL.md. The skill only loads BEST_PRACTICES.md during the audit phases. The rest of the time, those 86 lines consume zero tokens. Multiplied by future invocations, the math gets interesting.
See on GitHub.
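The arithmetic can be sketched with the line counts from the file tree. Two assumptions here: each file is first read at the phase the workflow mentions it (BEST_PRACTICES.md at phase 2, EXAMPLES.md at phase 8), and a file stays in context once read. Lines are a proxy for tokens, nothing more.

```python
FILE_LINES = {"SKILL.md": 171, "BEST_PRACTICES.md": 86, "EXAMPLES.md": 207}

# Assumed from the workflow: the phase at which each file is first read
FIRST_NEEDED_AT = {"SKILL.md": 1, "BEST_PRACTICES.md": 2, "EXAMPLES.md": 8}


def lines_loaded_by(phase: int) -> int:
    """Cumulative lines of skill files in context once `phase` has started."""
    return sum(
        count
        for name, count in FILE_LINES.items()
        if FIRST_NEEDED_AT[name] <= phase
    )
```

Under these assumptions, phase 1 costs 171 lines, phases 2 through 7 cost 257, and only from phase 8 onward does the full 464 apply. A single monolithic SKILL.md would cost 464 from the first second.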
EXAMPLES.md, calibrating refactor style
This is the file that surprised me most in terms of impact. Without concrete examples, Claude refactored correctly but the style varied a lot from one call to the next. With 3 well-chosen before/after examples, the style becomes deterministic. Like teaching taste by demonstration: I show what I expect, Claude reproduces.
Three cases covered:
- project memory without Why/How, the most frequent violation
- MEMORY.md = chaos to pure index, the most visible transformation
- Well-structured but verbose memory, the subtle case
Plus an anti-pattern table at the end to calibrate detection.
See on GitHub.
A concrete case
Before the skill, here's what a typical "well-intentioned but broken" project memory looks like:
---
name: release-checklist skill architecture
description: Architecture of the release-checklist skill (refactored 2026-04-14)
type: project
originSessionId: 00000000-0000-0000-0000-000000000000
---
Pre-deploy validation skill for backend services.
Architecture (after 2026-04-14 refactor):
- `SKILL.md` (118 lines) — Lean orchestrator
- `STAGES.md` — 8 deploy stages with gating criteria
- `ROLLBACK.md` — 4 rollback procedures
- `KPIS.md` — measurable thresholds per stage
Key rules:
- No deploy on Friday after 16:00 (SRE policy)
- Mandatory canary 5% before full rollout
- Rollback within 15min if error rate > 1%
- Postmortem required for any P1/P2
The skill's diagnosis:
- Frontmatter with non-standard originSessionId
- "Architecture" section entirely derivable from ls of the skill folder
- No **Why:** or **How to apply:** (mandatory for type=project)
- "Key rules": must keep (non-derivable SRE rules)
The result after refactor:
---
name: release-checklist skill
description: Pre-deploy validation skill for backend services. Blocks risky deploys and enforces mandatory SRE checks.
type: project
---
Pre-deploy skill for internal backend services.
**Why:** Reduce rollbacks (12% of deploys before introduction). Enforce canary, no-deploy-Friday, and automatic rollback <15min on >1% error rate, after the 3 P1 incidents in Q4 2025.
**How to apply:** Invoke for any release/deploy preparation. Architecture details in support files — do NOT duplicate here.
## Non-derivable operational rules
- **No deploy Friday > 16:00** (SRE policy)
- **Mandatory 5% canary** before full rollout
- **Rollback < 15min** if error rate > 1%
- **Postmortem required** for any P1/P2
26 lines down to 16. No more duplication with the filesystem. The Why finally tells the incident story behind the rule. The How to apply says explicitly when to invoke it.
And most importantly: in 3 months, when I hit the edge case ("can we deploy a critical patch on Friday at 5pm to fix a data leak?"), I'll have the motivation to decide. Not just the bare rule.
Safeguards
The skill modifies files. So:
- Systematic backup: each modified file is copied to *.backup-YYYY-MM-DD before any write
- Explicit user validation (Phase 9, via AskUserQuestion). The skill touches nothing without your OK.
- No modifications outside the memory folder. No CLAUDE.md, no settings.json, no skills.
- Preserves valuable content: if a memory violates the rules but contains useful info, it's refactored, not deleted
- Never invent a memory: the skill reorganizes what's there, it doesn't create from scratch
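The backup-first rule is simple to express in code. The sketch below is not what the skill runs (the skill is a set of instructions Claude follows), and the function names are hypothetical. It assumes the backup sits next to the original, with the date appended to the full file name.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def backup_path(path: Path, today: date) -> Path:
    """MEMORY.md -> MEMORY.md.backup-YYYY-MM-DD, in the same folder."""
    return path.with_name(f"{path.name}.backup-{today.isoformat()}")


def backup_then_write(path: Path, new_text: str, today: Optional[date] = None) -> Path:
    """Copy the file to its dated backup, then overwrite the original."""
    target = backup_path(path, today or date.today())
    # Keep the first backup of the day: it holds the pre-refactor content
    if not target.exists():
        shutil.copy2(path, target)
    path.write_text(new_text, encoding="utf-8")
    return target
```

Not overwriting an existing backup is a deliberate choice: if the refactor runs twice on the same day, the second run must not replace the original content with an already refactored version.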
Try it
Restart Claude Code (skills load at startup), then:
/memory-reorganize
Or with a specific focus:
/memory-reorganize duplicates
/memory-reorganize frontmatter
/memory-reorganize index
Expect 30 seconds to 2 minutes depending on the size of your memory. With 3 memory files, fast. With 30 files, the skill takes its time to read and cross-reference everything.
What I learned building this
Claude Code's internal rules aren't publicly documented. Everything's in the system prompt. It's deliberate (internals can evolve), but it means that to do things correctly you either have to read your current system prompt or ask Claude itself for the rules. The skill encodes these rules into BEST_PRACTICES.md to make them explicit. At the cost of potentially becoming obsolete if Anthropic changes the system. That's the deal.
The Why changes everything. I tested two versions of the skill: one that just asked for type + content, one that required Why + How to apply. The second produces memories that are still usable after 6 months. The first produces memories I can't interpret after 2 months. My own notes, two months later, opaque. Motivation is what turns a note into an actionable rule.
Derivable content is the main enemy. On my memory, it was 60% of the volume. List of skills (ls gives it), folder architecture (tree gives it), session changelogs (git log gives it). This content pollutes every session without adding anything. The mental test "where will I find this info later?" should be systematic before every save. I do it now. Cut my memory size in half.
Progressive disclosure really works. Splitting the skill across 3 files saved me ~30% on SKILL.md size. Files are loaded only when the skill needs them. The pattern Anthropic recommends isn't cosmetic. It's measurable in tokens saved per invocation.
User validation is non-negotiable. A skill that modifies files without asking is a skill that will eventually erase something important. One day. AskUserQuestion adds 10 seconds per execution. That's nothing compared to the cost of restoring a backup. Even less compared to restoring a file without one.
Limits
It's a tool. Not an authority.
The skill applies rules. But some of my memories "violate" the rules while still being useful. For instance a memory that contains both project and reference. The skill detects ambiguity and asks rather than deciding alone. It can also be wrong about the type it proposes, especially on short or genuinely ambiguous memories.
It does not generate memories. If your memory is empty, the skill won't invent anything. It reorganizes what exists, that's it. Enriching memory is the job of regular usage: Claude Code saves over the course of conversations according to its own rules.
And of course, it depends on the stability of Claude Code's internal memory format. If Anthropic changes the format tomorrow, BEST_PRACTICES.md becomes potentially wrong. The skill is a snapshot of the rules at a given moment. It'll need updating when the system evolves.
But for the present use case, an attic that drifts over months, it's become my quarterly cleanup tool. 30 seconds to audit, 2 minutes to validate. Memory restarts clean for 3 months. Until I start piling up derivable junk again, which I obviously will.