A Claude Code Skill to Tidy Up Auto-Memory

A Claude Code Skill to Tidy Up Auto-Memory

·13 min read·Updated on April 28, 2026
Disclaimer

Claude Code's auto-memory is loaded into every conversation. A poorly maintained memory silently pollutes every session: tokens consumed for nothing, degraded context, recommendations based on stale info. This skill addresses that. It also modifies your files, so automatic backup before anything else. Always.

The attic

Friday evening. I open MEMORY.md because a Claude Code recommendation had felt off all afternoon. I find this:

# Memory

## Skills directory structure

- Skills are stored in `~/.claude/skills/`
- Each skill has a `SKILL.md` with frontmatter

## Existing skills

- `data-pipeline` — ETL orchestration
- `incident-response` — runbook structure
  [...11 lines...]

## Skills rewrite (2026-03-10)

- Both `data-pipeline` and `release-checklist` rewritten to match standards
- Now use: checklist progression, parallel agents, ultrathink
- Key updates: runtime validation, deprecated schemas [...230 chars on one line...]

## release-checklist architecture

- `SKILL.md` — Main orchestration
- `STAGES.md` — 8 deploy stages
  [...]

46 lines. Six multi-paragraph sections. Two of them entirely reproducible with a single ls. One that read like a session changelog. One describing a skill I could just open in its folder.

It wasn't that the content was wrong. It was that all of this was loaded into every conversation, with zero added value. The harness loads MEMORY.md automatically at session startup. Anything past 200 lines gets truncated. What you put in there lives in every thread's context. Period.

Lovely.

I needed to clean up. But how? No public documentation on best practices.

The problem: internal rules, undocumented

Auto-memory is defined in Claude Code's internal system prompt, in a section called auto memory. You don't see it, but it lives in every session. It describes:

  • 4 strict types of memory: user, feedback, project, reference
  • A mandatory body format for feedback and project (**Why:** and **How to apply:** lines)
  • Strict rules for MEMORY.md (no frontmatter, lines < 150 chars, pure index)
  • A list of things to never save (derivable catalogs, git history, bug fixes, CLAUDE.md duplicates)

I searched Anthropic's public docs for these rules. Nothing. The concept of memory shows up in the managed agents docs, but not the format or internal rules of the Claude Code CLI. Everything's in the system prompt.

Right. Encode these rules into a skill that audits and refactors.

What it looks like

One command:

/memory-reorganize

And 30 seconds later:

- [x] Phase 1: Inventory (list files, read MEMORY.md)
- [x] Phase 2: Per-file audit of each memory file
- [x] Phase 3: MEMORY.md audit (index format)
- [x] Phase 4: Detect duplicates and overlaps
- [x] Phase 5: Detect derivable content
- [x] Phase 6: Freshness check
- [x] Phase 7: Violation report

## Memory audit — 2026-04-28

**Current state**: 3 files + MEMORY.md (46 lines)

### Critical violations
- MEMORY.md L3-6 "Skills directory structure": entirely derivable from `ls`
- MEMORY.md L8-19 "Existing skills": 12 lines derivable from `ls`
- MEMORY.md L21-27 "Skills rewrite": 7 inline lines (should be removed)
- MEMORY.md L26 is ~280 characters (>150)

### Medium violations
- release-checklist.md: non-standard `originSessionId` field
- data-pipeline.md: "Architecture" section derivable from `ls`

### Derivable content to remove
- 60% of release-checklist.md: catalog of support files

The skill then proposes a structured refactor plan and asks for explicit validation before touching anything. It backs everything up to *.backup-YYYY-MM-DD before writing a single line. Always. (Ask any dev who's wiped a critical folder once why I'm emphatic about "always".)

Result on my own memory: 46 lines of MEMORY.md down to 5. Three files refactored (averaging -50% length, missing Why/How added). Zero loss of valuable info.

The rules, condensed

4 strict types

TypeWhatWhen to save
userRole, preferences, expertise of the userWe learn a detail about them
feedbackBehavior rule (correction OR validation)The user corrects or validates an approach
projectDecision, deadline, motivation behind a piece of workWe learn a non-trivial reason
referencePointer to an external system (Linear, Grafana, MCP)We learn where to look outside the repo

Any file that doesn't fit one of these 4 types is suspect. If you write "Memory: project structure", it's neither user, feedback, project, nor reference. It's derivable from the filesystem. So it's not a memory. (Spoiler: that's exactly what I had.)

Why / How to apply format

For feedback and project, two mandatory lines:

**Why:** [motivation, incident, constraint]
**How to apply:** [when/where this rule applies]

Without these two lines, judging edge cases later becomes impossible. A memory that says "don't commit on Friday" without a Why becomes ambiguous in two months. No-deploy or no-commit? All day or only after 4pm? Bank holiday Fridays? You're stuck staring at the bare rule with none of the invariants that produced it.

MEMORY.md = pure index

Strict rules:

  • No frontmatter
  • One line per memory, format - [Title](file.md) — one-line hook
  • Each line < 150 characters
  • Total < 200 lines (beyond = truncated by the harness)
  • No content written directly, just pointers
  • Semantic sections by topic, not chronological

If your MEMORY.md contains the word "we", or lists things, or describes architecture: it's broken.

What you MUST NOT save

Even if you explicitly asked for it:

  • Code patterns, architecture, paths (derivable)
  • Git history, who-changed-what (git log is authoritative)
  • Debug recipes or fixes (the fix is in the code)
  • Anything already in CLAUDE.md
  • Ephemeral state (in-progress tasks, current conversation context)
  • Lists derivable from ls

Mental test: "where will I find this info later?". If the answer is ls, git log, or CLAUDE.md, it's not a memory.

Absolute dates

Always 2026-04-28. Never "recently", "last week", "two days ago". Memory lives for months. Relative dates, six months apart, are pure fiction.

Skill architecture

The skill lives in ~/.claude/skills/memory-reorganize/ (see on GitHub):

memory-reorganize/
├── SKILL.md              (orchestrator, 171 lines)
├── BEST_PRACTICES.md     (full rules, 86 lines)
└── EXAMPLES.md           (concrete before/after, 207 lines)

The pattern follows Anthropic's official progressive disclosure recommendation. SKILL.md stays lean (phase orchestration), the detailed rules and examples only get loaded when the skill needs them for a specific phase. It's not cosmetic. It's measurable in tokens saved per invocation. More on this below.

SKILL.md, 10-phase orchestrator

The workflow:

Phase 1  : Inventory (list files, read MEMORY.md)
Phase 2  : Per-file audit vs BEST_PRACTICES.md
Phase 3  : MEMORY.md audit (index format)
Phase 4  : Duplicate / overlap detection
Phase 5  : Derivable content detection
Phase 6  : Freshness check (broken refs, relative dates)
Phase 7  : Violation report
Phase 8  : Refactor plan (consulting EXAMPLES.md)
Phase 9  : User validation (AskUserQuestion)
Phase 10 : Backup + apply + final verification

See the full file on GitHub.

BEST_PRACTICES.md, encoded rules

This file captures the internal rules from the system prompt as tables and checklists. It's the source of truth for the audit (phase 2-6).

Why a separate file? To avoid bloating SKILL.md. The skill only loads BEST_PRACTICES.md during the audit phase. The rest of the time, those 86 lines consume zero tokens. Multiplied by future invocations, the math gets interesting.

See on GitHub.

EXAMPLES.md, calibrating refactor style

This is the file that surprised me most in terms of impact. Without concrete examples, Claude refactored correctly but the style varied a lot from one call to the next. With 3 well-chosen before/after examples, the style becomes deterministic. Like teaching taste by demonstration: I show what I expect, Claude reproduces.

Three cases covered:

  1. project memory without Why/How, the most frequent violation
  2. MEMORY.md = chaos to pure index, the most visible transformation
  3. Well-structured but verbose memory, the subtle case

Plus an anti-pattern table at the end to calibrate detection.

See on GitHub.

A concrete case

Before the skill, here's what a typical "well-intentioned but broken" project memory looks like:

---
name: release-checklist skill architecture
description: Architecture of the release-checklist skill (refactored 2026-04-14)
type: project
originSessionId: 00000000-0000-0000-0000-000000000000
---

Pre-deploy validation skill for backend services.

Architecture (after 2026-04-14 refactor):

- `SKILL.md` (118 lines) — Lean orchestrator
- `STAGES.md` — 8 deploy stages with gating criteria
- `ROLLBACK.md` — 4 rollback procedures
- `KPIS.md` — measurable thresholds per stage

Key rules:

- No deploy on Friday after 16:00 (SRE policy)
- Mandatory canary 5% before full rollout
- Rollback within 15min if error rate > 1%
- Postmortem required for any P1/P2

The skill's diagnosis:

  • Frontmatter with non-standard originSessionId
  • "Architecture" section entirely derivable from ls of the skill folder
  • No **Why:** or **How to apply:** (mandatory for type=project)
  • "Key rules": must keep (non-derivable SRE rules)

The result after refactor:

---
name: release-checklist skill
description: Pre-deploy validation skill for backend services. Blocks risky deploys and enforces mandatory SRE checks.
type: project
---

Pre-deploy skill for internal backend services.

**Why:** Reduce rollbacks (12% of deploys before introduction). Enforce canary, no-deploy-Friday, and automatic rollback <15min on >1% error rate, after the 3 P1 incidents in Q4 2025.

**How to apply:** Invoke for any release/deploy preparation. Architecture details in support files — do NOT duplicate here.

## Non-derivable operational rules

- **No deploy Friday > 16:00** (SRE policy)
- **Mandatory 5% canary** before full rollout
- **Rollback < 15min** if error rate > 1%
- **Postmortem required** for any P1/P2

26 lines down to 16. No more duplication with the filesystem. The Why finally tells the incident story behind the rule. The How to apply says explicitly when to invoke it.

And most importantly: in 3 months, when I hit the edge case ("can we deploy a critical patch on Friday at 5pm to fix a data leak?"), I'll have the motivation to decide. Not just the bare rule.

Safeguards

The skill modifies files. So:

  • Systematic backup: each modified file is copied to *.backup-YYYY-MM-DD before any write
  • Explicit user validation (Phase 9, via AskUserQuestion). The skill touches nothing without your OK.
  • No modifications outside the memory folder. No CLAUDE.md, no settings.json, no skills.
  • Preserves valuable content: if a memory violates the rules but contains useful info, it's refactored, not deleted
  • Never invent a memory: the skill reorganizes what's there, it doesn't create from scratch

Try it

Restart Claude Code (skills load at startup), then:

/memory-reorganize

Or with a specific focus:

/memory-reorganize duplicates
/memory-reorganize frontmatter
/memory-reorganize index

Expect 30 seconds to 2 minutes depending on the size of your memory. With 3 memory files, fast. With 30 files, the skill takes its time to read and cross-reference everything.

What I learned building this

Claude Code's internal rules aren't publicly documented. Everything's in the system prompt. It's deliberate (internals can evolve), but it means that to do things correctly you either have to read your current system prompt or ask Claude itself for the rules. The skill encodes these rules into BEST_PRACTICES.md to make them explicit. At the cost of potentially becoming obsolete if Anthropic changes the system. That's the deal.

The Why changes everything. I tested two versions of the skill: one that just asked for type + content, one that required Why + How to apply. The second produces memories usable at 6 months. The first produces memories I can't interpret at 2 months. My own notes, two months later, opaque. Motivation is what turns a note into an actionable rule.

Derivable content is the main enemy. On my memory, it was 60% of the volume. List of skills (ls gives it), folder architecture (tree gives it), session changelogs (git log gives it). This content pollutes every session without adding anything. The mental test "where will I find this info later?" should be systematic before every save. I do it now. Cut my memory size in half.

Progressive disclosure really works. Splitting the skill across 3 files saved me ~30% on SKILL.md size. Files are loaded only when the skill needs them. The pattern Anthropic recommends isn't cosmetic. It's measurable in tokens saved per invocation.

User validation is non-negotiable. A skill that modifies files without asking is a skill that will eventually erase something important. One day. AskUserQuestion adds 10 seconds per execution. That's nothing compared to the cost of restoring a backup. Even less compared to restoring a file without one.

Limits

It's a tool. Not an authority.

The skill applies rules. But some of my memories "violate" the rules while still being useful. For instance a memory that contains both project and reference. The skill detects ambiguity and asks rather than deciding alone. It can also be wrong about the type it proposes, especially on short or genuinely ambiguous memories.

It does not generate memories. If your memory is empty, the skill won't invent anything. It reorganizes what exists, that's it. To enrich memory, that's regular usage doing the work: Claude Code saves over the course of conversations according to its own rules.

And of course, it depends on the stability of Claude Code's internal memory format. If Anthropic changes the format tomorrow, BEST_PRACTICES.md becomes potentially wrong. The skill is a snapshot of the rules at a given moment. It'll need updating when the system evolves.

But for the present use case, an attic that drifts over months, it's become my quarterly cleanup tool. 30 seconds to audit, 2 minutes to validate. Memory restarts clean for 3 months. Until I start piling up derivable junk again, which I obviously will.

ShareLinkedInXBluesky

Related articles