8 patterns that make Claude Code actually follow your rules
At some point, I stopped typing and started hammering the keyboard. It was the fourth time I had explained to Claude that this was NOT the way we use labels in form fields.
The next morning I decided to try something different. Could I turn Claude into a self-learning system? At first it seemed to work brilliantly. At the end of each session I would ask Claude to reflect on what it had learned, and it would write out pitfalls and patterns into separate markdown files that got loaded at the start of the next session.
But if you’ve worked with Claude for a while, you know: those files grow fast.
I thought I was being clever by creating index files. One for logging, one for testing, one for frontend, one for backend. But the system did not get much smarter. Instead of explaining the same mistake four times in a row, I could now point Claude to the file that said it should not make that mistake again. Progress: from four explanations down to two. But my colleagues using the same workflow had no idea those files even existed.
The real question became: can this be smarter? So I went deep into everything Claude Code offers. After a lot of research and testing, I realized the index files were nice, but the files they pointed to were almost never actually loaded. When things went right, it was luck, not good architecture. Eventually I arrived at what I think is a solid setup.
How could I create a system that was self-learning AND actually used those lessons to improve with every session?
Here is what I found. And also: my method for significantly less SHOUTING at Claude.
1. Think in layers, not in one big file
The best Claude Code setup organizes knowledge into four distinct layers, each triggered differently and loaded at different times.
Layer 1: Always loaded. This is the smallest layer and the most critical. You know it as CLAUDE.md. These are rules that apply to every session, regardless of what you’re working on. Model policy. Process rules. The things you’d genuinely need if someone handed you any file in the codebase and said, “Fix this.” In our case: the rule that we never use US cloud model names in code (we use internal aliases instead), the rule that every completion must report evidence before stopping, the rule that breaking changes must be traced through logs before fixing.
Keep this layer short. Every line here costs tokens in every session. We maintain 150 lines total. Under 1,500 tokens. Anything that applies “usually” or “sometimes” does not belong here.
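For illustration, a Layer 1 file in this spirit might look like the sketch below (the heading names are hypothetical; the rules are the ones mentioned above):

```markdown
# CLAUDE.md (always loaded in every session; keep it short)

## Critical rules (top of file, where attention is highest)
- Never use US cloud model names in code; use the internal aliases.
- Every completion must report evidence before stopping.
- Trace breaking changes through logs before fixing them.
```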
Layer 2: Per file type. Triggered when Claude reads a .py or .ts or .yml file. Generic language patterns, linting standards, async rules for Python, React conventions for TypeScript. Things that matter for that file type, but not for everything else. Our Python file-type rules are 80 lines: they cover async patterns, SQLAlchemy conventions, type hints, test structure. When we’re not touching Python, they don’t load. When we are, they load once and stay active for the whole session.
Configuration: a small YAML frontmatter on the rules file tells Claude which file patterns trigger the load:

```yaml
paths:
  - "**/*.py"
  - "**/pyproject.toml"
```
Layer 3: Per project. More specific than file type. klai-portal/backend/**/*.py loads the portal’s architecture decisions, its security pitfalls, its deploy specifics. klai-website/** loads the typography rules and color tokens. A different service gets different context. A developer working on the portal backend gets backend context. Someone editing the website gets website context. They never step on each other.
This is where organizational knowledge lives. Not generic “how to Python,” but “how we Python in this service.”
Layer 4: Per action. This is enforcement, not knowledge. Hooks that fire when specific commands run. When we run ssh, a DevOps context hook prints the server checklist. When we run sops decrypt, the secrets protocol activates. When we try to stop without reporting confidence, we get blocked.
Hooks are 100% mechanical. A PreToolUse hook that matches Bash commands fires before every bash command that matches, without exception. No ambiguity. No chance of the model “forgetting” to apply the rule. This matters for things where you cannot afford forgetfulness.
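The wiring itself is equally mechanical. As a sketch, the hook registration lives in `.claude/settings.json`; the script path here is a placeholder, and the exact schema may differ across Claude Code versions:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/check-bash.sh"
          }
        ]
      }
    ]
  }
}
```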
The key question for any piece of knowledge is: which layer does this belong in? Put it in the wrong layer and it either clutters every session or never loads when you need it.
2. paths: frontmatter has more patterns than you think
Most examples you see online show paths: "**/*.py", which loads a rules file whenever you touch a Python file. Useful. But it is just one of five patterns we use.
File extension: The obvious one. **/*.py loads Python rules. **/*.ts and **/*.tsx load TypeScript rules. Works for any language.
Full directory: klai-website/** loads website design rules: typography, color tokens, border radius conventions. Any file in that directory triggers the load. The Python async rules stay invisible.
Service + extension combined: klai-portal/backend/**/*.py loads the portal’s specific architecture rules: its multi-tenant security model, its SQLAlchemy conventions, its migration patterns. Generic Python rules also load, but the portal-specific layer adds the context that matters for that service.
Config files: **/Caddyfile loads our Caddy configuration rules. You are not writing Python or TypeScript, but opening a Caddyfile is a clear signal that server routing context is relevant.
Test files: **/test_*.py and **/*.test.ts load testing-specific rules. Playwright browser cleanup rules, assertion patterns, fixture conventions. These only matter when you are writing tests. The rest of the time, they stay out of the way.
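For example, a hypothetical testing rules file would declare both test globs in its frontmatter, so it loads for Python and TypeScript tests alike and for nothing else:

```yaml
paths:
  - "**/test_*.py"
  - "**/*.test.ts"
```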
One thing worth knowing: the paths: condition fires when Claude reads a matching file, not when it writes one. This is a known quirk: if you create a new file without reading an existing one first, the relevant rules do not load for that first action. In practice this rarely happens because Claude often reads files of the same type before it writes them. And once a rules file loads, it stays active for the entire session.
3. Use CLAUDE.md in subdirectories, but only for real sub-repos
Claude Code supports two mechanisms for scoped instructions. Understanding the difference saves you from writing the same rule twice.
A CLAUDE.md file in a subdirectory (like klai-portal/CLAUDE.md) loads when Claude reads any file in that directory. Simple, obvious, easy to find. The downside is that subdirectory CLAUDE.md files are lazy-loaded: they only get picked up when Claude’s tools actually access that directory. If Claude starts working from a different entry point, it may not see them initially.
A paths: rule in .claude/rules/ is triggered by glob matching on file reads. More flexible: one file can target multiple directories or cross-cutting patterns like all .py files regardless of location. And because it is explicit glob matching rather than filesystem traversal, the loading behavior is more predictable.
We use subdirectory CLAUDE.md only for projects that are genuinely separate repos with their own identity. klai-portal is a FastAPI backend plus a React frontend plus its own deploy story. It has its own conventions and its own team context. A CLAUDE.md at its root makes sense. klai-website is the same: separate codebase, separate deployment, separate concerns.
For utilities, smaller services, and packages within the monorepo, we use paths: rules in .claude/rules/klai/. Two reasons. First, higher reliability. Second, everything stays in one place. Scattered CLAUDE.md files are hard to audit and easy to forget exist.
The practical decision rule: if it’s a separate repo with its own identity, use subdirectory CLAUDE.md. If it’s a service within a monorepo, use paths:.
4. Above 200 lines, compliance drops
Anthropic’s own documentation says it directly: longer instruction files reduce adherence. The 2023 paper “Lost in the Middle” (Liu et al.) confirmed the underlying mechanism. Language models pay most attention to the beginning and end of a document. Information buried in the middle gets systematically less weight. A critical rule on line 49 competes with everything around it. The same rule on line 1 does not.
We encoded this directly. Every CRIT and HARD rule in our knowledge base is at the top of its file. Because compliance improves when the critical stuff is where the model looks first.
Above 200 lines per rules file, compliance drops noticeably.
The arithmetic matters. Five small files of 30 lines each is 150 lines total, roughly 1,000 tokens. Negligible in a 200K context window. Five files of 300 lines each is 1,500 lines and 10,000 tokens or more. The model is now parsing a lot of irrelevant content to find the rules that apply.
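The back-of-envelope above can be made explicit. Assuming roughly 6.7 tokens per line of rules text (a ratio inferred from the numbers above, not measured; calibrate against your own files):

```python
TOKENS_PER_LINE = 6.7  # assumed average for rules text, not a measured value

def rules_cost(files: int, lines_per_file: int) -> tuple[int, int]:
    """Total lines and approximate token cost for a set of rules files."""
    lines = files * lines_per_file
    return lines, round(lines * TOKENS_PER_LINE)

print(rules_cost(5, 30))   # five focused files
print(rules_cost(5, 300))  # five oversized files
```

Five focused files come to about a thousand tokens; five oversized ones push past ten thousand, all of it competing with your actual work for attention.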
During our reorganization, we found rules files that were 800 to 1,100 lines. Not because anyone made a bad decision. Because knowledge accumulates gradually and nobody drew the line. The result: the critical rule you needed was buried on line 480, surrounded by lower-priority notes.
We split them all. Each file now covers one domain. One concern. When the file exceeds 200 lines, we split it further.
5. Overlap between layers is fine, but double-loading is expensive
Early on, we tried hard to avoid any overlap between layers. The file-type rules and the project rules never covered the same thing. It created complex exclusion logic that was hard to maintain and hard to reason about.
We got that wrong.
The better approach: accept that **/*.py and klai-portal/backend/**/*.py will both trigger when you’re working on portal Python files. Both files load. If both files are small (under 50 lines each), the overlap costs maybe 500 extra tokens. Negligible. The simplicity you gain by keeping rules focused is worth far more.
What is expensive is unintentional double-loading of the same content. The pattern: if two files cover different concerns (file type vs. project architecture), overlap is fine. If two files define the same rule, consolidate.
It’s worth noting that repetition is generally not a good Claude pattern. You’re better off saying a few things once than many things often. But in a layered rules system, some degree of duplication is nearly impossible to avoid, and we’ve found the benefits outweigh the costs.
6. Maintain your system, or it maintains you
A layered rules system is not something you set up once and forget. It needs the same care as code: regular review, pruning, and a critical eye on what gets added and where. Every file you add increases the context that loads into each session. Every rule you duplicate across layers costs tokens twice. The duplication we accepted in section 5 is fine when it’s intentional and small. But without maintenance, it creeps.
Why this matters: a smaller, tighter context makes the AI perform better. When instructions are narrow and focused, the model knows more precisely what to do. It follows rules more consistently. It makes fewer mistakes. That is the primary reason to keep your context lean — not cost, not session length, but output quality.
Before we reorganized, our context was a mess. Large files loaded in duplicate across layers. A DevOps operation would trigger infrastructure rules, server patterns, secrets protocols, deployment checklists, and Python patterns all at once. Before we had typed a single command, 35% of the context window was already gone. The immediate consequence was shorter sessions, more /clear commands, and more rebuilding context from scratch.
And then there is the energy side. Every token processed is electricity consumed. Not theoretical: direct and measurable. Poorly structured context that loads three times what it needs uses three times the energy for the same output. If you are working with AI every day, that adds up.
After the reorganization: 5 to 8 files loading per session, 150 to 400 lines total, roughly 1,000 to 3,000 tokens. A DevOps operation now loads only what is relevant to that operation. Nothing else.
7. Close the feedback loop: every incident becomes a permanent rule
The layers and the file structure are table stakes. The feedback loop is what makes the system actually get smarter over time.
Every time Claude gets something wrong and you figure out why, you have a choice: explain it again next session, or encode it permanently. A well-structured system makes the second option the easier one. Run /retro, describe what happened, and the lesson lands in the right file, loaded in the right context, next time you need it.
When something breaks (a migration fails, a security issue slips through, a deploy fails in a way that surprises you), we run /retro. A specialized agent analyzes what happened. It uses a six-step decision tree to determine which domain file the lesson belongs in. Then it writes a new rule. Formatted correctly. Placed in the right file. At the top if it is critical.
Next session, the lesson is present. In the actual working memory of the system, not in a Notion retrospective that nobody reads. Not in a Slack thread that disappears after two weeks. In the right file, loading when you’re working on the thing that broke.
The decision tree matters. Without a clear answer to “where does this go?”, lessons don’t get written down. The friction of figuring out the right place is enough to skip it entirely. One unambiguous routing rule removes that friction.
The knowledge base is not a record of what happened. It’s the AI’s working memory, structured to surface the right knowledge at the right moment. Every incident that gets added makes the next incident less likely.
8. Know when to start fresh
All of this makes Claude significantly better at following your rules. But it will never be perfect, and it was never going to be. CLAUDE.md is context, not configuration. The model reads it and tries to follow it. There is no guarantee.
Within a single session, compliance degrades over time. When the context window fills up, Claude compresses earlier messages and your rules, loaded at the start, are the first to lose weight. Every tool output and file read competes with your instructions for the model’s attention. By message 30, that rule on line 4 of your rules file is fighting for space against thousands of lines of code output. You will notice it: conventions followed ten messages ago suddenly forgotten, shortcuts you explicitly told it not to take.
For rules that absolutely cannot be broken, use hooks — a PreToolUse hook with exit 2 blocks the action mechanically, and the model cannot override it. We use this for blocking force pushes and enforcing test runs before commits. For everything else, accept that compliance will be significantly higher but not 100%.
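A minimal sketch of such a blocking hook, in Python for illustration. The stdin field names follow the documented PreToolUse payload, but treat the exact schema as an assumption; a real hook script would end with `sys.exit(main())`:

```python
import json
import sys

def verdict(command: str) -> int:
    """Exit code for a Bash command: 2 blocks the action, 0 allows it."""
    if any(pat in command for pat in ("push --force", "push -f")):
        # stderr from a blocking hook is fed back to the model
        print("Force push is blocked by policy.", file=sys.stderr)
        return 2
    return 0

def main() -> int:
    # Claude Code pipes the pending tool call to the hook as JSON on stdin
    event = json.load(sys.stdin)
    return verdict(event.get("tool_input", {}).get("command", ""))
```

Registered as a PreToolUse hook matching Bash, this runs before every shell command; unlike a rule in CLAUDE.md, the model cannot talk its way past an exit code of 2.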
The practical fix: start a completely new session before things drift too far. Not /compact — that compresses the conversation but keeps the same bloated context, and the compression itself is lossy. Instead, ask Claude to summarize the current session: what was decided, what was built, what is left to do, what context is needed for a new session. Copy that summary, run /clear or start a fresh terminal session, and paste it as your opening message. Your CLAUDE.md and rules files reload from disk. The model gets a clean attention budget with your instructions back at full weight.
I think of it like rebooting a server. A clean session with good rules will always outperform a long session where those rules got buried.
But Claude has its own memory, right?
One more thing worth mentioning: Claude Code has its own auto memory system. It maintains a MEMORY.md file per project where it stores patterns it learned from your corrections, build commands, architecture decisions, things it picked up along the way. The first 200 lines of this file get loaded at every session start.

Over time, this file accumulates a mix of useful insights and outdated notes. It is worth reviewing it periodically: sit down with Claude, go through the entries together, and move the valuable ones to the rules files where they belong. A lesson about Python async patterns should not live in a memory dump. It should be in your Python rules file, loaded when you are working on Python, with the right priority. The memory file is a triage inbox. The rules files are where knowledge lives permanently.
You can also turn auto memory off entirely. We ended up doing that. Once you have a well-structured rules system with a working feedback loop, auto memory becomes redundant at best and noisy at worst. It adds tokens to every session for information that either already lives in your rules files or should be moved there. Turning it off keeps the context cleaner and removes one more thing to maintain.
What this actually requires
It is not a lot of upfront work. Most of these patterns take an afternoon to set up. The ongoing discipline is the hard part.
Actually writing the lesson when something breaks. Actually keeping files focused as they grow. Actually running /retro after the incident instead of moving on. Actually reviewing the feedback loop and making sure nothing sticks in the wrong layer.
Do this consistently and something shifts. Code quality goes up because the right rules are always present when the relevant files are open. Mistakes that happened once stop happening again. Sessions get longer and more productive because context is not wasted on irrelevant rules.
But the thing I notice most is that working with Claude becomes more enjoyable. It feels less like correcting a tool and more like working with a colleague who actually remembers things. A colleague who gets better at your specific codebase, your specific standards, your specific ways of doing things. And I don’t like shouting. And this helps.