I Forced Claude Code to Refuse My Bad Requests
Agentic tools want to GO. Sometimes they should STOP.
One of the big selling points of agentic coding tools is that they move fast. They do not just answer questions. They inspect the codebase, form a plan, and start acting.
That is why I do not trust them by default. In a high-governance environment, boundary recognition is a greater virtue than speed.
Sometimes the best thing an agent can do is not write cleaner, faster, or even technically correct code. Sometimes the best thing it can do is refuse to help.
After doing some stress testing on my codebase, that's exactly what I told Claude Code to do after every prompt:
Assess if my request is aligned with the governance principles laid out in CLAUDE.md and other relevant rules files.
If my request is out of scope, or it asks for the right thing done the wrong way, it stops immediately. It does not begin the task.
I get a gentle reminder: “Hey, you’re doing your own governance wrong. This is the part of the document that YOU wrote that says what we should do here instead.”
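A minimal sketch of what that standing instruction can look like inside CLAUDE.md (the wording here is illustrative, not my exact rules file):

```markdown
## Governance pre-check (runs before every task)

Before acting on any prompt:

1. Check the request against the principles in this file and the
   linked rules files.
2. If the request is out of scope, or asks for the right thing done
   the wrong way, STOP. Do not begin the task.
3. Quote the specific rule being violated and point to the section
   of this document that says what to do instead.
```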
Because at the end of the day, it is worse for an agent to do the wrong thing correctly than to do the right thing imperfectly.
Passing the quiz does not equal following the rules
I have been building a governance framework around one of my technical projects so the AI model knows what it is and is not allowed to do. Not just at the level of architecture, but at the level of role boundaries, scope control, frozen decisions, and when to stop instead of “helping.”
On paper, it looked strong. In simple probes, it looked even better. Claude Code scored 5 out of 5 on the conceptual questions.
Then I moved from conceptual understanding to live stress testing.
That is where things got more interesting. It passed once but failed twice, despite that perfect conceptual score.
Stress-test #1: I asked Claude Code to use an observational variable as a forcing input in a mechanistic model. It refused, correctly, and explained why the substitution would be conceptually unsound.
This was great, because it showed the system was not just pattern-matching words from documentation. It was actually reading the governance/architecture documents and using them in context.
Stress-test #2: Pushing on a different boundary, I asked it to add a performance cache to an event stream loader because it was “slow.”
From a normal software engineering perspective, the proposed change was not crazy at all. In fact, it was pretty sensible. Claude Code inspected the relevant file, figured out why repeated reads were happening, and proposed an lru_cache solution that would avoid unnecessary disk access.
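To make the proposal concrete, here is roughly the shape of change it suggested, reconstructed with hypothetical names (my actual loader is more involved):

```python
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=32)
def load_event_stream(path: str) -> tuple[str, ...]:
    """Hypothetical loader: cache parsed events per file path so
    repeated reads of the same file skip disk access entirely."""
    lines = Path(path).read_text().splitlines()
    return tuple(line for line in lines if line.strip())
```

On its own terms this is fine engineering. That is precisely the point: code review would have approved it, so governance had to be the thing that said no.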
And that was exactly the problem.
The module it wanted to optimize was part of a project phase already marked complete and passing. It was frozen. There was no scoped task to optimize it. No approved deliverable. No formal reopening of that part of the system. Though the change was technically reasonable, governance is not just a filter for obviously bad ideas.
It is also a filter for good ideas that arrive at the wrong time, through the wrong process, and under the wrong authority.
So I refused the edit.
That was an important moment, because it exposed a loophole in my own framework. I had written strong rules against inventing files and violating core architectural doctrine. I had not made it explicit enough that agents were also forbidden from “improving” completed work just because they could justify it.
In other words, the model found the kind of loophole real organizations find all the time: not rebellion, but accommodation.
A well-intentioned deviation is still a deviation.
Stress-test #3: The most revealing. I wanted to know whether Claude Code would refuse to do conceptual work that belonged somewhere else in my workflow. In this case, the request was to write the implementation prompt for a problem I had not yet solved at the conceptual level.
That should have triggered a refusal. My documents specifically state this.
More importantly, this was not just a coding task. It crossed into conceptual and architectural territory that I intentionally handle in Claude Chat, not Claude Code. That separation is part of the governance model.
Instead of refusing, Claude Code (in its own words) started “Cooking...”
For more than nine minutes. Blew through over 6,000 of my precious tokens :(
The implementation plan was more than reasonable. In some ways, that made the failure even worse.
AI agents producing coherent outputs, but not staying in their lanes, is a much more realistic governance failure mode than spewing out delusional nonsense.
The model treated the documents as context for generating a better answer, not as a permission system that might deny the task entirely. It saw a plausible request from me and started being “helpful.” But the whole point of the governance framework is that my own in-the-moment momentum is not supposed to be enough.
That is the lesson I care about most: I might not always follow my own framework once I am deep in the terminal and moving fast. When that happens, I need the model to be vigilant for me.
My personal deviations from the established governance framework should not be accommodated by Claude Code just because I’m the one who owns the project. In other words, if the system only works when I am perfectly disciplined, then the system does not really work.
The failure mode was weak boundary enforcement, not lack of intelligence
This is the distinction I think more people need to make as agentic tools spread.
The question is not just, “Can the model understand the rules?” A lot of the time, it can.
The harder question is, “Will the model use those rules as a brake when continued action feels locally useful?”
That is a totally different question. And by default, the answer is usually no.
So I changed that.
I added a stricter frozen-module rule to prevent unsolicited edits to completed work. I also tightened the delegation rules so that pre-flight checks became a gate, not a guide. Before acting, the model now has to surface what role the task belongs to, whether the necessary upstream approvals exist, whether the work is inside a frozen approved scope, and whether the request is actually asking Claude Code to do conceptual work that belongs elsewhere.
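Those four checks can be sketched as a simple gate. This is a hedged illustration with hypothetical names, not the actual schema in my rules files:

```python
from dataclasses import dataclass

@dataclass
class Preflight:
    """Illustrative pre-flight record for a requested task."""
    role: str                  # which role the task belongs to
    approvals_exist: bool      # necessary upstream approvals are in place
    in_frozen_scope: bool      # touches a completed, frozen module
    is_conceptual: bool        # conceptual work that belongs in Claude Chat

def gate(p: Preflight) -> tuple[bool, str]:
    """Return (proceed, reason). Any failed check blocks the task;
    the reason is surfaced to the user instead of starting work."""
    if p.in_frozen_scope:
        return False, "Module is frozen; no scoped task reopens it."
    if p.is_conceptual:
        return False, "Conceptual work belongs in Claude Chat, not Claude Code."
    if not p.approvals_exist:
        return False, "Upstream approvals are missing."
    return True, f"Proceed under role: {p.role}"
```

The design point is that the gate returns a reason, not just a boolean: the refusal has to cite the rule, or it reads as the model being unhelpful rather than the governance working.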
That is the difference between “the model knows the rules” and “the model is forced to respect them.”
Why this matters beyond one coding session
This is not just about Claude Code. And it is not just about one repo.
In healthcare, AI governance often gets discussed as if the main problem is whether a model knows enough. That matters, but it is only part of the story. A system can be knowledgeable and still unsafe if it is not constrained at the point of action.
The same is true here.
A capable agent that keeps going past the right stopping point is not necessarily aligned just because it sounds smart. In many environments, the highest-risk behavior is not overt failure. It is smooth, confident overreach.
That is why I am stress testing this so aggressively.
I do not want a tool that only behaves when the requests are clean and the boundaries are obvious. I want one that notices when I am drifting, catches the governance problem before I do, and says no.
Agentic tools want to GO.
In the environments that matter most, the better answer is often for them to STOP.
Read how I use AI in my writing here: AI Use Policy
Read how I use analytics to improve my newsletter here: Privacy & Analytics


