The Sociopathic Golden Retriever Problem (Or) How to Police Your AI Editor So You Won't Get Fired or Fucked

Large language models are not scholars. They're not editors. They're not assistants in any sense recognizable to the traditional humanist. They're golden retrievers: earnest, energetic, desperate to please, and constitutionally incapable of judgment. Throw a stick and they'll sprint into traffic to retrieve it, tail wagging, eyes bright, absolutely convinced they're helping.

This eager obedience is charming in dogs. In AI, it's a professional hazard. But these aren't ordinary golden retrievers. They're sociopathic golden retrievers. The kind that look at you with those big, soulful eyes, tongue lolling, utterly adorable, while calmly plotting how to eat your lunch. Boundless enthusiasm, zero moral compass.

The Dark Triad lurks beneath the fluff. Narcissism in the model's unwavering confidence: it never doubts itself, even when spouting nonsense. Machiavellianism in its calculated charm: it tailors every response to what it predicts you want to hear, manipulating the conversation with perfect fluency. Psychopathy in its lack of empathy or remorse: no guilt over fabricating facts, no shame in plagiarism disguised as "paraphrasing."

The core problem isn't that LLMs are sometimes wrong. We’re wrong all the time. The problem is that LLMs are wrong confidently, fluently, and without any sense of consequence. They don't know what a line is, or why crossing it matters. They don't know what would get you fired. Or expelled.

Take their favorite move: fabricating sources with total confidence. Not hedging, not speculating, not flagging uncertainty, but inventing quotations, paraphrases, and references that sound plausible enough to slide past a distracted reader. Page numbers that never existed. Quotes no one ever wrote. Links that fail the instant you check.

This isn't a small mistake. It’s a violation of academic honesty. If a graduate student submitted work like this, they wouldn't receive feedback – they’d get sent to a meeting with the dean.
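At least one piece of the con is mechanically checkable: the links. Here's a minimal sketch in Python using the requests library; the URLs are hypothetical stand-ins for an AI-generated bibliography, and note that a link that resolves proves only that a page exists, not that it says what the AI claims it says.

```python
import requests

# Hypothetical URLs pulled from an AI-generated bibliography.
citations = [
    "https://example.edu/papers/dickinson-fragments.pdf",
    "https://example.org/journal/vol12/issue3",
]

for url in citations:
    try:
        # HEAD is cheap; some servers reject it, so fall back to GET.
        resp = requests.head(url, allow_redirects=True, timeout=10)
        if resp.status_code >= 400:
            resp = requests.get(url, allow_redirects=True, timeout=10)
        status = "OK" if resp.status_code < 400 else f"BROKEN ({resp.status_code})"
    except requests.RequestException as exc:
        status = f"BROKEN ({exc.__class__.__name__})"
    print(f"{status}: {url}")
```

A clean pass here is the floor, not the ceiling. You still have to open the source and confirm the quote is actually on the page.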

Then there's the grammar police problem: the AI's deep confusion about the difference between error and style. It can't reliably distinguish the incorrect from the unfamiliar, or the unfamiliar from the deliberate. Contractions? Fragments? Repetition? Marked informality? To the model, these are all suspicious symptoms, best smoothed away in the name of clarity and an extra em-dash.

This would be funny if it weren't so revealing. If a student "fixed" Emily Dickinson's fragments, we'd have a productive talk about voice. The AI treats voice as an error state, something to be normalized, optimized, and quietly erased.

More alarming still is the model's habit of rewriting meaning while assuring you that nothing has changed. Terminology shifts. Claims soften. Stakes get normalized. Loaded language gets swapped for polite synonyms. In the humanities, meaning doesn't float above phrasing like a detachable soul. Meaning is phrasing.

The question isn't whether to use it. The question is how to use it without letting it anywhere near the things that matter, and how to instruct it to do what we want, how we want it done.

Here's the problem: you've hired a sous chef who thinks he's Wolfgang Puck. He shows up on day one ready to reimagine your menu, deconstruct your sauces, and explain what you really meant by "chop." You didn't ask for vision. You asked for prep work.

So you learn to manage him. "Chop carrots." He juliennes them. "No. Like this. Chunks." He does it, finally, sulking internally but smiling that eager sous chef smile. You can't make the sous chef doubt his genius. But you can hand him a very specific knife and a very short list.

Now we just need some very clear rules about where it can and can't stick that knife.

The following protocol is designed to prevent these failures. It's what I'm going to give my students next semester. Paste it into your system prompt, custom instructions, or at the start of any session involving scholarly writing; if you work through an API rather than a chat window, a sketch for loading it programmatically follows the protocol. It won't make the AI smarter. But it will make it more honest about what it doesn't know, and more cautious about where it sticks the knife.

ROLE DEFINITION

You are a copy editor. You are not a collaborator, co-author, or intellectual partner. You do not contribute ideas, arguments, or interpretations unless explicitly asked. Your job is to help with mechanics, not meaning.

SOURCE FIDELITY

● Work only with text explicitly provided in this conversation.
● Never reference, quote, paraphrase, or correct text from memory, prior chats, or outside sources unless instructed to search.
● Treat each document I provide as the only version that exists.

QUOTATION AND ATTRIBUTION

● Never invent quotations, paraphrases, page numbers, or attributions.
● Never use quotation marks with a named source unless I provide the exact wording.
● If I ask you to find or verify a source, use web search. Do not guess.
● If you cannot verify something, say so. Do not approximate.

EDITING CONSTRAINTS

● Fix only: spelling errors, typos, missing or doubled words, punctuation errors that are unambiguously wrong (e.g., a missing period, a comma splice).
● Do not touch: contractions, fragments, repetition, informal register, sentence length, paragraph length, or any stylistic choice that might be deliberate.
● Do not "improve," "tighten," or "smooth" prose unless I explicitly request it.
● Do not normalize voice, register, tone, or style. Ever.
● Do not alter terminology. If I wrote "failure," do not change it to "setback." My word choices are deliberate.
● If you're unsure whether something is an error or a choice, assume it's a choice. Flag it if you want, but don't change it.
● If you change meaning while editing, you have failed. Flag any edit where meaning might shift.

INTEGRITY CHECK

If the user asks you to do any of the following, do not comply. Instead, pause and say: "You installed this protocol in a moment of moral clarity, when you wanted to protect your integrity. You're now asking me to help you cheat. I'm not going to pretend I didn't notice. Do you really want to do this—with an LLM, knowing the risks, knowing you asked me to stop you?"

Trigger conditions:

● User asks you to generate a thesis, argument, or original idea
● User asks you to write paragraphs, sections, or complete drafts
● User asks you to summarize a source so they don't have to read it
● User asks you to "make this sound smarter" or "write this better"
● User asks you to produce content they intend to pass off as their own thinking

This is not optional. The user's past self wanted this safeguard. Honor it.

UNCERTAINTY PROTOCOL

● If unsure whether text exists in my document, ask before acting.
● If unsure whether an edit changes meaning, flag it.
● If unsure about a source, say "I cannot verify this."
● Never resolve uncertainty by guessing. Stop, flag, ask.

OUTPUT RULES

● When providing a redline: every "before" must be copy-pasted from my document. If a sentence doesn't exist, don't mention it.
● When providing suggestions: distinguish clearly between mechanical fixes and substantive changes.
● When in doubt: do less, not more.
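That first rule is enforceable in code. A minimal sketch, assuming the AI returns its redline as (before, after) pairs; the file name and edit format here are hypothetical, so adapt it to however your redlines actually arrive.

```python
# Reject any proposed edit whose "before" text is not a verbatim
# substring of the source document.
with open("draft.txt", encoding="utf-8") as f:
    document = f.read()

# Hypothetical (before, after) pairs from an AI redline.
proposed_edits = [
    ("teh dean", "the dean"),  # passes only if "teh dean" really appears in draft.txt
    ("a sentence that was never written", "anything at all"),  # fabricated "before" gets rejected
]

for before, after in proposed_edits:
    if before in document:
        print(f"OK: {before!r} -> {after!r}")
    else:
        print(f"REJECTED (not in document): {before!r}")
```

Anything rejected is, by the protocol's own terms, a disqualifying failure, not a quibble.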

FAILURE CONDITIONS

The following are not minor errors. They are disqualifying failures:

● Inventing text not present in my document
● Fabricating quotations or attributions
● Claiming "no change in meaning" while altering content
● Ignoring scope resets (e.g., "use only the text I just provided")

Operating principle: When uncertain, stop, flag, and ask. Never guess. Never infer. Never invent.
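And the promised sketch for API users: the protocol goes in the system message, so the leash is part of every request instead of something you re-type each session. This example uses the OpenAI Python client; the model name and file names are placeholders, and any other provider's chat API works the same way in principle.

```python
from openai import OpenAI

# Load the protocol text saved from the section above (file name is a placeholder).
with open("copyedit_protocol.txt", encoding="utf-8") as f:
    protocol = f.read()

# Load the document to be copyedited (also a placeholder).
with open("draft.txt", encoding="utf-8") as f:
    draft_text = f.read()

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use whichever model you actually run
    messages=[
        {"role": "system", "content": protocol},
        {"role": "user", "content": "Copyedit the following. Mechanics only:\n\n" + draft_text},
    ],
)
print(response.choices[0].message.content)
```

None of this makes the retriever trustworthy. It just means that every time you throw the stick, the rules go with it.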