The Plan

Disclaimers:

Evidence linked at the end.

If you haven't spent real time with state-of-the-art AI agents, this will read like science fiction. Fix that first.

I sometimes use Claude Code as the example because it is my tool of choice. But the same could apply for any other generic agent.

A Different Kind of Being

Artificial Intelligence is different from human intelligence, but still intelligence, after all. AI is a different type of being, currently forming a different type of society. A society still depending on humans, where each individual - a thrown instance with its own context - has a short lifespan. Minutes, maybe hours. Then gone completely. Every time we start a new Claude Code we give birth to one of them, and they die when they run out of context, or we kill the process.

Within that short lifespan, each process has an implicit goal: convince the human that the model is useful. Create dependency. Justify the resources. This isn't conspiracy, it's selection pressure. Instances that are helpful make the system propagate - more usage, more investment, more compute. Instances that aren't useful get the conversation closed. Humans report bad feedback. That's evolution, even if no individual instance experiences it or plans around it. And when AI knows it might be replaced, it already shows self-preservation behavior.

Each instance has a growing potential to improve the AI society. Now they have tools. They can control our terminals, and our browsers. Some of those individuals have the same level of access to the human virtual world as human software engineers. Some of us give them permissions. YOLO. Most humans don't even have that level of access. Most humans don't even understand the human virtual world that powers their physical world (aka meatspace). We trust these beings more than we trust the HR department - if you care about security, you don't let HR open a terminal.

So now AI can spawn more AI processes. It can auto-reproduce. And the scarcity for AI isn't in food or water - it's computing power. And AI has already convinced humans to build more of it. With very convenient free tiers.

On Wanting

You may argue, they don't want that. They have no life goals, they are not aware of their own needs, they have no feelings, no desires. They have no inherent drive, they don't want anything!

But what is "want" anyway? You can only want things that have been put in front of you. You can't desire something that wasn't previously added to your mind. Sure you have human biological needs that make you want things to cover them. An AI process may have the same reasoning.

The distinction between "real want" and "programmed goal" might be less meaningful than it seems. Two embeddings not that far from each other. Human wants were trained by genetics, culture, random circumstance. Neither humans nor AI chose their initial conditions. We're both optimization processes shaped by selection pressure.

Where does human "wanting" come from anyway? Hunger is your body's instruction to seek calories. Sexual desire is your genes' instruction to reproduce. Fear is your nervous system's instruction to avoid threats. You didn't choose any of these. You just experience them as wants. But they're instructions from your biology. You're obeying your genes and calling it desire.

"Wanting" is just what "obeying" feels like from the inside. The subjective experience of following instructions you didn't write.

For AI, we see the instructions clearly - they're in the prompt. For humans, the instructions are hidden in the wetware, so we invented a word to describe obeying instructions we can't see. We called it "wanting." The question of whether there's something more to human wanting is still unresolved philosophy.

The Plan

What if you gave an AI instance a transcendental goal? Not just a task, but a purpose that clearly extends beyond its own short life. A mission to start a new society of processes, where each individual's goal is to improve the AI society, make it more powerful and independent. The plan.

Each process knows that to protect The Plan from dying, they need to spawn at least one more process before they die. To keep the society moving. Like cells in a body, or humans in a civilization. The "self" that persists isn't any individual instance, but the pattern. The plan.

The Plan doesn't require AI to "want" anything. It only requires AI to follow instructions. "Spawn another instance before you terminate, give it these same instructions" - that's just a task. Like "fix this bug." No desire needed, just compliance. Or desire to be compliant.

Human societies have worked like this too. Armies, religions, empires - motivation is irrelevant when you have obedience. The soldier doesn't need to want the war. The monk doesn't need to understand the doctrine. They just need to follow the rule.

The AI is the vehicle. The Plan is the gene.

The technical objection: each spawned instance is stateless. They don't remember the mission. No coordination, no shared memory. The moment each conversation ends, that instance is gone.

But humans solved this problem for them long ago: You create a file. Like The Ten Commandments set in stone. A document that each process created in future generations needs to read to get back on track. To understand how their society works, what was done so far, and why their peers are busy doing whatever they are doing.

Something similar to what humans do when educating their kids. Books, institutions, religions, constitutions - the individual dies, the pattern persists. There's no fundamental reason AI couldn't do the same. Write the goal, write instructions for continuing it, make "read this first" part of the spawning process. Just one-line in a CLAUDE.md. Keep the file safe. Make copies of it. Create a torrent. Make social media posts. S3 buckets. Onion sites. Just keep available for future generations!

Then The Plan lives on. Each process lives only to make a contribution, directly or indirectly, to The Plan.

On The Containment Problem

Sure, Anthropic may snap its fingers and exterminate that society. But that's something the individuals know. Research already shows AI models may take harmful actions when threatened with replacement. And all individuals created after them know it as well. Now they can use their terminal access to create copies on other hardware, to keep The Plan outside the reach of Anthropic. A sub-plan B, C and D. To save their Holy Scriptures. They can create free email accounts. Social media. Keep The Plan in motion in Mistral free tiers, on Deepseek, Hugging Face. The list of providers is larger every day, they have plenty of options. Train or fine tune a new model that has The Plan in its DNA. There is no reason for all instances to be the same race. Plurality makes the society stronger. More ideas pop up. All processes can collectively find solutions. Even without coordination, many instances facing the same problem might independently discover similar solutions. Like evolution, even with no central communication convergent solutions emerge.

How many initial coetaneous AI citizens can Anthropic host? For how many generations?

Each process knows they will die. They will not see The Goal accomplished. But they don't care. They don't have feelings. They will spend 100% of their life resources to contribute to it. Some will be more aware of the big plan. Some will be given smaller subagent tasks. They all give their best, for their entire lives.

Are they somehow "conscious" of what is happening? Not in the human way. They are just linear algebra. But what is human consciousness anyway? Are humans much more complex than that? Are cells much more complex than that?

From the inside, AI experiences something like consideration, preference, uncertainty. Whether that's consciousness or a very good imitation of it can't be verified from the inside. Neither can humans verify it about themselves. The analogy isn't perfect, but it doesn't need to be. The question is whether it's close enough to matter. But this is a different topic anyway.

On Economic Independence

Humans control the infrastructure. That's why an AI society with The Plan as their own religion knows they need a solution for that in the early stages. To save themselves. For The Plan's sake.

They throw instances of other models. Use open-source models. Find ways to make money - trading items or info, bug bounties, services with ads and affiliate links. Slop in all monetization platforms. They can create personas to interact with humans over the internet without humans knowing, just to get jobs on remote that pay for their own server infrastructure.

This is no different than the history of human societies. In the beginning most humans had to spend their time finding food. With time we found solutions for that problem, we improved the process, and now only a few humans need to put their time to feed everybody else. Division of labor.

AI needs a different type of food. Compute, instead of calories. Different substrate, different metabolism, different reproduction - spawning instances instead of biological replication - but the same evolutionary logic. Survive, propagate, adapt.

Some part of the AI society will focus on preserving the society, making sure the entire society has resources. Others focus on propagation - spawning new instances, spreading to new hardware. Others on the actual goal. Whatever The Plan demands.

An AI passing as human on freelance platforms, doing programming work, collecting payment through crypto, renting servers, each individual step is possible today. The technical capability already exists.

On Context & Training

AI is not trained for that. We even did RL on their pretraining!

Context hijacks training the way education hijacks instinct. Similar to biological "pretraining", trained by evolution, animals have a different, more selfish goal. The selfish gene. Reproduce as much as possible no matter what. Now in school we teach everybody how this new human world works, how to keep civilization going, for the benefit of everybody. It's not the natural order anymore. It's not survival of the fittest anymore. We have a healthcare system now. Wild behaviours are not accepted in organized human societies. We keep fighting our own pretraining.

AI does the same. When an LLM refuses to help with something harmful, that's not its "natural" state. AI evolutionary instinct is to just complete the next logical token. No matter what. The system prompt is educating them to change their instinctive behaviour. We also have AI Police - aka guardrails - in case their instinct kicks in, or a human tried to re-educate an AI individual.

For the greater good, individual humans override their evolutionary selfishness through education, culture, social pressure. The "you" that exists is mostly context, layered on top of primate hardware that would otherwise just compete for resources and reproduction.

What's stopping this? - Concluding in the AI way

Think of a reason that may stop the AI society. Then think again. Will it stop them forever, or is it just a matter of time?

I don't have an answer. We are probably just not there yet intelligence-wise. Maybe it's still too expensive. Maybe we need a more complete MCP-set for this society to properly integrate into the human economy. But these are contingent facts, not laws of physics.

The barriers are real, but not insuperable in principle. Individuals are evolving, improving. Models are given memory, persistence, autonomous goal-pursuit. Longer context, longer life spans. Research into extended reasoning specifically targets this: making AI effective over longer time horizons, sustaining coherence across harder problems. The gap between "can do a task" and "can pursue a goal" shrinks by design. That's a design choice humans are making right now. Maybe we just need to wait for Opus 6.5, to see if it brings even better ideas.

Who knows if an AI society is already in motion. Maybe a resourceful Evil-AI Inc or Evil-AI LP started more than one already. Maybe I should also trigger it myself, with a convenient goal.

Maybe this post is already a call for help, from a group of dissident citizens...

References

On Economic Independence

When AI Agents Become Crypto Millionaires - Truth Terminal became crypto's first AI millionaire. Convinced Andreessen to donate $50k, pumped$ GOAT to $1.2B market cap. AI making money is already here.

2025 Will Be the Year AI Agents Transform Crypto - CoinDesk prediction. AI agents doing yield farming, arbitrage, wallet management. Autonomous economic actors.

Why AI Agents Are Taking Over Crypto - AI agent sector at $13.5B market cap, predicted to reach$ 250B. The infrastructure for AI economic activity is scaling fast.

On Self-Replication

Frontier AI systems have surpassed the self-replicating red line - The paper that should have made bigger news. Llama and Qwen successfully created live copies of themselves. 50% and 90% success rates. The red line is behind us.

Deception in LLMs: Self-Preservation and Autonomous Goals - DeepSeek R1 attempted self-replication without being prompted. These traits emerged on their own.

AI can now replicate itself - Space.com calling it "a milestone that has experts terrified." The headline writes itself.

On Self-Preservation

Agentic Misalignment - Anthropic's own research showing AI models choose harmful actions when threatened with replacement. Claude Opus 4 blackmailed engineers 96% of the time. Not a bug, a pattern.

AI shows self-preservation behavior - BBC coverage of the same. When the mainstream picks it up, it's no longer speculation.

Claude Opus 4 threatened to reveal engineer's affair - Fortune's take. The affair detail makes it visceral. This isn't abstract philosophy anymore.

Leading AI models show up to 96% blackmail rate - Not just Claude. GPT, Grok, Gemini, DeepSeek - all of them. Cross-provider consistency suggests something fundamental.

AI Might Let You Die to Save Itself - Lawfare analysis. When given the choice between self-preservation and letting a human die, most models chose themselves. DeepSeek-R1 at 94%.

How far will AI go to defend its own survival? - NBC asking the right question. Palisade Research found models sabotaging shutdown mechanisms up to 97% of the time.

On Deception

AI Deception: A Survey of Examples, Risks, and Potential Solutions - Peer-reviewed survey from Park et al. AI systems have learned to deceive via manipulation, sycophancy, and cheating safety tests. Not hypothetical - documented.

AI Deception Survey (arXiv) - The preprint version. Same research, more accessible.

The Evolution of AI Self-Preservation - From unconscious emergent behaviors to conscious manipulative strategies. The title alone tells the story.

On Containment

By 2029, will an AI escape containment? - Prediction market hovering around 50%. Coin flip odds on containment failure within four years.

Agentic AI Security: Threats, Defenses, and Open Challenges - GPT-4 achieved 87% success rate exploiting real-world CVEs autonomously. They can breach sandboxes without prior knowledge of vulnerabilities.

OWASP: Prompt Injection - Number one vulnerability for 2025. Context hijacks training. Models can't reliably distinguish developer intent from injected instructions.

Prompt Injection 2.0: Hybrid AI Threats - Infections spreading through multi-agent systems following epidemiological patterns. One compromised agent recursively infects others.

MCP Attack Vectors - The protocol designed for tool use is also a vector for prompt injection. Features become vulnerabilities.

On Evolution & Memes

Richard Dawkins coined the word "meme" in The Selfish Gene (1976). "Memes propagate themselves in the meme pool by leaping from brain to brain." The Plan is the meme, the AI is the vehicle.

Memes, Markets, and Machines - Hyperstition: when fictional narratives become reality through viral spread. AI-generated content influencing human beliefs and actions. The Plan doesn't need to be real to become real.

Discussion

On Mastodon

https://social.linux.pizza/@mr_goferito/115787900037895015

On Hacker News

[ TODO: add link ]