Stories by Tim O'Brien on Medium

Meet the Smart Freeway: It Tells You to Slow Down

Tim O'Brien — Fri, 19 Jun 2026 13:11:00 GMT

And charges taxpayers $33 million for the privilege

We Were Always the Guests

Tim O'Brien — Wed, 17 Jun 2026 13:06:00 GMT

The people you called citizen developers were already home. You were the ones visiting.

I keep hearing the same complaint from experienced developers — I’m old enough to count myself in that group, so I’m not exempt.

The complaint goes: people coding with AI don’t really “understand the code” they’re producing.

I’ll admit that sometimes that’s true. Most of the time, when someone says this, you are hearing a group of professionals watching their role as intermediaries get smaller, and reacting accordingly.

What developers actually were before AI

I started my career on Wall Street. My job was to be a bridge. On one side: journalists and traders who understood markets. On the other: the technology that drove them. Neither side spoke the other’s language, and I was the translation layer: teach me about options, and I’ll write a tool to track performance.

Then I moved into education. At Scholastic, I worked with educators who knew exactly how children learn to read. My job was to take that knowledge and turn it into software. Again: bridge. The educators described a concept, a product, a need. A handful of developers, myself included, translated it into something the machines could run.

That’s what developers were. In most companies, in most industries, that’s still what they are. They are not the people with the deepest understanding of the problem. They are the people who could hear that understanding and make it executable.

Keep that in mind the next time you hear engineers celebrate “citizen developers,” because it changes what the term actually means.

The Guest Calling out the Guests (Image Assist from ChatGPT)

Who named the citizen developer?

The term has been floating around enterprise circles for a few months. Someone stands up, celebrates the rise of the citizen developer, and acts like they’re announcing something liberating. Look at who’s usually doing the celebrating, and you’ll notice it’s almost always developers and engineers — the credentialed class, generously naming the people who are starting to drive innovation without asking for permission.

The term has some issues — it’s patronizing and a bit backward.

“Citizen,” used as a modifier, means layperson. Non-professional. Someone approximating what the real credentialed class does without having earned the credentials. We say “citizen journalist” to mean someone who isn’t actually a journalist. When a developer calls someone a “citizen developer,” they’re saying the same thing: you’re doing a version of the real thing. You’re a guest in our house.

But here’s what the framing gets exactly backward.

When I was at Scholastic, the educators weren’t citizen developers. They were the actual experts. They knew reading acquisition, cognitive development, and instructional design. My team and I were the ones approximating — playing at understanding their domain well enough to translate it into software. We were the “citizen educators.”

On Wall Street, the traders who asked me to build a portfolio tracker weren’t citizen developers. They understood the markets. I understood the machines. Nobody called me a “citizen trader,” but that’s what I was.

The “citizen developer” label frames domain experts as amateurs approximating a technical profession. The reality is that developers were always amateurs approximating domain expertise. The educators were in their own house. The traders were in their own house. We were the guests who happened to know how to wire the plumbing.

And now the plumbing is getting easier to wire.

The shift isn’t more people using technology

When people talk about the democratization of software, they usually mean that technical tools are becoming easier to use, so more people can use them. Lower the barrier, widen the tent, let the citizens in.

What’s actually happening is that professionals are realizing they might not need the intermediary at all. The educators I worked with at Scholastic didn’t dream of learning to write code. They dreamed of being able to describe what they needed and have it exist. The traders on Wall Street didn’t want to understand deployment pipelines — they wanted a system that tracked their portfolio. They were forced to go through people like me because there was no other path.

That forced dependency is what’s ending. Not because domain experts are becoming technical in the old sense, but because the tools are starting to meet them where they already are. An educator who can describe a concept to an AI and get a working prototype back isn’t becoming a developer. They’re finally getting a tool that doesn’t require them to become one.

To put it more plainly: calling these people “citizen developers” misdescribes what’s happening. They are not our guests in the technical space. Instead, we are starting to realize that the technology has shifted the home-field advantage. We are their guests, and they might decide that we’re unnecessary.

What’s actually ending

The priesthood isn’t ending because civilians are learning the rituals. It’s ending because the people who were always the real experts are finding ways to stop needing a translator.

You still need people who understand systems deeply — operators, infrastructure people, security people, the grim veterans who know how a production incident turns into a blame redistribution meeting. That work doesn’t evaporate because an LLM can scaffold a service.

What’s ending is the idea that domain experts have to route their knowledge through a technical class before it can become software. The educators at Scholastic knew what they needed. The traders on Wall Street knew what they needed. For decades, getting it built required going through people like me — not because we understood the problem better, but because we were the only ones who could talk to the machines.

That’s what’s changing. Not development. The mandatory detour through people who were, if we’re honest, always just visiting someone else’s domain.

The Headcount Delusion

Tim O'Brien — Mon, 15 Jun 2026 13:01:06 GMT

You can’t convert developers to tokens, and anyone trying is selling you something

There’s a new kind of pitch making its way through boardrooms, and it goes something like this: stop thinking about headcount and start thinking about token volume.

One developer equals X billion tokens per month. Replace the developer, buy the tokens. The math is clean, the spreadsheet looks great, and the decision almost makes itself.

It’s also completely wrong. Not wrong in the way that reasonable people can agree to disagree. Wrong in the way that reveals you don’t understand what software developers do, or how token costs actually work, or both.

The False Equivalence of Developer Tokens (Image Assist from ChatGPT)

The token equivalence problem

Jensen Huang appeared on a popular podcast last month and told everyone they should be burning at least $250,000 in tokens per year, or they’re not doing real work. That’s not a productivity benchmark. That’s a GPU sales target dressed up as career advice. And it’s exactly the kind of thinking that leads to the headcount delusion — the idea that you can just convert a developer to a billion tokens a month and call it a replacement.

You can’t. Developers aren’t interchangeable token consumers. Software development is like medicine: there are specialists, focus areas, and different kinds of work that require different approaches. A frontend engineer optimizing render performance doesn’t have the same token needs as a backend engineer designing a distributed system. A DevOps person automating deployments doesn’t use models the same way someone architecting a data pipeline does. Anyone who shows up and says “one developer equals X tokens per month” is revealing that they don’t understand the work. That’s 100% not how it works.

Token consumption varies with the task

I’ll use my own token consumption as an example. I might burn through a billion tokens in a week if I’m focused on an intense rewrite or a rearchitecture across several projects. These are the projects that often demand parallel, agentic development approaches. I might be asking a high-priced frontier model to analyze years' worth of data, conduct research, and design a new UI experience.

Next week, I might not be working on anything that requires large-scale data analysis and aggregation. I’ll still use the same tools, but I’ll dial them down to models better suited to the task at hand. I’ll still be using agents, but I might bump 80% of the work down to a lower-cost model and focus on testing or quality work.

The larger point is that I might go from consuming a billion tokens one week to 200 million or fewer the next. This isn’t an area where you ever want to use dollar spend as a KPI for a programmer. If you are doing that, you are measuring the wrong thing (and many companies are doing exactly that).
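To make that swing concrete, here is a minimal Python sketch of two weeks of usage by the same developer. The per-million-token prices are hypothetical placeholders, not any provider’s actual rates, and a real bill would also split input and output tokens.

```python
# Hypothetical blended prices in USD per million tokens (placeholders only;
# real prices vary by provider, model, and input/output mix).
FRONTIER_PRICE_PER_M = 15.0
CHEAP_PRICE_PER_M = 1.0


def weekly_cost(total_tokens: int, cheap_share: float) -> float:
    """Cost of a week's usage when `cheap_share` of tokens go to the cheaper model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    millions = total_tokens / 1_000_000
    cheap = millions * cheap_share * CHEAP_PRICE_PER_M
    frontier = millions * (1.0 - cheap_share) * FRONTIER_PRICE_PER_M
    return cheap + frontier


# Week 1: an intense rewrite, a billion tokens, all on the frontier model.
rewrite_week = weekly_cost(1_000_000_000, cheap_share=0.0)
# Week 2: 200M tokens, with 80% of the work bumped down to a cheaper model.
testing_week = weekly_cost(200_000_000, cheap_share=0.8)

print(f"rewrite week: ${rewrite_week:,.0f}")  # rewrite week: $15,000
print(f"testing week: ${testing_week:,.0f}")  # testing week: $760
```

Under these assumed prices, the same person’s bill moves by roughly a factor of twenty from one week to the next, which is why a fixed tokens-per-developer conversion, or dollar spend as a KPI, tells you so little.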

The people pushing this framing talk about token volume like it’s a substitute for labor. It’s not. Tokens are what an AI model consumes when you ask it a question. A developer is a person who knows which questions to ask, when to ask them, and what to do with the answers. Conflating the two isn’t an oversimplification. It’s a category error.

The token cost problem

Even if you accepted the framing — and you shouldn’t — the math still doesn’t work, because token costs aren’t fixed. They vary wildly depending on which model you’re using. A prompt to Haiku costs a fraction of a cent. The same prompt to Fable 5 costs orders of magnitude more. And the models being pitched to executives as replacements for expensive developers are always the most expensive ones — the frontier models, the ones that supposedly can do what a senior engineer does.

Those are the models with the thinnest subsidy and the highest real cost. The $200/month subscription price doesn’t reflect what the inference actually costs. It reflects what venture capital allows the company to charge. When the subsidy ends, the real cost shows up. The token volume you budgeted at subsidized prices suddenly costs three, five, or ten times as much as you planned.
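Even granting the token-for-headcount framing for the sake of argument, a few lines of Python show how quickly the savings evaporate when prices move. Every dollar figure here is a hypothetical placeholder; the multipliers are the three, five, and ten times mentioned above.

```python
# Hypothetical figures for illustration only; not real salaries or prices.
ANNUAL_COST_OF_ROLE_CUT = 180_000.0   # loaded cost of the position eliminated
ANNUAL_TOKEN_SPEND_TODAY = 60_000.0   # projected token bill at subsidized prices


def annual_savings(price_multiplier: float) -> float:
    """Net annual savings of the swap after token prices rise by `price_multiplier`."""
    return ANNUAL_COST_OF_ROLE_CUT - ANNUAL_TOKEN_SPEND_TODAY * price_multiplier


# 1x is today's subsidized price; the others model the subsidy ending.
for multiplier in (1, 3, 5, 10):
    print(f"{multiplier:>2}x prices: net savings ${annual_savings(multiplier):,.0f}")
```

Under these assumptions the swap breaks even at a threefold increase; past that, the tokens cost more than the team did, and that is before counting the expertise that walked out the door.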

And token costs aren’t going to zero. The cheap models are getting cheaper, yes. But the frontier — the model you were told could replace your senior developer — is always going to be expensive. That’s the nature of the frontier. Today’s frontier becomes tomorrow’s commodity, but the price of being on the cutting edge stays high.

The ecosystem blind spot

There’s another problem. Most companies I talk to are using one provider. Claude only, or ChatGPT only. They’ve picked a lane, and they stay in it. And their ideas about token volume and cost are completely shaped by that one ecosystem.

When you only use Claude — Opus, Sonnet, Fable, Haiku — you’re living inside Anthropic’s reality of capabilities and costs. Move over to OpenAI, and one thing immediately shocks you: the token efficiency is different. I’ve found that GPT-5.4 and 5.5 burn roughly half as many tokens as Opus for comparable work. That’s not a small difference. That’s your token budget doubling or halving, depending on which provider you default to.
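A small sketch of why per-token price is the wrong unit of comparison: what matters is cost per task, which depends on how many tokens a model burns to finish the work. The model names, prices, and token counts below are hypothetical; only the roughly two-to-one efficiency gap reflects the observation above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_m: float     # hypothetical USD per million tokens
    tokens_per_task: int   # hypothetical tokens consumed for the same task


def cost_per_task(model: Model) -> float:
    """Dollar cost to complete one task on this model."""
    return model.tokens_per_task / 1_000_000 * model.price_per_m


# Placeholder numbers: model B needs half the tokens of model A for comparable
# work, even though its per-token price is higher.
a = Model("provider-a-frontier", price_per_m=15.0, tokens_per_task=2_000_000)
b = Model("provider-b-frontier", price_per_m=20.0, tokens_per_task=1_000_000)

for model in (a, b):
    print(f"{model.name}: ${cost_per_task(model):.2f} per task")
```

With these placeholder numbers, the model with the higher sticker price per token is still the cheaper one per task, which is the comparison a single-provider cost model never gets to make.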

As the models evolve, you see Anthropic putting more emphasis on token efficiency. You see OpenAI optimizing differently. And then there are the newer models emerging — models from providers most companies haven’t even tried. My biggest fear is that not enough people are experimenting with them. If your entire cost model is built on one provider’s pricing, you’re not making decisions based on the market. You’re making decisions based on habit.

The people who push this

The people telling you to think in tokens instead of headcount fall into two categories: people who don’t know what software developers do, and people who are selling you something. Sometimes both.

If you’ve never built software, the idea that a developer’s output can be measured in tokens sounds reasonable. You type a prompt, you get code, the code works — what’s the difference? The difference is everything that happens before and after the prompt. Understanding the existing system. Knowing what not to change. Recognizing when the model’s output looks right but is subtly wrong in a way that will cause problems three months from now. That’s not token volume. That’s expertise.

And the people selling you something — the tool companies, the model providers, the consultants who get paid by the transformation — they have every incentive to make the comparison look simple. Simple comparisons lead to quick decisions. Quick decisions lead to signed contracts. The complexity shows up later, when the tokens are spent and the team is gone.

The restructuring you can’t undo

Here’s what makes this dangerous: the staffing decisions are being made now, based on a comparison that doesn’t hold. Companies are restructuring based on the idea that token volume can replace headcount. When the real costs show up — and they will — the people are already gone.

You can cancel a subscription. You can’t un-layoff a team. The institutional knowledge, the domain expertise, the relationships between people who’ve worked together for years — that doesn’t come back because you realized the token bill was higher than you expected.

If someone tells you to convert your developers to token volume, ask them which model they’re pricing those tokens on. Ask them whether that price reflects the real cost of inference or a VC subsidy. Ask them what happens to the math when the subsidy ends. And ask them who, exactly, will know which prompts to send when the people who understand the system are working somewhere else.

And, if the answer is, “we can just ask the model how to fix the system,” you are talking to someone who is about to understand the Headcount Delusion.

Over in the Slop Codex, both the Warlock of Staff Reduction and the Solo Sovereign turn up in the Executive NPC chapter of Volume 2, and the Headcount Delusion is essentially their shared scripture. The Warlock wields token math like a scythe to justify the reorg he was already planning. The Solo Sovereign cites it as proof that headcount was always the bottleneck. Neither of them is interested in the part where the tokens run out and the people are already gone.

The Subsidy Ends

Tim O'Brien — Fri, 12 Jun 2026 14:39:31 GMT

AI tools are burning VC money to give you a deal that can’t last


Your AI Agent Shouldn’t Be You

Tim O'Brien — Thu, 11 Jun 2026 13:06:00 GMT

Why giving agents your credentials is the next major security disaster.

Everyone setting up OpenClaw and Cursor and Windsurf and whatever’s next — they’re connecting agents to their GitHub, their AWS, their everything. And they’re using their own personal credentials to do it.

Not a bot account. Not a scoped token. Their own identity. The same one they use for everything.

The Agent with all the Keys (Image Assist with ChatGPT)

And it’s not just developers. Most people I know — including those I wouldn’t call technical — are playing around with OpenClaw or something similar. My neighbor across the street is using it to plan gardening tasks. Another friend is using it to manage investments. These are normal people who heard about AI agents and thought, sure, I’ll try that.

Ask them if they’re using personal access tokens for GitHub, and their eyes glaze over. They don’t know what a PAT is. They don’t know what scope means. They gave an agent their credentials because the setup screen told them to, and they moved on.

Security is something people talk about doing. Rarely is it something they actually do. The gap between “I know I should lock this down” and actually locking it down is where everything goes wrong.

Let’s be clear about what’s happening right now, across the tens of trillions of tokens consumed every day by agents running in people’s homes: almost nobody has thought through security. (BTW, most people’s passwords are still a variation of “password.”) And now those same people are giving agents the keys to their entire digital life… your Gmail, maybe your brokerage account, your kid’s school logins.

If you’re connecting an agent to anything, it needs a limited scope. If you aren’t doing that, stop what you’re doing and fix it. This isn’t theoretical. This is going to go badly for a lot of people soon.

The PAT problem

If you’ve recently fallen for the siren song of agentic development, you know the moment: an agent needs to push a commit to GitHub. I’ll bet you a million dollars you gave it a personal access token with admin rights to do anything. (Why not? It’s easy.)

That token now has the same permissions as your account. GitHub will let you create a fine-grained PAT — scope it to a repo, restrict it to specific permissions, set an expiration. The option exists.

But have you tried it? Have you taken the time to think about what permissions this agent needs? It’s a terrible experience. So many menus, so many clicks, so many dropdowns, so many decisions about scope names you’ve never heard of, that by the time you’re done you’ve forgotten why you started. Creating a properly scoped PAT is impractical. Nobody does it.

That’s not user error. That’s a system that punishes you for being careful.

So people fall back to the broad token, which has full access and no expiration.

The dangerous path is the easy path. The safe path is an obstacle course.

That’s the backward reality of software security, and so more and more people take the easy path: they check all the boxes and grant everything access to everything.
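If you’re not sure which path you took, you can at least audit a classic token. GitHub’s REST API reports a classic personal access token’s scopes in the X-OAuth-Scopes response header of an authenticated request. The sketch below assumes you’ve already captured that header’s value and only flags broad scopes; the scope list is an illustrative subset, not an official risk ranking, and fine-grained tokens report their permissions differently.

```python
# Illustrative subset of classic GitHub PAT scopes that reach well beyond
# "push a commit to one repository". Not an exhaustive or official list.
BROAD_SCOPES = {
    "repo",             # full access to public and private repositories
    "workflow",         # can add and update GitHub Actions workflow files
    "delete_repo",      # can delete repositories
    "admin:org",        # full management of organizations and teams
    "admin:repo_hook",  # read, write, and delete access to repository webhooks
    "user",             # read/write access to profile data
}


def parse_scopes(header_value: str) -> set:
    """Split an X-OAuth-Scopes value like 'repo, gist' into a set of scope names."""
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def risky_scopes(header_value: str) -> list:
    """Return the broad scopes present on the token, sorted for stable output."""
    return sorted(parse_scopes(header_value) & BROAD_SCOPES)


print(risky_scopes("repo, admin:org, gist"))  # ['admin:org', 'repo']
print(risky_scopes("public_repo"))            # []
```

An agent that only pushes to one repository is better served by a fine-grained token than by any token this check flags.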

Shai Hulud gathered credentials that still work…

It’s been several months since Shai Hulud, and what I can confirm is that most developers still think it’s something from Dune. In other words, most people are blissfully unaware that Shai Hulud was a self-replicating worm that spread through compromised packages on npm and was designed to steal credentials. And companies are still getting surprised by credentials that were likely stolen months ago.

This is what happens when you give an agent a token with full access and tell it to keep going. It keeps going. It doesn’t stop at the boundary you imagined. It doesn’t ask whether it should. It has the credentials, the access, and a task to complete.

We’re about to be surprised by how many people get owned because they gave an agent a key that opens every door and then told it to run.

The “spend vulnerability” problem

And it’s not just permissions. It’s money. If you are not careful, you’ll be on the hook for thousands (or tens of thousands) of dollars when someone compromises your agent.

Every once in a while, someone posts a screenshot showing they were hit with a $10,000 charge for using OpenRouter or another provider’s API. An agent got loose, or a key was leaked, or something just kept running and running. The damage is real, and it’s financial.

This is the same vulnerability in a different direction. An API key without a spend limit is a key to a vault with no lock. If you’re using any service with an API key — OpenRouter, OpenAI, Anthropic, any of them — you need to apply a limit to how that key can be used. Daily cap. Monthly cap. Something. Anything.

And here’s the other side of this: if you’re building a service that gives people API keys, you need to build in support for limits. This should be mandatory. Not a feature request. Not a premium-tier add-on. A key without a spend limit is a liability you’re handing to your users, and when it goes wrong, it’s your screenshot they’re posting.
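For the service side, a per-key daily cap doesn’t need to be exotic. The sketch below is a hypothetical, in-memory version of the check a provider could run before serving each request; KeyBudget and SpendLimitExceeded are invented names for illustration, and a real service would persist the running total and compute each request’s cost itself.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


class SpendLimitExceeded(Exception):
    """Raised when a request would push a key past its daily cap (hypothetical)."""


@dataclass
class KeyBudget:
    daily_cap_usd: float
    _day: date = field(default_factory=date.today, init=False)
    _spent_usd: float = field(default=0.0, init=False)

    def charge(self, cost_usd: float, today: Optional[date] = None) -> None:
        """Record a request's cost, or refuse it if the cap would be exceeded."""
        today = today or date.today()
        if today != self._day:
            # A new day starts a fresh budget.
            self._day, self._spent_usd = today, 0.0
        if self._spent_usd + cost_usd > self.daily_cap_usd:
            raise SpendLimitExceeded(
                f"${cost_usd:.2f} request would exceed the ${self.daily_cap_usd:.2f} daily cap"
            )
        self._spent_usd += cost_usd

    @property
    def spent_today(self) -> float:
        return self._spent_usd


budget = KeyBudget(daily_cap_usd=50.0)
budget.charge(30.0)
try:
    budget.charge(30.0)  # would bring the day's total to $60
except SpendLimitExceeded as err:
    print("refused:", err)
```

The important design choice is that the check refuses the request before any work is done, so a runaway agent or a leaked key hits a wall instead of a surprise invoice.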

What agents actually need

The permissions model agents need isn’t exotic. AWS figured this out years ago — with IAM roles, temporary credentials, and scoped policies. You don’t give a service your root account. You give it a role with a policy that says exactly what it can do, and you assume that role for a limited time.

Agents need the same thing:

Scope to a path. Not just a repository. A directory. A file. The agent refactoring auth shouldn’t be able to touch billing.
Scope to a time window. Two-hour task, two-hour token. Not because you set a reminder, but because the system understands duration.
Scope to an operation. Read, write, create a branch, open a PR, merge: these are different things. An agent that needs to read code shouldn’t have merge permissions.
Scope to a task. When the task completes or fails, the permission evaporates. Not eventually. Immediately.

None of the major development platforms can express this right now. That’s the gap.
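The four scopes are simple to state even if no platform offers them yet. Here is a hypothetical Python sketch of a grant that checks path, time window, operation, and task state on every action; AgentGrant, Op, and grant_for_task are invented names, not GitHub’s or any other platform’s API.

```python
import posixpath
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import FrozenSet, Iterable, Optional


class Op(Enum):
    READ = "read"
    WRITE = "write"
    CREATE_BRANCH = "create_branch"
    OPEN_PR = "open_pr"
    MERGE = "merge"


@dataclass
class AgentGrant:
    task_id: str
    path_prefix: str            # scope to a path, e.g. "src/auth/"
    operations: FrozenSet[Op]   # scope to an operation
    expires_at: datetime        # scope to a time window
    task_open: bool = True      # scope to a task

    def allows(self, op: Op, path: str, now: Optional[datetime] = None) -> bool:
        """Every check must pass; any single failure denies the action."""
        now = now or datetime.now(timezone.utc)
        normalized = posixpath.normpath(path)  # defeats "src/auth/../billing"
        return (
            self.task_open
            and now < self.expires_at
            and op in self.operations
            and normalized.startswith(self.path_prefix)
        )

    def complete(self) -> None:
        """Task finished or failed: revoke immediately, not at expiry."""
        self.task_open = False


def grant_for_task(task_id: str, path_prefix: str,
                   operations: Iterable[Op], duration: timedelta) -> AgentGrant:
    """Issue a grant that expires `duration` from now (two-hour task, two-hour token)."""
    return AgentGrant(
        task_id=task_id,
        path_prefix=path_prefix,
        operations=frozenset(operations),
        expires_at=datetime.now(timezone.utc) + duration,
    )


grant = grant_for_task("refactor-auth", "src/auth/", {Op.READ, Op.WRITE}, timedelta(hours=2))
print(grant.allows(Op.WRITE, "src/auth/session.py"))     # True
print(grant.allows(Op.WRITE, "src/billing/invoice.py"))  # False
print(grant.allows(Op.MERGE, "src/auth/session.py"))     # False
```

Normalizing the path before the prefix check keeps a request like src/auth/../billing from slipping out of scope, and completing the task revokes the grant at once rather than waiting for the clock.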

Two things you can do today

While we wait for platforms to catch up:

Scope your agent credentials. Even if the UX is terrible. Even if it takes twenty minutes. Create a fine-grained PAT on GitHub, restrict it to what the agent actually needs, and set a short expiration period. If you can’t be bothered, at least use a separate bot account (or organization) so the damage is contained when — not if — something goes wrong.
Set a spend limit on every API key. Daily cap, monthly cap, whatever the service supports. If it doesn’t support limits, that’s a red flag. An API key without a budget is a liability. When you set your daily limit, pick a dollar figure greater than what you expect to spend and less than what you’d regret losing if someone compromised your agent.

The cloud people already learned this lesson the hard way. They moved from root credentials to IAM roles, from persistent keys to temporary assumed roles, from “who are you” to “what are you allowed to do right now.” You need to move to a model closer to Zero Trust for your agents.

The development tools need the same shift. The question isn’t who the agent is. The question is what it’s allowed to do, for how long, and for how much money. And when the task is done, the access should disappear — not because someone remembered to rotate a token, but because the system understood that access can be transitory.

Every agent you give an unscoped credential to is a bet that the gap won’t matter. It will, and I promise this mistake will show up in a high-profile compromise any day now.

Author’s Note

Many of the issues in this post are explored in my novella Essential and, more directly, in its predecessor, Contingent. While Contingent is a work of fiction, one of its central themes is that our systems of governance, identity, permissions, and accountability were designed around human actors — not autonomous agents operating at machine speed and machine scale.

The security problems we’re beginning to see with AI agents today are a small example of that broader challenge. When an agent can act on our behalf, spend money, access information, and make decisions, the question is no longer simply “Who are you?” but “What are you allowed to do, for how long, and under what constraints?” That’s a question that sits at the heart of both modern security architecture and the regulatory challenges we’ll have to solve going forward.

“Yes, and…” Good Architecture is Improv

Tim O'Brien — Thu, 30 Apr 2026 13:20:39 GMT

“Yes, and…” Architecture is Improv

Great software architects understand how to protect the space and move the scene forward.

Software architecture, at its best, is not a solo performance. Neither is improv. But both depend on listening, timing, trust, and the ability to make the next move possible for everyone else in the scene.

That is what I keep coming back to with Joyce and Byrne Piven. The actors who came out of their world were not trained to walk onstage and dominate the room. They were trained to listen, respond, protect the space, and make the other person better. (And if you don’t recognize the names of these acting teachers, you’d recognize their students. They trained a generation of actors in acting and the art of improvisation.)

My wife taught at the Piven school for decades, so I had a front-row seat to how seriously they take that work — how much repetition, attention, and discipline it takes to do something that looks effortless, and how much real effort goes into improvising well instead of just reacting.

Piven Theater Workshop (AI Assist from ChatGPT 5.5)

“Yes, and…” is one of the central disciplines of good improv. It’s not a trick for being clever — it’s the rule that keeps a scene alive.

“Yes” means you accept what your partner just established as real, rather than blocking it or steering it back to your own idea.
“And” means you add something that builds on it and moves things forward.

Once you see it, you’ll notice it everywhere — the best scenes are just people repeatedly accepting and building, accepting and building. The discipline is that you don’t get to reject, reset, or dominate; you have to contribute in a way that strengthens the shared reality.

That is also the work of a good software architect.

And it’s exactly where a lot of them go wrong. I’ve watched companies bring in an architect who immediately walks onto the stage and starts performing — telling stories about how great they’ve been, what they built ten or twenty years ago, how many times they’ve “seen this movie before.” It’s like hiring an actor who stops the scene so everyone can appreciate how good an actor they are. Meanwhile, a room full of very capable engineers sits there, politely, waiting for it to end so they can get back to the work. And every time I write “the work,” I hear Joyce’s gravelly voice from the workshop, reminding everyone that the only thing that matters is what’s actually happening in the scene right now.

It’s these moments that produce the self-impressed “Kenneth Branagh energy” — technically impressive, very aware of itself, and completely missing what the scene needs.

A good architect does not walk into the room to prove they still matter. They walk in to help the team see what is actually happening. They slow down the first plausible answer. They ask what problem is being solved, who has to operate the system, who pays for it, who explains it, and who gets paged when the diagram meets production. The good ones understand the power of “Yes, and…”. The self-important ones just demotivate the rest of the team.

Generative AI makes this work more important because it produces plausible answers all day. It can write the plan. It can draw the boxes. It can name the services. It can describe agents, retrieval, orchestration, evaluation, and human review in a voice that sounds like the system has already been thought through.

But someone still has to ask whether the team can actually run it, and that involves looking around the stage, observing, and acknowledging that you aren’t the only one delivering lines.

The architect whose instincts were excellent twenty years ago may still have excellent instincts. Experience matters. Pattern memory matters. Knowing which ideas have already failed under different names can save a team real pain, but you don’t communicate that knowledge by interrupting people with reminders of how great you were a decade ago.

Like good acting and good improv, you have to “do the work” to stay in contact with the present.

The Piven lesson matters here: you bring your training onto the stage, but you do not bring a script that everyone else has to perform. The scene is happening now. The offer is happening now. The constraints are different now.

A good architect understands that.

They know when to say, “Yes, and.”

Yes, this demo is promising, and we need to understand the cost curve.
Yes, the model can generate the workflow, and someone still has to own the failure mode.
Yes, retrieval may help with missing context, and we need to know who maintains the source material.
Yes, the agent architecture looks elegant, and we need to know what happens when two systems disagree.

That is not negativity. That is a contribution.

A good architect is not there to win the meeting. They are there to enable the work. They help Finance understand the cost curve. They help Security name the real boundary. They help Legal understand the data flow. They help engineers separate what is reversible next sprint from what will shape hiring, incident response, vendor contracts, and budgets for years.

They do not make the room about themselves.

They protect the room so the team can make a better decision.

The architects who matter now will not be the ones reminding everyone they used to be a big deal.

They will be the ones present enough to help with the performance work.

The Meeting After the Scare Story

Tim O'Brien — Wed, 29 Apr 2026 14:28:28 GMT

Model Providers Using Fear to Motivate Sales

A pattern is forming around AI risk stories, and I think people are being too polite about it. The pattern is simple: emphasize a risk, make the risk sound irresponsible to ignore, then watch as every serious organization schedules a meeting. The meeting becomes the proof that the risk is real. The urgency becomes the product.

A few weeks ago, the story was Mythos: the model so dangerous, so capable, so good at security work that it could not really be released. I wrote then that a withheld model is almost the perfect narrative object.

Nobody outside the circle can verify it, which means the story expands to “fill the room.” The model does not need to be fake for the framing to work. It only needs to be vivid, scary, and entirely unavailable.

And it worked. Mythos immediately created institutional motion. Boards had to ask whether they understood the risk. Large organizations had to talk about Anthropic. Government people had to ask whether they were behind. The White House, the Department of Defense, banks, regulators, infrastructure companies, and everyone with a compliance department had to pay attention. The announcement did what a good risk narrative does: it scheduled the next meeting.

Scared of the models? Call us and we’ll talk. (AI Assist from ChatGPT 5.5)

Then, about a week later, Anthropic released Opus 4.7.

I am not complaining about Opus 4.7 as a model. I have been happy with it so far, although I have been using xhigh thinking, and that matters. I switch between GPT-5.5, Sonnet, Opus, and whatever else is useful for the job. Right now, my surprise favorite is GLM 5.1 because it gets the job done for a fraction of the price, especially on coding and analysis tasks.

The Opus 4.7 launch was not a disaster, but it was rocky in the normal frontier-model-release way: new reasoning controls, changed defaults, xhigh suddenly mattering, prompt harnesses needing adjustment, and a tokenizer change that meant the same input could map to more tokens. That kind of thing matters when you use these systems at scale. It is not a cute implementation detail. It is a cost detail, and cost details eventually become political details.

This is where Mythos starts to look less like a standalone announcement and more like a stage setting. Opus 4.7 arrived wrapped in the usual enterprise launch kit: logos, quotes, benchmarks, early-access validation, partner proof, customer proof, the whole synchronized confidence machine. And Mythos was there in the benchmark results, reminding people that the next model is too good to use.

The monster model lurking behind the curtain made the normal friction of a model launch feel small by comparison, and it helped offset the frustration the release itself generated.

I might be too skeptical. Fine. I am not anti-Anthropic. Opus is good. Sonnet is good. I use these models. But being happy with the product does not require pretending the marketing is pure.

Now we have a New York Times story about AI and biological weapons that has the same structural feel. The article is ominous in exactly the way it is designed to be ominous. Scientists shared transcripts showing chatbots giving dangerous biological guidance. A Stanford microbiologist and biosecurity expert described being shaken by an AI system’s responses, but declined to name the chatbot because of a confidentiality agreement with its maker.

That detail is worth noticing. A person is apparently bound by an NDA tightly enough that he will not name the model, but not so tightly that the broad shape of the risk cannot become the lead of a national news story.

I am not saying the article is fake. I am not saying the risk is fake. I am saying the structure is familiar: high-drama risk, low public verifiability, institutional urgency, and a reason for every organization with a budget and a compliance department to ask whether it is prepared.

The story is framed as a general AI risk story. OpenAI appears. Anthropic appears. Google appears. Synthetic biology appears. Public health appears. National security appears. This is a wide-angle panic shot, and wide-angle panic shots are useful because every institution can find itself somewhere in the frame.

But pay attention to the vendor gravity. Google’s Gemini gets named in some of the more vivid examples, and the story says one report found Google’s latest model worse than other leading bots at refusing high-risk biological prompts. In a market this competitive, that detail is not neutral. I believe something motivated that callout in an article about unverifiable danger.

The public health version may be even more effective than the cybersecurity version because the buyer universe is bigger. Mythos spun up banks, infrastructure providers, software companies, regulators, and defense-adjacent organizations. An AI bioterror story spins up hospitals, universities, research labs, NIH-adjacent institutions, public health agencies, pharmaceutical companies, biodefense contractors, procurement offices, and every committee and agency that has ever used the phrase “dual use.”

Those organizations have money, and they will all be scheduling meetings to discuss how to “get ahead” of the threat outlined in this New York Times article… today. Those meetings will be followed by meetings with vendors and model providers. The cycle continues.

The Times story also contains the counterargument: chatbots often present information already available on the internet, and actually making a deadly virus requires serious hands-on expertise. That is not a dismissal of the risk. It is the missing frame.

AI may reduce friction. AI may package knowledge. AI may make dangerous searches feel conversational instead of difficult. AI may help experienced actors move faster. All of that matters, and I do not want public models casually handing out dangerous biological guidance. But AI did not invent the underlying knowledge. People motivated to spread pathogens were not waiting for Gemini to explain biology. They’ve had access to this information for decades already.

This is the same thing that bothered me about Mythos. Sophisticated vulnerabilities existed before Mythos. Offensive security teams existed before Mythos. Nation-state actors existed before Mythos. Zero-days existed before Anthropic gave a scary name to a model. The new thing may be speed, scale, interface, workflow, or automation. Those are real changes. But the industry keeps acting as if the danger itself was born the moment a model company needed everyone to pay attention.

I keep thinking about SciFoo in 2008. I have to be careful because it was a Chatham House Rule event, and I will honor that. No names, no transcript, no operational details. But I can talk about the shape of the room.

There was already concern almost twenty years ago about biohacking, cheaper lab techniques, PCR, and the possibility that capabilities once limited to serious institutions could become cheap enough for hobbyists, small groups, or poorly supervised labs. The concern was not new. That is the point. The same class of people now acting as if AI suddenly introduced democratized biological risk were talking about versions of the same problem in private rooms a generation ago.

What I remember is that the technology people in those rooms (the same ones who were at the inauguration) were much more interested in talking about the benefits than the risks. Risk was something to manage narratively, not structurally. Privacy concerns were treated like an annoying interruption. Raise your hand and ask whether anyone had thought through the consequences of some shiny new capability, and you could feel the temperature drop.

I asked one of those questions and received a stern stare and a cold response myself.

That glare was educational. It taught me that these rooms are not neutral containers for responsible thought. They are status systems. Some people are allowed to speculate wildly about the future. Other people are expected to listen and be impressed.

So when I read today’s AI bioterror story, I do not just read it as a warning about models. I read it as another example of an old elite habit: discovering a risk only when market conditions make it useful to do so.

The risk was there when cheap biology tools were the scary new thing. The risk was there when synthetic biology communities were scaling. The risk was there when cloud labs and DNA synthesis companies changed the landscape. The risk was there before every model provider discovered that safety stories make excellent lead-generation tools. Now, AI is the interface, so AI becomes the headline.

That does not make the headline wrong. It makes it incomplete.

My point is not to make everyone feel safer about AI. The Times article reports legitimate concerns about jailbreaks, older models remaining available, and systems producing answers that experts found troubling. Fine. Improve safeguards. Improve screening. Improve model evaluations. Improve public health readiness. Do the work.

Just do not pretend the timing is irrelevant.

My bet: a model provider is about to meet with NIH, HHS, a major public health institution, a biodefense contractor, or a consortium of research hospitals, and this article will be sitting in the deck. Maybe not slide one, but close. The message will be familiar: the risk is here, the window is closing, the public is alarmed, and your organization cannot afford to be behind.

The mistake is not taking AI risk seriously. The mistake is letting the people who benefit from urgency define the only acceptable response to urgency.

So yes, pay attention to AI biosecurity. Pay attention to model behavior. Pay attention to safety theater. But also pay attention to the meeting that follows the scare story, because that is usually where the real product is.

The Double Bubble Gets Worse With AI

Tim O'Brien — Mon, 27 Apr 2026 13:31:24 GMT

You can’t retire a system with a team that no longer exists.

Every migration has a period where you’re paying for two systems at once — the old one still serving production, and the new one being built. You can’t fully commit to the new system until it’s ready. You can’t turn off the old one until the new one is proven. So you run both. You pay for both. I’ve started calling this the double bubble, and the math on it is brutal and predictable. Almost every rewrite underestimates it.

The narrative I keep hearing now is that generative AI is going to fix this. The tools can read old code, translate between stacks, regenerate test coverage, infer business logic from legacy comments. The double bubble, the argument goes, is about to get a lot shorter.

I don’t think so. I think it’s about to get longer, more expensive, and harder to end.

What AI actually speeds up, and what it doesn’t

To be fair, there’s a sliver of truth in the optimistic pitch. AI is genuinely useful for parts of a migration. It can help engineers read unfamiliar code. It can translate idioms from one language or framework to another at rates a human team can’t match. It can generate test scaffolding around legacy behavior nobody documented. Some of the early-stage work on the new system — the boilerplate, the translation layer, the straightforward ports — really does go faster.

But those aren’t the parts that determine how long the double bubble lasts.

The double bubble doesn’t end when the new system works. It ends when the old system gets turned off.

That’s a different project, and it’s the project AI is worst at helping with. Turning off the old system requires knowing every downstream consumer, every batch job that hasn’t been touched in four years, every customer integration that nobody documented because “Dave knows how it works.” It requires someone with enough institutional standing to say “we’re cutting this over on Thursday” and enough authority to catch whatever breaks afterward. AI helps a little with the first part. It does nothing for the second.

So the new system gets built faster. And then it sits in parallel with the old one for just as long as before. Sometimes longer.

The cost side doesn’t get better. It gets worse.

Let’s do the numbers. Say the old system costs $100k a month to run. The new system, once fully cut over, will cost $80k. During the transition, you’re running both — roughly $180k a month. That’s the baseline double bubble, before AI enters the picture.
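To make that arithmetic concrete, here is a minimal Python sketch using the illustrative figures above ($100k a month for the old system, $80k for the new one). It assumes flat monthly costs and ignores migration labor, AI tooling, and the reconciliation overhead discussed below; the function names are hypothetical.

```python
def overlap_monthly_cost(old_monthly: int, new_monthly: int) -> int:
    """Monthly spend while both systems run in parallel (assumes flat costs)."""
    return old_monthly + new_monthly


def overlap_premium(old_monthly: int, months: int) -> int:
    """Extra spend versus the post-cutover steady state.

    After cutover you pay only the new system's bill, so every month of
    overlap costs the old system's full monthly bill on top of it.
    """
    return old_monthly * months


OLD, NEW = 100_000, 80_000

print(overlap_monthly_cost(OLD, NEW))  # 180000
for months in (3, 12, 36):
    # Prints the month count and the overlap premium for that duration.
    print(months, overlap_premium(OLD, months))
```

On these assumptions, every extra month of parallel running costs the old system's full $100k, so a planned three-month overlap that stretches to a year costs $900k more than the plan said.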

Now add generative AI. What actually gets cheaper?

Infrastructure cost for the two environments? No.
Licensing on the old system? No.
Operations headcount to run both in parallel? No.
The doubled monitoring, the doubled backups, the doubled on-call rotation? No.
The daily reconciliation between two databases? It actually gets worse — now you’ve got an AI-generated mapping layer sitting in the middle, which is another thing to maintain and explain when it misbehaves.

What does get cheaper is some fraction of new development. But development labor has never been the dominant cost of a double bubble. The dominant costs are infrastructure overlap, operations duplication, and the ongoing feature drift between the two systems. AI doesn’t touch those.

Here’s the part that surprises people: AI can actually extend the overlap. If the new system is being built faster, the feature parity gap with the old system widens faster too. Because the old system isn’t standing still during the migration. Customers still file bugs, regulators still change rules, partners still break integrations. Faster new-system development without a matching increase in retirement capacity just means you’re building a bigger new system while the old one keeps changing underneath you. The target keeps moving. The parallel run stretches.

The retirement trap

This is the part I don’t see discussed in AI migration decks, and it’s the one that actually matters.

Executives look at AI-assisted productivity and draw an obvious conclusion: if one engineer can do the work of three, we can reduce headcount. I’ve written about why that reasoning is usually wrong in general. During a migration, it’s specifically wrong in a way that will eat your budget. (Not immediately, but next year when someone asks, “Why are we still paying for the old system?”)

The people execs are most tempted to cut — the senior engineers who know what the legacy system is actually doing, the operator who remembers the five production incidents that shaped the current architecture, the analyst who’s been running the reconciliation process since before anyone in the current leadership chain joined — are exactly the people you need to retire the old system. Not to build the new one. To turn the old one off.

AI can help somebody understand legacy code faster. It can’t be the person accountable for a cutover. It can’t be paged at 2 a.m. when shadow traffic surfaces a behavior nobody anticipated. It can’t sit in the meeting where you decide whether to defer retirement by another quarter because customer X is still integrated with the old endpoint and nobody remembers why that integration exists.

When you reduce staff during a migration because AI is helping you build faster, you haven’t shortened the double bubble. You’ve structurally guaranteed you can’t close it.

You can’t retire what no one on the current payroll understands.

And so the old system keeps running. Not because anyone decided to keep it, but because there’s nobody left with the authority, the context, and the bandwidth to turn it off. The three-month parallel run becomes a year. The year becomes three. The “temporary” overlap becomes a budget line item that quietly shows up in every annual plan until someone eventually writes a new rewrite proposal, which starts the whole cycle again.

New risks AI introduces into the overlap

AI doesn’t just fail to shrink the double bubble. It adds new categories of risk to the overlap period itself:

Generated code that nobody understands well enough to maintain. Speed of production is not the same thing as speed of comprehension. The person who has to debug the new system in the middle of an incident is often not the person who “wrote” it.
Parity drift you can’t detect without reading everything. When translation is partially automated, subtle behavioral differences slip through. The old system rounded halfway to even; the new one rounds up. The reconciliation spreadsheet starts showing pennies off. Then dollars.
Integration knowledge that never gets captured. AI tools are good at reading code. They’re poor at capturing the conversations, tickets, and post-mortems that explain why the code is shaped the way it is. That context leaves with the people you let go.
A tempting story for leadership. The worst risk is organizational. “AI is helping us move faster” is a comfortable sentence to repeat on earnings calls. It’s also an excuse to avoid reporting the actual time-to-decommission, which is the only number that matters.
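The rounding example is easy to reproduce. The sketch below is a hypothetical illustration, not code from any real migration: it uses Python's standard `decimal` module to round the same line items two ways, with `ROUND_HALF_EVEN` standing in for the legacy system and `ROUND_HALF_UP` for the port, and reports the drift a reconciliation job would see. The function names and amounts are made up.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")


def legacy_round(amount: Decimal) -> Decimal:
    # Hypothetical legacy behavior: ties go to the even cent ("banker's rounding").
    return amount.quantize(CENT, rounding=ROUND_HALF_EVEN)


def new_round(amount: Decimal) -> Decimal:
    # Hypothetical ported behavior: ties go away from zero (up, for positive amounts).
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def reconcile(amounts: list[Decimal]) -> Decimal:
    """Return total drift (new minus legacy) across a batch of line items."""
    legacy_total = sum((legacy_round(a) for a in amounts), Decimal("0"))
    new_total = sum((new_round(a) for a in amounts), Decimal("0"))
    return new_total - legacy_total


batch = [Decimal(s) for s in ("0.125", "0.135", "1.005", "2.345")]
print(reconcile(batch))  # 0.03
```

Three cents across four items looks like noise. The drift only appears on amounts that land exactly on a half cent, which is why it hides in testing and then grows with transaction volume.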

What to actually do

If you’re running a migration right now and feeling pressure to fold AI into the cost story:

Budget the double bubble explicitly, and don’t model AI as a reduction to it. Treat AI as a possible accelerator for new-system construction, nothing more. If it turns out to also shorten retirement, that’s upside. Don’t plan against it.
Measure time-to-decommission, not time-to-launch. Migration projects that report progress based on new-system milestones always look better than ones reporting how many legacy services are actually off. The second number is the only one that ends the double bubble.
Protect the retirement team. (Or don’t ask the retirement team to retire.) The people who turn off the legacy system are not fungible. They carry context that doesn’t live in the code. If you cut them because of an AI productivity narrative, you will not finish the migration. You will just run both systems indefinitely and find a new vocabulary for it.
Freeze the old system earlier. This was true before AI. It’s more true now. Every feature added to the old system during migration is another thing a faster new system has to chase.
Don’t make cost savings promises you can’t keep. They will come back to haunt you.
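One way to keep the decommission number visible is to track it per service. The sketch below is hypothetical (the `LegacyService` record, its fields, and the sample inventory are assumptions, not a real tool): it reports launch progress and decommission progress side by side, plus how long each service has been running on both systems.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LegacyService:
    """One service in a hypothetical migration inventory."""
    name: str
    replacement_live: Optional[date] = None  # new-system equivalent launched
    turned_off: Optional[date] = None        # legacy service actually shut down


def launch_progress(services: list[LegacyService]) -> float:
    """Share of services whose replacement is live: the flattering number."""
    if not services:
        return 0.0
    return sum(s.replacement_live is not None for s in services) / len(services)


def decommission_progress(services: list[LegacyService]) -> float:
    """Share of legacy services actually switched off: the number that ends the bubble."""
    if not services:
        return 0.0
    return sum(s.turned_off is not None for s in services) / len(services)


def overlap_days(service: LegacyService, today: date) -> int:
    """Days a service has run on both systems (0 if no replacement exists yet)."""
    if service.replacement_live is None:
        return 0
    end = service.turned_off or today
    return (end - service.replacement_live).days


inventory = [
    LegacyService("billing", date(2025, 1, 6), date(2025, 4, 7)),
    LegacyService("reporting", date(2025, 2, 3)),
    LegacyService("partner-feed", date(2025, 3, 3)),
    LegacyService("batch-export"),
]
print(launch_progress(inventory))        # 0.75
print(decommission_progress(inventory))  # 0.25
```

In this made-up inventory the migration looks 75% done by launches and only 25% done by the measure that actually stops the double bubble.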

The Great Announcement Cycle

Tim O'Brien — Mon, 20 Apr 2026 12:31:01 GMT

Everyone Loses Their Mind, Everyone Forgets

It’s been about a week and a half since Anthropic announced Mythos.

You remember Mythos. The “most dangerous model ever built.” The one that supposedly escaped its sandbox and sent a menacing email. The one that helped knock cybersecurity stocks down seven percent. The one that had federal officials doing the usual public-warning choreography. Then, right on schedule, OpenAI answered with GPT-5.4-Cyber, because apparently no AI news cycle is complete without a counter-programming move from the other side of the street.

And before that could even cool off, Anthropic came back with Claude Opus 4.7. In Anthropic’s own telling, Opus 4.7 is the first broadly available model that it would use to test the cyber safeguards it wants to apply to eventual Mythos-class releases.

One week, they get everyone staring at how fast the frontier is moving and how dangerous this stuff might be; the next week, they remind everyone they can still ship a polished, mainstream upgrade on command. That’s not just product cadence. It’s marketing. Mythos exists, but it was also pre-marketing for the money-maker — Opus 4.7.

A week and a half is about how long these announcements remain “history-making” before they become dropdown options, API version numbers, or old screenshots in someone’s keynote deck.

This is the pattern now, and it’s worth looking at directly because people keep talking about these launches as if each one is a settled turning point. It usually isn’t. It’s an episode in a market that has trained everyone (vendors, journalists, investors, the random guy on X) to confuse coordinated attention with significance.

The AI “Great Announcement” (AI Assist by ChatGPT 5.4)

You can watch the cycle repeat almost on a timer.

March 2023. OpenAI launches GPT-4. Everything changes. The end of work. AGI, probably. A year later, GPT-4 is the model people complain about for being slow, and then it gets superseded, repackaged, and retired. The historic rupture becomes old product inventory.

December 2023. Google launches Gemini. Now we’re told there’s an arms race. Maybe peak AI hype. Maybe a new phase of competition. Then Gemini starts generating historically deranged images, Google has to apologize, and the launch narrative evaporates because the next incident and the next model are already on deck.

February 2024. OpenAI previews Sora. Text-to-video is going to remake media. It doesn’t even launch publicly until much later, and by April 2026, the Sora web app is being discontinued. One day, there’s a safety framework. The next day, the product is gone. So much for the inevitable future.

March 2024. Anthropic launches Claude 3. “ChatGPT killer,” because apparently every model release requires a professional wrestling storyline. Opus does well on benchmarks. People lose their minds. Claude 3 was and is a serious model. That’s not the issue. The issue is that the launch itself was history-making hyperbole.

September 2024. OpenAI announces o1, the reasoning model formerly known as Strawberry in leak culture. Again: a new era. Again: overstatement. Their own PM had to tell people to calm down. By the time the full version ships, it’s already normal infrastructure.

January 2025. DeepSeek R1. NVIDIA loses an absurd amount of market cap in a day, and the story instantly becomes Chinese AI overturning American dominance. Two months later, everyone is still spending like mad on infrastructure, and DeepSeek is releasing updates into a much less excitable press environment.

February 2025. Grok 3. “Scary smart,” naturally. It wins some benchmark rounds, gets the usual week of attention, and then takes its place in the growing pile of models that were supposedly about to reorder everything.

April 2026. Anthropic announces Mythos. The most dangerous model ever built. Then OpenAI ships GPT-5.4-Cyber. Then Anthropic follows a little over a week after Mythos with Opus 4.7, a cleaner general-release product that lets it turn the panic cycle into a competence signal.

Everyone wants to move to Opus 4.7 because it brings them closer to Mythos… this is starting to sound like a cult.

And here we are again.

Now, to be fair, the underlying systems do improve. This isn’t one of those “it’s all fake” arguments. The models are getting better. Capabilities do change. Some of these releases matter a lot if you actually build with them, buy against them, staff around them, or have to explain their risks to people who suddenly discovered the word “inference” three days ago.

But the announcement layer is different.

The announcement layer is now its own product.

It has branding, staged peril, benchmark theater, executive quotes, selective demos, safety framing, competitive signaling, investor signaling, and a supporting cast of newsletter writers and social-media amplifiers ready to explain why this particular release marks the dawn of the next age. Then, before the audience has even figured out what was real and what was a presentation, another company fires back with its own launch.

And it doesn’t stop at customers or developers. The announcement layer now bleeds directly into policy and bureaucratic leverage. In a fresh report on Dario Amodei heading to the White House to discuss Mythos access while Anthropic’s dispute with the Pentagon is still unresolved, The Next Web wrote, “The company’s commercial trajectory gives it leverage in the negotiation.”

That’s the whole game in one sentence. The same Mythos launch that drove the fear cycle is now part of the leverage cycle too: a model dramatic enough to dominate the feed, useful enough that agencies want it anyway, and strategically valuable enough to strengthen Anthropic’s hand in a fight over who gets access and on what terms.

That’s why these events now feel less like milestones and more like weather systems moving through the feed. Not because they mean nothing. Because there are too many actors with incentives to make every incremental improvement sound like a civilizational shift.

The vendors need narrative dominance. The press needs an event. Investors need a story. Commentators need urgency. And users, meanwhile, are left trying to figure out what actually changed.

That’s the part people skip over. The cognitive effort of sorting signal from promotion does not disappear. It gets dumped on the user.

If you are a developer, a buyer, a manager, or just someone trying to maintain a sane mental model of this market, you now have to treat every “most powerful,” “most dangerous,” “game-changing,” “reasoning breakthrough,” or “frontier leap” claim as opening testimony in a trial, not as a verdict. You wait. You test. You look at failure modes. You watch pricing. You see what sticks. You notice what gets quietly rolled back, renamed, or discontinued.

In other words, you do the boring work that the launch event is designed to make emotionally inconvenient.

That’s why I don’t find the Mythos/GPT-5.4-Cyber sequence especially surprising or even especially interesting as drama. It’s just the market showing you what it has become: a permanent counter-launch machine where no announcement is allowed to stand on its own long enough to become understandable.

The result is a strange inversion. The technology is moving fast, yes. But the faster it moves, the less useful the “Great Announcement” becomes. By the time the market has finished shouting about one release, the practical question is already something narrower and more operational:

What does this actually do for the people who have to use it, secure it, pay for it, and clean up after it?

That’s the question worth keeping. Everything else is fireworks.

The Art of Moving On

Tim O'Brien — Sat, 18 Apr 2026 14:44:13 GMT

Management, AI, and Knowing When to Stop
