AI Operating System

What Monday morning used to look like

Six companies. Seven email accounts. Design reviews for one product, a compliance issue at another, a contractor onboarding pipeline that’s behind, a partner integration that’s broken, and a knowledge base that nobody’s updated in two weeks. The first two hours of every day were just figuring out what to pay attention to.

I’m a product and operations person. Fifteen years building systems that help companies scale. I know how to build operating models. But I was still the bottleneck — not because I didn’t know what to do, but because the operational surface area exceeded what one person could hold in their head.

Figure: Context chaos. Six companies' worth of information competing for attention at a single point, with no clear organization or priority.

What changed

AI changed the game for everyone. Almost overnight, design systems, coding pipelines, business intelligence, workflow orchestration, product operations — all needed to be rethought from the ground up.

But AI out of the box didn’t actually help much. I’d ask it to do something operational and get back something generic — technically correct but useless in context. It didn’t know our brand standards. It didn’t know which contractor relationships were sensitive. It didn’t know that last week’s design review had already rejected the approach it was suggesting.

The problem wasn’t the AI. The problem was that nobody had done the information architecture work to make it useful. Product managers spend their careers doing something similar — figuring out what information people need, when they need it, and how to present it so they can act. The instinct is the same, even if the medium is different. We just never called it context engineering.

The first version was bad

My first attempt was to dump everything into one system — all the company context, all the rules, all the history. The output was unfocused, contradictory, and slow. It would mix brand guidelines from one company into deliverables for another. It would follow instructions from three months ago that had been superseded. It would confidently produce work that looked right but used the wrong framework for the wrong product.

The failure taught me something specific: AI attention isn’t uniform. Information at the beginning and end of the context tends to get more weight, and the middle is more likely to get lost. Relevance beats volume. Ten focused lines outperform a hundred scattered ones.

What the system looks like now

I redesigned around four questions that drive every context decision:

What does the AI need to know to do this task? Not everything — just this task.
What’s most important, and does it load first?
What changes between uses, and how does it stay current?
What can I leave out?

Figure: The Four Questions. Four questions filter raw information into focused context; each gate removes what the agent doesn't need for this task.
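A minimal sketch of how the four questions could act as filters, not the actual system. The `ContextItem` fields, the `build_context` function, and the sample items are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    text: str
    tasks: set[str]            # tasks this item is relevant to (question 1)
    priority: int              # lower number = more important (question 2)
    superseded: bool = False   # stale guidance is flagged, not loaded (question 3)


def build_context(items: list[ContextItem], task: str, max_items: int = 10) -> list[str]:
    """Return the focused context for one task, most important first."""
    # Questions 1 and 3: keep only current items that apply to this task.
    relevant = [i for i in items if task in i.tasks and not i.superseded]
    # Question 2: the most important items load first.
    relevant.sort(key=lambda i: i.priority)
    # Question 4: everything past the budget is left out.
    return [i.text for i in relevant[:max_items]]


# Hypothetical sample data for a design review task.
items = [
    ContextItem("Component registry for Brand A", {"design_review"}, 3),
    ContextItem("Old logo rules", {"design_review"}, 2, superseded=True),
    ContextItem("Contractor onboarding checklist", {"onboarding"}, 1),
    ContextItem("Brand A palette: navy and white", {"design_review"}, 1),
]
context = build_context(items, "design_review")
```

Here the onboarding checklist and the superseded logo rules never reach the design review agent, and the palette loads ahead of the component registry because it carries the higher priority.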

The architecture is a network of specialized AI agents, each handling a different business function. Each agent has its own identity, its own rules, its own domain knowledge, and its own correction history. They don’t share context they don’t need.

Business operations. One system tracks design system parity across applications — which components exist where, what’s current, what’s drifted. It manages brand imagery and workflow orchestration. Before, this information lived in six places and nobody’s head. Now it’s one query.

The bootstrap pattern. Every agent needs to know four things before it can do useful work: what rules it operates under, what role it’s playing, what it knows about its domain, and what it’s gotten wrong before. This is the context engineering layer. Without it, you get output that could have come from anyone’s ChatGPT. With it, you get an agent that knows the job, knows the standards, and knows what it got wrong last time.

Figure: The Bootstrap. Every agent loads four knowledge layers in sequence before it can do useful work, going from empty to fully active.
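The bootstrap can be sketched as assembling the four layers into one prompt in a fixed order. This is an illustration under assumptions: `AgentProfile`, `bootstrap`, the section titles, and the sample agent are hypothetical, and the load order simply follows the order in which the layers are listed above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    rules: list[str]
    role: str
    domain_knowledge: list[str]
    corrections: list[str] = field(default_factory=list)


def bootstrap(profile: AgentProfile) -> str:
    """Assemble the four layers, in a fixed order, into one prompt string."""
    sections = [
        ("RULES", profile.rules),
        ("ROLE", [profile.role]),
        ("DOMAIN KNOWLEDGE", profile.domain_knowledge),
        ("PAST CORRECTIONS", profile.corrections),
    ]
    blocks = []
    for title, lines in sections:
        if not lines:
            continue  # skip a layer that has nothing in it yet
        body = "\n".join(f"- {line}" for line in lines)
        blocks.append(f"## {title}\n{body}")
    return "\n\n".join(blocks)


# Hypothetical agent used only to show the shape of the output.
design_agent = AgentProfile(
    rules=["Never mix brand assets across companies"],
    role="Design reviewer for one product",
    domain_knowledge=["The component registry is the source of truth"],
    corrections=["Last review rejected the tabbed layout; do not propose it again"],
)
prompt = bootstrap(design_agent)
```

Keeping each layer as separate data, rather than one long hand-written prompt, is what lets corrections be appended between sessions without touching the rules or the role.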

The build discipline. Plans get written before work starts. Reviews happen independently — with a fresh perspective, because self-review doesn’t catch blind spots. Implementation follows the plan. The correction loop only works if corrections get implemented correctly. A system that learns from its mistakes but implements the fixes sloppily just creates new mistakes.

The learning loop

In product, the learning loop is how teams get better — you ship, you measure, you learn, you adjust. It’s one of the most important principles in product management, and most teams do it informally at best.

With agents, I made a deliberate design choice to bake the learning loop into the architecture itself. Every agent has access to a shared learning database. When an agent gets something wrong, the correction gets logged — what happened, what should have happened, and why. Next session, that correction loads automatically during bootstrap.

The real design decision was making this work across agents, not just within them. When the product agent learns that a certain type of scope doc causes problems downstream, that learning is available to every agent that touches scope. When the dev agent discovers that a particular implementation pattern breaks across products, that correction informs the next plan review. The agents learn from each other.

Figure: The Learning Loop. Corrections flow through the system: individual mistakes become patterns, patterns become rules, and agents learn from each other.
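One way a shared learning database could look, sketched with Python's built-in `sqlite3` module. The table layout, the `topic` column used to share corrections across agents, and the function names are assumptions made for this example; the source does not specify a storage design.

```python
import sqlite3


def open_learning_db() -> sqlite3.Connection:
    """Create an in-memory database with one shared corrections table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE corrections (
               id INTEGER PRIMARY KEY,
               agent TEXT NOT NULL,
               topic TEXT NOT NULL,
               what_happened TEXT NOT NULL,
               what_should_have_happened TEXT NOT NULL,
               why TEXT NOT NULL
           )"""
    )
    return conn


def log_correction(conn, agent, topic, what_happened, expected, why):
    """Record what happened, what should have happened, and why."""
    conn.execute(
        "INSERT INTO corrections "
        "(agent, topic, what_happened, what_should_have_happened, why) "
        "VALUES (?, ?, ?, ?, ?)",
        (agent, topic, what_happened, expected, why),
    )
    conn.commit()


def corrections_for(conn, topics):
    """Corrections on the given topics, regardless of which agent logged them."""
    if not topics:
        return []
    placeholders = ", ".join("?" for _ in topics)
    rows = conn.execute(
        "SELECT agent, topic, what_should_have_happened FROM corrections "
        f"WHERE topic IN ({placeholders}) ORDER BY id",
        list(topics),
    )
    return rows.fetchall()


# Hypothetical corrections logged by two different agents.
db = open_learning_db()
log_correction(db, "product", "scope", "Scope doc omitted rollout constraints",
               "List rollout constraints explicitly", "Dev plans broke downstream")
log_correction(db, "dev", "implementation", "Pattern broke across products",
               "Check the pattern against every product first", "Shared components differ")
dev_bootstrap = corrections_for(db, ["scope", "implementation"])
```

Querying by topic rather than by agent is what makes the product agent's scope correction visible to the dev agent at its next bootstrap.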

The system has built-in escalation. When corrections on the same issue start repeating, the system flags it — first as a pattern worth reviewing, then as something that should change the agent’s standing instructions, and eventually as a sign that the foundational assumption was wrong. The pipeline from individual mistake to systemic improvement is structural, not something someone has to remember to do.
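The escalation ladder can be sketched as a mapping from repeat counts to actions. The thresholds (2, 3, 5), the level names, and the issue keys below are hypothetical; the source describes the three stages but gives no numbers.

```python
from collections import Counter

# Hypothetical thresholds, checked from most to least severe.
ESCALATION_LEVELS = [
    (5, "revisit_foundational_assumption"),
    (3, "update_standing_instructions"),
    (2, "review_pattern"),
]


def escalation_level(repeat_count: int) -> str:
    """Map how often an issue has been corrected to an escalation level."""
    for threshold, level in ESCALATION_LEVELS:
        if repeat_count >= threshold:
            return level
    return "log_only"


def escalations(issue_keys: list[str]) -> dict[str, str]:
    """Return only the issues that have repeated enough to be flagged."""
    counts = Counter(issue_keys)
    flagged = {}
    for issue, count in counts.items():
        level = escalation_level(count)
        if level != "log_only":
            flagged[issue] = level
    return flagged


# Hypothetical correction log: one key per logged correction.
logged = ["wrong-brand-palette"] * 3 + ["stale-scope-template"] * 2 + ["typo-in-report"]
flags = escalations(logged)
```

Because the counting runs over the correction log itself, nobody has to remember to check for repeats; a one-off mistake such as the typo stays logged but unflagged.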

This changes the speed at which an organization can learn. You can ideate, prototype, and validate ideas with real users and real data faster than before. The cycle from “we think this might work” to “here’s what actually happened” compresses from weeks to days.

But getting to the wrong target faster isn’t progress. This is where experience matters: AI systems need product discipline underneath them, not just engineering. A good product practice has to underpin your AI system and workflows. Knowing how to balance customer needs against market strategy, when to chase signal and when to ignore noise, how to frame the right problem before you solve it fast, and how to apply what you learn is not something you automate. That’s the judgment layer that makes the speed useful.

Most companies skip the learning loop entirely. They build the AI demo, it works in the meeting, and then it slowly degrades because nobody designed the feedback mechanism. The difference between AI that demos well and AI that actually runs your business is whether corrections compound into improvement or evaporate between sessions — and whether the improvement is pointed at something that actually matters.

And that’s true even if you think your company isn’t ready for this yet. The companies that figure out AI operations now will have a structural advantage that compounds every month. The ones that wait for the tools to mature are waiting for a maturity that comes from doing the work, not from the tools themselves.

What I learned

Context engineering is the transferable skill. The difference between AI that works and AI that doesn’t is the information architecture around it — what gets loaded, when, in what order, with what guardrails.

Change management is most of the job. Not half. Most. The technology works. Getting teams to trust AI-assisted workflows, training people on new operating models, figuring out what stays human and what doesn’t — I’ve done this before at Bechtel’s 9,000-person jobsite and at SmartAC scaling through Series B. The technology was different. The change management challenge was identical.

The discipline matters more than the tooling. The AI harness will change. The architecture — the context engineering, the governance model, the learning loop, the build discipline — transfers regardless. Companies that invest in the discipline will be able to swap models. Companies that invest only in the tooling will start over.

What Monday morning looks like now

I open one interface. The system already knows which companies need attention based on what’s changed. Design reviews surface with the relevant brand standards and component registry already loaded. The contractor pipeline shows where it’s behind and what the blockers are. Partner integrations have status from the last interaction, not from my memory of the last interaction.

I still make the decisions. The system presents the right information at the right time so I can decide faster and with better context. Some mornings that saves an hour. Some mornings it surfaces something I would have missed entirely.

The real measure isn’t time saved. It’s that the operational surface area of six companies is manageable by one person without the feeling that something’s falling through the cracks. The system holds the context I can’t.

I built this because I needed it. Now it’s what I do — I help companies build their own version of this. Not the same tools, but the same discipline: context engineering, learning loops, build rigor, and the product judgment to point it all in the right direction.

I lead product and operations teams through AI acceleration.