What If the Model Only Ran Once?
Click a button, wait for inference. Filter a table, wait again. That’s how most AI-generated UIs work right now: the model sits in the loop on every interaction, reasoning about what to render next, and the user pays the latency cost each time. For a chatbot that’s fine, but for a tool someone reaches for during an incident it really isn’t.
At Cased, we were building operational UIs that pull from GitHub, AWS, Sentry, and Datadog: services where auth is per-customer and data shapes vary. Static dashboards couldn’t adapt to each customer’s setup, and real-time generation couldn’t provide the responsiveness people need when things are breaking. I kept thinking there had to be something in between, and I think what we landed on generalizes well beyond what we built.
Two decisions that shape everything
Any system using an LLM to drive UI sits somewhere on two axes.
How constrained is the interface? Can the model output arbitrary HTML, or does it select from a fixed set of components backed by a typed API?
When does the model run? On every user interaction, or once upfront to generate logic that executes later?
These two decisions determine almost everything about the system’s reliability, latency, cost, and debuggability. Most discussions about “AI-generated UI” skip over them entirely.
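The design space can be sketched as a small pair of types. The names are mine, not an established taxonomy:

```typescript
// Two axes that any LLM-driven UI system occupies; every system
// lands in one cell of this 2x2 space.
type InterfaceConstraint = "arbitrary-output" | "fixed-catalog";
type GenerationTiming = "per-interaction" | "ahead-of-time";

interface UiGenerationDesign {
  constraint: InterfaceConstraint; // what the model may produce
  timing: GenerationTiming;        // when the model runs
}

// The combination this article argues for:
const casedStyle: UiGenerationDesign = {
  constraint: "fixed-catalog",
  timing: "ahead-of-time",
};
```

Naming the cell a system occupies makes the later comparisons concrete: A2UI is fixed-catalog but per-interaction; an LLM emitting raw HTML on each request is arbitrary-output and per-interaction.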
Why constraints help
In theory, a sufficiently capable model could operate over any interface: arbitrary HTML, unbounded toolsets, whatever you give it. In practice, open-ended interfaces explode the space of possible behaviors. Every new thing the model can do is another thing you have to test, debug, and trust. The model renders a table one way on Monday, a different way on Tuesday, and now you’re debugging the UI generator instead of the UI.
What we landed on was constraining what the model can produce while leaving how it decides open.
Google’s A2UI protocol gets this right on the presentation layer. Agents send declarative component descriptions, and clients render them using native widgets from a fixed catalog of pre-approved components. No arbitrary HTML injection.
We did something similar at Cased. Controllers return structured data that maps to a fixed set of UI components, so the model can’t render whatever it wants. That sounds limiting at first, but HTML itself is a small set of preset elements; a well-designed component library gives you endless combinations on top of controlled primitives.
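A fixed catalog can be modeled as a discriminated union. This is a minimal sketch, not Cased’s or A2UI’s actual schema:

```typescript
// Illustrative component catalog: the model can only emit values of
// this union, so rendering stays within pre-approved bounds.
type Component =
  | { kind: "table"; columns: string[]; rows: string[][] }
  | { kind: "metric"; label: string; value: number }
  | { kind: "text"; body: string };

// Rendering dispatches on the tag. An unknown kind is a compile-time
// type error, not a runtime surprise.
function render(c: Component): string {
  switch (c.kind) {
    case "table":
      return `<table cols=${c.columns.length} rows=${c.rows.length}>`;
    case "metric":
      return `${c.label}: ${c.value}`;
    case "text":
      return c.body;
  }
}
```

The payoff of the closed union is that adding a component is an explicit schema change, reviewed like any other code, rather than something the model can improvise at runtime.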
We went further and constrained the runtime too. Instead of giving the model access to unbounded tools, we exposed a deterministic, capability-scoped API, and let the model generate glue logic connecting the UI schema to it. The model decides the how, but the where and what stay fixed. Constraining both layers narrows the blast radius of model mistakes: if the model hallucinates, it hallucinates within bounds.
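A rough sketch of capability-scoping plus generated glue; the API surface and names here are hypothetical:

```typescript
// The model never gets raw network or tool access; it only sees a
// capability interface like this, so generated glue logic cannot
// reach outside it.
interface Capabilities {
  listPullRequests(
    repo: string
  ): Promise<{ title: string; checksPassing: boolean }[]>;
}

// The kind of glue the model might generate: it decides *how* to
// combine calls, while the *what* (the API) and *where* (the scoped
// repo) are fixed by the host.
async function failingPrTitles(
  api: Capabilities,
  repo: string
): Promise<string[]> {
  const prs = await api.listPullRequests(repo);
  return prs.filter((p) => !p.checksPassing).map((p) => p.title);
}
```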
But that still leaves the latency problem
Constraints handle what can go wrong, but they don’t help with speed.
A2UI keeps the model in the loop at runtime. Agents reason on every interaction and emit UI descriptions that clients render as they stream in, which means each user action can trigger a new inference cycle. You end up with a responsive frontend sitting on top of an inherently slow backend, and the UI feels sluggish because every interaction round-trips through a model.
I think there’s a better approach for most operational tools: get the model out of the hot path entirely.
When someone at Cased asked for a view like “show me open PRs with failing checks,” the model didn’t generate a static report; it generated a controller: a small function that knows how to fetch and shape that data. We’d validate the controller against real data, store it, and from that point forward every render just executed the function directly, with no model in the loop.
Filtering, sorting, and drilling down all happen instantly because you’re running code instead of waiting for inference.
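A controller in this style might look like the following sketch; the types and names are illustrative, not our production code:

```typescript
// Illustrative controller for "open PRs with failing checks".
// Generated once by the model, validated against real data, then
// stored; every subsequent render executes it directly.
interface Pr {
  title: string;
  state: string;
  checksPassing: boolean;
  comments: number;
}

interface View {
  columns: string[];
  rows: (string | number)[][];
}

function openFailingPrs(prs: Pr[]): View {
  const rows = prs
    .filter((p) => p.state === "open" && !p.checksPassing)
    .map((p) => [p.title, p.comments]);
  return { columns: ["Title", "Comments"], rows };
}
```

Because the controller is plain code, validation is ordinary testing: run it against live data once, check the output shape, and you know exactly what every future render will do.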
This is essentially the distinction between compiling and interpreting: ahead-of-time generation compiles a request into code, while runtime reasoning re-interprets it on every interaction. Move reasoning out of the hot path and you get responsiveness, replayability, and predictable behavior. The view might be useful for ten minutes during an incident, or it might become something you check every morning. Either way, the data stays current and the interaction stays fast.
Refinement still works through conversation. “Add a column for comments.” “Filter to PRs from the last week.” Each modification generates a new controller derived from the existing one, which is more precise than starting from scratch because the system already knows what data is available and what the current view does. Once the new controller is generated, the model steps aside again.
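One way to picture that refinement lineage is a small versioned store. This structure is a hypothetical illustration, not how Cased actually persists controllers:

```typescript
// Each refinement stores a new controller linked to its parent;
// renders always execute the latest stored version directly, with
// the model involved only at the moment of derivation.
interface StoredController {
  id: string;
  parentId: string | null; // lineage back to the controller it refined
  source: string;          // the generated function, stored as code
}

const store = new Map<string, StoredController>();

function saveRefinement(
  parent: StoredController,
  newSource: string
): StoredController {
  const next: StoredController = {
    id: `${parent.id}.v${store.size + 1}`,
    parentId: parent.id,
    source: newSource,
  };
  store.set(next.id, next);
  return next;
}
```

Keeping the lineage around is what makes “derived from the existing one” cheap: the refinement prompt can include the parent’s source, so the model edits a known-good function instead of regenerating from nothing.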
Where this doesn’t apply
I should be honest about the tradeoff.
Just-in-time reasoning makes more sense when the UI needs to adapt to unpredictable agent state, when interactions are conversational or exploratory, when latency is acceptable, or when flexibility matters more than repeatability. Most agent-driven chat experiences fit here.
Ahead-of-time generation fits better when the same view gets reused and shared, when the UI needs to feel snappy, or when you need to test and audit the exact behavior. Most operational and infrastructure tools land here.
The mistake is treating these as the same problem. They’re different points in the design space, and I think a lot of the skepticism around AI-generated UI comes from applying the wrong approach to the wrong context.
A vocabulary problem
“AI-generated UI” could mean an LLM writing HTML on every request, a protocol streaming component descriptions, or a system that generates code once and runs it deterministically. These have radically different reliability, latency, and cost profiles, but we talk about them as if they’re one thing.
I don’t think we have a shared vocabulary for these tradeoffs yet. A2UI and the work we did at Cased are just two points in the space, and naming the axes (constraint level and generation timing) hopefully makes it easier to reason about where other approaches sit.
My bet is that constrained interfaces plus ahead-of-time generation is the practical path for most tools today. Models will keep getting faster and more reliable, and the interfaces can expand later. For now, getting the model out of the runtime loop is what turns an AI-generated UI from something you demo into something you actually use.