How Kernel keeps shipping fast with Firetiger
Mason and Sayan, founding engineers at Kernel, talk about what changed when Kernel turned on Firetiger Change Monitors across their pull request workflow.
I sat down with Mason and Sayan, founding engineers at Kernel, to talk about how Firetiger Change Monitors have increased deployment frequency, reduced change failure rate, and improved mean time to recover across their pull request and release workflows.
Kernel builds browser infrastructure for web agents and automations. Internally, they also build the platform their own engineers use to orchestrate that work: a cloud agent surface called HypeShip, built on top of their VM orchestrator HypeMan.
The team operates the way a lot of AI-native engineering organizations now do: PR volume keeps climbing, coding agents like Claude Code and Cursor are in everyone's daily workflow, and traditional dashboards have stopped being the thing engineers actually look at when something feels off.
The Kernel stack does heavy lifting for its browser-hungry customers, and Kernel wants to keep things running smoothly while rapidly shipping new features and code changes.
This is the regime Change Monitor was built for.
Actionable monitoring of every shipped change
The first thing Mason said about Firetiger wasn't about a regression or a rollback. It was about email:
"When I see a Firetiger GitHub email, and Firetiger is almost in the first line, I automatically read it. I think that they're very high signal."
Most of the alerts hitting a working engineer's inbox are either things they already know about, things that aren't actually for them, or things that are too vague to act on.
Each Firetiger notification is anchored to a specific change, making it directly understandable and actionable. The agent has read the PR diff, drafted a monitoring plan for what that change is supposed to do, watched the deploy, and produced a verdict that either greenlights the change or names what looks wrong. There's no "for your awareness" version of that.
A monitoring plan on every PR
Kernel plugged Firetiger into their existing setup with no significant changes to their telemetry stack. The Change Monitor reads each pull request, generates a monitoring plan specific to what the change touches, and watches the rollout across environments. The plan and the verdict both land on the PR, which is where the team already lives.
"Using Firetiger Change Monitoring is standard practice across the company on the majority of PRs."
The plans Firetiger writes for Kernel read like checklists from somebody who has just spent focused time reading the diff and studying available observability data:
- "Refactors health check TTL tracking to DB-based; watch for idle detection failures…" — on a PR that moved a session-health TTL out of memory.
- "Fixes StartupBurnTimeMs capture; watch for customer pool values shifting…" — on a billing-relevant timer-accounting change.
- "Extends envoy readiness check to all envoy sessions; watch for browser creation…" — on an infrastructure-layer readiness change in the browser pool.
- "Fixes QEMU startup cleanup with faster failure detection; watch for spawn errors…" — on a HypeMan change that touched VM spawn lifecycle.
When engineers at Kernel are prepping to ship a change, they'll open their PR, see the Firetiger summary, and skim the plan to confirm it's looking at the right thing. When the plan is too aggressive or too narrow, they'll quickly work with an agent to update it. After that, they don't need to think about it again until they hear back from Firetiger.
Catching issues other monitoring missed
The reason the verdicts matter is that they catch a class of regression that doesn't show up in a global dashboard.
Mason walked me through a few recent ones. One was a latent race condition in a managed auth session workflow, surfaced after a traffic spike a few weeks before our conversation, which is exactly the kind of regression that hides until enough concurrency comes through the system. The signal hit when Kernel's managed-auth health check success rate fell from its normal range to an unhealthy one, which Change Monitor caught and attributed back to the responsible code path before a customer reported it.
Another was a set of Temporal version-gate deprecations. After enough time had passed since a workflow version bump, the team removed the gates expecting a clean removal. Firetiger picked up the resulting error increase, visible in temporal_workflow_failed and temporal_activity_execution_failed counts that climbed against the prior baseline, before any of the existing alerts did.
These are the regressions that a traditional observability regime would not catch. They sit inside one code path, or one customer cohort, or one rollout window. The diff is the thing that knows where to look, and the only way to use that knowledge is to read the diff and watch the deploy with it in mind.
The Firetiger pattern Kernel relies on is the per-PR loop: a monitoring plan written from the diff, a watching window around the deploy, and a verdict that names what happened. When a regression is detected, enough context is delivered that the engineer can act on cleaning up the change without scrambling to understand what exactly is going on.
"The most valuable thing is that it finds something that happened as a cause of this ... and I can take that and run with it."
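To make the per-PR loop concrete, here is a minimal sketch of the baseline comparison behind a verdict. It is not Firetiger's implementation: the `Check` class, the `max_ratio` threshold, and the sample counts are all hypothetical, and only the two Temporal metric names come from the conversation above.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One item in a hypothetical monitoring plan: a metric and a tolerance."""
    metric: str
    max_ratio: float  # flag if the post-deploy mean exceeds baseline by this factor


def evaluate(check: Check, baseline: list[int], after: list[int]) -> str:
    """Compare mean per-interval counts before and after a deploy."""
    base_mean = sum(baseline) / len(baseline)
    after_mean = sum(after) / len(after)
    if base_mean == 0:
        # With a zero baseline, any failures at all count as a regression.
        regressed = after_mean > 0
    else:
        regressed = after_mean / base_mean > check.max_ratio
    status = "REGRESSION" if regressed else "OK"
    return f"{status}: {check.metric} baseline={base_mean:.1f} after={after_mean:.1f}"


# A plan scoped to the metrics the change is expected to touch.
plan = [
    Check("temporal_workflow_failed", max_ratio=2.0),
    Check("temporal_activity_execution_failed", max_ratio=2.0),
]

# Made-up per-interval counts (before deploy, after deploy), for illustration only.
samples = {
    "temporal_workflow_failed": ([2, 3, 1, 2], [9, 12, 10, 11]),
    "temporal_activity_execution_failed": ([5, 4, 6, 5], [5, 6, 4, 5]),
}

verdicts = [evaluate(check, *samples[check.metric]) for check in plan]
for line in verdicts:
    print(line)
```

The point of the sketch is the scoping: the plan names only the metrics the diff is likely to move, so a shift in one of them is attributable to the change rather than lost in a global error rate.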
Sayan also pointed out an unexpected secondary use: week-over-week trend analysis on usage patterns. Their existing logs platform was built for point-in-time queries and times out on multi-day aggregates. Firetiger handles those queries reliably because the underlying telemetry lives in a data lake designed for them.
What Firetiger is watching
The Change Monitor's plans on Kernel's PRs are anchored to the specific telemetry Kernel emits and the jobs its product does for its customers, which lets the monitor track things like:
- Managed-auth lifecycle — health-check success, session duration, recording-start counts.
- Temporal workflow surface — workflow and activity failures, end-to-end latency, sticky-cache hit rate.
- Browser pool and hotpool — idle counts, target counts, wake latency, miss rate, warming-state counts.
- Customer session pool — circuit-breaker state, consecutive failures, provisioning failure percentage, lease duration.
- CUA auth-flow simulator — per-scenario success/failure across the synthetic auth flows Kernel runs continuously.
When a PR touches any of those code paths, the plan it gets is scoped to the relevant slice. Verdicts are similarly scoped: the engineer doesn't get told "errors went up," but instead gets a much more context-heavy message like:
"managed-auth health checks fell from normal to under 10% over four hours after deploy of PR #X, here's the metric and the time window."
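A message of that shape can be assembled from three things: the metric, the deploy time, and the post-deploy samples that fell below an acceptable floor. The sketch below is one hypothetical way to do it, not Firetiger's format. The `Verdict` class, the `scoped_verdict` helper, the metric name, the 50% floor, and the sample data are all assumptions made for illustration, and the PR number is left as the placeholder used above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Verdict:
    metric: str
    pr: str
    start: datetime  # deploy time
    end: datetime    # last sample observed below the floor
    low: float       # lowest success rate seen in the window

    def render(self) -> str:
        hours = (self.end - self.start) / timedelta(hours=1)
        return (
            f"{self.metric} fell to {self.low:.0%} over {hours:g}h "
            f"after deploy of {self.pr} "
            f"({self.start:%H:%M} to {self.end:%H:%M})"
        )


def scoped_verdict(metric, pr, deployed_at, series, floor):
    """Return a Verdict if the rate drops below `floor` after the deploy, else None."""
    below = [(t, v) for t, v in series if t >= deployed_at and v < floor]
    if not below:
        return None
    return Verdict(metric, pr, deployed_at, below[-1][0], min(v for _, v in below))


# Made-up hourly success rates around a hypothetical deploy at 12:00.
deploy = datetime(2024, 1, 1, 12, 0)
rates = [0.98, 0.97, 0.60, 0.30, 0.12, 0.08]
series = [(datetime(2024, 1, 1, 11 + i, 0), r) for i, r in enumerate(rates)]

verdict = scoped_verdict(
    "managed_auth_health_check_success", "PR #X", deploy, series, floor=0.5
)
print(verdict.render() if verdict else "no regression detected")
```

Carrying the metric name and the time window in the verdict itself is what lets the reader, human or agent, go straight to the relevant data.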
Where we're headed
A theme that came up several times in the conversation was the consolidation pattern: as engineering teams pull more of their workflow into Slack, Linear, and their own cloud-agent surfaces, the question is how external tools meet engineers where they already are.
Kernel's team uses HypeShip every day. They want every reliability signal that matters (change verdicts, regression reports, incident context) to flow into that surface in a way that lets agents pick up the work from there. Firetiger's role, as Mason described it, is to be the layer that watches each deploy and hands the result to whichever agent or human needs to act next.
"The way that Firetiger talks to HypeShip is, like, perfect."
That's the shape of the loop we're building toward. Coding agents write the change. Firetiger watches the change in production. When something is wrong, the verdict lands in the engineer's surface of choice, whether that is Slack, the PR, the incident timeline, or an internal agent platform, with enough context that the next step is obvious. The team's job, as it has always been, is to keep the iteration loop short and the context clean.
When that loop works, the team gets to keep shipping fast. Which is, in the end, the thing the work was always supposed to support.
Firetiger uses AI agents to watch production for change-caused regressions, triage incidents back to the responsible deploy, and hand investigation context to your engineers and coding agents. Learn more about Firetiger or get started for free.