Dynamically generating agents to monitor PRs

By Spencer Nelson — 23 Feb 2026

Aka deploying changes faster with Firetiger

We're excited to announce Deploy Monitoring with Firetiger, available right now. This new agent lets you ship faster and with more confidence by building sophisticated agents on-the-fly to watch signal after you deploy, automatically detecting bugs and confirming expected behavior.

Connect Firetiger to your GitHub, and we'll monitor pull requests directly, telling you when they're working (or not!) in real time. We'll check out your code, read the PR metadata, and cross-reference your existing telemetry data. This gets turned into an agent that will be activated once the PR is actually shipped and running in your production environment.

You can use this today! Sign up at firetiger.com and give it a try.

┌──────────────────────┐
 │       Open PR        │
 └──────────┬───────────┘
            │
            ▼
 ┌──────────────────────┐ ───┐
 │ Develop a Monitoring │    │ refine via comments
 │         Plan         │ ◀─┘
 └──────────┬───────────┘
            │ merge PR
            ▼
 ┌──────────────────────┐
 │    Deploy the Code   │
 └──────────┬───────────┘
            │
            ▼
 ┌─────────────────────────────────────────────────────┐
 │           FIRETIGER MAKES SURE YOUR CODE            |
 |            IS ACTUALLY WORKING IN PROD              │
 │                                                     │
 │   ┌──────────────────────────────────────────────┐  │
 │   │              checks run on schedule          │  │
 │   └──────────────────────────────────────────────┘  │
 │          │                │                │        │
 │          ▼                ▼                ▼        │
 │  ✅ Intended      🔄 All quiet,     🚨  Side effect  │
 │     effect          watching           detected     │
 │     confirmed       for issues        → alert       │
 │                                                     │
 └─────────────────────────────────────────────────────┘

How we use deploy monitoring internally

At Firetiger, we've been using AI assistance in coding pretty aggressively. As the pace of development (especially code authoring) quickens, we've found ourselves grinding against new bottlenecks - particularly verifying that LLM-generated code is correct and safe in production.

We used to do that with a lot of manual work, and a moderate dose of anxiety. We'd push out some change, and perhaps poke around on the website and try to trigger the new feature, and hope that generalized metric and alarms would catch any issues.

And this works alright, in low volume. But we found that as our small team cranked up the number of changes we deploy per day, it became unrealistic for us to manually check that all features had their intended effects, and bugs were slipping through the cracks.

A good example came up a few weeks ago while working on our in-house query engine. I wanted to add a particular kind of "predicate pushdown" - that is, logic to skip processing big chunks of data on disk, given the filtering conditions in a query. It was a fairly targeted change but there's plenty that could go wrong and which would be hard to find.

Furthermore, I had expectations that it should improve query performance, but measuring performance improvements with traditional alarms would be a crude and noisy approach. I wanted an agent to help with the process of confirming that the change had its intended effect, too.

LLMs are good at targeted monitoring

We experimented with a new idea. Given a pull request description, a check out of the code, and the existing bank of knowledge on metrics, logs, and system infrastructure, could Firetiger agents dynamically generate highly targeted expectations for a single PR's changes?

The intuition here is that a lot of the difficulty lies in the pure tedium of figuring out exactly what metrics, logs, and traces provide useful signal, and computing the right baseline for their metrics. But LLMs are terrific at tedious tasks like this, particularly when they have access to code and good tools.

Our initial experiments were promising! By running actual metrics queries, and by having an excellent base of knowledge built up from observing the production system in action, agents were able to come up with detailed plans that made sense.

Things really started crackling into high quality once we gave our agent access to Bash tooling and the source code and Git history. With these capabilities, we're able to get a rich sense of the history of code changes and code efforts, and search for exact log text to craft very precise signals that indicate whether a PR has succeeded in its intent.

The thought process of our agent as it plans out monitoring for a PR related to query predicate application.

Meeting engineers in their existing workflows

Bad AI tools force you out of your normal flow, asking you to chat with an assistant widget somewhere. Good ones meet you in your existing infrastructure and workflows.

So, we configured the agent to work entirely through interactions on GitHub. To summon it, you mention "@firetiger" in a comment on a PR; it develops a plan and gives you a quick summary:

This is already useful; the agent's work distilled things into a more useful set of things to check than I gave it.

But the kicker came once Firetiger's agent reported back on its findings. Initially, it reported that more data files were getting accessed, rather than fewer, against expectations. But after a few more minutes, it corrected itself by noting that the day had unusually high traffic - and that, proportionally speaking, we were on track and the optimization appeared to be working. This is the sort of subtle complexity that makes this stuff hard, and the agent navigated it effectively.

Deploying with confidence

The net result of this was that I felt much more confident rolling out changes quickly. Interestingly, I don't even feel the need to wait for complete confirmation of all effects before moving on and building forward. Just knowing that Firetiger is watching and will alert me quickly if things go poorly is enough for me to keep rolling forward, much more confident in the changes.

You can use this today

We've opened this up to general availability on Firetiger. It's pretty effortless if you're already using GitHub Deployments:

Sign up for an account (it's completely free for light usage!) at firetiger.com
Connect your GitHub (this is prompted during onboarding, you can't miss it)
And tag @firetiger in a PR

More docs are on our site, too, at docs.firetiger.com/deploy-monitoring/.

We think you'll really like it. Agents and LLMs are changing software development fast, and it's an exciting time. Let us know how it goes!