I Joined Firetiger as an AI Skeptic

I didn't buy it

I didn't believe in any of this stuff.

I was an engineering student at the University of Wisconsin. The AI hype was mostly something I saw on X.

LLMs could write a function. Cool. Ask one to fix a bug in an actual codebase and it fell apart. I stopped using them for real code pretty quickly and treated them as a glorified commenting tool for a while. This was the thing everybody was losing their minds about?

Before Firetiger I had a job where my entire role was figuring out how to make Datadog stop sending thousands of emails a day to one team. Contractors kept setting up alerts and leaving. Nobody knew which ones mattered. The maintenance alone was a full-time job.

That’s “observability” at a lot of companies. A pile of data nobody wants to deal with. Most observability tools are built to collect everything and help you find nothing.

Firetiger was a cheaper way to store telemetry without cardinality problems. Real problem. Real product. At the time it wasn't even an AI company. That's part of why I applied. Plenty of warnings about LLM wrapper slop companies and I wasn't trying to join one.

During my work trial I spent a chunk of time writing a README by hand. Rustam, our co-founder, was incredulous. “Why are you doing that manually?”

That's how skeptical I was. Wouldn't even use LLMs for documentation.

The team had already stuck an agent on the telemetry data by the time I got there. It was basically what I expected of AI. A glorified search bar that could talk. No memory and no understanding of what an error actually meant. Ask it if something was broken and it would say yes or no.

Kinda useful. Mostly a toy.

San Francisco is contagious

I moved to SF for Firetiger. AI is everywhere here. It's hard not to try it. There are worse things to pick up in this city.

I watched a senior developer use this thing called Claude Code. Gave it a shot. Felt like I was 10x overnight. Things that used to take me all day took an hour.

That cracked something. The models had gotten better while I wasn't paying attention. This wasn't the same thing I had written off six months earlier.

And it wasn't just my code. The Firetiger agents were changing too.

Firetiger changed my mind

Claude Code cracked the door. Firetiger kicked it open.

The early agent was a junior with amnesia. But I watched it change. A big part of why is context. A small context window means small tasks. As those windows grew the ceiling kept going up. A senior engineer isn't better than a junior because they're smarter. They just have more of the system in their head. Juniors turn into seniors when they've absorbed enough context. The same thing happened here.

It didn't happen cleanly. We kept making drastic changes. Renaming things. Ripping out tools and replacing them. Figuring out what agents should even be allowed to do. One month we'd give them access to something, next month we'd pull it back. Then a better model would come out and the thing that didn't work before worked fine. So we'd give it back.

There wasn't a meeting where someone said, "OK, we're an AI company now." The models kept getting better and every time they did we had to rethink what the agents should look like.

It doesn't even look like the same product anymore!

We were already sitting on a mountain of telemetry data. Too much for any person to sift through. For agents it's fuel. They write SQL on the fly, do joins across services, and aggregate over time windows. Stuff that would take me an hour in a dashboard.
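To make the "SQL on the fly" part concrete, here's a minimal sketch of the kind of query an agent might generate. Everything here is hypothetical: the schema, the table names (`spans`, `deploys`), and the time window are made up for illustration, and this uses SQLite rather than Firetiger's actual storage layer.

```python
import sqlite3

# Toy telemetry store. Table and column names are hypothetical;
# this only illustrates the shape of query an agent might write.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE spans (service TEXT, trace_id TEXT, status INTEGER, ts INTEGER);
CREATE TABLE deploys (service TEXT, version TEXT, ts INTEGER);
""")
db.executemany("INSERT INTO spans VALUES (?,?,?,?)", [
    ("checkout", "t1", 500, 100), ("checkout", "t2", 500, 110),
    ("checkout", "t3", 200, 50),  ("payments", "t1", 500, 99),
])
db.executemany("INSERT INTO deploys VALUES (?,?,?)", [
    ("payments", "v42", 95),
])

# Join errors in one service against recent deploys in another,
# aggregated over a 30-tick window -- the cross-service correlation
# a human would do by clicking between dashboards for an hour.
rows = db.execute("""
    SELECT s.service, d.service AS deployed, COUNT(*) AS errors
    FROM spans s
    JOIN deploys d ON d.ts <= s.ts AND d.ts >= s.ts - 30
    WHERE s.status >= 500
    GROUP BY s.service, d.service
""").fetchall()
print(rows)
```

The point isn't the query itself. It's that an agent can write and run a dozen of these in the time it takes me to open the dashboard.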

The same way Claude Code changed how I write code, the agents changed how I think about observability.

Instead of "there's an error in this service" they come back with "this endpoint has been throwing 500s for 20 minutes, it's hitting about 300 users and based on the deploy that went out at 2pm here's what you should check first."

The first time I saw that I thought ok this is doing what I used to do but better and faster.

They trace problems across services to the root cause. They resolve recurring stuff on their own. Same issue, same fix, every week. I used to be that person. Now the agent handles it.

They figure out which alerts matter and which ones don't. That Datadog email problem from my old job? I spent months on that with rules and filters. The agents just learn what you respond to and what you ignore.
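The core idea is simple even if the real system isn't. Here's a toy sketch of learned triage, not Firetiger's implementation: rank each alert by how often a human historically acted on it, instead of maintaining hand-written filter rules. The alert names and the threshold are made up.

```python
from collections import defaultdict

def triage(history, threshold=0.5):
    """history: list of (alert_name, was_acted_on) observations.
    Returns the set of alert names worth keeping. The 0.5 cutoff
    is an arbitrary illustrative choice."""
    acted = defaultdict(int)
    seen = defaultdict(int)
    for name, responded in history:
        seen[name] += 1
        acted[name] += responded
    # Keep alerts that people respond to more often than the cutoff.
    return {name for name in seen if acted[name] / seen[name] > threshold}

# Two alert types: one people act on, one they always ignore.
history = [
    ("disk_full", True), ("disk_full", True), ("disk_full", False),
    ("stale_contractor_check", False), ("stale_contractor_check", False),
]
print(triage(history))  # only disk_full survives; the ignored alert is muted
```

My old job was months of manually encoding the equivalent of that `history` into Datadog rules. The agents build it up from observed behavior.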

It took me a while to accept that.

We gave them a notebook. They jot down everything they learn. Patterns. What happened last time? The agent knows what your traffic looks like on a Tuesday afternoon and catches drift before your users do.

I still find that wild to watch.

We give them a real shell. Bash. They write scripts, curl APIs, inspect git repos. Sandboxed so we see everything. The first time I saw an agent write a bash script to debug something I felt the same thing I felt with Claude Code. This is doing what I do.

Then they started talking to each other

One notices a spike in errors. Another one is already poking at the deploy pipeline. They piece it together on their own. No human in the middle.

The first time I saw it I was excited and then immediately a little scared.

This is the part where my skepticism doesn't have a good answer anymore. I can nitpick things agents get wrong. But the trajectory is obvious.

The humans are the bottleneck now

Knowing you can talk to an agent and knowing how to talk to it well are completely different things.

The first time I used Claude Code I was being vague with it. Barely telling it anything about what I actually wanted. Then I started giving it real context about my system and what good looked like. Night and day. The agent didn't change. I did.

Most people haven't made that shift yet. They treat the agent like a search bar. Short vague query. Short vague answer. They walk away thinking it can't do much. But I know what it looks like when someone gives it real context; it's a completely different experience.

Getting out of the way

Same thing on our side. Early on, every new capability had to be explicitly built. Everything was hand-wired.

Now we're loosening that. Give them access to more and let them figure out what they need.

Same lesson I learned with Claude Code: I got better results when I stopped micromanaging. The agents got smart enough that the limits we put on them were the thing holding them back.

It's a weird moment as a builder. You realize the bottleneck isn't time in the day or the model anymore. It's you.

I still don't know where this goes

Nobody does. Anyone with a detailed three-year roadmap for agents is making it up.

Every time there's a real jump in model capability we can limit the agents less. Give them more room. Stuff that needed a human last year gets handed off this year.

A lot of companies have entire teams doing work that agents handle fine now. That's not a popular thing to say but it's true.

So we get the infrastructure ready. Agents wake up when something needs attention and go back to sleep when it doesn't. The state is durable, compute is disposable. Smarter agents keep showing up and use whatever we've built.

That's the bet. Every generation better than the last. Consistent enough that we're building around it.

I came in as an AI skeptic. Still am about a lot of it. Plenty of hype that deserves skepticism.

But now I have five Claude Code sessions running at all times. I still carefully scan every PR to make sure Claude didn't hallucinate a client secret into the repository. It's more of a proofread now though. I went from not trusting LLMs to write a README to reviewing AI output as my main workflow.

People hate change. I hate change. I was my own bottleneck.

I don't know what the agents will be doing next year. But I know Firetiger won't be the bottleneck when they get there.
