Agentically optimizing LLM prompt cache TTLs for fun and profit
A case study on production objective hill climbing
Firetiger runs a few hundred large language model (LLM) agents in production, and prompt caching is a critical tool for managing the cost of such a workload. Properly setting cache time-to-live (TTL), how long a cached prefix survives before the next