08: Budget Crunched
The inevitable result of all that unbridled maxxing
Fieldnotes is a weekly read for the people inside companies who shape how AI gets used. Learn more about Superadditive.
This week’s main thing
Last month a single company ran up a $500 million Claude bill without realizing it. Stories like that travel fast (sweet sweet schadenfreude), and the natural response is for companies everywhere to swiftly set spending limits.
The trouble, though, is what a budget can and can’t see. A budget can set a limit on how much money is leaving the business, but it can’t tell you what you’re getting (or not) for that spend.
You can be under budget and sitting on a bunch of useless agent runs. You can also be at your limit, just one run shy of the finding you need to pivot your entire roadmap for the better.
Yes, tokenmaxxing was reductive. But so is a hard limit.
AI makes production cheaper. Starting new work, as a result, is cheaper. But it’s not cost-free (as the costs of these tokens show). How you constrain new work is far more important than limiting overall volume.
The oil business learned this lesson when drilling equipment got cheaper. Yes, they could afford to drill 10x more holes, but the most successful companies didn’t. After all, the goal isn’t more holes: it’s more oil. Instead, they invested in imaging tech so they could make educated guesses about the best places NOT to drill. A cheap survey with 90% accuracy beat paying to drill for the last 10% of certainty, especially once that cost is multiplied across every possible site.
What to say to your CEO this week: a budget on AI spending is fine, and we should set one. But a budget tells us how much we spent, not whether we should have, and right now we’re flying blind on that second question.
In addition to a budget, we need to be re-training our teams on how to explore new opportunities with these tools.
This week’s move: Scoping the search
A budget governs how much you spend on AI runs. These four rules govern why and what for. You don’t need new tools to make these work, just time sharing these simple habits with your teams.
Write down the question you’re trying to answer and the answer that would end your search. Examples: Is there meaningful whitespace among these competitors? Is it possible to integrate these two areas of functionality without confusing the user? We live in an age of obsequious machines, so don’t rely on the tool to tell you you’re asking the wrong question or headed in the wrong direction. Spend more time here than you currently do. This is a new skill to build for this era.
Keep your search budget separate from your build budget. Looking for a possible pathway to an outcome and traveling a tested road you’ve already found are different kinds of work, judged by opposite rules. In a search, waste is the cost of looking: expected and fine. In a build, waste is just waste. Mix them and you either strangle the search or excuse a shoddy build.
Set the stopping point while you’re cold. Decide the spend and the depth before you start, because halfway down, “we’re so close” always wins, and it’s usually wrong. You can extend, but only on new information that raises the odds, never to make good on the money you’ve already spent.
Put dead ends back on the calendar. A search that failed last quarter can succeed after a model upgrade, so abandoning a road isn’t as permanent as it used to be. Don’t reopen everything constantly. Schedule the look, so dead ends get one honest revisit instead of either haunting you or vanishing for good.
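For teams that like to see a habit written down as a template, here is one way the four rules could be captured in a few lines of Python. Treat it as a minimal sketch, not a prescribed tool: the names `SearchBrief` and `may_keep_spending`, the fields, and the idea of putting a rough number on "odds" are all illustrative assumptions rather than anything the rules require.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SearchBrief:
    """Hypothetical template: one scoped search, written before any run starts."""
    question: str                       # rule 1: the question being answered
    stop_answer: str                    # rule 1: the answer that ends the search
    spend_cap: float                    # rule 3: decided while cold, in dollars
    prior_odds: float                   # rough chance of success at kickoff (0 to 1)
    spent: float = 0.0                  # rule 2: search spend only, never build spend
    revisit_on: Optional[date] = None   # rule 4: one scheduled look at a dead end


def may_keep_spending(brief: SearchBrief, current_odds: float) -> bool:
    """Rule 3: past the cap, extend only if new information raised the odds."""
    if brief.spent < brief.spend_cap:
        return True  # still inside the limit set up front
    # How much has already been spent is not an argument for spending more;
    # only an improvement over the kickoff estimate justifies an extension.
    return current_odds > brief.prior_odds
```

The design choice worth noticing is what the extension check leaves out: once the cap is hit, the amount already spent never counts in favor of continuing.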
Top stories
The streaming feed fills with ghosts. University of Chicago researchers reported that roughly half of newly uploaded tracks on major music platforms now carry signatures of AI generation, detectable through artifacts a listener can’t hear. The figure measures uploads, not plays, and most tracks get almost no listens. But the same week, an AI-generated “artist” with no human performer drew millions of streams and a comment section of fans who found the songs moving, which makes the authenticity question harder to wave off. University of Chicago News
The biggest tech-worker union in the country just formed around AI. IT employees across the University of California system voted to unionize, joining UPTE and bringing the combined unit to several thousand workers. Organizers were explicit that the goal includes limiting AI-driven layoffs and getting a say in how the systems get deployed. It is the clearest sign yet that the response to workplace AI is moving into formal channels: contracts, bargaining units, and the slow machinery of labor law. Blood in the Machine
Meta pulls back a monitoring tool after staff revolt. Meta scaled back an employee-monitoring program following internal pushback, after it emerged that the data it gathered was being used in part to train AI systems. The reversal lands amid wider scrutiny of workplace surveillance software, including research finding that monitoring platforms routinely share worker data with outside firms. The notable part is the retreat itself, at a company otherwise reorganizing aggressively around AI. HR Grapevine
Statehouses start writing the rules. AI employment bills advanced across several states this week. Vermont moved to ban therapy chatbots. California pushed forward a slate that includes a measure requiring 90 days’ notice before technology-driven displacement affecting a quarter or more of a workforce. Illinois sent several AI bills to its governor. The throughline is that the workforce questions employers have been answering privately are starting to get answered by law. Transparency Coalition
The new bottom of the labor market is filming itself. A crop of platforms now pays people small sums to record themselves doing ordinary chores (filling a kettle, emptying a dishwasher) so the footage can train domestic and humanoid robots. One company offers to clean your apartment for free in exchange for filming the work. A WIRED report lays out the model and how little it pays. Beneath the expensive AI build-out, a new tier of gig work is forming to feed it. WIRED
Last time around
In the autumn of 1979, a program called VisiCalc went on sale for the Apple II. Before it, a financial projection lived on paper. You ran the columns by hand, and if you wanted to know what happened when the growth rate changed, you redid the page. The cost of asking “what if” was your afternoon.
VisiCalc, and Lotus 1-2-3 behind it, made that cost vanish. Recalculation became instant and free. An analyst could run a hundred scenarios before lunch.
The thing everyone expected was that this would make analysts faster. The thing that actually happened was subtler. When running a scenario costs nothing, running scenarios stops being the skill. Choosing which ones to run becomes the skill. The constraint moved from the arithmetic to the judgment, and the people who thrived were not the ones generating the most projections. They were the ones who knew which projection would change a decision.
The spreadsheet didn’t replace the analyst. It relocated what the analyst was for.
Agents have collapsed the cost of “just try it,” the way the spreadsheet collapsed the cost of “just model it.” And the discipline that survives the collapse is the same one it was in 1979. Not how many questions you can afford to ask. Which ones are worth asking.
From the frontier
In Utah this spring, a service called Doctronic, nicknamed Doc in a Box, started handling prescription-refill requests with an AI front end. Clinicians built it to escalate any uncertain case to a physician and tuned it to err toward caution, toward a needless review rather than a wrong refill. The AI handled the routine volume. In the roughly seven in ten cases where it recommended a refill on its own, two independent physicians later agreed with the call about 97 percent of the time, and the uncertain cases went to human experts.
Potpourri
From someone doing it. Watching companies stand up “token-spend leaderboards” to encourage AI use, Gergely Orosz of The Pragmatic Engineer named the problem precisely: the moment you make spend the score, you’ve invited Goodhart’s Law to the party, and people will work to run the number up. A leaderboard measures effort. It was never going to measure whether the effort was worth anything, and rewarding it teaches a team to drill more holes, not better ones. Gergely Orosz
From the weirder edge of the week. A small web toy called Weather Rothkos pulls the live weather for wherever you are and serves you the Mark Rothko painting whose palette best matches the sky outside your window. Grey morning, and it hands you a slab of slate and plum. It does nothing useful, costs nothing, and is quietly perfect. Weather Rothkos




