LLM News May 2026: AI Starts Counting Pennies

Some months pass quietly. May 2026 was not one of them.

In just a few weeks we saw a string of news items that would each deserve a post of their own. The industry is moving on from asking “what can AI do?” and is beginning to ask “how do we put it to work without draining the wallet?”

Let's dive in.

1. Hermes: the agent that doesn't forget everything every morning

Imagine hiring a brilliant collaborator. On day one you explain how the company works, the processes, the exceptions, the tricks of the trade. On day two they show up at the office and ask: “Hi, I'm new, what do you do here?”

Well, that has been more or less the problem with AI agents so far.

Hermes is an open-source agent that learns from every completed task and becomes more capable over time.

The mechanism is simple: every time Hermes completes a complex task, it autonomously creates a “skill”, that is, a structured document that captures the procedure, the known pitfalls and the verification steps. The next time a similar task comes up, the agent loads the skill instead of reasoning from scratch. Like an employee who takes notes and reads them back.
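To make the note-taking idea concrete, here is a minimal sketch of what a skill store could look like. It is not Hermes's actual format or API: the `Skill` fields, the `SkillLibrary` class and the keyword-overlap matching are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Skill:
    """Hypothetical skill record: procedure, pitfalls, verification steps."""
    name: str
    keywords: set
    procedure: list
    pitfalls: list = field(default_factory=list)
    verification: list = field(default_factory=list)


class SkillLibrary:
    """Hypothetical store that keeps skills and looks them up by task text."""

    def __init__(self) -> None:
        self._skills = []

    def save(self, skill: Skill) -> None:
        # Called after a task completes: keep the notes for next time.
        self._skills.append(skill)

    def find(self, task: str, min_overlap: int = 2) -> Optional[Skill]:
        # Naive matching (an assumption): count keywords shared with the task.
        words = set(task.lower().split())
        best, best_score = None, 0
        for skill in self._skills:
            score = len(words & skill.keywords)
            if score > best_score:
                best, best_score = skill, score
        return best if best_score >= min_overlap else None


library = SkillLibrary()
library.save(Skill(
    name="export-invoices",
    keywords={"export", "invoices", "csv"},
    procedure=["Query paid invoices", "Write rows to CSV"],
    pitfalls=["Amounts are stored in cents"],
    verification=["Row count matches the query count"],
))

# A similar task loads the saved notes; an unrelated one starts from scratch.
hit = library.find("export last month's invoices to csv")
miss = library.find("translate the onboarding guide")
print(hit.name if hit else "no skill found")
print(miss.name if miss else "no skill found")
```

A real agent would presumably match tasks with something smarter than shared keywords, but the shape is the same: write the notes once, read them back when a similar job shows up.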

Hype or a real breakthrough? Probably both. An agent that accumulates experience instead of resetting every session is structurally worth more than one that, however brilliant, wakes up every morning remembering nothing. Like a character out of Memento, but with a monthly API bill.

Hermes works with Claude, GPT, Gemini, DeepSeek and any model running locally via Ollama. For anyone working with sensitive data who doesn't want to ship it around the world, that's no small detail.

2. The great token savings: or, how to stop burning money elegantly

Here begins the common thread that ties the next three topics together.

The underlying problem is this: every time an AI model returns structured data, you're paying for every single token consumed, and the format that data arrives in makes a huge difference to the final bill.

So far the default format has been JSON: convenient, universal, supported everywhere. But JSON was never designed for language models: quotation marks, curly braces, commas and repeated keys inflate the token count without adding any real value. The model doesn't understand the data any better thanks to all that punctuation; you just pay for it.

Enter TOON (Token-Oriented Object Notation), a lightweight format designed specifically to reduce structural overhead in LLM responses. At first glance it looks like a hybrid between JSON and CSV, but it has a single goal: use fewer tokens to return the same information.

In practice, instead of asking the model to respond in JSON, you ask it to use TOON in its reply. Same content, leaner structure, lower cost. The numbers are hard to ignore: for structured, repetitive data such as transactions, events, logs and catalogs, the reported token savings compared with JSON reach 30-60%, with a direct impact on API costs.

That said, TOON is no magic wand: for deeply nested and irregular data, JSON remains clearer. TOON shines on tabular and uniform data. The right tool for the right problem.
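To see where the savings on tabular, uniform data come from, here is a small sketch that writes the same records as JSON and as a simplified TOON-style table. The `to_toon_table` helper is our own illustration, not an official library: it assumes flat rows with identical keys and values without commas or newlines, while the real TOON format also covers quoting and nesting. Character count is used only as a rough proxy for tokens.

```python
import json


def to_toon_table(name: str, rows: list) -> str:
    """Encode a uniform list of flat dicts as a simplified TOON-style table.

    Assumption: every row has the same keys in the same order, and no value
    contains a comma or a newline.
    """
    fields = list(rows[0].keys())
    if any(list(row.keys()) != fields for row in rows):
        raise ValueError("rows must share the same keys in the same order")
    # Keys are declared once in the header instead of once per row.
    header = name + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header] + lines)


transactions = [
    {"id": 1, "customer": "Alice", "amount": 120.5, "status": "paid"},
    {"id": 2, "customer": "Bob", "amount": 80.0, "status": "pending"},
    {"id": 3, "customer": "Carla", "amount": 42.0, "status": "paid"},
]

as_json = json.dumps({"transactions": transactions})
as_toon = to_toon_table("transactions", transactions)

print(as_toon)
print("JSON characters:", len(as_json))
print("TOON-style characters:", len(as_toon))
```

The repeated keys and most of the punctuation disappear because the field names appear once, in the header. With irregular or deeply nested records that trick stops working, which is exactly the limit mentioned above.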

At Codebaker we've been experimenting with it internally over the past few weeks, and we're evaluating whether to adopt it.

3. Graphify and Understand Anything: give the AI a map, not an archive

Let's stay on the token-savings thread, because it gets even more interesting.

Ever used Claude Code on a medium-to-large codebase? The agent starts exploring the files one by one, building context piece by piece like a detective rummaging through a disorganized archive. Every read costs tokens, and every session starts from scratch. It's a bit like using Excel to manage the supply chain of a Formula 1 car: really?

Graphify and Understand Anything propose a different approach: instead of making the agent re-read everything every time, you build a structured map of the knowledge once, a graph of entities and relationships, that the agent can query directly.

Graphify converts an entire folder (code, documentation, PDFs, images or meeting transcripts) into this navigable graph. The claimed token savings reach up to 70% compared to traditional file exploration, with more precise answers. Understand Anything does something similar for codebases: it turns files and dependencies into an interactive map with guided tours, semantic search and a visualization of the connections between components.

The metaphor is simple: instead of sending the agent into an archive to dig through a thousand boxes, you give it a structured index. Fewer tokens, more precision, lower costs. The exact same common thread as TOON, applied to the structure of knowledge rather than the format of data.

The idea, incidentally, stems from the LM Wiki concept introduced by Karpathy.
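The map-instead-of-archive approach in this section can be sketched in a few lines. This is not how Graphify or Understand Anything work internally: the `KnowledgeGraph` class, the relation names and the file names below are all hypothetical, chosen only to show what "querying the graph instead of re-reading the files" means.

```python
from collections import defaultdict
from typing import Optional


class KnowledgeGraph:
    """Tiny in-memory graph of (source, relation, target) triples."""

    def __init__(self) -> None:
        self._out = defaultdict(list)

    def add(self, source: str, relation: str, target: str) -> None:
        self._out[source].append((relation, target))

    def neighbors(self, node: str, relation: Optional[str] = None) -> list:
        # Direct links from a node, optionally filtered by relation type.
        return [
            target
            for rel, target in self._out.get(node, [])
            if relation is None or rel == relation
        ]

    def reachable(self, start: str, relation: str) -> set:
        # Everything reachable from start by following one relation type.
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for target in self.neighbors(node, relation):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Built once, for example by a one-off indexing pass over the repository.
graph = KnowledgeGraph()
graph.add("api.py", "imports", "auth.py")
graph.add("api.py", "imports", "db.py")
graph.add("auth.py", "imports", "db.py")
graph.add("db.py", "documented_in", "docs/storage.md")

# The agent asks the map instead of opening every file.
direct = graph.neighbors("api.py", "imports")
deps = graph.reachable("api.py", "imports")
print(direct)
print(sorted(deps))
```

The indexing pass is paid once; each later question is a lookup that returns a handful of names, rather than a fresh read of every file involved.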

4. Markdown vs HTML: the format of AI documents (and the token conspiracy theory)

This is the point that sparked the most discussion in the tech community this month.

The thesis, made viral by Thariq Shihipar, lead engineer of Claude Code at Anthropic, is simple: Markdown is becoming inadequate for AI-generated technical documents and HTML is replacing it.

Markdown was born to be written and read by humans with ease. It works great for that. But when AI generates technical documentation, headings, bold text and bullet points are no longer enough: people need something more visual. With HTML, the agent can embed real charts, interactive tables, structured layouts and working components directly into the document, and the file opens anywhere with no extra tools.

Everything perfect? Not quite. A conspiracy theory is circulating among the more mischievous developers: HTML is 2 to 4 times more verbose than Markdown. It generates many more tokens. And guess who profits every time you burn more tokens on Claude? Anthropic. That same Anthropic that enthusiastically promotes the adoption of HTML.

The practical point remains, though: for READMEs on GitHub or Slack chats, Markdown is and will stay perfect. For complex technical documentation, AI-generated HTML is becoming the standard in many teams.
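If you'd rather measure the verbosity on your own documents than take the conspiracy theorists' word for it, a rough check takes a few lines. The two snippets below are invented for illustration, and character count is only an approximation of token count; HTML with inline styles or scripts would weigh considerably more than this bare version.

```python
# The same small document, written once in Markdown and once in plain HTML.
markdown_doc = """# Release notes

- **Fixed**: login timeout
- **Added**: CSV export

| Metric | Value |
| --- | --- |
| Latency | 120 ms |
| Errors | 0.2% |
"""

html_doc = """<h1>Release notes</h1>
<ul>
  <li><strong>Fixed</strong>: login timeout</li>
  <li><strong>Added</strong>: CSV export</li>
</ul>
<table>
  <thead><tr><th>Metric</th><th>Value</th></tr></thead>
  <tbody>
    <tr><td>Latency</td><td>120 ms</td></tr>
    <tr><td>Errors</td><td>0.2%</td></tr>
  </tbody>
</table>
"""


def verbosity_ratio(html: str, markdown: str) -> float:
    """Return how many times longer the HTML is, measured in characters."""
    return len(html) / len(markdown)


ratio = verbosity_ratio(html_doc, markdown_doc)
print(f"HTML is {ratio:.1f}x the size of the Markdown version")
```

Swap in a real pair of documents and a real tokenizer for your model of choice to get a number you can put in front of whoever pays the API bill.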

The common thread…

If you've followed along this far, the pattern is clear.

TOON reduces tokens in structured data. Graphify and Understand Anything reduce tokens in knowledge navigation. Hermes accumulates experience instead of resetting every session. Karpathy uses Claude to improve Claude. And HTML, whether a stroke of genius or a well-orchestrated conspiracy, makes documents richer and more queryable.

They're all movements in the same direction: from AI as a variable and unpredictable expense, toward systems that learn, remember, optimize and ultimately cost less over time.

The industry has stopped posturing like a prima donna with spectacular benchmarks and has begun counting the pennies in its pocket. About time.

For us at Codebaker, who work every day on Data Alchemy, our Intelligent Document Processing system, these aren't conference talking points. They're the levers we pull every week to make our systems more accurate, cheaper and more useful for our clients.

If you want to dig deeper into one of these topics, or think through how they apply to your context, write to us. We're as curious as you are. And we promise not to reply in XML.

One more thing…

Big news: Karpathy joins Anthropic

On May 19, Andrej Karpathy, co-founder of OpenAI and former AI director at Tesla, announced that he had joined Anthropic, where he'll work with the pre-training team using Claude itself to accelerate research on the models.

The news went everywhere, mostly told as a soap-opera episode about the AI talent war. But there's a detail worth highlighting: Karpathy is the father of vibe coding, the approach where you describe what you want in natural language and the AI writes the code for you. Now he's working on the very model that powers it. Almost poetic, if we weren't talking about billions of dollars of compute.

Anthropic's hiring pattern speaks for itself: CTOs of companies like Workday, Instagram and Box have left their roles to become individual researchers at Anthropic. Not to lead divisions. To do research. Head down. In an industry where everyone chases visibility, that's already a statement of intent.

This post is part of our monthly LLM news column. Follow the Codebaker page on LinkedIn so you don't miss the next ones.

