9 Comments
Carl Boettiger

Thanks Ben, great piece as always!

I think an overlooked corollary to this is that you don't actually need a super sophisticated LLM for tool calling to work. Even the largest LLMs still can't add reliably, but they've all learned not to try; they just use calculators. And this is a game small/local/open LLMs can play too. Have you poked at the open LLM ecosystem for tool calls (err, "agents")?

Sure, if the tool is just "here's a bash shell," then yeah, an LLM still needs to be pretty clever. But you can give an LLM tools with more narrowly scoped and clearly explained uses, and voila, even a tiny model can suddenly be very powerful. The beautiful thing about this is, as you point out so nicely here, that building a tool doesn't involve any GPUs or transformers; it's good, conventional software-development land of JSON schemas and function calls. We've had great success building simple MCP tools with which a model like gpt-oss or nvidia nemotron-3 can easily outperform what Opus can do with only the generic tools claude-code gives it...
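To make "narrowly scoped and clearly explained" concrete, here is a minimal sketch of the kind of tool definition I mean, in a generic OpenAI-style function-calling JSON schema. The tool name, fields, and backend here are all hypothetical, just for illustration:

```python
import json

# Hypothetical narrowly scoped tool: the schema spells out exactly what
# input is legal, so even a small model doesn't need to be clever.
LOOKUP_INVOICE_TOOL = {
    "name": "lookup_invoice",
    "description": "Fetch one invoice by its ID. Returns amount and status.",
    "parameters": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string", "description": "e.g. 'INV-1042'"},
        },
        "required": ["invoice_id"],
    },
}

# Toy in-memory backend standing in for a real database.
_INVOICES = {"INV-1042": {"amount": 250.0, "status": "paid"}}

def call_tool(name: str, arguments: str) -> str:
    """Dispatch a model-issued tool call: JSON string in, JSON string out."""
    args = json.loads(arguments)
    if name == "lookup_invoice":
        record = _INVOICES.get(args["invoice_id"])
        return json.dumps(record if record is not None else {"error": "not found"})
    return json.dumps({"error": f"unknown tool {name}"})

print(call_tool("lookup_invoice", '{"invoice_id": "INV-1042"}'))
```

The model's only job is to emit the tool name and a small JSON argument blob; everything else is ordinary deterministic code.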

Ben Recht

That makes too much sense and sounds very neat!

Notger Heinz

Probably the best explanation of how agentic systems work that I have seen so far. Thank you!

James Cham

You might find the work that Justin McCarthy at StrongDM has been publishing interesting--parts of it echo what you've been exploring here: https://factory.strongdm.ai/

Onid

> LLMs, by contrast, remain deeply weird and mysterious. I don’t care how many pictures someone draws to explain transformers. It’s been a decade, and not a single explainer of LLM architecture has helped me understand what they actually do or why.

I've found that the easiest way to understand neural networks intuitively is to think of them in terms of information flow across layers - any other perspective just gets too complex.

I suspect you're probably asking about much deeper questions that I wouldn't be able to answer, but I think most practitioners who are actually designing neural architectures are primarily thinking in terms of what information a given piece of the network has access to, and which architectural properties allow it to more easily access the information it needs.

That's how I always think about it, anyway.

Mauricio Arango

Thanks for the very interesting analogy.

As I understand it, the LLM's function is to produce a process workflow based on the input prompt. In applications where the number of tools is large and there are constraints on the inputs and outputs of each tool, the agent is going to be very complex and may not be verifiable. Consider, for example, financial systems.

Avik De

Excellent analogy and article!

Dragor

The subject matter of this essay was pretty interesting and seemed to explain something neat about the nature of LLMs (albeit one that my rudimentary and abandoned coding ability did not enable my sleep-deprived brain to grant me understanding of). But. That subtitle! Uggh! I can't say whether I believe or disbelieve what happened next, but I certainly believe that phrase is used in contexts where it is either untrue or the concept of dis/belief does not apply, in a proportion of contexts that approximates to always. I felt a serious desire to unsubscribe, and would have, but for seeing in that desire a certain petulance. Uggh!

Cagatay Candan

On LLMs: I can understand the word-embedding step a bit, after the success of recommendation systems (the Netflix challenge), and I can relate to, or even accept, its training process (the pre-training process, if I am correct) by next- or neighboring-token/word prediction (in spite of all the difficulties of all the different languages...). But I do not understand at all how a simple inner product of the current token/word with the tokens/words in the history (the attention block) can yield something that useful. My guess at this million-dollar question is that there is a minute advantage brought by each inner-product stage at every cascade of these steps (through the depth of the neural network), and their overall effect compounds to something useful. You can say that I am just trying to convince myself that this works like boosting schemes with weak learners (AdaBoost); but who is to tell? It is possible that another LLM will show us how and why LLMs work and then give us a metric to track the progress of training. Until that time, it is boosting for me!
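For what it's worth, the inner-product step by itself is easy to write down. Here is a minimal single-head sketch of scaled dot-product attention with toy dimensions and no learned projections (a simplification of what a real transformer layer does):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: inner products of the current token's
    query with every key in the history, softmaxed into mixing weights,
    then used to take a weighted average of the value vectors."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # the "simple inner products"
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over history
    return weights @ V

# Toy 2-d embeddings: one query token attending over three history tokens.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 2))
K = rng.normal(size=(3, 2))
V = rng.normal(size=(3, 2))
out = attention(Q, K, V)
print(out.shape)  # (1, 2): a context-dependent mixture of the history's values
```

One layer of this is indeed a weak, almost trivial operation; the compounding you describe comes from stacking dozens of such layers, each with learned projections in front of the inner products.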