But don’t all of the best scientists from the biggest American companies tell the thought leaders in congress that open models are dangerous?
Love the OLMo project from the Allen Institute - they've already spent all the GPU money and given away competitive models up to 32B parameters, with open data, open training, and some great insights into how the models work.
https://allenai.org/olmo
https://arxiv.org/abs/2504.07096
Edit: I am reading more of Nathan’s work and appreciating his previous analysis of OLMo and other open weight models like Gemma.
I'm also optimistic this can be done; just consider the explosion of new entrants last year. Large companies have unlimited GPUs, but they also have an unlimited capacity to waste them, and they get better at wasting them as they grow.
Shh! You're not supposed to say that out loud. ;)
Appreciate you adding some more color to my piece. Lovely
And a few months after the 30-minute TPU record, some riff-raff beat them with an 18-minute, $40 GPU run -- https://www.technologyreview.com/2018/08/10/141098/small-team-of-ai-coders-beats-googles-code/
It's entry #4 on DawnBench site https://dawn.cs.stanford.edu/dawnbench
The back-story on this is that I was out of Google and bored. I had first-hand experience tuning such runs, and the iteration was so painful that I was sure I could get better results by iterating rapidly with OSS and some cloud credits. Fast.AI had a good PyTorch implementation of a single-machine ImageNet model; I offered to parallelize it and wrote a harness for fast iteration. Iterating fast, it felt like we could try 10x more things per day than I could at Google. For instance, Andrew Shaw discovered a better way to initialize batch norm layers this way, among many other things he tried.
We put the code up at https://github.com/cybertronai/imagenet18 and it ran out of the box for at least one other person, not connected to us.
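(The thread doesn't say which batch-norm init change it was, but a well-known trick from that era of fast ImageNet training is zero-initializing the batch norm scale on a residual branch, so each block starts out as the identity. A minimal NumPy sketch of why that helps, under that assumption:)

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch, then scale by gamma and shift by beta.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))  # a batch of activations entering a residual block

# Zero-init trick: gamma = 0 makes the residual branch output zero at init...
branch = batchnorm(x, gamma=np.zeros(4), beta=np.zeros(4))

# ...so the block behaves as the identity at the start of training: y = x + 0.
y = x + branch
assert np.allclose(y, x)
```

Starting each residual block as the identity keeps the early training signal well-conditioned, which is why this one-line init change can measurably speed up convergence.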
Lol, I love it. Time to bring your talents to open LLMs!
great post!