AI large language models have historically been weak at math. There are now several papers from Google DeepMind, Alibaba, and university labs showing large language models reaching Math Olympiad-level performance and multi-step reasoning, even with small models.
It's finally here. Q* rings true. Tiny LLMs are as good at math as a frontier model.
By using the same techniques Google used to solve Go (MCTS and backprop), Llama8B gets 96.7% on math benchmark GSM8K!
That’s better than GPT-4, Claude and Gemini, with 200x fewer parameters!
— Deedy (@deedydas) June 15, 2024
Slowly, then suddenly. https://t.co/UnZ6ITNqTS
— Teortaxes▶️ (@teortaxesTex) June 15, 2024
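
For context on the method the tweet names: Monte Carlo Tree Search (the search algorithm behind AlphaGo) grows a tree of candidate solutions, uses an upper-confidence rule to decide which branch to explore next, and backpropagates each new reward up the tree so promising branches get revisited. The Python sketch below shows that loop applied to iterative answer refinement; it is a minimal illustration, and `generate_refinement` and `score_answer` are hypothetical stand-ins for the LLM calls, not functions from any of the papers.

```python
import math
import random

MAX_CHILDREN = 3  # branching factor: refinement attempts per candidate


class Node:
    """One candidate answer in the search tree."""
    def __init__(self, answer, parent=None):
        self.answer = answer
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def uct(self, c=1.41):
        # Upper Confidence bound for Trees: average reward plus an
        # exploration bonus for rarely visited nodes.
        if self.visits == 0:
            return float("inf")
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def generate_refinement(answer: str) -> str:
    # Hypothetical placeholder: a real system would prompt the LLM to
    # critique `answer` and produce an improved solution.
    return answer + " [refined]"


def score_answer(answer: str) -> float:
    # Hypothetical placeholder: a real system would use a reward model
    # or a self-evaluation prompt; random scores just make the demo run.
    return random.random()


def mcts_refine(question: str, iterations: int = 50) -> str:
    root = Node(f"draft answer to: {question}")
    for _ in range(iterations):
        # 1. Selection: walk down via UCT while nodes are fully expanded.
        node = root
        while len(node.children) >= MAX_CHILDREN:
            node = max(node.children, key=lambda n: n.uct())
        # 2. Expansion: attach one new refined candidate.
        child = Node(generate_refinement(node.answer), parent=node)
        node.children.append(child)
        # 3. Evaluation: score the fresh candidate.
        reward = score_answer(child.answer)
        # 4. Backpropagation: credit every ancestor with the reward.
        while child is not None:
            child.visits += 1
            child.total_reward += reward
            child = child.parent
    # Return the top-level candidate with the best average reward.
    best = max(root.children, key=lambda n: n.total_reward / n.visits)
    return best.answer


if __name__ == "__main__":
    print(mcts_refine("What is 17 * 24?"))
```

In the reported results, the reward signal comes from the model evaluating its own candidate solutions rather than from random scores; the skeleton above is only meant to show how selection, expansion, and backpropagation fit together in this kind of search.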

Brian Wang is a Futurist Thought Leader and a popular science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends, including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting-edge technologies, he is currently a Co-Founder of a startup and a fundraiser for high-potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker, and a guest on numerous radio shows and podcasts. He is open to public speaking and advising engagements.