{"id":196748,"date":"2024-07-23T17:20:41","date_gmt":"2024-07-24T00:20:41","guid":{"rendered":"https:\/\/www.nextbigfuture.com\/?p=196748"},"modified":"2024-07-23T17:34:38","modified_gmt":"2024-07-24T00:34:38","slug":"llama-3-1-405-billion-parameter-released","status":"publish","type":"post","link":"https:\/\/www.nextbigfuture.com\/2024\/07\/llama-3-1-405-billion-parameter-released.html","title":{"rendered":"Llama 3.1 405 billion Parameter Released"},"content":{"rendered":"

Llama 3.1 405B is the first openly available model that rivals the top AI models in state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. With the release of the 405B model, Meta supercharges innovation\u2014with unprecedented opportunities for growth and exploration. They believe the latest generation of Llama will ignite new applications and modeling paradigms, including synthetic data generation to enable the improvement and training of smaller models, as well as model distillation\u2014a capability that has never been achieved at this scale in open source.<\/p>\n

It is free to use and open source, with no restrictions on usage.<\/p>\n

As part of this latest release, they are introducing upgraded versions of the 8B and 70B models. These are multilingual and have a significantly longer context length of 128K, state-of-the-art tool use, and overall stronger reasoning capabilities. This enables their latest models to support advanced use cases, such as long-form text summarization, multilingual conversational agents, and coding assistants. Meta also made changes to their license, allowing developers to use the outputs from Llama models\u2014including the 405B\u2014to improve other models. True to their commitment to open source, starting today, they are making these models available to the community for download on llama.meta.com and Hugging Face and available for immediate development on their broad ecosystem of partner platforms.<\/p>\n


The experimental evaluation suggests that the flagship 405B model is competitive with leading foundation models, including GPT-4, GPT-4o, and Claude 3.5 Sonnet, across a range of tasks.<\/p>\n


In post-training, they produce final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involves Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO). They use synthetic data generation to produce the vast majority of their SFT examples, iterating multiple times to produce higher and higher quality synthetic data across all capabilities. Additionally, they invest in multiple data processing techniques to filter this synthetic data to the highest quality. This enables Meta to scale the amount of fine-tuning data across capabilities.<\/p>\n
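The DPO step in that pipeline can be sketched as follows. This is a minimal, illustrative implementation of the standard DPO loss for a single preference pair, not Meta's actual training code; the log-probabilities and the `beta` temperature below are assumed values chosen for demonstration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the policy model being trained or the frozen reference model.
    """
    # Implicit rewards: how far the policy has shifted from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(margin)): small when the chosen response is favored.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# No shift from the reference gives -log(0.5); favoring the chosen
# response over the rejected one lowers the loss.
loss_neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
loss_better = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

In a real SFT-then-DPO round, these per-pair losses would be averaged over a batch of preference pairs and backpropagated through the policy model only, with the reference model held fixed.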

Meta trained Llama 3.1 405B on over 15 trillion tokens. They significantly optimized the full training stack and pushed the model training to over 16,000 H100 GPUs, making the 405B the first Llama model trained at this scale.<\/p>\n
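A rough back-of-the-envelope calculation conveys the scale involved, using the common ~6\u00b7N\u00b7D FLOPs approximation for dense-transformer training; the sustained per-GPU throughput below is an assumed figure for illustration, not a number Meta has reported.

```python
params = 405e9          # model parameters
tokens = 15e12          # "over 15 trillion" training tokens
gpus = 16_000           # H100 GPUs
flops_per_gpu = 4.0e14  # assumed ~400 TFLOP/s sustained BF16 throughput

# Rule of thumb: training a dense transformer costs ~6 FLOPs
# per parameter per token (forward + backward pass).
total_flops = 6 * params * tokens
seconds = total_flops / (gpus * flops_per_gpu)
days = seconds / 86_400

print(f"~{total_flops:.2e} FLOPs, roughly {days:.0f} days on {gpus:,} GPUs")
```

Under these assumptions the run works out to a few times 10^25 FLOPs and on the order of two months of cluster time, which is consistent with calling this the largest Llama training run to date.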

Building with Llama 3.1 405B<\/b><\/p>\n

For the average developer, using a model at the scale of the 405B is challenging. While it\u2019s an incredibly powerful model, Meta recognizes that it requires significant compute resources and expertise to work with.<\/p>\n

Meta realizes there\u2019s so much more to generative AI development than just prompting models. They want to enable everyone to get the most out of the 405B, including:<\/p>


* Real-time and batch inference
\n * Supervised fine-tuning
\n * Evaluation of your model for your specific application
\n * Continual pre-training
\n * Retrieval-Augmented Generation (RAG)
\n * Function calling
\n * Synthetic data generation<\/p>\n
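As one illustration of the RAG workflow in that list, here is a minimal retrieval sketch using bag-of-words cosine similarity. A real deployment would use an embedding model and a vector store from one of the partner platforms; the final model-call step is shown only as a prompt string, and no actual Llama endpoint is invoked.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Return the document most similar to the query.
    q = Counter(query.lower().split())
    return max(docs, key=lambda d: cosine(q, Counter(d.lower().split())))

docs = [
    "Llama 3.1 models support a 128K context length.",
    "The 405B model was trained on over 15 trillion tokens.",
]
question = "How many tokens was the 405B model trained on?"
context = retrieve(question, docs)

# The retrieved passage is prepended to the prompt sent to the model:
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

The grounding step is the whole point of RAG: the model answers from retrieved text rather than from parametric memory alone, which is what the partner solutions from AWS, NVIDIA, and Databricks package up at production scale.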

This is where the Llama ecosystem can help. On day one, developers can take advantage of all the advanced capabilities of the 405B model and start building immediately. Developers can also explore advanced workflows like easy-to-use synthetic data generation, follow turnkey directions for model distillation, and enable seamless RAG with solutions from partners, including AWS, NVIDIA, and Databricks. Additionally, Groq has optimized low-latency inference for cloud deployments, with Dell achieving similar optimizations for on-prem systems.
\n\"\"<\/p>\n","protected":false},"excerpt":{"rendered":"

Llama 3.1 405B is the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. With the release of the 405B model, Meta supercharges innovation\u2014with unprecedented opportunities for growth and exploration. They believe the latest generation of Llama will … <\/p>\n

Read more<\/a><\/p>\n","protected":false},"author":2,"featured_media":196749,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1312,1306,1307,1308],"tags":[14277,480,14437,14123,14043,427,5],"class_list":["post-196748","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-science","category-technology","category-world","tag-artficial-intelligence","tag-gpu","tag-llama-3-405b","tag-llm","tag-meta","tag-nvidia","tag-technology","generate-columns","tablet-grid-50","mobile-grid-100","grid-parent","grid-50","no-featured-image-padding"],"_links":{"self":[{"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/posts\/196748"}],"collection":[{"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/comments?post=196748"}],"version-history":[{"count":5,"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/posts\/196748\/revisions"}],"predecessor-version":[{"id":196760,"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/posts\/196748\/revisions\/196760"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/media\/196749"}],"wp:attachment":[{"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/media?parent=196748"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/categories?post=196748"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.nextbigfuture.com\/wp-json\/wp\/v2\/tags?post=196748"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}