Like every Big Tech company these days, Meta has its own flagship generative AI model, called Llama. Llama is unusual among major models in that it's "open," meaning developers can download and use it however they please (with certain limitations). That's in contrast to models like Anthropic's Claude, Google's Gemini, xAI's Grok, and most of OpenAI's GPT models, which can only be accessed via APIs.
In the interest of giving developers choice, however, Meta has also partnered with vendors, including AWS, Google Cloud, and Microsoft Azure, to make cloud-hosted versions of Llama available. In addition, the company publishes tools, libraries, and recipes in its Llama cookbook to help developers fine-tune, evaluate, and adapt the models to their domain. With newer generations like Llama 3 and Llama 4, these capabilities have expanded to include native multimodal support and broader cloud rollouts.
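For a sense of what "download and use" looks like in practice, here's a minimal sketch of running a Llama model locally with the Hugging Face transformers library. The model ID below is an assumption (check Meta's Hugging Face organization for current repository names), and a model of Scout's size needs substantial GPU memory, so treat this as illustrative rather than a turnkey setup.

```python
# Minimal sketch of local Llama inference via Hugging Face transformers.
# The model ID is an assumption -- verify the exact repository name on
# Meta's Hugging Face page. A 109B-total-parameter model like Scout needs
# multiple high-memory GPUs; device_map="auto" spreads the weights across
# whatever accelerators are available.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed repo name
    device_map="auto",
)

prompt = "Explain in one sentence what an open-weight model is."
result = generator(prompt, max_new_tokens=64)
print(result[0]["generated_text"])
```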
Here’s everything you need to know about Meta’s Llama, from its capabilities and editions to where you can use it. We’ll keep this post updated as Meta releases upgrades and introduces new dev tools to support the model’s use.
What is Llama?
Llama is a family of models, not just one. The latest version is Llama 4; it was released in April 2025 and includes three models. (The "active" parameter counts below reflect Llama 4's mixture-of-experts architecture, which activates only a fraction of a model's total parameters for any given input.)
Scout: 17 billion active parameters, 109 billion total parameters, and a context window of 10 million tokens.
Maverick: 17 billion active parameters, 400 billion total parameters, and a context window of 1 million tokens.
Behemoth: Not yet released; Meta has said it will have 288 billion active parameters and nearly 2 trillion total parameters.