What is Llama 3?
Llama 3 is Meta AI's open-source large language model, available in 8B and 70B parameter sizes. Both sizes ship in base (pretrained) and instruction-tuned variants suited to conversational applications. Key improvements include an expanded 128K-token vocabulary for more efficient multilingual encoding, CUDA graph acceleration for up to 4x faster inference, and support for 4-bit quantization, which lets the models run on consumer GPUs.
Llama 3: Hardware requirements
Llama 3 8B: This model runs on GPUs with at least 16GB of VRAM, such as the NVIDIA GeForce RTX 3090 or RTX 4090. It can also be quantized to 4-bit precision, cutting the memory footprint to roughly 7GB and making it usable on GPUs with as little as 8GB of VRAM.

Llama 3 70B: The larger variant demands more powerful hardware: at least one GPU with 32GB or more of VRAM, such as the NVIDIA A100 or H100. For best performance, spread the model across multiple high-end GPUs.
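The VRAM figures above follow from simple arithmetic: weight memory is parameter count times bytes per parameter, and the runtime then adds overhead for activations and the KV cache (which is why the quantized 8B model needs around 7GB rather than the bare 4GB of weights). A quick back-of-the-envelope sketch:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the model weights alone, in GB (decimal).

    Runtime usage is higher: activations, the KV cache, and framework
    overhead come on top of this figure.
    """
    return n_params * bits_per_param / 8 / 1e9

# Weight memory at common precisions for both Llama 3 sizes.
for name, params in [("Llama 3 8B", 8e9), ("Llama 3 70B", 70e9)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_memory_gb(params, bits):.0f} GB weights")
```

At 16-bit precision the 8B model's weights alone fill a 16GB card, which is why quantization matters even on the smaller model; at 4-bit, the 70B model's weights drop to roughly 35GB, bringing it within reach of a single large-memory GPU.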