Intel Gaudi 2 AI Accelerators Able To Generate Text With Llama 2 Models With Up To 70B Parameters

Hassan Mujtaba

Intel's Gaudi 2 AI accelerators are the most viable alternative to NVIDIA's chips and Hugging Face has demonstrated its text generation capability using Llama 2.

Intel Gaudi 2 Accelerators Demoed In Text-Generation Using Open-Source Llama 2 LLMs With Up To 70 Billion Parameters

As Intel expands its AI software ecosystem, the company is targeting the most popular AI workloads, which include LLMs (Large Language Models). The work is made possible by Optimum Habana, which serves as the interface between Hugging Face's Transformers and Diffusers libraries and Intel Habana Gaudi processors such as Gaudi 2. The company has already demonstrated the AI capabilities and performance of its Gaudi 2 processors against NVIDIA's A100 GPUs, one of the most popular options on the market, with Gaudi 2 doing a commendable job of offering faster performance at a competitive TCO.


For the latest demonstration, Hugging Face showed how easy it is to generate text with Llama 2 (7B, 13B, and 70B) using the same Optimum Habana pipeline on the Intel Gaudi 2 AI accelerator. The end result shows that the Gaudi 2 chip was not only able to accept single or multiple prompts, but was also very easy to use and could handle custom plugins within scripts.

With the Generative AI (GenAI) revolution in full swing, text-generation with open-source transformer models like Llama 2 has become the talk of the town. AI enthusiasts as well as developers are looking to leverage the generative abilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b and 70b) using Optimum Habana and a custom pipeline class – you'll be able to run the models with just a few lines of code!

This custom pipeline class has been designed to offer great flexibility and ease of use. Moreover, it provides a high level of abstraction and performs end-to-end text-generation which involves pre-processing and post-processing. There are multiple ways to use the pipeline: you can run the run_pipeline.py script from the Optimum Habana repository, add the pipeline class to your own Python scripts, or initialize LangChain classes with it.

We presented a custom text-generation pipeline on Intel Gaudi 2 AI accelerator that accepts single or multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as parameters affecting text-generation quality. Furthermore, it is also very easy to use and to plug into your scripts, and is compatible with LangChain.

via Hugging Face
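The quoted description, a single class that handles pre-processing, generation, and post-processing for one or many prompts, can be sketched roughly as follows. Note that the class and method names here are illustrative placeholders, not the actual Optimum Habana API; the real pipeline loads the Llama 2 model onto Gaudi (HPU) hardware, which this self-contained sketch stubs out:

```python
class TextGenerationPipeline:
    """Illustrative sketch of an end-to-end text-generation pipeline.

    Hypothetical names throughout; the real implementation lives in the
    Optimum Habana repository and runs the model on Gaudi hardware.
    """

    def __init__(self, model_name: str, max_new_tokens: int = 32):
        # In the real pipeline, the model and tokenizer would be loaded
        # here and placed on the Gaudi (HPU) device.
        self.model_name = model_name
        self.max_new_tokens = max_new_tokens

    def _preprocess(self, prompts):
        # The quoted text notes the pipeline accepts single or multiple
        # prompts, so normalize a lone string into a list.
        if isinstance(prompts, str):
            prompts = [prompts]
        return [p.strip() for p in prompts]

    def _generate(self, prompt: str) -> str:
        # Placeholder for the actual model forward pass on Gaudi.
        return f"{prompt} ... (generated continuation)"

    def __call__(self, prompts):
        # End-to-end: pre-process -> generate -> post-process.
        inputs = self._preprocess(prompts)
        return [self._generate(p) for p in inputs]


pipe = TextGenerationPipeline("meta-llama/Llama-2-7b-hf")
outputs = pipe(["Tell me a joke", "What is Gaudi 2?"])
```

Wrapping everything behind `__call__` is what makes such a class easy to drop into other scripts or hand to LangChain wrappers, as the quoted passage describes.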

Intel is committed to accelerating its AI segment in the coming years. This year, the company has plans to introduce the third iteration of Gaudi known as Gaudi 3 which is expected to utilize a 5nm process node and is reportedly faster than the NVIDIA H100 at a significantly lower price. Similarly, the company also plans to move to a fully in-house design with the next-gen Falcon Shores GPU which is expected for 2025. The company is also opening up AI capabilities such as the Llama 2 interface with PyTorch for its consumer-tier Arc A-Series GPUs.
