German AI Startup Aleph Alpha Launches Pharia-1-LLM Model Family

  • Last updated August 26, 2024
  • In AI News

German AI startup Aleph Alpha has announced the release of its latest foundation model family, Pharia-1-LLM, comprising Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned. Both models are publicly available under the Open Aleph License, which permits non-commercial research and educational use.

Pharia-1-LLM-7B-control is designed to produce concise, length-controlled responses and is optimized for German, French, and Spanish. The model was trained on a multilingual base corpus and adheres to EU and national regulations, including copyright and data privacy laws. It is specifically engineered for domain-specific applications in industries such as automotive and engineering.

The Pharia-1-LLM-7B-control-aligned variant includes additional safety features through alignment methods. This model is tailored for use in conversational settings like chatbots or virtual assistants, where safety and clarity are prioritized.

The training of Pharia-1-LLM-7B involved two phases. Initially, the model was pre-trained on a 4.7 trillion token dataset with a sequence length of 8,192 tokens, using 256 A100 GPUs. In the second phase, the model was trained on an additional 3 trillion tokens with a new data mix, using 256 H100 GPUs. Training employed mixed-precision strategies and various optimization techniques to improve throughput and performance.
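For a rough sense of the scale involved, the two training phases can be put into perspective with the widely used C ≈ 6·N·D approximation (compute ≈ 6 × parameter count × training tokens). The figures below are back-of-the-envelope estimates derived from the token counts above, not numbers reported by Aleph Alpha.

```python
# Back-of-the-envelope training compute using the common C ≈ 6·N·D rule,
# where N = parameter count and D = number of training tokens.
PARAMS = 7e9            # Pharia-1-LLM-7B parameter count

phase1_tokens = 4.7e12  # phase 1: 4.7T tokens on 256 A100 GPUs
phase2_tokens = 3.0e12  # phase 2: 3T additional tokens on 256 H100 GPUs

phase1_flops = 6 * PARAMS * phase1_tokens  # roughly 2.0e23 FLOPs
phase2_flops = 6 * PARAMS * phase2_tokens  # roughly 1.3e23 FLOPs
total_flops = phase1_flops + phase2_flops

print(f"phase 1: {phase1_flops:.2e} FLOPs")
print(f"phase 2: {phase2_flops:.2e} FLOPs")
print(f"total:   {total_flops:.2e} FLOPs")
```

By this estimate, the second phase added roughly two-thirds as much compute as the initial pre-training run, despite using the newer H100 hardware.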

In terms of performance, Pharia-1-LLM-7B-control and Pharia-1-LLM-7B-control-aligned were evaluated against similarly sized weight-available multilingual models, including Mistral's Mistral-7B-Instruct-v0.3 and Meta's Llama-3.1-8B-Instruct.

The comparison results, detailed in the model card, provide insights into the models’ effectiveness across multiple languages, including German, French, and Spanish. The evaluation highlighted areas where Pharia-1-LLM-7B outperforms or matches its peers in specific benchmarks and use cases.

Aleph Alpha detailed the model architecture, hyperparameters, and training process in a comprehensive blog post, with the full evaluation results available in the model card.

Siddharth Jindal

Siddharth is a media graduate who loves to explore tech through journalism and putting forward ideas worth pondering about in the era of artificial intelligence.

