ChatGPT is an advanced chatbot engine based on OpenAI's GPT-3.5 and GPT-4 models. It is a powerful model, but open-source alternatives are worth considering.
Exploring open-source alternatives to ChatGPT allows for customization and adaptation to specific needs or projects, potentially offering greater control over the technology while preserving data privacy. Open-source models ensure transparency and allow users to understand the underlying mechanisms of the AI model.
There are very good open-source ChatGPT alternatives available today like LLaMA 3, Mixtral 8x7B, Yi 34B, and DBRX. Let's investigate these alternatives.
ChatGPT is derived from GPT-3.5 and GPT-4, modern generative AI models based on the Transformer architecture. The Transformer is a specific type of neural network architecture introduced by Google researchers in 2017. See more here.
Generative AI models generate text based on a given input. Depending on that input, you can tell your AI model to do various things for you. For example, you can ask your model to categorize a piece of text, extract specific entities from it, summarize long content, paraphrase content, answer questions, and of course act as a chatbot.
All the models introduced below are "foundational" models, meaning that they are raw models that usually require few-shot learning or fine-tuning to properly follow your instructions. It also means that these models do not implement any kind of restrictions by default.
To understand how to leverage these generative AI models more deeply, we recommend that you read our guide about how to use generative models with few-shot learning: read it here.
ChatGPT is a generative model that has been specifically instructed to behave like a chatbot. In the rest of this article we are going to explore open-source alternatives to ChatGPT. In order to use them in conversational mode you will either need to use few-shot learning for conversational AI or fine-tuning. Learn more about few-shot learning for conversational AI here. Learn more about fine-tuning here.
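As a rough illustration of few-shot learning for conversational AI, the sketch below assembles a prompt from a short instruction, a couple of example exchanges, and the running conversation history. The function names, the `Human:`/`AI:` labels, and the `###` separator are assumptions chosen for this example, not a format required by any of the models discussed here.

```python
SEPARATOR = "###"  # hypothetical delimiter between few-shot examples


def format_exchange(human, ai):
    """Render one question/answer pair with the labels used in this sketch."""
    return f"Human: {human}\nAI: {ai}"


def build_chat_prompt(instruction, examples, history, user_message):
    """Build a few-shot prompt that ends where the model should start writing."""
    blocks = [instruction]
    # Few-shot examples show the model the expected conversational format.
    blocks += [format_exchange(human, ai) for human, ai in examples]
    # The current conversation: past turns, then the new message with an open "AI:" slot.
    current = [format_exchange(human, ai) for human, ai in history]
    current.append(f"Human: {user_message}\nAI:")
    blocks.append("\n".join(current))
    return f"\n{SEPARATOR}\n".join(blocks)


prompt = build_chat_prompt(
    instruction="This is a conversation between a human and a helpful AI assistant.",
    examples=[
        ("Hello, who are you?", "I am an AI assistant. How can I help you today?"),
        ("Can you summarize texts?", "Yes, send me the text and I will summarize it."),
    ],
    history=[("What is a Transformer?", "It is a neural network architecture introduced in 2017.")],
    user_message="Who introduced it?",
)
print(prompt)
```

The model's completion is then read as the next AI reply. In practice, generation is usually stopped at the next `Human:` label, and the new exchange is appended to `history` before the following turn.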
Meta has released the LLaMA 3 series of large language models (LLMs), a suite of pre-trained and instruction-tuned generative text models available in 8 billion and 70 billion parameter sizes. The versions of these models specifically fine-tuned for conversation, known as Llama 3 Instruct, are designed for dialogue applications. According to Meta, the instruction-tuned models outperform many of the available open-source chat models on common industry benchmarks.
LLaMA 3 is an auto-regressive language model built on an optimized transformer architecture. Its fine-tuned versions undergo supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to better align with human expectations regarding usefulness and safety.
LLaMA 3 was released in April 2024. Its pre-training phase used over 15 trillion tokens of publicly available data. The fine-tuning phase used publicly available instruction datasets and more than 10 million examples annotated by humans. None of the data used in either the pre-training or fine-tuning phases comes from Meta's user data. The pre-training data has a cutoff of March 2023 for the 8B model and December 2023 for the 70B model.
LLaMA 3 is designed for both commercial and research applications primarily in English. The fine-tuned models are tailored for creating chat applications akin to digital assistants, while the pre-trained models are versatile enough to be adjusted for diverse natural language generation uses.
You can easily use LLaMA 3 on NLP Cloud: try it here.
Mixtral 8x7B, released by Mistral AI, outperforms Llama 2 70B on most benchmarks while delivering six times faster inference. At its release, Mistral AI presented it as the strongest open-weight model with a permissive license and the best choice in terms of cost-efficiency. In particular, it matches or exceeds the performance of GPT-3.5 on most standard benchmarks.
Mixtral handles a context of up to 32k tokens, supports multiple languages (English, French, Italian, German, and Spanish), and shows strong code generation capabilities. It can also be fine-tuned to follow instructions, with the instruction-tuned version achieving a score of 8.3 on MT-Bench.
At its core, Mixtral is a sparse mixture-of-experts network, functioning as a decoder-only model. In each feedforward block, the model picks from 8 distinct groups of parameters. A dedicated router network at each layer selects two of these groups, or "experts," to process each token, and combines their outputs additively.
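The routing step described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Mistral AI's implementation: the experts here are simple scaling functions rather than feedforward networks, the router weights are random, and the names `route_token`, `router_weights`, and `experts` are hypothetical.

```python
import math
import random

NUM_EXPERTS = 8  # number of experts per feedforward block, as described in the text
TOP_K = 2        # number of experts selected for each token


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, router_weights, experts, top_k=TOP_K):
    """Return the mixture-of-experts output for one token and the experts used."""
    # One router logit per expert: dot product of the token with that expert's router row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k experts with the highest logits.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gate weights: softmax over the selected logits only.
    gates = softmax([logits[i] for i in chosen])
    # Additive combination: gate-weighted sum of the selected experts' outputs.
    output = [0.0] * len(token)
    for gate, index in zip(gates, chosen):
        expert_output = experts[index](token)
        output = [acc + gate * value for acc, value in zip(output, expert_output)]
    return output, chosen


# Toy setup (assumption): each "expert" scales the token by a different factor.
random.seed(0)
DIM = 4
experts = [lambda v, k=k: [k * x for x in v] for k in range(1, NUM_EXPERTS + 1)]
router_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [0.5, -1.0, 2.0, 0.1]

output, chosen = route_token(token, router_weights, experts)
print("experts used:", chosen)
print("output:", output)
```

Only the two selected experts are evaluated for the token; the other six are never called, which is where the compute savings of a sparse model come from.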
This method enables the expansion of a model's parameters while efficiently managing cost and latency by utilizing only a portion of available parameters for each token. Specifically, Mixtral possesses a total of 46.7B parameters but applies only 12.9B parameters per token, thereby achieving the processing speed and cost equivalent to a 12.9B model.
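To make the ratio concrete, the short calculation below uses the two figures quoted above (46.7B total parameters, 12.9B used per token). The helper name `active_fraction` is an assumption made for this example.

```python
def active_fraction(total_params_billions, active_params_billions):
    """Share of the model's parameters that are used to process a single token."""
    return active_params_billions / total_params_billions


# Figures quoted in the text for Mixtral 8x7B.
mixtral_fraction = active_fraction(total_params_billions=46.7, active_params_billions=12.9)
print(f"Parameters active per token: {mixtral_fraction:.1%}")
```

In other words, roughly 28% of the parameters are involved in processing any single token, while the full 46.7B parameters still have to be held in memory.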
Mixtral was pre-trained on data extracted from the open web, with experts and routers trained simultaneously.
You can easily try Mixtral 8x7B on NLP Cloud: try it here.
The Yi series models are open-source large language models trained from scratch by 01.AI. These bilingual models have been trained on a multilingual corpus of 3 trillion tokens, which places them among the most powerful large language models, with strong capabilities in language understanding, reasoning, and reading comprehension.
The Yi-34B-Chat model ranked second on the AlpacaEval Leaderboard, just behind GPT-4 Turbo and ahead of other large language models like GPT-4, Mixtral, and Claude (based on data up to January 2024). Among open-source models, Yi-34B ranked first for both English and Chinese tasks across several benchmarks, ahead of models like Falcon-180B, Llama-70B, and Claude, according to the Hugging Face Open LLM Leaderboard (pre-trained) and C-Eval (based on data up to November 2023).
Because the Yi series follows the same model architecture as Llama, users can rely on the existing ecosystem of tools, libraries, and resources designed for Llama. This compatibility removes the need to develop new tools and speeds up development.
You can easily try Yi 34B on NLP Cloud: try it here.
DBRX is a decoder-only large language model built on a transformer architecture and trained using next-token prediction. It features a fine-grained mixture-of-experts (MoE) structure with a total of 132 billion parameters, of which 36 billion are active for any given input. The model was pre-trained on a corpus of 12 trillion tokens of text and code, with a cutoff in December 2023. The training mix includes both natural language and code examples, with a significant portion in English.
DBRX stands out for its fine-grained use of experts: it has 16 experts and selects 4 of them for each token, whereas other MoE models like Mixtral-8x7B and Grok-1 have 8 experts and select 2. This approach yields 65 times more possible expert combinations, which Databricks reports improves model quality. DBRX also uses rotary position encodings (RoPE), gated linear units (GLU), and grouped query attention (GQA).
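The 65x figure can be checked by counting how many distinct sets of experts each design can select. The sketch below treats expert selection as an unordered choice, which is an assumption about how the figure was derived:

```python
from math import comb

# Number of ways to choose the active experts, ignoring order.
dbrx_combinations = comb(16, 4)    # 16 experts, 4 selected
mixtral_combinations = comb(8, 2)  # 8 experts, 2 selected

ratio = dbrx_combinations / mixtral_combinations
print(dbrx_combinations, mixtral_combinations, ratio)  # 1820 28 65.0
```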
DBRX was pre-trained on 12 trillion tokens from a carefully curated dataset, with a maximum context length of 32k tokens. Databricks, the team behind the model, estimates that this data is at least twice as good, token for token, as the data used for the MPT model family.
The dataset was created using Databricks' own tooling, including Apache Spark™ and Databricks notebooks for data processing, and Unity Catalog for data management and governance. Databricks used a curriculum learning approach during pre-training, changing the data mix during training in a way that substantially improved the model's quality.
DBRX accepts only text-based inputs, up to 32,768 tokens in length.
ChatGPT is an impressive chatbot engine that is able to answer very advanced questions. In many fields, its answers are more relevant than those most people could give.
However, ChatGPT can raise data privacy issues and is restricted for many use cases. It is interesting to compare ChatGPT to the most advanced open-source alternatives: LLaMA 3, Mixtral 8x7B, Yi 34B, and DBRX. There is no doubt that even more advanced open-source AI models are going to be released soon.
If you want to use LLaMA 3, Yi 34B, or Mixtral 8x7B in production, don't hesitate to give them a try on the NLP Cloud API (try it here)!
Juliette
Marketing manager at NLP Cloud