The World's Largest Open Source Alternative to ChatGPT is Here, Supporting 35 Languages

Author: Yunfeng Zhang | Apr 18, 2023, 07:59 PM (GMT+8)

Since ChatGPT opened for public testing last November, OpenAI has been making headlines on major tech sites, and ChatGPT has become the tool of choice for many developers. ChatGPT is used not only to suggest code, summarize long texts, and answer questions; more importantly, it has ushered in a new era of AIGC (AI-generated content).


However, because OpenAI is no longer truly "open," the tool has drawn controversy even as it has won wide acclaim. Against this backdrop, a number of open-source practitioners have set out to replicate ChatGPT at the scale of large models, and OpenAssistant is one of the open-source contenders.

The OpenAssistant machine learning model is run by LAION, a German non-profit organization. The organization recently announced that the OpenAssistant model, training data, and code are now available, calling the model "the world's largest open-source replica of ChatGPT."

The OpenAssistant project began in December 2022, shortly after OpenAI released ChatGPT.

"We don't stop at replicating ChatGPT; we want to build the assistant of the future that can not only write emails and cover letters, but also do meaningful work, use APIs, dynamically research information, and more, and can be personalized and extended by anyone. We want to do this in an open and accessible way, which means not only building a great assistant but also making it small and efficient enough to run on consumer hardware," the OpenAssistant project maintainers write on their GitHub page.

In a nutshell, the goal of OpenAssistant is to create an open-source AI assistant with the same capabilities as ChatGPT. The maintainers believe the project can help improve language itself, in the same way that Stable Diffusion helped people create art and images in new ways.

To demonstrate the effectiveness of the OpenAssistant Conversations dataset, the research team presented OpenAssistant, which they describe as the first fully open-source, large-scale instruction-tuned model trained on human data.

In parallel, the team used the collected data to fine-tune language models based on Meta's LLaMA and EleutherAI's Pythia. Pythia is a state-of-the-art language model with a permissive open-source license, while LLaMA is a powerful language model released under a custom non-commercial license.

However, the model also has limitations. According to the paper, the training data was contributed mostly by male annotators with a median age of 26. As the paper puts it, "This demographic profile may inadvertently introduce bias in the dataset, as it will necessarily reflect the values, opinions and interests of the annotators."

The team did take steps to detect and remove harmful content from the dataset, but the system is not infallible. "Given the limitations discussed above, we advocate using our LLM only in the context of academic research," the paper says, "and we strongly encourage researchers to thoroughly investigate the safety and biases of these models before employing them in downstream tasks. It is important to recognize that published models may exhibit unsafe behavior and are likely to be vulnerable to prompt injection attacks."

Overall, the open-source OpenAssistant can serve as a substitute when you cannot use OpenAI's API or ChatGPT Plus. Many users have welcomed the release; one commented:

"This is an exciting event. I'm stopping my ChatGPT subscription. Wish there was an easy way to copy my ChatGPT conversations to Google Docs or directly to OpenAssistant so I could try them out and see if they work for OpenAssistant as well."