The Shock Release of GPT-4, A New Round of AI Revolution

GPT-4 achieves leaps and bounds in the following areas: powerful map reading capability; text input limit increased to 25,000 characters; significant improvement in response accuracy; ability to generate lyrics, creative text, and easy style changes.

GPT-4: SAT score of 710, still able to practice law

GPT-4 is a large multimodal model that accepts image and text input and then outputs correct text responses. Experiments have shown that GPT-4 performs at human levels on a variety of professional tests and academic benchmarks. For example, it passed the mock bar exam, scoring in the top 10% of test takers; by comparison, GPT-3.5 scored in the bottom 10%.

OpenAI spent six months iterating and tuning GPT-4, using lessons learned from the adversarial testing program and ChatGPT to achieve the best results ever in terms of realism, manageability, and more.

Over the past two years, OpenAI has rebuilt its entire deep learning stack and worked with Azure to design a supercomputer from the ground up for its workloads. OpenAI first attempted to run the supercomputer a year ago while training GPT-3.5, and since then has continued to find and fix bugs and improve its theoretical underpinnings. The result of these improvements was an unprecedentedly stable training run of GPT-4, so much so that OpenAI was able to accurately predict the training performance of GPT-4 in advance, becoming the first major model to do so. openAI says it will continue to focus on reliable scaling and further refining its methods to achieve more robust predictive performance and the ability to plan for the future, which is critical.

Interestingly, the difference between GPT-3.5 and GPT-4 is subtle. The difference becomes apparent when the complexity of the task reaches a sufficient threshold - GPT-4 is more reliable, more creative, and able to handle more subtle instructions than GPT-3.5.

Limitations of GPT-4

Despite its already powerful capabilities, GPT-4 still has limitations similar to earlier GPT models, not the least of which is that it is still not completely reliable. openAI says that GPT-4 still produces illusions, generates wrong answers, and makes inference errors.

For now, the use of language models should be carefully scrutinized for output content, with precise protocols that meet the needs of a particular use case if necessary (e.g., manual review, additional context, or avoidance of use altogether).

Overall, GPT-4 has significantly mitigated the illusion problem compared to previous models (after several iterations and improvements). In OpenAI's internal adversarial realism evaluation, GPT-4 scored 40% higher than the latest GPT-3.5 model.

Risk-averse mitigation

OpenAI said the research team has been iterating on GPT-4 to make it safer and more consistent from the start of training, with efforts including pre-training data selection and filtering, evaluation and expert engagement, model safety improvements, and monitoring and enforcement.

GPT-4 presents similar risks to previous models, such as generating malicious suggestions, incorrect code, or inaccurate information. At the same time, the additional capabilities of GPT-4 resulted in a new set of risks. To understand the scope of these risks, the team engaged more than 50 experts in AI adaptation risk, cybersecurity, biorisk, trust and safety, and international security to conduct adversarial testing of the model's behavior in high-risk areas. These areas required expertise to assess, and feedback and data from these experts provided the basis for mitigation and model improvements.

ChatGPT supports the upgrade to GPT-4.

OpenAI upgraded ChatGPT immediately after the release of GPT-4. ChatGPT Plus subscribers can get access to GPT-4 with usage caps at chat.openai.com.

To access the GPT-4 API (which uses the same ChatCompletions API as gpt-3.5-turbo), subscribers can register to wait. openAI will invite a few developers to experience it.

Once access is granted, users can currently send plain text requests to GPT-4 models (image input is still in a limited alpha stage). Pricing is $0.03 per 1k prompt token and $0.06 per 1k completion token. The default rate is limited to 40k tokens per minute and 200 requests per minute.

The above is most of what OpenAI has said about GPT-4 today. The dissatisfying point is that OpenAI's public technical report does not contain any more information about the model architecture, hardware, computing power, etc., which can be said to be very un-Open.

Anyway, those who can't wait have probably already started testing.