On September 12, 2024, OpenAI announced the release of o1-mini, a cost-efficient reasoning model that excels in STEM fields, particularly math and coding. The new model nearly matches OpenAI o1 on evaluation benchmarks such as AIME and Codeforces.
Key Features and Performance
- o1-mini is available to Tier 5 API users at a cost 80% lower than OpenAI o1-preview.
- ChatGPT Plus, Team, Enterprise, and Edu users can use o1-mini as an alternative to o1-preview, with higher rate limits and lower latency.
- In mathematics, o1-mini scores 70.0% on the AIME competition, competitive with o1 (74.4%) and well above o1-preview (44.6%).
- In coding, o1-mini reaches a 1650 Elo rating on Codeforces, comparable to o1 (1673) and well above o1-preview (1258).
- The model performs well on STEM-related academic benchmarks, sometimes outperforming GPT-4o on science and math tests.
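To get an intuition for what those Codeforces Elo gaps imply, the standard Elo expected-score formula can be applied to the reported ratings. This is a generic illustration of the Elo model, not OpenAI's evaluation methodology; the function name below is our own.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability, ignoring draws) of player A
    against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings reported in the announcement
print(f"{elo_expected_score(1650, 1258):.2f}")  # o1-mini vs o1-preview: ~0.91
print(f"{elo_expected_score(1650, 1673):.2f}")  # o1-mini vs o1: ~0.47
```

Under this model, a 392-point gap over o1-preview corresponds to roughly a 90% expected score, while the 23-point gap to o1 leaves the two models near parity.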
Specialized Training and Efficiency
Unlike larger language models pre-trained on vast text corpora, o1-mini is a smaller model whose pretraining is optimized specifically for STEM reasoning. This specialization lets it achieve comparable performance on many useful reasoning tasks while being significantly more cost-efficient.
Safety and Alignment
o1-mini is trained using the same alignment and safety techniques as o1-preview. It demonstrates improved jailbreak robustness compared to GPT-4o and has undergone rigorous safety evaluations before deployment.
Limitations
Due to its STEM specialization, o1-mini’s factual knowledge on non-STEM topics is comparable to smaller language models. OpenAI plans to address these limitations in future versions and explore extending the model to other modalities and specialties outside of STEM.
This release represents a significant step forward in creating more efficient and specialized AI models, potentially opening up new applications in fields requiring advanced reasoning capabilities at a lower cost.