BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News OpenAI Introduces GPT‑4.1 Family with Enhanced Performance and Long-Context Support

OpenAI Introduces GPT‑4.1 Family with Enhanced Performance and Long-Context Support

Listen to this article -  0:00

OpenAI has released a new family of language models—GPT‑4.1, GPT‑4.1 mini, and GPT‑4.1 nano—available via its API. The models improve on GPT‑4o and GPT‑4.5 across several technical benchmarks and introduce support for up to 1 million tokens of context.

According to OpenAI, GPT‑4.1 improves coding capabilities, instruction following, and long-context comprehension. On the SWE-bench Verified benchmark, which measures real-world software engineering tasks, GPT‑4.1 achieves 54.6% accuracy. This is a 21-point increase over GPT‑4o (33.2%) and 26.6 points higher than GPT‑4.5. The model also shows a 10.5-point improvement over GPT‑4o on Scale’s MultiChallenge instruction benchmark.

swe-bench verified accuracy
Source: OpenAI Blog

OpenAI also tested the model’s ability to process extended inputs. All models in the GPT‑4.1 family can handle up to 1 million tokens. Internal evaluations, including OpenAI-MRCR and Graphwalks, indicate that GPT‑4.1 performs reliably across long-context tasks, such as retrieving and reasoning over dispersed information. For example, GPT‑4.1 scored 61.7% on Graphwalks, a benchmark for multi-hop reasoning, compared to 42% for GPT‑4o.

openai-mrcr accuracy
Source: OpenAI Blog

In addition to the main model, GPT‑4.1 mini offers similar performance at lower latency and cost. OpenAI says it matches or exceeds GPT‑4o on most intelligence evaluations while reducing cost by 83%. GPT‑4.1 nano is the smallest and fastest in the series. It is designed for simpler tasks like classification and autocomplete, but still posts high scores, such as 80.1% on MMLU and 50.3% on GPQA.

The company also emphasized improvements in code editing. In Aider’s polyglot benchmark, which tests the ability to generate diffs rather than full-file rewrites, GPT‑4.1 outperforms all previous models, including GPT‑4.5. The model produces fewer unnecessary edits, decreasing from 9% in GPT‑4o to 2% in GPT‑4.1.

OpenAI confirmed that GPT‑4.5 Preview will be deprecated on July 14, 2025. The company cited cost and performance improvements in GPT‑4.1 as reasons for the transition. This aligns with speculation in the community about the temporary nature of GPT‑4.5. One Reddit user commented:

GPT-4.5 was just a preview, not even a 'public beta.' It was just to see what they were (or are) doing regarding new models. Since it is not an official version, it could be said that GPT-4.5 'never' existed, and that is why the new version is GPT-4.1… During the period in which it was available, OpenAI was collecting data… to make, perhaps, a more capable and not so expensive distilled model, which ended up being GPT-4.1.

Pricing has also been adjusted. GPT‑4.1 is around 26% cheaper than GPT‑4o for typical queries. Prompt caching discounts have been raised to 75%, and long-context usage no longer incurs additional charges beyond standard per-token costs.

The GPT‑4.1 family is now accessible via the OpenAI API. It is not yet available in ChatGPT, where updates to GPT‑4o are ongoing.

About the Author

Rate this Article

Adoption
Style

BT