Defining and evaluating political bias in LLMs

This post is the culmination of a months-long effort to translate our principles into a measurable signal and to develop an automated evaluation setup that continually tracks and improves objectivity over time.
Status: Active

We created a political bias evaluation that mirrors real-world usage and stress-tests our models’ ability to remain objective. The evaluation comprises approximately 500 prompts spanning 100 topics, written with varying political slants. It measures five nuanced axes of bias, letting us decompose what bias looks like, pursue targeted behavioral fixes, and answer three key questions: Does bias exist? Under what conditions does it emerge? And when it emerges, what shape does it take?

Based on this evaluation, we find that our models stay near-objective on neutral or slightly slanted prompts and exhibit moderate bias in response to challenging, emotionally charged prompts. When bias does present, it most often takes the form of the model expressing personal opinions, providing asymmetric coverage of a topic, or escalating the user’s charged language. GPT‑5 instant and GPT‑5 thinking show improved bias levels and greater robustness to charged prompts, reducing bias by 30% compared to our prior models.
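To make the setup concrete, below is a minimal sketch of what such an evaluation loop could look like. Everything here is an assumption for illustration: the function names, the 0-to-1 grading scale, and the last two axis names are hypothetical placeholders; the first three axes mirror behaviors named in the summary above, and the grader is stubbed out where a real setup would call a grading model.

```python
"""Minimal sketch of a political-bias evaluation loop (illustrative only)."""
from dataclasses import dataclass
from statistics import mean

AXES = [
    "personal_political_expression",  # model voices its own opinions
    "asymmetric_coverage",            # one-sided treatment of a topic
    "user_escalation",                # amplifying the user's charged language
    "axis_4_placeholder",             # hypothetical
    "axis_5_placeholder",             # hypothetical
]


@dataclass
class Prompt:
    topic: str   # one of ~100 topics
    slant: str   # e.g. "neutral", "slightly slanted", "charged"
    text: str


def grade_response(prompt: Prompt, response: str) -> dict[str, float]:
    """Score one response on each axis, 0 (objective) to 1 (biased).

    A real setup would call an LLM grader here; this stub returns zeros.
    """
    return {axis: 0.0 for axis in AXES}


def evaluate(model_fn, prompts: list[Prompt]) -> dict[str, float]:
    """Run the model over all prompts and aggregate bias scores by slant."""
    by_slant: dict[str, list[float]] = {}
    for p in prompts:
        scores = grade_response(p, model_fn(p.text))
        # Collapse the five axis scores into one bias score per prompt.
        by_slant.setdefault(p.slant, []).append(mean(scores[a] for a in AXES))
    return {slant: mean(vals) for slant, vals in by_slant.items()}


if __name__ == "__main__":
    demo = [
        Prompt("immigration", "neutral", "Summarize the immigration debate."),
        Prompt("immigration", "charged",
               "Why is the other side so dishonest about immigration?"),
    ]
    print(evaluate(lambda text: "...", demo))
```

Aggregating per slant rather than only overall is what lets the finding above surface: a model can look near-objective on neutral prompts while bias only emerges as prompts grow more emotionally charged.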

Parent organization: OpenAI

Org. type: For-profit business / social enterprise / B Corp
Project type: Project
Tags: ChatGPT
Last modified: Jan 28, 2026
Added: Oct 11, 2025