It does not make money now. Dump more money in.
There are many AI benchmark tests, and Gemini 3 leads on many of them, a tremendous improvement over Gemini 2.5. Gemini 3 now tops the AI Text Arena, with GPT-5 slipping to fifth place. Claude and Grok have also improved over the past few months.
AI Text Arena, August 2025 scores:
- gpt-5-high 1463
- gemini-2.5-pro 1457
- claude-opus-4-1-20250805 1447
- …
- grok-4-0709 1430
https://archive.ph/lJlyO
AI Text Arena, November 2025 scores:
- gemini-3-pro 1492
- grok-4.1-thinking 1482
- claude-opus-4-5-20251101 1466
- grok-4.1 1464
- gpt-5.1-high 1461
- claude-opus-4-5-20251101-thinking-32k 1460
- gemini-2.5-pro 1452
https://archive.ph/LtBVa
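Arena scores sit on an Elo-like scale, so a rating gap translates into an expected head-to-head win rate via the standard logistic formula. The sketch below assumes the usual 400-point Elo base, which is a simplification of the arena's actual Bradley-Terry fit:

```python
# Convert an Elo-style rating gap into an expected head-to-head win rate.
# Assumes the conventional 400-point logistic base; the arena's real
# Bradley-Terry fit may differ slightly.

def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Gemini 3 Pro (1492) vs. gpt-5.1-high (1461): a 31-point lead
print(expected_win_rate(1492, 1461))  # roughly 0.54
```

In other words, even the 31-point gap between first and fifth place implies only about a 54% preference rate in direct comparisons: a consistent but modest edge, which is worth keeping in mind when reading "dominates" headlines.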
Google Gemini 3 Benchmarks, Nov 18, 2025
"Gemini 3 Pro is finally here! With a 1M token context window and a 64K output window, Google’s latest model is making a strong claim for the top spot in the AI landscape. … What used to be 2025’s model provider underdog is now a first in class AI model, dominating key benchmarks …
Gemini 3 Pro scores 91.9% on GPQA Diamond (and 93.8% with Deep Think), giving it a nearly 4-point lead over GPT-5.1 (88.1%) on advanced scientific questions.
The most notable upgrade is in abstract visual reasoning. Its 31.1% score on ARC-AGI-2 (45.1% with Deep Think) is a massive jump from Gemini 2.5 Pro (4.9%) and nearly doubles the score of GPT-5.1 (17.6%), indicating a core improvement in non-verbal problem-solving.
Most importantly this is the first time we’re seeing a 40%+ result with Gemini 3 deep think, and 37.5% with Gemini 3 Pro, on the hardest reasoning test, the Humanity’s Last Exam, which is almost 11% increase from GPT 5.1. This gives us a lot of reassurance that this model will do great if you’re building agents."
https://www.vellum.ai/blog/google-gemini-3-benchmarks