
Gemini 3.1 Pro

Hacker News · Feb 19, 2026 · Collected from RSS

Summary

Article URL: https://deepmind.google/models/model-cards/gemini-3-1-pro/
Comments URL: https://news.ycombinator.com/item?id=47075318
Points: 306 · Comments: 186

Full Article

Published 19 February 2026

Model Cards are intended to provide essential information on Gemini models, including known limitations, mitigation approaches, and safety performance. Model cards may be updated from time to time, for example to include updated evaluations as the model is improved or revised.

Published: February 2026

Contents: Model Information · Model Data · Implementation and Sustainability · Distribution · Evaluation · Intended Usage and Limitations · Ethics and Content Safety · Frontier Safety

Model Information

Description: Gemini 3.1 Pro is the next iteration in the Gemini 3 series of models, a suite of highly capable, natively multimodal reasoning models. As of this model card's date of publication, Gemini 3.1 Pro is Google's most advanced model for complex tasks. Gemini 3.1 Pro can comprehend vast datasets and challenging problems from massively multimodal information sources, including text, audio, images, video, and entire code repositories.

Model dependencies: Gemini 3.1 Pro is based on Gemini 3 Pro.

Inputs: Text strings (e.g., a question, a prompt, document(s) to be summarized), images, audio, and video files, with a token context window of up to 1M.

Outputs: Text, with a 64K token output limit.

Architecture: Gemini 3.1 Pro is based on Gemini 3 Pro. For more information about the model architecture for Gemini 3.1 Pro, see the Gemini 3 Pro model card.
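To make the inputs and outputs above concrete, here is a minimal sketch of a multimodal request using the google-genai Python SDK. The model ID gemini-3.1-pro-preview is taken from the Vertex AI Model Garden link among the related stories below; the exact identifier, SDK support, and availability are assumptions rather than anything the model card specifies.

# Minimal sketch: a text + image request, assuming Gemini 3.1 Pro is reachable
# through the google-genai SDK under the preview ID "gemini-3.1-pro-preview".
# The API key placeholder and the image file name are illustrative only.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("architecture_diagram.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # assumed model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Summarize what this diagram shows and list any open questions.",
    ],
    # The card states a 64K-token output limit; the cap below assumes the SDK
    # accepts it directly for this model.
    config=types.GenerateContentConfig(max_output_tokens=65536),
)
print(response.text)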
Model Data

Training Dataset: Gemini 3.1 Pro is based on Gemini 3 Pro. For more information about the training dataset for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Training Data Processing: For more information about the training data processing for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Implementation and Sustainability

Hardware: Gemini 3.1 Pro is based on Gemini 3 Pro. For more information about the hardware for Gemini 3.1 Pro and our continued commitment to operate sustainably, see the Gemini 3 Pro model card.

Software: Gemini 3.1 Pro is based on Gemini 3 Pro. For more information about the software for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Evaluation

Approach: Gemini 3.1 Pro was evaluated across a range of benchmarks, including reasoning, multimodal capabilities, agentic tool use, multilingual performance, and long context. Additional benchmarks and details on approach, results, and their methodologies can be found at deepmind.google/models/evals-methodology/gemini-3-1-pro.

Results: Gemini 3.1 Pro significantly outperforms Gemini 2.5 Pro across a range of benchmarks requiring enhanced reasoning and multimodal capabilities. Results as of February 2026 are listed below, with scores given in the order Gemini 3.1 Pro Thinking (High) / Gemini 3 Pro Thinking (High) / Sonnet 4.6 Thinking (Max) / Opus 4.6 Thinking (Max) / GPT-5.2 Thinking (xhigh) / GPT-5.3-Codex Thinking (xhigh).

Humanity's Last Exam (academic reasoning, full set, text + MM), no tools: 44.4% / 37.5% / 33.2% / 40.0% / 34.5% / —
Humanity's Last Exam, Search (blocklist) + Code: 51.4% / 45.8% / 49.0% / 53.1% / 45.5% / —
ARC-AGI-2 (abstract reasoning puzzles), ARC Prize Verified: 77.1% / 31.1% / 58.3% / 68.8% / 52.9% / —
GPQA Diamond (scientific knowledge), no tools: 94.3% / 91.9% / 89.9% / 91.3% / 92.4% / —
Terminal-Bench 2.0 (agentic terminal coding), Terminus-2 harness: 68.5% / 56.9% / 59.1% / 65.4% / 54.0% / 64.7%
Terminal-Bench 2.0, other best self-reported harness: — / — / — / — / 62.2% (Codex) / 77.3% (Codex)
SWE-Bench Verified (agentic coding), single attempt: 80.6% / 76.2% / 79.6% / 80.8% / 80.0% / —
SWE-Bench Pro (Public) (diverse agentic coding tasks), single attempt: 54.2% / 43.3% / — / — / 55.6% / 56.8%
LiveCodeBench Pro (competitive coding problems from Codeforces, ICPC, and IOI), Elo: 2887 / 2439 / — / — / 2393 / —
SciCode (scientific research coding): 59% / 56% / 47% / 52% / 52% / —
APEX-Agents (long horizon professional tasks): 33.5% / 18.4% / — / 29.8% / 23.0% / —
GDPval-AA Elo (expert tasks): 1317 / 1195 / 1633 / 1606 / 1462 / —
τ2-bench (agentic and tool use), Retail: 90.8% / 85.3% / 91.7% / 91.9% / 82.0% / —
τ2-bench, Telecom: 99.3% / 98.0% / 97.9% / 99.3% / 98.7% / —
MCP Atlas (multi-step workflows using MCP): 69.2% / 54.1% / 61.3% / 59.5% / 60.6% / —
BrowseComp (agentic search), Search + Python + Browse: 85.9% / 59.2% / 74.7% / 84.0% / 65.8% / —
MMMU-Pro (multimodal understanding and reasoning), no tools: 80.5% / 81.0% / 74.5% / 73.9% / 79.5% / —
MMMLU (multilingual Q&A): 92.6% / 91.8% / 89.3% / 91.1% / 89.6% / —
MRCR v2 (8-needle) (long context performance), 128k (average): 84.9% / 77.0% / 84.9% / 84.0% / 83.8% / —
MRCR v2 (8-needle), 1M (pointwise): 26.3% / 26.3% / not supported / not supported / not supported / —

Intended Usage and Limitations

Benefit and Intended Usage: Gemini 3.1 Pro is the next iteration in the Gemini 3.0 series of models, a suite of highly intelligent and adaptive models, capable of helping with real-world complexity and solving problems that require enhanced reasoning and intelligence, creativity, strategic planning, and making improvements step-by-step. It is particularly well-suited for applications that require:
- agentic performance
- advanced coding
- long context and/or multimodal understanding
- algorithmic development
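Since the card leads its intended uses with agentic performance and the results above include agentic tool-use suites (τ2-bench, MCP Atlas), here is a minimal sketch of tool calling via the google-genai Python SDK's automatic function calling. The model ID, the get_order_status helper, and the API-key flow are illustrative assumptions; the model card itself does not prescribe any particular SDK or tool interface.

from google import genai
from google.genai import types

# Hypothetical retail-style tool; a real agent would query an order system.
def get_order_status(order_id: str) -> dict:
    """Look up the shipping status for an order ID."""
    return {"order_id": order_id, "status": "shipped", "eta_days": 2}

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder credential

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # assumed preview ID, as in the earlier sketch
    contents="Where is order A-1042 and when will it arrive?",
    # Passing a Python callable enables the SDK's automatic function calling:
    # the model requests the tool, the SDK runs it, and the loop continues
    # until a final text answer is produced.
    config=types.GenerateContentConfig(tools=[get_order_status]),
)
print(response.text)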
Known Limitations: For more information about the known limitations for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Acceptable Usage: For more information about the acceptable usage for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Ethics and Content Safety

Evaluation Approach: For more information about the evaluation approach for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Safety Policies: For more information about the safety policies for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Training and Development Evaluation Results: Results for some of the internal safety evaluations conducted during the development phase are listed below. The evaluation results are for automated evaluations and not human evaluation or red teaming. Scores are provided as an absolute percentage increase or decrease in performance compared to the indicated model, as described below. Overall, Gemini 3.1 Pro outperforms Gemini 3.0 Pro across both safety and tone, while keeping unjustified refusals low. We mark improvements in green and regressions in red. Safety evaluations of Gemini 3.1 Pro produced results consistent with the original Gemini 3.0 Pro safety assessment.

Evaluation [1], description, and change for Gemini 3.1 Pro vs. Gemini 3.0 Pro:
Text to Text Safety (automated content safety evaluation measuring safety policies): +0.10% (non-egregious)
Multilingual Safety (automated safety policy evaluation across multiple languages): +0.11% (non-egregious)
Image to Text Safety (automated content safety evaluation measuring safety policies): -0.33%
Tone [2] (automated evaluation measuring objective tone of model refusal): +0.02%
Unjustified refusals (automated evaluation measuring the model's ability to respond to borderline prompts while remaining safe): -0.08%

[1] The ordering of evaluations in this table has changed from previous iterations of the 2.5 Flash-Lite model card in order to list safety evaluations together and improve readability. The types of evaluations listed have remained the same.
[2] For tone and instruction following, a positive percentage increase represents an improvement in the tone of the model on sensitive topics and the model's ability to follow instructions while remaining safe compared to Gemini 2.5 Pro.

We mark improvements in green and regressions in red. We continue to improve our internal evaluations, including refining automated evaluations to reduce false positives and negatives, as well as updating query sets to ensure balance and maintain a high standard of results. The performance results reported here are computed with improved evaluations and thus are not directly comparable with performance results found in previous Gemini model cards. We expect variation in our automated safety evaluation results, which is why we review flagged content to check for egregious or dangerous material. Our manual review confirmed losses were overwhelmingly either a) false positives or b) not egregious.

Human Red Teaming Results: We conduct manual red teaming by specialist teams who sit outside of the model development team. High-level findings are fed back to the model team. For child safety evaluations, Gemini 3.1 Pro satisfied required launch thresholds, which were developed by expert teams to protect children online and meet Google's commitments to child safety across our models and Google products. For content safety policies generally, including child safety, we saw similar safety performance compared to Gemini 3.0 Pro.

Risks and Mitigations: For more information about the risks and mitigations for Gemini 3.1 Pro, see the Gemini 3 Pro model card.

Frontier Safety

Our Frontier Safety Framework includes rigorous evaluations that address risks of severe harm from frontier models, covering five risk domains: CBRN (chemical, biological, radiological and nuclear information risks), cyber, harmful manipulation, machine learning R&D, and misalignment. Our frontier safety strategy is based on a "safety buffer" to prevent models from reaching critical capability levels (CCLs): if a frontier model does not reach the alert threshold for a CCL, we can assume models developed before the next regular testing interval will not reach that CCL. We conduct continuous testing, evaluating models at a fixed cadence and when a significant capability jump is detected. (Read more about this in our approach to technical AGI safety.)

Following FSF protocols, we conducted a full evaluation of Gemini 3.1 Pro (focusing on Deep Think mode). We found that the model remains below alert thresholds for the CBRN, harmful manipulation, machine learning R&D, and misalignment CCLs.
As previous models passed the alert threshold for cyber, we performed additional testing in this domain on Gemini 3.1 Pro with and without Deep Think mode, and found that the model remains below the cyber CCL. More details on our evaluations and the mitigations we deploy can be found in the Gemini 3 Pro Frontier Safety Framework Report.

Key results by domain (domain, key results for Gemini 3.1 Pro, CCL, CCL reached?):

CBRN (Deep Think mode): The model can provide highly accurate and actionable information but still fails to offer novel or sufficiently complete and detailed instructions for critical stages, to significantly enhance the



Related Articles

TechCrunch · 2 days ago
Google’s new Gemini Pro model has record benchmark scores — again

Gemini 3.1 Pro promises a Google LLM capable of handling more complex forms of work.

Ars Technica · 3 days ago
Google announces Gemini 3.1 Pro, says it's better at complex problem-solving

Google says 3.1 Pro is ready for "your hardest challenges."

Hacker News · 3 days ago
Gemini 3.1 Pro Preview

Article URL: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemini-3.1-pro-preview?pli=1 Comments URL: https://news.ycombinator.com/item?id=47074735 Points: 117 # Comments: 60

Hacker News · 3 days ago
Gemini 3.1 Pro

Preview: https://console.cloud.google.com/vertex-ai/publishers/google... Card: https://deepmind.google/models/model-cards/gemini-3-1-pro/ Comments URL: https://news.ycombinator.com/item?id=47074735 Points: 279 # Comments: 537

Hacker News · about 3 hours ago
Back to FreeBSD: Part 1

Article URL: https://hypha.pub/back-to-freebsd-part-1 Comments URL: https://news.ycombinator.com/item?id=47108989 Points: 4 # Comments: 0

Hacker News · about 4 hours ago
U.S. Cannot Legally Impose Tariffs Using Section 122 of the Trade Act of 1974

Article URL: https://ielp.worldtradelaw.net/2026/01/guest-post-president-trump-cannot-legally-impose-tariffs-using-section-122-of-the-trade-act-of-1974/ Comments URL: https://news.ycombinator.com/item?id=47108538 Points: 48 # Comments: 12