AITopics | frontier model


frontier model

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

NVIDIA launches 'Open Secure AI Alliance' initiative to improve cyber defense

Engadget | Jul-27-2026, 10:39:52 GMT

NVIDIA has gathered together some of the biggest tech companies in the world to form the Open Secure AI Alliance with the goal of improving cybersecurity. Among the 27 founding members are Microsoft, SpaceX, Dell, The Linux Foundation and NVIDIA itself. "The Open Secure AI Alliance -- building on the leadership of the Linux Foundation's Akrites initiative and OpenSSF community work -- will work to remediate and disclose vulnerabilities using open technologies," NVIDIA wrote. The group argues that to best defend against attacks in the age of AI models like Anthropic's Mythos 5, security researchers need access to both open and closed frontier models. As a prime example, NVIDIA cited OpenAI's recent rogue attack on Hugging Face.

artificial intelligence, hugging face, nvidia, (8 more...)

Engadget

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Hardware (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)


Position: Require Frontier AI Labs To Release Small "Analog" Models

Shriyash Upadhyay (Martian), Chaithanya Bandi (Martian), Narmeen Oozeer (Martian), Philip Quirke (Martian)

Neural Information Processing Systems | Jun-17-2026, 07:12:02 GMT

Recent proposals for regulating frontier AI models have sparked concerns about the cost of safety regulation, and most such regulations have been shelved due to the safety-innovation tradeoff. This paper argues for an alternative regulatory approach that ensures AI safety while actively promoting innovation: mandating that large AI laboratories release small, openly accessible "analog models"--scaled-down versions trained similarly to and distilled from their largest proprietary models. Analog models serve as public proxies, allowing broad participation in safety verification, interpretability research, and algorithmic transparency without forcing labs to disclose their full-scale models. Recent research demonstrates that safety and interpretability methods developed using these smaller models generalize effectively to frontier-scale systems. By enabling the wider research community to directly investigate and innovate upon accessible analogs, our policy substantially reduces the regulatory burden and accelerates safety advancements. This mandate promises minimal additional costs, leveraging reusable resources like data and infrastructure, while significantly contributing to the public good. Our hope is not only that this policy be adopted, but that it illustrates a broader principle supporting fundamental research in machine learning: deeper understanding of models relaxes the safety-innovation tradeoff and lets us have more of both.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (0.95)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)


Google says Gemini 3.5 Flash rivals 'large flagship models' for coding and agentic tasks

Engadget | May-19-2026, 17:45:00 GMT

It can complete tasks in a fraction of the time of other frontier models, Google claims. Google has unveiled Gemini 3.5, starting with the Gemini 3.5 Flash model that promises to outperform Gemini 3.1 Pro in real-world agentic and coding tasks. Announced at Google I/O 2026, this will be Google's default AI model (not to be confused with Flash-Lite), designed to deliver better speed than the current Gemini Pro models at a more affordable price. The tradeoff is lower performance than the 3.5 Pro model (coming next month) in tasks that require deep reasoning and high-context understanding. However, Google has reduced the compromise between the Pro and Flash models, saying Gemini 3.5 Flash delivers intelligence that rivals large flagship models on multiple dimensions.

large language model, machine learning, natural language, (12 more...)

Engadget

Industry:

Information Technology > Services (1.00)
Leisure & Entertainment > Games > Computer Games (0.73)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Introducing ARFBench: A time series question-answering benchmark based on real incidents

AIHub | May-18-2026, 08:40:27 GMT

More than a trillion dollars are lost every year due to system failures. To resolve them, engineers must troubleshoot outages quickly. An important task in incident response involves analyzing observability metrics, or time series data that snapshot the health of software systems. For example, an engineer for a service may use Datadog to answer questions like "When did latency start increasing?" and "What metrics outside of latency are also behaving abnormally?" to localize the root cause of the anomalous behavior. These time series question-answering (TSQA) tasks are essential for engineers, and present challenging and necessary tasks for SRE models and agents to perform.

arfbench, natural language, question answering, (18 more...)

AIHub

Industry: Information Technology > Security & Privacy (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)


Reid Hoffman Thinks Doctors Should Ask AI for a Second Opinion

WIRED | Apr-30-2026, 09:00:00 GMT

The LinkedIn cofounder now has an AI drug discovery startup--and thinks not asking chatbots for medical advice is "bordering on committing malpractice." Following a three-decade career at the helm of some of Silicon Valley's most powerful companies--cofounding LinkedIn and sitting on the boards of PayPal and OpenAI--Reid Hoffman recently turned his attention to health care. Hoffman's startup, Manas AI, is building an AI engine that aims to fast-track the traditionally slow process of drug discovery for various cancers. Inspired by a dinner with renowned cancer physician Siddhartha Mukherjee, the company's cofounder and CEO, its mission statement is to "shift drug discovery from a decade-long process to one that takes a few years." But Hoffman's enthusiasm for generative AI, in particular, stretches far beyond novel drug targets and small molecules.

artificial intelligence, machine learning, natural language, (15 more...)

WIRED

Country: North America > United States > California (0.36)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.58)


Amazon Has New Frontier AI Models--and a Way for Customers to Build Their Own

WIRED | Dec-2-2025, 16:00:00 GMT

Nova Forge lets Amazon's customers train frontier models for different tasks--a potential breakthrough in making AI actually useful for businesses. Amazon has announced a new family of frontier artificial intelligence models--and a new way for customers to build frontier models of their own. The ecommerce giant announced the second generation of its Nova AI models at re:Invent, a company conference held in Las Vegas. The models are nowhere near as popular as those offered by rivals like OpenAI and Google, but Amazon's plan to make them highly customizable could see them gain traction with its cloud users. Amazon detailed two improved large language models, Nova Lite and Nova Pro; a new realtime voice model called Nova Sonic; and a more experimental model called Nova Omni that performs a simulated kind of reasoning using images, audio, and video as well as text.

amazon, large language model, machine learning, (17 more...)

WIRED

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.25)
North America > United States > California (0.05)
Europe > Slovakia (0.05)
(2 more...)

Industry:

Retail (0.73)
Health & Medicine (0.70)
Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.37)


A Rosetta Stone for AI Benchmarks

Ho, Anson, Denain, Jean-Stanislas, Atanasov, David, Albanie, Samuel, Shah, Rohin

arXiv.org Artificial Intelligence | Dec-2-2025

Most AI benchmarks saturate within years or even months after they are introduced, making it hard to study long-run trends in AI capabilities. To address this challenge, we build a statistical framework that stitches benchmarks together, putting model capabilities and benchmark difficulties on a single numerical scale. This acts as a "Rosetta Stone", allowing us to compare models across a wide range of abilities and time, even if they are not evaluated on the same benchmarks. Moreover, this works without assuming how capabilities evolve across time or with training compute. We demonstrate three applications of this framework. First, we use it to measure the speed of AI progress over time, and to forecast future AI capabilities. Second, we estimate the rate of improvements in algorithmic efficiency, finding estimates that are higher, but broadly consistent with prior work. Finally, we find that our approach can be used to detect rapid accelerations in AI progress.

benchmark, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2512.00193

Genre: Research Report (1.00)

Industry:

Education (0.46)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)


AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability in Large Language Models

Jackson, Declan, Keating, William, Cameron, George, Hill-Smith, Micah

arXiv.org Artificial Intelligence | Nov-18-2025

We introduce AA-Omniscience, a benchmark designed to measure both factual recall and knowledge calibration across 6,000 questions. Questions are derived from authoritative academic and industry sources, and cover 42 economically relevant topics within six different domains. The evaluation measures a model's Omniscience Index, a bounded metric (-100 to 100) measuring factual recall that jointly penalizes hallucinations and rewards abstention when uncertain, with 0 equating to a model that answers questions correctly as much as it does incorrectly. Among evaluated models, Claude 4.1 Opus attains the highest score (4.8), making it one of only three models to score above zero. These results reveal persistent factuality and calibration weaknesses across frontier models. Performance also varies by domain, with the models from three different research labs leading across the six domains. This performance variability suggests models should be chosen according to the demands of the use case rather than general performance for tasks where knowledge is important.
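The abstract describes the Omniscience Index only in words. As a rough illustration, here is a minimal Python sketch of one scoring rule that fits that description: +1 for a correct answer, -1 for an incorrect one, 0 for an abstention, averaged and scaled to the range -100 to 100. The function name `omniscience_index` and the exact formula are assumptions for illustration, not the paper's confirmed definition. The sketch also ignores any partial-credit category the benchmark may use.

```python
def omniscience_index(correct: int, incorrect: int, abstained: int) -> float:
    """Hypothetical score consistent with the abstract's description.

    Each correct answer counts +1, each incorrect answer counts -1, and
    each abstention counts 0. The mean is scaled to the range [-100, 100].
    """
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("at least one question is required")
    return 100.0 * (correct - incorrect) / total


# Equal numbers of correct and incorrect answers give a score of zero.
print(omniscience_index(correct=40, incorrect=40, abstained=20))  # 0.0

# Abstaining instead of guessing wrong raises the score.
print(omniscience_index(correct=50, incorrect=20, abstained=30))  # 30.0
```

Under this assumed rule, a model that answers half the questions correctly and abstains on the rest scores 50, while one that answers half correctly and gets the other half wrong scores 0. That matches the abstract's point that the metric rewards abstention when uncertain.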

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.13029

Country:

North America > United States > California (0.28)
North America > United States > Arkansas (0.28)

Genre: Research Report (0.52)

Industry:

Law (0.93)
Health & Medicine (0.93)
Government > Regional Government > North America Government > United States Government (0.46)
Materials > Metals & Mining (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


GRDD+: An Extended Greek Dialectal Dataset with Cross-Architecture Fine-tuning Evaluation

Chatzikyriakidis, Stergios, Papadakis, Dimitris, Papaioannou, Sevasti-Ioanna, Psaltaki, Erofili

arXiv.org Artificial Intelligence | Nov-11-2025

We present an extended Greek Dialectal Dataset (GRDD+) that complements the existing GRDD dataset with more data from Cretan, Cypriot, Pontic and Northern Greek, while we add six new varieties: Greco-Corsican, Griko (Southern Italian Greek), Maniot, Heptanesian, Tsakonian, and Katharevusa Greek. The result is a dataset with total size 6,374,939 words and 10 varieties. This is the first dataset with such variation and size to date. We conduct a number of fine-tuning experiments to see the effect of good quality dialectal data on a number of LLMs. We fine-tune three model architectures (Llama-3-8B, Llama-3.1-8B, Krikri-8B) and compare the results to frontier models (Claude-3.7-Sonnet, Gemini-2.5, ChatGPT-5).

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2511.03772

Country:

Europe > Greece (0.28)
North America > United States (0.26)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models

Balasubramanian, Sriram, Basu, Samyadeep, Goswami, Koustava, Rossi, Ryan, Manjunatha, Varun, Santhosh, Roshan, Zhang, Ruiyi, Feizi, Soheil, Lipka, Nedim

arXiv.org Artificial Intelligence | Nov-7-2025

Large language models (LLMs) are increasingly used for long-document question answering, where reliable attribution to sources is critical for trust. Existing post-hoc attribution methods work well for extractive QA but struggle in multi-hop, abstractive, and semi-extractive settings, where answers synthesize information across passages. To address these challenges, we argue that post-hoc attribution can be reframed as a reasoning problem, where answers are decomposed into constituent units, each tied to specific context. We first show that prompting models to generate such decompositions alongside attributions improves performance. Building on this, we introduce DecompTune, a post-training method that teaches models to produce answer decompositions as intermediate reasoning steps. We curate a diverse dataset of complex QA tasks, annotated with decompositions by a strong LLM, and post-train Qwen-2.5 (7B and 14B) using a two-stage SFT + GRPO pipeline with task-specific curated rewards. Across extensive experiments and ablations, DecompTune substantially improves attribution quality, outperforming prior methods and matching or exceeding state-of-the-art frontier models.

attribution, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.25766

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
