AITopics

The proliferation of probabilistic AI has prompted proposals for specialized stochastic computers. Despite promising efficiency gains, these proposals have failed to gain traction because they rely on fundamentally limited modeling techniques and exotic, unscalable hardware. In this work, we address these shortcomings by proposing an all-transistor probabilistic computer that implements powerful denoising models at the hardware level. A system-level analysis indicates that devices based on our architecture could achieve performance parity with GPUs on a simple image benchmark using approximately 10,000 times less energy.

boltzmann machine, large language model, machine learning, (21 more...)

2510.23972

Country:

Europe (0.67)
North America > United States (0.67)
North America > Canada > Ontario (0.28)

Genre: Research Report (1.00)

Industry: Energy (0.93)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

UltraCUA: A Foundation Model for Computer Use Agents with Hybrid Action

Yang, Yuhao, Yang, Zhen, Dou, Zi-Yi, Nguyen, Anh, You, Keen, Attia, Omar, Szot, Andrew, Feng, Michael, Ramrakhya, Ram, Toshev, Alexander, Huang, Chao, Yang, Yinfei, Gan, Zhe

Computer-use agents face a fundamental limitation. They rely exclusively on primitive GUI actions (click, type, scroll), creating brittle execution chains prone to cascading failures. While API-driven agents harness rich capabilities through structured interfaces and tools, computer-use agents remain constrained to low-level visual interactions. We present UltraCUA, a foundation model that transcends this limitation through hybrid action, seamlessly unifying primitive GUI operations with high-level tool execution. Our innovation rests on four critical advances. First, an automated pipeline extracts and scales tool capabilities from software documentation and code repositories. Second, a synthetic data engine produces 17,000+ verifiable tasks capturing real-world computer-use complexity. Third, comprehensive hybrid action trajectory collection incorporates both GUI primitives and strategic tool calls. Fourth, a two-stage training methodology combines supervised fine-tuning with online reinforcement learning, enabling intelligent action selection between GUI and API. Evaluation with our 7B and 32B UltraCUA models reveals transformative performance gains. On OSWorld, UltraCUA achieves a 22% relative improvement on average while executing 11% faster than existing approaches. Cross-domain validation on WindowsAgentArena demonstrates robust generalization with a 21.7% success rate, surpassing Windows-trained baselines. The hybrid action paradigm proves essential, reducing error propagation while improving execution efficiency. This work establishes a scalable paradigm bridging primitive GUI interactions and high-level tool intelligence, enabling more resilient and adaptable computer-use agents for diverse environments and complex real-world tasks.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2510.1779

Genre:

Research Report (0.82)
Workflow (0.69)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Same model, better performance: the impact of shuffling on DNA Language Models benchmarking

Greco, Davide, Rawlik, Konrad

Large Language Models are increasingly popular in genomics due to their potential to decode complex biological sequences. Hence, researchers require a standardized benchmark to evaluate the capabilities of DNA Language Models (DNA LMs). However, evaluating DNA LMs is a complex task that intersects genomics' domain-specific challenges and machine learning methodologies, where seemingly minor implementation details can significantly compromise benchmark validity. We demonstrate this through BEND (Benchmarking DNA Language Models), where hardware-dependent hyperparameters -- number of data loading workers and buffer sizes -- create spurious performance variations of up to 4% for identical models. The problem stems from inadequate data shuffling interacting with domain-specific data characteristics. Experiments with three DNA language models (HyenaDNA, DNABERT-2, ResNet-LM) show these artifacts affect both absolute performance and relative model rankings. We propose a simple solution: pre-shuffling data before storage eliminates hardware dependencies while maintaining efficiency. This work highlights how standard ML practices can interact unexpectedly with domain-specific data characteristics, with broader implications for benchmark design in specialized domains.
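The buffer-size effect described in this abstract can be illustrated with a minimal sketch. It is not BEND's actual data pipeline: `buffered_shuffle` and `mean_displacement` are hypothetical helpers, the integers stand in for samples stored in genomic order, and only the buffer-size factor is modelled (not the number of data loading workers). The sketch shows that a small streaming shuffle buffer barely moves items away from their storage position, while shuffling once before storage makes the result largely independent of buffer size.

```python
import random

def buffered_shuffle(stream, buffer_size, seed=0):
    """Streaming shuffle: fill a fixed-size buffer, then emit a random element per input."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            idx = rng.randrange(len(buffer))
            # Swap the chosen element to the end so it can be popped in O(1).
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Flush whatever is left once the stream is exhausted.
    rng.shuffle(buffer)
    yield from buffer

def mean_displacement(order):
    """Average distance between an item's storage index and its output position."""
    return sum(abs(pos - item) for pos, item in enumerate(order)) / len(order)

n = 10_000
ordered = list(range(n))  # assumption: each value is the sample's original storage index

small = list(buffered_shuffle(ordered, buffer_size=10))
large = list(buffered_shuffle(ordered, buffer_size=1000))

# Pre-shuffle once before "storage", then stream with the small buffer.
pre = ordered[:]
random.Random(0).shuffle(pre)
pre_then_small = list(buffered_shuffle(pre, buffer_size=10))

print("buffer=10:              ", round(mean_displacement(small), 1))
print("buffer=1000:            ", round(mean_displacement(large), 1))
print("pre-shuffled, buffer=10:", round(mean_displacement(pre_then_small), 1))
```

With ordered input, mixing quality tracks the buffer size, which is the kind of hardware-dependent setting the abstract identifies; after pre-shuffling, even the smallest buffer yields a well-mixed stream.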

large language model, machine learning, natural language, (18 more...)

2510.12617

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)

ARE: Scaling Up Agent Environments and Evaluations

Froger, Romain, Andrews, Pierre, Bettini, Matteo, Budhiraja, Amar, Cabral, Ricardo Silveira, Do, Virginie, Garreau, Emilien, Gaya, Jean-Baptiste, Laurençon, Hugo, Lecanu, Maxime, Malkan, Kunal, Mekala, Dheeraj, Ménard, Pierre, Bertran, Gerard Moreno-Torres, Piterbarg, Ulyana, Plekhanov, Mikhail, Rita, Mathieu, Rusakov, Andrey, Vorotilov, Vladislav, Wang, Mengjue, Yu, Ian, Benhalloum, Amine, Mialon, Grégoire, Scialom, Thomas

We introduce Meta Agents Research Environments (ARE), a research platform for scalable creation of environments, integration of synthetic or real applications, and execution of agentic orchestrations. ARE provides simple abstractions to build complex and diverse environments, each with their own rules, tools, content, and verifiers, helping to bridge the gap between model development and real-world deployment. We also propose Gaia2, a benchmark built in ARE and designed to measure general agent capabilities. Beyond search and execution, Gaia2 requires agents to handle ambiguities and noise, adapt to dynamic environments, collaborate with other agents, and operate under temporal constraints. Unlike prior benchmarks, Gaia2 runs asynchronously, surfacing new failure modes that are invisible in static settings. Our experiments show that no system dominates across the intelligence spectrum: stronger reasoning often comes at the cost of efficiency, and budget scaling curves plateau, highlighting the need for new architectures and adaptive compute strategies. Perhaps more importantly, ARE abstractions enable continuous extension of Gaia2 to other environments, empowering the community to rapidly create new benchmarks tailored to their domains. In AI's second half, progress increasingly depends on defining meaningful tasks and robust evaluations to drive frontier capabilities forward.

large language model, machine learning, natural language, (24 more...)

2509.17158

Country: Europe > Germany (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Reparameterized LLM Training via Orthogonal Equivalence Transformation

Qiu, Zeju, Buchholz, Simon, Xiao, Tim Z., Dax, Maximilian, Schölkopf, Bernhard, Liu, Weiyang

While large language models (LLMs) are driving the rapid advancement of artificial intelligence, effectively and reliably training these large models remains one of the field's most significant challenges. To address this challenge, we propose POET, a novel reParameterized training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons. Specifically, POET reparameterizes each neuron with two learnable orthogonal matrices and a fixed random weight matrix. Because of its provable preservation of spectral properties of weight matrices, POET can stably optimize the objective function with improved generalization. We further develop efficient approximations that make POET flexible and scalable for training large-scale neural networks. Extensive experiments validate the effectiveness and scalability of POET in training LLMs.
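The spectral-preservation claim can be checked numerically with a small sketch. It assumes the effective weight takes the form W = R W0 P, with R and P orthogonal and W0 fixed; the abstract names these three components but does not spell out the product, so treat that form as an assumption. The orthogonal matrices here are random rather than learned, and all helper names are hypothetical. To stay dependency-free, the code compares tr((WᵀW)^k) for k = 1..3, which for a 3-column matrix determines the singular values.

```python
import math
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def random_orthogonal(n, rng, num_rotations=20):
    """Product of random Givens rotations; each factor is orthogonal, so the product is too."""
    q = identity(n)
    for _ in range(num_rotations):
        i, j = rng.sample(range(n), 2)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        g = identity(n)
        g[i][i] = g[j][j] = math.cos(theta)
        g[i][j] = -math.sin(theta)
        g[j][i] = math.sin(theta)
        q = matmul(g, q)
    return q

def power_traces(w, max_power=3):
    """tr((W^T W)^k) for k = 1..max_power: sums of singular values raised to the power 2k."""
    gram = matmul(transpose(w), w)
    acc = identity(len(gram))
    traces = []
    for _ in range(max_power):
        acc = matmul(acc, gram)
        traces.append(sum(acc[i][i] for i in range(len(acc))))
    return traces

rng = random.Random(0)
m, n = 4, 3
w0 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]  # fixed random weights
r = random_orthogonal(m, rng)  # stands in for a learned orthogonal matrix
p = random_orthogonal(n, rng)  # stands in for a learned orthogonal matrix
w = matmul(matmul(r, w0), p)   # assumed reparameterization W = R W0 P

before = power_traces(w0)
after = power_traces(w)
print(before)
print(after)
```

The two printed lists agree up to floating-point error because WᵀW = Pᵀ(W0ᵀW0)P is an orthogonal similarity of W0ᵀW0, so its eigenvalues, and hence the singular values of W, are unchanged.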

large language model, machine learning, natural language, (20 more...)

2506.08001

Country: Asia (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

The Guardian, Dec-11-2025, 22:25:51 GMT

Disney wants you to AI-generate yourself into your favorite Marvel movie

Users of OpenAI's video generation app will soon be able to see their own faces alongside characters from Marvel, Pixar, Star Wars and Disney's animated films, according to a joint announcement from the startup and Disney on Thursday. Perhaps you, Lightning McQueen and Iron Man are all dancing together in the Mos Eisley Cantina. Sora is an app made by OpenAI, the firm behind ChatGPT, which allows users to generate videos of up to 20 seconds through short text prompts. Disney announced that it would invest $1bn in OpenAI and, under a three-year deal perhaps worth even more than that large sum, that it would license about 200 of its iconic characters - from R2-D2 to Stitch - for users to play with in OpenAI's video generation app.

disney, large language model, machine learning, (13 more...)

The Guardian

Country:

Europe > Ukraine (0.08)
Oceania > Australia (0.05)
North America > United States > Florida > Orange County (0.05)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Engadget, Dec-11-2025, 21:30:09 GMT

Disney's deal with OpenAI is about controlling the future of copyright

It's no accident the company picked a partner it could control. This morning Disney and OpenAI announced a three-year licensing agreement: Starting in 2026, ChatGPT and Sora can generate images and videos incorporating Disney IP, including more than 200 characters from the company's stable of Star Wars, Pixar and Marvel brands. To say these companies make for strange bedfellows is an understatement. Before OpenAI released Sora, the company reportedly notified studios and talent agencies they would need to opt out of having their work appear in the new app. The law effectively froze the advancement of the public domain in the United States, with Disney being the greatest beneficiary. On the face of it, it's unclear OpenAI is getting much value out of the deal.

disney, large language model, machine learning, (16 more...)

Engadget

Country: North America > United States (0.50)

Industry:

Leisure & Entertainment (1.00)
Law > Intellectual Property & Technology Law (0.71)
Media > Film (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

The Atlantic - Technology, Dec-11-2025, 21:07:02 GMT

I Am Time Magazine's Person of the Year

It's rude to boast, but here in 2025, you've got to take the wins where you can get them. This morning, Time magazine announced its Person of the Year, and it's me. If you want to get all technical about it, Time's Person of the Year is not a person at all but a collection of people: the architects of AI. One of the two covers released is a re-creation of the "Lunch Atop a Skyscraper" photograph from 1932, which depicted blue-collar ironworkers suspended hundreds of feet in the air during the construction of 30 Rockefeller Plaza. In its image, Time replaces these laborers with tech personalities such as Mark Zuckerberg, Elon Musk, Sam Altman, and Jensen Huang.

artificial intelligence, large language model, natural language, (10 more...)

The Atlantic - Technology

Genre: Personal > Honors (0.86)

Industry:

Information Technology (0.74)
Law (0.72)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.32)

Engadget, Dec-11-2025, 18:50:29 GMT

OpenAI releases GPT-5.2 to take on Google and Anthropic

The new model is all about professional work. OpenAI's code red response to Google's Gemini 3 Pro has arrived. On the same day the company announced a Sora licensing pact with Disney, it took the wraps off GPT-5.2. OpenAI is touting the new model as its best yet for real-world, professional use. "It's better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects," said OpenAI.

large language model, machine learning, natural language, (15 more...)

Engadget

Genre: Press Release (0.57)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Engadget, Dec-11-2025, 18:31:41 GMT

Lawsuit accuses ChatGPT of reinforcing delusions that led to a woman's death

Stein-Erik Soelberg killed his mother and took his own life back in August. OpenAI has been hit with a wrongful death lawsuit over the August killing. The suit names CEO Sam Altman and accuses ChatGPT of putting a target on the back of victim Suzanne Adams, an 83-year-old woman who was killed in her home. According to the victim's estate, 56-year-old Stein-Erik Soelberg engaged in delusion-soaked conversations with ChatGPT in which the bot validated and magnified certain paranoid beliefs. The suit goes on to suggest that the chatbot eagerly accepted delusional thoughts leading up to the murder and egged him on every step of the way.

large language model, machine learning, natural language, (14 more...)

Engadget

Industry: Law > Litigation (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.31)