Large Language Model


ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models

arXiv.org Machine Learning

Schedule-Free Learning has shown promise as a practical anytime training method for machine learning, with success across dozens of standard benchmark problems. However, strong performance for LLM training has only been demonstrated at small scales. We identify a number of fixes necessary to scale up Schedule-Free Learning to larger batch sizes and model sizes, and present a learning-rate-free and schedule-free method (ScheduleFree+) for training large language models which greatly outperforms Warmup-Stable-Decay (WSD) schedules. We also demonstrate that Schedule-Free Learning is most effective for long-duration training, and at 1000 tokens per parameter, it outperforms SOTA schedules by 31%. Schedule-Free Learning provides a theoretical foundation for the use of model averaging and checkpoint merging during pretraining.


HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models

arXiv.org Machine Learning

Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across tasks such as summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting actually reduces hallucinations across contexts. Current hallucination benchmarks either require human annotation and fixed references that may eventually be memorized, or rely on naturalistic observations often recorded in settings that are difficult to reproduce or test systematically. To enable further research on the root causes of hallucination, we introduce HALLUWORLD, an extensible benchmark framework grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this reference world. Building on this view, we construct a family of synthetic and semi-synthetic benchmark environments in which the reference world is fully specified, the model's observable view is controlled, and hallucination labels can be generated automatically by construction. HALLUWORLD spans multiple settings that are classically representative for AI, i.e., gridworlds, chess, and realistic terminal tasks. This enables controlled variation of key factors such as world complexity, observability, temporal change, and source-conflict policy, allowing us to disentangle hallucinations into more fine-grained error categories. We evaluate frontier and open-weight language models across these settings and find consistent patterns across domains: perceptual hallucination on directly observed information is near-solved for frontier models, while multi-step state tracking and causal forward simulation are still difficult for frontier models, and are not generally solved by extended thinking.
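The reference-world formulation in the abstract can be illustrated with a minimal Python sketch. This is not the HALLUWORLD code: the `GridWorld` class, the `label_claim` function, the square visibility radius, and the two error-category names are all hypothetical, chosen only to show how labels can follow automatically once the reference world and the model's observable view are both specified.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Pos = Tuple[int, int]


@dataclass
class GridWorld:
    """Hypothetical reference world: maps each object name to its true (row, col)."""
    objects: Dict[str, Pos]

    def visible(self, agent: Pos, radius: int) -> Dict[str, Pos]:
        """Controlled observable view: objects within a square radius of the agent."""
        ar, ac = agent
        return {
            name: (r, c)
            for name, (r, c) in self.objects.items()
            if max(abs(r - ar), abs(c - ac)) <= radius
        }


def label_claim(world: GridWorld, agent: Pos, radius: int,
                name: str, claimed: Pos) -> str:
    """Label a claim 'object `name` is at `claimed`' against the reference world.

    The category names are illustrative assumptions, not the benchmark's taxonomy.
    """
    if world.objects.get(name) == claimed:
        return "correct"
    if name in world.visible(agent, radius):
        # The object was directly observable, yet the claim is false.
        return "perceptual_hallucination"
    # The claim is false and concerns something outside the observable view.
    return "unobserved_hallucination"


world = GridWorld({"key": (0, 1), "door": (4, 4)})
print(label_claim(world, (0, 0), 1, "key", (0, 1)))
print(label_claim(world, (0, 0), 1, "key", (1, 1)))
print(label_claim(world, (0, 0), 1, "door", (3, 3)))
```

Because the world state and the visibility rule are explicit, no human annotation is needed to decide whether a claim is false, and varying `radius` changes observability without touching the labeling logic.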


Google's Android XR smart glasses hope to succeed where AI-first wearables have failed

Popular Science

The audio-only frames pair with Android and iOS so a Gemini agent can run errands on your phone while you stay heads-up. Google put AI on people's faces more than a decade ago with its Google Glass wearable. It was designed to put a computer directly on your face, but the world (and to some extent, the hardware) wasn't quite ready for that yet.


Everything Announced at Google I/O 2026: Gemini, Search, Smart Glasses

WIRED

Google is sprucing up its Gemini models, revamping search, and enabling AI agents in everything. There are also some spiffy new smart glasses coming this fall. Google just wrapped its keynote address at its annual I/O developer event. The company showed off a swath of new agentic AI features and some demos of its upcoming Android-powered smart glasses. As it has in the past few years, the spectacle largely revolved around Google's perpetual stream of AI efforts.


Google's Gemini Spark is an agentic AI assistant

Engadget

The AI agent is rolling out to testers this week. Google has announced a 24/7 personal AI agent called Gemini Spark at this year's I/O developer conference. The company says Spark transforms Gemini from a standard AI assistant to an active partner that actually performs tasks for you. Spark is powered by Gemini 3.5 and is deeply integrated with Google Workspace apps, including Gmail, Docs and Slides. You can teach it to perform various tasks, such as creating a list of critical deadlines in your Gmail and sending it to you, or writing up a summary of ongoing updates in lengthy email threads.


Google's Gemini Omni can generate 'anything from any input,' starting with video

Engadget

Google didn't forget AI creators in its latest round of Gemini announcements as part of Google I/O. The company just officially revealed Gemini Omni, a new model that can create anything from any input -- starting with video, according to Google. The first model, called Gemini Omni Flash, is rolling out today to the Gemini app, Google Flow and YouTube Shorts. Google called Gemini Omni the next step up from Nano Banana and, presumably, its current video generator, Veo 3.1.


Google says Gemini 3.5 Flash rivals 'large flagship models' for coding and agentic tasks

Engadget

It can complete tasks in a fraction of the time of other frontier models, Google claims. Google has unveiled Gemini 3.5, starting with the Gemini 3.5 Flash model that promises to outperform Gemini 3.1 Pro in real-world agentic and coding tasks. Announced at Google I/O 2026, this will be Google's default AI model (not to be confused with Flash-Lite), designed to deliver better speed than the current Gemini Pro models at a more affordable price. The tradeoff is lower performance than the 3.5 Pro model (coming next month) in tasks that require deep reasoning and high-context understanding. However, Google has reduced the compromise between the Pro and Flash models, saying Gemini 3.5 Flash delivers intelligence that rivals large flagship models on multiple dimensions.


Google's Response to OpenClaw's 24/7 AI Agent

WIRED

Google's always-running, data-hungry AI agent is designed to spend your money and send your emails. Gemini Spark is Google's take on a steroided-out assistant agent that knows everything about you, announced as part of the company's updates to its Gemini chatbot app at this year's I/O developer conference. Software companies have been talking up AI agents for some time now, but I wasn't impressed until I tried Anthropic's Claude Cowork in January. I sat back as the bot organized the scattered screenshots littering my desktop into labeled folders without a single click, and felt convinced that this might be a turning point for how people interact with their computers. Many other early adopters in San Francisco experienced similar moments when they set up the mega-viral OpenClaw bot earlier this year, not just to help complete a few tasks but to run their whole online lives.


Google Search Goes Agentic--and Doesn't Need You Anymore

WIRED

Instead of clicking on a bunch of random website links, I was reading an AI summary positioned at the top of my search results and sometimes clicking through to double-check the accuracy of the output. The next evolution of Search that Google is building asks for even less active participation from users. You're really the most involved at the start of the journey, and that's it. You tell the agents what you want to know, and they do the clicking and even calling on your behalf. Rather than you going off on some online adventure, it's the agent that's hoovering up anything it can find and bouncing between different sites.


Demis Hassabis Thinks AI Job Cuts Are Dumb

WIRED

The CEO of Google DeepMind tells WIRED that companies should use the productivity gains of AI to do more, not lay people off. Demis Hassabis, the CEO of Google DeepMind, is keen to talk about the coding skills of his company's newest model, Gemini 3.5 Flash. The model has been trained to perform complex agentic coding tasks: translate large code bases from one language to another; find and fix bugs lurking deep in knotty code; and even write entire operating systems from scratch. Hassabis does not, however, think this spells doom for software developers. "I have no idea why people are going around talking with certainty about that," Hassabis tells WIRED ahead of the new model reveal at today's Google I/O event.