Large Language Model
Slot-VLM: Object-Event Slots for Video-Language Modeling
Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an effective method to encapsulate video content into a set of representative tokens to align with LLMs. In this work, we introduce Slot-VLM, a new framework designed to generate semantically decomposed video tokens, in terms of object-wise and event-wise visual representations, to facilitate LLM inference.
AutoManual: Constructing Instruction Manuals by LLM Agents via Interactive Environmental Learning
Large Language Model (LLM)-based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understanding through interaction and adapt to new environments. AutoManual categorizes environmental knowledge into diverse rules and optimizes them in an online fashion by two agents: 1) The Planner codes actionable plans based on current rules for interacting with the environment.
Decompose, Analyze and Rethink: Solving Intricate Problems with Human-like Reasoning Cycle
In this paper, we introduce DeAR (Decompose-Analyze-Rethink), a framework that iteratively builds a reasoning tree to tackle intricate problems within a single large language model (LLM). Unlike approaches that extend or search for rationales, DeAR features 1) a tree-based question decomposition that plans the organization of rationales, mimicking the logical planning inherent in human cognition; and 2) global updating of the rationales at each reasoning step through natural language feedback. Specifically, the Decompose stage decomposes the question into simpler sub-questions, storing them as new nodes; the Analyze stage generates and self-checks rationales for sub-questions at each node level; and the Rethink stage updates parent-node rationales based on feedback from their child nodes. By generating and updating the reasoning process from a more global perspective, DeAR constructs more adaptive and accurate logical structures for complex problems, facilitating timely error correction compared to rationale-extension and search-based approaches such as Tree-of-Thoughts (ToT) and Graph-of-Thoughts (GoT). We conduct extensive experiments on three reasoning benchmarks, including ScienceQA, StrategyQA, and GSM8K, which cover a variety of reasoning tasks, demonstrating that our approach significantly reduces logical errors and enhances performance across various LLMs. Furthermore, we validate that DeAR is an efficient method that achieves a superior trade-off between accuracy and reasoning time compared to ToT and GoT.
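The three-stage cycle described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Node` class, the `dear_cycle` function, the `max_depth` cutoff, and the three callables standing in for LLM calls are all hypothetical names chosen for illustration, and the toy stubs below only manipulate strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One question in the reasoning tree (hypothetical structure)."""
    question: str
    rationale: str = ""
    children: List["Node"] = field(default_factory=list)


def dear_cycle(
    node: Node,
    decompose: Callable[[str], List[str]],
    analyze: Callable[[str], str],
    rethink: Callable[[str, List[str]], str],
    depth: int = 0,
    max_depth: int = 2,  # assumed cutoff, not taken from the abstract
) -> Node:
    # Analyze: draft a rationale for this node's question.
    node.rationale = analyze(node.question)
    if depth < max_depth:
        # Decompose: store each simpler sub-question as a new child node.
        for sub_question in decompose(node.question):
            child = Node(sub_question)
            node.children.append(child)
            dear_cycle(child, decompose, analyze, rethink, depth + 1, max_depth)
        # Rethink: revise the parent rationale using feedback from its children.
        if node.children:
            feedback = [child.rationale for child in node.children]
            node.rationale = rethink(node.rationale, feedback)
    return node


# Toy stand-ins for the LLM calls (assumptions for demonstration only).
def toy_decompose(question: str) -> List[str]:
    return question.split(" and ") if " and " in question else []


def toy_analyze(question: str) -> str:
    return f"draft[{question}]"


def toy_rethink(rationale: str, feedback: List[str]) -> str:
    return rationale + " <- " + "; ".join(feedback)


root = dear_cycle(Node("A and B"), toy_decompose, toy_analyze, toy_rethink)
print(root.rationale)
```

In a real system each callable would wrap a prompt to the same LLM; the recursion only shows how child rationales flow back into their parent.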
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT decoding might overlook. This deliberation, however, comes at the cost of significantly increased inference complexity. In this work, we demonstrate that fine-tuning LLMs leveraging the search tree constructed by ToT allows CoT to achieve similar or better performance, thereby avoiding the substantial inference burden. This is achieved through Chain of Preference Optimization (CPO), where LLMs are fine-tuned to align each step of the CoT reasoning paths with those of ToT using the inherent preference information in the tree-search process. Extensive experimental results show that CPO significantly improves LLM performance in solving a variety of complex problems, including question answering, fact verification, and arithmetic reasoning, demonstrating its effectiveness. Our code is available at https://github.com/sail-sg/CPO.
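One way to read "the inherent preference information in the tree-search process" is that, at each step, the thought the search kept is preferred over the sibling thoughts it discarded. The sketch below builds such per-step preference triples from a toy tree. It is an assumption-laden illustration, not the code in the linked repository: `Thought`, `selected`, and `preference_pairs` are hypothetical names, and the fine-tuning step that would consume the triples is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Thought:
    """One node in a ToT-style search tree (hypothetical structure)."""
    text: str
    selected: bool = False  # True if the search kept this thought on its final path
    children: List["Thought"] = field(default_factory=list)


def preference_pairs(root: Thought) -> List[Tuple[str, str, str]]:
    """Follow the selected path and pair each kept thought with its discarded siblings.

    Returns (context, preferred, dispreferred) triples, one per discarded sibling.
    """
    pairs: List[Tuple[str, str, str]] = []
    node, context = root, root.text
    while node.children:
        kept = [child for child in node.children if child.selected]
        if not kept:
            break  # the search stopped here
        winner = kept[0]
        for sibling in node.children:
            if sibling is not winner:
                pairs.append((context, winner.text, sibling.text))
        # Extend the context with the kept thought and descend one step.
        node, context = winner, context + " " + winner.text
    return pairs


tree = Thought("Q: 2+3*4?", selected=True, children=[
    Thought("Multiply first: 3*4=12.", selected=True, children=[
        Thought("Then add: 2+12=14.", selected=True),
        Thought("Then add: 2+12=15."),
    ]),
    Thought("Add first: 2+3=5."),
])

for context, preferred, dispreferred in preference_pairs(tree):
    print(f"{context!r}: prefer {preferred!r} over {dispreferred!r}")
```

Triples of this shape are what a preference-optimization objective would typically take as input, so the expensive tree search runs once at data-construction time rather than at inference.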
GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning
Large Language Models (LLMs) are increasingly used for various tasks with graph structures. Though LLMs can process graph information in a textual format, they overlook the rich vision modality, which is an intuitive way for humans to comprehend structural information and conduct general graph reasoning. The potential benefits and capabilities of representing graph structures as visual images (i.e., visual graph) are still unexplored.
GPT-5.4 mini brings some of the smarts of OpenAI's latest model to ChatGPT Free and Go users
The new model offers performance improvements in reasoning, multimodal understanding and more. When OpenAI released GPT-5.4 at the start of March, the company said the new model was designed primarily for professional work like programming and data analysis. Now OpenAI is launching GPT-5.4 mini and nano, and while it is once again highlighting the usefulness of these new systems for tasks like coding, one of the new models is available to Free and Go users. What's more, that model, GPT-5.4 mini, even offers performance that approaches GPT-5.4 in a handful of areas.
The Human Skill That Eludes AI
Why can't language models write well? In a certain, strange way, generative AI peaked with OpenAI's GPT-2 seven years ago. Little known to anyone outside of tech circles, GPT-2 excelled at producing unexpected answers. "You could be like, 'Continue this story:,' and GPT-2 would be like, ','" Katy Gero, a poet and computer scientist who has been experimenting with language models since 2017, told me. "The models won't do that anymore." AI leaders boast about their models' superhuman technical abilities.
The Download: OpenAI's US military deal, and Grok's CSAM lawsuit
Plus: China has approved the world's first commercial brain chip. Where OpenAI's technology could show up in Iran: OpenAI has controversially agreed to give the Pentagon access to its AI. But where exactly could its tech show up, and which applications will its customers and employees tolerate? There's pressure to integrate it quickly with existing military tools. One defense official revealed it could even assist in selecting strike targets. OpenAI's partnership with Anduril, which makes drones and counter-drone technologies, adds another hint at what is to come.
AI Confessions: A Chatbot Ended My Marriage
Your stories about how AI is impacting your mental health, decision-making, and relationships.
UK must learn lessons from AI race and retain its quantum computing talent, says minister
In quantum computers, the information is contained in qubits that can work through vast numbers of different outcomes, which is not possible with classical computers. The UK will not let quantum computing talent slip through its fingers and must learn lessons from US dominance of the AI race, the technology secretary has said, as the government announced a £1bn quantum funding pledge. Liz Kendall said the government hoped to retain homegrown quantum startups, engineers and researchers rather than lose them to competing countries, with the US stealing a march on its western rivals in AI. "I do look at what's happened on AI," said Kendall. "I do think we need to learn the lessons and make sure we give our brilliant scientists, spinouts and startups the ability to stay here and make it happen. And that requires a government that is bold and ambitious and confident in these technologies of the future."