captain
Shoot First, Ask Questions Later? Building Rational Agents that Explore and Act Like People
Grand, Gabriel, Pepe, Valerio, Andreas, Jacob, Tenenbaum, Joshua B.
Many high-stakes applications of AI require forming data-driven hypotheses and making targeted guesses; e.g., in scientific and diagnostic settings. Given limited resources, to what extent do agents based on language models (LMs) act rationally? We develop methods to benchmark and enhance agentic information-seeking, drawing on insights from human behavior. First, we introduce a strategic decision-oriented dialogue task called Collaborative Battleship, in which a partially-informed Captain must balance exploration (asking questions) and action (taking shots), while a fully-informed Spotter must provide accurate answers under an information bottleneck. Compared to human players (N=42), we find that LM agents struggle to ground answers in context, generate informative questions, and select high-value actions. Next, to address these gaps, we develop novel Monte Carlo inference strategies for LMs based on principles from Bayesian Experimental Design (BED). For Spotter agents, our approach boosts accuracy by up to 14.7% absolute over LM-only baselines; for Captain agents, it raises expected information gain (EIG) by up to 0.227 bits (94.2% of the achievable noise ceiling). Combined, these components yield sharper targeting (+0.303-0.374 F1), and enable weaker LMs, such as Llama-4-Scout, to outperform both humans (8% -> 82% win rate) and frontier models (0% -> 67% win rate vs. GPT-5) at ~1% of GPT-5's cost. We replicate these findings on Guess Who? where our methods significantly boost accuracy (+28.3-42.4 p.p.), demonstrating their general applicability for building rational information-seeking agents.
- North America > United States (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Leisure & Entertainment > Games (1.00)
- Government > Military > Navy (0.35)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Captain of tanker linked to Russian 'shadow fleet' charged in France
The captain of an oil tanker believed to be part of Russia's shadow fleet of vessels used to evade sanctions has been charged by French authorities. The Chinese national was handed one count of refusing to follow instructions from the French navy and told to attend a court hearing in the northern coastal city of Brest next February. The Boracay left Russia last month and was off the coast of Denmark when unidentified drones forced the temporary closure of several airports last week. The tanker was earlier boarded by French soldiers because it was on a list of vessels subject to EU sanctions for carrying Russian oil exports. Russian President Vladimir Putin called France's actions piracy.
- Europe > France (1.00)
- Asia > Russia (1.00)
- South America (0.15)
- Government > Regional Government > Europe Government > France Government (0.75)
- Government > Regional Government > Europe Government > Russia Government (0.70)
- Government > Regional Government > Asia Government > Russia Government (0.70)
CAPTAIN at COLIEE 2023: Efficient Methods for Legal Information Retrieval and Entailment Tasks
Nguyen, Chau, Nguyen, Phuong, Tran, Thanh, Nguyen, Dat, Trieu, An, Pham, Tin, Dang, Anh, Nguyen, Le-Minh
The Competition on Legal Information Extraction/Entailment (COLIEE) is held annually to encourage advancements in the automatic processing of legal texts. Processing legal documents is challenging due to the intricate structure and meaning of legal language. In this paper, we outline our strategies for tackling Task 2, Task 3, and Task 4 in the COLIEE 2023 competition. Our approach involved utilizing appropriate state-of-the-art deep learning methods, designing methods based on domain characteristics observation, and applying meticulous engineering practices and methodologies to the competition. As a result, our performance in these tasks has been outstanding, with first places in Task 2 and Task 3, and promising results in Task 4. Our source code is available at https://github.com/Nguyen2015/CAPTAIN-COLIEE2023/tree/coliee2023.
Autothrottle: A Practical Bi-Level Approach to Resource Management for SLO-Targeted Microservices
Wang, Zibo, Li, Pinghe, Liang, Chieh-Jan Mike, Wu, Feng, Yan, Francis Y.
Achieving resource efficiency while preserving end-user experience is non-trivial for cloud application operators. As cloud applications progressively adopt microservices, resource managers are faced with two distinct levels of system behavior: end-to-end application latency and per-service resource usage. Translating between the two levels, however, is challenging because user requests traverse heterogeneous services that collectively (but unevenly) contribute to the end-to-end latency. We present Autothrottle, a bi-level resource management framework for microservices with latency SLOs (service-level objectives). It architecturally decouples application SLO feedback from service resource control, and bridges them through the notion of performance targets. Specifically, an application-wide learning-based controller is employed to periodically set performance targets -- expressed as CPU throttle ratios -- for per-service heuristic controllers to attain. We evaluate Autothrottle on three microservice applications, with workload traces from production scenarios. Results show superior CPU savings, up to 26.21% over the best-performing baseline and up to 93.84% over all baselines.
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > China (0.04)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.93)
- Information Technology > Communications > Social Media (0.71)
Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering
The final frontier for simulation is the accurate representation of complex, real-world social systems. While agent-based modeling (ABM) seeks to study the behavior and interactions of agents within a larger system, it is unable to faithfully capture the full complexity of human-driven behavior. Large language models (LLMs), like ChatGPT, have emerged as a potential solution to this bottleneck by enabling researchers to explore human-driven interactions in previously unimaginable ways. Our research investigates simulations of human interactions using LLMs. Through prompt engineering, inspired by Park et al. (2023), we present two simulations of believable proxies of human behavior: a two-agent negotiation and a six-agent murder mystery game.
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > China > Hong Kong (0.04)
Elon Musk weighs in on allegations of ChatGPT's liberal bias with viral meme: 'Captain of propaganda'
Fox News correspondent Mark Meredith has the latest on ChatGPT on 'Special Report.' Billionaire Elon Musk took another swing at artificial intelligence service ChatGPT and the mainstream media on Thursday with a viral meme that accumulated over 254,000 likes on Twitter. Musk has emerged as a major critic of ChatGPT amid accusations that the artificial intelligence (AI) bot engages in liberal bias. The Tesla CEO and owner of Twitter shared a meme with the caption, "ChatGPT to the mainstream media." "Look at me," the meme read.
- North America > United States > New York (0.07)
- North America > United States > Iowa (0.06)
- Asia > South Korea (0.06)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.06)
Afternoon Update: Labor releases plan to cut industrial emissions; Melbourne Victory fined; and Prince Harry's book reviewed
The Albanese government has released its plan to revamp the safeguard mechanism – a Coalition policy that promised to reduce emissions from our biggest industrial polluters but actually resulted in the opposite. Labor has proposed a policy makeover. The government's plan will require big polluters to cut emissions by 5% a year until 2030, but will controversially allow them to continue buying carbon offsets from companies that pollute less. How the government regulates the safeguard mechanism is a big deal, given the polluting facilities included in the policy are responsible for 28% of the nation's emissions. If Australia is to meet its 43% emissions reduction target by 2030, this policy has to work.
- Asia > Thailand (0.06)
- South America > Brazil > Federal District > Brasília (0.05)
- Oceania > Australia > Australian Capital Territory > Canberra (0.05)
AI and the super app: An interview with Careem's Selim Turki
QuantumBlack, AI by McKinsey recently sat down with Selim Turki, head of data and AI at Careem, to discuss the latest trends in advanced analytics and artificial intelligence. Far from a dry discussion of theory, the conversation coalesced around several fascinating use cases in which Careem is using AI to make a difference in people's lives. We discussed how AI is being leveraged to improve customer and driver security through targeted facial-recognition checks to ensure drivers (captains) are who they say they are. We also discussed how AI is being used to provide customers with the most accurate and up-to-date estimated times of arrival (ETAs) by factoring in a host of conditions, including local weather conditions, prayer times, and even iftar times during Ramadan. Along the way, we discussed what it means to be an "AI first" company and the outlook for AI tech--and talent--in the region.
- Asia > Middle East > UAE (0.29)
- Africa > Middle East (0.18)
- Europe > Middle East (0.16)
The best PC games for 2022
So how do you categorize a beast like gaming on the PC? With decades of titles to pluck from (and the first port of call for most indie titles, too), there's so much to choose from. Gaming on your PC adds the benefits of (nearly always flawless) backward compatibility and console-beating graphical performance -- if you've got the coin for it. The whole idea of what a PC is and where you can play it is shifting, too, with the rise of handheld "consolized" PCs like the Steam Deck. We've tried to be broad with our recommendations here on purpose – there are so many great games out there for your PC, consider these some starting points.
- Transportation (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
Using AI, Mayflower Autonomous Ship concludes trans-Atlantic journey - IT-Online
In a voyage lasting 40 days and conquering approximately 3 500 unmanned miles at sea, the Mayflower Autonomous Ship arrived in North America in Halifax, Nova Scotia on June 5, 2022. Following two years of design, construction and AI model training, the Mayflower Autonomous Ship (MAS) was officially launched in September 2020. Fast forward to 5 June 2022, and the ship completed an historic transatlantic voyage from Plymouth, UK to its North American arrival in Halifax, Nova Scotia. With no human captain or onboard crew, MAS is the first self-directed autonomous ship with technology that is scalable and extendible to traverse the Atlantic Ocean. MAS was designed and built by marine research non-profit ProMare with IBM acting as lead technology and science partner, with IBM automation, AI and edge computing technologies powering the ship's artificial intelligence (AI) captain to guide the vessel and make real-time decisions while at sea.
- North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.50)
- Europe > United Kingdom > England > Devon > Plymouth (0.27)
- Atlantic Ocean (0.27)