SABER: Small Actions, Big Errors -- Safeguarding Mutating Steps in LLM Agents

Cuadron, Alejandro, Yu, Pengfei, Liu, Yang, Gupta, Arpit

arXiv.org Artificial Intelligence

Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contribute equally to failure? Analyzing execution traces on τ-Bench (Airline/Retail) and SWE-Bench Verified, we decompose trajectories into mutating (environment-changing) vs. non-mutating steps and formalize decisive deviations: the earliest action-level divergences that flip success to failure. A logistic regression reveals that each additional deviation in a mutating action reduces the odds of success by up to 92% on Airline and up to 96% on Retail for state-of-the-art models. In contrast, deviations in non-mutating actions have little to no effect. Errors also grow with context length as agents drift from role and act on stale constraints. Motivated by these observations, we introduce SABER, a model-agnostic, gradient-free, test-time safeguard that (i) adds mutation-gated verification, (ii) injects Targeted Reflection before mutating steps, and (iii) performs block-based context cleaning. SABER delivers consistent gains, e.g., Qwen3-Thinking: +28% relative on Airline, +11% on Retail, and +7% on SWE-Bench Verified; Claude: +9%/+7%. We further identify ceiling effects in τ-Bench, where annotation errors and underspecified tasks artificially cap model performance. To address this, we release τ-Bench Verified, which restores benchmark headroom through targeted revisions. Our results argue for action-level analysis, targeted safeguards, and reliable evaluations as prerequisites for robust multi-turn agents.
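To make the odds figures in the abstract concrete, the following is a minimal sketch of how a per-deviation odds reduction translates into a logistic-regression coefficient and a success probability. The function names and the 50% baseline success probability are hypothetical, chosen only for illustration; the abstract reports only the 92% and 96% upper bounds, not the fitted model itself.

```python
import math


def odds_ratio_from_reduction(reduction_pct: float) -> float:
    """Convert 'odds reduced by X%' into a multiplicative odds ratio."""
    return 1.0 - reduction_pct / 100.0


def success_probability(baseline_prob: float, odds_ratio: float, n_deviations: int) -> float:
    """Apply the odds ratio once per deviation, then convert odds back to a probability."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    odds = baseline_odds * odds_ratio ** n_deviations
    return odds / (1.0 + odds)


if __name__ == "__main__":
    baseline = 0.5  # hypothetical baseline success probability (not from the paper)
    for domain, pct in [("Airline", 92.0), ("Retail", 96.0)]:
        ratio = odds_ratio_from_reduction(pct)
        beta = math.log(ratio)  # implied logistic-regression coefficient
        p_one = success_probability(baseline, ratio, 1)
        print(f"{domain}: odds ratio={ratio:.2f}, coefficient={beta:.3f}, "
              f"P(success | 1 deviation)={p_one:.3f}")
```

Under this assumed baseline, a 92% odds reduction means a single mutating deviation drops the success probability from 50% to roughly 7%, which illustrates why the abstract treats mutating steps as the place to concentrate safeguards.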


How to scale up your AI initiatives

#artificialintelligence

Most business executives believe they need to harness artificial intelligence (AI) to stay ahead of the pack and grow their business, but they often fail to scale up their AI initiatives across their organisations, according to an Accenture expert. Citing a global study by Accenture, Lee Joon Seong, managing director for applied intelligence in ASEAN at the consulting firm, noted that while 88% of global executives believed they needed AI for their business to survive, the same proportion also struggled to scale AI initiatives beyond the pilot stage. "A lot of people understand the potential of AI and have embarked on AI initiatives, but not many have fully realised their full potential," said Lee. The ability to scale is seen as a barometer of success in AI adoption, given the time, talent and resources involved in AI projects. Here's what organisations can do to scale up their AI initiatives: Earmarking AI as part of your business strategy sounds obvious, but many organisations still struggle to get that right.


How Artificial Intelligence is Reducing Baggage Mishandling

#artificialintelligence

Modern airports are using artificial intelligence (AI) to avoid mishandling baggage. The idea is to use AI for end-to-end tracking of baggage and for planning optimized luggage routes, from the time a passenger boards until she collects her baggage at the destination. Nearly all sectors of the economy - education, healthcare, finance, travel, and even the public sector - are applying technology to optimize their functioning. Technologies like AI are digitizing areas that were once thought to be suited to manual operation alone. Airports and airline companies are exploring the benefits of integrating technology into their operations and services.


Precision and Fitness in Object-Centric Process Mining

Adams, Jan Niklas, van der Aalst, Wil M. P.

arXiv.org Artificial Intelligence

Traditional process mining considers only one single case notion and discovers and analyzes models based on this. However, a single case notion is often not a realistic assumption in practice. Multiple case notions might interact and influence each other in a process. Object-centric process mining introduces the techniques and concepts to handle multiple case notions. So far, such event logs have been standardized and novel process model discovery techniques were proposed. However, notions for evaluating the quality of a model are missing. These are necessary to enable future research on improving object-centric discovery and providing an objective evaluation of model quality. In this paper, we introduce a notion for the precision and fitness of an object-centric Petri net with respect to an object-centric event log. We give a formal definition and accompany this with an example. Furthermore, we provide an algorithm to calculate these quality measures. We discuss our precision and fitness notion based on an event log with different models. Our precision and fitness notions are an appropriate way to generalize quality measures to the object-centric setting since we are able to consider multiple case notions, their dependencies and their interactions.


How AI and data analytics are transforming aviation

#artificialintelligence

Airlines and airports are now embracing new technologies and turning to artificial intelligence (AI) to support their customer service. Technology is drastically changing the way businesses connect with their customers, and the world of aviation is part of the change too. Data and the way it is used are transforming airlines from pre-flight to post-flight operations, including ticket purchase, seat selection, luggage, boarding and ground transportation. The data required is captured along the various components of a passenger's journey, allowing organisations to take informed steps towards operational efficiency and improved customer experience.


Here's how artificial intelligence and IoT can improve travel experience

#artificialintelligence

In an interaction with the media, Aamir Junaid Ahmad, CEO of BusAndTicket and a technocrat himself, shared how the company plans to use technology to improve bus ticket booking and the travel experience for its customers. With AI advancing around the world, we will soon implement new ways the technology can improve customer experience. AI and IoT together will give a more personalised ticket booking experience and will help users find the best deals and recommendations to fulfill their travel plans with ease. The more they use the service, the more information will be available to further customize the search results. BusAndTicket.com is introducing, for the first time, the concept of dynamic pricing in bus ticket booking using the analytical benefits of AI.


Artificial Intelligence and IoT can improve travel experience

#artificialintelligence

In an interaction with the media, Aamir Junaid Ahmad, CEO of BusAndTicket and a technocrat himself, shared how the company plans to use technology to improve bus ticket booking and the travel experience for its customers. With AI advancing around the world, we will soon implement new ways the technology can improve customer experience. AI and IoT together will give a more personalized ticket booking experience and will help users find the best deals and recommendations to fulfill their travel plans with ease. The more they use the service, the more information will be available to further customize the search results. BusAndTicket.com is introducing, for the first time, the concept of dynamic pricing in bus ticket booking using the analytical benefits of AI.


Trusted data will determine the future of baggage handling SITA

#artificialintelligence

IATA sees RFID (radio frequency identification) as one of the keys to transforming the baggage handling process. SITA worked with IATA back in 2017 on a detailed business case, estimating that RFID could reduce the number of mishandled bags by an extra 25% and could potentially save the air transport industry $3 billion in baggage mishandling costs. Airlines and airports are now proactively working together to boost their baggage handling efforts as part of IATA's Resolution 753, which requires airlines to "maintain an accurate inventory of baggage by monitoring the acquisition and delivery of baggage". RFID tagging is now 99.98% accurate, according to IATA. Within the next four years most baggage systems will be RFID enabled, which is a huge improvement on barcodes alone.


How AI Transforms CCTV into a 24/7 Virtual Guard

#artificialintelligence

Security operators tasked with monitoring multiple camera feeds have a difficult job. It's virtually impossible for them to give their full attention to more than one camera at any given moment. Threat scenarios vary, so operators must also be able to react swiftly and appropriately. False alarms could be costly and missed detections could be deadly, so control rooms operate under tremendous pressure, which in turn increases the risk of errors. That's why the United Kingdom's Centre for the Protection of National Infrastructure (CPNI) recommends a 20-minute shift for CCTV control room operators because of vigilance decrement. Vigilance decrement is characterized not just by decreases in attention but also by an increase in oversights on tasks requiring intense sustained attention.


How Artificial Intelligence Is Transforming the Travel Industry

#artificialintelligence

Artificial intelligence and machine learning are gradually empowering businesses by offering innovative ways of performing various tasks with ease, accuracy and without human intervention. There is a broad scope of opportunities for AI and ML to transform the world by improving the customer service experience and minimizing effort. Furthermore, the travel industry has always been ahead in adapting to technological advancements. Travelers have shown equal enthusiasm in adopting these technological advancements, as they make the travel experience simpler and more enjoyable. Before we further discuss the impacts of AI on the travel industry, let's first understand what AI is and how it functions.