 adr


ReflexFlow: Rethinking Learning Objective for Exposure Bias Alleviation in Flow Matching

Huang, Guanbo, Mao, Jingjia, Huang, Fanding, Liu, Fengkai, Luo, Xiangyang, Liang, Yaoyuan, Lu, Jiasheng, Wang, Xiaoe, Liu, Pei, Fu, Ruiliu, Huang, Shao-Lun

arXiv.org Artificial Intelligence

Despite tremendous recent progress, Flow Matching methods still suffer from exposure bias due to discrepancies in training and inference. This paper investigates the root causes of exposure bias in Flow Matching, including: (1) the model lacks generalization to biased inputs during training, and (2) insufficient low-frequency content captured during early denoising, leading to accumulated bias. Based on these insights, we propose ReflexFlow, a simple and effective reflexive refinement of the Flow Matching learning objective that dynamically corrects exposure bias. ReflexFlow consists of two components: (1) Anti-Drift Rectification (ADR), which reflexively adjusts prediction targets for biased inputs utilizing a redesigned loss under training-time scheduled sampling; and (2) Frequency Compensation (FC), which reflects on missing low-frequency components and compensates them by reweighting the loss using exposure bias. ReflexFlow is model-agnostic, compatible with all Flow Matching frameworks, and improves generation quality across datasets. Experiments on CIFAR-10, CelebA-64, and ImageNet-256 show that ReflexFlow outperforms prior approaches in mitigating exposure bias, achieving a 35.65% reduction in FID on CelebA-64.


Barbarians at the Gate: How AI is Upending Systems Research

Cheng, Audrey, Liu, Shu, Pan, Melissa, Li, Zhifei, Wang, Bowen, Krentsel, Alex, Xia, Tian, Cemri, Mert, Park, Jongseok, Yang, Shuo, Chen, Jeff, Agrawal, Lakshya, Desai, Aditya, Xing, Jiarong, Sen, Koushik, Zaharia, Matei, Stoica, Ion

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) is starting to transform the research process as we know it by automating the discovery of new solutions. Given a task, the typical AI-driven approach is (i) to generate a set of diverse solutions, and then (ii) to verify these solutions and select one that solves the problem. Crucially, this approach assumes the existence of a reliable verifier, i.e., one that can accurately determine whether a solution solves the given problem. We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. This is because system performance problems naturally admit reliable verifiers: solutions are typically implemented in real systems or simulators, and verification reduces to running these software artifacts against predefined workloads and measuring performance. We term this approach AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions. Using OpenEvolve, an existing open-source ADRS instance, we present case studies across diverse domains, including load balancing for multi-region cloud scheduling, Mixture-of-Experts inference, LLM-based SQL queries, and transaction scheduling. In multiple instances, ADRS discovers algorithms that outperform state-of-the-art human designs (e.g., achieving up to 5.0x runtime improvements or 50% cost reductions). We distill best practices for guiding algorithm evolution, from prompt design to evaluator construction, for existing frameworks. We then discuss the broader implications for the systems community: as AI assumes a central role in algorithm design, we argue that human researchers will increasingly focus on problem formulation and strategic guidance. Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
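
The generate, evaluate, and refine loop described in the abstract above can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the names `evaluate`, `mutate`, and `search` are hypothetical, the "system" is a toy cost function rather than a real simulator, and random perturbation stands in for an LLM-based generator. It is not OpenEvolve's API or the authors' implementation.

```python
import random


def evaluate(candidate, workload):
    """Hypothetical verifier: run the candidate against a fixed workload.

    The toy 'system' here just measures squared distance to each workload
    value; lower cost is better.
    """
    return sum((candidate - w) ** 2 for w in workload)


def mutate(candidate, rng, step=1.0):
    """Hypothetical generator: propose a variation of an existing solution."""
    return candidate + rng.uniform(-step, step)


def search(workload, rounds=200, population=8, seed=0):
    """Iteratively generate candidates, verify them, and keep the best one."""
    rng = random.Random(seed)
    best = 0.0
    best_cost = evaluate(best, workload)
    for _ in range(rounds):
        # (i) generate a set of diverse candidates from the current best
        candidates = [mutate(best, rng) for _ in range(population)]
        # (ii) verify each candidate and keep it only if it improves the cost
        for candidate in candidates:
            cost = evaluate(candidate, workload)
            if cost < best_cost:
                best, best_cost = candidate, cost
    return best, best_cost


workload = [3.0, 4.0, 5.0]
best, best_cost = search(workload)
print(f"best candidate: {best:.3f}, cost: {best_cost:.3f}")
```

The point of the sketch is the role of the verifier: because `evaluate` is cheap and trustworthy, the loop can discard bad candidates without human review, which is the property the abstract attributes to systems performance problems.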


we introduce task selection based on prior experience into a meta-learning algorithm by conceptualizing the learner and

Neural Information Processing Systems

We highly appreciate the reviewers' time, efforts, and valuable suggestions! R3, R4 asked for further clarification on the differences between existing work and our approach. PAML and ACL can be seen as complementary approaches, e.g., PAML might be used to R1 also mentions that only one of the environments is learned from pixel data. Lastly, we will add an analysis of the settings fully observed 4.1 and pixel-descriptor 4.4. With space constraints in mind and since our work's goal is to incorporate active ML approach used in this work in Section 2. Control signals.


Vision-based Perception System for Automated Delivery Robot-Pedestrians Interactions

Tushe, Ergi, Farooq, Bilal

arXiv.org Artificial Intelligence

The integration of Automated Delivery Robots (ADRs) into pedestrian-heavy urban spaces introduces unique challenges in terms of safe, efficient, and socially acceptable navigation. We develop the complete pipeline for single-vision-sensor-based multi-pedestrian detection and tracking, pose estimation, and monocular depth perception. Leveraging the real-world MOT17 dataset sequences, this study demonstrates how integrating human-pose estimation and depth cues enhances pedestrian trajectory prediction and identity maintenance, even under occlusions and dense crowds. Results show measurable improvements, including up to a 10% increase in identity preservation (IDF1), a 7% improvement in multi-object tracking accuracy (MOTA), and consistently high detection precision exceeding 85%, even in challenging scenarios. Notably, the system identifies vulnerable pedestrian groups, supporting more socially aware and inclusive robot behaviour.
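
For readers unfamiliar with the two tracking metrics named above, the following Python sketch computes them from aggregate counts using their standard definitions: MOTA = 1 - (FN + FP + IDSW) / GT, and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function names and the example counts are hypothetical and are not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-object tracking accuracy from counts summed over all frames.

    num_gt is the total number of ground-truth objects over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1 score from identity-level true/false positives/negatives."""
    denominator = 2 * idtp + idfp + idfn
    return 2 * idtp / denominator if denominator else 0.0


# Illustrative counts only (not results from the paper).
print(mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100))
print(idf1(idtp=80, idfp=10, idfn=30))
```

MOTA penalises detection errors and identity switches together, while IDF1 focuses on how consistently each pedestrian keeps the same identity, which is why the abstract reports them separately.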


Deep Learning-Based Forecasting of Hotel KPIs: A Cross-City Analysis of Global Urban Markets

Atapattu, C. J., Cui, Xia, Abeynayake, N. R

arXiv.org Artificial Intelligence

This study employs Long Short-Term Memory (LSTM) networks to forecast three key performance indicators (KPIs), namely Occupancy (OCC), Average Daily Rate (ADR), and Revenue per Available Room (RevPAR), across five major cities: Manchester, Amsterdam, Dubai, Bangkok, and Mumbai. The cities were selected for their diverse economic profiles and hospitality dynamics. Monthly data from 2018 to 2025 were used, with 80% for training and 20% for testing. Advanced time series decomposition and machine learning techniques enabled accurate forecasting and trend identification. Results show that Manchester and Mumbai exhibited the highest predictive accuracy, reflecting stable demand patterns, while Dubai and Bangkok demonstrated higher variability due to seasonal and event-driven influences. The findings validate the effectiveness of LSTM models for urban hospitality forecasting and provide a comparative framework for data-driven decision-making. The model's generalisability across global cities highlights its potential utility for tourism stakeholders and urban planners.
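
As a small illustration of the data handling mentioned above, the sketch below shows an 80/20 split of a monthly series and the standard relationship RevPAR = ADR × occupancy rate. Two assumptions are made that the abstract does not state: the split is chronological (earlier months for training, later months for testing), and the series values are synthetic. The function names are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling (assumed, not confirmed
    by the abstract), so the test set contains only the latest observations."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def revpar(adr, occupancy):
    """Revenue per available room: average daily rate times occupancy rate,
    with occupancy expressed as a fraction between 0 and 1."""
    return adr * occupancy


# Synthetic monthly ADR values, oldest first (illustrative only).
monthly_adr = [100.0 + month for month in range(10)]
train, test = chronological_split(monthly_adr)
print(len(train), len(test))
print(revpar(adr=120.0, occupancy=0.75))
```

Keeping the split in time order avoids leaking future months into training, which matters for any forecasting model, LSTM or otherwise.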


Lessons from Defending Gemini Against Indirect Prompt Injections

Shi, Chongyang, Lin, Sharon, Song, Shuang, Hayes, Jamie, Shumailov, Ilia, Yona, Itay, Pluto, Juliette, Pappu, Aneesh, Choquette-Choo, Christopher A., Nasr, Milad, Sitawarin, Chawin, Gibson, Gena, Terzis, Andreas, Flynn, John "Four"

arXiv.org Artificial Intelligence

Gemini is increasingly used to perform tasks on behalf of users, where function-calling and tool-use capabilities enable the model to access user data. Some tools, however, require access to untrusted data, which introduces risk. Adversaries can embed malicious instructions in untrusted data that cause the model to deviate from the user's expectations and mishandle their data or permissions. In this report, we set out Google DeepMind's approach to evaluating the adversarial robustness of Gemini models and describe the main lessons learned from the process. We test how Gemini performs against a sophisticated adversary through an adversarial evaluation framework, which deploys a suite of adaptive attack techniques to run continuously against past, current, and future versions of Gemini. We describe how these ongoing evaluations directly help make Gemini more resilient against manipulation.


GPML: Graph Processing for Machine Learning

Jaber, Majed, Michel, Julien, Boutry, Nicolas, Parrend, Pierre

arXiv.org Artificial Intelligence

The dramatic increase in complex, multi-step, and rapidly evolving attacks in dynamic networks calls for advanced cyber-threat detectors. The GPML (Graph Processing for Machine Learning) library addresses this need by transforming raw network traffic traces into graph representations, enabling advanced insights into network behaviors. The library provides tools to detect anomalies in interaction and community shifts in dynamic networks. GPML supports community and spectral metrics extraction, enhancing both real-time detection and historical forensics analysis. This library supports modern cybersecurity challenges with a robust, graph-based approach.
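
The idea of turning traffic traces into graphs and watching for shifts can be sketched in plain Python. This is not GPML's API: the function names, the `(timestamp, source, destination)` flow format, the fixed time windows, and the use of edge-set Jaccard similarity as a change signal are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_window_graphs(flows, window_seconds):
    """Group flows into time windows; each window becomes a set of directed
    edges (source, destination)."""
    graphs = defaultdict(set)
    for timestamp, source, destination in flows:
        window = int(timestamp // window_seconds)
        graphs[window].add((source, destination))
    return dict(graphs)


def edge_jaccard(edges_a, edges_b):
    """Similarity between two edge sets: 1.0 means identical interaction
    patterns, values near 0.0 mean the graph changed substantially."""
    union = edges_a | edges_b
    if not union:
        return 1.0
    return len(edges_a & edges_b) / len(union)


# Synthetic flows: (timestamp in seconds, source host, destination host).
flows = [
    (0, "10.0.0.1", "10.0.0.2"),
    (5, "10.0.0.1", "10.0.0.3"),
    (12, "10.0.0.1", "10.0.0.2"),
    (15, "10.0.0.4", "10.0.0.2"),
]
graphs = build_window_graphs(flows, window_seconds=10)
similarity = edge_jaccard(graphs[0], graphs[1])
print(f"edge similarity between consecutive windows: {similarity:.3f}")
```

A sudden drop in similarity between consecutive windows is one simple signal of changed interaction patterns; the community and spectral metrics mentioned in the abstract would be computed on the same kind of per-window graph.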


DRAFT-ing Architectural Design Decisions using LLMs

Dhar, Rudra, Kakran, Adyansh, Karan, Amey, Vaidhyanathan, Karthik, Varma, Vasudeva

arXiv.org Artificial Intelligence

Architectural Knowledge Management (AKM) is crucial for software development but remains challenging due to the lack of standardization and high manual effort. Architecture Decision Records (ADRs) provide a structured approach to capture Architecture Design Decisions (ADDs), but their adoption is limited due to the manual effort involved and insufficient tool support. Our previous work has shown that Large Language Models (LLMs) can assist in generating ADDs. However, simply prompting the LLM does not produce quality ADDs. Moreover, using third-party LLMs raises privacy concerns, while self-hosting them poses resource challenges. To this end, we experimented with different approaches like few-shot, retrieval-augmented generation (RAG) and fine-tuning to enhance LLMs' ability to generate ADDs. Our results show that both techniques improve effectiveness. Building on this, we propose Domain Specific Retrieval Augmented Few Shot Fine Tuning, DRAFT, which combines the strengths of all these three approaches for more effective ADD generation. DRAFT operates in two phases: an offline phase that fine-tunes an LLM on generating ADDs augmented with retrieved examples and an online phase that generates ADDs by leveraging retrieved ADRs and the fine-tuned model. We evaluated DRAFT against existing approaches on a dataset of 4,911 ADRs and various LLMs and analyzed them using automated metrics and human evaluations. Results show DRAFT outperforms all other approaches in effectiveness while maintaining efficiency. Our findings indicate that DRAFT can aid architects in drafting ADDs while addressing privacy and resource constraints.


Lived Experience Not Found: LLMs Struggle to Align with Experts on Addressing Adverse Drug Reactions from Psychiatric Medication Use

Chandra, Mohit, Sriraman, Siddharth, Verma, Gaurav, Khanuja, Harneet Singh, Campayo, Jose Suarez, Li, Zihang, Birnbaum, Michael L., De Choudhury, Munmun

arXiv.org Artificial Intelligence

Adverse Drug Reactions (ADRs) from psychiatric medications are the leading cause of hospitalizations among mental health patients. With healthcare systems and online communities facing limitations in resolving ADR-related issues, Large Language Models (LLMs) have the potential to fill this gap. Despite the increasing capabilities of LLMs, past research has not explored their capabilities in detecting ADRs related to psychiatric medications or in providing effective harm reduction strategies. To address this, we introduce the Psych-ADR benchmark and the Adverse Drug Reaction Response Assessment (ADRA) framework to systematically evaluate LLM performance in detecting ADR expressions and delivering expert-aligned mitigation strategies. Our analyses show that LLMs struggle with understanding the nuances of ADRs and differentiating between types of ADRs. While LLMs align with experts in terms of expressed emotions and tone of the text, their responses are more complex, harder to read, and only 70.86% aligned with expert strategies. Furthermore, they provide less actionable advice by a margin of 12.32% on average. Our work provides a comprehensive benchmark and evaluation framework for assessing LLMs in strategy-driven tasks within high-risk domains.


A Coalition Game for On-demand Multi-modal 3D Automated Delivery System

Moosavi, Farzan, Farooq, Bilal

arXiv.org Artificial Intelligence

We introduce a multi-modal autonomous delivery optimization framework as a coalition game for a fleet of UAVs and ADRs operating in two overlaying networks to address last-mile delivery in urban environments, including high-density areas, road-based routing, and real-world operational challenges. The problem is defined as multiple depot pickup and delivery with time windows constrained over operational restrictions, such as vehicle battery limitation, precedence time window, and building obstruction. Subsequently, coalition game theory is applied to investigate cooperation structures among the modes to capture how strategic collaboration among vehicles can improve overall routing efficiency. To do so, a generalized reinforcement learning model is designed to evaluate the cost-sharing and allocation to different coalitions for which the sub-additive property and a non-empty core exist. Our methodology leverages an end-to-end deep multi-agent policy gradient method augmented by a novel spatio-temporal adjacency neighbourhood graph attention network and transformer architecture using a heterogeneous edge-enhanced attention model. We conduct several numerical experiments on last-mile delivery applications. The results from the case study in the city of Mississauga show that, despite the incorporation of an extensive network in the graph for two modes and a complex training structure, the model addresses realistic operational constraints and achieves high-quality solutions compared with existing transformer-based and heuristic methods. It also performs well on non-homogeneous data distributions, generalizes well to different scales and configurations, and demonstrates robust performance under stochastic scenarios subject to wind speed and direction.