discard
UnfoldML_NeurIPS
Algorithm 1: Hard-gating Algorithm for In-Stage IDK Cascade
Input:
D_s: training data containing N_s samples in stage s
M_s: sorted list of the models trained for stage s
C: dictionary of the models' spatio-temporal costs
c_s: user-defined budget of spatio-temporal cost for stage s
q: confidence function
maxA: upper bound on the cutoffs, used to avoid over-fitting
nBins: number of bins for the grid search
Output s: the optimal IDK cutoff vector for stage s
1: procedure HARDGATING(D_s, M_s, c_s, C, q, maxA, nBins)
2: s = [], ModelAssign = 1, cost = P
We use the Sepsis-3 toolkit³ to obtain the suspected infection time of each patient and follow the process in Seymour et al. (2016) to label the onset of sepsis. This yields 20,009 sepsis patients out of the 52,902 adult patients in the MIMIC-III database. We exclude patients who stayed in the ICU for less than 6 hours, as well as patients who developed sepsis within the first 6 hours after ICU admission. This reduces our cohort to 34,475 ICU patients, and only 2,370 (6.8%)
Then, according to Singer et al. (2016), we identify the onset of septic shock as
Algorithm 3: End-to-End Training Algorithm for UnfoldML
Input:
D: full training data containing N instances
M: full model zoo
C: dictionary of the models' spatio-temporal costs
q: confidence criterion
Output: the optimal ICK gate parameters, and (a, b), the optimal IDK gate parameters
1: procedure END-TO-END-TRAINING(D, M)
2: Pre-allocate costs c_s for each stage s.
Figure 4: Transitions in model calls. Both cascades always call the first model of each stage on entry, then transition either to the next model in the stage (IDK) or to the next stage (ICK).
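The hard-gating search in Algorithm 1 can be illustrated with a minimal Python sketch. Because the algorithm text above is truncated, this is not the UnfoldML procedure. It assumes that each model answers when its confidence q reaches its cutoff and otherwise says IDK and defers to the next model, that the last model in the stage always answers, that cost is the mean per-sample sum of the costs of the models called, and that cutoffs are searched on a uniform grid of nBins bins over [0, maxA]. The names `cascade_eval` and `hard_gating` are hypothetical.

```python
import itertools

import numpy as np


def cascade_eval(confs, correct, costs, cutoffs):
    """Accuracy and mean per-sample cost of an IDK cascade (hypothetical helper).

    confs[i, n]: confidence q of model i on sample n
    correct[i, n]: True if model i predicts sample n correctly
    costs[i]: cost of calling model i
    cutoffs[i]: model i answers if confs[i, n] >= cutoffs[i], else says IDK;
                the last model needs no cutoff because it always answers.
    """
    n_models, n_samples = confs.shape
    active = np.ones(n_samples, dtype=bool)  # samples still waiting for an answer
    total_cost = 0.0
    n_correct = 0
    for i in range(n_models):
        total_cost += costs[i] * active.sum()  # every active sample calls model i
        if i == n_models - 1:
            answered = active
        else:
            answered = active & (confs[i] >= cutoffs[i])
        n_correct += correct[i][answered].sum()
        active = active & ~answered  # the rest say IDK and move on
    return n_correct / n_samples, total_cost / n_samples


def hard_gating(confs, correct, costs, budget, max_a=1.0, n_bins=10):
    """Grid-search IDK cutoffs in [0, max_a]; return (cutoffs, accuracy, cost) or None."""
    grid = np.linspace(0.0, max_a, n_bins + 1)
    best = None
    for cutoffs in itertools.product(grid, repeat=confs.shape[0] - 1):
        acc, cost = cascade_eval(confs, correct, costs, cutoffs)
        if cost <= budget and (best is None or acc > best[1]):
            best = (cutoffs, acc, cost)
    return best
```

The exhaustive grid grows as (nBins + 1) to the power of (number of models minus one), which is manageable only for the short per-stage model lists the algorithm describes. If no cutoff vector meets the budget, the sketch returns `None`.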
The Challenger: When Do New Data Sources Justify Switching Machine Learning Models?
Digalakis, Vassilis Jr, Pérignon, Christophe, Saurin, Sébastien, Sentenac, Flore
We study the problem of deciding whether and when an organization should replace a trained incumbent model with a challenger that relies on newly available features. We develop a unified economic and statistical framework that links learning-curve dynamics, data-acquisition and retraining costs, and the discounting of future gains. First, we characterize the optimal switching time in stylized settings and derive closed-form expressions that quantify how horizon length, learning-curve curvature, and cost differentials shape the optimal decision. Second, we propose three practical algorithms: a one-shot baseline, a greedy sequential method, and a look-ahead sequential method. Using a real-world credit-scoring dataset with gradually arriving alternative data, we show that (i) optimal switching times vary systematically with cost parameters and learning-curve behavior, and (ii) the look-ahead sequential method outperforms the other methods and approaches the value of an oracle with full foresight. Finally, we establish finite-sample guarantees, including conditions under which the look-ahead sequential method achieves sublinear regret relative to that oracle. Our results provide an operational blueprint for economically sound model transitions as new data sources become available.
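The switching trade-off can be made concrete with a brute-force sketch. It is not the paper's closed-form analysis or any of its three algorithms. It assumes a hypothetical setting in which the incumbent earns a constant value per period, the challenger is trained once at switching period tau on tau periods of new data and then earns a constant value given by an assumed learning curve, and a one-off switching cost (data acquisition plus retraining) is paid at tau. All names and numbers are illustrative.

```python
def total_value(tau, horizon, v_inc, curve, switch_cost, delta):
    """Discounted value of switching at period tau (tau == horizon means never switch)."""
    challenger_value = curve(tau)  # hypothetical: trained once on tau periods of new data
    value = 0.0
    for t in range(horizon):
        per_period = v_inc if t < tau else challenger_value
        value += delta ** t * per_period
    if tau < horizon:
        value -= delta ** tau * switch_cost  # one-off data-acquisition + retraining cost
    return value


def optimal_switch(horizon, v_inc, curve, switch_cost, delta):
    """Brute-force the switching period that maximizes discounted value."""
    return max(
        range(horizon + 1),
        key=lambda tau: total_value(tau, horizon, v_inc, curve, switch_cost, delta),
    )
```

For example, `optimal_switch(40, 0.45, lambda n: 0.6 - 0.3 * (n + 1) ** -0.5, 1.0, 0.95)` searches over an assumed power-law learning curve; waiting longer raises the challenger's value but shortens the discounted period in which it is used.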
Preview, Accept or Discard? A Predictive Low-Motion Interaction Paradigm
Repetitive strain injury (RSI) affects roughly one in five computer users and remains largely unresolved despite decades of ergonomic mouse redesign. All such devices share a fundamental limitation: they still require fine-motor motion to operate. This work investigates whether predictive, AI-assisted input can reduce that motion by replacing physical pointing with ranked on-screen suggestions. To preserve user agency, we introduce Preview Accept Discard (PAD), a zero-click interaction paradigm that lets users preview predicted GUI targets, cycle through a small set of ranked alternatives, and accept or discard them via key-release timing. We evaluate PAD in two settings: a browser-based email client and an ISO 9241-9 keyboard-prediction task under varying top-3 accuracies. Across both studies, PAD substantially reduces hand motion relative to trackpad use, but it matches trackpad task times only when prediction accuracy is similar to that of the best spell-checkers.
Reinforcement Learning for Hanabi
Cohen, Nina, France, Kordel K.
Hanabi has become a popular game for reinforcement learning (RL) research because it is one of the few cooperative card games in which players have incomplete knowledge of the environment, which makes it a challenge for an RL agent. We explored different tabular and deep RL algorithms to determine which performed best, both against an agent of the same type and against agents of other types. We found that certain agents played their highest-scoring games against specific agents, while others achieved higher average scores by adapting to the other agent's behavior. We attempted to quantify the conditions under which each algorithm provides the greatest advantage and identified the most interesting interactions between agents of different types. In the end, we found that temporal difference (TD) algorithms had better overall performance and a better balance of play types than tabular agents. Specifically, tabular Expected SARSA and deep Q-learning agents performed best.
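For readers unfamiliar with Expected SARSA, the tabular update mentioned above can be sketched as follows. This is a generic textbook version, not the authors' Hanabi agent: it assumes an epsilon-greedy policy over a discrete action set, a hashable state encoding, and a dictionary-backed Q-table, and the helper names are hypothetical. Unlike SARSA, the target averages Q over the policy's action probabilities in the next state instead of using the single sampled next action.

```python
from collections import defaultdict


def new_q_table():
    """Tabular Q-values keyed by (state, action); unseen pairs default to 0."""
    return defaultdict(float)


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties go to the first action)."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def expected_sarsa_update(Q, s, a, r, s_next, done, n_actions,
                          alpha=0.1, gamma=0.99, epsilon=0.1):
    """One Expected SARSA step: bootstrap from the policy's expected next Q-value."""
    if done:
        target = r  # no bootstrap from a terminal state
    else:
        q_next = [Q[(s_next, b)] for b in range(n_actions)]
        probs = epsilon_greedy_probs(q_next, epsilon)
        target = r + gamma * sum(p * q for p, q in zip(probs, q_next))
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]
```

Averaging over the policy removes the variance that comes from sampling the next action, which is one common explanation for Expected SARSA's stability relative to plain SARSA.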
Controlling Risk of Retrieval-augmented Generation: A Counterfactual Prompting Framework
Chen, Lu, Zhang, Ruqing, Guo, Jiafeng, Fan, Yixing, Cheng, Xueqi
Retrieval-augmented generation (RAG) has emerged as a popular solution to mitigate the hallucination issues of large language models. However, existing studies on RAG seldom address the issue of predictive uncertainty, i.e., how likely it is that a RAG model's prediction is incorrect, resulting in uncontrollable risks in real-world applications. In this work, we emphasize the importance of risk control, ensuring that RAG models proactively refuse to answer questions with low confidence. Our research identifies two critical latent factors affecting RAG's confidence in its predictions: the quality of the retrieved results and the manner in which these results are utilized. To guide RAG models in assessing their own confidence based on these two latent factors, we develop a counterfactual prompting framework that induces the models to alter these factors and analyzes the effect on their answers. We also introduce a benchmarking procedure to collect answers with the option to abstain, facilitating a series of experiments. For evaluation, we introduce several risk-related metrics and the experimental results demonstrate the effectiveness of our approach. Our code and benchmark dataset are available at https://github.com/ict-bigdatalab/RC-RAG.
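Refusing to answer below a confidence threshold trades coverage for risk. The sketch below computes two generic selective-prediction quantities, coverage (the fraction of questions answered) and risk (the error rate among answered questions), and picks the lowest threshold that meets a target risk. It assumes confidence scores are already available; it does not implement the paper's counterfactual prompting or its specific metrics, and the function names are hypothetical.

```python
def risk_coverage(confidences, correct, threshold):
    """Return (risk, coverage) when answering only if confidence >= threshold."""
    answered = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(answered) / len(confidences)
    if not answered:
        return 0.0, coverage  # convention: no answers means no errors
    risk = 1.0 - sum(answered) / len(answered)
    return risk, coverage


def pick_threshold(confidences, correct, max_risk):
    """Lowest observed-confidence threshold whose risk is <= max_risk, or None."""
    # Lower thresholds answer more questions, so the first hit has maximal coverage.
    for t in sorted(set(confidences)):
        risk, coverage = risk_coverage(confidences, correct, t)
        if risk <= max_risk:
            return t, risk, coverage
    return None
```

Thresholds are taken from the observed confidences because coverage only changes at those values.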
PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model
Huang, Chenghao, Cao, Yanbo, Wen, Yinlong, Zhou, Tao, Zhang, Yanru
Poker, in particular Texas Hold'em, has long been a typical research target among imperfect-information games (IIGs), which have served as a measure of progress in artificial intelligence (AI). Representative prior works, such as DeepStack and Libratus, rely heavily on counterfactual regret minimization (CFR) to tackle heads-up no-limit poker. However, it is difficult for subsequent researchers to learn CFR from previous models and apply it to other real-world applications because CFR iterations are computationally expensive. Additionally, CFR is difficult to apply to multi-player games because the game tree grows exponentially. In this work, we introduce PokerGPT, an end-to-end solver built on a lightweight large language model (LLM) that plays Texas Hold'em with an arbitrary number of players and achieves high win rates. PokerGPT requires only simple textual information about poker games to generate decision-making advice, which makes interaction between AI and humans convenient. We transform a set of textual records acquired from real games into prompts and use them to fine-tune a lightweight pre-trained LLM with reinforcement learning from human feedback (RLHF). To improve fine-tuning performance, we apply prompt engineering to the raw data, including filtering useful information, selecting the behaviors of players with high win rates, and further processing them into textual instructions using multiple prompt engineering techniques. Our experiments show that PokerGPT outperforms previous approaches in terms of win rate, model size, training time, and response speed, indicating the great potential of LLMs for solving IIGs.
CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong
Counterfactual Regret Minimization (CFR) has been successful in Texas Hold'em poker. We apply this algorithm to another popular imperfect-information game, Mahjong. Compared with poker, Mahjong is much more complex and has many variants. We study two-player Mahjong by conducting a game-theoretical analysis and building a hierarchical abstraction into CFR based on winning policies. This framework can be generalized to other imperfect-information games.
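At the core of CFR is regret matching: each player plays actions in proportion to their positive cumulative regret, and in two-player zero-sum games the average strategies converge to an equilibrium. The sketch below shows regret matching in self-play on rock-paper-scissors, a one-shot game, so there is no game tree or counterfactual reach weighting. It is not the CFR-p hierarchical abstraction, and the names are hypothetical. The non-zero initial regret for one player is an arbitrary choice so that play does not start at the equilibrium.

```python
import numpy as np

# Row player's payoff; actions are rock, paper, scissors.
PAYOFF = np.array([[0.0, -1.0, 1.0],
                   [1.0, 0.0, -1.0],
                   [-1.0, 1.0, 0.0]])


def regret_matching(regrets):
    """Play in proportion to positive cumulative regret; uniform if none is positive."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))


def self_play(iterations=10000):
    """Return each player's average strategy after regret-matching self-play."""
    regrets = [np.array([1.0, 0.0, 0.0]), np.zeros(3)]  # asymmetric start (arbitrary)
    strategy_sums = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        s0 = regret_matching(regrets[0])
        s1 = regret_matching(regrets[1])
        strategy_sums[0] += s0
        strategy_sums[1] += s1
        # Expected payoff of each pure action against the opponent's current mix.
        u0 = PAYOFF @ s1
        u1 = PAYOFF @ s0  # symmetric zero-sum game: same matrix for player 1
        # Regret = action payoff minus the payoff of the strategy actually played.
        regrets[0] += u0 - s0 @ u0
        regrets[1] += u1 - s1 @ u1
    return [s / s.sum() for s in strategy_sums]
```

In this game the average strategies should approach the uniform mixed equilibrium (1/3, 1/3, 1/3), while the current strategies keep cycling; this is why CFR reports the average strategy.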
Predictive discarding for sustainable Industry 5.0
The computer chip shortage has prompted Dr Geert van Kollenburg and his colleagues at Eindhoven University of Technology, the Netherlands, to find data-driven methods to optimise chip manufacturing processes. As part of the MadeIn4 project, they have developed a predictive discarding framework in which quality predictions from artificial intelligence (AI) algorithms are used to decide whether to discard an unfinished product. This approach can improve both the profitability and sustainability of manufacturing processes. In line with Industry 5.0 goals, predictive discarding offers a way for humans and AI to work together to achieve sustainable manufacturing. Our digital society relies on computer chips, but these chips are currently in short supply.
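The discard decision can be framed as an expected-value comparison. In this hypothetical sketch, which is not the MadeIn4 framework, a model's predicted probability that the unfinished product will pass quality control is weighed against the value of a good or bad finished product, the remaining processing cost, and the salvage value of discarding now. Environmental costs of further processing could be folded into `remaining_cost`; all parameter names are assumptions.

```python
def expected_value_if_continued(p_good, value_if_good, value_if_bad, remaining_cost):
    """Expected payoff of finishing the product, given predicted P(pass quality)."""
    return p_good * value_if_good + (1.0 - p_good) * value_if_bad - remaining_cost


def should_discard(p_good, value_if_good, value_if_bad, remaining_cost, salvage_value=0.0):
    """Discard now if finishing is expected to be worth less than salvaging."""
    return expected_value_if_continued(
        p_good, value_if_good, value_if_bad, remaining_cost) < salvage_value


def break_even_probability(value_if_good, value_if_bad, remaining_cost, salvage_value=0.0):
    """Smallest p_good at which finishing is at least as good as discarding.

    Assumes value_if_good > value_if_bad.
    """
    return (salvage_value + remaining_cost - value_if_bad) / (value_if_good - value_if_bad)
```

The break-even form makes the trade-off explicit: the more processing remains, the more confident the quality model must be before continuing is worthwhile.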
Don't let the robot discard your resume: Applying for work in an automated world
Company hiring managers are integrating artificial intelligence (AI) systems to automate decisions on job applicants. "Job candidates are often unaware of these methods," warns Cristina Colom, director of Digital Future Society, an agency from Spain that informs citizens about technology issues. Applicant Tracking Systems (ATS) are one of the most widely used types of software in the human resources field. They offer specific solutions for recruiting personnel and cull applications "to leave a manageable number of resumes that meet a position's requirements," explains Carme Roselló, the coordinator of labor market projects at Barcelona Activa. First and foremost, automation allows for "fast screening," says recruitment consultant Juan Gonzalez.
Artificial Bee Colony Algorithm
The Artificial Bee Colony (ABC) algorithm is a swarm intelligence optimization algorithm inspired by the foraging behavior of honey bees searching for the best nectar sources around their hive. Derviş Karaboğa first proposed this algorithm in 2005. It has been used to optimize many complex non-linear functions. As you will soon see, the algorithm relies heavily on randomness, which makes it a good setting for applying better strategies to find the optimal point even faster. You will also notice that when a food source fails to improve after a fixed number of trials, suggesting that it is stuck at a local minimum, the algorithm has a strategy for discarding it and searching elsewhere.
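A compact sketch of the three ABC phases (employed, onlooker, and scout bees) for minimizing a function over a box. It follows the commonly described form of the algorithm, but the parameter defaults, the neighbour-rule details, and the function name `abc_minimize` are illustrative choices, not a reference implementation. The scout phase is where a food source that has failed to improve for more than `limit` trials is discarded.

```python
import random


def abc_minimize(f, dim, bounds, n_sources=20, limit=50, max_iter=200, seed=0):
    """Minimize f over the box [lo, hi]^dim with a basic Artificial Bee Colony."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    def fitness(value):
        # Common ABC fitness transform for minimization (higher is better).
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    sources = [random_source() for _ in range(n_sources)]
    values = [f(x) for x in sources]
    trials = [0] * n_sources  # consecutive failed improvement attempts per source

    def try_neighbour(i):
        # Move one random coordinate of source i relative to a random other source k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        phi = rng.uniform(-1.0, 1.0)
        candidate = list(sources[i])
        candidate[d] += phi * (sources[i][d] - sources[k][d])
        candidate[d] = min(max(candidate[d], lo), hi)  # stay inside the box
        value = f(candidate)
        if value < values[i]:  # greedy selection: keep the better position
            sources[i], values[i], trials[i] = candidate, value, 0
        else:
            trials[i] += 1

    best_x, best_val = min(zip(sources, values), key=lambda p: p[1])
    best_x = list(best_x)
    for _ in range(max_iter):
        # Employed bees: one neighbourhood search per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees: pick sources with probability proportional to fitness.
        weights = [fitness(v) for v in values]
        for _ in range(n_sources):
            try_neighbour(rng.choices(range(n_sources), weights=weights)[0])
        # Remember the best solution before scouts discard anything.
        for x, v in zip(sources, values):
            if v < best_val:
                best_x, best_val = list(x), v
        # Scout bees: discard sources that stopped improving and restart them randomly.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                values[i] = f(sources[i])
                trials[i] = 0
    return best_x, best_val
```

For example, `abc_minimize(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` minimizes a two-dimensional sphere function. The random restarts in the scout phase are what let the colony escape a source that has stalled.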