Pullman
Conformal Margin Risk Minimization: An Envelope Framework for Robust Learning under Label Noise
Shi, Yuanjie, Li, Peihong, Zhang, Zijian, Doppa, Janardhan Rao, Yan, Yan
Most methods for learning with noisy labels require privileged knowledge such as noise transition matrices, clean subsets or pretrained feature extractors, resources typically unavailable when robustness is most needed. We propose Conformal Margin Risk Minimization (CMRM), a plug-and-play envelope framework that improves any classification loss under label noise by adding a single quantile-calibrated regularization term, with no privileged knowledge or training pipeline modification. CMRM measures the confidence margin between the observed label and competing labels, and thresholds it with a conformal quantile estimated per batch to focus training on high-margin samples while suppressing likely mislabeled ones. We derive a learning bound for CMRM under arbitrary label noise requiring only mild regularity of the margin distribution. Across five base methods and six benchmarks with synthetic and real-world noise, CMRM consistently improves accuracy (up to +3.39%), reduces conformal prediction set size (up to -20.44%) and does not hurt under 0% noise, showing that CMRM captures a method-agnostic uncertainty signal that existing mechanisms did not exploit.
Geometric Calibration and Neutral Zones for Uncertainty-Aware Multi-Class Classification
Das, Soumojit, Dasgupta, Nairanjana, Dutta, Prashanta
Modern artificial intelligence systems make critical decisions yet often fail silently when uncertain -- even well-calibrated models provide no mechanism to identify \textit{which specific predictions} are unreliable. We develop a geometric framework addressing both calibration and instance-level uncertainty quantification for neural network probability outputs. Treating probability vectors as points on the $(c-1)$-dimensional probability simplex equipped with the Fisher--Rao metric, we construct: (i) Additive Log-Ratio (ALR) calibration maps that reduce exactly to Platt scaling for binary problems while extending naturally to multi-class settings, and (ii) geometric reliability scores that translate calibrated probabilities into actionable uncertainty measures, enabling principled deferral of ambiguous predictions to human review. Theoretical contributions include: consistency of the calibration estimator at rate $O_p(n^{-1/2})$ via M-estimation theory (Theorem~1), and tight concentration bounds for reliability scores with explicit sub-Gaussian parameters enabling sample size calculations for validation set design (Theorem~2). We conjecture Neyman--Pearson optimality of our neutral zone construction based on connections to Bhattacharyya coefficients. Empirical validation on Adeno-Associated Virus classification demonstrates that the two-stage framework captures 72.5\% of errors while deferring 34.5\% of samples, reducing automated decision error rates from 16.8\% to 6.9\%. Notably, calibration alone yields marginal accuracy gains; the operational benefit arises primarily from the reliability scoring mechanism, which applies to any well-calibrated probability output. This work bridges information geometry and statistical learning, offering formal guarantees for uncertainty-aware classification in applications requiring rigorous validation.
Direct Prediction Set Minimization via Bilevel Conformal Classifier Training
Shi, Yuanjie, Shahrokhi, Hooman, Jia, Xuesong, Chen, Xiongzhi, Doppa, Janardhan Rao, Yan, Yan
Conformal prediction (CP) is a promising uncertainty quantification framework which works as a wrapper around a black-box classifier to construct prediction sets (i.e., subset of candidate classes) with provable guarantees. However, standard calibration methods for CP tend to produce large prediction sets which makes them less useful in practice. This paper considers the problem of integrating conformal principles into the training process of deep classifiers to directly minimize the size of prediction sets. We formulate conformal training as a bilevel optimization problem and propose the {\em Direct Prediction Set Minimization (DPSM)} algorithm to solve it. The key insight behind DPSM is to minimize a measure of the prediction set size (upper level) that is conditioned on the learned quantile of conformity scores (lower level). We analyze that DPSM has a learning bound of $O(1/\sqrt{n})$ (with $n$ training samples), while prior conformal training methods based on stochastic approximation for the quantile has a bound of $Ω(1/s)$ (with batch size $s$ and typically $s \ll \sqrt{n}$). Experiments on various benchmark datasets and deep models show that DPSM significantly outperforms the best prior conformal training baseline with $20.46\%\downarrow$ in the prediction set size and validates our theory.
AIMI: Leveraging Future Knowledge and Personalization in Sparse Event Forecasting for Treatment Adherence
Mamun, Abdullah, Cook, Diane J., Ghasemzadeh, Hassan
Adherence to prescribed treatments is crucial for individuals with chronic conditions to avoid costly or adverse health outcomes. For certain patient groups, intensive lifestyle interventions are vital for enhancing medication adherence. Accurate forecasting of treatment adherence can open pathways to developing an on-demand intervention tool, enabling timely and personalized support. With the increasing popularity of smartphones and wearables, it is now easier than ever to develop and deploy smart activity monitoring systems. However, effective forecasting systems for treatment adherence based on wearable sensors are still not widely available. We close this gap by proposing Adherence Forecasting and Intervention with Machine Intelligence (AIMI). AIMI is a knowledge-guided adherence forecasting system that leverages smartphone sensors and previous medication history to estimate the likelihood of forgetting to take a prescribed medication. A user study was conducted with 27 participants who took daily medications to manage their cardiovascular diseases. We designed and developed CNN and LSTM-based forecasting models with various combinations of input features and found that LSTM models can forecast medication adherence with an accuracy of 0.932 and an F-1 score of 0.936. Moreover, through a series of ablation studies involving convolutional and recurrent neural network architectures, we demonstrate that leveraging known knowledge about future and personalized training enhances the accuracy of medication adherence forecasting. Code available: https://github.com/ab9mamun/AIMI.
Learning Surrogates for Offline Black-Box Optimization via Gradient Matching
Hoang, Minh, Fadhel, Azza, Deshwal, Aryan, Doppa, Janardhan Rao, Hoang, Trong Nghia
Offline design optimization problem arises in numerous science and engineering applications including material and chemical design, where expensive online experimentation necessitates the use of in silico surrogate functions to predict and maximize the target objective over candidate designs. Although these surrogates can be learned from offline data, their predictions are often inaccurate outside the offline data regime. This challenge raises a fundamental question about the impact of imperfect surrogate model on the performance gap between its optima and the true optima, and to what extent the performance loss can be mitigated. Although prior work developed methods to improve the robustness of surrogate models and their associated optimization processes, a provably quantifiable relationship between an imperfect surrogate and the corresponding performance gap, as well as whether prior methods directly address it, remain elusive. To shed light on this important question, we present a theoretical framework to understand offline black-box optimization, by explicitly bounding the optimization quality based on how well the surrogate matches the latent gradient field that underlines the offline data. Inspired by our theoretical analysis, we propose a principled black-box gradient matching algorithm to create effective surrogate models for offline optimization, improving over prior approaches on various real-world benchmarks.
Provably-Stable Neural Network-Based Control of Nonlinear Systems
Li, Anran, Swensen, John P., Hosseinzadeh, Mehdi
In recent years, Neural Networks (NNs) have been employed to control nonlinear systems due to their potential capability in dealing with situations that might be difficult for conventional nonlinear control schemes. However, to the best of our knowledge, the current literature on NN-based control lacks theoretical guarantees for stability and tracking performance. This precludes the application of NN-based control schemes to systems where stringent stability and performance guarantees are required. To address this gap, this paper proposes a systematic and comprehensive methodology to design provably-stable NN-based control schemes for affine nonlinear systems. Rigorous analysis is provided to show that the proposed approach guarantees stability of the closed-loop system with the NN in the loop. Also, it is shown that the resulting NN-based control scheme ensures that system states asymptotically converge to a neighborhood around the desired equilibrium point, with a tunable proximity threshold. The proposed methodology is validated and evaluated via simulation studies on an inverted pendulum and experimental studies on a Parrot Bebop 2 drone.
A Guaranteed-Stable Neural Network Approach for Optimal Control of Nonlinear Systems
Li, Anran, Swensen, John P., Hosseinzadeh, Mehdi
A promising approach to optimal control of nonlinear systems involves iteratively linearizing the system and solving an optimization problem at each time instant to determine the optimal control input. Since this approach relies on online optimization, it can be computationally expensive, and thus unrealistic for systems with limited computing resources. One potential solution to this issue is to incorporate a Neural Network (NN) into the control loop to emulate the behavior of the optimal control scheme. Ensuring stability and reference tracking in the resulting NN-based closed-loop system requires modifications to the primary optimization problem. These modifications often introduce non-convexity and nonlinearity with respect to the decision variables, which may surpass the capabilities of existing solvers and complicate the generation of the training dataset. To address this issue, this paper develops a Neural Optimization Machine (NOM) to solve the resulting optimization problems. The central concept of a NOM is to transform the optimization challenges into the problem of training a NN. Rigorous proofs demonstrate that when a NN trained on data generated by the NOM is used in the control loop, all signals remain bounded and the system states asymptotically converge to a neighborhood around the desired equilibrium point, with a tunable proximity threshold. Simulation and experimental studies are provided to illustrate the effectiveness of the proposed methodology.
Safe and Efficient Robot Action Planning in the Presence of Unconcerned Humans
Amiri, Mohsen, Hosseinzadeh, Mehdi
This paper proposes a robot action planning scheme that provides an efficient and probabilistically safe plan for a robot interacting with an unconcerned human -- someone who is either unaware of the robot's presence or unwilling to engage in ensuring safety. The proposed scheme is predictive, meaning that the robot is required to predict human actions over a finite future horizon; such predictions are often inaccurate in real-world scenarios. One possible approach to reduce the uncertainties is to provide the robot with the capability of reasoning about the human's awareness of potential dangers. This paper discusses that by using a binary variable, so-called danger awareness coefficient, it is possible to differentiate between concerned and unconcerned humans, and provides a learning algorithm to determine this coefficient by observing human actions. Moreover, this paper argues how humans rely on predictions of other agents' future actions (including those of robots in human-robot interaction) in their decision-making. It also shows that ignoring this aspect in predicting human's future actions can significantly degrade the efficiency of the interaction, causing agents to deviate from their optimal paths. The proposed robot action planning scheme is verified and validated via extensive simulation and experimental studies on a LoCoBot WidowX-250.
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
Dhingra, Pratyush, Doppa, Janardhan Rao, Pande, Partha Pratim
Transformer architectures have become the standard neural network model for various machine learning applications including natural language processing and computer vision. However, the compute and memory requirements introduced by transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is a common task to enhance the model's predictive performance on specific tasks/applications. Existing transformer accelerators are oblivious to complexities introduced by fine-tuning. In this paper, we propose the design of a three-dimensional (3D) heterogeneous architecture referred to as Atleus that incorporates heterogeneous computing resources specifically optimized to accelerate transformer models for the dual purposes of fine-tuning and inference. Specifically, Atleus utilizes non-volatile memory and systolic array for accelerating transformer computational kernels using an integrated 3D platform. Moreover, we design a suitable NoC to achieve high performance and energy efficiency. Finally, Atleus adopts an effective quantization scheme to support model compression. Experimental results demonstrate that Atleus outperforms existing state-of-the-art by up to 56x and 64.5x in terms of performance and energy efficiency respectively
Power-Efficient Actuation for Insect-Scale Autonomous Underwater Vehicles
Longwell, Cody R., Trygstad, Conor K., Perez-Arancibia, Nestor O.
We present a new evolution of the Very Little Eel-Inspired roBot, the VLEIBot++, a 900-mg swimmer driven by two 10-mg bare high-work density (HWD) actuators, whose functionality is based on the use of shape-memory alloy (SMA) wires. An actuator of this type consumes an average power of about 40 mW during in-air operation. We integrated onboard power and computation into the VLEIBot++ using a custom-built printed circuit board (PCB) and an 11-mAh 3.7-V 507-mg single-cell lithium-ion (Li-Ion) battery, which in conjunction enable autonomous swimming for about 20 min on a single charge. This robot can swim at speeds of up to 18.7 mm/s (0.46 Bl/s) and is the first subgram microswimmer with onboard power, actuation, and computation developed to date. Unfortunately, the approach employed to actuate VLEIBot++ prototypes is infeasible for underwater applications because a typical 10-mg bare SMA-based microactuator requires an average power on the order of 800 mW when operating underwater. To address this issue, we introduce a new 13-mg power-efficient high-performance SMA-based microactuator that can function with similar power requirements (approx. 80 mW on average) and actuation performance (approx. 3 mm at low frequencies) in air and water. This design is based on the use of a sealed flexible air-capsule that encloses the SMA wires that drive the microactuator with the purpose of passively controlling the heat-transfer rate of the thermal system. Furthermore, this new power-efficient encapsulated actuator requires low voltages of excitation (3 to 4 V) and simple power electronics to function. The breakthroughs presented in this paper represent a path towards the creation of insect-scale autonomous underwater vehicles (AUVs).