Goto

Collaborating Authors

 evaluation time


Fast and Flexible Monotonic Functions with Ensembles of Lattices

Neural Information Processing Systems

For many machine learning problems, there are some inputs that are known to be positively (or negatively) related to the output, and in such cases training the model to respect that monotonic relationship can provide regularization, and makes the model more interpretable. However, flexible monotonic functions are computationally challenging to learn beyond a few features. We break through this barrier by learning ensembles of monotonic calibrated interpolated look-up tables (lattices). A key contribution is an automated algorithm for selecting feature subsets for the ensemble base models. We demonstrate that compared to random forests, these ensembles produce similar or better accuracy, while providing guaranteed monotonicity consistent with prior knowledge, smaller model size and faster evaluation.


positive feedback, and greatly appreciate the critical and constructive suggestions

Neural Information Processing Systems

Thank you for your valuable feedback, which is very helpful in improving the paper. We're encouraged by the broadly "Put this in the context of other work on computational homogenization / multi-scale finite element Our method is related to these and the boundary element method (BEM). "Limitation associated with micro-scale buckling... the coarse-grain behavior might exhibit hysteretic effects": Good "How sensitive is the outer optimization to the accuracy of the surrogate gradients?" "Do you know how the CES method scales with system size in terms of accuracy and evaluation time": In terms of "the method to solve the outer optimization over BCs to find minimum energy solutions to the composed surrogates Free DoFs are optimized to minimize total predicted energy using LBFGS. "The discuss of the surrogate and i.i.d. "Are the BCs shared when a boundary is common between two cells": Y es. We have 1 DoF for each blue point in Fig 2. "Its not clear how the HMC and PDE solver are used together": HMC is used to generate training BCs, preferring larger The PDE solver is used to compute the gradient of the pdf (which depends on E) w.r.t. the BC. Given BCs, we run the solver to determine the internal u and E. We compute dE/dBC with the Then we use this to compute the gradient of the pdf w.r.t. the BCs, needed for the leapfrog step. "does the HMC require a significant burn-in time before producing reasonable samples": No. Note: we don't truly care Per appendix, HMC took between 3 and 100 leapfrog steps per sample. The process of using the surrogates to solve the original problem can be explained in more detail. Newton method is neither the fast nor the most stable... a comparison with more sophisticated methods would be From a brief look it looks like Liu et al's method is tailored for Reviewer 5: "There is one outlier in L2 compression that was quite bad": We will discuss this in the main paper. "A comment might help the reader situate this work within the more usual (less idyllic) context of approximating This is a good suggestion: we will relate to other work in learning energies.


XAutoLM: Efficient Fine-Tuning of Language Models via Meta-Learning and AutoML

arXiv.org Artificial Intelligence

Experts in machine learning leverage domain knowledge to navigate decisions in model selection, hyperparameter optimization, and resource allocation. This is particularly critical for fine-tuning language models (LMs), where repeated trials incur substantial computational overhead and environmental impact. However, no existing automated framework simultaneously tackles the entire model selection and hyperparameter optimization (HPO) task for resource-efficient LM fine-tuning. We introduce XAutoLM, a meta-learning-augmented AutoML framework that reuses past experiences to optimize discriminative and generative LM fine-tuning pipelines efficiently. XAutoLM learns from stored successes and failures by extracting task- and system-level meta-features to bias its sampling toward valuable configurations and away from costly dead ends. On four text classification and two question-answering benchmarks, XAutoLM surpasses zero-shot optimizer's peak F1 on five of six tasks, cuts mean evaluation time of pipelines by up to 4.5x, reduces search error ratios by up to sevenfold, and uncovers up to 50% more pipelines above the zero-shot Pareto front. In contrast, simpler memory-based baselines suffer negative transfer. We release XAutoLM and our experience store to catalyze resource-efficient, Green AI fine-tuning in the NLP community.




Automated Heuristic Design for Unit Commitment Using Large Language Models

arXiv.org Artificial Intelligence

The Unit Commitment (UC) problem is a classic challenge in the optimal scheduling of power systems. Years of research and practice have shown that formulating reasonable unit commitment plans can significantly improve the economic efficiency of power systems' operations. In recent years, with the introduction of technologies such as machine learning and the Lagrangian relaxation method, the solution methods for the UC problem have become increasingly diversified, but still face challenges in terms of accuracy and robustness. This paper proposes a Function Space Search (FunSearch) method based on large language models. This method combines pre-trained large language models and evaluators to creatively generate solutions through the program search and evolution process while ensuring their rationality. In simulation experiments, a case of unit commitment with \(10\) units is used mainly. Compared to the genetic algorithm, the results show that FunSearch performs better in terms of sampling time, evaluation time, and total operating cost of the system, demonstrating its great potential as an effective tool for solving the UC problem.


ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors

arXiv.org Artificial Intelligence

Bayesian Optimization (BO) has shown promise in tuning processor design parameters. However, standard BO does not support constraints involving categorical parameters such as types of branch predictors and division circuits. In addition, optimization time of BO grows with processor complexity, which becomes increasingly significant especially for FPGA-based soft processors. This paper introduces ASPO, an approach that leverages disjunctive form to enable BO to handle constraints involving categorical parameters. Unlike existing methods that directly apply standard BO, the proposed ASPO method, for the first time, customizes the mathematical mechanism of BO to address challenges faced by soft-processor designs on FPGAs. Specifically, ASPO supports categorical parameters using a novel customized BO covariance kernel. It also accelerates the design evaluation procedure by penalizing the BO acquisition function with potential evaluation time and by reusing FPGA synthesis checkpoints from previously evaluated configurations. ASPO targets three soft processors: RocketChip, BOOM, and EL2 VeeR. The approach is evaluated based on seven RISC-V benchmarks. Results show that ASPO can reduce execution time for the ``multiply'' benchmark on the BOOM processor by up to 35\% compared to the default configuration. Furthermore, it reduces design time for the BOOM processor by up to 74\% compared to Boomerang, a state-of-the-art hardware-oriented BO approach.


Prompting Decision Transformers for Zero-Shot Reach-Avoid Policies

arXiv.org Artificial Intelligence

Offline goal-conditioned reinforcement learning methods have shown promise for reach-avoid tasks, where an agent must reach a target state while avoiding undesirable regions of the state space. Existing approaches typically encode avoid-region information into an augmented state space and cost function, which prevents flexible, dynamic specification of novel avoid-region information at evaluation time. They also rely heavily on well-designed reward and cost functions, limiting scalability to complex or poorly structured environments. We introduce RADT, a decision transformer model for offline, reward-free, goal-conditioned, avoid region-conditioned RL. RADT encodes goals and avoid regions directly as prompt tokens, allowing any number of avoid regions of arbitrary size to be specified at evaluation time. Using only suboptimal offline trajectories from a random policy, RADT learns reach-avoid behavior through a novel combination of goal and avoid-region hindsight relabeling. We benchmark RADT against 3 existing offline goal-conditioned RL models across 11 tasks, environments, and experimental settings. RADT generalizes in a zero-shot manner to out-of-distribution avoid region sizes and counts, outperforming baselines that require retraining. In one such zero-shot setting, RADT achieves 35.7% improvement in normalized cost over the best retrained baseline while maintaining high goal-reaching success. We apply RADT to cell reprogramming in biology, where it reduces visits to undesirable intermediate gene expression states during trajectories to desired target states, despite stochastic transitions and discrete, structured state dynamics.


Adaptive Machine Learning for Resource-Constrained Environments

arXiv.org Artificial Intelligence

The Internet of Things is an example domain where data is perpetually generated in ever-increasing quantities, reflecting the proliferation of connected devices and the formation of continuous data streams over time. Consequently, the demand for ad-hoc, cost-effective machine learning solutions must adapt to this evolving data influx. This study tackles the task of offloading in small gateways, exacerbated by their dynamic availability over time. An approach leveraging CPU utilization metrics using online and continual machine learning techniques is proposed to predict gateway availability. These methods are compared to popular machine learning algorithms and a recent time-series foundation model, Lag-Llama, for fine-tuned and zero-shot setups. Their performance is benchmarked on a dataset of CPU utilization measurements over time from an IoT gateway and focuses on model metrics such as prediction errors, training and inference times, and memory consumption. Our primary objective is to study new efficient ways to predict CPU performance in IoT environments. Across various scenarios, our findings highlight that ensemble and online methods offer promising results for this task in terms of accuracy while maintaining a low resource footprint.


Adaptive Hoeffding Tree with Transfer Learning for Streaming Synchrophasor Data Sets

arXiv.org Artificial Intelligence

Synchrophasor technology or phasor measurement units (PMUs) are known to detect multiple type of oscillations or faults better than Supervisory Control and Data Acquisition (SCADA) systems, but the volume of Bigdata (e.g., 30-120 samples per second on a single PMU) generated by these sensors at the aggregator level (e.g., several PMUs) requires special handling. Conventional machine learning or data mining methods are not suitable to handle such larger streaming realtime data. This is primarily due to latencies associated with cloud environments (e.g., at an aggregator or PDC level), and thus necessitates the need for local computing to move the data on the edge (or locally at the PMU level) for processing. This requires faster real-time streaming algorithms to be processed at the local level (e.g., typically by a Field Programmable Gate Array (FPGA) based controllers). This paper proposes a transfer learning-based hoeffding tree with ADWIN (THAT) method to detect anomalous synchrophasor signatures. The proposed algorithm is trained and tested with the OzaBag method. The preliminary results with transfer learning indicate that a computational time saving of 0.7ms is achieved with THAT algorithm (0.34ms) over Ozabag (1.04ms), while the accuracy of both methods in detecting fault events remains at 94% for four signatures.