For regression, we model output noise as a zero-mean Gaussian N(0, σ²), where σ² is the variance of the noise, treated as a hyperparameter. Neal [21] shows that in the regression setting, the isotropic Gaussian prior for a BNN with a single hidden layer approaches a Gaussian process prior as the number of hidden units tends to infinity, so long as the chosen activation function is bounded. We will use this prior in the baseline BNN for our experiments. In the context of BNNs, our Markov chain is a sequence of random parameters W(1), W(2), ... defined over W, which we construct by defining the transition kernel. BBB is scalable and fast, and therefore can be applied to high-dimensional and large datasets in real-life applications.
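As a minimal sketch (not the supplemental's implementation), the Gaussian noise model above corresponds to the following per-point log-likelihood; the function name is illustrative:

```python
import math

def gaussian_log_likelihood(y, y_pred, sigma2):
    """Log-density of observation y under N(y_pred, sigma2), matching the
    zero-mean Gaussian noise model above; sigma2 is the noise-variance
    hyperparameter."""
    return -0.5 * math.log(2 * math.pi * sigma2) - (y - y_pred) ** 2 / (2 * sigma2)

# With sigma2 = 1 and a perfect prediction, only the Gaussian
# normalizing constant -0.5 * log(2*pi) remains.
print(gaussian_log_likelihood(0.0, 0.0, 1.0))
```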
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Asia > Middle East > Israel (0.05)
ProtoTS: Learning Hierarchical Prototypes for Explainable Time Series Forecasting
Peng, Ziheng, Ren, Shijie, Gu, Xinyue, Yang, Linxiao, Wang, Xiting, Sun, Liang
While deep learning has achieved impressive performance in time series forecasting, it becomes increasingly crucial to understand its decision-making process for building trust in high-stakes scenarios. Existing interpretable models often provide only local and partial explanations, lacking the capability to reveal how heterogeneous and interacting input variables jointly shape the overall temporal patterns in the forecast curve. We propose ProtoTS, a novel interpretable forecasting framework that achieves both high accuracy and transparent decision-making through modeling prototypical temporal patterns. ProtoTS computes instance-prototype similarity based on a denoised representation that preserves abundant heterogeneous information. The prototypes are organized hierarchically to capture global temporal patterns with coarse prototypes while capturing finer-grained local variations with detailed prototypes, enabling expert steering and multi-level interpretability. Experiments on multiple realistic benchmarks, including a newly released LOF dataset, show that ProtoTS not only exceeds existing methods in forecast accuracy but also delivers expert-steerable interpretations for better model understanding and decision support. Time series forecasting has been widely applied in high-stakes scenarios such as load forecasting (Jiang et al., 2024; Yang et al., 2023), energy management (Deb et al., 2017; Weron, 2014), and weather prediction (Angryk et al., 2020; Karevan & Suykens, 2020), all of which involve considerable financial impacts. In these applications, while achieving high forecast accuracy is crucial, understanding why and how the model makes specific predictions is equally important. It aids in preventing substantial financial losses and building the necessary trust (Rojat et al., 2021).
A range of explainable time series forecasting methods have been developed to simultaneously ensure interpretability and good predictive performance (Oreshkin et al., 2019; Lim et al., 2021; Zhao et al., 2024; Lin et al., 2024). However, their overall interpretability and potential for further performance improvement are limited, since they mainly provide local, partial explanations on both the output and input sides. C1: On the output side, existing methods (Lim et al., 2021; Zhao et al., 2024) mainly explain the prediction at individual time steps, lacking the ability to help users quickly interpret the reasons behind the overall trend in the forecast curve. For each instance, the model computes its similarity to all prototypes to form a prediction, enabling detailed local interpretation.
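The instance-prototype similarity mechanism described above can be sketched as follows. This is a hypothetical illustration, not ProtoTS's exact design: the cosine similarity, softmax weighting, and all names here are assumptions.

```python
import numpy as np

def prototype_prediction(z, prototypes, readouts):
    """Illustrative prototype-based forecasting: compute instance-prototype
    similarities on a (denoised) representation z, then form the forecast
    as a similarity-weighted mix of per-prototype forecast patterns.
    The returned weights double as a local explanation."""
    # Cosine similarity between the instance representation and each prototype.
    sims = prototypes @ z / (np.linalg.norm(prototypes, axis=1) * np.linalg.norm(z) + 1e-8)
    weights = np.exp(sims) / np.exp(sims).sum()  # softmax over prototypes
    return weights @ readouts, weights           # forecast and its explanation

rng = np.random.default_rng(0)
z = rng.normal(size=8)                # denoised instance representation
prototypes = rng.normal(size=(4, 8))  # 4 learned prototypes
readouts = rng.normal(size=(4, 12))   # each prototype's 12-step forecast pattern
forecast, weights = prototype_prediction(z, prototypes, readouts)
```

Because the forecast is a convex combination of prototype patterns, inspecting the few largest weights tells a user which learned temporal patterns drove the overall curve.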
- North America > United States > Pennsylvania (0.04)
- North America > United States > New Jersey (0.04)
- North America > United States > Maryland (0.04)
- (6 more...)
TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them
Wang, Yidong, Song, Yunze, Zhu, Tingyuan, Zhang, Xuanwang, Yu, Zhuohao, Chen, Hao, Song, Chiyu, Wang, Qiufeng, Wang, Cunxiang, Wu, Zhen, Dai, Xinyu, Zhang, Yue, Ye, Wei, Zhang, Shikun
The adoption of Large Language Models (LLMs) as automated evaluators (LLM-as-a-judge) has revealed critical inconsistencies in current evaluation frameworks. We identify two fundamental types of inconsistencies: (1) Score-Comparison Inconsistency, where lower-rated responses outperform higher-scored ones in pairwise comparisons, and (2) Pairwise Transitivity Inconsistency, manifested through circular preference chains (A>B>C>A) and equivalence contradictions (A=B=C≠A). We argue that these issues come from information loss in discrete rating systems and ambiguous tie judgments during pairwise evaluation. We propose TrustJudge, a probabilistic framework that addresses these limitations through two key innovations: 1) distribution-sensitive scoring that computes continuous expectations from discrete rating probabilities, preserving information entropy for more precise scoring, and 2) likelihood-aware aggregation that resolves transitivity violations using bidirectional preference probabilities or perplexity. We also formalize the theoretical limitations of current LLM-as-a-judge frameworks and demonstrate how TrustJudge's components overcome them. When evaluated with Llama-3.1-70B-Instruct as judge using our dataset, TrustJudge reduces Score-Comparison inconsistency by 8.43% (from 23.32% to 14.89%) and Pairwise Transitivity inconsistency by 10.82% (from 15.22% to 4.40%), while maintaining higher evaluation accuracy. Our work provides the first systematic analysis of evaluation framework inconsistencies in LLM-as-a-judge paradigms, offering both theoretical insights and practical solutions for reliable automated assessment. The framework demonstrates consistent improvements across various model architectures and scales, enabling more trustworthy LLM evaluation without requiring additional training or human annotations. The codes can be found at https://github.com/TrustJudge/TrustJudge.
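The distribution-sensitive scoring idea above, taking an expectation over the judge's discrete rating distribution instead of its argmax rating, can be sketched in a few lines. The 1-5 scale is an illustrative assumption; TrustJudge's actual probability extraction is not shown here.

```python
def expected_score(rating_probs):
    """Distribution-sensitive scoring: compute the expectation of a discrete
    rating distribution (index i maps to rating i+1), preserving the
    information in the judge's uncertainty rather than collapsing it."""
    ratings = range(1, len(rating_probs) + 1)
    return sum(r * p for r, p in zip(ratings, rating_probs))

# A judge split evenly between ratings 3 and 4 yields a continuous 3.5,
# rather than an arbitrary discrete tie-break.
print(expected_score([0.0, 0.0, 0.5, 0.5, 0.0]))  # -> 3.5
```

Two responses that would both round to the same discrete rating can still receive distinct continuous scores, which is what reduces the score-comparison inconsistency described above.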
A Bayesian Inference over Neural Networks
For a supervised model parameterized by W, we seek to infer the conditional distribution W | D.
The prior and likelihood are both modelling choices.
A.1 Likelihoods for BNNs
The likelihood is purely a function of the model prediction Φ. As exact posterior inference via (11) is intractable, we instead rely on approximate inference algorithms, which can be broadly grouped into two classes based on their method of approximation. A concrete label can be obtained by choosing the class with the highest output value. The Gaussian variational family is a common choice. Estimators for the integral in (15) are necessary.
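A standard estimator for intractable variational integrals like (15) is a reparameterized Monte Carlo estimate of the ELBO under a mean-field Gaussian variational family. The sketch below is generic, not this document's exact estimator; all names are illustrative, and `log_joint(w)` stands for log p(D, W).

```python
import numpy as np

def mc_elbo(mu, log_sigma, log_joint, n_samples=16, rng=None):
    """Monte Carlo ELBO estimate under q(W) = N(mu, diag(sigma^2)):
    average log p(D, W) - log q(W) over reparameterized samples
    W = mu + sigma * eps, eps ~ N(0, I)."""
    rng = rng or np.random.default_rng(0)
    sigma = np.exp(log_sigma)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.normal(size=mu.shape)
        w = mu + sigma * eps                      # reparameterized sample
        log_q = np.sum(-0.5 * np.log(2 * np.pi)
                       - log_sigma - 0.5 * eps ** 2)  # log q(w)
        total += log_joint(w) - log_q
    return total / n_samples
```

The reparameterization keeps the estimator differentiable in (mu, log_sigma), which is what makes gradient-based variational inference such as BBB practical.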
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Asia > Middle East > Israel (0.04)
- Health & Medicine (0.94)
- Banking & Finance (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.65)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Singapore (0.04)
- Information Technology > Data Science > Data Mining (0.94)
- Information Technology > Communications > Networks (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Efficient Mitigation of Bus Bunching through Setter-Based Curriculum Learning
Shah, Avidan, Tran, Danny, Tang, Yuhan
Curriculum learning has been growing in the domain of reinforcement learning as a method of improving training efficiency for various tasks. It involves modifying the difficulty (lessons) of the environment as the agent learns, in order to encourage more optimal agent behavior and higher-reward states. However, most current curriculum learning methods involve discrete curriculum transitions, steps predefined by the programmer, or automatic curriculum learning applied to only a small subset of training, such as only to an adversary. In this paper, we propose a novel approach to curriculum learning that uses a Setter Model to automatically generate an action space, adversary strength, initialization, and bunching strength. Transportation and traffic optimization is a well-known area of study, especially for reinforcement learning based solutions. We specifically look at the bus bunching problem for the context of this study. The main idea of the problem is to minimize the delays caused by inefficient bus timings for passengers arriving at and departing from a system of buses. While the heavy exploration in the area makes performance innovation and improvement marginal, it simultaneously provides an effective baseline for developing new generalized techniques. Our group is particularly interested in examining curriculum learning and its effect on training efficiency and overall performance. We try a lesser-known approach to curriculum learning, in which the curriculum is neither fixed nor discretely thresholded. Our method for automated curriculum learning involves a curriculum that is dynamically chosen and learned by an adversary network made to increase the difficulty of the agent's training, and defined by multiple forms of input. Our results are shown in the following sections of this paper.
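To make the non-discrete curriculum idea concrete, here is a hypothetical rule-based sketch. The paper's Setter is a learned model; this version only illustrates continuously nudging difficulty parameters (e.g. adversary strength, bunching strength, both names taken from the abstract) in response to the agent's recent reward, and the target and step values are assumptions.

```python
def update_curriculum(params, mean_reward, target=0.7, step=0.05):
    """Hypothetical continuous curriculum update: raise each difficulty
    parameter when the agent's recent mean reward exceeds a target,
    lower it otherwise, clipping to [0, 1]. No discrete lesson
    thresholds are involved."""
    direction = 1 if mean_reward > target else -1
    return {k: min(1.0, max(0.0, v + direction * step)) for k, v in params.items()}

params = {"adversary_strength": 0.2, "bunching_strength": 0.1}
params = update_curriculum(params, mean_reward=0.9)  # agent doing well -> harder
```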
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.48)
- Instructional Material > Course Syllabus & Notes (0.46)
- Education (0.93)
- Transportation > Passenger (0.52)
- Transportation > Ground > Road (0.47)
- Transportation > Infrastructure & Services (0.47)
Investigating the Impact of Choice on Deep Reinforcement Learning for Space Controls
Hamilton, Nathaniel, Dunlap, Kyle, Hobbs, Kerianne L.
For many space applications, traditional control methods are often used during operation. However, as the number of space assets continues to grow, autonomous operation can enable rapid development of control methods for different space related tasks. One method of developing autonomous control is Reinforcement Learning (RL), which has become increasingly popular after demonstrating promising performance and success across many complex tasks. While it is common for RL agents to learn bounded continuous control values, this may not be realistic or practical for many space tasks that traditionally prefer an on/off approach for control. This paper analyzes the use of discrete action spaces, where the agent must choose from a predefined list of actions. The experiments explore how the number of choices provided to the agents affects their measured performance during and after training. This analysis is conducted for an inspection task, where the agent must circumnavigate an object to inspect points on its surface, and a docking task, where the agent must move into proximity of another spacecraft and "dock" with a low relative speed. A common objective of both tasks, and most space tasks in general, is to minimize fuel usage, which motivates the agent to regularly choose an action that uses no fuel. Our results show that a limited number of discrete choices leads to optimal performance for the inspection task, while continuous control leads to optimal performance for the docking task.
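One simple way to build the predefined action list described above is to evenly discretize a bounded thrust command. This sketch is illustrative, not the paper's implementation; the function name and the 1-D setting are assumptions.

```python
import numpy as np

def make_thrust_actions(n_choices, max_thrust=1.0):
    """Discretize a 1-D thrust command into n_choices evenly spaced values
    spanning [-max_thrust, +max_thrust]. An odd n_choices guarantees an
    exact zero-thrust action, which matters when the agent is rewarded
    for minimizing fuel use."""
    return np.linspace(-max_thrust, max_thrust, n_choices)

actions = make_thrust_actions(5)  # [-1., -0.5, 0., 0.5, 1.]
```

Varying `n_choices` is exactly the knob the experiments above sweep: too few choices limits maneuverability, while many choices approaches the continuous-control case.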
- North America > United States > Ohio > Greene County > Beavercreek (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Montana (0.04)
- Government > Military > Air Force (0.46)
- Government > Regional Government (0.46)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Singapore (0.04)
- Telecommunications > Networks (0.34)
- Information Technology > Networks (0.34)
Gaussian Mixture Models for Affordance Learning using Bayesian Networks
Osório, Pedro, Bernardino, Alexandre, Martinez-Cantin, Ruben, Santos-Victor, José
Affordances are fundamental descriptors of relationships between actions, objects and effects. They provide the means whereby a robot can predict effects, recognize actions, select objects and plan its behavior according to desired goals. This paper approaches the problem of an embodied agent exploring the world and learning these affordances autonomously from its sensory experiences. Models exist for learning the structure and the parameters of a Bayesian Network encoding this knowledge. Although Bayesian Networks are capable of dealing with uncertainty and redundancy, previous work considered complete observability of the discrete sensory data, which may lead to hard errors in the presence of noise. In this paper we consider a probabilistic representation of the sensors by Gaussian Mixture Models (GMMs) and explicitly take into account the probability distribution contained in each discrete affordance concept, which can lead to more accurate learning.
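The replacement of hard sensor discretization with a probabilistic one can be sketched for a 1-D sensor: a Gaussian mixture assigns each discrete affordance symbol a responsibility p(component | x) rather than a single winning label. All parameter names here are illustrative, not the paper's notation.

```python
import numpy as np

def soft_assignment(x, means, variances, weights):
    """Posterior responsibilities of a 1-D Gaussian mixture for reading x:
    evaluate each weighted component density, then normalize. Downstream
    Bayesian Network learning can consume this distribution instead of a
    hard-discretized symbol."""
    dens = (weights * np.exp(-0.5 * (x - means) ** 2 / variances)
            / np.sqrt(2 * np.pi * variances))
    return dens / dens.sum()  # posterior over mixture components

# A reading halfway between two components is split ~50/50 rather than
# forced into one symbol, so the uncertainty survives into learning.
p = soft_assignment(0.5, means=np.array([0.0, 1.0]),
                    variances=np.array([0.1, 0.1]), weights=np.array([0.5, 0.5]))
```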
- Europe > Portugal > Lisbon > Lisbon (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)