
Predicting Future Actions of Reinforcement Learning Agents

Neural Information Processing Systems

As reinforcement learning agents become increasingly deployed in real-world scenarios, predicting future agent actions and events during deployment is important for facilitating better human-agent interaction and preventing catastrophic outcomes. This paper experimentally evaluates and compares the effectiveness of future action and event prediction for three types of RL agents: explicitly planning, implicitly planning, and non-planning. We employ two approaches: an inner-state approach, which predicts from the agents' internal computations (e.g., plans or neuron activations), and a simulation-based approach, which unrolls the agent in a learned world model. Our results show that the plans of explicitly planning agents are significantly more informative for prediction than the neuron activations of the other agent types. Furthermore, using internal plans proves more robust to model quality than simulation-based approaches when predicting actions, while the results for event prediction are more mixed. These findings highlight the benefits of leveraging inner states and simulations to predict future agent actions and events, thereby improving interaction and safety in real-world deployments.
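
To make the simulation-based approach concrete, here is a minimal Python sketch of the unrolling loop it describes: the deployed agent's policy is queried inside a learned world model for a fixed horizon. The WorldModel and Agent interfaces and the function name are hypothetical, not taken from the paper's code.

from typing import List, Protocol


class WorldModel(Protocol):
    def step(self, state, action):
        """Predict the next state given the current state and action."""
        ...


class Agent(Protocol):
    def act(self, state):
        """Return the action the agent's policy would take in this state."""
        ...


def predict_future_actions(agent: Agent, model: WorldModel, state, horizon: int) -> List:
    """Unroll the agent inside the learned world model for `horizon` steps."""
    actions = []
    for _ in range(horizon):
        action = agent.act(state)          # query the deployed agent's policy
        actions.append(action)
        state = model.step(state, action)  # imagine the next state with the model
    return actions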


Empirical Likelihood for Contextual Bandits

Neural Information Processing Systems

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting. To this end, we apply empirical likelihood techniques to formulate our estimator and confidence interval as simple convex optimization problems. Using the lower bound of our confidence interval, we then propose an off-policy policy optimization algorithm that searches for policies with a large reward lower bound. We empirically find that both our estimator and confidence interval improve on previous proposals in finite-sample regimes. Finally, the policy optimization algorithm we propose outperforms a strong baseline system for learning from off-policy data.
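
As a rough illustration of the empirical-likelihood machinery the abstract refers to (not the paper's exact estimator or interval): the profile likelihood over importance-weighted rewards z_i is itself a small convex program, and the confidence set keeps every candidate value whose log-likelihood ratio clears an asymptotic chi-square threshold. The sketch below assumes numpy, cvxpy, and scipy; the function names are illustrative.

import numpy as np
import cvxpy as cp
from scipy.stats import chi2

def el_log_ratio(z: np.ndarray, v: float) -> float:
    """Profile empirical log-likelihood ratio: max of sum(log(n * w_i)) over the
    simplex, subject to the moment constraint E_w[z] = v."""
    n = len(z)
    w = cp.Variable(n)
    problem = cp.Problem(
        cp.Maximize(cp.sum(cp.log(n * w))),
        [w >= 0, cp.sum(w) == 1, z @ w == v],
    )
    problem.solve()
    return problem.value if problem.value is not None else -np.inf

def el_confidence_interval(z: np.ndarray, alpha: float = 0.05, grid: int = 100):
    """Keep every candidate value v whose log-likelihood ratio clears the
    chi-square threshold; z_i stand for importance-weighted rewards."""
    threshold = -chi2.ppf(1 - alpha, df=1) / 2.0
    candidates = np.linspace(z.min(), z.max(), grid)
    inside = [v for v in candidates if el_log_ratio(z, v) >= threshold]
    return (min(inside), max(inside)) if inside else None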


Exploiting Descriptive Completeness Prior for Cross Modal Hashing with Incomplete Labels

Neural Information Processing Systems

In this paper, we tackle the challenge of generating high-quality hash codes for cross-modal retrieval in the presence of incomplete labels, which creates uncertainty in distinguishing between positive and negative pairs. Vision-language models such as CLIP offer a potential solution by providing generic knowledge for missing label recovery, yet their zero-shot performance remains insufficient. To address this, we propose a novel Prompt Contrastive Recovery approach, PCRIL, which progressively identifies promising positive classes from unknown label sets and recursively searches for other relevant labels. Identifying unknowns is nontrivial due to the fixed and long-tailed patterns of positive label sets in training data, which hamper the discovery of new label combinations. Therefore, we consider each subset of positive labels and construct three types of negative prompts through deletion, addition, and replacement for prompt learning. The augmented supervision guides the model to measure the completeness of label sets, thus facilitating the subsequent greedy tree search for label completion. We also address extreme cases with a large share of unknown labels and a lack of negative pairwise supervision by deriving two augmentation strategies: seeking unknown-complementary samples for mixup and random flipping for negative labels. Extensive experiments reveal the vulnerability of current methods and demonstrate the effectiveness of PCRIL, achieving an average 12% mAP improvement over the current SOTA across all datasets. Our code is available at github.com/E-Galois/PCRIL.
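
As an illustration of the deletion/addition/replacement constructions described above, here is a hedged Python sketch; the helper name, label handling, and sampling choices are illustrative and not taken from the PCRIL implementation.

import random
from typing import Dict, Set

def make_negative_label_sets(positives: Set[str], vocabulary: Set[str]) -> Dict[str, Set[str]]:
    """Perturb a positive label set three ways to build negative prompts."""
    outside = sorted(vocabulary - positives)   # labels not in the positive set
    inside = sorted(positives)
    negatives: Dict[str, Set[str]] = {}
    if inside:                                 # deletion: drop one positive label
        negatives["deletion"] = positives - {random.choice(inside)}
    if outside:                                # addition: inject an irrelevant label
        negatives["addition"] = positives | {random.choice(outside)}
    if inside and outside:                     # replacement: swap one for the other
        negatives["replacement"] = (positives - {random.choice(inside)}) | {random.choice(outside)}
    return negatives

# e.g. make_negative_label_sets({"person", "dog"}, {"person", "dog", "car", "tree"})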


Automatically Learning Compact Quality-aware Surrogates for Optimization Problems (Bryan Wilder and Milind Tambe, Harvard University, Cambridge, MA)

Neural Information Processing Systems

Solving optimization problems with unknown parameters often requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values. Recent work has shown that including the optimization problem as a layer in the model training pipeline results in predictions of the unobserved parameters that lead to higher decision quality. Unfortunately, this process comes at a large computational cost because the optimization problem must be solved and differentiated through in each training iteration; furthermore, it may sometimes fail to improve solution quality due to non-smoothness issues that arise when training through a complex optimization layer. To address these shortcomings, we learn a low-dimensional surrogate model of a large optimization problem by representing the feasible space in terms of meta-variables, each of which is a linear combination of the original variables. By training a low-dimensional surrogate model end-to-end, and jointly with the predictive model, we achieve: i) a large reduction in training and inference time; and ii) improved performance by focusing attention on the more important variables in the optimization and learning in a smoother space. Empirically, we demonstrate these improvements on a non-convex adversary modeling task, a submodular recommendation task, and a convex portfolio optimization task.
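
The reparameterization at the heart of this approach can be sketched in a few lines: decisions are optimized over a handful of meta-variables y, and the decision in the original space is x = Py. The PyTorch snippet below uses a toy objective and a fixed P for brevity; in the paper, P is learned end-to-end with the predictive model.

import torch

n_vars, n_meta = 1000, 10
# P maps meta-variables to original decision variables (x = P @ y). It is fixed
# here for brevity; in the paper it is learned jointly with the predictive model.
P = torch.randn(n_vars, n_meta)

def surrogate_decision(theta: torch.Tensor, steps: int = 100, lr: float = 0.1) -> torch.Tensor:
    """Solve a toy placeholder objective over the low-dimensional meta-variables."""
    y = torch.zeros(n_meta, requires_grad=True)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x = P @ y                                   # decision in the original space
        loss = -(theta @ x) + 0.5 * (x ** 2).sum()  # toy utility with quadratic penalty
        loss.backward()
        opt.step()
    return (P @ y).detach()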


Author Response to Reviewer #1

Neural Information Processing Systems

We thank the reviewer for the constructive feedback. We will make the suggested clarifications and fix the typos. The framework of the paper uses the model to improve the reparameterization directly; using covariates as an alternative objective to optimize, as the reviewer suggests, is an interesting extension and future direction to explore.


Supplementary material for CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

Neural Information Processing Systems

The dataset consists of 5 TSV files, each corresponding to a different task. Each file contains a "Prompt" column used to pose questions to the LLM, and most files also include a "GT" column with the ground-truth answer. The dataset includes URLs indicating the sources from which the data was collected. A permanent DOI identifier is associated with the dataset: AI4Sec (2024).
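
Given the layout described above, a minimal loading sketch might look as follows; the per-task file names are hypothetical (they are not listed here), and pandas is assumed.

import pandas as pd

# Hypothetical per-task file names; the actual names are not listed above.
task_files = ["task_1.tsv", "task_2.tsv", "task_3.tsv", "task_4.tsv", "task_5.tsv"]

for path in task_files:
    df = pd.read_csv(path, sep="\t")
    prompts = df["Prompt"]                    # questions posed to the LLM
    truth = df["GT"] if "GT" in df else None  # ground truth, where provided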


CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence (Dipkamal Bhusal, Rochester Institute of Technology, Rochester, NY, USA)

Neural Information Processing Systems

Cyber threat intelligence (CTI) is crucial in today's cybersecurity landscape, providing essential insights to understand and mitigate the ever-evolving cyber threats. The recent rise of Large Language Models (LLMs) has shown potential in this domain, but concerns about their reliability, accuracy, and hallucinations persist. While existing benchmarks provide general evaluations of LLMs, there are no benchmarks that address the practical and applied aspects of CTI-specific tasks. To bridge this gap, we introduce CTIBench, a benchmark designed to assess LLMs' performance in CTI applications. CTIBench includes multiple datasets focused on evaluating knowledge acquired by LLMs in the cyber-threat landscape. Our evaluation of several state-of-the-art models on these tasks provides insights into their strengths and weaknesses in CTI contexts, contributing to a better understanding of LLM capabilities in CTI.


Appendices

Neural Information Processing Systems

The supplementary material is organized as follows. We first discuss additional related work and provide experiment details in Section 2 and Appendix B, respectively. In Appendix C, we provide additional experiments to further validate the extreme nature of Simplicity Bias (SB). Then, in Appendix D, we provide additional information about the experiment setup used to show that extreme SB can hurt generalization. We evaluate the extent to which ensemble methods and adversarial training mitigate SB in Appendix E. Finally, we provide the proof of Theorem 1 in Appendix F. In this section, we provide a more thorough discussion of relevant work on margin-based generalization bounds, adversarial attacks and robustness, and out-of-distribution (OOD) examples. Margin-based generalization bounds: Building on the classical work of [3], recent works aim to obtain tighter generalization bounds for neural networks in terms of the normalized margin [4, 50, 18, 22]. Here, the margin is defined as the difference between the probability assigned to the true label and the largest probability among the incorrect labels.
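
That margin definition translates directly into code; the following numpy sketch computes it per example (the normalization by network norms used in the cited bounds is omitted).

import numpy as np

def margins(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """probs: (n, k) predicted class probabilities; labels: (n,) true class ids.
    Returns p(true label) - max over incorrect labels, per example."""
    idx = np.arange(len(labels))
    true_p = probs[idx, labels]
    masked = probs.copy()
    masked[idx, labels] = -np.inf  # exclude the true class from the max
    return true_p - masked.max(axis=1)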


The Pitfalls of Simplicity Bias in Neural Networks

Neural Information Processing Systems

Several works have proposed Simplicity Bias (SB)--the tendency of standard training procedures such as Stochastic Gradient Descent (SGD) to find simple models--to justify why neural networks generalize well [1, 49, 74]. However, the precise notion of simplicity remains vague. Furthermore, previous settings [67, 24] that use SB to justify why neural networks generalize well do not simultaneously capture the non-robustness of neural networks--a widely observed phenomenon in practice [71, 36]. We attempt to reconcile SB and the superior standard generalization of neural networks with the non-robustness observed in practice by introducing piecewise-linear and image-based datasets, which (a) incorporate a precise notion of simplicity, (b) comprise multiple predictive features with varying levels of simplicity, and (c) capture the non-robustness of neural networks trained on real data. Through theoretical analysis and targeted experiments on these datasets, we make four observations: (i) SB of SGD and variants can be extreme: neural networks can exclusively rely on the simplest feature and remain invariant to all predictive complex features.
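
As a rough illustration (not the paper's exact construction), a dataset of this kind can pair a linearly separable "simple" feature with a fully predictive "complex" feature that no single threshold separates:

import numpy as np

def simple_plus_complex(n: int, seed: int = 0):
    """Two fully predictive features: a linearly separable one and a 'slab' one."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                        # labels in {0, 1}
    simple = np.where(y == 1, 1.0, -1.0) + 0.1 * rng.standard_normal(n)
    slab_center = rng.choice([-2.0, 0.0, 2.0], size=n)    # interleaved slabs
    complex_ = slab_center + np.where(y == 1, 0.5, -0.5)  # class-dependent offset
    return np.stack([simple, complex_], axis=1), y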


Documentation for The Noisy Ostracods Dataset (He Wang)

Neural Information Processing Systems

The Noisy Ostracods dataset is a real-world taxonomy dataset characterized by various types of noise. It was created in response to the need for a clean taxonomy dataset and to the challenges we encountered during the cleaning process in our real use case. Our goal was to provide a benchmark for evaluating the performance of robust machine learning methods and label correction algorithms from a practical perspective. The imbalanced and fine-grained nature of the dataset introduces additional challenges for these methods. This document was prepared by adapting the most relevant questions from Datasheets for Datasets [1] to the properties of our dataset.