Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Neural Information Processing Systems

Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not efficiently learn optimal decision-making agents for multi-step, goal-directed tasks in interactive environments. To address this challenge, we propose an algorithmic framework that fine-tunes VLMs with reinforcement learning (RL). Specifically, our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning, enabling the VLM to efficiently explore intermediate reasoning steps that lead to the final text-based action. Next, the open-ended text output is parsed into an executable action that interacts with the environment to obtain goal-directed task rewards. Finally, our framework uses these task rewards to fine-tune the entire VLM with RL.
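
A minimal sketch of the prompt-parse-reward loop this abstract describes, assuming a Gym-style environment; vlm.generate, parse_action, and ppo_update are hypothetical stand-ins, not the paper's actual implementation:

```python
def run_episode(env, vlm, task_description, max_steps=20):
    """Roll out one goal-directed episode, collecting (prompt, output, reward)."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        # Prompt with the task description and current observation, asking the
        # VLM to reason step by step (CoT) before emitting a text action.
        prompt = (f"{task_description}\nObservation: {obs}\n"
                  "Think step by step, then end with 'Action: <action>'.")
        text = vlm.generate(prompt)      # hypothetical: open-ended CoT + action
        action = parse_action(text)      # hypothetical: extract 'Action:' line
        obs, reward, done, info = env.step(action)
        trajectory.append((prompt, text, reward))
        if done:
            break
    return trajectory

# The task rewards in `trajectory` then drive an RL update over the VLM's
# generated tokens, e.g. ppo_update(vlm, trajectory) for a PPO-style step.
```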


Flexible Modeling of Diversity with Strongly Log-Concave Distributions

Neural Information Processing Systems

Strongly log-concave (SLC) distributions are a rich class of discrete probability distributions over subsets of some ground set. They are strictly more general than strongly Rayleigh (SR) distributions, such as the well-known determinantal point process. While SR distributions offer elegant models of diversity, they lack easy control over how they express diversity. We propose SLC as the right extension of SR that enables easier, more intuitive control over diversity, and we illustrate this via examples of practical importance. We develop the two fundamental tools needed to apply SLC distributions to learning and inference: sampling and mode finding. For sampling, we develop an MCMC sampler and give theoretical mixing-time bounds. For mode finding, we establish a weak log-submodularity property for SLC functions and derive optimization guarantees for a distorted greedy algorithm.
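
For intuition, here is a minimal Metropolis exchange walk over fixed-size subsets in the spirit of the sampler described above; logp is a hypothetical unnormalized log-mass oracle, and the paper's chain and mixing-time analysis are more refined than this sketch:

```python
import math
import random

def exchange_mcmc(logp, ground_set, init_subset, n_steps=10_000):
    """Metropolis exchange walk over k-subsets: swap one element in and one
    out, accepting with probability min(1, p(T)/p(S))."""
    S = frozenset(init_subset)
    rest = set(ground_set) - S
    for _ in range(n_steps):
        drop = random.choice(tuple(S))
        add = random.choice(tuple(rest))
        T = (S - {drop}) | {add}
        # Accept or reject the proposed exchange in log space.
        if math.log(random.random()) < logp(T) - logp(S):
            S, rest = T, (rest - {add}) | {drop}
    return S
```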


e-COP: Episodic Constrained Optimization of Policies

Neural Information Processing Systems

In this paper, we present the e-COP algorithm, the first policy optimization algorithm for constrained Reinforcement Learning (RL) in episodic (finite-horizon) settings. Such formulations are applicable when there are separate sets of optimization criteria and constraints on a system's behavior. We approach this problem by first establishing a policy difference lemma for the episodic setting, which provides the theoretical foundation for the algorithm. We then combine established and novel solution ideas to yield the e-COP algorithm, which is easy to implement and numerically stable, and we provide a theoretical guarantee on optimality under certain scaling assumptions. Through extensive empirical analysis on benchmarks from the Safety Gym suite, we show that our algorithm performs similarly to or better than SoTA (non-episodic) algorithms adapted to the episodic setting. The scalability of the algorithm opens the door to its application in safety-constrained Reinforcement Learning from Human Feedback for Large Language or Diffusion Models.
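
The abstract does not spell out the update rule, so as a generic illustration of the constrained-RL setting (explicitly not e-COP itself), here is a Lagrangian-relaxation step in which a dual variable trades reward against constraint cost; policy.step and the batch keys are hypothetical:

```python
def lagrangian_step(policy, lam, batch, cost_limit, lr_lam=0.01):
    """One generic Lagrangian update for constrained RL (illustration only)."""
    # Trade off task reward against constraint cost via the multiplier.
    adv = batch["reward_adv"] - lam * batch["cost_adv"]
    policy.step(batch["states"], batch["actions"], adv)  # hypothetical API
    # Dual ascent: raise lam while episodic cost exceeds the limit.
    lam = max(0.0, lam + lr_lam * (batch["episode_cost"].mean() - cost_limit))
    return lam
```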



Assessing Disparate Impact of Personalized Interventions: Identifiability and Bounds

Neural Information Processing Systems

Personalized interventions in social services, education, and healthcare leverage individual-level causal effect predictions to give each individual the best treatment or to prioritize program interventions for the individuals most likely to benefit. While the sensitivity of these domains compels us to evaluate the fairness of such policies, we show that actually auditing their disparate impact under standard observational metrics, such as true positive rates, is impossible, since the ground truths are unknown. Whether our data is experimental or observational, an individual's outcome under an intervention different from the one received can never be known, only predicted from features. We prove that we can nonetheless point-identify these quantities under the additional assumption of monotone treatment response, which may be reasonable in many applications. We further provide a sensitivity analysis for this assumption by means of sharp partial-identification bounds under violations of monotonicity of varying strengths. We show how to use our results to audit personalized interventions using partially identified ROC and xROC curves and demonstrate this in a case study of a French job training dataset.
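
To see how monotonicity helps, consider binary outcomes with Y(1) >= Y(0) for every individual: a treated unit with Y = 0 pins down Y(0) = 0, and a control unit with Y = 1 pins down Y(1) = 1. A small sketch of this bookkeeping (illustrative only; the paper's point identification and sharp bounds concern the aggregate metrics, not individual imputation):

```python
import numpy as np

def impute_under_monotonicity(y_obs, treated):
    """Impute unobserved binary potential outcomes under monotone treatment
    response, Y(1) >= Y(0). Cells monotonicity cannot determine stay NaN;
    these are the cells partial-identification bounds range over."""
    y_obs = y_obs.astype(float)
    # Treated with Y=0 forces Y(0)=0; controls observe Y(0) directly.
    y0 = np.where(treated, np.where(y_obs == 0, 0.0, np.nan), y_obs)
    # Controls with Y=1 force Y(1)=1; treated observe Y(1) directly.
    y1 = np.where(treated, y_obs, np.where(y_obs == 1, 1.0, np.nan))
    return y0, y1
```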


Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

Neural Information Processing Systems

The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple image or video diffusion models, utilizing score distillation sampling for optimization or generating pseudo novel views for direct supervision. However, these methods are hindered by slow optimization speeds and multi-view inconsistency issues. Spatial and temporal consistency in 4D geometry have been extensively explored in 3D-aware diffusion models and in traditional monocular video diffusion models, respectively. Building on this foundation, we propose a strategy to migrate the temporal consistency of video diffusion models to the spatial-temporal consistency required for 4D generation.


Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Neural Information Processing Systems

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal.


Dimension-Free Bounds for Low-Precision Training

Neural Information Processing Systems

Low-precision training is a promising way of decreasing the time and energy cost of training machine learning models. Previous work has analyzed low-precision training algorithms, such as low-precision stochastic gradient descent, and derived theoretical bounds on their convergence rates. These bounds tend to depend on the model dimension d, in that the number of bits needed to achieve a particular error bound increases as d increases. In this paper, we derive new bounds for low-precision training algorithms that do not contain the dimension d, which lets us better understand what affects the convergence of these algorithms as parameters scale. Our methods also generalize naturally, letting us prove new convergence bounds for low-precision training with other quantization schemes, such as low-precision floating-point computation and logarithmic quantization.
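
As a concrete example of the kind of scheme such bounds cover, here is standard fixed-point quantization with stochastic rounding, which is unbiased in the sense that E[Q(x)] = x; this is a generic sketch, not the paper's analysis:

```python
import numpy as np

def quantize_stochastic(x, delta):
    """Quantize `x` to multiples of the step `delta`, rounding up with
    probability equal to the fractional part, so E[Q(x)] = x."""
    scaled = np.asarray(x, dtype=float) / delta
    low = np.floor(scaled)
    round_up = np.random.random(scaled.shape) < (scaled - low)
    return delta * (low + round_up)
```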


d4a93297083a23cc099f7bd6a8621131-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their constructive feedback. R3, on the other hand, mentions problems with "generality" and "viability" as significant weaknesses. R3's first concern is that the learning method is limited to video data; this concern is puzzling to us. R3's second concern is that our 2D keypoint detector was trained from annotations on real images. We do not wish to claim (as stated by R3) that "the method completely [...]"; our work is a core step on this path.


C Access to PowerGraph Dataset
C.1 Dataset documentation and intended uses

Neural Information Processing Systems

We use the InMemoryDataset [27] class of PyTorch Geometric, which processes the raw data obtained from the Cascades [61] simulation. For each dataset (UK, IEEE24, IEEE39, and IEEE118) we provide a folder containing the raw data, organized into files such as edge_attr.mat, for the node-level tasks, i.e., power flow and optimal power flow analyses. The dataset can be viewed and downloaded by the reviewers from https://figshare.com/articles/dataset/PowerGraph/22820534 (node-level 1.08 GB and graph-level 2.7 GB when uncompressed); node-level data can be fetched with a script of the form "#!/bin/bash wget -O data...". The authors state here that they bear all responsibility in case of violation of rights, etc., and confirm that this work is licensed under the CC BY 4.0 license. The code to obtain the PowerGraph dataset in the InMemoryDataset [27] format and to benchmark GNN and explainability methods is available as a public GitHub organization at https://github.com/
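
For orientation, a minimal sketch of how such raw files are typically wrapped in a PyTorch Geometric InMemoryDataset; build_graphs is a hypothetical parser for the .mat files, and the authors' actual processing code lives in the repository above:

```python
import torch
from torch_geometric.data import InMemoryDataset

class PowerGraphDataset(InMemoryDataset):
    """Illustrative InMemoryDataset wrapper for the raw PowerGraph files."""

    @property
    def raw_file_names(self):
        return ["edge_attr.mat"]  # plus the other raw .mat files per dataset

    @property
    def processed_file_names(self):
        return ["data.pt"]

    def process(self):
        # build_graphs (hypothetical) parses the raw .mat files into a list
        # of torch_geometric.data.Data objects, one per graph/scenario.
        data_list = build_graphs(self.raw_paths)
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])
```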