Gopalakrishnan, Sriram
Robust and Efficient Fine-tuning of LLMs with Bayesian Reparameterization of Low-Rank Adaptation
Sengupta, Ayan, Seth, Vaibhav, Pathak, Arinjay, Raman, Natraj, Gopalakrishnan, Sriram, Chakraborty, Tanmoy
Large Language Models (LLMs) are highly resource-intensive to fine-tune due to their enormous size. While low-rank adaptation is a prominent parameter-efficient fine-tuning approach, it suffers from sensitivity to hyperparameter choices, leading to instability in model performance on downstream tasks. This paper highlights the importance of effective parameterization in low-rank fine-tuning for reducing estimator variance and enhancing the stability of final model outputs. We propose MonteCLoRA, an efficient fine-tuning technique that employs Monte Carlo estimation to learn an unbiased posterior over low-rank parameters with low expected variance, stabilizing fine-tuned LLMs with only O(1) additional parameters. MonteCLoRA shows significant improvements in accuracy and robustness, achieving up to 3.8% higher accuracy and 8.6% greater robustness than existing efficient fine-tuning methods on natural language understanding tasks with pre-trained RoBERTa-base. Furthermore, on generative tasks with pre-trained LLaMA-1-7B, MonteCLoRA demonstrates robust zero-shot performance with 50% lower variance than contemporary efficient fine-tuning methods. The theoretical and empirical results presented in the paper underscore how parameterization and hyperpriors balance exploration and exploitation in the low-rank parametric space, leading to better and more robust parameter estimation during efficient fine-tuning.
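The variance-reduction idea at the heart of the abstract can be sketched compactly. Below is a minimal, hypothetical PyTorch illustration (not the authors' implementation): the low-rank factors are treated as the mean of a Gaussian posterior with a single shared scale parameter, keeping the overhead at O(1) extra parameters, and each forward pass averages the update over a few sampled adapters.

```python
import torch

class MCLowRankLinear(torch.nn.Module):
    """Hypothetical Monte Carlo-averaged low-rank adapter around a frozen
    linear layer; a sketch of the idea, not the MonteCLoRA implementation."""

    def __init__(self, base: torch.nn.Linear, rank: int = 8,
                 num_samples: int = 4, init_std: float = 1e-3):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pre-trained weights stay frozen
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A_mu = torch.nn.Parameter(torch.randn(rank, d_in) * init_std)
        self.B_mu = torch.nn.Parameter(torch.zeros(d_out, rank))
        # One shared log-std keeps the extra parameter count O(1).
        self.log_std = torch.nn.Parameter(torch.tensor(-5.0))
        self.num_samples = num_samples

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        std, delta = self.log_std.exp(), 0.0
        for _ in range(self.num_samples):     # Monte Carlo average of adapters
            A = self.A_mu + std * torch.randn_like(self.A_mu)
            B = self.B_mu + std * torch.randn_like(self.B_mu)
            delta = delta + x @ A.T @ B.T
        return self.base(x) + delta / self.num_samples

layer = MCLowRankLinear(torch.nn.Linear(16, 16))
print(layer(torch.randn(2, 16)).shape)        # torch.Size([2, 16])
```

Averaging several sampled adapters approximates the posterior mean of the low-rank update, which is the mechanism by which the estimator's variance is kept low.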
TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners
de la Rosa, Tomas, Gopalakrishnan, Sriram, Pozanco, Alberto, Zeng, Zhen, Borrajo, Daniel
Travel planning is a complex task that involves generating a sequence of actions for visiting places, subject to constraints, while maximizing some user satisfaction criteria. Traditional approaches rely on formulating the problem in a formal language, extracting relevant travel information from web sources, and using an adequate solver to generate a valid solution. As an alternative, recent Large Language Model (LLM) based approaches output plans directly from user requests expressed in natural language. Although LLMs possess extensive travel domain knowledge and can provide high-level information such as points of interest and potential routes, current state-of-the-art models often generate plans that lack coherence, fail to fully satisfy constraints, and do not guarantee high-quality solutions. We propose TRIP-PAL, a hybrid method that combines the strengths of LLMs and automated planners, where (i) LLMs retrieve and translate travel and user information into data structures that can be fed into planners; and (ii) automated planners generate travel plans that guarantee constraint satisfaction and optimize for user utility. Our experiments across various travel scenarios show that TRIP-PAL outperforms an LLM acting alone when generating travel plans.
Multi-Modal Financial Time-Series Retrieval Through Latent Space Projections
Bamford, Tom, Coletta, Andrea, Fons, Elizabeth, Gopalakrishnan, Sriram, Vyetrenko, Svitlana, Balch, Tucker, Veloso, Manuela
Financial firms commonly process and store billions of time series, generated continuously and at high frequency. To support efficient data storage and retrieval, specialized time-series databases and systems have emerged. These databases support indexing and querying of time series through a constrained, Structured Query Language (SQL)-like format that enables queries such as "stocks with monthly price returns greater than 5%", but only in rigid formats. However, such queries do not capture the intrinsic complexity of high-dimensional time-series data, which can often be better described by images or language (e.g., "a stock in a low volatility regime"). Moreover, the storage, computational time, and retrieval complexity required to search in the time-series space are often non-trivial. In this paper, we propose and demonstrate a framework to store multi-modal data for financial time series in a lower-dimensional latent space using deep encoders, such that the latent-space projections capture not only the time-series trends but also other desirable properties of the financial time-series data (such as price volatility). Moreover, our approach supports user-friendly queries expressed as natural-language text or sketches of time series, for which we have developed intuitive interfaces. We demonstrate the advantages of our method in terms of computational efficiency and accuracy on real historical data as well as synthetic data, and highlight the utility of latent-space projections in the storage and retrieval of financial time-series data with intuitive query modalities.
Synthetic Data Applications in Finance
Potluru, Vamsi K., Borrajo, Daniel, Coletta, Andrea, Dalmasso, Niccolò, El-Laham, Yousef, Fons, Elizabeth, Ghassemi, Mohsen, Gopalakrishnan, Sriram, Gosai, Vikesh, Kreačić, Eleonora, Mani, Ganapathy, Obitayo, Saheed, Paramanand, Deepak, Raman, Natraj, Solonin, Mikhail, Sood, Srijan, Vyetrenko, Svitlana, Zhu, Haibei, Veloso, Manuela, Balch, Tucker
Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and, in particular, provide richer details for a few select ones. These cover a wide variety of data modalities, including tabular, time-series, event-series, and unstructured data, arising from both market and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are utilized in evaluating the quality and effectiveness of our approaches in these applications. We conclude with open directions for synthetic data in the context of the financial domain.
SafeAR: Towards Safer Algorithmic Recourse by Risk-Aware Policies
Wu, Haochen, Sharma, Shubham, Patra, Sunandita, Gopalakrishnan, Sriram
With the growing use of machine learning (ML) models in critical domains such as finance and healthcare, the need to offer recourse for those adversely affected by the decisions of ML models has become more important; individuals ought to be provided with recommendations on actions to take for improving their situation and thus receiving a favorable decision. Prior work on sequential algorithmic recourse -- which recommends a series of changes -- focuses on action feasibility and uses the proximity of feature changes to determine action costs. However, the uncertainty of feature changes and the risk of higher-than-average costs in recourse have not been considered. It is undesirable if a recourse could (with some probability) result in a worse situation from which recovery requires an extremely high cost. It is essential to incorporate risks when computing and evaluating recourse. We call recourse computed with such risk considerations Safer Algorithmic Recourse (SafeAR). The objective is to empower people to choose a recourse based on their risk tolerance. In this work, we discuss and show how existing recourse desiderata can fail to capture the risk of higher costs. We present a method to compute recourse policies that consider variability in cost, and connect the algorithmic recourse literature with risk-sensitive reinforcement learning. We also adopt the measures "Value at Risk" and "Conditional Value at Risk" from the finance literature to summarize risk concisely. We apply our method to two real-world datasets and compare policies with different risk-aversion levels using risk measures and recourse desiderata (sparsity and proximity).
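The two risk measures named above have compact empirical forms: for a sample of recourse costs, VaR at level alpha is the cost exceeded only with probability 1 - alpha, and CVaR at level alpha is the mean cost within that worst tail. A minimal sketch on simulated costs follows; the lognormal distribution is an arbitrary stand-in for costs sampled by rolling out a recourse policy.

```python
import numpy as np

def value_at_risk(costs: np.ndarray, alpha: float = 0.95) -> float:
    """Empirical VaR_alpha: the cost exceeded only with probability 1 - alpha."""
    return float(np.quantile(costs, alpha))

def conditional_value_at_risk(costs: np.ndarray, alpha: float = 0.95) -> float:
    """Empirical CVaR_alpha: the mean cost within the worst (1 - alpha) tail."""
    var = value_at_risk(costs, alpha)
    return float(costs[costs >= var].mean())

# Stand-in for costs sampled by simulating a recourse policy many times.
costs = np.random.default_rng(1).lognormal(mean=1.0, sigma=0.6, size=10_000)
print(value_at_risk(costs), conditional_value_at_risk(costs))
```

Two policies with the same expected cost can differ sharply in CVaR, which is exactly the gap between "low cost on average" and "safe for a risk-averse user" that the abstract targets.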
pyRDDLGym: From RDDL to Gym Environments
Taitler, Ayal, Gimelfarb, Michael, Jeong, Jihwan, Gopalakrishnan, Sriram, Mladenov, Martin, Liu, Xiaotian, Sanner, Scott
Reinforcement Learning (RL) Sutton and Barto [2018] and probabilistic planning Puterman [2014] are two research branches that address stochastic problems, often under the Markov assumption for state dynamics. The planning approach requires a given model, while the learning approach improves through repeated interaction with an environment, which can be viewed as a black box. Thus, the tools and the benchmarks for these two branches have grown apart. Learning agents do not need to simulate model-based transitions, and thus frameworks such as OpenAI Gym Brockman et al. [2016] have become a standard, serving also as an interface for third-party benchmarks such as Todorov et al. [2012], Bellemare et al. [2013] and more. As the model is not necessary for solving the learning problem, the environments are hard-coded in a programming language. This has several downsides: if one does wish to see the model describing the environment, it has to be reverse-engineered from the environment framework; complex problems can require a significant development period; code bugs may make their way into the environment; and finally, there is no clean way to verify the model or reuse it directly. Thus, the creation of a verified, acceptable benchmark is a challenging task. Planning agents, on the other hand, can interact with an environment Sanner [2010a], but in many cases simulate the model within the planning agent in order to solve the problem Keller and Eyerich [2012]. The planning community has also come up with formal description languages for various types of problems; these include the Planning Domain Definition Language (PDDL) Aeronautiques et al. [1998] for classical planning problems, PDDL2.1 Fox and Long [2003] for problems involving time and continuous variables, PPDDL Bryce and Buffet [2008] for classical planning problems with probabilistic action effects and rewards, and the Relational Dynamic Influence Diagram Language (RDDL).
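The interaction pattern described above is the standard Gym loop. The sketch below assumes pyRDDLGym exposes a Gym-compatible interface, as the text states; the `make` call, the domain and instance names, and attributes such as `horizon` are assumptions that may differ across library versions, so treat this as illustrative rather than as the library's documented API.

```python
import pyRDDLGym  # assumed import; check the library docs for your version

# Hypothetical domain/instance identifiers; real names come from the RDDL
# domain and instance files registered with the library.
env = pyRDDLGym.make("Wildfire", "0")
state, _ = env.reset()
total_reward = 0.0
for _ in range(env.horizon):            # RDDL problems have a finite horizon
    action = env.action_space.sample()  # random policy as a placeholder
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    if terminated or truncated:
        break
env.close()
print(total_reward)
```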
Methods and Mechanisms for Interactive Novelty Handling in Adversarial Environments
Thai, Tung, Shen, Ming, Garg, Mayank, Kalani, Ayush, Vaidya, Nakul, Soni, Utkarsh, Verma, Mudit, Gopalakrishnan, Sriram, Varshney, Neeraj, Baral, Chitta, Kambhampati, Subbarao, Sinapov, Jivko, Scheutz, Matthias
Learning to detect, characterize and accommodate novelties is a challenge that agents operating in open-world domains need to address to be able to guarantee satisfactory task performance. Certain novelties (e.g., changes in environment dynamics) can interfere with the performance or prevent agents from accomplishing task goals altogether. In this paper, we introduce general methods and architectural mechanisms for detecting and characterizing different types of novelties, and for building an appropriate adaptive model to accommodate them utilizing logical representations and reasoning methods.
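One common pattern for the detection step, sketched here with hypothetical names (this is an illustration, not the paper's architecture), is to flag transitions that the agent's current dynamics model cannot explain:

```python
def detect_novelty(transition_probs: dict, state, action, observed_next,
                   threshold: float = 1e-3) -> bool:
    """Flag a transition as novel when the agent's model assigns very low
    probability to the observed outcome. transition_probs maps
    (state, action) to {next_state: probability}; unseen outcomes get 0."""
    likelihood = transition_probs.get((state, action), {}).get(observed_next, 0.0)
    return likelihood < threshold

model = {("s0", "move"): {"s1": 0.9, "s2": 0.1}}
print(detect_novelty(model, "s0", "move", "s3"))  # True: unexplained outcome
```

Characterization and accommodation then operate on the flagged transitions, e.g., by revising the logical model of the domain.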
Synthesizing Policies That Account For Human Execution Errors Caused By State-Aliasing In Markov Decision Processes
Gopalakrishnan, Sriram, Verma, Mudit, Kambhampati, Subbarao
When humans are given a policy to execute, there can be policy execution errors and deviations in execution if there is uncertainty in identifying a state. So an algorithm that computes a policy for a human to execute ought to consider these effects in its computations. An optimal MDP policy that is poorly executed (because of a human agent) may be much worse than another policy that is executed with fewer errors. In this paper, we consider the problems of erroneous execution and execution delay when computing policies for a human agent that would act in a setting modeled by a Markov Decision Process. We present a framework to model the likelihood of policy execution errors and the likelihood of non-policy actions like inaction (delays) due to state uncertainty. This is followed by a hill-climbing algorithm to search for good policies that account for these errors. We then use the best policy found by hill climbing with a branch-and-bound algorithm to find the optimal policy. We show experimental results in a Gridworld domain and analyze the performance of the two algorithms. We also present human studies that verify whether our assumptions about policy execution by humans under state aliasing are reasonable.
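A toy, self-contained sketch of the hill-climbing idea follows (illustrative only, not the paper's algorithm): policies are scored under an execution-error model in which aliased states may be misperceived, so the human executes the action prescribed for the perceived state, and single-state action changes are accepted while the score improves.

```python
STATES, ACTIONS = range(5), ("a", "b")
ALIASED = {1: 2, 2: 1}     # pairs of states a human may confuse
ERROR_P = 0.7              # probability of acting on the aliased state

def reward(s, a):          # toy reward: "a" is the right action in even states
    return 1.0 if (a == "a") == (s % 2 == 0) else 0.0

def expected_value(policy, horizon=10):
    """Exact expected reward for this toy chain (deterministic dynamics
    s -> s+1 mod |S|) when aliased states are misperceived w.p. ERROR_P."""
    total, s = 0.0, 0
    for _ in range(horizon):
        total += (1 - ERROR_P) * reward(s, policy[s])
        total += ERROR_P * reward(s, policy[ALIASED.get(s, s)])
        s = (s + 1) % len(STATES)
    return total

policy = {s: "a" for s in STATES}
improved = True
while improved:            # greedy hill climbing over deterministic policies
    improved = False
    for s in STATES:
        for a in ACTIONS:
            candidate = dict(policy)
            candidate[s] = a
            if expected_value(candidate) > expected_value(policy):
                policy, improved = candidate, True
print(policy, expected_value(policy))
```

Because misperception dominates here, hill climbing settles on swapped actions for the two aliased states: the entry for state 1 holds the action that is right for state 2, since a human in state 2 usually perceives state 1 and executes that entry. In the paper, such a hill-climbing solution then seeds a branch-and-bound search for the optimal policy.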
Integrating Planning, Execution and Monitoring in the presence of Open World Novelties: Case Study of an Open World Monopoly Solver
Gopalakrishnan, Sriram, Soni, Utkarsh, Thai, Tung, Lymperopoulos, Panagiotis, Scheutz, Matthias, Kambhampati, Subbarao
The game of Monopoly is an adversarial multi-agent domain where there is no fixed goal other than to be the last player solvent. There are useful subgoals, like monopolizing sets of properties and developing them. There is also a lot of randomness from dice rolls, card draws, and adversaries' strategies. This unpredictability is made worse when unknown novelties are added during gameplay. Given these challenges, Monopoly was one of the test beds chosen for the DARPA SAIL-ON program, which aims to create agents that can detect and accommodate novelties. To handle the game's complexities, we developed an agent that eschews complete plans and adapts its policy online as the game evolves. In the most recent independent evaluation in the SAIL-ON program, our agent was the best-performing agent on most measures. We herein present our approach and results.
Goal recognition via model-based and model-free techniques
Borrajo, Daniel, Gopalakrishnan, Sriram, Potluru, Vamsi K.
Humans interact with the world based on their inner motivations (goals) by performing actions. Those actions might be observable by financial institutions, which in turn might log all observed actions to better understand human behavior. Examples of such interactions are investment operations (buying or selling options), account-related activities (creating accounts, making transactions, withdrawing money), digital interactions (using the bank's web or mobile app to configure alerts, or applying for a new credit card), or even illicit operations (such as fraud or money laundering). Once human behavior is better understood, financial institutions can improve their processes, allowing them to deepen relationships with clients, offer targeted services (marketing), handle complaint-related interactions (operations), or perform fraud or money-laundering investigations (compliance) [Borrajo et al., 2020].