Model-free reinforcement learning with noisy actions for automated experimental control in optics
Richtmann, Lea, Schmiesing, Viktoria-S., Wilken, Dennis, Heine, Jan, Tranter, Aaron, Anand, Avishek, Osborne, Tobias J., Heurs, Michèle
Experimental control requires substantial manual effort and non-trivial decisions for precise adjustments. Here, we study automatic experimental alignment for coupling laser light into an optical fiber using reinforcement learning (RL). We face several real-world challenges, such as time-consuming training, partial observability, and noisy actions due to imprecision in the mirror steering motors. We show that we can overcome these challenges: to save time, we use a virtual testbed to tune our environment for dealing with partial observability, and we use relatively sample-efficient model-free RL algorithms such as Soft Actor-Critic (SAC) and Truncated Quantile Critics (TQC). Furthermore, by training entirely on the experiment, the agent learns directly to handle the noise present. In our extensive experimentation, we show that we are able to achieve 90% coupling efficiency, showcasing the effectiveness of our proposed approaches. We reach this efficiency, which is comparable to that of a human expert, without additional feedback loops, despite the motors' inaccuracies. Our result is an example of the readiness of RL for real-world tasks. We consider RL a promising tool for reducing the workload in labs.
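For readers unfamiliar with the setup, the following is a minimal sketch, not the authors' implementation, of how such a noisy alignment task could be framed as a Gymnasium environment and trained with SAC (or TQC from sb3-contrib). The toy coupling model, noise level, and reward are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code): a toy fiber-coupling environment with
# noisy motor actions, trained with SAC; the coupling model, noise level, and
# reward are illustrative assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC
# from sb3_contrib import TQC  # drop-in alternative to SAC


class ToyFiberCouplingEnv(gym.Env):
    """Four motorized mirror axes steer the beam; reward is a synthetic coupling efficiency."""

    def __init__(self, noise_std=0.05, max_steps=100):
        super().__init__()
        self.action_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(5,), dtype=np.float32)
        self.noise_std = noise_std
        self.max_steps = max_steps

    def _coupling(self):
        # Toy stand-in for the photodiode signal: Gaussian in the misalignment.
        return float(np.exp(-np.sum(self.misalignment ** 2)))

    def _obs(self):
        # Partial observability: only motor positions and the current power reading.
        return np.concatenate([self.misalignment, [self._coupling()]]).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.misalignment = self.np_random.uniform(-1.5, 1.5, size=4).astype(np.float32)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Noisy actuation: the executed motor step deviates from the commanded one.
        executed = action + self.np_random.normal(0.0, self.noise_std, size=4)
        self.misalignment = (self.misalignment - 0.1 * executed).astype(np.float32)
        self.steps += 1
        reward = self._coupling()
        terminated = reward > 0.9
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}


agent = SAC("MlpPolicy", ToyFiberCouplingEnv(), verbose=0)
agent.learn(total_timesteps=10_000)
```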
QuanTemp: A real-world open-domain benchmark for fact-checking numerical claims
V, Venktesh, Anand, Abhijit, Anand, Avishek, Setty, Vinay
Automated fact-checking has gained immense interest as a way to tackle the growing misinformation in the digital era. Existing systems primarily focus on synthetic claims based on Wikipedia, though noteworthy progress has also been made on real-world claims. In this work, we release QuanTemp, a diverse, multi-domain dataset focused exclusively on numerical claims, encompassing temporal, statistical, and diverse aspects, with fine-grained metadata and an evidence collection without leakage. This addresses the challenge of verifying real-world numerical claims, which are complex and often lack precise information, a problem not addressed by existing works that mainly focus on synthetic claims. We evaluate and quantify the limitations of existing solutions for the task of verifying numerical claims, including claim decomposition-based methods and models with numerical understanding; our best baseline achieves a macro-F1 of 58.32. This demonstrates that QuanTemp serves as a challenging evaluation set for numerical claim verification.
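As a small, hypothetical illustration of the reported metric (not the benchmark's official evaluation scripts), macro-F1 over verdict classes can be computed with scikit-learn; the three-way label set used here is an assumption.

```python
# Hypothetical sketch of scoring a numerical-claim verifier with macro-F1
# (not QuanTemp's official evaluation code; the label set is an assumption).
from sklearn.metrics import classification_report, f1_score

LABELS = ["True", "False", "Conflicting"]   # assumed verdict classes

gold = ["True", "False", "Conflicting", "False", "True"]
pred = ["True", "False", "False", "False", "Conflicting"]

macro_f1 = f1_score(gold, pred, labels=LABELS, average="macro")
print(f"macro-F1: {macro_f1:.4f}")
print(classification_report(gold, pred, labels=LABELS, zero_division=0))
```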
Temporal Blind Spots in Large Language Models
Wallat, Jonas, Jatowt, Adam, Anand, Avishek
Large language models (LLMs) have recently gained significant attention due to their unparalleled ability to perform various natural language processing tasks. These models, benefiting from their advanced natural language understanding capabilities, have demonstrated impressive zero-shot performance. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require temporal understanding. We pay particular attention to the handling of factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, also on rather recent information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.
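A small sketch of the kind of analysis described, bucketing QA accuracy by the year a question refers to; the LLM call is left as a stub so the example is not tied to any particular model API, and the example schema is an assumption.

```python
# Sketch: measure exact-match accuracy of an LLM on temporal questions, bucketed
# by the year the question refers to. `ask_llm` is a placeholder, not a real API.
from collections import defaultdict

def ask_llm(question: str) -> str:
    # Placeholder: replace with a call to the model under test.
    raise NotImplementedError

def exact_match(prediction: str, gold: str) -> bool:
    return prediction.strip().lower() == gold.strip().lower()

def accuracy_by_year(examples):
    """examples: iterable of dicts with 'question', 'answer', 'year' keys (assumed schema)."""
    hits, totals = defaultdict(int), defaultdict(int)
    for ex in examples:
        totals[ex["year"]] += 1
        if exact_match(ask_llm(ex["question"]), ex["answer"]):
            hits[ex["year"]] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}
```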
In-Context Ability Transfer for Question Decomposition in Complex QA
V, Venktesh, Bhattacharya, Sourangshu, Anand, Avishek
Answering complex questions is a challenging task that requires question decomposition and multi-step reasoning to arrive at a solution. While existing supervised and unsupervised approaches are specialized to a certain task and involve training, recently proposed prompt-based approaches offer generalizable solutions for tackling a wide variety of complex question-answering (QA) tasks. However, existing prompt-based approaches that are effective for complex QA tasks involve expensive hand annotations from experts in the form of rationales and do not generalize to newer complex QA scenarios and tasks. We propose ICAT (In-Context Ability Transfer), which induces reasoning capabilities in LLMs without any LLM fine-tuning or manual annotation of in-context samples. We transfer the ability to decompose complex questions into simpler questions, or to generate step-by-step rationales, to LLMs by carefully selecting from available data sources of related tasks. We also propose an automated uncertainty-aware exemplar selection approach for choosing examples from these transfer data sources. Finally, we conduct large-scale experiments on a variety of complex QA tasks involving numerical reasoning, compositional complex QA, and heterogeneous complex QA, all of which require decomposed reasoning. We show that ICAT convincingly outperforms existing prompt-based solutions without involving any model training, showcasing the benefits of re-using existing abilities.
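One plausible reading of uncertainty-aware exemplar selection is sketched below: candidate exemplars from related tasks are ranked by similarity to the test question minus an uncertainty penalty. The embedder, the label-distribution scorer, and the weighting are placeholders, not the paper's exact method.

```python
# Sketch of uncertainty-aware exemplar selection for in-context prompting.
# The embedder and uncertainty scorer are placeholders; this is not the ICAT method verbatim.
import math
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder sentence encoder (any off-the-shelf embedding model).
    raise NotImplementedError

def label_distribution(candidate: str) -> list:
    # Placeholder: probabilities a scorer assigns to the candidate exemplar being valid/invalid.
    raise NotImplementedError

def entropy(probs):
    return -sum(p * math.log(p + 1e-12) for p in probs)

def select_exemplars(question: str, candidates: list, k: int = 4, alpha: float = 0.5):
    """Rank candidate exemplars by similarity to the question minus an uncertainty penalty."""
    q = embed(question)
    scored = []
    for cand in candidates:
        c = embed(cand)
        sim = float(np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c)))
        scored.append((sim - alpha * entropy(label_distribution(cand)), cand))
    return [cand for _, cand in sorted(scored, reverse=True)[:k]]
```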
DINE: Dimensional Interpretability of Node Embeddings
Piaggesi, Simone, Khosla, Megha, Panisson, André, Anand, Avishek
Graphs are ubiquitous due to their flexibility in representing social and technological systems as networks of interacting elements. Graph representation learning methods, such as node embeddings, are powerful approaches for mapping nodes into a latent vector space, allowing their use in various graph tasks. Despite their success, only a few studies have focused on explaining node embeddings locally. Moreover, global explanations of node embeddings remain unexplored, limiting their interpretability and debugging potential. We address this gap by developing human-understandable explanations for the dimensions of node embeddings. To this end, we first develop new metrics that measure the global interpretability of embedding vectors based on the marginal contribution of the embedding dimensions to predicting graph structure. We say that an embedding dimension is more interpretable if it can be faithfully mapped to an understandable sub-structure in the input graph, such as community structure. Having observed that standard node embeddings have low interpretability, we then introduce DINE (Dimension-based Interpretable Node Embedding), a novel approach that can retrofit existing node embeddings, making them more interpretable without sacrificing their task performance. We conduct extensive experiments on synthetic and real-world graphs and show that we can simultaneously learn highly interpretable node embeddings with effective performance in link prediction.
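A rough sketch of the marginal-contribution idea described above: train a link predictor on edge features built from node embeddings, then zero out one dimension at a time and record the drop in AUC. The data, predictor, and edge-feature construction here are synthetic placeholders, not the DINE implementation.

```python
# Sketch: per-dimension interpretability proxy for node embeddings, measured as the
# drop in link-prediction AUC when that dimension is zeroed out (synthetic data; not DINE itself).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_nodes, dim, n_pairs = 200, 16, 1000
emb = rng.normal(size=(n_nodes, dim))                    # stand-in node embeddings
pairs = rng.integers(0, n_nodes, size=(n_pairs, 2))      # random node pairs
labels = rng.integers(0, 2, size=n_pairs)                # stand-in edge / non-edge labels

def edge_features(embeddings):
    return embeddings[pairs[:, 0]] * embeddings[pairs[:, 1]]   # Hadamard product

clf = LogisticRegression(max_iter=1000).fit(edge_features(emb), labels)
base_auc = roc_auc_score(labels, clf.predict_proba(edge_features(emb))[:, 1])

marginal_contribution = []
for d in range(dim):
    masked = emb.copy()
    masked[:, d] = 0.0                                   # remove one dimension
    auc = roc_auc_score(labels, clf.predict_proba(edge_features(masked))[:, 1])
    marginal_contribution.append(base_auc - auc)         # larger drop = dimension matters more

print(np.round(marginal_contribution, 4))
```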
Context Aware Query Rewriting for Text Rankers using LLM
Anand, Abhijit, V, Venktesh, Setty, Vinay, Anand, Avishek
Query rewriting refers to an established family of approaches applied to underspecified and ambiguous queries to overcome the vocabulary mismatch problem in document ranking. Queries are typically rewritten at query-processing time to improve query modelling for the downstream ranker. With the advent of large language models (LLMs), there have been initial investigations into using generative approaches to produce pseudo-documents that tackle this inherent vocabulary gap. In this work, we analyze the utility of LLMs for improved query rewriting in text ranking tasks. We find two inherent limitations of using LLMs as query rewriters: concept drift when using only queries as prompts, and large inference costs during query processing. We adopt a simple, yet surprisingly effective, approach called context-aware query rewriting (CAR) to leverage the benefits of LLMs for query understanding. First, we rewrite ambiguous training queries by context-aware prompting of LLMs, where we use only relevant documents as context. Unlike existing approaches, we use LLM-based query rewriting only during the training phase. A ranker is then fine-tuned on the rewritten queries instead of the original queries. In our extensive experiments, we find that fine-tuning a ranker on rewritten queries offers a significant improvement of up to 33% on the passage ranking task and up to 28% on the document ranking task compared to the baseline of using the original queries.
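A hedged sketch of the training-time rewriting step described above: prompt an LLM with the ambiguous training query plus a relevant document, and fine-tune the ranker on the rewrites. The prompt wording, the LLM stub, and the qrels format are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of context-aware query rewriting (CAR) at training time.
# `call_llm` is a placeholder; the prompt wording is an assumption, not the paper's template.
def call_llm(prompt: str) -> str:
    raise NotImplementedError  # replace with any LLM completion call

def rewrite_query(query: str, relevant_doc: str) -> str:
    prompt = (
        "The following query is ambiguous or underspecified.\n"
        f"Query: {query}\n"
        f"Relevant document: {relevant_doc}\n"
        "Rewrite the query so that it clearly expresses the information need "
        "covered by the document. Return only the rewritten query."
    )
    return call_llm(prompt)

def build_training_set(qrels):
    """qrels: iterable of (query, relevant_doc, label) triples (assumed format).
    Rewriting happens only at training time; inference still uses the original query."""
    return [(rewrite_query(q, doc), doc, label) for q, doc, label in qrels]
```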
Query Understanding in the Age of Large Language Models
Anand, Avishek, V, Venktesh, Anand, Abhijit, Setty, Vinay
Querying, conversing with, and controlling search and information-seeking interfaces using natural language are fast becoming ubiquitous with the rise and adoption of large language models (LLMs). In this position paper, we describe a generic framework for interactive query rewriting using LLMs. Our proposal aims to unfold new opportunities for improved and transparent intent understanding while building high-performance retrieval systems with LLMs. A key aspect of our framework is the ability of the rewriter to fully specify, in natural language, the machine intent inferred by the search engine, so that it can be further refined, controlled, and edited before the final retrieval phase. The ability to present, interact with, and reason over the underlying machine intent in natural language has profound implications for transparency and ranking performance, and marks a departure from the traditional way in which supervised signals are collected for understanding intents. We detail the concept, backed by initial experiments, along with open questions for this interactive query understanding framework.
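To make the proposed interaction loop concrete, here is a schematic, not an implementation from the paper, in which an LLM verbalizes the machine intent, the user optionally edits it, and retrieval runs over the confirmed intent; all components are placeholders.

```python
# Schematic of the interactive query-understanding loop described above.
# All function bodies are placeholders; this is an illustration, not the paper's system.
def verbalize_intent(query: str) -> str:
    raise NotImplementedError  # LLM turns the query into a natural-language intent description

def retrieve(intent: str) -> list:
    raise NotImplementedError  # any retrieval backend

def interactive_search(query: str) -> list:
    intent = verbalize_intent(query)
    print(f"Interpreted intent: {intent}")
    edited = input("Edit the intent (or press Enter to accept): ").strip()
    final_intent = edited or intent   # the user-refined intent can also serve as a supervision signal
    return retrieve(final_intent)
```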
The Effect of Masking Strategies on Knowledge Retention by Language Models
Wallat, Jonas, Zhang, Tianyi, Anand, Avishek
Language models retain a significant amount of world knowledge from their pre-training stage. This allows knowledgeable models to be applied to knowledge-intensive tasks prevalent in information retrieval, such as ranking or question answering. Understanding how and which factual information is acquired by our models is necessary to build responsible models. However, limited work has been done to understand the effect of pre-training tasks on the amount of knowledge captured and forgotten by language models during pre-training. Building a better understanding of knowledge acquisition is the goal of this paper. We therefore utilize a selection of pre-training tasks to infuse knowledge into our model and then test the model's knowledge retention by measuring its ability to answer factual questions. Our experiments show that masking entities and principled masking of correlated spans based on pointwise mutual information lead to more factual knowledge being retained than masking random tokens. Our findings further demonstrate that, much like the ability to perform a task, the (factual) knowledge acquired from being trained on that task is forgotten when the model is subsequently trained to perform another task (catastrophic forgetting), and we show how to prevent this phenomenon. To foster reproducibility, the code as well as the data used in this paper are openly available.
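A small sketch of PMI-based span masking as described above: estimate pointwise mutual information for adjacent token pairs from corpus counts and preferentially mask high-PMI bigrams. The whitespace tokenization, toy corpus, and PMI threshold are simplifications, not the paper's exact recipe.

```python
# Sketch: masking correlated spans chosen by pointwise mutual information (PMI).
# Tokenization and the PMI threshold are simplifications, not the paper's exact recipe.
import math
from collections import Counter

corpus = [
    "new york is a large city",
    "she moved to new york last year",
    "the city of new york never sleeps",
]

unigrams, bigrams, n_tokens = Counter(), Counter(), 0
for sentence in corpus:
    tokens = sentence.split()
    n_tokens += len(tokens)
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def pmi(w1, w2):
    p_xy = bigrams[(w1, w2)] / max(sum(bigrams.values()), 1)
    p_x, p_y = unigrams[w1] / n_tokens, unigrams[w2] / n_tokens
    return math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")

def mask_high_pmi_spans(sentence, threshold=1.0, mask_token="[MASK]"):
    tokens, out, i = sentence.split(), [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and pmi(tokens[i], tokens[i + 1]) > threshold:
            out.extend([mask_token, mask_token])   # mask the whole correlated span
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)

print(mask_high_pmi_spans("she loves new york"))  # -> "she loves [MASK] [MASK]"
```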
FaxPlainAC: A Fact-Checking Tool Based on EXPLAINable Models with HumAn Correction in the Loop
Zhang, Zijian, Rudra, Koustav, Anand, Avishek
Fact-checking on the Web has become the main mechanism through which we assess the credibility of news and other information. Existing fact-checkers verify the authenticity of a piece of information (supporting or refuting the claim) based on secondary sources of information. However, existing approaches do not consider the problem of model updates arising from training data that constantly grows through user feedback. It is therefore important to conduct user studies to correct the model's inference biases and to improve the model in a life-long learning manner based on user feedback. In this paper, we present FaxPlainAC, a tool that gathers user feedback on the output of explainable fact-checking models. FaxPlainAC outputs both the model decision, i.e., whether the input fact is true or not, and the supporting/refuting evidence considered by the model. Additionally, FaxPlainAC accepts user feedback on both the prediction and the explanation. Developed in Python, FaxPlainAC is designed as a modular and easily deployable tool. It can be integrated with other downstream tasks, allowing for human annotation gathering for fact-checking and life-long learning.
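As an illustration of the kind of feedback such a tool gathers (an assumed schema, not FaxPlainAC's actual data model), a record might couple the model's verdict and evidence with the user's corrections so that both can feed later life-long learning updates.

```python
# Illustrative feedback record for human-in-the-loop fact-checking
# (an assumed schema, not FaxPlainAC's actual data model).
from dataclasses import dataclass, field, asdict
import json

@dataclass
class FactCheckFeedback:
    claim: str
    model_verdict: bool                 # True = supported, False = refuted
    model_evidence: list                # evidence sentences the model relied on
    user_verdict: bool = None           # user's correction of the verdict, if any
    user_evidence: list = field(default_factory=list)  # evidence the user marks as (in)correct

def append_feedback(record: FactCheckFeedback, path: str = "feedback.jsonl") -> None:
    """Persist feedback for later life-long learning updates."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")
```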
Zorro: Valid, Sparse, and Stable Explanations in Graph Neural Networks
Funke, Thorben, Khosla, Megha, Anand, Avishek
With the ever-increasing popularity and applications of graph neural networks (GNNs), several proposals have been made to interpret and understand the decisions of a GNN model. Explanations for a GNN model differ in principle from those in other input settings: it is important to attribute the decision both to input features and to other related instances connected by the graph structure. We find that previous explanation-generation approaches, which maximize the mutual information between the label distribution produced by the GNN model and the explanation, are restrictive. Specifically, existing approaches do not enforce explanations to be predictive, sparse, or robust to input perturbations. In this paper, we lay down some of the fundamental principles that an explanation method for GNNs should follow and introduce a metric, fidelity, as a measure of an explanation's effectiveness. We propose a novel approach, Zorro, based on principles from rate-distortion theory, which uses a simple combinatorial procedure to optimize for fidelity. Extensive experiments on real and synthetic datasets reveal that Zorro produces sparser, more stable, and more faithful explanations than existing GNN explanation approaches.
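A simplified sketch of the fidelity idea described above: fidelity is taken as the fraction of perturbed inputs on which the model's prediction, computed with only the selected features kept and the rest randomized, agrees with the original prediction, and a greedy loop adds whichever feature raises fidelity most. The predictor and perturbation scheme are placeholders, not Zorro's exact algorithm.

```python
# Simplified sketch in the spirit of Zorro's fidelity-driven selection
# (the predictor and perturbation scheme are placeholders, not the paper's exact algorithm).
import numpy as np

def fidelity(predict, X, selected, target_class, n_samples=50, seed=0):
    """Fraction of perturbations on which the prediction is unchanged when only the
    selected feature columns are kept and all other entries are replaced by noise."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(X.shape, dtype=bool)
    if selected:
        mask[:, sorted(selected)] = True
    agree = 0
    for _ in range(n_samples):
        perturbed = np.where(mask, X, rng.normal(size=X.shape))
        agree += int(predict(perturbed) == target_class)
    return agree / n_samples

def greedy_explanation(predict, X, target_fidelity=0.9):
    """Greedily add the feature whose inclusion increases fidelity the most.
    `predict` maps a feature matrix to a class label (placeholder GNN readout)."""
    target_class = predict(X)
    selected, remaining = set(), set(range(X.shape[1]))
    while remaining and fidelity(predict, X, selected, target_class) < target_fidelity:
        best = max(remaining, key=lambda f: fidelity(predict, X, selected | {f}, target_class))
        selected.add(best)
        remaining.remove(best)
    return selected
```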