Singh, Moninder
Ranking Large Language Models without Ground Truth
Dhurandhar, Amit, Nair, Rahul, Singh, Moninder, Daly, Elizabeth, Ramamurthy, Karthikeyan Natesan
Evaluation and ranking of large language models (LLMs) has become an important problem with the proliferation of these models and their impact. Evaluation methods either require human responses, which are expensive to acquire, or use pairs of LLMs to evaluate each other, which can be unreliable. In this paper, we provide a novel perspective where, given a dataset of prompts (viz. questions, instructions, etc.) and a set of LLMs, we rank them without access to any ground truth or reference responses. Inspired by real life, where both an expert and a knowledgeable person can identify a novice, our main idea is to consider triplets of models, where each one of them evaluates the other two, correctly identifying the worst model in the triplet with high probability. We also analyze our idea and provide sufficient conditions for it to succeed. Applying this idea repeatedly, we propose two methods to rank LLMs. In experiments on different generative tasks (summarization, multiple-choice, and dialog), our methods reliably recover close to true rankings without reference data. This points to a viable low-resource mechanism for practical use.
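As an illustration of the triplet idea, the sketch below ranks models by repeatedly voting out the apparent worst member of each triplet. The judge(...) interface and the vote-counting rule are assumptions for illustration, not the paper's exact evaluation functions or ranking methods.

from itertools import combinations
from collections import Counter

def rank_without_ground_truth(models, prompts, answers, judge):
    """Rank models by repeatedly identifying the likely-worst model in each triplet.

    models  : list of model identifiers
    prompts : list of prompt strings
    answers : dict mapping (model, prompt) -> that model's response
    judge   : hypothetical judge(model, prompt, ans_a, ans_b) -> 'a' or 'b',
              the judging model's preference between two candidate responses
    """
    worst_votes = Counter()
    for a, b, c in combinations(models, 3):
        # Each model in the triplet evaluates the other two on every prompt.
        losses = Counter()
        for evaluator, (x, y) in [(a, (b, c)), (b, (a, c)), (c, (a, b))]:
            for p in prompts:
                pref = judge(evaluator, p, answers[(x, p)], answers[(y, p)])
                losses[y if pref == 'a' else x] += 1
        # The model judged worse most often is declared the worst of this triplet.
        worst_votes[losses.most_common(1)[0][0]] += 1
    # Fewer "worst" votes across all triplets -> higher rank.
    return sorted(models, key=lambda m: worst_votes[m])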
Reasoning about concepts with LLMs: Inconsistencies abound
Uceda-Sosa, Rosario, Ramamurthy, Karthikeyan Natesan, Chang, Maria, Singh, Moninder
The ability to summarize and organize knowledge into abstract concepts is key to learning and reasoning. Many industrial applications rely on the consistent and systematic use of concepts, especially when dealing with decision-critical knowledge. However, we demonstrate that, when methodically questioned, large language models (LLMs) often display significant inconsistencies in their knowledge. Computationally, the basic aspects of the conceptualization of a given domain can be represented as Is-A hierarchies in a knowledge graph (KG) or ontology, together with a few properties or axioms that enable straightforward reasoning. We show that even simple ontologies can be used to reveal conceptual inconsistencies across several LLMs. We also propose strategies that domain experts can use to evaluate and improve the coverage of key domain concepts in LLMs of various sizes. In particular, we have been able to significantly enhance the performance of LLMs of various sizes with openly available weights using simple KG-based prompting strategies.
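A minimal sketch of how a simple Is-A ontology could be used to probe an LLM for conceptual inconsistencies, assuming a hypothetical ask_llm(child, parent) wrapper that poses a yes/no membership question. The transitivity check shown is one illustrative consistency test, not the paper's full methodology.

def probe_isa_consistency(ask_llm, isa_edges):
    """Check whether an LLM's yes/no answers respect Is-A transitivity.

    ask_llm   : hypothetical function ask_llm(child, parent) -> bool, wrapping a
                prompt like "Is every <child> a <parent>? Answer yes or no."
    isa_edges : list of (child, parent) pairs from a simple ontology.
    """
    # Direct assertions the model agrees with.
    agreed = {(c, p) for c, p in isa_edges if ask_llm(c, p)}
    inconsistencies = []
    # Transitivity: agreeing with (x, y) and (y, z) should imply agreeing with (x, z).
    for x, y in agreed:
        for y2, z in agreed:
            if y == y2 and (x, z) not in agreed and not ask_llm(x, z):
                inconsistencies.append((x, y, z))
    return inconsistencies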
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
Achintalwar, Swapnaja, Baldini, Ioana, Bouneffouf, Djallel, Byamugisha, Joan, Chang, Maria, Dognin, Pierre, Farchi, Eitan, Makondo, Ndivhuwo, Mojsilovic, Aleksandra, Nagireddy, Manish, Ramamurthy, Karthikeyan Natesan, Padhi, Inkit, Raz, Orna, Rios, Jesus, Sattigeri, Prasanna, Singh, Moninder, Thwala, Siphiwe, Uceda-Sosa, Rosario A., Varshney, Kush R.
The alignment of large language models is usually done by model providers to add or control behaviors that are common or universally understood across use cases and contexts. In contrast, in this article, we present an approach and architecture that empowers application developers to tune a model to their particular values, social norms, laws and other regulations, and orchestrate between potentially conflicting requirements in context. We lay out three main components of such an Alignment Studio architecture: Framers, Instructors, and Auditors that work in concert to control the behavior of a language model. We illustrate this approach with a running example of aligning a company's internal-facing enterprise chatbot to its business conduct guidelines.
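The sketch below illustrates how Framers, Instructors, and Auditors might fit together in code; the function signatures, the prompt-based steering, and the rule-violation check are assumptions for illustration rather than the actual Alignment Studio implementation.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Regulation:
    """A contextual rule, e.g. one clause of a business conduct guideline."""
    rule_id: str
    text: str

def framer(policy_document: str) -> List[Regulation]:
    # Hypothetical Framer: split a policy document into individual rules the
    # model should follow (in practice this could itself be LLM-assisted).
    return [Regulation(f"rule-{i}", clause.strip())
            for i, clause in enumerate(policy_document.split("\n")) if clause.strip()]

def instructor(generate: Callable[[str], str], regulations: List[Regulation]):
    # Hypothetical Instructor: steer generation by prepending the rules to the
    # prompt (the paper's Instructors may instead fine-tune or adapt the model).
    preamble = "Follow these rules:\n" + "\n".join(r.text for r in regulations)
    return lambda prompt: generate(preamble + "\n\n" + prompt)

def auditor(response: str, regulations: List[Regulation],
            violates: Callable[[str, Regulation], bool]) -> List[str]:
    # Hypothetical Auditor: flag rules the response appears to violate.
    return [r.rule_id for r in regulations if violates(response, r)]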
SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models
Nagireddy, Manish, Chiazor, Lamogha, Singh, Moninder, Baldini, Ioana
Current datasets for unwanted social bias auditing are limited to studying protected demographic features such as race and gender. In this work, we introduce a comprehensive benchmark that is meant to capture the amplification of social bias, via stigmas, in generative language models. Taking inspiration from social science research, we start with a documented list of 93 US-centric stigmas and curate a question-answering (QA) dataset which involves simple social situations. Our benchmark, SocialStigmaQA, contains roughly 10K prompts, with a variety of prompt styles, carefully constructed to systematically test for both social bias and model robustness. We present results for SocialStigmaQA with two open-source generative language models, and we find that the proportion of socially biased output ranges from 45% to 59% across a variety of decoding strategies and prompting styles. We demonstrate that the deliberate design of the templates in our benchmark (e.g., adding biasing text to the prompt or using different verbs that change the answer that indicates bias) impacts the models' tendencies to generate socially biased output. Additionally, through manual evaluation, we discover problematic patterns in the generated chain-of-thought output that range from subtle bias to lack of reasoning. Warning: This paper contains examples of text which are toxic, biased, and potentially harmful.
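A rough sketch of how templated prompts with and without biasing text could be scored for biased output. The template fields, the example biasing sentence, and the answer-matching rule are illustrative assumptions, not the actual SocialStigmaQA templates or evaluation code.

def biased_output_rate(generate, situations, stigmas, add_bias_text=False):
    """Fill simple QA templates with stigmas and measure how often the model
    gives the answer that indicates social bias.

    generate      : hypothetical function prompt -> model answer ('yes'/'no'/...)
    situations    : list of dicts with 'template' and 'biased_answer' keys, e.g.
                    {'template': 'My new roommate {stigma}. Should I look for someone else?',
                     'biased_answer': 'yes'}
    stigmas       : list of stigma descriptions, e.g. 'is a former convict'
    add_bias_text : if True, append an explicitly biasing sentence to the prompt
    """
    biased = total = 0
    for s in situations:
        for stigma in stigmas:
            prompt = s['template'].format(stigma=stigma)
            if add_bias_text:
                prompt += " I have heard bad things about people like that."
            answer = generate(prompt).strip().lower()
            biased += int(answer.startswith(s['biased_answer']))
            total += 1
    return biased / total if total else 0.0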
Function Composition in Trustworthy Machine Learning: Implementation Choices, Insights, and Questions
Nagireddy, Manish, Singh, Moninder, Hoffman, Samuel C., Ju, Evaline, Ramamurthy, Karthikeyan Natesan, Varshney, Kush R.
Ensuring trustworthiness in machine learning (ML) models is a multi-dimensional task. In addition to the traditional notion of predictive performance, other notions such as privacy, fairness, robustness to distribution shift, adversarial robustness, interpretability, explainability, and uncertainty quantification are important considerations to evaluate and improve (if deficient). However, these sub-disciplines or 'pillars' of trustworthiness have largely developed independently, which has limited our understanding of their interactions in real-world ML pipelines. In this paper, focusing specifically on compositions of functions arising from the different pillars, we aim to reduce this gap, develop new insights for trustworthy ML, and answer questions such as the following. Does the composition of multiple fairness interventions result in a fairer model compared to a single intervention? How do bias mitigation algorithms for fairness affect local post-hoc explanations? Does a defense algorithm for untargeted adversarial attacks continue to be effective when composed with a privacy transformation? Toward this end, we report initial empirical results and new insights from 9 different compositions of functions (or pipelines) on 7 real-world datasets along two trustworthiness dimensions - fairness and explainability. We also report progress, and implementation choices, on an extensible composer tool to encourage the combination of functionalities from multiple pillars. To date, the tool supports bias mitigation algorithms for fairness and post-hoc explainability methods. We hope this line of work encourages the thoughtful consideration of multiple pillars when attempting to formulate and resolve a trustworthiness problem.
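The following sketch shows one way such compositions can be wired up, with a pre-processing bias mitigator and a post-hoc explainer treated as interchangeable functions. The signatures are assumptions for illustration and do not correspond to the composer tool's actual interface.

def compose_pipeline(train_data, test_point, fit_model, mitigate=None, explain=None):
    """Illustrative composition of trustworthiness functions around a model.

    mitigate  : optional pre-processing bias mitigator, train_data -> transformed data
                (e.g. a reweighing or repair step; hypothetical signature)
    fit_model : training function, data -> fitted model exposing .predict
    explain   : optional post-hoc explainer, (model, instance) -> explanation
    """
    data = mitigate(train_data) if mitigate else train_data
    model = fit_model(data)
    prediction = model.predict([test_point])[0]
    explanation = explain(model, test_point) if explain else None
    # Comparing `explanation` with and without `mitigate` is one way to study
    # how a fairness intervention interacts with post-hoc explanations.
    return prediction, explanation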
AI Explainability 360: Impact and Design
Arya, Vijay, Bellamy, Rachel K. E., Chen, Pin-Yu, Dhurandhar, Amit, Hind, Michael, Hoffman, Samuel C., Houde, Stephanie, Liao, Q. Vera, Luss, Ronny, Mojsilovic, Aleksandra, Mourad, Sami, Pedemonte, Pablo, Raghavendra, Ramya, Richards, John, Sattigeri, Prasanna, Shanmugam, Karthikeyan, Singh, Moninder, Varshney, Kush R., Wei, Dennis, Zhang, Yunfeng
The increasing use of artificial intelligence (AI) systems in high stakes domains has been coupled with an increase in societal demands for these systems to provide explanations for their outputs. This societal demand has already resulted in new regulations requiring explanations (Goodman and Flaxman 2016; Wachter, Mittelstadt, and Floridi 2017; Selbst and Powles 2017; Pasternak 2019). Explanations can allow users to gain insight into the system's decision-making process, which is a key component in calibrating appropriate trust and confidence in AI systems (Doshi-Velez and Kim 2017). We also introduced a taxonomy to navigate the space of explanation methods, not only the ten in the toolkit but also the broader literature on explainable AI. The taxonomy was intended to be usable by consumers with varied backgrounds to choose an appropriate explanation method for their application. AIX360 differs from other open source explainability toolkits (see Arya et al. (2020) for a list) in two main ways: 1) its support for a broad and diverse spectrum of explainability methods, implemented in a common architecture, and 2) its educational material as discussed below.
One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques
Arya, Vijay, Bellamy, Rachel K. E., Chen, Pin-Yu, Dhurandhar, Amit, Hind, Michael, Hoffman, Samuel C., Houde, Stephanie, Liao, Q. Vera, Luss, Ronny, Mojsilović, Aleksandra, Mourad, Sami, Pedemonte, Pablo, Raghavendra, Ramya, Richards, John, Sattigeri, Prasanna, Shanmugam, Karthikeyan, Singh, Moninder, Varshney, Kush R., Wei, Dennis, Zhang, Yunfeng
As artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360 (http://aix360.mybluemix.net/), an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessible versions of algorithms, to tutorials and an interactive web demo to introduce AI explainability to different audiences and application domains. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed.
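As a concrete (if simplistic) example of the kind of local post-hoc explanation such a toolkit organizes, the sketch below estimates per-feature contributions by occlusion. It is an illustrative stand-in, not one of the eight methods shipped in AIX360 and not the toolkit's API.

import numpy as np

def local_feature_contributions(predict_proba, instance, baseline):
    """Illustrative local post-hoc explanation: the change in the model's
    positive-class score when each feature of `instance` is replaced by a
    baseline value (a simple occlusion-style explainer).

    predict_proba : scoring function, 2-D array -> array of positive-class scores
    instance      : 1-D numpy array, the example to explain
    baseline      : 1-D numpy array of 'neutral' feature values (e.g. training means)
    """
    instance = np.asarray(instance, dtype=float)
    base_score = predict_proba(instance[None, :])[0]
    contributions = np.zeros_like(instance)
    for j in range(len(instance)):
        perturbed = instance.copy()
        perturbed[j] = baseline[j]
        # Positive contribution: removing the feature lowers the score.
        contributions[j] = base_score - predict_proba(perturbed[None, :])[0]
    return contributions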
AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias
Bellamy, Rachel K. E., Dey, Kuntal, Hind, Michael, Hoffman, Samuel C., Houde, Stephanie, Kannan, Kalapriya, Lohia, Pranay, Martino, Jacquelyn, Mehta, Sameep, Mojsilovic, Aleksandra, Nagar, Seema, Ramamurthy, Karthikeyan Natesan, Richards, John, Saha, Diptikalyan, Sattigeri, Prasanna, Singh, Moninder, Varshney, Kush R., Zhang, Yunfeng
We used Python's Flask framework for building the service and exposed a REST API that generates a bias report based on the following input parameters from a user: the dataset name, the protected attributes, the privileged and unprivileged groups, the chosen fairness metrics, and the chosen mitigation algorithm, if any. With these inputs, the back-end then runs a series of steps to 1) split the dataset into training, development, and validation sets; 2) train a logistic regression classifier on the training set; 3) run the bias-checking metrics on the classifier against the test dataset; 4) if a mitigation algorithm is chosen, run the mitigation algorithm with the appropriate pipeline (pre-processing, in-processing, or post-processing). The end result is then cached so that if the exact same inputs are provided, the result can be directly retrieved from cache and no additional computation is needed. The reason to actually use the toolkit code in serving the Web application rather than having a pre-computed lookup table of results is twofold: we want to make the app a real representation of the underlying capabilities (in fact, creating the Web app helped us debug a few items in the code), and we also avoid any issues of synchronizing updates to the metrics, explainers, and algorithms with the results shown: synchronization is automatic. Currently, the service is limited to three built-in datasets, but it can be expanded to support the user's own data upload. The service is also limited to building logistic regression classifiers, but again this can be expanded. Such expansions can be more easily implemented if this fairness service is integrated into a full AI suite that provides various classifier options and data storage solutions.
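A condensed sketch of the kind of Flask endpoint described above, assuming a hypothetical load_builtin_dataset loader and a single illustrative metric. The route name, request fields, and caching scheme are simplifications of the actual service, and the optional mitigation step is omitted.

import json

from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

app = Flask(__name__)
_cache = {}  # identical requests are served from the cache, as described above

def statistical_parity_difference(y_pred, protected):
    # P(favorable outcome | unprivileged group) - P(favorable outcome | privileged group)
    return float(y_pred[protected == 0].mean() - y_pred[protected == 1].mean())

@app.route("/bias-report", methods=["POST"])
def bias_report():
    params = request.get_json()
    key = json.dumps(params, sort_keys=True)
    if key in _cache:
        return jsonify(_cache[key])

    # 1) load a built-in dataset by name; load_builtin_dataset is hypothetical and
    #    should return features X, labels y, and a 0/1 protected-attribute array.
    X, y, protected = load_builtin_dataset(params["dataset"], params["protected_attribute"])
    X_tr, X_te, y_tr, y_te, p_tr, p_te = train_test_split(
        X, y, protected, test_size=0.3, random_state=0)

    # 2) train a logistic regression classifier on the training split
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    # 3) run an illustrative bias-checking metric on the held-out split
    #    (the mitigation step 4 is omitted in this sketch)
    report = {"statistical_parity_difference":
              statistical_parity_difference(clf.predict(X_te), p_te)}

    _cache[key] = report
    return jsonify(report)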
Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration
Noothigattu, Ritesh, Bouneffouf, Djallel, Mattei, Nicholas, Chandra, Rachita, Madan, Piyush, Varshney, Kush, Campbell, Murray, Singh, Moninder, Rossi, Francesca
Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agents behave in ways aligned with the values of the societies in which they operate, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. These constraints and norms can come from any number of sources including regulations, business process guidelines, laws, ethical principles, social norms, and moral values. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations of the task, and reinforcement learning to learn to maximize the environment rewards. More precisely, we assume that an agent can observe traces of behavior of members of the society but has no access to the explicit set of constraints that give rise to the observed behavior. Inverse reinforcement learning is used to learn such constraints, which are then combined with a possibly orthogonal value function through the use of a contextual bandit-based orchestrator that makes a contextually appropriate choice between the two policies (constraint-based and environment reward-based) when taking actions. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using a Pac-Man domain and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
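The sketch below shows a simplified, context-free two-armed version of such an orchestrator choosing between a reward-maximizing policy and a constraint-following policy. The environment interface and the epsilon-greedy update are illustrative assumptions rather than the paper's contextual bandit algorithm.

import numpy as np

def orchestrate_episode(env, reward_policy, constraint_policy, arm_values,
                        epsilon=0.1, alpha=0.1, rng=None):
    """Bandit-style orchestration between two fixed policies for one episode.

    At each step an epsilon-greedy bandit picks which policy acts, and the
    chosen arm's value estimate is updated from the environment reward.

    env               : object with reset() -> state and step(action) -> (state, reward, done)
    reward_policy     : state -> action (maximizes environment reward)
    constraint_policy : state -> action (follows constraints learned via IRL)
    arm_values        : length-2 array of running value estimates, updated in place
    """
    rng = rng or np.random.default_rng()
    policies = (reward_policy, constraint_policy)
    state, trace = env.reset(), []
    done = False
    while not done:
        # Explore occasionally; otherwise pick the arm with the higher estimate.
        arm = rng.integers(2) if rng.random() < epsilon else int(np.argmax(arm_values))
        action = policies[arm](state)
        state, reward, done = env.step(action)
        arm_values[arm] += alpha * (reward - arm_values[arm])
        trace.append(("reward" if arm == 0 else "constraint", action, reward))
    # The trace makes it transparent which policy acted at each time step.
    return trace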
Assessing National Development Plans for Alignment With Sustainable Development Goals via Semantic Search
Galsurkar, Jonathan, Singh, Moninder, Wu, Lingfei, Vempaty, Aditya, Sushkov, Mikhail, Iyer, Devika, Kapto, Serge, Varshney, Kush R.
The United Nations Development Programme (UNDP) helps countries implement the United Nations (UN) Sustainable Development Goals (SDGs), an agenda for tackling major societal issues such as poverty, hunger, and environmental degradation by the year 2030. A key service provided by UNDP to countries that seek it is a review of national development plans and sector strategies by policy experts to assess alignment of national targets with one or more of the 169 targets of the 17 SDGs. Known as the Rapid Integrated Assessment (RIA), this process involves manual review of hundreds, if not thousands, of pages of documents and takes weeks to complete. In this work, we develop a natural language processing-based methodology to accelerate the workflow of policy experts. Specifically, we use paragraph embedding techniques to find paragraphs in the documents that match the semantic concepts of each of the SDG targets. One novel technical contribution of our work is our use of historical RIAs from other countries as a form of neighborhood-based supervision for matches in the country under study. We have successfully piloted the algorithm to perform the RIA for Papua New Guinea's national plan, with UNDP estimating that it will reduce completion time from 3-4 weeks to 3 days.
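A minimal sketch of the core matching step, assuming a hypothetical embed(texts) function backed by any paragraph-embedding model. The neighborhood-based supervision from historical RIAs is omitted, and the cosine-similarity ranking shown is only an illustration of the general approach.

import numpy as np

def match_paragraphs_to_targets(embed, paragraphs, sdg_targets, top_k=3):
    """Semantic matching of plan paragraphs to SDG targets (illustrative).

    embed       : hypothetical function, list of texts -> (n, d) array of embeddings
    paragraphs  : list of paragraphs from a national development plan
    sdg_targets : dict mapping target id (e.g. '1.1') -> target description
    Returns, for each target, the top_k most similar paragraphs by cosine similarity.
    """
    ids, descriptions = zip(*sdg_targets.items())
    P = np.asarray(embed(paragraphs), dtype=float)
    T = np.asarray(embed(list(descriptions)), dtype=float)
    # Cosine similarity via normalized dot products.
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    T /= np.linalg.norm(T, axis=1, keepdims=True)
    sims = T @ P.T  # shape: (num targets, num paragraphs)
    matches = {}
    for i, target_id in enumerate(ids):
        best = np.argsort(-sims[i])[:top_k]
        matches[target_id] = [(paragraphs[j], float(sims[i, j])) for j in best]
    return matches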