
A Comprehensive Survey on Self-Interpretable Neural Networks

arXiv.org Artificial Intelligence

Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides explanations for pre-trained models, often suffers from limited robustness and fidelity. This has inspired a rising interest in self-interpretable neural networks, which inherently reveal the prediction rationale through the model structures. Although there exist surveys on post-hoc interpretability, a comprehensive and systematic survey of self-interpretable neural networks is still missing. To address this gap, we first collect and review existing works on self-interpretable neural networks and provide a structured summary of their methodologies from five key perspectives: attribution-based, function-based, concept-based, prototype-based, and rule-based self-interpretation. We also present concrete, visualized examples of model explanations and discuss their applicability across diverse scenarios, including image, text, graph data, and deep reinforcement learning. Additionally, we summarize existing evaluation metrics for self-interpretability and identify open challenges in this field, offering insights for future research. To support ongoing developments, we present a publicly accessible resource to track advancements in this domain: https://github.com/yangji721/Awesome-Self-Interpretable-Neural-Network.


Regulating Multifunctionality

arXiv.org Artificial Intelligence

Forthcoming in Philipp Hacker, Andreas Engel, Sarah Hammer and Brent Mittelstadt (eds) The Oxford Handbook on the Foundations and Regulation of Generative AI (Oxford University Press)

Abstract: Foundation models and generative artificial intelligence (AI) exacerbate a core regulatory challenge associated with AI: its heterogeneity. By their very nature, foundation models and generative AI can perform multiple functions for their users, thus presenting a vast array of different risks. This multifunctionality means that prescriptive, one-size-fits-all regulation will not be a viable option. Even performance standards and ex post liability--regulatory approaches that usually afford flexibility--are unlikely to be strong candidates for responding to multifunctional AI's risks, given challenges in monitoring and enforcement. Regulators will do well instead to promote proactive risk management on the part of developers and users by using management-based regulation, an approach that has proven effective in other contexts of heterogeneity. Regulators will also need to maintain ongoing vigilance and agility. More than in other contexts, regulators of multifunctional AI will need sufficient resources, top human talent and leadership, and organizational cultures committed to regulatory excellence.

Consider one of humanity's most primal tools: the knife [30]. The knife is not a singular tool; rather, it comes in many different varieties that serve many functions, each of which can generate value for society. Knives are used in the kitchen to prepare delicious meals, and then they are used by diners to consume those same meals. Knives carve objects, cut rope, and open packages. They clear paths through forests and jungles, and they help in harvesting seasonal crops. Knives can be used, of course, to injure or kill people. But in the hands of surgeons, knives are routinely used to save lives.
And even though knives take many different forms and are often designed for many different purposes--think of, for example, the many types and sizes of surgical scalpels, woodcarver's chisels, and kitchen implements, among others--knives designed for one purpose also can be adapted for different uses, as anyone who has used a dinner knife to open a postal letter can attest. Many knives, though, are deliberately intended to serve multiple functions, as is the case with a simple pocketknife or, even more emblematically, the classic Swiss army knife, some models of which boast a combination of more than 30 different tools in one. The proliferation of functions performed by different knives has led over the years to different forms and sources of rules governing their manufacture, sale, and deployment.


PIP: Perturbation-based Iterative Pruning for Large Language Models

arXiv.org Artificial Intelligence

The rapid increase in the parameter counts of Large Language Models (LLMs), reaching billions or even trillions, presents significant challenges for their practical deployment, particularly in resource-constrained environments. To address this issue, we propose PIP (Perturbation-based Iterative Pruning), a novel double-view structured pruning method to optimize LLMs, which combines information from two different views: the unperturbed view and the perturbed view. With the calculation of gradient differences, PIP iteratively prunes the structures that struggle to distinguish between these two views. Our experiments show that PIP reduces the parameter count by approximately 20% while retaining over 85% of the original model's accuracy across varied benchmarks. In some cases, the performance of the pruned model is within 5% of the unpruned version, demonstrating PIP's ability to preserve key aspects of model effectiveness. Moreover, PIP consistently outperforms existing state-of-the-art (SOTA) structured pruning methods, establishing it as a leading technique for optimizing LLMs in environments with constrained resources. Our code is available at: https://github.com/caoyiiiiii/PIP.


On The Truthfulness of 'Surprisingly Likely' Responses of Large Language Models

arXiv.org Artificial Intelligence

The principle of rewarding a crowd for surprisingly common answers has been used in the literature for designing a number of truthful information elicitation mechanisms. A related method has also been proposed in the literature for better aggregation of crowd wisdom. Drawing a comparison between crowd-based collective intelligence systems and large language models, we define the notion of a 'surprisingly likely' textual response of a large language model. This notion is inspired by the surprisingly common principle, but tailored for text in a language model. Using benchmarks such as TruthfulQA and openly available LLMs (GPT-2 and LLaMA-2), we show that the surprisingly likely textual responses of large language models are more accurate in many cases compared to standard baselines. For example, we observe up to 24 percentage points aggregate improvement on TruthfulQA and up to 70 percentage points improvement on individual categories of questions in this benchmark. We also provide further analysis of the results, including the cases where surprisingly likely responses are no more accurate, or are less accurate, than the baselines.
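
The abstract does not state how a 'surprisingly likely' response is scored, so the following is only a minimal sketch under a loud assumption: that a response counts as surprisingly likely when its probability given the question is high relative to its prior probability without the question (a log-ratio score). The function name, the score, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math

def surprisingly_likely(candidates):
    """Pick the answer whose conditional probability most exceeds its prior.

    candidates maps answer -> (p_given_question, p_prior).
    The log-ratio score is an assumption made for illustration only.
    """
    scores = {
        answer: math.log(p_cond) - math.log(p_prior)
        for answer, (p_cond, p_prior) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy, made-up probabilities (not model outputs).
candidates = {
    "A": (0.5, 0.4),   # likely given the question, but also likely a priori
    "B": (0.3, 0.05),  # less likely overall, but far above its prior
}

best, scores = surprisingly_likely(candidates)
# A plain "most likely response" baseline ignores the prior.
baseline = max(candidates, key=lambda a: candidates[a][0])
```

With these toy numbers the baseline selects "A", while the ratio-based score selects "B", which illustrates how the two selection rules can disagree.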


Computational Protein Science in the Era of Large Language Models (LLMs)

arXiv.org Artificial Intelligence

Considering the significance of proteins, computational protein science has always been a critical scientific field, dedicated to revealing knowledge and developing applications within the protein sequence-structure-function paradigm. In the last few decades, Artificial Intelligence (AI) has made significant impacts in computational protein science, leading to notable successes in specific protein modeling tasks. However, those previous AI models still face limitations, such as the difficulty in comprehending the semantics of protein sequences, and the inability to generalize across a wide range of protein modeling tasks. Recently, LLMs have emerged as a milestone in AI due to their unprecedented language processing and generalization capability. They can promote comprehensive progress in fields rather than solving individual tasks. As a result, researchers have actively introduced LLM techniques in computational protein science, developing protein Language Models (pLMs) that skillfully grasp the foundational knowledge of proteins and can be effectively generalized to solve a diversity of sequence-structure-function reasoning problems. While witnessing prosperous developments, it's necessary to present a systematic overview of computational protein science empowered by LLM techniques. First, we summarize existing pLMs into categories based on their mastered protein knowledge, i.e., underlying sequence patterns, explicit structural and functional information, and external scientific languages. Second, we introduce the utilization and adaptation of pLMs, highlighting their remarkable achievements in promoting protein structure prediction, protein function prediction, and protein design studies. Then, we describe the practical application of pLMs in antibody design, enzyme design, and drug discovery. Finally, we specifically discuss the promising future directions in this fast-growing field.


Fairness in LLM-Generated Surveys

arXiv.org Artificial Intelligence

Large Language Models (LLMs) excel in text generation and understanding, especially in simulating socio-political and economic patterns, serving as an alternative to traditional surveys. However, their global applicability remains questionable due to unexplored biases across socio-demographic and geographic contexts. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on U.S. datasets. This bias originates from the U.S.-centric training data, remaining evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles. Our study presents a novel framework for measuring socio-demographic biases in LLMs, offering a path toward ensuring fairer and more equitable model performance across diverse socio-cultural contexts.
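
The abstract mentions predictive accuracy and fairness metrics without naming them. As a hedged illustration, one simple disparity measure is the gap between the best and worst per-group accuracy. The function names and the toy records below are hypothetical and are not the study's data or framework.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return accuracy per group from (group, true_answer, predicted_answer) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(per_group):
    """Difference between the highest and lowest group accuracy."""
    return max(per_group.values()) - min(per_group.values())

# Toy, made-up survey predictions (illustrative only).
records = [
    ("US", "yes", "yes"), ("US", "no", "no"),
    ("US", "yes", "yes"), ("US", "no", "yes"),
    ("Chile", "yes", "no"), ("Chile", "no", "no"),
    ("Chile", "yes", "yes"), ("Chile", "no", "yes"),
]

per_group = accuracy_by_group(records)
gap = accuracy_gap(per_group)
```

The same grouping key could be a socio-demographic attribute (for example gender or education) instead of country, which is the kind of breakdown the abstract describes.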


Dialogue Systems for Emotional Support via Value Reinforcement

arXiv.org Artificial Intelligence

Emotional support dialogue systems aim to reduce help-seekers' distress and help them overcome challenges. While human values, core beliefs that shape an individual's priorities, are increasingly emphasized in contemporary psychological therapy for their role in fostering internal transformation and long-term emotional well-being, their integration into emotional support systems remains underexplored. To bridge this gap, we present a value-driven method for training emotional support dialogue systems designed to reinforce positive values in seekers. Our model learns to identify which values to reinforce at each turn and how to do so, by leveraging online support conversations from Reddit. The model demonstrated superior performance in emotional support capabilities, outperforming various baselines. Notably, it more effectively explored and elicited values from seekers. Expert assessments by therapists highlighted two key strengths of our model: its ability to validate users' challenges and its effectiveness in emphasizing positive aspects of their situations, both crucial elements of value reinforcement. Our work validates the effectiveness of value reinforcement for emotional support systems and establishes a foundation for future research.


How We Connected One Billion Lives Through Digital Technology

TIME - Tech

In an increasingly digital world, connectivity is a necessity. Yet, nearly a third of the global population remains offline, unable to access the services vital to participating in our global digital economy and society. The Edison Alliance at the World Economic Forum has worked to change that by delivering digital connectivity and access to financial, healthcare, and education services to those who need them most. Our partnerships with governments, industries, and non-governmental organizations drive lasting systemic change. The World Economic Forum played a pivotal role in launching and guiding the Alliance's work, providing a platform for stakeholders to come together and commit to a vision with actionable ideas and plans.


Review and Recommendations for using Artificial Intelligence in Intracoronary Optical Coherence Tomography Analysis

arXiv.org Artificial Intelligence

Artificial intelligence (AI) methodologies hold great promise for the rapid and accurate diagnosis of coronary artery disease (CAD) from intravascular optical coherence tomography (IVOCT) images. Numerous papers have been published describing AI-based models for different diagnostic tasks, yet it remains unclear which models have potential clinical utility and have been properly validated. This systematic review considered published literature between January 2015 and February 2023 describing AI-based diagnosis of CAD using IVOCT. Our search identified 5,576 studies, with 513 included after initial screening and 35 studies included in the final systematic review after quality screening. Our findings indicate that most of the identified models are not currently suitable for clinical use, primarily due to methodological flaws and underlying biases. To address these issues, we provide recommendations to improve model quality and research practices to enhance the development of clinically useful AI products.


An AI-Driven Live Systematic Reviews in the Brain-Heart Interconnectome: Minimizing Research Waste and Advancing Evidence Synthesis

arXiv.org Artificial Intelligence

The Brain-Heart Interconnectome (BHI) combines neurology and cardiology but is hindered by inefficiencies in evidence synthesis, poor adherence to quality standards, and research waste. To address these challenges, we developed an AI-driven system to enhance systematic reviews in the BHI domain. The system integrates automated detection of Population, Intervention, Comparator, Outcome, and Study design (PICOS), semantic search using vector embeddings, graph-based querying, and topic modeling to identify redundancies and underexplored areas. Core components include a Bi-LSTM model achieving 87% accuracy for PICOS compliance, a study design classifier with 95.7% accuracy, and Retrieval-Augmented Generation (RAG) with GPT-3.5, which outperformed GPT-4 for graph-based and topic-driven queries. The system provides real-time updates, reducing research waste through a living database and offering an interactive interface with dashboards and conversational AI. While initially developed for BHI, the system's adaptable architecture enables its application across various biomedical fields, supporting rigorous evidence synthesis, efficient resource allocation, and informed clinical decision-making.