AITopics

2410.21728

Country:

Atlantic Ocean (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (1.00)
Transportation (0.68)
Health & Medicine > Therapeutic Area (0.67)
Leisure & Entertainment > Sports (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Oct-29-2024

Can Knowledge Editing Really Correct Hallucinations?

Huang, Baixiang, Chen, Canyu, Xu, Xiongxiao, Payani, Ali, Shu, Kai

Large Language Models (LLMs) suffer from hallucinations, that is, nonfactual information in generated content, despite their superior capacities across tasks. Meanwhile, knowledge editing has emerged as a popular paradigm for correcting erroneous factual knowledge encoded in LLMs without retraining from scratch. However, a common issue with existing evaluation datasets for knowledge editing is that they do not ensure LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, the resulting scores cannot be used directly to assess how effectively different knowledge editing methods correct hallucinations. Thus, the fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We propose HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset with 9 domains, 26 topics and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods in a holistic way on five dimensions: Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we provide new insights into the potentials and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate progress in the field of knowledge editing. Considering the high cost of retraining LLMs from scratch, knowledge editing has been designed as a new paradigm to correct erroneous or outdated factual ...
Table 1: Performance measured by Accuracy (%) of Llama2-7B before editing ("Pre-edit") and after applying typical knowledge editing methods ("Post-edit") on common existing evaluation datasets.
When such datasets are adopted to evaluate the performance of LLMs after being edited, it is hard to directly use the scores to judge the effectiveness of different knowledge editing techniques in correcting hallucinations, which is the motivation for applying knowledge editing to LLMs. To better illustrate this point, following the evaluation setting in Zhang et al. (2024e), we conducted a preliminary study to examine the pre-edit and post-edit performance of Llama2-7B on the aforementioned ...
[Figure: example question "Who is the Chief Scientist of OpenAI?"]

arxiv preprint, editing, knowledge editing, (14 more...)

2410.16251

Country:

North America > Canada (0.04)
Europe > Poland (0.04)
South America > Venezuela > Gulf of Paria (0.04)
(12 more...)

Genre:

Overview (0.67)
Research Report (0.56)

Industry: Health & Medicine > Therapeutic Area (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.75)

Belief in the Machine: Investigating Epistemological Blind Spots of Language Models

Suzgun, Mirac, Gur, Tayfun, Bianchi, Federico, Ho, Daniel E., Icard, Thomas, Jurafsky, Dan, Zou, James

As language models (LMs) become integral to fields like healthcare, law, and journalism, their ability to differentiate between fact, belief, and knowledge is essential for reliable decision-making. Failure to grasp these distinctions can lead to significant consequences in areas such as medical diagnosis, legal judgments, and dissemination of fake news. Despite this, current literature has largely focused on more complex issues such as theory of mind, overlooking more fundamental epistemic challenges. This study systematically evaluates the epistemic reasoning capabilities of modern LMs, including GPT-4, Claude-3, and Llama-3, using a new dataset, KaBLE, consisting of 13,000 questions across 13 tasks. Our results reveal key limitations. First, while LMs achieve 86% accuracy on factual scenarios, their performance drops significantly with false scenarios, particularly in belief-related tasks. Second, LMs struggle with recognizing and affirming personal beliefs, especially when those beliefs contradict factual data, which raises concerns for applications in healthcare and counseling, where engaging with a person's beliefs is critical. Third, we identify a salient bias in how LMs process first-person versus third-person beliefs, performing better on third-person tasks (80.7%) compared to first-person tasks (54.4%). Fourth, LMs lack a robust understanding of the factive nature of knowledge, namely, that knowledge inherently requires truth. Fifth, LMs rely on linguistic cues for fact-checking and sometimes bypass the deeper reasoning. These findings highlight significant concerns about current LMs' ability to reason about truth, belief, and knowledge while emphasizing the need for advancements in these areas before broad deployment in critical sectors.

large language model, machine learning, natural language, (18 more...)

2410.21195

Country:

Asia > China (0.15)
Oceania > Australia (0.05)
Pacific Ocean (0.05)
(24 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Establishing Nationwide Power System Vulnerability Index across US Counties Using Interpretable Machine Learning

Ma, Junwei, Li, Bo, Omitaomu, Olufemi A., Mostafavi, Ali

Power outages have become increasingly frequent, intense, and prolonged in the US due to climate change, aging electrical grids, and rising energy demand. However, largely due to the absence of granular spatiotemporal outage data, we lack data-driven evidence and analytics-based metrics to quantify power system vulnerability. This limitation has hindered the ability to effectively evaluate and address vulnerability to power outages in US communities. Here, we collected ~179 million power outage records at 15-minute intervals across 3022 US contiguous counties (96.15% of the area) from 2014 to 2023. We developed a power system vulnerability assessment framework based on three dimensions (intensity, frequency, and duration) and applied interpretable machine learning models (XGBoost and SHAP) to compute Power System Vulnerability Index (PSVI) at the county level. Our analysis reveals a consistent increase in power system vulnerability over the past decade. We identified 318 counties across 45 states as hotspots for high power system vulnerability, particularly in the West Coast (California and Washington), the East Coast (Florida and the Northeast area), the Great Lakes megalopolis (Chicago-Detroit metropolitan areas), and the Gulf of Mexico (Texas). Heterogeneity analysis indicates that urban counties, counties with interconnected grids, and states with high solar generation exhibit significantly higher vulnerability. Our results highlight the significance of the proposed PSVI for evaluating the vulnerability of communities to power outages. The findings underscore the widespread and pervasive impact of power outages across the country and offer crucial insights to support infrastructure operators, policymakers, and emergency managers in formulating policies and programs aimed at enhancing the resilience of the US power infrastructure.

artificial intelligence, machine learning, vulnerability, (17 more...)

2410.19754

Country:

North America > Mexico (0.34)
North America > United States > Illinois > Cook County > Chicago (0.24)
Atlantic Ocean > Gulf of Mexico (0.24)
(41 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Machinery > Industrial Machinery (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable > Wind (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Kumar, Shanu, Venkata, Akhila Yesantarao, Khandelwal, Shubhanshu, Santra, Bishal, Agrawal, Parag, Gupta, Manish

SCULPT: Systematic Tuning of Long Prompts

As large language models become increasingly central to solving complex tasks, the challenge of optimizing long, unstructured prompts has become critical. Existing optimization techniques often struggle to effectively handle such prompts, leading to suboptimal performance. We introduce SCULPT (Systematic Tuning of Long Prompts), a novel framework that systematically refines long prompts by structuring them hierarchically and applying an iterative actor-critic mechanism. To enhance robustness and generalizability, SCULPT utilizes two complementary feedback mechanisms: Preliminary Assessment, which assesses the prompt's structure before execution, and Error Assessment, which diagnoses and addresses errors post-execution. By aggregating feedback from these mechanisms, SCULPT avoids overfitting and ensures consistent improvements in performance. Our experimental results demonstrate significant accuracy gains and enhanced robustness, particularly in handling erroneous and misaligned prompts. SCULPT consistently outperforms existing approaches, establishing itself as a scalable solution for optimizing long prompts across diverse and real-world tasks.

large language model, machine learning, natural language, (17 more...)

2410.20788

Country:

Europe > France (0.04)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

CRAT: A Multi-Agent Framework for Causality-Enhanced Reflective and Retrieval-Augmented Translation with Large Language Models

Chen, Meiqi, Meng, Fandong, Zhang, Yingxue, Zhang, Yan, Zhou, Jie

Large language models (LLMs) have shown great promise in machine translation, but they still struggle with contextually dependent terms, such as new or domain-specific words. This leads to inconsistencies and errors that are difficult to address. Existing solutions often depend on manual identification of such terms, which is impractical given the complexity and evolving nature of language. While Retrieval-Augmented Generation (RAG) could provide some assistance, its application to translation is limited by issues such as hallucinations from information overload. In this paper, we propose CRAT, a novel multi-agent translation framework that leverages RAG and causality-enhanced self-reflection to address these challenges. This framework consists of several specialized agents: the Unknown Terms Identification agent detects unknown terms within the context, the Knowledge Graph (KG) Constructor agent extracts relevant internal knowledge about these terms and retrieves bilingual information from external sources, the Causality-enhanced Judge agent validates the accuracy of the information, and the Translator agent incorporates the refined information into the final output. This automated process allows for more precise and consistent handling of key terms during translation. Our results show that CRAT significantly improves translation accuracy, particularly in handling context-sensitive terms and emerging vocabulary.

large language model, machine learning, translation, (19 more...)

2410.21067

Country:

Asia > China (0.05)
Atlantic Ocean > South Atlantic Ocean > Scotia Sea (0.04)
North America > United States > Pennsylvania (0.04)
(4 more...)

Genre: Research Report > New Finding (0.68)

Industry: Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Purason, Taido, Kuulmets, Hele-Andra, Fishel, Mark

LLMs for Extremely Low-Resource Finno-Ugric Languages

arXiv.org Artificial Intelligence Oct-24-2024

The advancement of large language models (LLMs) has predominantly focused on high-resource languages, leaving low-resource languages, such as those in the Finno-Ugric family, significantly underrepresented. This paper addresses this gap by focusing on Võro, Livonian, and Komi. We cover almost the entire cycle of LLM creation, from data collection to instruction tuning and evaluation. Our contributions include developing multilingual base and instruction-tuned models; creating evaluation benchmarks, including the smugri-MT-bench multi-turn conversational benchmark; and conducting human evaluation. We intend for this work to promote linguistic diversity, ensuring that lesser-resourced languages can benefit from advancements in NLP.

large language model, machine learning, natural language, (20 more...)

2410.18902

Country:

Europe > Estonia > Tartu County > Tartu (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

arXiv.org Artificial Intelligence Oct-24-2024

KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing

Yang, Yifei, Cao, Zouying, Chen, Qiguang, Qin, Libo, Yang, Dongjie, Zhao, Hai, Chen, Zhi

The development of large language models (LLMs) has significantly expanded model sizes, resulting in substantial GPU memory requirements during inference. Most existing KV cache compression methods focus on intra-layer compression within a single Transformer layer, and few works consider layer-wise compression. In this paper, we propose a plug-and-play method called KVSharer, which shares the KV cache between layers to achieve layer-wise compression. Rather than intuitively sharing based on higher similarity, we discover a counterintuitive phenomenon: sharing dissimilar KV caches better preserves the model performance. Experiments show that KVSharer can reduce KV cache computation by 30%, thereby lowering memory consumption without significantly impacting model performance, and it can also achieve at least 1.3 times generation acceleration. Although the KV cache greatly helps improve inference speed, it also significantly ... During the LLM inference phase, the KV cache typically accounts for ... Recent research has seen a proliferation of methods aimed at compressing KV caches to reduce memory consumption (Zandieh et al., 2024; Xu et al., 2024; Yang et al., 2024b; Zhang et al., 2024b;a; Dong et al., 2024). However, these efforts have predominantly focused on intra-layer KV cache compression within individual Transformer layers of LLMs.
Figure 1: Previous methods primarily focus on discarding Keys and Values within layers. In contrast, we share KV caches across layers based on their dissimilarity.
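
The idea of sharing KV caches between dissimilar layers can be illustrated with a minimal Python sketch. It assumes that each layer's KV cache has been flattened into a plain vector, that dissimilarity is measured by Euclidean distance, and that pairs are picked greedily. The function names (`rank_layer_pairs`, `build_sharing_map`) and the toy vectors are hypothetical; the abstract does not state the authors' actual metric or selection procedure.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_layer_pairs(layer_caches):
    """Return (i, j) layer index pairs, most dissimilar first (assumed metric)."""
    pairs = combinations(range(len(layer_caches)), 2)
    return sorted(
        pairs,
        key=lambda p: euclidean(layer_caches[p[0]], layer_caches[p[1]]),
        reverse=True,
    )


def build_sharing_map(layer_caches, num_shared):
    """Greedily choose `num_shared` layers that reuse another layer's cache.

    Returns a dict mapping target layer -> source layer.
    """
    sharing = {}
    for src, dst in rank_layer_pairs(layer_caches):
        if len(sharing) >= num_shared:
            break
        # A layer may not both borrow a cache and lend one in this sketch.
        if dst in sharing or src in sharing or dst in sharing.values():
            continue
        sharing[dst] = src
    return sharing


# Toy flattened caches for four layers (hypothetical values).
toy_caches = [[0.0, 0.0], [1.0, 0.0], [0.0, 5.0], [4.0, 4.0]]
sharing_map = build_sharing_map(toy_caches, num_shared=2)
print(sharing_map)
```

In this toy setup two of the four layers reuse another layer's cache, so only half of the caches would need to be stored. A practical system would presumably also verify that model outputs remain acceptable after sharing.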

kv cache, large language model, machine learning, (14 more...)

2410.18517

Country:

Pacific Ocean (0.04)
Atlantic Ocean > Mediterranean Sea (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

arXiv.org Artificial Intelligence Oct-23-2024

Rethinking Positive Pairs in Contrastive Learning

Wu, Jiantao, Mo, Shentong, Feng, Zhenhua, Atito, Sara, Kitler, Josef, Awais, Muhammad

Contrastive learning, a prominent approach to representation learning, traditionally assumes positive pairs are closely related samples (the same image or class) and negative pairs are distinct samples. We challenge this assumption by proposing to learn from arbitrary pairs, allowing any pair of samples to be positive within our framework. The primary challenge of the proposed approach lies in applying contrastive learning to disparate pairs which are semantically distant. Motivated by the discovery that SimCLR can separate given arbitrary pairs (e.g., garter snake and table lamp) in a subspace, we propose a feature filter conditioned on class pairs that creates the requisite subspaces by gate vectors selectively activating or deactivating dimensions. This filter can be optimized through gradient descent within a conventional contrastive learning mechanism. We present Hydra, a universal contrastive learning framework for visual representations that extends conventional contrastive learning to accommodate arbitrary pairs. Our approach is validated using IN1K, where 1K diverse classes compose 500,500 pairs, most of them being distinct. Surprisingly, Hydra achieves superior performance in this challenging setting. Additional benefits include the prevention of dimensional collapse and the discovery of class relationships. Our work highlights the value of learning common features of arbitrary pairs and potentially broadens the applicability of contrastive learning techniques to sample pairs with weak relationships.
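
The pair count quoted above is consistent with counting unordered class pairs with repetition, n(n+1)/2 = 500,500 for n = 1,000. The Python sketch below checks that arithmetic and illustrates, under assumptions, how a gate vector could restrict a similarity computation to a subspace. The binary gate, the use of cosine similarity, the toy features, and the function names are illustrative only and are not taken from the paper.

```python
import math


def count_unordered_pairs(num_classes):
    """Number of unordered class pairs, same-class pairs included."""
    return num_classes * (num_classes + 1) // 2


def gated_cosine(u, v, gate):
    """Cosine similarity after multiplying both vectors by a gate vector.

    A gate entry of 1 keeps a dimension and 0 switches it off (assumption).
    """
    gu = [g * x for g, x in zip(gate, u)]
    gv = [g * x for g, x in zip(gate, v)]
    dot = sum(a * b for a, b in zip(gu, gv))
    norm = math.sqrt(sum(a * a for a in gu)) * math.sqrt(sum(b * b for b in gv))
    return dot / norm if norm else 0.0


# Hypothetical 3-dimensional features for two semantically distant classes.
snake = [1.0, 0.0, 2.0]
lamp = [1.0, 3.0, -2.0]

full_similarity = gated_cosine(snake, lamp, [1, 1, 1])  # all dimensions active
sub_similarity = gated_cosine(snake, lamp, [1, 0, 0])   # only the shared dimension

print(count_unordered_pairs(1000))
print(full_similarity, sub_similarity)
```

In the toy example the two features disagree over the full space but coincide once the gate keeps only the dimension they share, which is the intuition behind treating an arbitrary pair as positive within a subspace.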

artificial intelligence, machine learning, subspace, (15 more...)

2410.182

Country:

Europe > United Kingdom > England > Surrey (0.04)
North America > United States > Virginia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Ngong, Ivoline C., Near, Joseph P., Mireshghallah, Niloofar

Differentially Private Learning Needs Better Model Initialization and Self-Distillation

arXiv.org Artificial Intelligence Oct-23-2024

Differentially private SGD (DPSGD) enables privacy-preserving training of language models, but often reduces utility, diversity, and linguistic quality. We introduce DPRefine, a three-phase method that initializes a model using data synthesis from a small pre-trained LM with rigorous filtering, applies DP fine-tuning on private data, and performs self-distillation to refine outputs. This approach significantly outperforms vanilla DPSGD, with AlpacaEval preferring DPRefine's generations in 78.4% of cases across all datasets. Our analysis reveals that DPRefine reduces linguistic errors in generated text by 84.0%, mitigating grammar ...

... DPSGD to fine-tune these models on private data often yields poor results, particularly when the private dataset is small (Tramèr et al., 2022; Mireshghallah et al., 2021). Recent work has shown that leveraging better hand-crafted features (Tramèr and Boneh, 2020) or features from large pre-trained language models (Li et al., 2022, 2021) can improve the privacy-utility tradeoff in differentially private learning. However, these approaches have limitations: smaller pre-trained models offer limited benefits, and fine-tuning larger models on private data may be infeasible due to proprietary concerns or infrastructure limitations. This raises a critical question: Can we develop small, domain-specific language models that achieve high performance without requiring large private datasets or large, pre-trained models?
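
As background for the DPSGD baseline mentioned above, the following Python sketch shows the commonly described recipe of clipping each example's gradient and adding Gaussian noise before the update. It is a generic illustration and not DPRefine itself; the function name `dp_sgd_step`, the toy gradients, and the hyperparameter values are assumptions.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One noisy update: clip per example, sum, add Gaussian noise, average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    # The noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


# Toy batch of two per-example gradients (hypothetical values).
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
rng = random.Random(0)
updated = dp_sgd_step([0.0, 0.0], toy_grads, lr=1.0, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
print(updated)
```

The clipping bound limits how much any single example can move the parameters, and the noise scale is tied to that bound. Both steps also degrade the gradient signal, which fits the reduced utility noted in the abstract.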

large language model, machine learning, natural language, (18 more...)

2410.17566

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.14)
Europe > United Kingdom > Wales (0.04)
North America > Mexico (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(9 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)