AITopics

The increasing sizes of large language models (LLMs) result in significant computational overhead and memory usage when adapting these models to specific tasks or domains. Various parameter-efficient fine-tuning (PEFT) methods have been devised to mitigate these challenges by training a small set of parameters for the task-specific updates of the model weights. Among PEFT methods, LoRA stands out for its simplicity and efficiency, inspiring the development of a series of variants. However, LoRA and its successors disregard the knowledge that is noisy or irrelevant to the targeted task, detrimentally impacting model performance and leading to suboptimality. To address this limitation, we introduce Knowledge-aware Singular-value Adaptation (KaSA), a PEFT method that leverages singular value decomposition (SVD) with knowledge-aware singular values to dynamically activate knowledge based on its relevance to the task at hand. We conduct extensive experiments across a range of LLMs on tasks spanning natural language understanding (NLU), generation (NLG), instruction following, and commonsense reasoning. The experimental results demonstrate that KaSA consistently outperforms FFT and 14 popular PEFT baselines across 16 benchmarks and 4 synthetic datasets, underscoring our method's efficacy and adaptability. The source code of our method is available at https://github.com/juyongjiang/KaSA.

large language model, machine learning, natural language, (18 more...)

2412.06071

Country:

North America > United States > Hawaii (0.04)
North America > United States > Arizona (0.04)
North America > United States > Rocky Mountains (0.04)
(12 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Education (0.93)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Rule-based Data Selection for Large Language Models

Li, Xiaomin, Gao, Mingye, Zhang, Zhiwei, Yue, Chang, Hu, Hong

There are increasing studies using LLMs to rate and select data based on several human-crafted metrics (rules). However, these conventional rule-based approaches often depend too heavily on human heuristics, lack effective metrics for assessing rules, and exhibit limited adaptability to new tasks. In our study, we introduce an innovative rule-based framework that utilizes the orthogonality of score vectors associated with rules as a novel metric for rule evaluations. Our approach includes an automated pipeline that first uses LLMs to generate a diverse set of rules, encompassing various rating dimensions to evaluate data quality. Then it rates a batch of data based on these rules and uses the determinantal point process (DPP) from random matrix theory to select the most orthogonal score vectors, thereby identifying a set of independent rules. These rules are subsequently used to evaluate all data, selecting samples with the highest average scores for downstream tasks such as LLM training. We verify the effectiveness of our method through two experimental setups: 1) comparisons with ground truth ratings and 2) benchmarking LLMs trained with the chosen data. Our comprehensive experiments cover a range of scenarios, including general pre-training and domain-specific fine-tuning in areas such as IMDB, Medical, Math, and Code. The outcomes demonstrate that our DPP-based rule rating method consistently outperforms other approaches, including rule-free rating, uniform sampling, importance resampling, and QuRating, in terms of both rating precision and model performance.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2410.04715

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
North America > United States > Massachusetts (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Korre, Katerina, Pavlopoulos, John, Gajo, Paolo, Barrón-Cedeño, Alberto

Hate Speech According to the Law: An Analysis for Effective Detection

The issue of hate speech extends beyond the confines of the online realm. It is a problem with real-life repercussions, prompting most nations to formulate legal frameworks that classify hate speech as a punishable offence. These legal frameworks differ from one country to another, contributing to the big chaos that online platforms have to face when addressing reported instances of hate speech. With the definitions of hate speech falling short in introducing a robust framework, we turn our gaze onto hate speech laws. We consult the opinion of legal experts on a hate speech dataset and we experiment by employing various approaches such as pretrained models both on hate speech and legal data, as well as exploiting two large language models (Qwen2-7B-Instruct and Meta-Llama-3-70B). Due to the time-consuming nature of data acquisition for prosecutable hate speech, we use pseudo-labeling to improve our pretrained models. This study highlights the importance of amplifying research on prosecutable hate speech and provides insights into effective strategies for combating hate speech within the parameters of legal frameworks. Our findings show that legal knowledge in the form of annotations can be useful when classifying prosecutable hate speech, yet more focus should be paid on the differences between the laws.

large language model, machine learning, natural language, (18 more...)

2412.06144

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > Wales (0.04)
Europe > United Kingdom > Scotland (0.04)
(14 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > Europe Government (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Shankar, Shreya, Chambers, Tristan, Shah, Tarak, Parameswaran, Aditya G., Wu, Eugene

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

Analyzing unstructured data has been a persistent challenge in data processing. Large Language Models (LLMs) have shown promise in this regard, leading to recent proposals for declarative frameworks for LLM-powered processing of unstructured data. However, these frameworks focus on reducing cost when executing user-specified operations using LLMs, rather than improving accuracy, executing most operations as-is (in a single LLM call). This is problematic for complex tasks and data, where LLM outputs for user-defined operations are often inaccurate, even with optimized prompts. For example, an LLM may struggle to identify {\em all} instances of specific clauses, like force majeure or indemnification, in lengthy legal documents, requiring decomposition of the data, the task, or both. We present DocETL, a system that optimizes complex document processing pipelines, while accounting for LLM shortcomings. DocETL offers a declarative interface for users to define such pipelines and uses an agent-based approach to automatically optimize them, leveraging novel agent-based rewrites (that we call rewrite directives), as well as an optimization and evaluation framework. We introduce (i) logical rewriting of pipelines, tailored for LLM-based tasks, (ii) an agent-guided plan evaluation mechanism that synthesizes and orchestrates task-specific validation prompts, and (iii) an optimization algorithm that efficiently finds promising plans, considering the latencies of agent-based plan generation and evaluation. Our evaluation on four different unstructured document analysis tasks demonstrates that DocETL finds plans with outputs that are 25 to 80% more accurate than well-engineered baselines, addressing a critical gap in unstructured data analysis. DocETL is open-source at docetl.org, and as of November 2024, has amassed over 1.3k GitHub Stars, with users spanning a variety of domains.

large language model, machine learning, natural language, (20 more...)

2410.12189

Country:

North America > United States > California (0.04)
North America > United States > New York (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.93)
Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Machine LearningDec-8-2024

On Socially Fair Low-Rank Approximation and Column Subset Selection

Song, Zhao, Vakilian, Ali, Woodruff, David P., Zhou, Samson

Low-rank approximation and column subset selection are two fundamental and related problems that are applied across a wealth of machine learning applications. In this paper, we study the question of socially fair low-rank approximation and socially fair column subset selection, where the goal is to minimize the loss over all sub-populations of the data. We show that surprisingly, even constant-factor approximation to fair low-rank approximation requires exponential time under certain standard complexity hypotheses. On the positive side, we give an algorithm for fair low-rank approximation that, for a constant number of groups and constant-factor accuracy, runs in $2^{\text{poly}(k)}$ time rather than the na\"{i}ve $n^{\text{poly}(k)}$, which is a substantial improvement when the dataset has a large number $n$ of observations. We then show that there exist bicriteria approximation algorithms for fair low-rank approximation and fair column subset selection that run in polynomial time.

approximation, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2412.06063

Country:

North America > United States > Texas (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.82)

Industry: Law (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

A Comprehensive Survey and Guide to Multimodal Large Language Models in Vision-Language Tasks

Liang, Chia Xin, Tian, Pu, Yin, Caitlyn Heqi, Yua, Yao, An-Hou, Wei, Ming, Li, Wang, Tianyang, Bi, Ziqian, Liu, Ming

This survey and application guide to multimodal large language models(MLLMs) explores the rapidly developing field of MLLMs, examining their architectures, applications, and impact on AI and Generative Models. Starting with foundational concepts, we delve into how MLLMs integrate various data types, including text, images, video and audio, to enable complex AI systems for cross-modal understanding and generation. It covers essential topics such as training methods, architectural components, and practical applications in various fields, from visual storytelling to enhanced accessibility. Through detailed case studies and technical analysis, the text examines prominent MLLM implementations while addressing key challenges in scalability, robustness, and cross-modal learning. Concluding with a discussion of ethical considerations, responsible AI development, and future directions, this authoritative resource provides both theoretical frameworks and practical insights. It offers a balanced perspective on the opportunities and challenges in the development and deployment of MLLMs, and is highly valuable for researchers, practitioners, and students interested in the intersection of natural language processing and computer vision.

large language model, machine learning, natural language, (25 more...)

2411.06284

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Indiana (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(5 more...)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Overview (1.00)
(2 more...)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology > Services (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(8 more...)

FedRBE -- a decentralized privacy-preserving federated batch effect correction tool for omics data based on limma

Burankova, Yuliya, Klemm, Julian, Lohmann, Jens J. G., Taheri, Ahmad, Probul, Niklas, Baumbach, Jan, Zolotareva, Olga

Batch effects in omics data obscure true biological signals and constitute a major challenge for privacy-preserving analyses of distributed patient data. Existing batch effect correction methods either require data centralization, which may easily conflict with privacy requirements, or lack support for missing values and automated workflows. To bridge this gap, we developed fedRBE, a federated implementation of limma's removeBatchEffect method. We implemented it as an app for the FeatureCloud platform. Unlike its existing analogs, fedRBE effectively handles data with missing values and offers an automated, user-friendly online user interface ( https://featurecloud.ai/app/fedrbe). Leveraging secure multi-party computation provides enhanced security guarantees over classical federated learning approaches. We evaluated our fedRBE algorithm on simulated and real omics data, achieving performance comparable to the centralized method with negligible differences (no greater than 3.6E-13). By enabling collaborative correction without data sharing, fedRBE facilitates large-scale omics studies where batch effect correction is crucial.

correction, dataset, removebatcheffect, (13 more...)

2412.05894

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China (0.04)
(6 more...)

Genre: Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Biomedical Informatics (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

A Comprehensive Guide to Explainable AI: From Classical Models to LLMs

Hsieh, Weiche, Bi, Ziqian, Jiang, Chuanqi, Liu, Junyu, Peng, Benji, Zhang, Sen, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Chen, Keyu, Feng, Pohsun, Wen, Yizhu, Song, Xinyuan, Wang, Tianyang, Liu, Ming, Yang, Junjie, Li, Ming, Jing, Bowen, Ren, Jintao, Song, Junhao, Tseng, Hong-Ming, Zhang, Yichao, Yan, Lawrence K. Q., Niu, Qian, Chen, Silin, Wang, Yunze, Liang, Chia Xin

challenge and limitation computational complexity, inspection import partialdependencedisplay 6 7, transformer encoder-decoder architecture, (16 more...)

Explainable Artificial Intelligence (XAI) addresses the growing need for transparency and interpretability in AI systems, enabling trust and accountability in decision-making processes. This book offers a comprehensive guide to XAI, bridging foundational concepts with advanced methodologies. It explores interpretability in traditional models such as Decision Trees, Linear Regression, and Support Vector Machines, alongside the challenges of explaining deep learning architectures like CNNs, RNNs, and Large Language Models (LLMs), including BERT, GPT, and T5. The book presents practical techniques such as SHAP, LIME, Grad-CAM, counterfactual explanations, and causal inference, supported by Python code examples for real-world applications. Case studies illustrate XAI's role in healthcare, finance, and policymaking, demonstrating its impact on fairness and decision support. The book also covers evaluation metrics for explanation quality, an overview of cutting-edge XAI tools and frameworks, and emerging research directions, such as interpretability in federated learning and ethical AI considerations. Designed for a broad audience, this resource equips readers with the theoretical insights and practical skills needed to master XAI. Hands-on examples and additional resources are available at the companion GitHub repository: https://github.com/Echoslayer/XAI_From_Classical_Models_to_LLMs.

2412.008

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > Italy > Marche > Ancona Province > Ancona (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(14 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Daily Mail - Science & techDec-7-2024, 06:18:57 GMT

Murdered health insurance boss Brian Thompson backed 'malicious' AI that denied 90% of patient coverage

A controversial AI program used to deny elderly people health coverage is now at the center of questions about the shooting of the UnitedHealthcare CEO. Brian Thompson, 50 was gunned down Wednesday outside a Hilton in Midtown Manhattan in what police have described as a'brazen' and'targeted' attack. The killer is still on the loose and the motive is not yet known - but a former-FBI agent told Newsweek that he may have been denied health coverage. UnitedHealthcare became the largest denier of insurance plans in 2023, dismissing one in every three claims. It has now emerged that during the years before that, the company implemented AI software that had a 90 percent denial rate.

brian thompson, thompson, unitedhealthcare, (15 more...)

Daily Mail - Science & tech

Country:

North America > United States > Minnesota > Freeborn County > Albert Lea (0.06)
North America > United States > Tennessee (0.05)

Genre: Personal > Obituary (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Banking & Finance > Insurance (1.00)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.41)

Avogaro, Andrea, Capogrosso, Luigi, Fummi, Franco, Cristani, Marco

Dif4FF: Leveraging Multimodal Diffusion Models and Graph Neural Networks for Accurate New Fashion Product Performance Forecasting

arXiv.org Artificial IntelligenceDec-7-2024

In the fast-fashion industry, overproduction and unsold inventory create significant environmental problems. Precise sales forecasts for unreleased items could drastically improve the efficiency and profits of industries. However, predicting the success of entirely new styles is difficult due to the absence of past data and ever-changing trends. Specifically, currently used deterministic models struggle with domain shifts when encountering items outside their training data. The recently proposed diffusion models address this issue using a continuous-time diffusion process. Specifically, these models enable us to predict the sales of new items, mitigating the domain shift challenges encountered by deterministic models. As a result, this paper proposes Dif4FF, a novel two-stage pipeline for New Fashion Product Performance Forecasting (NFPPF) that leverages the power of diffusion models conditioned on multimodal data related to specific clothes. Dif4FF first utilizes a multimodal score-based diffusion model to forecast multiple sales trajectories for various garments over time. The forecasts are refined using a powerful Graph Convolutional Network (GCN) architecture. By leveraging the GCN's capability to capture long-range dependencies within both the temporal and spatial data and seeking the optimal solution between these two dimensions, Dif4FF offers the most accurate and efficient forecasting system available in the literature for predicting the sales of new items. We tested Dif4FF on VISUELLE, the de facto standard for NFPPF, achieving new state-of-the-art results.

artificial intelligence, diffusion model, machine learning, (14 more...)

doi: 10.1007/978-3-031-78186-5_7

2412.05566

Country: Europe > Italy (0.04)

Genre: Research Report (1.00)

Industry:

Law > Environmental Law (0.48)
Textiles, Apparel & Luxury Goods (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)