AITopics

Fine-tuning pre-trained large language models in a parameter-efficient manner is widely studied for its effectiveness and efficiency. LoRA is one of the most widely used methods, which assumes that the optimization process is essentially low dimensional. Although LoRA has demonstrated commendable performance, there remains a significant performance gap between LoRA and full fine-tuning when learning new tasks. In this work, we propose Low-Rank Adaptation with Task-Relevant Feature Enhancement(LoRATRF) for enhancing task-relevant features from the perspective of editing neural network representations. To prioritize task-relevant features, a task-aware filter that selectively extracts valuable knowledge from hidden representations for the target or current task is designed. As the experiments on a vareity of datasets including NLU, commonsense reasoning and mathematical reasoning tasks demonstrates, our method reduces 33.71% parameters and achieves better performance on a variety of datasets in comparison with SOTA low-rank methods.

computational linguistic, large language model, machine learning, (15 more...)

2412.09827

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(10 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

AutoDCWorkflow: LLM-based Data Cleaning Workflow Auto-Generation and Benchmark

Li, Lan, Fang, Liri, Torvik, Vetle I.

We investigate the reasoning capabilities of large language models (LLMs) for automatically generating data-cleaning workflows. To evaluate LLMs' ability to complete data-cleaning tasks, we implemented a pipeline for LLM-based Auto Data Cleaning Workflow (AutoDCWorkflow), prompting LLMs on data cleaning operations to repair three types of data quality issues: duplicates, missing values, and inconsistent data formats. Given a dirty table and a purpose (expressed as a query), this pipeline generates a minimal, clean table sufficient to address the purpose and the data cleaning workflow used to produce the table. The planning process involves three main LLM-driven components: (1) Select Target Columns: Identifies a set of target columns related to the purpose. (2) Inspect Column Quality: Assesses the data quality for each target column and generates a Data Quality Report as operation objectives. (3) Generate Operation & Arguments: Predicts the next operation and arguments based on the data quality report results. Additionally, we propose a data cleaning benchmark to evaluate the capability of LLM agents to automatically generate workflows that address data cleaning purposes of varying difficulty levels. The benchmark comprises the annotated datasets as a collection of purpose, raw table, clean table, data cleaning workflow, and answer set. In our experiments, we evaluated three LLMs that auto-generate purpose-driven data cleaning workflows. The results indicate that LLMs perform well in planning and generating data-cleaning workflows without the need for fine-tuning.

data quality, large language model, natural language, (16 more...)

2412.06724

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > Michigan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (0.66)

Industry:

Government (0.46)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.30)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Souza, Leandro C., Guingo, Bruno C., Giraldi, Gilson, Portugal, Renato

Regression and Classification with Single-Qubit Quantum Neural Networks

Since classical machine learning has become a powerful tool for developing data-driven algorithms, quantum machine learning is expected to similarly impact the development of quantum algorithms. The literature reflects a mutually beneficial relationship between machine learning and quantum computing, where progress in one field frequently drives improvements in the other. Motivated by the fertile connection between machine learning and quantum computing enabled by parameterized quantum circuits, we use a resource-efficient and scalable Single-Qubit Quantum Neural Network (SQQNN) for both regression and classification tasks. The SQQNN leverages parameterized single-qubit unitary operators and quantum measurements to achieve efficient learning. To train the model, we use gradient descent for regression tasks. For classification, we introduce a novel training method inspired by the Taylor series, which can efficiently find a global minimum in a single step. This approach significantly accelerates training compared to iterative methods. Evaluated across various applications, the SQQNN exhibits virtually error-free and strong performance in regression and classification tasks, including the MNIST dataset. These results demonstrate the versatility, scalability, and suitability of the SQQNN for deployment on near-term quantum devices.

artificial intelligence, machine learning, neuron, (14 more...)

2412.09486

Country:

North America > United States > Wisconsin (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Portugal (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Lin, Xiaochuan, Chen, Xiangyong

Reasoning-Aware Query-Focused Summarization over Multi-Table Data

Query-focused summarization over multi-table data is a challenging yet critical task for extracting precise and relevant information from structured data. Existing methods often rely on complex preprocessing steps and struggle to generalize across domains or handle the logical reasoning required for multi-table queries. In this paper, we propose QueryTableSummarizer++, an end-to-end generative framework leveraging large language models (LLMs) enhanced with table-aware pre-training, query-aligned fine-tuning, and reinforcement learning with feedback. Our method eliminates the need for intermediate serialization steps and directly generates query-relevant summaries. Experiments on a benchmark dataset demonstrate that QueryTableSummarizer++ significantly outperforms state-of-the-art baselines in terms of BLEU, ROUGE, and F1-score. Additional analyses highlight its scalability, generalization across domains, and robust handling of complex queries. Human evaluation further validates the superior quality and practical applicability of the generated summaries, establishing QueryTableSummarizer++ as a highly effective solution for multi-table summarization tasks.

information retrieval, large language model, machine learning, (16 more...)

2412.0897

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Spain (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Temporal Causal Discovery in Dynamic Bayesian Networks Using Federated Learning

Chen, Jianhong, Ma, Ying, Yue, Xubo

Traditionally, learning the structure of a Dynamic Bayesian Network has been centralized, with all data pooled in one location. However, in real-world scenarios, data are often dispersed among multiple parties (e.g., companies, devices) that aim to collaboratively learn a Dynamic Bayesian Network while preserving their data privacy and security. In this study, we introduce a federated learning approach for estimating the structure of a Dynamic Bayesian Network from data distributed horizontally across different parties. We propose a distributed structure learning method that leverages continuous optimization so that only model parameters are exchanged during optimization. Experimental results on synthetic and real datasets reveal that our method outperforms other state-of-the-art techniques, particularly when there are many clients with limited individual sample sizes.

artificial intelligence, bayesian network, machine learning, (15 more...)

2412.09814

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Distance-Adaptive Quaternion Knowledge Graph Embedding with Bidirectional Rotation

Wang, Weihua, Liang, Qiuyu, Bao, Feilong, Gao, Guanglai

Quaternion contains one real part and three imaginary parts, which provided a more expressive hypercomplex space for learning knowledge graph. Existing quaternion embedding models measure the plausibility of a triplet either through semantic matching or geometric distance scoring functions. However, it appears that semantic matching diminishes the separability of entities, while the distance scoring function weakens the semantics of entities. To address this issue, we propose a novel quaternion knowledge graph embedding model. Our model combines semantic matching with entity's geometric distance to better measure the plausibility of triplets. Specifically, in the quaternion space, we perform a right rotation on head entity and a reverse rotation on tail entity to learn rich semantic features. Then, we utilize distance adaptive translations to learn geometric distance between entities. Furthermore, we provide mathematical proofs to demonstrate our model can handle complex logical relationships. Extensive experimental results and analyses show our model significantly outperforms previous models on well-known knowledge graph completion benchmark datasets. Our code is available at https://github.com/llqy123/DaBR.

dataset, knowledge graph, rotation, (16 more...)

2412.04076

Country:

Asia > Mongolia (0.05)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > China > Inner Mongolia > Hohhot (0.04)
(6 more...)

Genre:

Research Report (0.50)
Personal > Honors (0.46)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Strugatski, Alona, Alexandron, Giora

Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments

Generative AI is transforming the educational landscape, raising significant concerns about cheating. Despite the widespread use of multiple-choice questions (MCQs) in assessments, the detection of AI cheating in MCQ-based tests has been almost unexplored, in contrast to the focus on detecting AI-cheating on text-rich student outputs. In this paper, we propose a method based on the application of Item Response Theory (IRT) to address this gap. Our approach operates on the assumption that artificial and human intelligence exhibit different response patterns, with AI cheating manifesting as deviations from the expected patterns of human responses. These deviations are modeled using Person-Fit Statistics (PFS). We demonstrate that this method effectively highlights the differences between human responses and those generated by premium versions of leading chatbots (ChatGPT, Claude, and Gemini), but that it is also sensitive to the amount of AI cheating in the data. Furthermore, we show that the chatbots differ in their reasoning profiles. Our work provides both a theoretical foundation and empirical evidence for the application of IRT to identify AI cheating in MCQ-based assessments.

chatbot, instrument, response pattern, (14 more...)

2412.02713

Country:

Asia > Middle East > Israel (0.05)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.75)

Industry:

Education > Educational Setting > Online (0.94)
Education > Curriculum > Subject-Specific Education (0.68)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

Borsoi, Ricardo Augusto, Usevich, Konstantin, Brie, David, Adali, Tülay

Personalized Coupled Tensor Decomposition for Multimodal Data Fusion: Uniqueness and Algorithms

Coupled tensor decompositions (CTDs) perform data fusion by linking factors from different datasets. Although many CTDs have been already proposed, current works do not address important challenges of data fusion, where: 1) the datasets are often heterogeneous, constituting different "views" of a given phenomena (multimodality); and 2) each dataset can contain personalized or dataset-specific information, constituting distinct factors that are not coupled with other datasets. In this work, we introduce a personalized CTD framework tackling these challenges. A flexible model is proposed where each dataset is represented as the sum of two components, one related to a common tensor through a multilinear measurement model, and another specific to each dataset. Both the common and distinct components are assumed to admit a polyadic decomposition. This generalizes several existing CTD models. We provide conditions for specific and generic uniqueness of the decomposition that are easy to interpret. These conditions employ uni-mode uniqueness of different individual datasets and properties of the measurement model. Two algorithms are proposed to compute the common and distinct components: a semi-algebraic one and a coordinate-descent optimization method. Experimental results illustrate the advantage of the proposed framework compared with the state of the art approaches.

decomposition, tensor, uniqueness, (16 more...)

doi: 10.1109/TSP.2024.3510680

2412.01102

Country:

North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Maryland > Baltimore (0.14)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
(10 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Chen, Angelica, Stanton, Samuel D., Alberstein, Robert G., Watkins, Andrew M., Bonneau, Richard, Gligorijević, Vladimir, Cho, Kyunghyun, Frey, Nathan C.

LLMs are Highly-Constrained Biophysical Sequence Optimizers

Large language models (LLMs) have recently shown significant potential in various biological tasks such as protein engineering and molecule design. These tasks typically involve black-box discrete sequence optimization, where the challenge lies in generating sequences that are not only biologically feasible but also adhere to hard fine-grained constraints. However, LLMs often struggle with such constraints, especially in biological contexts where verifying candidate solutions is costly and time-consuming. In this study, we explore the possibility of employing LLMs as highly-constrained bilevel optimizers through a methodology we refer to as Language Model Optimization with Margin Expectation (LLOME). This approach combines both offline and online optimization, utilizing limited oracle evaluations to iteratively enhance the sequences generated by the LLM. We additionally propose a novel training objective - Margin-Aligned Expectation (MargE) - that trains the LLM to smoothly interpolate between the reward and reference distributions. Lastly, we introduce a synthetic test suite that bears strong geometric similarity to real biophysical problems and enables rapid evaluation of LLM optimizers without time-consuming lab validation. Our findings reveal that, in comparison to genetic algorithm baselines, LLMs achieve significantly lower regret solutions while requiring fewer test function evaluations. However, we also observe that LLMs exhibit moderate miscalibration, are susceptible to generator collapse, and have difficulty finding the optimal solution when no explicit ground truth rewards are available. Large language models (LLMs) have recently shown significant promise on various biophysical optimization tasks, such as protein engineering and molecule design. These tasks are often formulated as black-box discrete sequence optimization problems, wherein a solver must attempt to output a discrete sequence x X that is feasible (i.e., a biologically plausible sequence) and that fulfills a number of strict constraints, such as containing specific motifs. Yet despite their many successes, LLMs often struggle to generate outputs that fulfill hard fine-grained constraints [31].

algorithm, language model, sequence, (15 more...)

2410.22296

Country:

Africa > Togo > Maritime Region > Lome (0.08)
Asia > Indonesia > Bali (0.04)
Asia > Singapore (0.04)
(11 more...)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

arXiv.org Machine LearningDec-12-2024

Model Developmental Safety: A Retention-Centric Method and Applications in Vision-Language Models

Li, Gang, Yu, Wendi, Yao, Yao, Tong, Wei, Liang, Yingbin, Lin, Qihang, Yang, Tianbao

In the real world, a learning-enabled system usually undergoes multiple cycles of model development to enhance the system's ability to handle difficult or emerging tasks. This continual model development process raises a significant issue that the model development for acquiring new or improving existing capabilities may inadvertently lose capabilities of the old model, also known as catastrophic forgetting. Existing continual learning studies focus on mitigating catastrophic forgetting by trading off performance on previous tasks and new tasks to ensure good average performance. However, they are inadequate for many applications especially in safety-critical domains, as failure to strictly preserve the good performance of the old model not only introduces safety risks and uncertainties but also imposes substantial expenses in the re-improving and re-validation of existing properties. To address this issue, we introduce model developmental safety as a guarantee of a learning system such that in the model development process the new model should strictly preserve the existing protected capabilities of the old model while improving its performance on target tasks. To ensure the model developmental safety, we present a retention-centric framework by formulating the model developmental safety as data-dependent constraints. Under this framework, we study how to develop a pretrained vision-language model, specifically the CLIP model, for acquiring new capabilities or improving existing capabilities of image classification. We propose an efficient constrained optimization algorithm with theoretical guarantee and use its insights to finetune a CLIP model with task-dependent heads for promoting the model developmental safety. Our experiments on improving vision perception capabilities on autonomous driving and scene recognition datasets demonstrate the efficacy of the proposed approach.

constraint, developmental safety, model developmental safety, (13 more...)

arXiv.org Machine Learning

2410.03955

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Iowa > Johnson County > Iowa City (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.48)
Automobiles & Trucks (0.48)
Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)