AITopics | Edmonton

Collaborating Authors

Edmonton

NormTab: Improving Symbolic Reasoning in LLMs Through Tabular Data Normalization

arXiv.org Artificial IntelligenceJun-25-2024

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in parsing textual data and generating code. However, their performance in tasks involving tabular data, especially those requiring symbolic reasoning, faces challenges due to the structural variance and inconsistency in table cell values often found in web tables. In this paper, we introduce NormTab, a novel framework aimed at enhancing the symbolic reasoning performance of LLMs by normalizing web tables. We study table normalization as a stand-alone, one-time preprocessing step using LLMs to support symbolic reasoning on tabular data. Our experimental evaluation, conducted on challenging web table datasets such as WikiTableQuestion and TabFact, demonstrates that leveraging NormTab significantly improves symbolic reasoning performance, showcasing the importance and effectiveness of web table normalization for enhancing LLM-based symbolic reasoning tasks.

normalization, normtab, reasoning, (15 more...)

arXiv.org Artificial Intelligence

2406.17961

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.15)
North America > United States > California > Los Angeles County > Los Angeles (0.05)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
(12 more...)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Football (0.95)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

C-ICL: Contrastive In-context Learning for Information Extraction

Mo, Ying, Liu, Jiahao, Yang, Jian, Wang, Qifan, Zhang, Shun, Wang, Jingang, Li, Zhoujun

arXiv.org Artificial IntelligenceJun-24-2024

There has been increasing interest in exploring the capabilities of advanced large language models (LLMs) in the field of information extraction (IE), specifically focusing on tasks related to named entity recognition (NER) and relation extraction (RE). Although researchers are exploring the use of few-shot information extraction through in-context learning with LLMs, they tend to focus only on using correct or positive examples for demonstration, neglecting the potential value of incorporating incorrect or negative examples into the learning process. In this paper, we present c-ICL, a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations. This approach enhances the ability of LLMs to extract entities and relations by utilizing prompts that incorporate not only the positive samples but also the reasoning behind them. This method allows for the identification and correction of potential interface errors. Specifically, our proposed method taps into the inherent contextual information and valuable information in hard negative samples and the nearest positive neighbors to the test and then applies the in-context learning demonstrations based on LLMs. Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods, delivering substantial enhancements in performance across a broad spectrum of related tasks. These improvements are noteworthy, showcasing the versatility of our approach in miscellaneous scenarios.

computational linguistic, demonstration, negative sample, (14 more...)

arXiv.org Artificial Intelligence

2402.11254

Country:

Europe > France (0.05)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.05)
North America > Canada > Ontario > Toronto (0.05)
(25 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.48)
Transportation (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World

Yan, Weixiang, Liu, Haitian, Wu, Tengxiao, Chen, Qian, Wang, Wen, Chai, Haoyuan, Wang, Jiayi, Zhao, Weishan, Zhang, Yixin, Zhang, Renjun, Zhu, Li

arXiv.org Artificial IntelligenceJun-19-2024

LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with the real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluations of end-to-end real clinical scenarios. These limitations in benchmarks in turn obstruct advancements of LLMs and agents for medicine. To address these limitations, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite. ClinicalLab includes ClinicalBench, an end-to-end multi-departmental clinical diagnostic evaluation benchmark for evaluating medical agents and LLMs. ClinicalBench is based on real cases that cover 24 departments and 150 diseases. ClinicalLab also includes four novel metrics (ClinicalMetrics) for evaluating the effectiveness of LLMs in clinical diagnostic tasks. We evaluate 17 LLMs and find that their performance varies significantly across different departments. Based on these findings, in ClinicalLab, we propose ClinicalAgent, an end-to-end clinical agent that aligns with real-world clinical diagnostic practices. We systematically investigate the performance and applicable scenarios of variants of ClinicalAgent on ClinicalBench. Our findings demonstrate the importance of aligning with modern medical practices in designing medical agents.

clinicalbench, diagnosis, llm, (13 more...)

arXiv.org Artificial Intelligence

2406.1389

Country:

Asia > China (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

On the Robustness of Language Models for Tabular Question Answering

Bhandari, Kushal Raj, Xing, Sixue, Dan, Soham, Gao, Jianxi

arXiv.org Artificial IntelligenceJun-18-2024

Large Language Models (LLMs), originally shown to ace various text comprehension tasks have also remarkably been shown to tackle table comprehension tasks without specific training. While previous research has explored LLM capabilities with tabular dataset tasks, our study assesses the influence of $\textit{in-context learning}$,$ \textit{model scale}$, $\textit{instruction tuning}$, and $\textit{domain biases}$ on Tabular Question Answering (TQA). We evaluate the robustness of LLMs on Wikipedia-based $\textbf{WTQ}$ and financial report-based $\textbf{TAT-QA}$ TQA datasets, focusing on their ability to robustly interpret tabular data under various augmentations and perturbations. Our findings indicate that instructions significantly enhance performance, with recent models like Llama3 exhibiting greater robustness over earlier versions. However, data contamination and practical reliability issues persist, especially with WTQ. We highlight the need for improved methodologies, including structure-aware self-attention mechanisms and better handling of domain-specific tabular data, to develop more reliable LLMs for table comprehension.

dataset, llm, tabular data, (14 more...)

arXiv.org Artificial Intelligence

2406.12719

Country:

Asia > Philippines > Luzon > National Capital Region > City of Manila (0.15)
Oceania > Australia > New South Wales > Sydney (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.05)
(8 more...)

Genre: Research Report (0.84)

Industry: Leisure & Entertainment > Sports > Olympic Games (0.33)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

P-ICL: Point In-Context Learning for Named Entity Recognition with Large Language Models

Jiang, Guochao, Ding, Zepeng, Shi, Yuchen, Yang, Deqing

arXiv.org Artificial IntelligenceJun-17-2024

In recent years, the rise of large language models (LLMs) has made it possible to directly achieve named entity recognition (NER) without any demonstration samples or only using a few samples through in-context learning (ICL). However, standard ICL only helps LLMs understand task instructions, format and input-label mapping, but neglects the particularity of the NER task itself. In this paper, we propose a new prompting framework P-ICL to better achieve NER with LLMs, in which some point entities are leveraged as the auxiliary information to recognize each entity type. With such significant information, the LLM can achieve entity classification more precisely. To obtain optimal point entities for prompting LLMs, we also proposed a point entity selection method based on K-Means clustering. Our extensive experiments on some representative NER benchmarks verify the effectiveness of our proposed strategies in P-ICL and point entity selection.

computational linguistic, entity type, point entity, (14 more...)

arXiv.org Artificial Intelligence

2405.0496

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Syria (0.04)
Asia > Japan (0.04)
(18 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Oblivious subspace embeddings for compressed Tucker decompositions

Pietrosanu, Matthew, Jiang, Bei, Kong, Linglong

arXiv.org Machine LearningJun-13-2024

Emphasis in the tensor literature on random embeddings (tools for low-distortion dimension reduction) for the canonical polyadic (CP) tensor decomposition has left analogous results for the more expressive Tucker decomposition comparatively lacking. This work establishes general Johnson-Lindenstrauss (JL) type guarantees for the estimation of Tucker decompositions when an oblivious random embedding is applied along each mode. When these embeddings are drawn from a JL-optimal family, the decomposition can be estimated within $\varepsilon$ relative error under restrictions on the embedding dimension that are in line with recent CP results. We implement a higher-order orthogonal iteration (HOOI) decomposition algorithm with random embeddings to demonstrate the practical benefits of this approach and its potential to improve the accessibility of otherwise prohibitive tensor analyses. On moderately large face image and fMRI neuroimaging datasets, empirical results show that substantial dimension reduction is possible with minimal increase in reconstruction error relative to traditional HOOI ($\leq$5% larger error, 50%-60% lower computation time for large models with 50% dimension reduction along each mode). Especially for large tensors, our method outperforms traditional higher-order singular value decomposition (HOSVD) and recently proposed TensorSketch methods.

algorithm, decomposition, tucker decomposition, (14 more...)

arXiv.org Machine Learning

2406.09387

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
Africa > Senegal > Kolda Region > Kolda (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.90)
Health & Medicine > Health Care Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.48)

Add feedback

DeTriever: Decoder-representation-based Retriever for Improving NL2SQL In-Context Learning

Feng, Yuxi, Li, Raymond, Fan, Zhenan, Carenini, Giuseppe, Pourreza, Mohammadreza, Zhang, Weiwei, Zhang, Yong

arXiv.org Artificial IntelligenceJun-12-2024

While in-context Learning (ICL) has proven to be an effective technique to improve the performance of Large Language Models (LLMs) in a variety of complex tasks, notably in translating natural language questions into Structured Query Language (NL2SQL), the question of how to select the most beneficial demonstration examples remains an open research problem. While prior works often adapted off-the-shelf encoders to retrieve examples dynamically, an inherent discrepancy exists in the representational capacities between the external retrievers and the LLMs. Further, optimizing the selection of examples is a non-trivial task, since there are no straightforward methods to assess the relative benefits of examples without performing pairwise inference. To address these shortcomings, we propose DeTriever, a novel demonstration retrieval framework that learns a weighted combination of LLM hidden states, where rich semantic information is encoded. To train the model, we propose a proxy score that estimates the relative benefits of examples based on the similarities between output queries. Experiments on two popular NL2SQL benchmarks demonstrate that our method significantly outperforms the state-of-the-art baselines on one-shot NL2SQL tasks.

detriever, language model, representation, (14 more...)

arXiv.org Artificial Intelligence

2406.07913

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > Dominican Republic (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Hierarchical Neural Networks, p-Adic PDEs, and Applications to Image Processing

Zúñiga-Galindo, W. A., Zambrano-Luna, B. A., Dibba, Baboucarr

arXiv.org Artificial IntelligenceJun-11-2024

The first goal of this article is to introduce a new type of p-adic reaction-diffusion cellular neural network with delay. We study the stability of these networks and provide numerical simulations of their responses. The second goal is to provide a quick review of the state of the art of p-adic cellular neural networks and their applications to image processing.

cnn, equation, neural network, (15 more...)

arXiv.org Artificial Intelligence

2406.0779

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Andalusia > Seville Province > Seville (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

Add feedback

Benchmark Data Contamination of Large Language Models: A Survey

Xu, Cheng, Guan, Shuhao, Greene, Derek, Kechadi, M-Tahar

arXiv.org Artificial IntelligenceJun-6-2024

The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and Gemini has transformed the field of natural language processing. However, it has also resulted in a significant issue known as Benchmark Data Contamination (BDC). This occurs when language models inadvertently incorporate evaluation benchmark information from their training data, leading to inaccurate or unreliable performance during the evaluation phase of the process. This paper reviews the complex challenge of BDC in LLM evaluation and explores alternative assessment methods to mitigate the risks associated with traditional benchmarks. The paper also examines challenges and future directions in mitigating BDC risks, highlighting the complexity of the issue and the need for innovative solutions to ensure the reliability of LLM evaluation in real-world applications.

computational linguistic, language model, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2406.04244

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Singapore (0.04)
(27 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Promising Solution (0.88)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)

Add feedback

Leveraging automatic strategy discovery to teach people how to select better projects

Heindrich, Lovis, Lieder, Falk

arXiv.org Artificial IntelligenceJun-6-2024

The decisions of individuals and organizations are often suboptimal because normative decision strategies are too demanding in the real world. Recent work suggests that some errors can be prevented by leveraging artificial intelligence to discover and teach prescriptive decision strategies that take people's constraints into account. So far, this line of research has been limited to simplified decision problems. This article is the first to extend this approach to a real-world decision problem, namely project selection. We develop a computational method (MGPS) that automatically discovers project selection strategies that are optimized for real people and develop an intelligent tutor that teaches the discovered strategies. We evaluated MGPS on a computational benchmark and tested the intelligent tutor in a training experiment with two control conditions. MGPS outperformed a state-of-the-art method and was more computationally efficient. Moreover, the intelligent tutor significantly improved people's decision strategies. Our results indicate that our method can improve human decision-making in naturalistic settings similar to real-world project selection, a first step towards applying strategy discovery to the real world.

decision strategy, participant, project selection, (13 more...)

arXiv.org Artificial Intelligence

2406.04082

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Decision Support Systems (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback