AITopics | computation and language

Collaborating Authors

computation and language

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Computational Approaches to Understanding Large Language Model Impact on Writing and Information Ecosystems

Liang, Weixin

arXiv.org Artificial IntelligenceJun-24-2025

Large language models (LLMs) have shown significant potential to change how we write, communicate, and create, leading to rapid adoption across society. This dissertation examines how individuals and institutions are adapting to and engaging with this emerging technology through three research directions. First, I demonstrate how the institutional adoption of AI detectors introduces systematic biases, particularly disadvantaging writers of non-dominant language varieties, highlighting critical equity concerns in AI governance. Second, I present novel population-level algorithmic approaches that measure the increasing adoption of LLMs across writing domains, revealing consistent patterns of AI-assisted content in academic peer reviews, scientific publications, consumer complaints, corporate communications, job postings, and international organization press releases. Finally, I investigate LLMs' capability to provide feedback on research manuscripts through a large-scale empirical analysis, offering insights into their potential to support researchers who face barriers in accessing timely manuscript feedback, particularly early-career researchers and those from under-resourced settings.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2506.17467

Country:

North America > United States (1.00)
Asia (0.67)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(4 more...)

Industry:

Media > News (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(3 more...)

Add feedback

Enhancing Scientific Reproducibility Through Automated BioCompute Object Creation Using Retrieval-Augmented Generation from Publications

Kim, Sean, Mazumder, Raja

arXiv.org Artificial IntelligenceSep-23-2024

The exponential growth in computational power and accessibility has transformed the complexity and scale of bioinformatics research, necessitating standardized documentation for transparency, reproducibility, and regulatory compliance. The IEEE BioCompute Object (BCO) standard addresses this need but faces adoption challenges due to the overhead of creating compliant documentation, especially for legacy research. This paper presents a novel approach to automate the creation of BCOs from scientific papers using Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs). We describe the development of the BCO assistant tool that leverages RAG to extract relevant information from source papers and associated code repositories, addressing key challenges such as LLM hallucination and long-context understanding. The implementation incorporates optimized retrieval processes, including a two-pass retrieval with re-ranking, and employs carefully engineered prompts for each BCO domain. We discuss the tool's architecture, extensibility, and evaluation methods, including automated and manual assessment approaches. The BCO assistant demonstrates the potential to significantly reduce the time and effort required for retroactive documentation of bioinformatics research while maintaining compliance with the standard. This approach opens avenues for AI-assisted scientific documentation and knowledge extraction from publications thereby enhancing scientific reproducibility. The BCO assistant tool and documentation is available at https://biocompute-objects.github.io/bco-rag/.

documentation, information, retrieval process, (17 more...)

arXiv.org Artificial Intelligence

2409.15076

Country: North America > United States > District of Columbia > Washington (0.04)

Genre:

Workflow (0.97)
Research Report (0.72)
Overview > Innovation (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Cost-Aware Uncertainty Reduction in Schema Matching with GPT-4: The Prompt-Matcher Framework

Feng, Longyu, Li, Huahang, Zhang, Chen Jason

arXiv.org Artificial IntelligenceAug-24-2024

Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. The inherent uncertainty of current schema matching algorithms leads to the generation of a set of candidate matches. Storing these results necessitates the use of databases and systems capable of handling probabilistic queries. This complicates the querying process and increases the associated storage costs. Motivated by GPT-4 outstanding performance, we explore its potential to reduce uncertainty. Our proposal is to supplant the role of crowdworkers with GPT-4 for querying the set of candidate matches. To get more precise correspondence verification responses from GPT-4, We have crafted Semantic-match and Abbreviation-match prompt for GPT-4, achieving state-of-the-art results on two benchmark datasets DeepMDatasets 100% (+0.0) and Fabricated-Datasets 91.8% (+2.2) recall rate. To optimise budget utilisation, we have devised a cost-aware solution. Within the constraints of the budget, our solution delivers favourable outcomes with minimal time expenditure. We introduce a novel framework, Prompt-Matcher, to reduce the uncertainty in the process of integration of multiple automatic schema matching algorithms and the selection of complex parameterization. It assists users in diminishing the uncertainty associated with candidate schema match results and in optimally ranking the most promising matches. We formally define the Correspondence Selection Problem, aiming to optimise the revenue within the confines of the GPT-4 budget. We demonstrate that CSP is NP-Hard and propose an approximation algorithm with minimal time expenditure. Ultimately, we demonstrate the efficacy of Prompt-Matcher through rigorous experiments.

candidate result, correspondence, gpt-4, (15 more...)

arXiv.org Artificial Intelligence

2408.14507

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Echotune: A Modular Extractor Leveraging the Variable-Length Nature of Speech in ASR Tasks

Chen, Sizhou, Gao, Songyang, Fang, Sen

arXiv.org Artificial IntelligenceSep-14-2023

The Transformer architecture has proven to be highly effective for Automatic Speech Recognition (ASR) tasks, becoming a foundational component for a plethora of research in the domain. Historically, many approaches have leaned on fixed-length attention windows, which becomes problematic for varied speech samples in duration and complexity, leading to data over-smoothing and neglect of essential long-term connectivity. Addressing this limitation, we introduce Echo-MSA, a nimble module equipped with a variable-length attention mechanism that accommodates a range of speech sample complexities and durations. This module offers the flexibility to extract speech features across various granularities, spanning from frames and phonemes to words and discourse. The proposed design captures the variable length feature of speech and addresses the limitations of fixed-length attention. Our evaluation leverages a parallel attention architecture complemented by a dynamic gating mechanism that amalgamates traditional attention with the Echo-MSA module output. Empirical evidence from our study reveals that integrating Echo-MSA into the primary model's training regime significantly enhances the word error rate (WER) performance, all while preserving the intrinsic stability of the original model.

arxiv, recognition, speech recognition, (14 more...)

arXiv.org Artificial Intelligence

2309.07765

Country:

Oceania > Australia (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Delineating Knowledge Domains in the Scientific Literature Using Visual Information

Yang, Sean, Lee, Po-shen, West, Jevin D., Howe, Bill

arXiv.org Machine LearningAug-12-2019

Figures are an important channel for scientific communication, used to express complex ideas, models and data in ways that words cannot. However, this visual information is mostly ignored in analyses of the scientific literature. In this paper, we demonstrate the utility of using scientific figures as markers of knowledge domains in science, which can be used for classification, recommender systems, and studies of scientific information exchange. We encode sets of images into a visual signature, then use distances between these signatures to understand how patterns of visual communication compare with patterns of jargon and citation structures. We find that figures can be as effective for differentiating communities of practice as text or citation patterns. We then consider where these metrics disagree to understand how different disciplines use visualization to express ideas. Finally, we further consider how specific figure types propagate through the literature, suggesting a new mechanism for understanding the flow of ideas apart from conventional channels of text and citations. Our ultimate aim is to better leverage these information-dense objects to improve scientific communication across disciplinary boundaries.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

1908.07465

Country: North America > United States (0.30)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Parsing English with a link grammar

Sleator, D. | Temperley, D.

ClassicsFeb-1-1993

In just 3 minutes, help us better understand how you perceive arXiv. We gratefully acknowledge support from the Simons Foundation and member institutions.

artificial intelligence, computation and language, natural language, (15 more...)

Classics

Genre: Research Report (0.88)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.42)
Information Technology > Security & Privacy (0.34)

Add feedback