AITopics | ground-truth

ground-truth

ReactZyme: A Benchmark for Enzyme-Reaction Prediction

Neural Information Processing Systems | Feb-10-2026, 12:45:03 GMT

Enzymes, as catalysts of biological systems, are the workhorses of various biological functions [35, 52, 13] (Figure 1a).

artificial intelligence, machine learning, reaction, (19 more...)

Neural Information Processing Systems

Country: Europe > Germany > Rheinland-Pfalz > Mainz (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Supplementary Information: A causal view of compositional zero-shot recognition

Neural Information Processing Systems | Feb-7-2026, 12:16:07 GMT

Next, we introduce two additional approximations we use to apply Eq. (S.9). An SCM matches a set of assignments to a causal graph. This implies that the error of the approximation in Eq. (S.13) is mainly dominated by the gradients of $g$ at $h_{ao}$, and the variance of $n_{ao}$. Specifically, we use a positive differentiable measure of the statistical dependence, denoted by $I$. PIDA measures disentanglement of representations for models that are trained from unsupervised data. As a result, we have the following: Minimizing Eq. (S.21) leads to $p^{do(a,o)}(\hat{\phi}'_a)$ approaching $p(\hat{\phi}'_a \mid a)$, which, as we have just shown, leads to $p(\hat{\phi}'_a \mid a)$ approaching $p^{do(a)}(\hat{\phi}'_a)$.

artificial intelligence, causal, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

2e68b2367d2e0bc8dd6f0ff86e07c2eb-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Oct-9-2025, 22:14:24 GMT

enzyme, esm 0, reaction, (12 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
North America > Canada (0.04)
Europe > Germany > Rheinland-Pfalz > Mainz (0.04)
Asia > China (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Biomedical Informatics (0.68)
Information Technology > Information Management (0.67)

MP-GUI: Modality Perception with MLLMs for GUI Understanding

Wang, Ziwei, Chen, Weizhi, Yang, Leyang, Zhou, Sheng, Zhao, Shengchu, Zhan, Hanbei, Jin, Jiongchao, Li, Liangcheng, Shao, Zirui, Bu, Jiajun

arXiv.org Artificial Intelligence | Mar-18-2025

Graphical user interfaces (GUIs) have become integral to modern society, making GUI understanding crucial for human-centric systems. However, unlike natural images or documents, GUIs comprise artificially designed graphical elements arranged to convey specific semantic meanings. Current multi-modal large language models (MLLMs), although already proficient in processing graphical and textual components, struggle with GUI understanding due to the lack of explicit spatial structure modeling. Moreover, obtaining high-quality spatial structure data is challenging due to privacy issues and noisy environments. To address these challenges, we present MP-GUI, a specially designed MLLM for GUI understanding. MP-GUI features three precisely specialized perceivers that extract graphical, textual, and spatial modalities from the screen as GUI-tailored visual clues, which are refined with a spatial structure refinement strategy and adaptively combined via a fusion gate to meet the specific preferences of different GUI understanding tasks. To cope with the scarcity of training data, we also introduce a pipeline for automatic data collection. Extensive experiments demonstrate that MP-GUI achieves impressive results on various GUI understanding tasks with limited data.

large language model, machine learning, mp-gui, (21 more...)

arXiv.org Artificial Intelligence

2503.14021

Country:

Asia > China (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Quebec > Montreal (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.54)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

Li, Mufei, Miao, Siqi, Li, Pan

arXiv.org Artificial Intelligence | Nov-11-2024

Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations such as hallucinations and outdated knowledge. Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) addresses these issues by grounding LLM outputs in structured external knowledge from KGs. However, current KG-based RAG frameworks still struggle to optimize the trade-off between retrieval effectiveness and efficiency in identifying a suitable amount of relevant graph information for the LLM to digest. We introduce SubgraphRAG, extending the KG-based RAG framework that retrieves subgraphs and leverages LLMs for reasoning and answer prediction. Our approach innovatively integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval while encoding directional structural distances to enhance retrieval effectiveness. The size of retrieved subgraphs can be flexibly adjusted to match the query's need and the downstream LLM's capabilities. This design strikes a balance between model complexity and reasoning power, enabling scalable and generalizable retrieval processes. Notably, based on our retrieved subgraphs, smaller LLMs like Llama3.1-8B-Instruct deliver competitive results with explainable reasoning, while larger models like GPT-4o achieve state-of-the-art accuracy compared with previous baselines -- all without fine-tuning. Extensive evaluations on the WebQSP and CWQ benchmarks highlight SubgraphRAG's strengths in efficiency, accuracy, and reliability by reducing hallucinations and improving response grounding.
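
The adjustable subgraph size described in this abstract can be illustrated with a minimal sketch. It assumes every candidate triple has already received a relevance score (in the paper these come from a lightweight multilayer perceptron, which is not reproduced here). The function name `retrieve_subgraph`, the entity and relation names, and the scores are all hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def retrieve_subgraph(scored_triples: List[Tuple[Triple, float]], k: int) -> List[Triple]:
    """Return the k triples with the highest relevance scores."""
    # Each triple carries its own score, so scoring can be done independently
    # per triple (and therefore in parallel) before this selection step.
    ranked = sorted(scored_triples, key=lambda item: item[1], reverse=True)
    return [triple for triple, _score in ranked[:k]]


# Hypothetical candidates and scores, for illustration only.
scored = [
    (("entity_a", "born_in", "entity_b"), 0.91),
    (("entity_a", "plays_for", "entity_c"), 0.47),
    (("entity_b", "located_in", "entity_d"), 0.78),
]

# k is the knob that adapts the subgraph size to the query and the downstream LLM.
print(retrieve_subgraph(scored, k=2))
```

A larger `k` hands more context to a stronger downstream model, while a smaller `k` keeps the prompt short for a smaller one.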

arxiv preprint arxiv, reasoning, san francisco giant, (13 more...)

arXiv.org Artificial Intelligence

2410.20724

Country:

North America > United States > California > San Francisco County > San Francisco (0.06)
North America > United States > Texas (0.04)
North America > United States > Ohio (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Baseball (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mitigating Selection Bias with Node Pruning and Auxiliary Options

Choi, Hyeong Kyu, Xu, Weijie, Xue, Chi, Eckman, Stephanie, Reddy, Chandan K.

arXiv.org Artificial Intelligence | Sep-27-2024

To mitigate this selection bias problem, previous solutions utilized debiasing methods to adjust the model's input and/or output. Our work, in contrast, investigates the model's internal representation of the selection bias. Specifically, we introduce a novel debiasing approach, Bias Node Pruning (BNP), which eliminates the linear layer parameters that contribute to the bias. Furthermore, we present Auxiliary Option Injection (AOI), a simple yet effective input modification technique for debiasing, which is compatible even with black-box LLMs. To provide a more systematic evaluation of selection bias, we review existing metrics and introduce Choice Kullback-Leibler Divergence (CKLD), which addresses the insensitivity of the commonly used metrics to imbalance in choice labels. Experiments show that our methods are robust and adaptable across various datasets when applied to three LLMs.

The advent of large language models (LLMs) has revolutionized artificial intelligence applications, particularly in the domain of natural language processing. These models have demonstrated outstanding performance across a variety of use cases, including chatbots, machine translation, text generation, data annotation, etc. Their ability to answer questions with high precision has opened up new avenues for automated systems. Despite their remarkable abilities, LLMs suffer from the selection bias problem that often occurs in answering multiple-choice questions (MCQs). When selecting the answer for an MCQ, many LLMs prefer the choices in a given position (e.g., the last choice), or with a specific choice symbol (e.g., (A) or (3)) (Zheng et al., 2024; Wei et al., 2024; Pezeshkpour & Hruschka, 2024). Many previous works have attempted to explain this phenomenon and/or propose diverse ways to mitigate selection bias. While there are a few works focused on either modifying the input format (Li et al., 2023b; Robinson et al., 2023) or calibrating the output probabilities (Zheng et al., 2024; Reif & Schwartz, 2024; Wei et al., 2024), to the best of our knowledge, no embedding or parameter-level investigation has been performed. Because selection bias originates from internal parameter-level computations, it is crucial to explore how the LLM embeddings contribute to the bias in their output responses. Understanding the internal representation of selection bias can help us combat it. By scrutinizing the interaction between the internal representation and the LLM parameters, we develop a novel approach to debias the model. Specifically, we propose Bias Node Pruning (BNP), which eliminates nodes in the final linear layer that contribute to selection bias. By dropping as few as 32 out of 4096 nodes in the final layer, we can significantly reduce selection bias and improve question-answering performance.

[Figure 1: We propose BNP and AOI to reduce selection bias for white-box and black-box models. The CKLD metric is also proposed to encourage a more standardized ...]
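
As a rough illustration of a choice-level KL divergence, the sketch below compares the distribution of ground-truth choice labels with the distribution of predicted choice labels. This is an assumption about the general shape of such a metric, not the paper's exact CKLD definition: the direction of the divergence, the `eps` smoothing, and the function names are all illustrative choices.

```python
import math
from collections import Counter
from typing import List, Sequence


def choice_distribution(labels: Sequence[str], choices: Sequence[str]) -> List[float]:
    """Empirical frequency of each choice symbol among the given labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in choices]


def choice_kl_divergence(true_labels: Sequence[str],
                         predicted_labels: Sequence[str],
                         choices: Sequence[str],
                         eps: float = 1e-9) -> float:
    """KL(p || q), with p from ground-truth labels and q from predictions (assumed direction)."""
    p = choice_distribution(true_labels, choices)
    q = choice_distribution(predicted_labels, choices)
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


choices = ["A", "B", "C", "D"]
truth = ["A", "B", "C", "D", "A", "B", "C", "D"]
balanced = ["B", "A", "D", "C", "A", "B", "C", "D"]  # same label frequencies as truth
skewed = ["D", "D", "D", "D", "D", "D", "C", "D"]    # model favors the last choice

print(choice_kl_divergence(truth, balanced, choices))
print(choice_kl_divergence(truth, skewed, choices))
```

A divergence of this kind depends on how often each choice symbol is predicted relative to how often it is correct, so it reflects the imbalance in choice labels rather than ignoring it.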

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2409.18857

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Virginia (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Research Report > Promising Solution (0.66)
Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reactzyme: A Benchmark for Enzyme-Reaction Prediction

Hua, Chenqing, Zhong, Bozitao, Luan, Sitao, Hong, Liang, Wolf, Guy, Precup, Doina, Zheng, Shuangjia

arXiv.org Artificial Intelligence | Aug-24-2024

Enzymes, with their specific catalyzed reactions, are necessary for all aspects of life, enabling diverse biological processes and adaptations. Predicting enzyme functions is essential for understanding biological pathways, guiding drug development, enhancing bioproduct yields, and facilitating evolutionary studies. Addressing the inherent complexities, we introduce a new approach to annotating enzymes based on their catalyzed reactions. This method provides detailed insights into specific reactions and is adaptable to newly discovered reactions, diverging from traditional classifications by protein family or expert-derived reaction classes. We employ machine learning algorithms to analyze enzyme reaction datasets, delivering a much more refined view on the functionality of enzymes. Our evaluation leverages the largest enzyme-reaction dataset to date, derived from the SwissProt and Rhea databases with entries up to January 8, 2024. We frame the enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions. With our model, we can recruit proteins for novel reactions and predict reactions in novel proteins, facilitating enzyme discovery and function annotation.
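
The retrieval framing in this abstract (rank enzymes for a given reaction) can be sketched as follows. The sketch assumes that enzymes and reactions are already embedded as vectors in a shared space and uses cosine similarity as the ranking score. Both are stand-in assumptions, not the benchmark's actual models or scoring function, and the enzyme names and vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_enzymes(reaction_vec: Sequence[float],
                 enzyme_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return enzyme names ordered from most to least similar to the reaction."""
    scores = {name: cosine(reaction_vec, vec) for name, vec in enzyme_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 2-D embeddings, for illustration only.
reaction = [1.0, 0.0]
enzymes = {
    "enzyme_x": [0.0, 1.0],
    "enzyme_y": [1.0, 1.0],
    "enzyme_z": [2.0, 0.0],
}

print(rank_enzymes(reaction, enzymes))
```

The same function covers the reverse direction (ranking reactions for a new protein) by swapping which side is the query and which is the candidate set.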

enzyme, esm 0, reaction, (12 more...)

arXiv.org Artificial Intelligence

2408.13659

Country:

North America > United States (0.14)
Europe > Germany > Rheinland-Pfalz > Mainz (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Image Restoration Using Deep Regulated Convolutional Networks

Liu, Peng, Zhou, Xiaoxiao, Li, Yangjunyi, D, El Basha Mohammad, Fang, Ruogu

arXiv.org Artificial Intelligence | Jun-21-2024

While the depth of convolutional neural networks has attracted substantial attention in deep learning research, the width of these networks has recently received greater interest. The width of networks, defined as the size of the receptive fields and the density of the channels, has demonstrated crucial importance in low-level vision tasks such as image denoising and restoration. However, the limited generalization ability caused by the increased width of networks creates a bottleneck in designing wider networks. In this paper, we propose the Deep Regulated Convolutional Network (RC-Net), a deep network composed of regulated sub-network blocks cascaded by skip-connections, to overcome this bottleneck. Specifically, the Regulated Convolution block (RC-block), which combines large and small convolution filters, balances the effectiveness of prominent feature extraction and the generalization ability of the network. RC-Nets have several compelling advantages: they embrace diversified features through large-small filter combinations, alleviate the hazy boundary and blurred details in image denoising and super-resolution problems, and stabilize the learning process. Our proposed RC-Nets outperform state-of-the-art approaches with significant performance gains in various image restoration tasks while demonstrating promising generalization ability. The code is available at https://github.com/cswin/RC-Nets.

db 0, feature extraction, rc-net, (15 more...)

arXiv.org Artificial Intelligence

1910.08853

Country:

North America > United States (0.05)
Asia > China (0.04)

Genre: Research Report > Promising Solution (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

Li, Yunxin, Chen, Xinyu, Hu, Baotian, Wang, Longyue, Shi, Haoyuan, Zhang, Min

arXiv.org Artificial Intelligence | Jun-17-2024

Despite significant breakthroughs in video analysis driven by the rapid development of large multimodal models (LMMs), there remains a lack of a versatile evaluation benchmark to comprehensively assess these models' performance in video understanding and reasoning. To address this, we present VideoVista, a video QA benchmark that integrates challenges across diverse content categories, durations, and abilities. Specifically, VideoVista comprises 25,000 questions derived from 3,400 videos spanning 14 categories (e.g., Howto, Film, and Entertainment) with durations ranging from a few seconds to over 10 minutes. Besides, it encompasses 19 types of understanding tasks (e.g., anomaly detection, interaction understanding) and 8 reasoning tasks (e.g., logical reasoning, causal reasoning). To achieve this, we present an automatic data construction framework, leveraging powerful GPT-4o alongside advanced analysis tools (e.g., video splitting, object segmenting, and tracking). We also utilize this framework to construct training data to enhance the capabilities of video-related LMMs (Video-LMMs). Through a comprehensive and quantitative evaluation of cutting-edge models, we reveal that: 1) Video-LMMs face difficulties in fine-grained video tasks involving temporal location, object tracking, and anomaly detection; 2) Video-LMMs present inferior logical and relation reasoning abilities; 3) Open-source Video-LMMs' performance is significantly lower than GPT-4o and Gemini-1.5, lagging by 20 points. This highlights the crucial role VideoVista will play in advancing LMMs that can accurately understand videos and perform precise reasoning.

information, video, video-llm response and evaluation result, (13 more...)

arXiv.org Artificial Intelligence

2406.11303

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Leisure & Entertainment (0.92)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Min, Juhong, Buch, Shyamal, Nagrani, Arsha, Cho, Minsu, Schmid, Cordelia

arXiv.org Artificial Intelligence | Apr-9-2024

This paper addresses the task of video question answering (videoQA) via a decomposed multi-stage, modular reasoning framework. Previous modular methods have shown promise with a single planning stage ungrounded in visual content. However, through a simple and effective baseline, we find that such systems can lead to brittle behavior in practice for challenging videoQA settings. Thus, unlike traditional single-stage planning methods, we propose a multi-stage system consisting of an event parser, a grounding stage, and a final reasoning stage in conjunction with an external memory. All stages are training-free, and performed using few-shot prompting of large models, creating interpretable intermediate outputs at each stage. By decomposing the underlying planning and task complexity, our method, MoReVQA, improves over prior work on standard videoQA benchmarks (NExT-QA, iVQA, EgoSchema, ActivityNet-QA) with state-of-the-art results, and extensions to related tasks (grounded videoQA, paragraph captioning).

caption, conjunction, video, (16 more...)

arXiv.org Artificial Intelligence

2404.06511

Country: Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)
