AITopics | text feature

Collaborating Authors

text feature

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CLAP4 CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models

Neural Information Processing SystemsFeb-18-2026, 13:30:21 GMT

This makes them overlook the many possible interactions across the input modalities and deems them unsafe for high-risk tasks requiring reliable uncertainty estimation.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Oceania > Australia > New South Wales (0.04)
Europe > France (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
(3 more...)

Add feedback

UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models

Neural Information Processing SystemsFeb-18-2026, 05:22:43 GMT

Specifically, we observe that CLIP's visual encoder tends to

information, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

Robust Multimodal Sentiment Analysis of Image-Text Pairs by Distribution-Based Feature Recovery and Fusion

Wu, Daiqing, Yang, Dongbao, Zhou, Yu, Ma, Can

arXiv.org Artificial IntelligenceDec-4-2025

As posts on social media increase rapidly, analyzing the sentiments embedded in image-text pairs has become a popular research topic in recent years. Although existing works achieve impressive accomplishments in simultaneously harnessing image and text information, they lack the considerations of possible low-quality and missing modalities. In real-world applications, these issues might frequently occur, leading to urgent needs for models capable of predicting sentiment robustly. Therefore, we propose a Distribution-based feature Recovery and Fusion (DRF) method for robust multimodal sentiment analysis of image-text pairs. Specifically, we maintain a feature queue for each modality to approximate their feature distributions, through which we can simultaneously handle low-quality and missing modalities in a unified framework. For low-quality modalities, we reduce their contributions to the fusion by quantitatively estimating modality qualities based on the distributions. For missing modalities, we build inter-modal mapping relationships supervised by samples and distributions, thereby recovering the missing modalities from available ones. In experiments, two disruption strategies that corrupt and discard some modalities in samples are adopted to mimic the low-quality and missing modalities in various real-world scenarios. Through comprehensive experiments on three publicly available image-text datasets, we demonstrate the universal improvements of DRF compared to SOTA methods under both two strategies, validating its effectiveness in robust multimodal sentiment analysis.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3664647.3680653

2511.18751

Country:

Europe (1.00)
Asia (1.00)
North America > United States > New York (0.28)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Empathy Level Prediction in Multi-Modal Scenario with Supervisory Documentation Assistance

Xiao, Yufei, Wang, Shangfei

arXiv.org Artificial IntelligenceDec-3-2025

Abstract--Prevalent empathy prediction techniques primarily concentrate on a singular modality, typically textual, thus neglecting multi-modal processing capabilities. They also overlook the utilization of certain privileged information, which may encompass additional empathetic content. In response, we introduce an advanced multi-modal empathy prediction method integrating video, audio, and text information. The method comprises the Multi-Modal Empathy Prediction and Supervisory Documentation Assisted Training. We use pre-trained networks in the empathy prediction network to extract features from various modalities, followed by a cross-modal fusion. This process yields a multi-modal feature representation, which is employed to predict empathy labels. T o enhance the extraction of text features, we incorporate supervisory documents as privileged information during the assisted training phase. Specifically, we apply the Latent Dirichlet Allocation model to identify potential topic distributions to constrain text features. These supervisory documents, created by supervisors, focus on the counseling topics and the counselor's display of empathy. Notably, this privileged information is only available during training and is not accessible during the prediction phase. Experimental results on the multi-modal and dialogue empathy datasets demonstrate that our approach is superior to the existing methods. MP A THY is characterized by an emotional response that arises from the interplay between inherent traits and situational factors. This empathetic response is spontaneously triggered, yet it is also sculpted by deliberate cognitive control.

information, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.02558

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)

Add feedback

M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis

Wu, Hang, Sun, Ke, Ji, Jiayi, Sun, Xiaoshuai, Ji, Rongrong

arXiv.org Artificial IntelligenceDec-2-2025

In the contemporary digital landscape, multi-modal media manipulation has emerged as a significant societal threat, impacting the reliability and integrity of information dissemination. Current detection methodologies in this domain often overlook the crucial aspect of localized information, despite the fact that manipulations frequently occur in specific areas, particularly in facial regions. In response to this critical observation, we propose the M4-BLIP framework. This innovative framework utilizes the BLIP-2 model, renowned for its ability to extract local features, as the cornerstone for feature extraction. Complementing this, we incorporate local facial information as prior knowledge. A specially designed alignment and fusion module within M4-BLIP meticulously integrates these local and global features, creating a harmonious blend that enhances detection accuracy. Furthermore, our approach seamlessly integrates with Large Language Models (LLM), significantly improving the interpretability of the detection outcomes. Extensive quantitative and visualization experiments validate the effectiveness of our framework against the state-of-the-art competitors.

information, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2512.01214

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ReaGAN: Node-as-Agent-Reasoning Graph Agentic Network

Guo, Minghao, Zhu, Xi, Xue, Haochen, Zhang, Chong, Lin, Shuhang, Huang, Jingyuan, Ye, Ziyi, Zhang, Yongfeng

arXiv.org Artificial IntelligenceOct-21-2025

Graph Neural Networks (GNNs) have achieved remarkable success in graph-based learning by propagating information among neighbor nodes via predefined aggregation mechanisms. However, such fixed schemes often suffer from two key limitations. First, they cannot handle the imbalance in node informativeness -- some nodes are rich in information, while others remain sparse. Second, predefined message passing primarily leverages local structural similarity while ignoring global semantic relationships across the graph, limiting the model's ability to capture distant but relevant information. We propose Retrieval-augmented Graph Agentic Network (ReaGAN), an agent-based framework that empowers each node with autonomous, node-level decision-making. Each node acts as an agent that independently plans its next action based on its internal memory, enabling node-level planning and adaptive message propagation. Additionally, retrieval-augmented generation (RAG) allows nodes to access semantically relevant content and build global relationships in the graph. ReaGAN achieves competitive performance under few-shot in-context settings using a frozen LLM backbone without fine-tuning, showcasing the potential of agentic planning and local-global retrieval in graph learning.

large language model, machine learning, node, (18 more...)

arXiv.org Artificial Intelligence

2508.00429

Country:

North America > United States (0.69)
Europe (0.46)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Machine Learning for Climate Policy: Understanding Policy Progression in the European Green Deal

West, Patricia, Wan, Michelle WL, Hepburn, Alexander, Simpson, Edwin, Santos-Rodriguez, Raul, Clark, Jeffrey N

arXiv.org Artificial IntelligenceOct-21-2025

Climate change demands effective legislative action to mitigate its impacts. This study explores the application of machine learning (ML) to understand the progression of climate policy from announcement to adoption, focusing on policies within the European Green Deal. We present a dataset of 165 policies, incorporating text and metadata. We aim to predict a policy's progression status, and compare text representation methods, including TF-IDF, BERT, and ClimateBERT. Metadata features are included to evaluate the impact on predictive performance. On text features alone, ClimateBERT outperforms other approaches (RMSE = 0.17, R^2 = 0.29), while BERT achieves superior performance with the addition of metadata features (RMSE = 0.16, R^2 = 0.38). Using methods from explainable AI highlights the influence of factors such as policy wording and metadata including political party and country representation. These findings underscore the potential of ML tools in supporting climate policy analysis and decision-making.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2510.16233

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Law (1.00)
Government (0.89)
Energy > Energy Policy (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)

Add feedback

D-TPT: Dimensional Entropy Maximization for Calibrating Test-Time Prompt Tuning in Vision-Language Models

Han, Jisu, Hwang, Wonjun

arXiv.org Artificial IntelligenceOct-20-2025

Test-time adaptation paradigm provides flexibility towards domain shifts by performing immediate adaptation on unlabeled target data from the source model. Vision-Language Models (VLMs) leverage their generalization capabilities for diverse downstream tasks, and test-time prompt tuning has emerged as a prominent solution for adapting VLMs. In this work, we explore contrastive VLMs and identify the modality gap caused by a single dominant feature dimension across modalities. We observe that the dominant dimensions in both text and image modalities exhibit high predictive sensitivity, and that constraining their influence can improve calibration error. Building on this insight, we propose dimensional entropy maximization that regularizes the distribution of textual features toward uniformity to mitigate the dependency of dominant dimensions. Our method alleviates the degradation of calibration performance in test-time prompt tuning, offering a simple yet effective solution to enhance the reliability of VLMs in real-world deployment scenarios.

dimension, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.09473

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

A Text-Image Fusion Method with Data Augmentation Capabilities for Referring Medical Image Segmentation

Chai, Shurong, JAIN, Rahul Kumar, Xu, Rui, Mo, Shaocong, Hou, Ruibo, Teng, Shiyu, Liu, Jiaqing, Lin, Lanfen, Chen, Yen-Wei

arXiv.org Artificial IntelligenceOct-15-2025

Deep learning relies heavily on data augmentation to mitigate limited data, especially in medical imaging. Recent multimodal learning integrates text and images for segmentation, known as referring or text-guided image segmentation. However, common augmentations like rotation and flipping disrupt spatial alignment between image and text, weakening performance. To address this, we propose an early fusion framework that combines text and visual features before augmentation, preserving spatial consistency. We also design a lightweight generator that projects text embeddings into visual space, bridging semantic gaps. Visualization of generated pseudo-images shows accurate region localization. Our method is evaluated on three medical imaging tasks and four segmentation frameworks, achieving state-of-the-art results. Code is publicly available on GitHub: https://github.com/11yxk/MedSeg_EarlyFusion.

artificial intelligence, image understanding, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.12482

Country: Asia > China (0.29)

Genre: Research Report (0.64)

Industry: