Multi-modal Co-learning for Earth Observation: Enhancing single-modality models via modality collaboration

Mena, Francisco, Ienco, Dino, Dantas, Cassio F., Interdonato, Roberto, Dengel, Andreas

arXiv.org Artificial Intelligence

Multi-modal co-learning is emerging as an effective paradigm in machine learning, enabling models to collaboratively learn from different modalities to enhance single-modality predictions. Earth Observation (EO) represents a quintessential domain for multi-modal data analysis, wherein diverse remote sensors collect data to sense our planet. This unprecedented volume of data introduces novel challenges. Specifically, access to the same sensor modalities at both training and inference stages becomes increasingly complex due to real-world constraints affecting remote sensing platforms. In this context, multi-modal co-learning presents a promising strategy to leverage the vast amount of sensor-derived data available at the training stage to improve single-modality models for inference-time deployment. Most current research efforts focus on designing customized solutions for either particular downstream tasks or specific modalities available at the inference stage. To address this, we propose a novel multi-modal co-learning framework capable of generalizing across various tasks without targeting a specific modality for inference. Our approach combines contrastive and modality-discriminative learning to guide single-modality models to structure their internal manifold into modality-shared and modality-specific information. We evaluate our framework on four EO benchmarks spanning classification and regression tasks across different sensor modalities, where only one of the modalities available during training is accessible at inference time. Our results demonstrate consistent predictive improvements over state-of-the-art approaches from the recent machine learning and computer vision literature, as well as EO-specific methods. These findings validate our framework in single-modality inference scenarios across a diverse range of EO applications.
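The combination of contrastive alignment with a split into modality-shared and modality-specific information can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names (`contrastive_loss`, `split_manifold`), the InfoNCE-style formulation, and the convention of splitting an embedding into shared and specific halves are all assumptions for demonstration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: each anchor (e.g. an optical embedding)
    should be most similar to its paired positive (e.g. the radar
    embedding of the same scene) among all candidates in the batch."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_denom)
    return loss / len(anchors)

def split_manifold(z):
    """Illustrative convention: first half of the embedding holds
    modality-shared information, second half modality-specific."""
    half = len(z) // 2
    return z[:half], z[half:]
```

Well-aligned paired embeddings yield a lower contrastive loss than mismatched pairs; a modality-discriminative term (not shown) would then act only on the specific part of the representation.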


Advancing Vision-based Human Action Recognition: Exploring Vision-Language CLIP Model for Generalisation in Domain-Independent Tasks

Shandilya, Utkarsh, Kappan, Marsha Mariya, Jain, Sanyam, Sharma, Vijeta

arXiv.org Artificial Intelligence

Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models like CNNs and RNNs have achieved moderate success, they often struggle to generalize across diverse and complex actions. Recent advancements in vision-language models, especially the transformer-based CLIP model, offer promising capabilities for generalizing action recognition from video data. In this work, we evaluate CLIP on the UCF-101 dataset and systematically analyze its performance under three masking strategies: (1) percentage-based and shape-based black masking at 10%, 30%, and 50%, (2) feature-specific masking to suppress bias-inducing elements, and (3) isolation masking that retains only class-specific regions. Our results reveal that CLIP exhibits inconsistent behavior and frequent misclassifications, particularly when essential visual cues are obscured. To overcome these limitations, we propose incorporating class-specific noise, learned via a custom loss function, to reinforce attention to class-defining features. This enhancement improves classification accuracy and model confidence while reducing bias. We conclude with a discussion on the challenges of applying such models in clinical domains and outline directions for future work to improve generalizability across domain-independent healthcare scenarios.
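Percentage-based black masking, the first strategy evaluated above, can be sketched as follows. This is a hedged illustration: the centered-square shape, the function name `percentage_mask`, and the grayscale nested-list image format are assumptions for demonstration, not details from the paper.

```python
def percentage_mask(image, fraction):
    """Black out roughly `fraction` of the pixels with a centered
    square patch (one shape-based variant of percentage masking)."""
    h, w = len(image), len(image[0])
    # side length of a square whose area is `fraction` of the image
    side = max(1, round((fraction * h * w) ** 0.5))
    top, left = (h - side) // 2, (w - side) // 2
    out = [row[:] for row in image]  # leave the input untouched
    for r in range(top, min(h, top + side)):
        for c in range(left, min(w, left + side)):
            out[r][c] = 0
    return out
```

Running the masked frames through the recognizer and comparing accuracy against the unmasked baseline is then what exposes which visual cues the model actually relies on.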


Disentangling and Generating Modalities for Recommendation in Missing Modality Scenarios

Kim, Jiwan, Kang, Hongseok, Kim, Sein, Kim, Kibum, Park, Chanyoung

arXiv.org Artificial Intelligence

Multi-modal recommender systems (MRSs) have achieved notable success in improving personalization by leveraging diverse modalities such as images, text, and audio. However, two key challenges remain insufficiently addressed: (1) insufficient consideration of missing-modality scenarios and (2) neglect of the unique characteristics of modality features. These challenges result in significant performance degradation in realistic situations where modalities are missing. To address these issues, we propose Disentangling and Generating Modality Recommender (DGMRec), a novel framework tailored for missing-modality scenarios. DGMRec disentangles modality features into general and specific modality features from an information-based perspective, enabling richer representations for recommendation. Building on this, it generates missing modality features by integrating aligned features from other modalities and leveraging user modality preferences. Extensive experiments show that DGMRec consistently outperforms state-of-the-art MRSs in challenging scenarios, including missing modalities and new-item settings, as well as diverse missing ratios and varying levels of missing modalities. Moreover, DGMRec's generation-based approach enables cross-modal retrieval, a task existing MRSs cannot perform, highlighting its adaptability and potential for real-world applications. Our code is available at https://github.com/ptkjw1997/DGMRec.
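The generation step, imputing a missing modality from the aligned features of the available ones weighted by user modality preference, might look roughly like this sketch. The function name `generate_missing` and the simple weighted average are illustrative assumptions; DGMRec's actual generator is learned, not a fixed average.

```python
def generate_missing(available, prefs):
    """Impute a missing modality feature as a preference-weighted
    average of aligned features from the modalities that are present.

    available: list of aligned feature vectors (one per present modality)
    prefs:     user preference weight for each available modality
    """
    total = sum(prefs) or 1.0
    weights = [p / total for p in prefs]
    dim = len(available[0])
    return [sum(w * v[d] for w, v in zip(weights, available))
            for d in range(dim)]
```

For a user who attends mostly to images, the imputed vector sits closer to the image feature than to the text feature, which is the intuition behind preference-guided generation.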


Multi-Modality Collaborative Learning for Sentiment Analysis

Wang, Shanmin, Liu, Chengguang, Liu, Qingshan

arXiv.org Artificial Intelligence

Multimodal sentiment analysis (MSA) identifies individuals' sentiment states in videos by integrating visual, audio, and text modalities. Despite progress in existing methods, the inherent modality heterogeneity limits the effective capture of interactive sentiment features across modalities. In this paper, by introducing a Multi-Modality Collaborative Learning (MMCL) framework, we facilitate cross-modal interactions and capture enhanced and complementary features from modality-common and modality-specific representations, respectively. Specifically, we design a parameter-free decoupling module that separates uni-modal representations into modality-common and modality-specific components through semantics assessment of cross-modal elements. For modality-specific representations, inspired by the act-reward mechanism in reinforcement learning, we design policy models to adaptively mine complementary sentiment features under the guidance of a joint reward. For modality-common representations, intra-modal attention is employed to highlight crucial components, which play enhanced roles across modalities. Experimental results on four databases, covering comparisons with prior methods, effectiveness verification of each module, and assessment of the complementary features, demonstrate that MMCL successfully learns collaborative features across modalities and significantly improves performance. The code is available at https://github.com/smwanghhh/MMCL.
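One common parameter-free way to decouple a feature into common and specific parts is projection onto a shared reference direction, keeping the residual as modality-specific. This sketch is an assumption for illustration; the paper's module relies on semantics assessment of cross-modal elements, which need not reduce to this projection.

```python
def decouple(feature, shared_dir):
    """Split `feature` into a component along `shared_dir`
    (modality-common) and the orthogonal residual (modality-specific).
    No learned parameters are involved, only a projection."""
    dot = sum(f * s for f, s in zip(feature, shared_dir))
    norm2 = sum(s * s for s in shared_dir)
    coef = dot / norm2
    common = [coef * s for s in shared_dir]
    specific = [f - c for f, c in zip(feature, common)]
    return common, specific
```

By construction the two parts sum back to the original feature, so no information is discarded by the split; the downstream modules then treat each part differently.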


Reliable Feature Selection for Adversarially Robust Cyber-Attack Detection

Vitorino, João, Silva, Miguel, Maia, Eva, Praça, Isabel

arXiv.org Artificial Intelligence

The growing cybersecurity threats make it essential to train Machine Learning (ML) models for network traffic analysis on high-quality data, free of noise and missing values. By selecting the most relevant features for cyber-attack detection, it is possible to improve both the robustness and the computational efficiency of the models used in a cybersecurity system. This work presents a feature selection and consensus process that combines multiple methods and applies them to several network datasets. Two different feature sets were selected and used to train multiple ML models with regular and adversarial training. Finally, an adversarial evasion robustness benchmark was performed to analyze the reliability of the different feature sets and their impact on the susceptibility of the models to adversarial examples. By using an improved dataset with more data diversity, selecting the best time-related features and a more specific feature set, and performing adversarial training, the ML models achieved better adversarially robust generalization. The robustness of the models was significantly improved without affecting their generalization to regular traffic flows, without increasing false alarms, and without requiring excessive computational resources, enabling reliable detection of suspicious activity and perturbed traffic flows in enterprise computer networks.
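The consensus step, combining the outputs of several feature selection methods, can be sketched as a simple vote. The function name `consensus_features` and the top-k voting rule are illustrative assumptions, not the paper's exact procedure.

```python
from collections import Counter

def consensus_features(rankings, top_k, min_votes):
    """Each selection method contributes its top-k features;
    keep the features chosen by at least `min_votes` methods.

    rankings: list of per-method feature rankings (best first)
    """
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_k])
    return sorted(f for f, v in votes.items() if v >= min_votes)
```

Features that survive the vote are, by definition, those that multiple independent criteria agree on, which is the reliability argument behind a consensus process.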


Deep Learning or classical Machine Learning -- which one to use for your project?

#artificialintelligence

During the last decade, Deep Learning has received a lot of attention around the globe. As Andrew Ng famously put it: "Deep Learning is a superpower. With it, you can make a computer see, synthesise novel art, translate languages, render a medical diagnosis, or build pieces of a car that can drive itself. If that isn't a superpower, I don't know what is." As with any superpower, one needs to choose carefully when to use it.


An Artificial Intelligence Rant: Neural Networks Are Not Magic, They're Code

#artificialintelligence

I was reading yet another document about artificial intelligence (AI). The introduction covered the basics and the history of the subject. The authors mentioned expert systems and the real flaws of that tactic. Then they said that, luckily, there was an alternative called "machine learning." Yet another case of people assuming that anything older than they are cannot belong to the same category as the things they know.


How AI *Understands* Images in Simple Terms

#artificialintelligence

This article aims to explain one of the most used artificial intelligence models in the world. I will try to make it very simple, so anyone can understand how it works. AI surrounds our daily lives, and it will only become more present, so you need to understand how it works, where we are at, and what's to come. The more you learn about AI, the more you will realize that it is not as advanced as most think, due to its narrow intelligence, yet it has powerful applications for individuals and companies. Knowing how it works will help you better understand the possible applications and limitations, and communicate better with your tech employees and colleagues.


Face Anonymization Pipeline in Pytorch

#artificialintelligence

Protecting data privacy is critical to preserving customer trust and is also gaining increasing attention from policy makers. Staying ahead of these expectations requires continual improvements to AI toolchains. Anonymizing image data is particularly challenging without badly degrading the quality of the image samples. We developed the capability to anonymize images while preserving the image distribution, giving us an excellent way to maintain the anonymity of the persons in the images while still performing data augmentation tasks. Our approach is based on the paper, "DeepPrivacy: A Generative Adversarial Network for Face Anonymization," published in 2019 at the International Symposium on Visual Computing.
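The overall pipeline shape, locating the face region and then replacing its contents, can be illustrated with a toy stand-in. Here simple block pixelation takes the place of the DeepPrivacy GAN generator; the function name `pixelate_region` and the nested-list grayscale format are assumptions for demonstration, not the actual pipeline code.

```python
def pixelate_region(image, box, block=2):
    """Replace the region `box` = (top, left, height, width) with
    block-averaged values, a toy stand-in for the GAN-generated face.
    The rest of the image is left untouched, mirroring a pipeline that
    only rewrites the detected face region."""
    top, left, h, w = box
    out = [row[:] for row in image]
    for r0 in range(top, top + h, block):
        for c0 in range(left, left + w, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, top + h))
                     for c in range(c0, min(c0 + block, left + w))]
            avg = sum(out[r][c] for r, c in cells) / len(cells)
            for r, c in cells:
                out[r][c] = avg
    return out
```

The advantage of a generative replacement over this kind of pixelation, as the DeepPrivacy paper argues, is that the output stays on the natural-image distribution, so downstream data augmentation still works.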


Introduction to Computer Vision

#artificialintelligence

Computer vision is a field of AI that focuses on giving computers the ability to see and interpret the world around them in the same way that humans do. Computer vision involves teaching computers to observe the physical world, analyze data, and extract insights from visual inputs. It is one of the most promising areas of research in artificial intelligence and computer science, and it offers great benefits to businesses today. A core building block is image processing: altering one image in order to produce a new image with improved characteristics. The image might be resized, its brightness and contrast adjusted, or it might be cropped, blurred, or transformed in any number of other digital ways.
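The brightness and contrast adjustment mentioned above is a linear point operation, new_pixel = contrast * old_pixel + brightness, clamped to the valid pixel range. A minimal sketch on a grayscale image stored as nested lists (the function name `adjust` is illustrative):

```python
def adjust(image, contrast=1.0, brightness=0):
    """Apply the linear point operation
    new = clamp(contrast * old + brightness, 0, 255)
    to every pixel of a grayscale image."""
    return [[max(0, min(255, round(contrast * p + brightness)))
             for p in row] for row in image]
```

Contrast values above 1 spread the pixel values apart, brightness shifts them uniformly, and the clamp keeps every result inside the 8-bit range.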