Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding
Kollias, Dimitrios
Fine-grained video classification requires understanding complex spatio-temporal and semantic cues that often exceed the capacity of a single modality. In this paper, we propose a multimodal framework that fuses video, image, and text representations using GRU-based sequence encoders and cross-modal attention mechanisms. The model is trained with either a classification or a regression loss, depending on the task, and is further regularized through feature-level augmentation and autoencoding techniques. To evaluate the generality of our framework, we conduct experiments on two challenging benchmarks: the DVD dataset for real-world violence detection and the Aff-Wild2 dataset for valence-arousal estimation. Our results demonstrate that the proposed fusion strategy significantly outperforms unimodal baselines, with cross-attention and feature augmentation contributing notably to robustness and performance.
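A minimal sketch of the fusion idea, assuming pre-extracted per-modality features; the dimensions, module names, and the two-output head are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class CrossAttentiveGRUFusion(nn.Module):
    """GRU sequence encoders per modality + cross-modal attention fusion."""
    def __init__(self, dim=256, n_heads=4, n_out=2):
        super().__init__()
        self.video_gru = nn.GRU(dim, dim, batch_first=True)
        self.text_gru = nn.GRU(dim, dim, batch_first=True)
        # Video states query the text states (cross-modal attention).
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, n_out)  # classification or regression head

    def forward(self, video_feats, text_feats):
        v, _ = self.video_gru(video_feats)   # (B, Tv, D)
        t, _ = self.text_gru(text_feats)     # (B, Tt, D)
        fused, _ = self.cross_attn(v, t, t)  # video attends to text
        return self.head(fused.mean(dim=1))  # temporal average pooling

model = CrossAttentiveGRUFusion()
out = model(torch.randn(2, 16, 256), torch.randn(2, 8, 256))
```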
Semantic Matters: Multimodal Features for Affective Analysis
Hallmen, Tobias, Kampa, Robin-Nico, Deuser, Fabian, Oswald, Norbert, André, Elisabeth
In this study, we present our methodology for two tasks: the Emotional Mimicry Intensity (EMI) Estimation Challenge and the Behavioural Ambivalence/Hesitancy (BAH) Recognition Challenge, both conducted as part of the 8th Workshop and Competition on Affective Behavior Analysis in-the-wild. We utilize a Wav2Vec 2.0 model pre-trained on a large podcast dataset to extract various audio features, capturing both linguistic and paralinguistic information. Our approach incorporates a valence-arousal-dominance (VAD) module derived from Wav2Vec 2.0, a BERT text encoder, and a vision transformer (ViT), with predictions subsequently processed through a long short-term memory (LSTM) architecture or a convolution-like method for temporal modeling. We integrate the textual and visual modalities into our analysis, recognizing that semantic content provides valuable contextual cues and underscoring that the meaning of speech often conveys more critical insights than its acoustic counterpart alone. In some cases, fusing in the vision modality helps interpret the textual modality more precisely. This combined approach yields significant performance improvements, achieving $\rho_{\text{TEST}} = 0.706$ in EMI and $F1_{\text{TEST}} = 0.702$ in BAH, securing first place in the EMI challenge and second place in the BAH challenge.
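A hedged sketch of the temporal-fusion stage described above: frame-aligned audio, text, and vision features are concatenated, projected, and passed through an LSTM. The feature dimensions and the six-output head are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultimodalLSTM(nn.Module):
    def __init__(self, d_audio=1024, d_text=768, d_vision=768,
                 hidden=256, n_out=6):
        super().__init__()
        self.proj = nn.Linear(d_audio + d_text + d_vision, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_out)

    def forward(self, audio, text, vision):
        # All inputs: (B, T, d_*), aligned to a common frame rate beforehand.
        x = torch.cat([audio, text, vision], dim=-1)
        h, _ = self.lstm(self.proj(x))
        return self.head(h.mean(dim=1))  # clip-level prediction
```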
FIESTA: Fisher Information-based Efficient Selective Test-time Adaptation
Honarmand, Mohammadmahdi, Mutlu, Onur Cezmi, Azizian, Parnian, Surabhi, Saimourya, Wall, Dennis P.
Robust facial expression recognition in unconstrained, "in-the-wild" environments remains challenging due to significant domain shifts between training and testing distributions. Test-time adaptation (TTA) offers a promising solution by adapting pre-trained models during inference without requiring labeled test data. However, existing TTA approaches typically rely on manually selecting which parameters to update, potentially leading to suboptimal adaptation and high computational costs. This paper introduces a novel Fisher-driven selective adaptation framework that dynamically identifies and updates only the most critical model parameters based on their importance as quantified by Fisher information. By integrating this principled parameter selection approach with temporal consistency constraints, our method enables efficient and effective adaptation specifically tailored for video-based facial expression recognition. Experiments on the challenging Aff-Wild2 benchmark demonstrate that our approach significantly outperforms existing TTA methods, achieving a 7.7% improvement in F1 score over the base model while adapting only 22,000 parameters, more than 20 times fewer than comparable methods. Our ablation studies further reveal that parameter importance can be effectively estimated from minimal data, with just 1-3 sampled frames sufficient for substantial performance gains. The proposed approach not only enhances recognition accuracy but also dramatically reduces computational overhead, making test-time adaptation more practical for real-world affective computing applications.
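A minimal sketch of the selection step, assuming prediction entropy as the label-free loss for estimating diagonal Fisher information; the surrogate loss and the mask-based update are assumptions, not the paper's exact procedure.

```python
import torch

def fisher_masks(model, frames, k=22_000):
    """Boolean masks marking the k parameters with the largest squared
    gradients of the entropy loss (a diagonal Fisher estimate)."""
    model.zero_grad()
    probs = model(frames).softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    entropy.backward()
    grads = {n: p.grad.detach().pow(2)
             for n, p in model.named_parameters() if p.grad is not None}
    flat = torch.cat([g.flatten() for g in grads.values()])
    thresh = flat.topk(min(k, flat.numel())).values.min()
    return {n: g >= thresh for n, g in grads.items()}
```

During adaptation, gradients outside these masks would be zeroed before each optimizer step, so only the selected parameters move.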
Compound Expression Recognition via Large Vision-Language Models
Compound Expression Recognition (CER) is crucial for understanding human emotions and improving human-computer interaction. However, CER faces challenges due to the complexity of facial expressions and the difficulty of capturing subtle emotional cues. To address these issues, we propose a novel approach leveraging Large Vision-Language Models (LVLMs). Our method employs a two-stage fine-tuning process: first, pre-trained LVLMs are fine-tuned on basic facial expressions to establish foundational patterns; second, the model is further optimized on a compound-expression dataset to refine visual-language feature interactions. Our approach achieves advanced accuracy on the RAF-DB dataset and demonstrates strong zero-shot generalization on the C-EXPR-DB dataset, showcasing its potential for real-world applications in emotion analysis and human-computer interaction.
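An illustrative shape of the two-stage recipe; the training loop, learning rates, and data loaders below are placeholders rather than a specific library API.

```python
import torch

def finetune(model, loader, epochs, lr):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, prompts, labels in loader:
            loss = model(images, prompts, labels)  # assumed to return LM loss
            opt.zero_grad()
            loss.backward()
            opt.step()

# Stage 1: basic facial expressions establish foundational patterns.
# finetune(lvlm, basic_expression_loader, epochs=3, lr=2e-5)
# Stage 2: compound expressions refine visual-language interactions.
# finetune(lvlm, compound_expression_loader, epochs=3, lr=5e-6)
```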
Lightweight Models for Emotional Analysis in Video
Nguyen, Quoc-Tien, Nguyen, Hong-Hai, Huynh, Van-Thong
In this study, we present an approach for efficient spatiotemporal feature extraction using MobileNetV4 and a multi-scale 3D MLP-Mixer-based temporal aggregation module. MobileNetV4, with its Universal Inverted Bottleneck (UIB) blocks, serves as the backbone for extracting hierarchical feature representations from input image sequences, ensuring both computational efficiency and rich semantic encoding. To capture temporal dependencies, we introduce a three-level MLP-Mixer module, which processes spatial features at multiple resolutions while maintaining structural integrity. Experimental results from the 8th ABAW competition demonstrate the effectiveness of our approach, showing promising performance in affective behavior analysis. By integrating an efficient vision backbone with a structured temporal modeling mechanism, the proposed framework achieves a balance between computational efficiency and predictive accuracy, making it well-suited for real-time applications in mobile and embedded computing environments.
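A compact sketch of one temporal MLP-Mixer level over per-frame backbone features: token mixing runs across time, channel mixing per frame. Dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """Token mixing across frames, then channel mixing per frame."""
    def __init__(self, dim, n_frames):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.time_mlp = nn.Sequential(nn.Linear(n_frames, n_frames), nn.GELU(),
                                      nn.Linear(n_frames, n_frames))
        self.chan_mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(),
                                      nn.Linear(dim, dim))

    def forward(self, x):  # x: (B, T, D) per-frame backbone features
        x = x + self.time_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.chan_mlp(self.norm2(x))
```

In a multi-scale variant like the one described, such blocks would run on temporally pooled copies of the sequence (e.g. T, T/2, T/4) with their outputs merged.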
Using Prompts to Guide Large Language Models in Imitating a Real Person's Language Style
Chen, Ziyang, Moscholios, Stylios
Large language models (LLMs), such as the GPT and Llama series, have demonstrated strong capabilities in natural language processing, contextual understanding, and text generation. In recent years, researchers have been trying to enhance the abilities of LLMs across various tasks, and numerous studies have shown that well-designed prompts can significantly improve their performance. This study compares the language style imitation ability of three different large language models under the guidance of the same zero-shot prompt, and also compares the imitation ability of the same large language model when guided by three different prompts individually. Additionally, by applying a Tree-of-Thoughts (ToT) prompting method to Llama 3, a conversational AI with the language style of a real person was created. Three evaluation methods were used to assess the LLMs and prompts. The results show that Llama 3 performs best at imitating language styles, and that the ToT prompting method is the most effective at guiding it to do so. Using the ToT framework, Llama 3 was guided to interact with users in the language style of a specific individual without altering its core parameters, thereby creating a text-based conversational AI that reflects the language style of that individual.
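An illustrative zero-shot Tree-of-Thoughts style prompt along the lines described; this template is an assumption and does not reproduce the study's exact wording.

```python
TOT_STYLE_PROMPT = """Imagine three experts studying the writing style of the
person shown in the samples below. Each expert names one stylistic trait
(vocabulary, sentence rhythm, tone). The experts critique each other's
proposals, discard the weak ones, and then jointly write the reply in the
agreed style.

Style samples:
{samples}

User message:
{message}

Reply only with the final styled response."""

prompt = TOT_STYLE_PROMPT.format(samples="...", message="How was your day?")
```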
Temporal Label Hierarchical Network for Compound Emotion Recognition
Li, Sunan, Lian, Hailun, Lu, Cheng, Zhao, Yan, Qi, Tianhua, Yang, Hao, Zong, Yuan, Zheng, Wenming
Emotion recognition has attracted increasing attention in recent decades. Although significant progress has been made in recognizing the seven basic emotions, existing methods still struggle with compound emotion recognition, which occurs commonly in practical applications. This article introduces our contribution to the 7th Affective Behavior Analysis in-the-wild (ABAW) competition. For the competition, we selected the widely validated pre-trained ResNet18 and Transformer as the basic network framework. Considering the continuity of emotions over time, we propose a temporal pyramid structure network for frame-level emotion prediction. Furthermore, to address the lack of data for compound emotion recognition, we utilized fine-grained labels from the DFEW database to construct training data for the competition's emotion categories. Taking into account the valence-arousal characteristics of various compound emotions, we constructed a coarse-to-fine classification framework in the label space.
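A minimal sketch of a temporal-pyramid prediction head in the spirit of this description: frame features are average-pooled to several temporal scales and the per-scale class logits are merged. The scales and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalPyramidHead(nn.Module):
    def __init__(self, dim=512, n_classes=7, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.cls = nn.Linear(dim, n_classes)

    def forward(self, x):  # x: (B, T, D) frame-level features
        logits = []
        for s in self.scales:
            pooled = F.adaptive_avg_pool1d(x.transpose(1, 2), s)  # (B, D, s)
            logits.append(self.cls(pooled.transpose(1, 2)).mean(dim=1))
        return torch.stack(logits).mean(dim=0)  # average over pyramid levels
```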
Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction
Hallmen, Tobias, Deuser, Fabian, Oswald, Norbert, André, Elisabeth
In this research, we introduce a novel methodology for assessing Emotional Mimicry Intensity (EMI) as part of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our methodology utilises the Wav2Vec 2.0 architecture, which has been pre-trained on an extensive podcast dataset, to capture a wide array of audio features that include both linguistic and paralinguistic components. We refine our feature extraction process by employing a fusion technique that combines individual features with a global mean vector, thereby embedding a broader contextual understanding into our analysis. A key aspect of our approach is the multi-task fusion strategy that not only leverages these features but also incorporates a pre-trained Valence-Arousal-Dominance (VAD) model. This integration is designed to refine emotion intensity prediction by concurrently processing multiple emotional dimensions. For the temporal analysis of audio data, our feature fusion process utilises a Long Short-Term Memory (LSTM) network. This approach, which relies solely on the provided audio data, shows marked advancements over the existing baseline, offering a more comprehensive understanding of emotional mimicry in naturalistic settings and achieving second place in the EMI challenge.
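A hedged sketch of the global-mean fusion: each Wav2Vec 2.0 frame feature is concatenated with the utterance-level mean vector before the LSTM. Sizes and the six-output head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GlobalContextLSTM(nn.Module):
    def __init__(self, d_in=1024, hidden=256, n_out=6):
        super().__init__()
        self.lstm = nn.LSTM(2 * d_in, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_out)

    def forward(self, feats):  # feats: (B, T, D) audio features
        # Broadcast the global mean vector to every frame and concatenate.
        g = feats.mean(dim=1, keepdim=True).expand_as(feats)
        h, _ = self.lstm(torch.cat([feats, g], dim=-1))
        return self.head(h.mean(dim=1))
```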
A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection
Hsu, Chih-Chung, Lee, Chia-Ming, Chiang, Yang Fan, Chou, Yi-Shiuan, Jiang, Chih-Yu, Tai, Shen-Chieh, Tsai, Chi-Han
Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) there is often considerable variability in the resolution and size of each CT scan, imposing strict requirements on model input size and adaptability; (2) CT scans contain a large number of out-of-distribution (OOD) slices, and the crucial features may be present only in specific spatial regions and slices of the entire scan. How can we effectively locate them? To address this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework specifically designed for CT scans. It filters out OOD data within the whole CT scan, allowing us to select the crucial spatial slices for analysis and reducing overall redundancy by 70%. In addition, we propose a Kernel-Density-based slice Sampling (KDS) method to improve stability during training and inference, thereby speeding up convergence and boosting performance. As a result, the experiments demonstrate the promising performance of our model using a simple EfficientNet-2D (E2D) model, even with only 1% of the training data. The efficacy of our approach has been validated on the COVID-19-CT-DB dataset provided by the DEF-AI-MIA workshop, held in conjunction with CVPR 2024. Our source code is available at https://github.com/ming053l/E2D
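One plausible reading of KDS as a sketch: fit a 1-D kernel density over per-slice informativeness scores and sample slices in proportion to that density. The scoring function and the exact sampling rule are assumptions here, not the paper's definition.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kds_sample(slice_scores, n_keep, rng=None):
    """slice_scores: 1-D array with one informativeness score per CT slice."""
    rng = rng or np.random.default_rng()
    density = gaussian_kde(slice_scores)(slice_scores)
    p = density / density.sum()
    idx = rng.choice(len(slice_scores), size=n_keep, replace=False, p=p)
    return np.sort(idx)  # keep the sampled slices in scan order
```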
Audio-Visual Compound Expression Recognition Method based on Late Modality Fusion and Rule-based Decision
Ryumina, Elena, Markitantov, Maxim, Ryumin, Dmitry, Kaya, Heysem, Karpov, Alexey
This paper presents the results of the SUN team for the Compound Expressions Recognition Challenge of the 6th ABAW Competition. We propose a novel audio-visual method for compound expression recognition. Our method relies on emotion recognition models that fuse modalities at the emotion probability level, while decisions regarding the prediction of compound expressions are based on predefined rules. Notably, our method does not use any training data specific to the target task, so the problem becomes a zero-shot classification task. The method is evaluated in multi-corpus training and cross-corpus validation setups. Our proposed method achieves an F1-score of 22.01% on the C-EXPR-DB test subset. Our findings from the challenge demonstrate that the proposed method can potentially form a basis for developing intelligent tools for annotating audio-visual data in the context of basic and compound human emotions.
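A minimal sketch of probability-level late fusion with a rule-based compound decision; the fusion weight, the top-k rule, and the compound table shown are illustrative assumptions.

```python
import numpy as np

BASIC = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]
COMPOUNDS = {
    "Fearfully Surprised": ("fear", "surprise"),
    "Happily Surprised": ("happiness", "surprise"),
    "Sadly Angry": ("sadness", "anger"),
}

def predict_compound(p_audio, p_video, w=0.5, top_k=2):
    # Fuse modalities at the emotion-probability level.
    p = w * np.asarray(p_audio) + (1 - w) * np.asarray(p_video)
    top = {BASIC[i] for i in np.argsort(p)[-top_k:]}
    # Rule: emit a compound label when both constituents rank in the top-k.
    for name, (a, b) in COMPOUNDS.items():
        if a in top and b in top:
            return name
    return BASIC[int(np.argmax(p))]  # fall back to the top basic emotion
```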