AITopics

ABSTRACT Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions, lacking the transferability needed for real-world applications that require adaptation to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. Our evaluation shows that ContrastASC demonstrates improved few-shot adaptation to unseen categories while maintaining strong closed-set performance. Index T erms-- Acoustic Scene Classification, Contrastive Learning, Knowledge Distillation, Model Fine-tuning 1. INTRODUCTION Acoustic scene classification (ASC) has attracted significant research attention as a crucial capability for context-aware AI systems on edge devices [1, 2].

acoustic scene classification, artificial intelligence, machine learning, (14 more...)

2510.03728

Genre: Research Report (0.64)

Industry: Education (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Bridging the Gap Between Multimodal Foundation Models and World Models

He, Xuehai

Humans understand the world through the integration of multiple sensory modalities, enabling them to perceive, reason about, and imagine dynamic physical processes. Inspired by this capability, multimodal foundation models (MFMs) have emerged as powerful tools for multimodal understanding and generation. However, today's MFMs fall short of serving as effective world models. They lack the essential ability such as perform counterfactual reasoning, simulate dynamics, understand the spatiotemporal information, control generated visual outcomes, and perform multifaceted reasoning. We investigates what it takes to bridge the gap between multimodal foundation models and world models. We begin by improving the reasoning capabilities of MFMs through discriminative tasks and equipping MFMs with structured reasoning skills, such as causal inference, counterfactual thinking, and spatiotemporal reasoning, enabling them to go beyond surface correlations and understand deeper relationships within visual and textual data. Next, we explore generative capabilities of multimodal foundation models across both image and video modalities, introducing new frameworks for structured and controllable generation. Our approaches incorporate scene graphs, multimodal conditioning, and multimodal alignment strategies to guide the generation process, ensuring consistency with high-level semantics and fine-grained user intent. We further extend these techniques to controllable 4D generation, enabling interactive, editable, and morphable object synthesis over time and space.

artificial intelligence, machine learning, natural language, (21 more...)

2510.03727

Country:

North America > United States (0.92)
Asia (0.92)
Europe (0.67)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

Girrbach, Leander, Alaniz, Stephan, Smith, Genevieve, Darrell, Trevor, Akata, Zeynep

Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models

Vision-language models trained on large-scale multimodal datasets show strong demographic biases, but the role of training data in producing these biases remains unclear. A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We address this gap by creating person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. These annotations are produced through validated automatic labeling pipelines combining object detection, multimodal captioning, and finetuned classifiers. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content. We also show that 60-70% of gender bias in CLIP and Stable Diffusion can be linearly explained by direct co-occurrences in the data. Our resources establish the first large-scale empirical link between dataset composition and downstream model bias.

large language model, machine learning, natural language, (19 more...)

2510.03721

Country:

South America (1.00)
North America > United States (1.00)
Europe (1.00)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Education (1.00)
(8 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.92)

Aina, Kehinde O., Ha, Sehoon

Deep Reinforcement Learning for Multi-Agent Coordination

We address the challenge of coordinating multiple robots in narrow and confined environments, where congestion and interference often hinder collective task performance. Drawing inspiration from insect colonies, which achieve robust coordination through stigmergy -- modifying and interpreting environmental traces -- we propose a Stigmergic Multi-Agent Deep Reinforcement Learning (S-MADRL) framework that leverages virtual pheromones to model local and social interactions, enabling decentralized emergent coordination without explicit communication. To overcome the convergence and scalability limitations of existing algorithms such as MADQN, MADDPG, and MAPPO, we leverage curriculum learning, which decomposes complex tasks into progressively harder sub-problems. Simulation results show that our framework achieves the most effective coordination of up to eight agents, where robots self-organize into asymmetric workload distributions that reduce congestion and modulate group performance. This emergent behavior, analogous to strategies observed in nature, demonstrates a scalable solution for decentralized multi-agent coordination in crowded environments with communication constraints.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2510.03592

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

A Qualitative Comparative Evaluation of Cognitive and Generative Theories

Rosenbloom, Paul S.

Evaluation is a critical activity associated with any theory. Yet this has proven to be a n exceptionally challenging activity for theories based on cognitive architectures. For an overlapping set of reasons, evaluation can also be challenging for theories based on generative neural architectures. T h is dual challenge is approached here by leveraging a broad perspective on theory evaluation to yield a wide - ranging, albeit qualitative, comparison of whole - mind - orie n ted cognitive and generative architectures an d the full systems th a t are based on these architectures .

artificial intelligence, machine learning, natural language, (17 more...)

2510.03453

Country:

North America > United States > California (0.46)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Higham, Frederic, Yuan, Tommy

Can an AI-Powered Presentation Platform Based On The Game "Just a Minute" Be Used To Improve Students' Public Speaking Skills?

This study explores the effectiveness of applying AI and gamification into a presentation platform aimed at University students wanting to improve their public speaking skills in their native tongue. Specifically, a platform based on the radio show, Just a Minute (JAM), is explored. In this game, players are challenged to speak fluently on a topic for 60 seconds without repeating themselves, hesitating or deviating from the topic. JAM has proposed benefits such as allowing students to improve their spontaneous speaking skills and reduce their use of speech disfluencies ("um", "uh", etc.). Previous research has highlighted the difficulties students face when speaking publicly, the main one being anxiety. AI Powered Presentation Platforms (AI-PPPs), where students can speak with an immersive AI audience and receive real-time feedback, have been explored as a method to improve student's speaking skills and confidence. So far they have shown promising results which this study aims to build upon. A group of students from the University of York are enlisted to evaluate the effectiveness of the JAM platform. They are asked to fill in a questionnaire, play through the game twice and then complete a final questionnaire to discuss their experiences playing the game. Various statistics are gathered during their gameplay such as the number of points they gained and the number of rules they broke. The results showed that students found the game promising and believed that their speaking skills could improve if they played the game for longer. More work will need to be carried out to prove the effectiveness of the game beyond the short term.

large language model, machine learning, natural language, (22 more...)

2510.03379

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Education > Educational Setting (1.00)
Leisure & Entertainment > Games > Computer Games (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Conditional Pseudo-Supervised Contrast for Data-Free Knowledge Distillation

Shao, Renrong, Zhang, Wei, wang, Jun

Data-free knowledge distillation (DFKD) is an effective manner to solve model compression and transmission restrictions while retaining privacy protection, which has attracted extensive attention in recent years. Currently, the majority of existing methods utilize a generator to synthesize images to support the distillation. Although the current methods have achieved great success, there are still many issues to be explored. Firstly, the outstanding performance of supervised learning in deep learning drives us to explore a pseudo-supervised paradigm on DFKD. Secondly, current synthesized methods cannot distinguish the distributions of different categories of samples, thus producing ambiguous samples that may lead to an incorrect evaluation by the teacher. Besides, current methods cannot optimize the category-wise diversity samples, which will hinder the student model learning from diverse samples and further achieving better performance. In this paper, to address the above limitations, we propose a novel learning paradigm, i.e., conditional pseudo-supervised contrast for data-free knowledge distillation (CPSC-DFKD). The primary innovations of CPSC-DFKD are: (1) introducing a conditional generative adversarial network to synthesize category-specific diverse images for pseudo-supervised learning, (2) improving the modules of the generator to distinguish the distributions of different categories, and (3) proposing pseudo-supervised contrastive learning based on teacher and student views to enhance diversity. Comprehensive experiments on three commonly-used datasets validate the performance lift of both the student and generator brought by CPSC-DFKD. The code is available at https://github.com/RoryShao/CPSC-DFKD.git Keywords: model compression, knowledge distillation, representation learning, contrastive learning, privacy protection1. Introduction With the development of artificial intelligence, the deep con-volutional neural networks (DCNNs) have been widely applied in various computer vision tasks and achieved remarkable success, such as image classification [1], object detection [2], and semantic segmentation [3]. Nevertheless, in practical applications, DCNNs suffer from some heavy issues. Firstly, DCNNs always require heavy computation and storage. For example, only to handle one image, a VGG network commonly requires more than 500MB of memory, which makes them hard to be deployed on resource-constrained embedded or edge devices such as mobile phones and autonomous cars.

artificial intelligence, generator, machine learning, (15 more...)

doi: 10.1016/j.patcog.2023.109781

2510.03375

Country: Asia (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Yaacoub, Antoun, Assaghir, Zainab, Da-Rugna, Jérôme

Lightweight Prompt Engineering for Cognitive Alignment in Educational AI: A OneClickQuiz Case Study

The rapid integration of Artificial Intelligence (AI) into educational technology promises to revolutionize content creation and assessment. However, the quality and pedagogical alignment of AI-generated content remain critical challenges. This paper investigates the impact of lightweight prompt engineering strategies on the cognitive alignment of AI-generated questions within OneClickQuiz, a Moodle plugin leveraging generative AI. We evaluate three prompt variants-a detailed baseline, a simpler version, and a persona-based approach-across Knowledge, Application, and Analysis levels of Bloom's Taxonomy. Utilizing an automated classification model (from prior work) and human review, our findings demonstrate that explicit, detailed prompts are crucial for precise cognitive alignment. While simpler and persona-based prompts yield clear and relevant questions, they frequently misalign with intended Bloom's levels, generating outputs that are either too complex or deviate from the desired cognitive objective. This study underscores the importance of strategic prompt engineering in fostering pedagogically sound AI-driven educational solutions and advises on optimizing AI for quality content generation in learning analytics and smart learning environments.

large language model, machine learning, natural language, (17 more...)

2510.03374

Country: Europe > Croatia (0.30)

Genre: Research Report > New Finding (0.87)

Industry: Education > Educational Technology (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)

Defining a Strategic Action Plan for AI in Higher Education

Avouris, Nikolaos

We start with reviewing normative actions of international organizations and concerns expressed about the current technical landscape. Then we proceed with proposing a framework that comprises five key dimensions relating to the main challenges relating to AI in higher education institutions, followed by five key strategic actions that the main stakeholders need to take in order to address the current developments . W e map these actions to the main stakeholders of higher education and propose a deployment plan . This defines a framework along the dimensions: C hallenges, Actions, Stakeholders, Deployment CASD . Examples of AI specific actions at the institutional and individu al course level are also provided and discussed.

artificial intelligence, machine learning, natural language, (16 more...)

2510.03343

Country: Europe (1.00)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
Research Report (0.82)

Industry: Education > Educational Setting > Higher Education (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Applied AI (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

NS-Pep: De novo Peptide Design with Non-Standard Amino Acids

Guo, Tao, Yin, Junbo, Wang, Yu, Gao, Xin

Peptide drugs incorporating non-standard amino acids (NSAAs) offer improved binding affinity and improved pharmacological properties. However, existing peptide design methods are limited to standard amino acids, leaving NSAA-aware design largely unexplored. We introduce NS-Pep, a unified framework for co-designing peptide sequences and structures with NSAAs. The main challenge is that NSAAs are extremely underrepresented-even the most frequent one, SEP, accounts for less than 0.4% of residues-resulting in a severe long-tailed distribution. To improve generalization to rare amino acids, we propose Residue Frequency-Guided Modification (RFGM), which mitigates over-penalization through frequency-aware logit calibration, supported by both theoretical and empirical analysis. Furthermore, we identify that insufficient side-chain modeling limits geometric representation of NSAAs. To address this, we introduce Progressive Side-chain Perception (PSP) for coarse-to-fine torsion and location prediction, and Interaction-Aware Weighting (IAW) to emphasize pocket-proximal residues. Moreover, NS-Pep generalizes naturally to the peptide folding task with NSAAs, addressing a major limitation of current tools. Experiments show that NS-Pep improves sequence recovery rate and binding affinity by 6.23% and 5.12%, respectively, and outperforms AlphaFold3 by 17.76% in peptide folding success rate.

artificial intelligence, machine learning, peptide, (19 more...)

2510.03326

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education > Health & Safety > School Nutrition (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)