Advanced Clustering Framework for Semiconductor Image Analytics Integrating Deep TDA with Self-Supervised and Transfer Learning Techniques

arXiv.org Artificial Intelligence

Semiconductor manufacturing generates vast amounts of image data, crucial for defect identification and yield optimization, yet often exceeds manual inspection capabilities. Traditional clustering techniques struggle with high-dimensional, unlabeled data, limiting their effectiveness in capturing nuanced patterns. This paper introduces an advanced clustering framework that integrates deep Topological Data Analysis (TDA) with self-supervised and transfer learning techniques, offering a novel approach to unsupervised image clustering. TDA captures intrinsic topological features, while self-supervised learning extracts meaningful representations from unlabeled data, reducing reliance on labeled datasets. Transfer learning enhances the framework's adaptability and scalability, allowing fine-tuning to new datasets without retraining from scratch. Validated on synthetic and open-source semiconductor image datasets, the framework successfully identifies clusters aligned with defect patterns and process variations. This study highlights the transformative potential of combining TDA, self-supervised learning, and transfer learning, providing a scalable solution for proactive process monitoring and quality control in semiconductor manufacturing and other domains with large-scale image datasets.


Sparsity is All You Need: Rethinking Biological Pathway-Informed Approaches in Deep Learning

arXiv.org Artificial Intelligence

Isabella Caranzano 1, Corrado Pancotti 1, Cesare Rollo 1, Flavio Sartori 1, Pietro Liò 2, Piero Fariselli 1, Tiziana Sanavia 1. 1 Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, Torino, Italy. 2 Department of Computer Science and Technology, University of Cambridge, Cambridge, UK.

Abstract: Biologically-informed neural networks typically leverage pathway annotations to enhance performance in biomedical applications. We hypothesized that the benefits of pathway integration do not arise from its biological relevance, but rather from the sparsity it introduces. We conducted a comprehensive analysis of all relevant pathway-based neural network models for predictive tasks, critically evaluating each study's contributions. From this review, we curated a subset of methods for which the source code was publicly available. The comparison of the biologically informed state-of-the-art deep learning models and their randomized counterparts showed that models based on randomized information performed as well as biologically informed ones across different metrics and datasets. Notably, in 3 out of the 15 analyzed models, the randomized versions even outperformed their biologically informed counterparts. Moreover, pathway-informed models did not show any clear advantage in interpretability, as randomized models were still able to identify relevant disease biomarkers despite lacking explicit pathway information. Our findings suggest that pathway annotations may be too noisy or inadequately explored by current methods. Therefore, we propose a methodology that can be applied to different domains and can serve as a robust benchmark for systematically comparing novel pathway-informed models against their randomized counterparts.
This approach enables researchers to rigorously determine whether observed performance improvements can be attributed to biological insights.

arXiv:2505.04300v1

Background & Summary

When dealing with deep learning models, many functions that are efficiently computable through a machine learning approach exhibit what is called "compositional sparsity", meaning that they can be decomposed into a few simpler functions, each depending on only a [...]. Deep networks, such as Convolutional Neural Networks (CNNs) and Transformers, align with the compositional structure of many target functions, leading to better generalization since they approximate such functions efficiently without falling victim to the "curse of dimensionality", i.e., the exponential growth of computational complexity with input dimension [37, 12, 31, 13, 32]. This compositional sparsity can be further enhanced by introducing prior constraints on features, such as grouping features into concepts or modelling interactions among them.
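
To make the idea of a "randomized counterpart" concrete, the sketch below builds a randomized gene-to-pathway mask that keeps the same sparsity as an annotated one. The excerpt does not state which randomization scheme the authors use, so the per-pathway shuffle here is an assumption, and the names `randomize_mask`, `pathway_sizes`, `sparsity`, and the toy `bio_mask` are hypothetical.

```python
import random


def randomize_mask(mask, seed=0):
    """Shuffle each pathway's gene memberships independently.

    mask[g][p] == 1 means gene g is connected to pathway p. Shuffling one
    column at a time keeps the number of genes per pathway (and therefore
    the overall sparsity) unchanged while discarding the annotation itself.
    This is an assumed scheme, not necessarily the one used in the paper.
    """
    rng = random.Random(seed)
    n_genes, n_pathways = len(mask), len(mask[0])
    randomized = [[0] * n_pathways for _ in range(n_genes)]
    for p in range(n_pathways):
        column = [mask[g][p] for g in range(n_genes)]
        rng.shuffle(column)
        for g in range(n_genes):
            randomized[g][p] = column[g]
    return randomized


def pathway_sizes(mask):
    """Number of connected genes per pathway (column sums)."""
    return [sum(column) for column in zip(*mask)]


def sparsity(mask):
    """Fraction of gene-pathway connections that are switched off."""
    total = len(mask) * len(mask[0])
    return 1 - sum(map(sum, mask)) / total


# Toy annotation: 6 genes (rows) and 3 pathways (columns), made up for illustration.
bio_mask = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
rand_mask = randomize_mask(bio_mask, seed=42)
```

Training the same architecture once with `bio_mask` and once with `rand_mask` would then isolate the contribution of the annotation from the contribution of sparsity alone, which is the comparison the abstract describes.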


Facilitating Trustworthy Human-Agent Collaboration in LLM-based Multi-Agent System oriented Software Engineering

arXiv.org Artificial Intelligence

Multi-agent autonomous systems (MAS) are better at addressing challenges that span multiple domains than singular autonomous agents. This holds true within the field of software engineering (SE) as well. The state-of-the-art research on MAS within SE focuses on integrating LLMs at the core of autonomous agents to create LLM-based multi-agent autonomous (LMA) systems. However, the introduction of LMA systems into SE brings a plethora of challenges. One of the major challenges is the strategic allocation of tasks between humans and the LMA system in a trustworthy manner. To address this challenge, a RACI-based framework is proposed in this work-in-progress article, along with implementation guidelines and an example implementation of the framework. The proposed framework can facilitate efficient collaboration, ensure accountability, and mitigate potential risks associated with LLM-driven automation while aligning with the Trustworthy AI guidelines. The future steps for this work, delineating the planned empirical validation method, are also presented.
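
As a rough illustration of what a RACI-based allocation could look like in code, the sketch below records who is Responsible, Accountable, Consulted, and Informed for a task and checks one simple rule. The abstract does not describe the framework's actual data model or rules, so the class `RaciAssignment`, the function `validate`, the role names, and the "accountable party must be a human" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RaciAssignment:
    """One task with its RACI roles (hypothetical structure)."""
    task: str
    responsible: list            # parties doing the work (humans or agents)
    accountable: str             # single party answerable for the outcome
    consulted: list = field(default_factory=list)
    informed: list = field(default_factory=list)


def validate(assignment, humans):
    """Return a list of problems found in an assignment.

    Assumed policy, not taken from the paper: every task needs at least
    one responsible party, and the accountable party must be a human.
    """
    problems = []
    if not assignment.responsible:
        problems.append("no responsible party")
    if assignment.accountable not in humans:
        problems.append("accountable party is not a human")
    return problems


humans = {"tech_lead", "qa_engineer"}

code_review = RaciAssignment(
    task="code review",
    responsible=["review_agent"],      # an LLM-based agent does the work
    accountable="tech_lead",           # a human stays answerable
    consulted=["qa_engineer"],
    informed=["planner_agent"],
)

unsafe_deploy = RaciAssignment(
    task="production deploy",
    responsible=["deploy_agent"],
    accountable="deploy_agent",        # violates the assumed policy
)

ok_problems = validate(code_review, humans)
bad_problems = validate(unsafe_deploy, humans)
```

Here `ok_problems` is empty, while `bad_problems` flags the deploy task because an agent was made accountable for its own work.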


Deepfakes on Demand: the rise of accessible non-consensual deepfake image generators

arXiv.org Artificial Intelligence

Advances in multimodal machine learning have made text-to-image (T2I) models increasingly accessible and popular. However, T2I models introduce risks such as the generation of non-consensual depictions of identifiable individuals, otherwise known as deepfakes. This paper presents an empirical study exploring the accessibility of deepfake model variants online. Through a metadata analysis of thousands of publicly downloadable model variants on two popular repositories, Hugging Face and Civitai, we demonstrate a huge rise in easily accessible deepfake models. Almost 35,000 examples of publicly downloadable deepfake model variants are identified, primarily hosted on Civitai. These deepfake models have been downloaded almost 15 million times since November 2022, with the models targeting a range of individuals from global celebrities to Instagram users with under 10,000 followers. Both Stable Diffusion and Flux models are used for the creation of deepfake models, with 96% of these targeting women and many signalling intent to generate non-consensual intimate imagery (NCII). Deepfake model variants are often created via the parameter-efficient fine-tuning technique known as low rank adaptation (LoRA), requiring as few as 20 images, 24GB VRAM, and 15 minutes of time, making this process widely accessible via consumer-grade computers. Despite these models violating the Terms of Service of hosting platforms, and regulation seeking to prevent dissemination, these results emphasise the pressing need for greater action to be taken against the creation of deepfakes and NCII.


VideoLLM Benchmarks and Evaluation: A Survey

arXiv.org Artificial Intelligence

The rapid development of Large Language Models (LLMs) has catalyzed significant advancements in video understanding technologies. This survey provides a comprehensive analysis of benchmarks and evaluation methodologies specifically designed or used for Video Large Language Models (VideoLLMs). We examine the current landscape of video understanding benchmarks, discussing their characteristics, evaluation protocols, and limitations. The paper analyzes various evaluation methodologies, including closed-set, open-set, and specialized evaluations for temporal and spatiotemporal understanding tasks. We highlight the performance trends of state-of-the-art VideoLLMs across these benchmarks and identify key challenges in current evaluation frameworks. Additionally, we propose future research directions to enhance benchmark design, evaluation metrics, and protocols, including the need for more diverse, multimodal, and interpretability-focused benchmarks. This survey aims to equip researchers with a structured understanding of how to effectively evaluate VideoLLMs and identify promising avenues for advancing the field of video understanding with large language models.


Sentiment-Aware Recommendation Systems in E-Commerce: A Review from a Natural Language Processing Perspective

arXiv.org Artificial Intelligence

E-commerce platforms generate vast volumes of user feedback, such as star ratings, written reviews, and comments. However, most recommendation engines rely primarily on numerical scores, often overlooking the nuanced opinions embedded in free text. This paper comprehensively reviews sentiment-aware recommendation systems from a natural language processing perspective, covering advancements from 2023 to early 2025. It highlights the benefits of integrating sentiment analysis into e-commerce recommenders to enhance prediction accuracy and explainability through detailed opinion extraction. Our survey categorizes recent work into four main approaches: deep learning classifiers that combine sentiment embeddings with user-item interactions, transformer-based methods for nuanced feature extraction, graph neural networks that propagate sentiment signals, and conversational recommenders that adapt in real time to user feedback. We summarize model architectures and demonstrate how sentiment flows through recommendation pipelines, impacting dialogue-based suggestions. Key challenges include handling noisy or sarcastic text, dynamic user preferences, and bias mitigation. Finally, we outline research gaps and provide a roadmap for developing smarter, fairer, and more user-centric recommendation tools.


Towards Cognitive Collaborative Robots: Semantic-Level Integration and Explainable Control for Human-Centric Cooperation

arXiv.org Artificial Intelligence

This is a preprint of a review article that has not yet undergone peer review. The content is intended for early dissemination and academic discussion. The final version may differ upon formal publication. As the Fourth Industrial Revolution reshapes industrial paradigms, human-robot collaboration (HRC) has transitioned from a desirable capability to an operational necessity. In response, collaborative robots (Cobots) are evolving beyond repetitive tasks toward adaptive, semantically informed interaction with humans and environments. This paper surveys five foundational pillars enabling this transformation: semantic-level perception, cognitive action planning, explainable learning and control, safety-aware motion design, and multimodal human intention recognition. We examine the role of semantic mapping in transforming spatial data into meaningful context, and explore cognitive planning frameworks that leverage this context for goal-driven decision-making. Additionally, we analyze explainable reinforcement learning methods, including policy distillation and attention mechanisms, which enhance interpretability and trust. Safety is addressed through force-adaptive control and risk-aware trajectory planning, while seamless human interaction is supported via gaze and gesture-based intent recognition. Despite these advancements, challenges such as perception-action disjunction, real-time explainability limitations, and incomplete human trust persist. To address these, we propose a unified Cognitive Synergy Architecture, integrating all modules into a cohesive framework for truly human-centric cobot collaboration.


Soft yet Effective Robots via Holistic Co-Design

arXiv.org Artificial Intelligence

Soft robots promise inherent safety via their material compliance for seamless interactions with humans or delicate environments. Yet, their development is challenging because it requires integrating materials, geometry, actuation, and autonomy into complex mechatronic systems. Despite progress, the field struggles to balance task-specific performance with broader factors like durability and manufacturability - a difficulty that we find is compounded by traditional sequential design processes with their lack of feedback loops. In this perspective, we review emerging co-design approaches that simultaneously optimize the body and brain, enabling the discovery of unconventional designs highly tailored to the given tasks. We then identify three key shortcomings that limit the broader adoption of such co-design methods within the soft robotics domain. First, many rely on simulation-based evaluations focusing on a single metric, while real-world designs must satisfy diverse criteria. Second, current methods emphasize computational modeling without ensuring feasible realization, risking sim-to-real performance gaps. Third, high computational demands limit the exploration of the complete design space. Finally, we propose a holistic co-design framework that addresses these challenges by incorporating a broader range of design values, integrating real-world prototyping to refine evaluations, and boosting efficiency through surrogate metrics and model-based control strategies. This holistic framework, by simultaneously optimizing functionality, durability, and manufacturability, has the potential to enhance reliability and foster broader acceptance of soft robotics, transforming human-robot interactions.


Developing A Framework to Support Human Evaluation of Bias in Generated Free Response Text

arXiv.org Artificial Intelligence

LLM evaluation is challenging even in the case of base models. In real-world deployments, evaluation is further complicated by the interplay of task-specific prompts and experiential context. At scale, bias evaluation is often based on short-context, fixed-choice benchmarks that can be rapidly evaluated; however, these can lose validity when the LLMs' deployed context differs. Large-scale human evaluation is often seen as too intractable and costly. Here we present our journey towards developing a semi-automated bias evaluation framework for free text responses that has human insights at its core. We discuss how we developed an operational definition of bias that helped us automate our pipeline and a methodology for classifying bias beyond multiple choice. We additionally comment on how human evaluation helped us uncover problematic templates in a bias benchmark.


Artificial Behavior Intelligence: Technology, Challenges, and Future Directions

arXiv.org Artificial Intelligence

Understanding and predicting human behavior has emerged as a core capability in various AI application domains such as autonomous driving, smart healthcare, surveillance systems, and social robotics. This paper defines the technical framework of Artificial Behavior Intelligence (ABI), which comprehensively analyzes and interprets human posture, facial expressions, emotions, behavioral sequences, and contextual cues. It details the essential components of ABI, including pose estimation, face and emotion recognition, sequential behavior analysis, and context-aware modeling. Furthermore, we highlight the transformative potential of recent advances in large-scale pretrained models, such as large language models (LLMs), vision foundation models, and multimodal integration models, in significantly improving the accuracy and interpretability of behavior recognition. Our research team has a strong interest in the ABI domain and is actively conducting research, particularly focusing on the development of intelligent lightweight models capable of efficiently inferring complex human behaviors. This paper identifies several technical challenges that must be addressed to deploy ABI in real-world applications, including learning behavioral intelligence from limited data, quantifying uncertainty in complex behavior prediction, and optimizing model structures for low-power, real-time inference. To tackle these challenges, our team is exploring various optimization strategies including lightweight transformers, graph-based recognition architectures, energy-aware loss functions, and multimodal knowledge distillation, while validating their applicability in real-time environments. The philosopher Aristotle once described human beings as "social animals." This statement implies that humans do not exist as isolated entities, but rather live in constant interaction and communication with others.
Humans intuitively perceive others' emotions, states, and intentions through their tone of voice, facial expressions, gestures, and behavioral patterns. These abilities are fundamental to mutual understanding and empathetic social interaction.