Goto

Collaborating Authors

 Oceania


Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI

arXiv.org Artificial Intelligence

Scientific discovery has long been constrained by human limitations in expertise, physical capability, and sleep cycles. The recent rise of AI scientists and automated laboratories has accelerated both the cognitive and operational aspects of research. However, key limitations persist: AI systems are often confined to virtual environments, while automated laboratories lack the flexibility and autonomy to adaptively test new hypotheses in the physical world. Recent advances in embodied AI, such as generalist robot foundation models, diffusion-based action policies, fine-grained manipulation learning, and sim-to-real transfer, highlight the promise of integrating cognitive and embodied intelligence. This convergence opens the door to closed-loop systems that support iterative, autonomous experimentation and the possibility of serendipitous discovery. In this position paper, we propose the paradigm of Intelligent Science Laboratories (ISLs): a multi-layered, closed-loop framework that deeply integrates cognitive and embodied intelligence. ISLs unify foundation models for scientific reasoning, agent-based workflow orchestration, and embodied agents for robust physical experimentation. We argue that such systems are essential for overcoming the current limitations of scientific discovery and for realizing the full transformative potential of AI-driven science.


Breaking the Transcription Bottleneck: Fine-tuning ASR Models for Extremely Low-Resource Fieldwork Languages

arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR) has reached impressive accuracy for high-resource languages, yet its utility in linguistic fieldwork remains limited. Recordings collected in fieldwork contexts present unique challenges, including spontaneous speech, environmental noise, and severely constrained datasets from under-documented languages. In this paper, we benchmark the performance of two fine-tuned multilingual ASR models, MMS and XLS-R, on five typologically diverse low-resource languages with control of training data duration. Our findings show that MMS is best suited when extremely small amounts of training data are available, whereas XLS-R shows parity performance once training data exceed one hour. We provide linguistically grounded analysis for further provide insights towards practical guidelines for field linguists, highlighting reproducible ASR adaptation approaches to mitigate the transcription bottleneck in language documentation.


ConCM: Consistency-Driven Calibration and Matching for Few-Shot Class-Incremental Learning

arXiv.org Artificial Intelligence

Few-Shot Class-Incremental Learning (FSCIL) requires models to adapt to novel classes with limited supervision while preserving learned knowledge. Existing prospective learning-based space construction methods reserve space to accommodate novel classes. However, prototype deviation and structure fixity limit the expressiveness of the embedding space. In contrast to fixed space reservation, we explore the optimization of feature-structure dual consistency and propose a Consistency-driven Calibration and Matching Framework (ConCM) that systematically mitigate the knowledge conflict inherent in FSCIL. Specifically, inspired by hippocampal associative memory, we design a memory-aware prototype calibration that extracts generalized semantic attributes from base classes and reintegrates them into novel classes to enhance the conceptual center consistency of features. Further, we propose dynamic structure matching, which adaptively aligns the calibrated features to a session-specific optimal manifold space, ensuring cross-session structure consistency. Theoretical analysis shows that our method satisfies both geometric optimality and maximum matching, thereby overcoming the need for class-number priors. On large-scale FSCIL benchmarks including mini-ImageNet and CUB200, ConCM achieves state-of-the-art performance, surpassing current optimal method by 3.20% and 3.68% in harmonic accuracy of incremental sessions.


Geometric-Aware Variational Inference: Robust and Adaptive Regularization with Directional Weight Uncertainty

arXiv.org Machine Learning

Deep neural networks require principled uncertainty quantification, yet existing variational inference methods often employ isotropic Gaussian approximations in weight space that poorly match the network's inherent geometry. We address this mismatch by introducing Concentration-Adapted Perturbations (CAP), a variational framework that models weight uncertainties directly on the unit hypersphere using von Mises-Fisher distributions. Building on recent work in radial-directional posterior decompositions and spherical weight constraints, CAP provides the first complete theoretical framework connecting directional statistics to practical noise regularization in neural networks. Our key contribution is an analytical derivation linking vMF concentration parameters to activation noise variance, enabling each layer to learn its optimal uncertainty level through a novel closed-form KL divergence regularizer. In experiments on CIFAR-10, CAP significantly improves model calibration - reducing Expected Calibration Error by 5.6x - while providing interpretable layer-wise uncertainty profiles. CAP requires minimal computational overhead and integrates seamlessly into standard architectures, offering a theoretically grounded yet practical approach to uncertainty quantification in deep learning.


Second Order State Hallucinations for Adversarial Attack Mitigation in Formation Control of Multi-Agent Systems

arXiv.org Artificial Intelligence

The increasing deployment of multi-agent systems (MAS) in critical infrastructures such as autonomous transportation, disaster relief, and smart cities demands robust formation control mechanisms resilient to adversarial attacks. Traditional consensus-based controllers, while effective under nominal conditions, are highly vulnerable to data manipulation, sensor spoofing, and communication failures. To address this challenge, we propose Second-Order State Hallucination (SOSH), a novel framework that detects compromised agents through distributed residual monitoring and maintains formation stability by replacing attacked states with predictive second-order approximations. Unlike existing mitigation strategies that require significant restructuring or induce long transients, SOSH offers a lightweight, decentralized correction mechanism based on second-order Taylor expansions, enabling rapid and scalable resilience. We establish rigorous Lyapunov-based stability guarantees, proving that formation errors remain exponentially bounded even under persistent attacks, provided the hallucination parameters satisfy explicit conditions. Comprehensive Monte Carlo experiments on a 5-agent complete graph formation demonstrate that SOSH outperforms established robust control schemes, including W-MSR and Huber-based consensus filters, achieving faster convergence rates, lower steady-state error, and superior transient recovery. Our results confirm that SOSH combines theoretical robustness with practical deployability, offering a promising direction for securing MAS formations against sophisticated adversarial threats.


Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs

arXiv.org Artificial Intelligence

While Large Language Models (LLMs) demonstrate impressive reasoning capabilities, growing evidence suggests much of their success stems from memorized answer-reasoning patterns rather than genuine inference. In this work, we investigate a central question: are LLMs primarily anchored to final answers or to the textual pattern of reasoning chains? We propose a five-level answer-visibility prompt framework that systematically manipulates answer cues and probes model behavior through indirect, behavioral analysis. Experiments across state-of-the-art LLMs reveal a strong and consistent reliance on explicit answers. The performance drops by 26.90\% when answer cues are masked, even with complete reasoning chains. These findings suggest that much of the reasoning exhibited by LLMs may reflect post-hoc rationalization rather than true inference, calling into question their inferential depth. Our study uncovers the answer-anchoring phenomenon with rigorous empirical validation and underscores the need for a more nuanced understanding of what constitutes reasoning in LLMs.


Two Sonification Methods for the MindCube

arXiv.org Artificial Intelligence

In this work, we explore the musical interface potential of the MindCube, an interactive device designed to study emotions. Embedding diverse sensors and input devices, this interface resembles a fidget cube toy commonly used to help users relieve their stress and anxiety. As such, it is a particularly well-suited controller for musical systems that aim to help with emotion regulation. In this regard, we present two different mappings for the MindCube, with and without AI. With our generative AI mapping, we propose a way to infuse meaning within a latent space and techniques to navigate through it with an external controller. We discuss our results and propose directions for future work.


When can isotropy help adapt LLMs' next word prediction to numerical domains?

arXiv.org Artificial Intelligence

Vector representations of contextual embeddings learned by pre-trained large language models (LLMs) are effective in various downstream tasks in numerical domains such as time series forecasting. Despite their significant benefits, the tendency of LLMs to hallucinate in such domains can have severe consequences in applications such as energy, nature, finance, healthcare, retail and transportation, among others. To guarantee prediction reliability and accuracy in numerical domains, it is necessary to open the black box behind the LLM and provide performance guarantees through explanation. However, there is little theoretical understanding of when pre-trained language models help solve numerical downstream tasks. This paper seeks to bridge this gap by understanding when the next-word prediction capability of LLMs can be adapted to numerical domains through a novel analysis based on the concept of isotropy in the contextual embedding space. Specifically, a log-linear model for LLMs is considered in which numerical data can be predicted from its context through a network with softmax in the output layer of LLMs (i.e., language model head in self-attention). For this model, it is demonstrated that, in order to achieve state-of-the-art performance in numerical domains, the hidden representations of the LLM embeddings must possess a structure that accounts for the shift-invariance of the softmax function. By formulating a gradient structure of self-attention in pre-trained models, it is shown how the isotropic property of LLM embeddings in contextual embedding space preserves the underlying structure of representations, thereby resolving the shift-invariance problem and providing a performance guarantee. Experiments show that different characteristics of numerical data and model architectures have different impacts on isotropy, and this variability directly affects the performances.


Phase retrieval with rank $d$ measurements -- \emph{descending} algorithms phase transitions

arXiv.org Machine Learning

Companion paper [118] developed a powerful \emph{Random duality theory} (RDT) based analytical program to statistically characterize performance of \emph{descending} phase retrieval algorithms (dPR) (these include all variants of gradient descents and among them widely popular Wirtinger flows). We here generalize the program and show how it can be utilized to handle rank $d$ positive definite phase retrieval (PR) measurements (with special cases $d=1$ and $d=2$ serving as emulations of the real and complex phase retrievals, respectively). In particular, we observe that the minimal sample complexity ratio (number of measurements scaled by the dimension of the unknown signal) which ensures dPR's success exhibits a phase transition (PT) phenomenon. For both plain and lifted RDT we determine phase transitions locations. To complement theoretical results we implement a log barrier gradient descent variant and observe that, even in small dimensional scenarios (with problem sizes on the order of 100), the simulated phase transitions are in an excellent agreement with the theoretical predictions.


Imputation of Longitudinal Data Using GANs: Challenges and Implications for Classification

arXiv.org Machine Learning

Longitudinal data is commonly utilised across various domains, such as health, biomedical, education and survey studies. This ubiquity has led to a rise in statistical, machine and deep learning-based methods for Longitudinal Data Classification (LDC). However, the intricate nature of the data, characterised by its multi-dimensionality, causes instance-level heterogeneity and temporal correlations that add to the complexity of longitudinal data analysis. Additionally, LDC accuracy is often hampered by the pervasiveness of missing values in longitudinal data. Despite ongoing research that draw on the generative power and utility of Generative Adversarial Networks (GANs) to address the missing data problem, critical considerations include statistical assumptions surrounding longitudinal data and missingness within it, as well as other data-level challenges like class imbalance and mixed data types that impact longitudinal data imputation (LDI) and the subsequent LDC process in GANs. This paper provides a comprehensive overview of how GANs have been applied in LDI, with a focus whether GANS have adequately addressed fundamental assumptions about the data from a LDC perspective. We propose a categorisation of main approaches to GAN-based LDI, highlight strengths and limitations of methods, identify key research trends, and provide promising future directions. Our findings indicate that while GANs show great potential for LDI to improve usability and quality of longitudinal data for tasks like LDC, there is need for more versatile approaches that can handle the wider spectrum of challenges presented by longitudinal data with missing values. By synthesising current knowledge and identifying critical research gaps, this survey aims to guide future research efforts in developing more effective GAN-based solutions to address LDC challenges.