QuanvNeXt: An end-to-end quanvolutional neural network for EEG-based detection of major depressive disorder

Orka, Nabil Anan, Haque, Ehtashamul, Jannat, Maftahul, Awal, Md Abdul, Moni, Mohammad Ali

arXiv.org Artificial Intelligence

This study presents QuanvNeXt, an end-to-end fully quanvolutional model for EEG-based depression diagnosis. QuanvNeXt incorporates a novel Cross Residual block, which reduces feature homogeneity and strengthens cross-feature relationships while retaining parameter efficiency. We evaluated QuanvNeXt on two open-source datasets, where it achieved an average accuracy of 93.1% and an average AUC-ROC of 97.2%, outperforming state-of-the-art baselines such as InceptionTime (91.7% accuracy, 95.9% AUC-ROC). An uncertainty analysis across Gaussian noise levels demonstrated well-calibrated predictions, with ECE scores remaining low (0.0436, Dataset 1) to moderate (0.1159, Dataset 2) even at the highest perturbation (ε = 0.1). Additionally, a post-hoc explainable AI analysis confirmed that QuanvNeXt effectively identifies and learns spectrotemporal patterns that distinguish between healthy controls and major depressive disorder. Overall, QuanvNeXt establishes an efficient and reliable approach for EEG-based depression diagnosis.
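
For illustration, the calibration metric cited above (ECE) can be computed with the standard equal-width-bin formulation. This is a minimal sketch, not the authors' evaluation code, and the probabilities and labels below are made-up examples:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """ECE: weighted average gap between mean confidence and accuracy
    across equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # bin weight = fraction of samples in bin
    return ece

# Toy example: class probabilities for 3 EEG segments and true labels
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 1, 1])
conf = probs.max(axis=1)
hit = probs.argmax(axis=1) == labels
print(expected_calibration_error(conf, hit))
```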


GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers

Hwang, Hochul, Yang, Soowan, Monon, Jahir Sadik, Giudice, Nicholas A, Lee, Sunghoon Ivan, Biswas, Joydeep, Kim, Donghyun

arXiv.org Artificial Intelligence

While commendable progress has been made in user-centric research on mobile assistive systems for blind and low-vision (BLV) individuals, references that directly inform robot navigation design remain rare. To bridge this gap, we conducted a comprehensive human study involving interviews with 26 guide dog handlers, four white cane users, nine guide dog trainers, and one O&M trainer, along with 15+ hours of observing guide dog-assisted walking. After de-identification, we open-sourced the dataset to promote human-centered development and informed decision-making for assistive systems for BLV people. Building on insights from this formative study, we developed GuideNav, a vision-only, teach-and-repeat navigation system. Inspired by how guide dogs are trained and assist their handlers, GuideNav autonomously repeats a path demonstrated by a sighted person using a robot. Specifically, the system constructs a topological representation of the taught route, integrates visual place recognition with temporal filtering, and employs a relative pose estimator to compute navigation actions - all without relying on costly, heavy, power-hungry sensors such as LiDAR. In field tests, GuideNav consistently achieved kilometer-scale route following across five outdoor environments, maintaining reliability despite noticeable scene variations between teach and repeat runs. A user study with 3 guide dog handlers and 1 guide dog trainer further confirmed the system's feasibility, marking (to our knowledge) the first demonstration of a quadruped mobile system retrieving a path in a manner comparable to guide dogs.
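
To give a concrete sense of the teach-and-repeat idea (topological route nodes plus temporally filtered place recognition), here is a toy sketch. The feature embeddings are assumed to come from some visual place-recognition model; this is not the paper's actual pipeline or pose estimator:

```python
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

class TeachRepeatLocalizer:
    """Toy topological teach-and-repeat localizer.

    Teach: store one feature embedding per node along the demonstrated route.
    Repeat: match the live embedding only within a temporal window around the
    previously matched node, filtering out perceptually similar but distant places.
    """

    def __init__(self, taught_embeddings, window=3):
        self.nodes = taught_embeddings   # list of 1-D feature vectors (one per route node)
        self.window = window
        self.current = 0                 # index of the last matched node

    def localize(self, live_embedding):
        lo = max(0, self.current - self.window)
        hi = min(len(self.nodes), self.current + self.window + 1)
        scores = [cosine_sim(live_embedding, self.nodes[i]) for i in range(lo, hi)]
        self.current = lo + int(np.argmax(scores))
        return self.current              # matched node; a pose estimator would act on it

# Usage with random stand-in embeddings
route = [np.random.randn(128) for _ in range(50)]
localizer = TeachRepeatLocalizer(route)
print(localizer.localize(route[2] + 0.05 * np.random.randn(128)))  # ~2
```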


PromptTailor: Multi-turn Intent-Aligned Prompt Synthesis for Lightweight LLMs

Xu, Yizhou, Davis, Janet

arXiv.org Artificial Intelligence

Lightweight language models remain attractive for on-device and privacy-sensitive applications, but their responses are highly sensitive to prompt quality. For open-ended generation, non-expert users often lack the knowledge or time to consistently craft high-quality prompts, leading them to rely on prompt optimization tools. However, a key challenge is ensuring the optimized prompts genuinely align with users' original intents and preferences. We introduce PromptTailor, a system for controllable prompt generation for open-ended text that improves model output quality by intent-aligned prompt synthesis. PromptTailor expands minimal user instructions into rich, domain-aware prompts while preserving the user's stated preferences. The system is a quantized Llama3-8B model fine-tuned with a lightweight LoRA adapter on 12,300 prompt-refinement dialogues spanning 41 everyday domains, distilled from three stronger LLMs. The adapter attaches to any Llama3-8B base, enabling edge deployment. In human and LLM-judge evaluations across multiple target models and optimization baselines, PromptTailor yields higher preference rates than chain-of-thought prompting and matches or surpasses state-of-the-art prompt optimization methods while requiring fewer model calls (e.g., 3 vs. 9). These results show that a compact student, guided by powerful teachers, can learn effective prompt-generation strategies that enhance response quality while maintaining alignment with user intent.
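
Since the abstract notes that the adapter "attaches to any Llama3-8B base," a minimal loading sketch with Hugging Face transformers and peft may be useful. The adapter path below is hypothetical (the abstract does not give a release location), and 4-bit quantization is assumed only as one plausible edge-deployment setting:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "path/to/prompttailor-lora-adapter"  # hypothetical, for illustration only

# Load a quantized base model and attach the LoRA adapter weights on top
quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=quant,
                                            device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# The system would expand a minimal user instruction into a richer prompt
user_request = "Write a short toast for my sister's graduation."
inputs = tokenizer(user_request, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```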


What is Implementation Science; and Why It Matters for Bridging the Artificial Intelligence Innovation-to-Application Gap in Medical Imaging

Fayaz-Bakhsh, Ahmad, Tania, Janice, Lutfi, Syaheerah Lebai, Jha, Abhinav K., Rahmim, Arman

arXiv.org Artificial Intelligence

The transformative potential of artificial intelligence (AI) in medical imaging (MI) is well recognized. Yet despite promising reports in research settings, many AI tools fail to achieve clinical adoption in practice. More generally, there is a documented 17-year average delay between evidence generation and implementation of a technology. Implementation science (IS) may provide a practical, evidence-based framework to bridge the gap between AI development and real-world clinical imaging use, shortening this lag through systematic frameworks, strategies, and hybrid research designs. We outline challenges specific to AI adoption in MI workflows, including infrastructural, educational, and cultural barriers. We highlight the complementary roles of effectiveness research and implementation research, emphasizing hybrid study designs and the role of integrated knowledge translation (iKT), stakeholder engagement, and equity-focused co-creation in designing sustainable and generalizable solutions. We discuss the integration of Human-Computer Interaction (HCI) frameworks in MI toward usable AI. Adopting IS is not only a methodological advancement; it is a strategic imperative for accelerating the translation of innovation into improved patient outcomes.



Governments are spending billions on their own 'sovereign' AI technologies – is it a big waste of money?

The Guardian

As part of a trend loosely called 'sovereign AI', governments around the world are developing their own AI technologies. In Malaysia, ILMUchat, built by a local construction conglomerate, boasts that it "knows which Georgetown you're referring to" - that is, the capital of Penang and not the private university in the US. Meanwhile, Switzerland's Apertus, unveiled in September, understands when to use the Swiss German "ss" and not the German-language character "ß".


Precise Information Control in Long-Form Text Generation

He, Jacqueline, Yen, Howard, Li, Margaret, Li, Shuyue Stella, Zeng, Zhiyuan, Shi, Weijia, Tsvetkov, Yulia, Chen, Danqi, Koh, Pang Wei, Zettlemoyer, Luke

arXiv.org Artificial Intelligence

A central challenge in language models (LMs) is faithfulness hallucination: the generation of information unsubstantiated by input context. To study this problem, we propose Precise Information Control (PIC), a new task formulation that requires models to generate long-form outputs grounded in a provided set of short self-contained statements, without adding any unsupported ones. PIC includes a full setting that tests a model's ability to include exactly all input claims, and a partial setting that requires the model to selectively incorporate only relevant claims. We present PIC-Bench, a benchmark of eight long-form generation tasks (e.g., summarization, biography generation) adapted to the PIC setting, where LMs are supplied with well-formed, verifiable input claims. Our evaluation of a range of open and proprietary LMs on PIC-Bench reveals that, surprisingly, state-of-the-art LMs still hallucinate against user-provided input in over 70% of generations. To alleviate this lack of faithfulness, we introduce a post-training framework that uses a weakly supervised preference data construction method to train an 8B PIC-LM with stronger PIC ability--improving from 69.1% to 91.0% F1 in the full PIC setting. When integrated into end-to-end factual generation pipelines, PIC-LM improves exact match recall by 17.1% on ambiguous QA with retrieval, and factual precision by 30.5% on a birthplace fact-checking task, underscoring the potential of precisely grounded generation.
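
As a rough illustration of the F1 numbers quoted for the full PIC setting, claim-level precision/recall/F1 can be computed over sets of claims. This is a simplified set-based sketch; in practice matching generated claims against the provided statements would require a verifier model, which is not shown here:

```python
def claim_f1(output_claims, input_claims):
    """Set-level precision, recall, and F1 over claims.

    output_claims: claims extracted from the generated text (assumed already
                   matched against the input set by some verifier).
    input_claims:  the provided set of short, self-contained statements.
    """
    out, ref = set(output_claims), set(input_claims)
    tp = len(out & ref)
    precision = tp / len(out) if out else 0.0   # penalizes unsupported additions
    recall = tp / len(ref) if ref else 0.0      # penalizes omitted input claims
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example: one unsupported claim ("c4") and one missed claim ("c3")
print(claim_f1({"c1", "c2", "c4"}, {"c1", "c2", "c3"}))
```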


PGF-Net: A Progressive Gated-Fusion Framework for Efficient Multimodal Sentiment Analysis

Wen, Bin, Tan, Tien-Ping

arXiv.org Artificial Intelligence

We introduce PGF-Net (Progressive Gated-Fusion Network), a novel deep learning framework designed for efficient and interpretable multimodal sentiment analysis. Our framework incorporates three primary innovations. Firstly, we propose a Progressive Intra-Layer Fusion paradigm, where a Cross-Attention mechanism empowers the textual representation to dynamically query and integrate non-linguistic features from audio and visual streams within the deep layers of a Transformer encoder. This enables a deeper, context-dependent fusion process. Secondly, the model incorporates an Adaptive Gated Arbitration mechanism, which acts as a dynamic controller to balance the original linguistic information against the newly fused multimodal context, ensuring stable and meaningful integration while preventing noise from overwhelming the signal. Lastly, a hybrid Parameter-Efficient Fine-Tuning (PEFT) strategy is employed, synergistically combining global adaptation via LoRA with local refinement through Post-Fusion Adapters. This significantly reduces trainable parameters, making the model lightweight and suitable for resource-limited scenarios. These innovations are integrated into a hierarchical encoder architecture, enabling PGF-Net to perform deep, dynamic, and interpretable multimodal sentiment analysis while maintaining exceptional parameter efficiency. Experimental results on the MOSI dataset demonstrate that our proposed PGF-Net achieves state-of-the-art performance, with a Mean Absolute Error (MAE) of 0.691 and an F1-Score of 86.9%. Notably, our model achieves these results with only 3.09M trainable parameters, showcasing a superior balance between performance and computational efficiency.
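
The two core ideas named in the abstract - text querying non-linguistic features via cross-attention, and a learned gate arbitrating between original and fused representations - can be sketched in PyTorch as follows. This is an illustrative module, not the authors' code; dimensions and layer choices are assumptions:

```python
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, text, nonverbal):
        # text:      (B, T_text, d) textual tokens used as queries
        # nonverbal: (B, T_av, d)   audio/visual features used as keys and values
        fused, _ = self.cross_attn(query=text, key=nonverbal, value=nonverbal)
        # Sigmoid gate decides, per dimension, how much fused context to admit
        g = torch.sigmoid(self.gate(torch.cat([text, fused], dim=-1)))
        return g * fused + (1.0 - g) * text

x_text = torch.randn(2, 20, 256)
x_av = torch.randn(2, 50, 256)
print(GatedCrossModalFusion()(x_text, x_av).shape)  # torch.Size([2, 20, 256])
```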


EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement

Wen, Bin, Tan, Tien-Ping

arXiv.org Artificial Intelligence

We introduce EffiFusion-GAN (Efficient Fusion Generative Adversarial Network), a novel deep learning model designed to enhance speech processing tasks by leveraging advanced techniques. Our model incorporates three primary innovations. Firstly, we employ Depthwise Separable Convolutions within a Multi-Scale Convolutional Block to significantly reduce computational complexity while capturing rich features at multiple scales. This approach enhances the model's ability to process diverse auditory inputs efficiently. Secondly, the model features an enhanced attention mechanism that includes dual Layer Normalization and optimized residual connections, improving stability and convergence during training. Lastly, dynamic pruning is applied to the convolutional layers, which reduces the model size without compromising performance, making it ideal for deployment in resource-constrained environments.
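
For reference, the depthwise separable convolution credited above with reducing computational cost factors a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel mixer. The sketch below uses 1-D convolutions and illustrative sizes; it is not the paper's configuration:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # Depthwise: one filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):                 # x: (B, in_ch, T)
        return self.pointwise(self.depthwise(x))

x = torch.randn(4, 64, 1000)              # batch of 64-channel feature sequences
print(DepthwiseSeparableConv1d(64, 128)(x).shape)  # torch.Size([4, 128, 1000])
```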