 Songkhla


KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening

Sharma, Rohan, Liu, Dancheng, Sun, Jingchen, Zhou, Shijie, Qin, Jiayu, Xiong, Jinjun, Chen, Changyou

arXiv.org Artificial Intelligence

With the rapid advancement of conversational and diffusion-based AI, there is a growing adoption of AI in educational services, ranging from grading and assessment tools to personalized learning systems that provide targeted support for students. However, this adaptability has yet to fully extend to the domain of children's speech, where existing models often fail due to their reliance on datasets designed for clear, articulate adult speech. Children, particularly those in early developmental stages or with speech and language pathologies, present unique challenges that current AI models and datasets are ill-equipped to handle. To address this, we introduce KidSpeak, a multi-task speech-enhanced Foundation Model capable of both generative and discriminative tasks specifically tailored to children's speech patterns. Our framework employs a two-stage training process that incorporates phonetic knowledge into the speech encoder, achieving an average accuracy of 87% across four separate tasks. Furthermore, recognizing the limitations of scalable human annotation and existing speech alignment tools, we propose the Flexible and Automatic Speech Aligner (FASA) and leverage the method to construct high quality datasets for training and evaluation. This novel alignment tool significantly improves the quality of aligned children's speech from noisy data, enhancing data quality by 13.6 compared to human annotations, as demonstrated on the CHILDES dataset. To the best of our knowledge, KidSpeak and FASA represent the first comprehensive solution designed for speech and language therapy in children, offering both a multi-purpose speech LLM and a robust alignment tool.


SEADialogues: A Multilingual Culturally Grounded Multi-turn Dialogue Dataset on Southeast Asian Languages

Kautsar, Muhammad Dehan Al, Candra, Aswin, Hakim, Muhammad Alif Al, Kahfi, Maxalmina Satria, Koto, Fajri, Aji, Alham Fikri, Limkonchotiwat, Peerat, Chuangsuwanich, Ekapol, Winata, Genta Indra

arXiv.org Artificial Intelligence

Although numerous datasets have been developed to support dialogue systems, most existing chit-chat datasets overlook the cultural nuances inherent in natural human conversations. To address this gap, we introduce SEADialogues, a culturally grounded dialogue dataset centered on Southeast Asia, a region with over 700 million people and immense cultural diversity. Our dataset features dialogues in eight languages from six Southeast Asian countries, many of which are low-resource despite having sizable speaker populations. To enhance cultural relevance and personalization, each dialogue includes persona attributes and two culturally grounded topics that reflect everyday life in the respective communities. Furthermore, we release a multi-turn dialogue dataset to advance research on culturally aware and human-centric large language models, including conversational dialogue agents.


Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

Qureshi, Rizwan, Sapkota, Ranjan, Shah, Abbas, Muneer, Amgad, Zafar, Anas, Vayani, Ashmal, Shoman, Maged, Eldaly, Abdelrahman B. M., Zhang, Kai, Sadak, Ferhat, Raza, Shaina, Fan, Xinqi, Shwartz-Ziv, Ravid, Yan, Hong, Jain, Vinjia, Chadha, Aman, Karkee, Manoj, Wu, Jia, Mirjalili, Seyedali

arXiv.org Artificial Intelligence

Can machines truly think, reason and act in domains like humans? This enduring question continues to shape the pursuit of Artificial General Intelligence (AGI). Despite the growing capabilities of models such as GPT-4.5, DeepSeek, Claude 3.5 Sonnet, Phi-4, and Grok 3, which exhibit multimodal fluency and partial reasoning, these systems remain fundamentally limited by their reliance on token-level prediction and lack of grounded agency. This paper offers a cross-disciplinary synthesis of AGI development, spanning artificial intelligence, cognitive neuroscience, psychology, generative models, and agent-based systems. We analyze the architectural and cognitive foundations of general intelligence, highlighting the role of modular reasoning, persistent memory, and multi-agent coordination. In particular, we emphasize the rise of Agentic RAG frameworks that combine retrieval, planning, and dynamic tool use to enable more adaptive behavior. We discuss generalization strategies, including information compression, test-time adaptation, and training-free methods, as critical pathways toward flexible, domain-agnostic intelligence. Vision-Language Models (VLMs) are reexamined not just as perception modules but as evolving interfaces for embodied understanding and collaborative task completion. We also argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition. Finally, we identify key scientific, technical, and ethical challenges on the path to AGI.


Personal Intelligence System UniLM: Hybrid On-Device Small Language Model and Server-Based Large Language Model for Malay Nusantara

Nazri, Azree, Agbolade, Olalekan, Aziz, Faisal

arXiv.org Artificial Intelligence

In contexts with limited computational and data resources, high-resource language models often prove inadequate, particularly when addressing the specific needs of Malay languages. This paper introduces a Personal Intelligence System designed to efficiently integrate both on-device and server-based models. The system incorporates SLiM-34M for on-device processing, optimized for low memory and power usage, and MANYAK-1.3B for server-based tasks, allowing for scalable, high-performance language processing. The models achieve significant results across various tasks, such as machine translation, question-answering, and translated IndoMMLU. Particularly noteworthy is SLiM-34M's ability to achieve a high improvement in accuracy compared to other LLMs while using 2 times fewer pre-training tokens. This work challenges the prevailing assumption that large-scale computational resources are necessary to build effective language models, contributing to the development of resource-efficient models for the Malay language with the unique orchestration between SLiM-34M and MANYAK-1.3B.


Thai Universal Dependency Treebank

Sriwirote, Panyut, Leong, Wei Qi, Polpanumas, Charin, Thanyawong, Santhawat, Tjhi, William Chandra, Aroonmanakun, Wirote, Rutherford, Attapol T.

arXiv.org Artificial Intelligence

Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published systematic evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we address these problems by introducing the Thai Universal Dependency Treebank (TUD), the largest Thai treebank to date, consisting of 3,627 trees annotated in accordance with the Universal Dependencies (UD) framework. We then benchmark dependency parsing models that incorporate pretrained transformers as encoders and train them on Thai-PUD and our TUD. The evaluation results show that most of our models can outperform other models reported in previous papers and provide insight into the optimal choices of components to include in Thai dependency parsers. The new treebank and every model's full predictions generated in our experiments are made available in a GitHub repository for further study.


Know When To Stop: A Study of Semantic Drift in Text Generation

Spataru, Ava, Hambro, Eric, Voita, Elena, Cancedda, Nicola

arXiv.org Artificial Intelligence

In this work, we explicitly show that modern LLMs tend to generate correct facts first, then "drift away" and generate incorrect facts later: this was occasionally observed but never properly measured. We develop a semantic drift score that measures the degree of separation between correct and incorrect facts in generated texts and confirm our hypothesis when generating Wikipedia-style biographies. This correct-then-incorrect generation pattern suggests that factual accuracy can be improved by knowing when to stop generation. Therefore, we explore the trade-off between information quantity and factual accuracy for several early stopping methods and manage to improve factuality by a large margin. We further show that reranking with semantic similarity can improve these results, both compared to the baseline and when combined with early stopping. Finally, we try calling an external API to bring the model back to the right generation path, but do not get positive results. Overall, our methods generalize and can be applied to any long-form text generation to produce more reliable information, by balancing trade-offs between factual accuracy, information quantity and computational cost.
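
The abstract does not spell out how the semantic drift score is computed. As an illustration only, one plausible way to quantify the "degree of separation between correct and incorrect facts" is sketched below. It assumes that each generated fact has already been labeled correct or incorrect, and it scores every possible split point by averaging the share of correct facts before the split with the share of incorrect facts after it. The function name `drift_score` and this exact formula are assumptions made for the sketch, not details confirmed by the listing.

```python
from typing import List, Tuple


def drift_score(labels: List[bool]) -> Tuple[float, int]:
    """Illustrative separation score for an ordered list of fact labels.

    labels[i] is True if the i-th generated fact is correct.
    Returns (best_score, best_split), where best_split is the number of
    facts kept before the hypothetical drift point. A score of 1.0 means
    all correct facts come first and all incorrect facts come after.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two facts to measure separation")

    best_score, best_split = 0.0, 1
    for split in range(1, n):
        head, tail = labels[:split], labels[split:]
        # Fraction of correct facts before the split point.
        head_correct = sum(head) / len(head)
        # Fraction of incorrect facts after the split point.
        tail_incorrect = (len(tail) - sum(tail)) / len(tail)
        score = (head_correct + tail_incorrect) / 2
        if score > best_score:
            best_score, best_split = score, split
    return best_score, best_split


if __name__ == "__main__":
    score, split = drift_score([True, True, False, False])
    print(score, split)
```

Under this sketch, a perfectly separated sequence scores 1.0, while interleaved correct and incorrect facts score lower. The labels are only available after the fact, so this is an offline measurement; an early stopping method would need some signal available during generation to approximate the split point.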


sEMG-Based Upper Limb Movement Classifier: Current Scenario and Upcoming Challenges

Cagliari Tosin, Maurício (Universidade Federal do Rio Grande do Sul) | Machado, Juliano Costa | Balbinot, Alexandre

Journal of Artificial Intelligence Research

Despite achieving accuracies higher than 90% in recognizing upper-limb movements from sEMG (surface electromyography) signals with state-of-the-art classifiers in the laboratory environment, there are still issues to be addressed before a myo-controlled prosthesis can achieve similar performance under real-world conditions. The main goal of this review is therefore to present the latest research on strategies for each block of the system, giving a global view of the current state of academic research. A systematic review was conducted, and the retrieved papers were organized according to the system step related to the proposed method. Then, for each stage of the upper limb motion recognition system, the works were described and compared in terms of strategy, methodology and issue addressed. An additional section is devoted to works related to signal contamination, which is often neglected in reviews focused on sEMG-based motion classifiers; this section is the main contribution of this paper. Deep learning methods are a current trend for the classification stage, providing strategies based on time-series and transfer learning to address the issues related to limb position, temporal/inter-subject variation, and electrode displacement. Despite the promising strategies presented for contaminant detection, identification, and removal, there are still some factors to be considered, such as the occurrence of simultaneous contaminants.


ANet: Autoencoder-Based Local Field Potential Feature Extractor for Evaluating An Antidepressant Effect in Mice after Administering Kratom Leaf Extracts

Nukitram, Jakkrit, Chaisaen, Rattanaphon, Autthasan, Phairot, Sengnon, Narumon, Wungsintaweekul, Juraithip, Saengmolee, Wanumaidah, Cheaha, Dania, Kumarnsit, Ekkasit, Sudhawiyangkul, Thapanun, Wilaiprasitporn, Theerawit

arXiv.org Artificial Intelligence

Kratom (KT) typically exerts antidepressant (AD) effects. However, evaluating which form of KT extract possesses AD properties similar to the standard AD fluoxetine (flu) remains challenging. Here, we adopted an autoencoder (AE)-based anomaly detector called ANet to measure the similarity of mice's local field potential (LFP) features that responded to KT leaf extracts and AD flu. The features that responded to KT syrup had the highest similarity to those that responded to the AD flu at 85.62 $\pm$ 0.29%. This finding suggests that KT syrup is a more feasible alternative substance for depression therapy than KT alkaloids and KT aqueous, which are the other candidates in this study. Apart from the similarity measurement, we utilized ANet as a multi-task AE and evaluated its performance in discriminating multi-class LFP responses corresponding to the effects of different KT extracts and AD flu simultaneously. Furthermore, we visualized learned latent features among LFP responses qualitatively and quantitatively as t-SNE projection and maximum mean discrepancy distance, respectively. The classification results reported an accuracy and F1-score of 79.78 $\pm$ 0.39% and 79.53 $\pm$ 0.00%. In summary, the outcomes of this research might help in designing therapeutic devices for evaluating alternative substance profiles, such as Kratom-based forms, in real-world applications.


Phone users in Thailand's Muslim-majority south ordered to give authorities photos of themselves

The Japan Times

BANGKOK - An order for mobile phone users in Thailand's restive south to submit a photo of themselves for facial recognition purposes is causing uproar from opponents who see it as further curtailing the rights of the Muslim-majority population. But an army spokesman on Wednesday defended the move, saying the facial identification scheme is needed to root out insurgents deploying mobile phone-detonated home-made bombs. Thailand's three southernmost states -- Yala, Pattani and Narathiwat -- have since 2004 been rife with conflict between Malay-Muslim rebels and the Buddhist-majority Thai state, which annexed the region around a century ago. The tit-for-tat violence has claimed around 7,000 lives, mostly civilians of both faiths, and security forces have detained individuals suspected of being separatist rebels without warrants in the past. Now telecoms companies are requiring all users of the region's 1.5 million mobile numbers to submit a photo of themselves for facial recognition purposes following orders from the army -- a move that is drawing anger from rights groups as the deadline to register photos nears.


Vol 14, No 02 (2019). International Journal of Emerging Technologies in Learning (iJET)

#artificialintelligence

Today we bring to this space the latest, newly released issue of the International Journal of Emerging Technologies in Learning (iJET). This interdisciplinary journal aims to focus on the exchange of relevant trends and research results as well as the presentation of practical experiences gained while developing and testing elements of technology enhanced learning. It thus aims to bridge the gap between pure academic research journals and more practical publications, covering the full range from research and application development to experience reports and product descriptions. Readers don't have to pay any fee.

Vol 14, No 02 (2019)

Table of Contents

Papers

Multi-Dimensional Analysis to Predict Students' Grades in Higher Education
Eslam Abou Gamie, Samir Abou El-Seoud, Mostafa Salama, Walid Hussein

Implemented and Tested Conception Proposal of Adaptation Model for Adaptive Hypermedia
Mehdi Tmimi, Mohamed Benslimane, Mohammed Berrada, Kamar Ouzzani

Multidimensional Approach Based on Deep Learning to Improve the Prediction Performance of DNN Models
Mohamed El Fouki, Noura Aknin, Kamal Eddine El Kadiri

Visualization Teaching of Deformation Monitoring and Data Processing based on MATLAB

3D Course Teaching Based on Educational Game Development Theory – Case Study of Game Design Course

The Development and Performance Evaluation of Digital Museums Toward Second Classroom of Primary and Secondary School – Taking Zhejiang Education Technology Digital Museum as An Example
Ying Zheng, Yuhui Yang, Huifang Chai, Mo Chen, Jianping Zhang

Students' Beliefs Regarding the Use of E-portfolio to Enhance Cognitive Skills in a Blended Learning Environment
Prakob Koraneekij, Jintavee Khlaisang

Learning Effect of Implicit Learning in Joining-in-type Robot-assisted Language Learning System
AlBara Khalifa, Tsuneo Kato, Seiichi Yamamoto

The Different Roles of Help-Seeking Personalities in Social Support Group Activity on E-Portfolio for Career Development
Suthanit Wetcho, Jaitip Na-Songkhla

Short Papers

A Review of Digital Skills of Malaysian English Language Teachers
Mohd Zulhilmi Che Had, Radzuwan Ab Rashid

International Journal of Emerging Technologies in Learning.