Collaborating Authors: Wang, Bin


ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation

arXiv.org Artificial Intelligence

Recently, mobile AI agents have gained increasing attention. Given a task, a mobile AI agent interacts with a mobile device over multiple steps, ultimately forming a GUI flow that solves the task. However, existing agents tend to focus on the most task-relevant elements at each step, leading to locally optimal solutions that ignore the overall GUI flow. To address this issue, we constructed a training dataset called MobileReach, which breaks each task into page-reaching and page-operation subtasks. Furthermore, we propose ReachAgent, a two-stage framework focused on improving the agent's task-completion ability. It uses the page-reaching and page-operation subtasks, along with reward-based preference GUI flows, to further enhance the agent. Experimental results show that ReachAgent significantly improves IoU Acc and Text Acc by 7.12% and 7.69% at the step level and by 4.72% and 4.63% at the task level compared to the SOTA agent. Our data and code will be released upon acceptance.
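To make the reward-based preference idea concrete, here is a minimal sketch of how whole GUI flows might be scored and turned into preference pairs for training. The `Step` dataclass, `flow_reward`, and the length penalty are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch: score candidate GUI flows as a whole and build
# (chosen, rejected) preference pairs. Names are illustrative only.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Step:
    page: str        # page the step lands on
    action: str      # e.g. "click", "type"
    correct: bool    # whether the step matches the reference flow

def flow_reward(flow: list[Step]) -> float:
    """Reward the entire flow rather than each step greedily: fraction of
    correct steps minus a small length penalty (assumed weighting)."""
    if not flow:
        return 0.0
    step_acc = sum(s.correct for s in flow) / len(flow)
    return step_acc - 0.01 * len(flow)

def build_preference_pairs(flows: list[list[Step]]):
    """Turn reward-ranked flows into (chosen, rejected) pairs, as used by
    DPO-style preference optimization."""
    ranked = sorted(flows, key=flow_reward, reverse=True)
    return [(ranked[i], ranked[j]) for i, j in combinations(range(len(ranked)), 2)]
```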


A Genetic Algorithm-Based Approach for Automated Optimization of Kolmogorov-Arnold Networks in Classification Tasks

arXiv.org Artificial Intelligence

To address the interpretability limitations of multilayer perceptrons (MLPs), Kolmogorov-Arnold Networks (KANs) were introduced in 2024. However, optimizing KAN structures is labor-intensive, typically requiring manual intervention and parameter tuning. This paper proposes GA-KAN, a genetic algorithm-based approach that automates the optimization of KANs, requiring no human intervention in the design process. To the best of our knowledge, this is the first time evolutionary computation has been explored for automatically optimizing KANs. Furthermore, inspired by the effectiveness of sparse connectivity in reducing the number of parameters in MLPs, GA-KAN also explores sparse connectivity to tackle the challenge of KANs' extensive parameter spaces. GA-KAN is validated on two toy datasets, achieving optimal results without the manual tuning required by the original KAN. Additionally, GA-KAN demonstrates superior performance across five classification datasets, outperforming traditional methods on all of them and providing interpretable symbolic formulae for the Wine and Iris datasets, thereby enhancing model transparency. Furthermore, GA-KAN significantly reduces the number of parameters relative to the standard KAN across all five datasets. The core contributions of GA-KAN include automated optimization, a new encoding strategy, and a new decoding process, which together improve accuracy and interpretability while reducing the number of parameters.
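A minimal sketch of the evolutionary loop described above, assuming a simple bit-string genome over candidate KAN connections (sparse connectivity); the fitness function is a placeholder for evaluating a decoded KAN on a classification dataset, and all constants are illustrative.

```python
# Illustrative GA loop: genomes encode which connections are active.
# Fitness here is a stand-in that trades an accuracy proxy against
# parameter count; the paper evaluates decoded KANs on real data.
import random

GENOME_LEN = 32          # assumed number of candidate connections
POP_SIZE, GENERATIONS = 20, 50
MUT_RATE = 0.05

def fitness(genome: list[int]) -> float:
    # Placeholder objective: fewer active connections -> fewer parameters.
    return sum(genome[:16]) - 0.5 * sum(genome)

def mutate(g):
    return [b ^ (random.random() < MUT_RATE) for b in g]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: POP_SIZE // 4]                 # keep the best quarter
    children = [mutate(crossover(*random.sample(elite, 2)))
                for _ in range(POP_SIZE - len(elite))]
    pop = elite + children
best = max(pop, key=fitness)
```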


MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish

arXiv.org Artificial Intelligence

Multilingual large language models (MLLMs) have shown impressive capabilities across a variety of languages. However, their efficacy can differ greatly across language families, especially for those with limited linguistic resources. This report presents MERaLiON-TextLLM, a series of open-source language models specifically tailored to improve understanding and generation in Chinese, Indonesian, Malay, and Singlish. The initial release is built on Llama-3-8B-Base and refined through a carefully crafted process of continued pre-training and weight merging. Our approach achieves performance improvements across benchmarks in these languages, exceeding the capabilities of the official Llama-3 models. We provide the model checkpoints as a resource to support further research and development in cross-lingual language understanding.
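The report mentions weight merging without detailing the recipe; a common baseline is linear interpolation of matching parameters, sketched below. The interpolation scheme and the model names in the usage comment are assumptions, not the authors' confirmed procedure.

```python
# Minimal sketch of linear weight merging between a base model and a
# continued-pre-trained checkpoint (assumed recipe, not the paper's).
import torch

def merge_state_dicts(base: dict[str, torch.Tensor],
                      tuned: dict[str, torch.Tensor],
                      alpha: float = 0.5) -> dict[str, torch.Tensor]:
    """Interpolate parameters: alpha=0 keeps the base weights, alpha=1
    keeps the continued-pre-trained weights. Keys must match exactly."""
    assert base.keys() == tuned.keys()
    return {k: (1 - alpha) * base[k] + alpha * tuned[k] for k in base}

# Usage with Hugging Face models (names illustrative):
# base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
# merged = merge_state_dicts(base.state_dict(), tuned.state_dict(), alpha=0.5)
# base.load_state_dict(merged)
```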


Consistency of Responses and Continuations Generated by Large Language Models on Social Media

arXiv.org Artificial Intelligence

Large Language Models (LLMs) demonstrate remarkable capabilities in text generation, yet their emotional consistency and semantic coherence in social media contexts remain insufficiently understood. This study investigates how LLMs handle emotional content and maintain semantic relationships through continuation and response tasks using two open-source models: Gemma and Llama. By analyzing climate change discussions from Twitter and Reddit, we examine emotional transitions, intensity patterns, and semantic similarity between human-authored and LLM-generated content. Our findings reveal that while both models maintain high semantic coherence, they exhibit distinct emotional patterns: Gemma shows a tendency toward negative emotion amplification, particularly anger, while maintaining certain positive emotions like optimism. Llama demonstrates superior emotional preservation across a broader spectrum of affects. Both models systematically generate responses with attenuated emotional intensity compared to human-authored content and show a bias toward positive emotions in response tasks. Additionally, both models maintain strong semantic similarity with original texts, though performance varies between continuation and response tasks. These findings provide insights into LLMs' emotional and semantic processing capabilities, with implications for their deployment in social media contexts and human-AI interaction design.
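As a rough illustration of the semantic-similarity side of this analysis, one might embed paired human and generated texts and compare them with cosine similarity; the sentence-transformers model below is a common default, not necessarily what the authors used, and emotion scoring would require a separate classifier.

```python
# Sketch: semantic similarity between a human post and an LLM
# continuation/response, using a standard sentence-embedding model.
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

def semantic_similarity(human: str, generated: str) -> float:
    emb = model.encode([human, generated])
    return float(cos_sim(emb[0], emb[1]))

print(semantic_similarity(
    "Climate change is accelerating faster than predicted.",
    "Recent data suggest warming is outpacing earlier projections."))
```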


Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models

arXiv.org Artificial Intelligence

Speech technologies have evolved over decades, progressing from modularized solutions for speech recognition (Povey et al., 2011; Radford et al., 2023), speaker identification (Togneri and Pullella, 2011), and gender recognition (Hechmi et al., 2021), built with modularized toolkits like Kaldi (Povey et al., 2011) and ESPnet (Watanabe et al., 2018), to advanced solutions integrating large language models for multimodal understanding in an all-encompassing, omni-style approach (Team et al., ...). Existing Singlish spoken corpora have primarily focused on linguistic analysis and speech recognition tasks (Deterding and Low, 2001; Chen et al., 2010; Lyu et al., 2010; Tan, 2019). Given the relatively small population of Singlish speakers, estimated at just a few million, resources for Singlish speech corpora are significantly more limited compared to major languages like English, Chinese, French, and Spanish. Singapore's government agency, IMDA, has open-sourced the largest available Singlish corpus, known as the National Speech Corpus (Koh et al., 2019).


GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

arXiv.org Artificial Intelligence

Despite their proficiency in general tasks, Multi-modal Large Language Models (MLLMs) struggle with automatic Geometry Problem Solving (GPS), which demands understanding diagrams, interpreting symbols, and performing complex reasoning. This limitation arises from their pre-training on natural images and text, along with the lack of automated verification in the problem-solving process. Moreover, current geometric specialists are limited by their task-specific designs, making them less effective for broader geometric problems. To this end, we present GeoX, a multi-modal large model focused on geometric understanding and reasoning tasks. Given the significant differences between geometric diagrams and symbols on the one hand and natural images and text on the other, we introduce unimodal pre-training to develop a diagram encoder and a symbol decoder, enhancing the understanding of geometric images and corpora. Furthermore, we introduce geometry-language alignment, an effective pre-training paradigm that bridges the modality gap between the unimodal geometric experts. We propose a Generator-And-Sampler Transformer (GS-Former) to generate discriminative queries and eliminate uninformative representations from unevenly distributed geometric signals. Finally, GeoX benefits from visual instruction tuning, empowering it to take geometric images and questions as input and generate verifiable solutions. Experiments show that GeoX outperforms both generalists and geometric specialists on publicly recognized benchmarks such as GeoQA, UniGeo, Geometry3K, and PGPS9k.
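A hedged sketch of the query-generation idea behind GS-Former: a set of learnable queries cross-attends to diagram features, in the spirit of Q-Former-style modules. Dimensions and the class name are assumptions, and the sampler that removes uninformative representations is omitted.

```python
# Illustrative query generator: learnable queries attend over encoded
# diagram patches, so uninformative regions contribute little.
import torch
import torch.nn as nn

class QueryGenerator(nn.Module):
    def __init__(self, num_queries: int = 32, dim: int = 256, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, diagram_feats: torch.Tensor) -> torch.Tensor:
        # diagram_feats: (batch, num_patches, dim) from the diagram encoder
        q = self.queries.unsqueeze(0).expand(diagram_feats.size(0), -1, -1)
        out, _ = self.attn(q, diagram_feats, diagram_feats)
        return out  # (batch, num_queries, dim) discriminative queries
```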


Learn A Flexible Exploration Model for Parameterized Action Markov Decision Processes

arXiv.org Artificial Intelligence

Hybrid action models are widely considered an effective approach to reinforcement learning (RL) modeling. The current mainstream method is to train agents under Parameterized Action Markov Decision Processes (PAMDPs), which performs well in specific environments. Unfortunately, these models either exhibit drastically low learning efficiency in complex PAMDPs or lose crucial information when converting between the raw space and the latent space. To enhance the learning efficiency and asymptotic performance of the agent, we propose a model-based RL (MBRL) algorithm, FLEXplore. FLEXplore learns a parameterized-action-conditioned dynamics model and employs a modified Model Predictive Path Integral control. Unlike conventional MBRL algorithms, we carefully design the dynamics loss function and reward smoothing process to learn a loose yet flexible model. Additionally, we use a variational lower bound to maximize the mutual information between the state and the hybrid action, enhancing the exploration effectiveness of the agent. We theoretically demonstrate that FLEXplore can reduce the regret of the rollout trajectory through the Wasserstein metric under given Lipschitz conditions. Our empirical results on several standard benchmarks show that FLEXplore achieves outstanding learning efficiency and asymptotic performance compared to other baselines.
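Since FLEXplore builds on a modified Model Predictive Path Integral controller, a minimal vanilla MPPI loop is sketched below; the `dynamics` and `reward` callables stand in for the learned parameterized-action-conditioned model, and the action dimensionality and hyperparameters are assumptions.

```python
# Minimal MPPI sketch: sample noisy action sequences, roll them out
# through the (learned) model, and return the reward-weighted first action.
import numpy as np

def mppi(state, dynamics, reward, horizon=10, samples=64, sigma=0.3, lam=1.0):
    action_dim = 2                      # assumed hybrid-action embedding size
    noise = np.random.randn(samples, horizon, action_dim) * sigma
    returns = np.zeros(samples)
    for k in range(samples):
        s = state
        for t in range(horizon):
            s = dynamics(s, noise[k, t])        # learned dynamics model
            returns[k] += reward(s, noise[k, t])
    # Exponential weighting of trajectories by return (path-integral rule).
    w = np.exp((returns - returns.max()) / lam)
    w /= w.sum()
    return np.tensordot(w, noise[:, 0, :], axes=1)  # weighted first action
```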


DoTA: Weight-Decomposed Tensor Adaptation for Large Language Models

arXiv.org Artificial Intelligence

Low-rank adaptation (LoRA) reduces the computational and memory demands of fine-tuning large language models (LLMs) by approximating updates with low-rank matrices. However, low-rank approximation in two-dimensional space fails to capture high-dimensional structures within the target matrix. Recently, tensor decomposition methods have been explored for fine-tuning LLMs, leveraging their ability to extract structured information. Yet these approaches primarily rely on random initialization, and the impact of initialization on tensor adaptation remains underexplored. In this paper, we show that random initialization leads to a validation loss that diverges significantly from that of full fine-tuning. To address this, we propose Weight-Decomposed Tensor Adaptation (DoTA), which leverages the Matrix Product Operator (MPO) decomposition of pre-trained weights for effective initialization when fine-tuning LLMs. Additionally, we introduce QDoTA, a version of DoTA designed for 4-bit quantization. Experiments on commonsense and arithmetic reasoning tasks show that DoTA outperforms randomly initialized methods with fewer parameters. QDoTA further reduces memory consumption and achieves performance comparable to DoTA on commonsense reasoning tasks. We will release our code to support future research.
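To illustrate decomposition-based initialization, the sketch below factors a weight matrix into a tensor train (the matrix analogue of an MPO) via sequential truncated SVDs; the reshaping dimensions and rank are illustrative, and DoTA's exact MPO factorization may differ.

```python
# Sketch: tensor-train factorization of a pre-trained weight matrix via
# sequential truncated SVDs, as a stand-in for MPO initialization.
import numpy as np

def tensor_train(w: np.ndarray, dims: list[int], rank: int) -> list[np.ndarray]:
    """Decompose w (reshaped to `dims`) into TT cores with bond size <= rank."""
    t = w.reshape(dims)
    cores, r_prev = [], 1
    for d in dims[:-1]:
        t = t.reshape(r_prev * d, -1)
        u, s, vt = np.linalg.svd(t, full_matrices=False)
        r = min(rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, d, r))
        t = s[:r, None] * vt[:r]        # carry the remainder forward
        r_prev = r
    cores.append(t.reshape(r_prev, dims[-1], 1))
    return cores

# e.g. a 64x64 weight viewed as a 4-way tensor (shapes are assumptions):
w = np.random.randn(64, 64)
cores = tensor_train(w, [8, 8, 8, 8], rank=16)
```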


Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) possess vast semantic knowledge but often struggle with complex reasoning tasks, particularly relational reasoning problems such as kinship or spatial reasoning. In this paper, we present Path-of-Thoughts (PoT), a novel framework designed to tackle relational reasoning by decomposing the task into three key stages: graph extraction, path identification, and reasoning. Unlike previous approaches, PoT efficiently extracts a task-agnostic graph that identifies the crucial entities, relations, and attributes within the problem context. Subsequently, PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers. Experimental evaluations on four benchmark datasets that demand long reasoning chains demonstrate that PoT surpasses state-of-the-art baselines by a significant margin (up to 21.3%) without requiring fine-tuning or extensive LLM calls. Furthermore, in contrast to prior neuro-symbolic methods, PoT exhibits improved resilience against LLM errors by leveraging the compositional nature of graphs.
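A toy sketch of the path-identification stage: with entities and relations already extracted into a graph (by an LLM in the paper; hard-coded here), the reasoning chain between the queried entities falls out as a graph path. The example relations are invented for illustration.

```python
# Toy path identification for relational (kinship) reasoning: extract a
# relation chain between two entities as a shortest path in the graph.
import networkx as nx

g = nx.DiGraph()
g.add_edge("Alice", "Bob", relation="mother_of")
g.add_edge("Bob", "Carol", relation="father_of")

path = nx.shortest_path(g, "Alice", "Carol")
chain = [g.edges[u, v]["relation"] for u, v in zip(path, path[1:])]
print(chain)  # ['mother_of', 'father_of'] -> compose to infer "grandmother_of"
```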


MERaLiON-AudioLLM: Bridging Audio and Language with Large Language Models

arXiv.org Artificial Intelligence

We introduce MERaLiON-AudioLLM (Multimodal Empathetic Reasoning and Learning in One Network), the first speech-text model tailored for Singapore's multilingual and multicultural landscape. Developed under the National Large Language Models Funding Initiative, Singapore, MERaLiON-AudioLLM integrates advanced speech and text processing to address the diverse linguistic nuances of local accents and dialects, enhancing accessibility and usability in complex, multilingual environments. Our results demonstrate improvements in both speech recognition and task-specific understanding, positioning MERaLiON-AudioLLM as a pioneering solution for region-specific AI applications. We envision this release setting a precedent for future models designed to address localised linguistic and cultural contexts in a global framework.