AITopics | Zhang, Chen

Zhang, Chen

VoiceBench: Benchmarking LLM-Based Voice Assistants

Chen, Yiming, Yue, Xianghu, Zhang, Chen, Gao, Xiaoxue, Tan, Robby T., Li, Haizhou

arXiv.org Artificial Intelligence, Dec-11-2024

Building on the success of large language models (LLMs), recent advancements such as GPT-4o have enabled real-time speech interactions through LLM-based voice assistants, offering a significantly improved user experience compared to traditional text-based interactions. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered progress in the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general knowledge evaluation with clean speech, neglecting the more intricate, real-world scenarios that involve diverse speaker characteristics, environmental factors, and content factors. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench also includes both real and synthetic spoken instructions that incorporate the above three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.17196

Country:

North America > United States (0.14)
North America > Canada (0.14)
Europe > France (0.14)
Asia > Thailand (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Beyond Single-Audio: Advancing Multi-Audio Processing in Audio Large Language Models

Chen, Yiming, Yue, Xianghu, Gao, Xiaoxue, Zhang, Chen, D'Haro, Luis Fernando, Tan, Robby T., Li, Haizhou

arXiv.org Artificial Intelligence, Nov-6-2024

Various audio-LLMs (ALLMs) have been explored recently for tackling different audio tasks simultaneously using a single, unified model. While existing evaluations of ALLMs primarily focus on single-audio tasks, real-world applications often involve processing multiple audio streams simultaneously. To bridge this gap, we propose the first multi-audio evaluation (MAE) benchmark that consists of 20 datasets from 11 multi-audio tasks encompassing both speech and sound scenarios. Comprehensive experiments on MAE demonstrate that existing ALLMs, while powerful in comprehending primary audio elements in individual audio inputs, struggle to handle multi-audio scenarios. To this end, we propose a novel multi-audio-LLM (MALLM) to capture audio context among multiple similar audios using discriminative learning on our proposed synthetic data. The results demonstrate that the proposed MALLM outperforms all baselines and achieves high data efficiency using synthetic data without requiring human annotations. The proposed MALLM opens the door for ALLMs toward the multi-audio processing era and brings us closer to replicating human auditory capabilities in machines.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2409.1868

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Media (0.66)
Leisure & Entertainment (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

FreqMark: Invisible Image Watermarking via Frequency Based Optimization in Latent Space

Guo, Yiyang, Li, Ruizhe, Hui, Mude, Guo, Hanzhong, Zhang, Chen, Cai, Chuangjian, Wan, Le, Wang, Shangfei

arXiv.org Artificial Intelligence, Oct-28-2024

Invisible watermarking is essential for safeguarding digital content, enabling copyright protection and content authentication. However, existing watermarking methods fall short in robustness against regeneration attacks. In this paper, we propose a novel method called FreqMark that involves unconstrained optimization of the image latent frequency space obtained after VAE encoding. Specifically, FreqMark embeds the watermark by optimizing the latent frequency space of the images and then extracts the watermark through a pre-trained image encoder. This optimization allows a flexible trade-off between image quality and watermark robustness and effectively resists regeneration attacks. Experimental results demonstrate that FreqMark offers significant advantages in image quality and robustness, permits flexible selection of the encoding bit number, and achieves a bit accuracy exceeding 90% when encoding a 48-bit hidden message under various attack scenarios.

artificial intelligence, machine learning, robustness, (17 more...)

arXiv.org Artificial Intelligence

2410.20824

Country: North America > United States > California (0.14)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification

Luo, Yihong, Chen, Yuhan, Qiu, Siya, Wang, Yiwei, Zhang, Chen, Zhou, Yan, Cao, Xiaochun, Tang, Jing

arXiv.org Artificial Intelligence, Oct-22-2024

Graph Neural Networks (GNNs) have shown superior performance in node classification. However, GNNs perform poorly in the Few-Shot Node Classification (FSNC) task, which requires robust generalization to make accurate predictions for unseen classes with limited labels. To tackle this challenge, we propose the integration of Sharpness-Aware Minimization (SAM), a technique designed to enhance model generalization by finding a flat minimum of the loss landscape, into GNN training. The standard SAM approach, however, consists of two forward-backward steps in each training iteration, doubling the computational cost compared to the base optimizer (e.g., Adam). To mitigate this drawback, we introduce a novel algorithm, Fast Graph Sharpness-Aware Minimization (FGSAM), that integrates the rapid training of Multi-Layer Perceptrons (MLPs) with the superior performance of GNNs. Specifically, we utilize GNNs for parameter perturbation while employing MLPs to minimize the perturbed loss so that we can find a flat minimum with good generalization more efficiently. Moreover, our method reuses the gradient from the perturbation phase to incorporate graph topology into the minimization process at almost zero additional cost. To further enhance training efficiency, we develop FGSAM+, which executes exact perturbations periodically. Extensive experiments demonstrate that our proposed algorithm outperforms the standard SAM with lower computational costs in FSNC tasks. In particular, our FGSAM+ as a SAM variant offers faster optimization than the base optimizer in most cases. In addition to FSNC, our proposed methods also demonstrate competitive performance in the standard node classification task for heterophilic graphs, highlighting their broad applicability. The code is available at https://github.com/draym28/FGSAM_NeurIPS24.
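To make the cost argument in the abstract concrete, here is a minimal sketch of a generic SAM update next to a plain gradient step. It is not the FGSAM algorithm from the paper: the toy quadratic loss, the function names (`grad`, `sgd_step`, `sam_step`), and the hyperparameters `lr` and `rho` are all assumptions chosen for illustration. Each step also returns the number of gradient evaluations it used, which is where the doubling comes from.

```python
import math

def grad(w):
    """Gradient of a toy loss L(w) = sum(w_i ** 2) (an illustrative assumption)."""
    return [2.0 * wi for wi in w]

def sgd_step(w, lr):
    """Plain gradient step: one gradient evaluation."""
    g = grad(w)
    return [wi - lr * gi for wi, gi in zip(w, g)], 1

def sam_step(w, lr, rho):
    """Generic SAM step: two gradient evaluations per update."""
    # First evaluation: gradient at the current weights gives the ascent direction.
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0  # avoid division by zero
    # Perturb the weights by a step of length rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Second evaluation: gradient at the perturbed weights drives the update.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)], 2

if __name__ == "__main__":
    print(sgd_step([3.0, 4.0], lr=0.1))
    print(sam_step([3.0, 4.0], lr=0.1, rho=0.5))
```

In a real training loop each gradient evaluation is a full forward-backward pass, so the second evaluation in `sam_step` is the extra cost the abstract refers to.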

artificial intelligence, information, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.16845

Country: North America > United States > California > Los Angeles County (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

MoDification: Mixture of Depths Made Easy

Zhang, Chen, Zhong, Meizhi, Wang, Qimeng, Lu, Xuantao, Ye, Zheyu, Lu, Chengqiang, Gao, Yan, Hu, Yao, Chen, Kehai, Zhang, Min, Song, Dawei

arXiv.org Artificial Intelligence, Oct-18-2024

Long-context efficiency has recently become a trending topic in serving large language models (LLMs). Mixture of depths (MoD) has been proposed as a perfect fit to bring down both latency and memory. In this paper, however, we discover that MoD can barely transform existing LLMs without costly training over an extensive number of tokens. To enable the transformation of any LLM into a MoD one, we show that the top-k operator in MoD should be promoted to a threshold-p operator, and that the architecture and data should be refined accordingly. All these designs form our method, termed MoDification. Through a comprehensive set of experiments covering model scales from 3B to 70B, we show that MoDification strikes an excellent balance between efficiency and effectiveness. MoDification can achieve up to ~1.2x speedup in latency and ~1.8x reduction in memory compared to the original LLMs, especially in long-context applications.
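The difference between a top-k and a threshold-style selection rule can be sketched as follows. This is only an illustration of the two operators named in the abstract, under the assumption that each token has a scalar router score and that "threshold-p" means keeping tokens whose score exceeds a cutoff `p`; the function names and example scores are hypothetical and are not taken from the paper.

```python
def route_top_k(scores, k):
    """Keep the k highest-scoring token positions (fixed budget per sequence)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

def route_threshold_p(scores, p):
    """Keep every token position whose score exceeds p (variable budget)."""
    return [i for i, s in enumerate(scores) if s > p]

if __name__ == "__main__":
    scores = [0.9, 0.2, 0.6, 0.4]  # hypothetical per-token router scores
    print(route_top_k(scores, k=2))
    print(route_threshold_p(scores, p=0.5))
    print(route_threshold_p(scores, p=0.3))
```

Under these assumptions, top-k needs the scores of all tokens in the sequence before it can decide, whereas a threshold decision depends only on the token's own score, so the number of tokens kept varies with the input.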

large language model, machine learning, modification, (19 more...)

arXiv.org Artificial Intelligence

2410.14268

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Hawaii (0.14)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Training Interactive Agent in Large FPS Game Map with Rule-enhanced Reinforcement Learning

Zhang, Chen, Hu, Huan, Zhou, Yuan, Cao, Qiyang, Liu, Ruochen, Wei, Wenya, Liu, Elvis S.

arXiv.org Artificial Intelligence, Oct-7-2024

In the realm of competitive gaming, 3D first-person shooter (FPS) games have gained immense popularity, prompting the development of game AI systems to enhance gameplay. However, deploying game AI in practical scenarios still poses challenges, particularly in large-scale and complex FPS games. In this paper, we focus on the practical deployment of game AI in the online multiplayer competitive 3D FPS game called Arena Breakout, developed by Tencent Games. We propose a novel gaming AI system named Private Military Company Agent (PMCA), which is interactable within a large game map and engages in combat with players while utilizing tactical advantages provided by the surrounding terrain. To address the challenges of navigation and combat in modern 3D FPS games, we introduce a method that combines navigation mesh (Navmesh) and shooting-rule with deep reinforcement learning (NSRL). The integration of Navmesh enhances the agent's global navigation capabilities while shooting behavior is controlled using rule-based methods to ensure controllability. NSRL employs a DRL model to predict when to enable the navigation mesh, resulting in a diverse range of behaviors for the game AI. Customized rewards for human-like behaviors are also employed to align PMCA's behavior with that of human players. INTRODUCTION: First-person shooter (FPS) games in 3D have gained immense popularity in the competitive gaming realm. As these games have evolved from early titles like Maze War and Half-Life to more recent ones such as Apex Legends, CS:GO, and Valorant, there has been a growing interest in developing intelligent AI systems for FPS games.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2410.04936

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents

Wu, Shiwei, Zhang, Chen, Gao, Yan, Wang, Qimeng, Xu, Tong, Hu, Yao, Chen, Enhong

arXiv.org Artificial Intelligence, Oct-1-2024

Instructional documents are rich sources of knowledge for completing various tasks, yet their unique challenges in conversational question answering (CQA) have not been thoroughly explored. Existing benchmarks have primarily focused on basic factual question-answering from single narrative documents, making them inadequate for assessing a model's ability to comprehend complex real-world instructional documents and provide accurate step-by-step guidance in daily life. To bridge this gap, we present InsCoQA, a novel benchmark tailored for evaluating large language models (LLMs) in the context of CQA with instructional documents. Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents, reflecting the intricate and multi-faceted nature of real-world instructional tasks. Additionally, to comprehensively assess state-of-the-art LLMs on the InsCoQA benchmark, we propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.00526

Country: Asia (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis

Wang, Haoyu, Qiang, Chunyu, Wang, Tianrui, Gong, Cheng, Liu, Qiuyu, Jiang, Yu, Wang, Xiaobao, Wang, Chenyang, Zhang, Chen

arXiv.org Artificial Intelligence, Sep-27-2024

Recent advancements in speech synthesis models, trained on extensive datasets, have demonstrated remarkable zero-shot capabilities. These models can control content, timbre, and emotion in generated speech based on prompt inputs. Despite these advancements, the choice of prompts significantly impacts the output quality, yet most existing selection schemes do not adequately address the control of emotional intensity. To address this issue, this paper proposes a two-stage prompt selection strategy, EmoPro, which is specifically designed for emotionally controllable speech synthesis. This strategy focuses on selecting highly expressive and high-quality prompts by evaluating them from four perspectives: emotional expression strength, speech quality, text-emotion consistency, and model generation performance. Experimental results show that prompts selected using the proposed method result in more emotionally expressive and engaging synthesized speech than those obtained with the baseline. Audio samples and code will be available at https://whyrrrrun.github.io/EmoPro/.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2409.18512

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification

Zhang, Fengrun, Zhou, Wangjin, Liu, Yiming, Geng, Wang, Shan, Yahui, Zhang, Chen

arXiv.org Artificial Intelligence, Sep-24-2024

There has been an increasing research interest in cross-age speaker verification (CASV). However, existing speaker verification systems perform poorly in CASV due to the great individual differences in voice caused by aging. In this paper, we propose a disentangled representation learning framework for CASV based on mutual information (MI) minimization. In our method, a backbone model is trained to disentangle the identity- and age-related embeddings from speaker information, and an MI estimator is trained to minimize the correlation between age- and identity-related embeddings via MI minimization, resulting in age-invariant speaker embeddings. Furthermore, by using the age gaps between positive and negative samples, we propose an aging-aware MI minimization loss function that allows the backbone model to focus more on the vocal changes with large age gaps. Experimental results show that the proposed method outperforms other methods on multiple Cross-Age test sets of Vox-CA.

artificial intelligence, information, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2409.15974

Country:

Asia > Japan (0.14)
Asia > China (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More

Gou, Xiaochuan, Li, Ziyue, Lan, Tian, Lin, Junpeng, Li, Zhishuai, Zhao, Bingyu, Zhang, Chen, Wang, Di, Zhang, Xiangliang

arXiv.org Artificial Intelligence, Jul-16-2024

Research has long been conducted separately on two highly correlated tracks: traffic and incidents. The traffic track has seen increasingly complicated deep learning models, e.g., to push prediction accuracy a few percent higher, while the incident track studies incidents alone, e.g., to infer incident risk. We, for the first time, spatiotemporally align the two tracks in a large-scale region (16,972 traffic nodes) over the whole year of 2023: our XTraffic dataset includes traffic, i.e., time-series indexes on traffic flow, lane occupancy, and average vehicle speed, and incidents, whose records are spatiotemporally aligned with the traffic data and cover seven different incident classes. Additionally, each node includes detailed physical and policy-level meta-attributes of lanes. Our data can revolutionize traditional traffic-related tasks toward higher interpretability and practicality: instead of traditional prediction or classification tasks, we conduct: (1) post-incident traffic forecasting to quantify the impact of different incidents on traffic indexes; (2) incident classification using traffic indexes to determine the incident types for precautionary measures; (3) global causal analysis among the traffic indexes, meta-attributes, and incidents to give high-level guidance on the interrelations of various factors; (4) local causal analysis within road nodes to examine how different incidents affect the road segments' relations. The dataset is available at http://xaitraffic.github.io.

artificial intelligence, incident, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2407.11477

Country:

Europe (1.00)
North America > United States > California (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
