Chen, Hang
Deep CLAS: Deep Contextual Listen, Attend and Spell
Wang, Mengzhi, Xiong, Shifu, Wan, Genshun, Chen, Hang, Gao, Jianqing, Dai, Lirong
Contextual-LAS (CLAS) has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words. It relies on phrase-level contextual modeling and attention-based relevance scoring without an explicit contextual constraint, which leads to insufficient use of contextual information. In this work, we propose deep CLAS to make better use of contextual information. We introduce a bias loss that forces the model to focus on contextual information. The query of the bias attention is also enriched to improve the accuracy of the bias attention score. To obtain fine-grained contextual information, we replace phrase-level encoding with character-level encoding and encode contextual information with a Conformer rather than an LSTM. Moreover, we directly use the bias attention score to correct the output probability distribution of the model. Experiments are conducted on the public AISHELL-1 and AISHELL-NER datasets. On AISHELL-1, compared to CLAS baselines, deep CLAS obtains a 65.78% relative recall and a 53.49% relative F1-score increase in the named entity recognition scenario.
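The correction step can be pictured with a minimal sketch: attention scores of the decoder state over the contextual entries are added directly to the logits of their character-level tokens. All names and shapes here (`bias_corrected_logits`, a single decoder state, one token per bias entry) are hypothetical simplifications for illustration, not the paper's implementation.

```python
import numpy as np

def bias_corrected_logits(decoder_state, bias_keys, logits, bias_token_ids, scale=1.0):
    # Scaled dot-product attention of the decoder state over contextual (bias) entries.
    scores = bias_keys @ decoder_state / np.sqrt(decoder_state.shape[0])
    scores = np.exp(scores - scores.max())
    scores = scores / scores.sum()
    # Boost the output probability mass of each bias entry's token by its attention score.
    corrected = logits.copy()
    for s, tok in zip(scores, bias_token_ids):
        corrected[tok] += scale * s
    return corrected
```

In practice the correction would be applied per decoding step and per bias phrase; this sketch only shows how an attention score can directly reshape the output distribution.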
Unveiling Language Skills via Path-Level Circuit Discovery
Chen, Hang, Zhu, Jiaying, Yang, Xinyu, Wang, Wenya
Circuit discovery with edge-level ablation has become a foundational framework for the mechanistic interpretability of language models. However, its focus on individual edges often overlooks the sequential, path-level causal relationships that underpin complex behaviors, potentially leading to misleading or incomplete circuit discoveries. To address this issue, we propose a novel path-level circuit discovery framework capturing how behaviors emerge through interconnected linear chains and build towards complex behaviors. Our framework is constructed upon a fully-disentangled linear combination of ``memory circuits'' decomposed from the original model. To discover functional circuit paths, we leverage a two-step pruning strategy: first reducing the computational graph to a faithful and minimal subgraph, then applying causal mediation to identify common paths of a specific skill, termed skill paths. In contrast to the circuit graphs of existing works, we focus on the complete paths of a generic skill rather than on fine-grained responses to individual components of the input. To demonstrate this, we explore three generic language skills, namely the Previous Token Skill, Induction Skill, and In-Context Learning Skill, using our framework, and provide more compelling evidence to substantiate the stratification and inclusiveness of these skills.
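A toy sketch of the two-step idea, with hypothetical names and a single scalar "ablation effect" per edge standing in for the paper's faithfulness and causal-mediation measures: prune low-effect edges to a minimal subgraph, then enumerate the complete source-to-sink paths that survive.

```python
def skill_paths(edges, source, sink, min_effect):
    # Step 1: keep only edges whose ablation effect exceeds a threshold,
    # yielding a faithful, minimal subgraph.
    kept = {(u, v) for (u, v), e in edges.items() if e >= min_effect}
    # Step 2: enumerate complete source-to-sink paths through surviving edges
    # (depth-first search); these whole chains are the candidate skill paths.
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == sink:
            paths.append(path)
            continue
        for (u, v) in kept:
            if u == node and v not in path:
                stack.append(path + [v])
    return paths
```

The point of the sketch is the unit of analysis: complete chains from input to output, not individual edges in isolation.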
Lightweight Transducer Based on Frame-Level Criterion
Wan, Genshun, Wang, Mengzhi, Mao, Tingzhi, Chen, Hang, Ye, Zhongfu
The transducer model trained with a sequence-level criterion requires a lot of memory due to the generation of a large probability matrix. We propose a lightweight transducer model based on a frame-level criterion, which uses the results of the CTC forced alignment algorithm to determine the label for each frame. The encoder output can then be combined with the decoder output at the corresponding time, rather than adding each element output by the encoder to each element output by the decoder as in the standard transducer. This significantly reduces memory and computation requirements. To address the class imbalance caused by excessive blanks in the labels, we decouple the blank and non-blank probabilities and truncate the gradient of the blank classifier to the main network. Experiments on AISHELL-1 demonstrate that this enables the lightweight transducer to achieve results similar to the transducer. Additionally, we use richer information to predict the probability of blank, achieving results superior to the transducer.
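The memory saving can be illustrated with a minimal numpy sketch (names and shapes are hypothetical, not the paper's code): the standard joint network forms a (T, U, D) tensor by pairing every encoder frame with every decoder step, while a frame-level variant forms only a (T, D) tensor by pairing each frame with its single aligned decoder step.

```python
import numpy as np

def joint_sequence_level(enc, dec):
    # Standard transducer joint: every encoder frame is combined with every
    # decoder step, producing a (T, U, D) tensor before the output projection.
    return enc[:, None, :] + dec[None, :, :]

def joint_frame_level(enc, dec, alignment):
    # Lightweight variant: a CTC forced alignment assigns one label (hence one
    # decoder step) to each frame, so only T combinations are formed: (T, D).
    return enc + dec[alignment]
```

For typical T in the hundreds and U in the tens, the (T, U, D) tensor is the dominant memory cost that the frame-level criterion removes.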
Individualized Federated Learning for Traffic Prediction with Error Driven Aggregation
Chen, Hang, Meese, Collin, Nejad, Mark, Shen, Chien-Chung
Low-latency traffic prediction is vital for smart city traffic management. Federated Learning has emerged as a promising technique for Traffic Prediction (FLTP), offering several advantages such as privacy preservation, reduced communication overhead, improved prediction accuracy, and enhanced adaptability to changing traffic conditions. However, the majority of current FLTP frameworks lack a real-time model updating scheme, which hinders their ability to continuously incorporate new incoming traffic data and adapt effectively to the changing dynamics of traffic trends. Another concern with existing FLTP frameworks is their reliance on the conventional FL model aggregation method, which assigns an identical model (i.e., the global model) to all traffic monitoring devices to predict their individual local traffic trends, thereby neglecting the non-IID characteristics of traffic data collected at different locations. Building upon these findings and harnessing insights from reinforcement learning, we propose NeighborFL, an individualized real-time federated learning scheme that introduces a haversine-distance-based, error-driven heuristic for grouping personalized local models from the perspective of each individual traffic node. This approach allows NeighborFL to create location-aware and tailored prediction models for each client while fostering collaborative learning. Simulations demonstrate the effectiveness of NeighborFL, offering improved real-time prediction accuracy over three baseline models, with one experimental setting showing a 16.9% reduction in MSE value compared to a naive FL setting.
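The distance side of the heuristic is the standard haversine formula for great-circle distance between sensor coordinates; a minimal sketch (the function name and the error-driven part of the heuristic are omitted; only the geometry is shown):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres between two traffic sensors,
    # used to rank candidate neighbors for personalized aggregation.
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

In a NeighborFL-style scheme, each node would aggregate only with nearby nodes ranked by this distance, then keep or drop candidates based on whether their inclusion reduces local prediction error.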
Balance of Number of Embedding and their Dimensions in Vector Quantization
Chen, Hang, Reddy, Sankepally Sainath, Chen, Ziwei, Liu, Dianbo
The dimensionality of the embeddings and the number of available embeddings (also called the codebook size) are critical factors influencing the performance of Vector Quantization (VQ), a discretization process used in many models such as the Vector Quantized Variational Autoencoder (VQ-VAE) architecture. This study examines the balance between codebook size and embedding dimension in VQ while keeping their product constant. Traditionally, these hyperparameters are static during training; however, our findings indicate that augmenting the codebook size while simultaneously reducing the embedding dimension can significantly boost the effectiveness of the VQ-VAE. As a result, the strategic selection of codebook size and embedding dimension, while preserving the capacity of the discrete codebook space, is critically important. To address this, we propose a novel adaptive dynamic quantization approach, underpinned by the Gumbel-Softmax mechanism, which allows the model to autonomously determine the optimal codebook configuration for each data instance. This dynamic discretizer gives the VQ-VAE remarkable flexibility. Thorough empirical evaluations across multiple benchmark datasets validate the notable performance enhancements achieved by our approach, highlighting the significant potential of adaptive dynamic quantization to improve model performance.
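A minimal sketch of the trade-off being studied (hypothetical names, not the paper's code): nearest-neighbour quantization with two codebooks of identical capacity N × D but different shapes.

```python
import numpy as np

def quantize(z, codebook):
    # Nearest-neighbour assignment used in VQ: each latent vector is
    # replaced by the closest codebook embedding (and its index).
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(1)
    return codebook[idx], idx

# Two configurations with identical capacity N * D = 512:
rng = np.random.default_rng(0)
cb_many_small = rng.normal(size=(64, 8))   # large codebook, low-dimensional embeddings
cb_few_large = rng.normal(size=(8, 64))    # small codebook, high-dimensional embeddings
```

The study's finding is that, at fixed capacity, configurations like `cb_many_small` tend to outperform `cb_few_large`; the proposed Gumbel-Softmax discretizer lets the model pick the configuration per data instance rather than fixing it in advance.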
Super-resolution imaging using super-oscillatory diffractive neural networks
Chen, Hang, Gao, Sheng, Zhao, Zejia, Duan, Zhengyang, Zhang, Haiou, Wetzstein, Gordon, Lin, Xing
The Abbe-Rayleigh diffraction limit of traditional optical equipment has always been an obstacle to the study of micro-/nano-scale objects [1, 2]. Near-field microscopic imaging techniques, such as contact photography [3] and scanning near-field optical microscopy (SNOM) [4, 5], capture evanescent fields by placing a probe or light-sensitive material extremely close to the object to achieve nanoscale resolution, which makes them unsuitable for imaging inside biological samples or encapsulated micro-/nano-structures. Far-field microscopic imaging technology is not restricted by these bottlenecks. Some typical far-field microscopic imaging techniques, such as single-molecule localization (SML) microscopy [6, 7] and stimulated emission depletion (STED) [8, 9], have demonstrated the possibility of nanoscale imaging without capturing evanescent fields. However, SML microscopy and STED typically require intense beams to excite, deplete, or bleach fluorophores in a sample, which produces photobleaching and phototoxicity in living samples. Optical super-oscillations are rapid sub-wavelength spatial variations of light intensity and phase that occur in complex electromagnetic fields formed by the precise interference of coherent waves; they provide an advanced method for far-field super-resolution imaging beyond the diffraction limit [10, 11]. To generate optical super-oscillations, complicated lens design methods [12-14] and Fresnel zone plate (FZP) optimization design methods, including optimization algorithms [15-18] and optimization-free algorithms [19, 20], have been proposed.
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design
Gao, Ming, Chen, Hang, Du, Jun, Xu, Xin, Guo, Hongxiao, Bu, Hui, Yang, Jianxing, Li, Ming, Lee, Chin-Hui
Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility variations and achieves exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B.
Quantifying Emergence in Large Language Models
Chen, Hang, Yang, Xinyu, Zhu, Jiaying, Wang, Wenya
Emergence, broadly conceptualized as the ``intelligent'' behaviors of LLMs, has recently been studied but has proven challenging to quantify due to the lack of a measurable definition. Most commonly, it has been estimated statistically through model performance across extensive datasets and tasks, which consumes significant resources. In addition, such estimation is difficult to interpret and may not accurately reflect the models' intrinsic emergence. In this work, we propose a quantifiable solution for estimating emergence. Inspired by emergentism in dynamics, we quantify the strength of emergence by comparing the entropy reduction at the macroscopic (semantic) level with that at the microscopic (token) level, both of which are derived from the representations within the transformer block. Using a low-cost estimator, our quantification method demonstrates consistent behaviors across a suite of LMs (GPT-2, GEMMA, etc.) under both in-context learning (ICL) and natural sentences. Empirical results show that (1) our method gives consistent measurements that align with existing observations based on performance metrics, validating the effectiveness of our emergence quantification; (2) our proposed metric uncovers novel emergence patterns, such as the correlation between the variance of our metric and the number of ``shots'' in ICL, which further suggests a new way of interpreting hallucinations in LLMs; and (3) we offer a potential solution towards estimating the emergence of larger and closed-resource LMs via smaller LMs like GPT-2. Our code is available at: https://github.com/Zodiark-ch/Emergence-of-LLMs/.
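The core comparison can be sketched in a few lines (hypothetical function names; the paper derives the distributions from transformer representations, which is omitted here): emergence is scored as the entropy reduction at the macroscopic level minus that at the microscopic level.

```python
import numpy as np

def entropy(p):
    # Shannon entropy of a (possibly unnormalized) distribution, in nats.
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def emergence_strength(micro_prior, micro_post, macro_prior, macro_post):
    # Strength of emergence: how much more the macroscopic (semantic) level's
    # uncertainty drops than the microscopic (token) level's does.
    macro_reduction = entropy(macro_prior) - entropy(macro_post)
    micro_reduction = entropy(micro_prior) - entropy(micro_post)
    return macro_reduction - micro_reduction
```

A positive score indicates that the semantic level became sharply more predictable while the token level did not, the signature this line of work associates with emergent behavior.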
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Dai, Yusheng, Chen, Hang, Du, Jun, Wang, Ruoyu, Chen, Shihao, Ma, Jiefeng, Wang, Haotian, Lee, Chin-Hui
Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed to be sensitive to missing video frames, performing even worse than single-modality models. While applying the dropout technique to the video modality enhances robustness to missing frames, it simultaneously results in a performance loss when dealing with complete data input. In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason. Moreover, we present the Modality Bias Hypothesis (MBH) to systematically describe the relationship between modality bias and robustness against missing modality in multimodal systems. Building on these findings, we propose a novel Multimodal Distribution Approximation with Knowledge Distillation (MDA-KD) framework to reduce over-reliance on the audio modality and to maintain performance and robustness simultaneously. Finally, to address an entirely missing modality, we adopt adapters to dynamically switch decision strategies. The effectiveness of our proposed approach is evaluated and validated through a series of comprehensive experiments using the MISP2021 and MISP2022 datasets. Our code is available at https://github.com/dalision/ModalBiasAVSR
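The dropout technique discussed above can be sketched as frame-level masking of the video stream during training (a hypothetical minimal version, not the paper's code): zeroed frames simulate the missing-frame conditions encountered at test time.

```python
import numpy as np

def drop_video_frames(video_feats, p_drop, rng):
    # Zero each video frame independently with probability p_drop, so the
    # model learns to cope with missing frames; returns the kept-frame mask.
    mask = rng.random(video_feats.shape[0]) >= p_drop
    return video_feats * mask[:, None], mask
```

The paper's observation is that training with such masking improves robustness to missing frames but biases the model toward audio; its MDA-KD framework is designed to counter exactly that side effect.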
Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets
Chen, Hang, Yang, Xinyu, Du, Keqing
Integrating deep learning and causal discovery has led us to recognize that learning causal structures and representations in dialogue and video is full of challenges. We define these data forms as "Indefinite Data", characterized by multi-structure data and multi-value representations. Unlike existing adaptable data forms, Indefinite Data still faces gaps in datasets and methods. To address the dataset gap, we release two high-quality datasets, Causalogue and Causaction, containing text dialogue samples and video action samples with causal annotations, respectively. The method gap, moreover, arises from the coexistence of multi-structure data and multi-value representations, which breaks the assumptions of all current methods and renders them infeasible on Indefinite Data. To this end, we propose a probabilistic framework as a baseline, incorporating three designed highlights for this gap: 1) establishing the Causation Condition of representations using the independence of noise terms under non-fixed causal structures, 2) treating causal strength as a latent variable and measuring the reconstruction loss in the correlation space, and 3) estimating the effects of latent confounders. These highlights enable the probabilistic model to overcome the challenges brought by the coexistence of multi-structure data and multi-value representations and pave the way for the extension to latent confounders. Comprehensive experiments evaluate baseline results on causal structures, causal representations, and confounding disentanglement.