Lv, Zhao
COLA: A Scalable Multi-Agent Framework For Windows UI Task Automation
Zhao, Di, Ma, Longhui, Wang, Siwei, Wang, Miao, Lv, Zhao
With the rapid advancement of Large Language Models (LLMs), an increasing number of studies have leveraged LLMs as the cognitive core of agents to address complex task decision-making challenges. In particular, recent research has demonstrated the potential of LLM-based agents in automating Windows GUI operations. However, existing methodologies exhibit two critical challenges: (1) static agent architectures fail to dynamically adapt to the heterogeneous requirements of OS-level tasks, leading to inadequate scenario generalization; (2) agent workflows lack a fault-tolerance mechanism, necessitating complete re-execution of the process after a UI agent decision error. To address these limitations, we introduce \textit{COLA}, a collaborative multi-agent framework for automating Windows UI operations. In this framework, a scenario-aware agent, the Task Scheduler, decomposes task requirements into atomic capability units, dynamically selects the optimal agent from a decision agent pool, and effectively responds to the capability requirements of diverse scenarios. The decision agent pool supports plug-and-play expansion for enhanced flexibility. In addition, we equip every agent with a memory unit to support its self-evolution. Furthermore, we develop an interactive backtracking mechanism that enables humans to intervene and trigger state rollbacks for non-destructive process repair. Our experimental results on the GAIA benchmark demonstrate that the \textit{COLA} framework achieves state-of-the-art performance with an average score of 31.89\%, significantly outperforming baseline approaches without web API integration. Ablation studies further validate the individual contribution of our dynamic scheduling mechanism. The code is available at https://github.com/Alokia/COLA-demo.
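The scheduling idea lends itself to a small illustration. Below is a minimal Python sketch of a plug-and-play decision agent pool with a scheduler that routes atomic capability units to agents; the class names, capability labels, and methods are illustrative assumptions, not COLA's actual API.

```python
# Minimal sketch of a plug-and-play decision agent pool and scheduler.
# All names here are hypothetical stand-ins, not the paper's interface.
from dataclasses import dataclass, field

@dataclass
class DecisionAgent:
    name: str
    capabilities: set  # e.g. {"browser", "file_system"}

    def decide(self, subtask: str) -> str:
        # A real agent would query an LLM here; we return a stub action.
        return f"{self.name} handles: {subtask}"

@dataclass
class TaskScheduler:
    pool: list = field(default_factory=list)

    def register(self, agent: DecisionAgent) -> None:
        """Plug-and-play expansion: add a new agent at runtime."""
        self.pool.append(agent)

    def dispatch(self, subtask: str, required: str) -> str:
        # Pick the first agent whose declared capabilities cover the
        # requirement; a real scheduler would score scenario fit.
        for agent in self.pool:
            if required in agent.capabilities:
                return agent.decide(subtask)
        raise LookupError(f"no agent provides capability: {required}")

scheduler = TaskScheduler()
scheduler.register(DecisionAgent("WebAgent", {"browser"}))
scheduler.register(DecisionAgent("FileAgent", {"file_system"}))
print(scheduler.dispatch("open settings.ini", "file_system"))
```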
Automated Detection of Epileptic Spikes and Seizures Incorporating a Novel Spatial Clustering Prior
Dong, Hanyang, Sheng, Shurong, Wang, Xiongfei, Gao, Jiahong, Sun, Yi, Yang, Wanli, Xiao, Kuntao, Teng, Pengfei, Luan, Guoming, Lv, Zhao
A Magnetoencephalography (MEG) time-series recording consists of multi-channel signals collected by superconducting sensors, with each signal's intensity reflecting magnetic field changes over time at the sensor location. Automating epileptic MEG spike detection significantly reduces manual assessment time and effort, yielding substantial clinical benefits. Existing research addresses MEG spike detection by encoding neural network inputs with signals from all channels within a time segment, followed by classification. However, these methods overlook simultaneous spikes occurring at nearby sensors. We introduce a simple yet effective paradigm that first clusters MEG channels based on the spatial positions of their sensors. Next, a novel convolutional input module is designed to integrate the spatial clustering and temporal changes of the signals. The module's output is fed into MEEG-ResNet3D, a custom network developed by the authors that learns to extract relevant features and classify each input as a spike clip or not. Our method achieves an F1 score of 94.73% on Sanbo-CMR, a large real-world MEG dataset collected from two centers, outperforming state-of-the-art approaches by 1.85%. Moreover, it demonstrates efficacy and stability in the Electroencephalographic (EEG) seizure detection task, yielding a 1.4% improvement in weighted F1 score over current state-of-the-art techniques evaluated on TUSZ, the largest EEG seizure dataset.
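As a rough illustration of the clustering step, the following Python sketch groups channels by sensor coordinates with k-means and stacks each cluster into its own slice of a 3D input; the cluster count, random sensor layout, and array shapes are assumptions, not the paper's configuration.

```python
# Hedged sketch: grouping MEG channels by sensor position before building
# the network input. Cluster count and shapes are illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_channels, n_times = 306, 200
positions = rng.normal(size=(n_channels, 3))      # stand-in sensor coordinates
segment = rng.normal(size=(n_channels, n_times))  # one MEG time segment

n_clusters = 8
labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(positions)

# Stack each spatial cluster's channels into its own slice so a 3D conv
# can see nearby sensors together (zero-pad clusters to equal size).
max_size = np.bincount(labels).max()
clustered = np.zeros((n_clusters, max_size, n_times))
for c in range(n_clusters):
    chans = segment[labels == c]
    clustered[c, :len(chans)] = chans
print(clustered.shape)  # (clusters, channels-per-cluster, time)
```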
Region-Based Optimization in Continual Learning for Audio Deepfake Detection
Chen, Yujie, Yi, Jiangyan, Fan, Cunhang, Tao, Jianhua, Ren, Yong, Zeng, Siding, Zhang, Chu Yuan, Yan, Xinrui, Gu, Hao, Xue, Jun, Wang, Chenglong, Lv, Zhao, Zhang, Xiaohui
Rapid advancements in speech synthesis and voice conversion bring convenience but also new security risks, creating an urgent need for effective audio deepfake detection. Although current models perform well, their effectiveness diminishes when confronted with the diverse and evolving nature of real-world deepfakes. To address this issue, we propose a continual learning method named Region-Based Optimization (RegO) for audio deepfake detection. Specifically, we use the Fisher information matrix to measure neuron regions important for real and fake audio detection, dividing the parameters into four regions. First, we directly fine-tune the less important regions to quickly adapt to new tasks. Next, we apply gradient optimization in parallel directions for regions important only to real audio detection, and in orthogonal directions for regions important only to fake audio detection. For regions important to both, we use sample-proportion-based adaptive gradient optimization. This region-adaptive optimization ensures an appropriate trade-off between memory stability and learning plasticity. Additionally, to address the accumulation of redundant neurons from old tasks, we further introduce the Ebbinghaus forgetting mechanism to release them, thereby promoting the model's capability to learn more generalized discriminative features. Experimental results show our method achieves a 21.3% improvement in EER over RWM, the state-of-the-art continual learning approach for audio deepfake detection. Moreover, the effectiveness of RegO extends beyond the audio deepfake detection domain, showing potential significance in other tasks, such as image recognition. The code is available at https://github.com/cyjie429/RegO.
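To make the four-region idea concrete, here is a hedged PyTorch sketch that estimates a diagonal Fisher information separately for real and fake labels on a toy detector and partitions parameters into four regions; the median thresholds and toy model are assumptions, and RegO's actual update rules are only indicated in comments.

```python
# Hedged sketch of the region partition: a diagonal Fisher estimate ranks
# parameter importance for real vs. fake audio, yielding four regions.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 2)  # toy detector: logits for {real, fake}

def diag_fisher(model, x, y):
    model.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

x = torch.randn(32, 16)
fisher_real = diag_fisher(model, x, torch.zeros(32, dtype=torch.long))
fisher_fake = diag_fisher(model, x, torch.ones(32, dtype=torch.long))

regions = {}
for n in fisher_real:
    r = fisher_real[n] > fisher_real[n].median()   # important for real
    f = fisher_fake[n] > fisher_fake[n].median()   # important for fake
    regions[n] = {
        "neither": ~r & ~f,   # fine-tune freely for new tasks
        "real_only": r & ~f,  # parallel-direction gradient updates
        "fake_only": ~r & f,  # orthogonal-direction gradient updates
        "both": r & f,        # sample-proportion adaptive updates
    }
print({k: v.sum().item() for k, v in regions["weight"].items()})
```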
DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection
Yan, Sheng, Fan, Cunhang, Zhang, Hongyu, Yang, Xiaoke, Tao, Jianhua, Lv, Zhao
At a cocktail party, humans exhibit an impressive ability to direct their attention. The auditory attention detection (AAD) approach seeks to identify the attended speaker by analyzing brain signals, such as EEG signals. However, current AAD algorithms overlook the spatial distribution information within EEG signals and lack the ability to capture long-range latent dependencies, limiting their ability to decode brain activity. To address these issues, this paper proposes a dual attention refinement network with spatiotemporal construction for AAD, named DARNet, which consists of a spatiotemporal construction module, a dual attention refinement module, and a feature fusion \& classifier module. Specifically, the spatiotemporal construction module constructs more expressive spatiotemporal feature representations by capturing the spatial distribution characteristics of EEG signals. The dual attention refinement module extracts temporal patterns at different levels in EEG signals and enhances the model's ability to capture long-range latent dependencies. The feature fusion \& classifier module aggregates temporal patterns and dependencies from the different levels and produces the final classification results. The experimental results indicate that, compared to the state-of-the-art models, DARNet achieves an average classification accuracy improvement of 5.9\% for 0.1s, 4.6\% for 1s, and 3.9\% for 2s decision windows on the DTU dataset. While maintaining excellent classification performance, DARNet reduces the parameter count by 91\% compared to the state-of-the-art models. Code is available at: https://github.com/fchest/DARNet.git.
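The following sketch illustrates the general flavor of a two-stage ("dual") attention refinement over an EEG feature sequence in PyTorch; the layer sizes, residual structure, and mean-over-time fusion are assumptions and do not reproduce DARNet's exact architecture.

```python
# Hedged sketch of a two-stage attention refinement with level fusion.
# Sizes and the fusion step are illustrative, not DARNet's actual design.
import torch
import torch.nn as nn

class DualAttentionRefine(nn.Module):
    def __init__(self, dim=64, heads=4, n_classes=2):
        super().__init__()
        self.attn1 = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn2 = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, x):                 # x: (batch, time, dim)
        h1, _ = self.attn1(x, x, x)       # first-level temporal patterns
        h1 = self.norm1(x + h1)
        h2, _ = self.attn2(h1, h1, h1)    # refined long-range dependencies
        h2 = self.norm2(h1 + h2)
        # Fuse both levels (mean over time) before classification.
        fused = torch.cat([h1.mean(1), h2.mean(1)], dim=-1)
        return self.classifier(fused)

logits = DualAttentionRefine()(torch.randn(8, 128, 64))
print(logits.shape)  # (8, 2): attended-speaker decision
```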
Mitigating Gender Bias in Code Large Language Models via Model Editing
Qin, Zhanyue, Wang, Haochuan, Wang, Zecheng, Liu, Deyuan, Fan, Cunhang, Lv, Zhao, Tu, Zhiying, Chu, Dianhui, Sui, Dianbo
In recent years, with the maturation of large language model (LLM) technology and the emergence of high-quality programming code datasets, researchers have become increasingly confident in addressing the challenges of automatic program synthesis. However, since most of the training samples for LLMs are unscreened, LLM behavior may not align with real-world scenarios, leading to the presence of social bias. To evaluate and quantify the gender bias in code LLMs, we propose a dataset named CodeGenBias (Gender Bias in the Code Generation) and an evaluation metric called FB-Score (Factual Bias Score) based on the actual gender distribution of correlative professions. With the help of CodeGenBias and FB-Score, we evaluate and analyze the gender bias in eight mainstream code LLMs. Previous work has demonstrated that model editing methods that perform well in knowledge editing have the potential to mitigate social bias in LLMs. Therefore, we develop a model editing approach named MG-Editing (Multi-Granularity model Editing), which comprises locating and editing phases. MG-Editing can be applied at five levels of model parameter granularity: full parameters, layer, module, row, and neuron. Extensive experiments not only demonstrate that MG-Editing effectively mitigates gender bias in code LLMs while maintaining their general code generation capabilities, but also showcase its excellent generalization. The experimental results further show that, considering both the model's gender bias and its general code generation capability, MG-Editing is most effective when applied at the row and neuron levels of granularity.
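As an illustration of the locate-then-edit idea at row granularity, the toy PyTorch sketch below scores each row of a single weight matrix by gradient magnitude on a stand-in probe loss and confines the update to the top-scoring rows; the loss, learning rate, and row count are hypothetical, not MG-Editing's actual procedure.

```python
# Hedged sketch of locate-then-edit at row granularity. The probe loss is
# a toy stand-in; a real method would use a bias-probing objective.
import torch

weight = torch.nn.Parameter(torch.randn(8, 4))  # one module's weight matrix
x = torch.randn(16, 4)

# Locating phase: rank rows by how strongly they influence the probe loss.
loss = (x @ weight.t()).pow(2).mean()
loss.backward()
row_scores = weight.grad.abs().sum(dim=1)
top_rows = row_scores.topk(k=2).indices          # rows selected for editing

# Editing phase: a mask confines the update to the located rows, leaving
# the rest of the parameters (and general behavior) untouched.
mask = torch.zeros_like(weight)
mask[top_rows] = 1.0
with torch.no_grad():
    weight -= 0.1 * weight.grad * mask
print("edited rows:", top_rows.tolist())
```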
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Liu, Deyuan, Qin, Zhanyue, Wang, Hairu, Yang, Zhao, Wang, Zecheng, Rong, Fangying, Liu, Qingbin, Hao, Yanchao, Chen, Xi, Fan, Cunhang, Lv, Zhao, Tu, Zhiying, Chu, Dianhui, Li, Bo, Sui, Dianbo
While large language models (LLMs) excel in many domains, their complexity and scale challenge deployment in resource-limited environments. Current compression techniques, such as parameter pruning, often fail to effectively utilize the knowledge from pruned parameters. To address these challenges, we propose Manifold-Based Knowledge Alignment and Layer Merging Compression (MKA), a novel approach that uses manifold learning and the Normalized Pairwise Information Bottleneck (NPIB) measure to merge similar layers, reducing model size while preserving essential performance. We evaluate MKA on multiple benchmark datasets and various LLMs. Our findings show that MKA not only preserves model performance but also achieves substantial compression ratios, outperforming traditional pruning methods. Moreover, when coupled with quantization, MKA delivers even greater compression. Specifically, on the MMLU dataset using the Llama3-8B model, MKA achieves a compression ratio of 43.75\% with a minimal performance decrease of only 2.82\%. The proposed MKA method offers a resource-efficient and performance-preserving model compression technique for LLMs.
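A rough sketch of similarity-guided layer merging follows; since NPIB is not reproduced here, a plain cosine similarity of layer activations stands in for it, and merging is a simple weight average rather than a manifold-aligned merge, so everything below is an illustrative assumption.

```python
# Hedged sketch: find the most similar adjacent layer pair on calibration
# data, merge their weights, and drop one layer. Cosine similarity stands
# in for NPIB; weight averaging stands in for manifold alignment.
import torch

n_layers, batch, dim = 6, 32, 16
layers = torch.nn.ModuleList(torch.nn.Linear(dim, dim) for _ in range(n_layers))

# Collect each layer's activations on calibration inputs.
h = torch.randn(batch, dim)
acts = []
for layer in layers:
    h = torch.tanh(layer(h))
    acts.append(h.flatten())

# Locate the most similar adjacent pair.
sims = [torch.cosine_similarity(acts[i], acts[i + 1], dim=0)
        for i in range(n_layers - 1)]
i = int(torch.stack(sims).argmax())

# Merge by averaging weights so knowledge from both layers is retained.
with torch.no_grad():
    layers[i].weight.copy_((layers[i].weight + layers[i + 1].weight) / 2)
    layers[i].bias.copy_((layers[i].bias + layers[i + 1].bias) / 2)
del layers[i + 1]
print(f"merged layers {i} and {i + 1}; model now has {len(layers)} layers")
```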
UNO Arena for Evaluating Sequential Decision-Making Capability of Large Language Models
Qin, Zhanyue, Wang, Haochuan, Liu, Deyuan, Song, Ziyang, Fan, Cunhang, Lv, Zhao, Wu, Jinlin, Lei, Zhen, Tu, Zhiying, Chu, Dianhui, Yu, Xiaoyan, Sui, Dianbo
Sequential decision-making refers to algorithms that take into account the dynamics of the environment, where early decisions affect subsequent decisions. With large language models (LLMs) demonstrating powerful capabilities across tasks, we cannot help but ask: can current LLMs effectively make sequential decisions? To answer this question, we propose the UNO Arena, based on the card game UNO, to evaluate the sequential decision-making capability of LLMs, and we explain in detail why we choose UNO. In the UNO Arena, we evaluate the sequential decision-making capability of LLMs dynamically with novel metrics based on Monte Carlo methods. We set up random players, DQN-based reinforcement learning players, and LLM players (e.g., GPT-4, Gemini-pro) for comparison testing. Furthermore, to improve the sequential decision-making capability of LLMs, we propose the TUTRI player, which has LLMs reflect on their own actions with a summary of the game history and the game strategy. Numerous experiments demonstrate that the TUTRI player achieves a notable breakthrough in sequential decision-making performance compared to the vanilla LLM player.
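A minimal sketch of the TUTRI-style reflection loop is shown below: after proposing an action, the player is shown a summary of the game history and the strategy notes and asked to reconsider. The `query_llm` function is a stub standing in for a GPT-4 or Gemini-pro call, and all prompts are hypothetical.

```python
# Hedged sketch of a reflect-then-act turn. `query_llm` is a placeholder,
# not a real API; prompts and card names are illustrative.
def query_llm(prompt: str) -> str:
    return "play Red 7"  # stub for a GPT-4 / Gemini-pro call

def tutri_turn(hand, history, strategy):
    action = query_llm(f"Hand: {hand}. Choose a card to play.")
    # Reflection step: fold the history summary and strategy back in.
    reflection_prompt = (
        f"You chose: {action}.\n"
        f"Game history summary: {history}\n"
        f"Strategy notes: {strategy}\n"
        "Reconsider: reply with your final action."
    )
    return query_llm(reflection_prompt)

final = tutri_turn(
    hand=["Red 7", "Blue 2", "Wild"],
    history="Opponent is low on cards; last card was Red 4.",
    strategy="Hold Wild cards until an opponent nears UNO.",
)
print(final)
```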
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion
Fan, Cunhang, Chen, Yujie, Xue, Jun, Kong, Yonghui, Tao, Jianhua, Lv, Zhao
In recent years, knowledge graph completion (KGC) models based on pre-trained language models (PLMs) have shown promising results. However, the large number of parameters and high computational cost of PLMs pose challenges for their application in downstream tasks. This paper proposes a progressive distillation method based on masked generation features for the KGC task, aiming to significantly reduce the complexity of pre-trained models. Specifically, we perform pre-distillation on the PLM to obtain a high-quality teacher model, and compress the PLM network to obtain multi-grade student models. However, traditional feature distillation is limited by the single representation of information in the teacher model. To solve this problem, we propose masked generation of teacher-student features, which carries richer representation information. Furthermore, there is a significant gap in representation ability between teacher and student. Therefore, we design a progressive distillation method that distills student models at each grade level, enabling efficient knowledge transfer from teachers to students. The experimental results demonstrate that the model in the pre-distillation stage surpasses existing state-of-the-art methods. Furthermore, in the progressive distillation stage, the model significantly reduces the number of parameters while maintaining a certain level of performance. Specifically, the parameters of the lower-grade student model are reduced by 56.7\% compared to the baseline.
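To illustrate a masked-generation feature distillation loss, the PyTorch sketch below masks random positions in stand-in student features, maps them into the teacher's feature space with a small generator, and penalizes the mismatch only at the masked positions; the dimensions, mask ratio, and generator are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of masked-generation feature distillation. Shapes, the
# 0.3 mask ratio, and the linear generator are illustrative assumptions.
import torch
import torch.nn.functional as F

batch, seq, t_dim, s_dim = 4, 10, 64, 32
teacher_feats = torch.randn(batch, seq, t_dim)  # frozen teacher output
student_feats = torch.randn(batch, seq, s_dim, requires_grad=True)

generator = torch.nn.Linear(s_dim, t_dim)  # maps student space to teacher space

# Mask a random subset of positions in the student features.
mask = torch.rand(batch, seq) < 0.3
masked = student_feats.masked_fill(mask.unsqueeze(-1), 0.0)

# Generate teacher-space features from the masked student features and
# penalize mismatch only at the masked positions.
generated = generator(masked)
loss = F.mse_loss(generated[mask], teacher_feats[mask])
loss.backward()
print(f"masked-generation distillation loss: {loss.item():.4f}")
```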
DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection
Fan, Cunhang, Zhang, Hongyu, Huang, Wei, Xue, Jun, Tao, Jianhua, Yi, Jiangyan, Lv, Zhao, Wu, Xiaopei
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data such as images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. To address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention. In addition, to further improve AAD detection performance, we integrate self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer; these strategies leverage the features and classification results of the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve accuracies of 90.0\% and 79.6\% on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that its detection performance is not only superior to the best reproducible baseline, but that it also reduces the number of trainable parameters by approximately 100 times.
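The sketch below illustrates the two ingredients in isolation: a graph convolution over EEG channels with a learnable (dynamical) adjacency, and a self-distillation term in which deep logits guide a shallow branch; the channel counts, softmax normalization, and toy logits are assumptions rather than DGSD's exact design.

```python
# Hedged sketch: learnable-adjacency graph convolution over EEG channels,
# plus a toy self-distillation loss. Sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGraphConv(nn.Module):
    def __init__(self, n_channels=64, in_dim=128, out_dim=32):
        super().__init__()
        # Learnable adjacency over EEG channels (the "dynamical" graph).
        self.adj = nn.Parameter(torch.randn(n_channels, n_channels))
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):                     # x: (batch, channels, in_dim)
        a = torch.softmax(self.adj, dim=-1)   # row-normalized edge weights
        return F.relu(self.proj(a @ x))       # aggregate neighbors, project

x = torch.randn(8, 64, 128)                   # batch of 1 s EEG windows
feats = DynamicGraphConv()(x)
print(feats.shape)                            # (8, 64, 32)

# Self-distillation idea: deepest-layer logits guide a shallow branch.
deep_logits = torch.randn(8, 2)
shallow_logits = torch.randn(8, 2, requires_grad=True)
sd_loss = F.kl_div(F.log_softmax(shallow_logits, -1),
                   F.softmax(deep_logits, -1), reduction="batchmean")
print(f"self-distillation loss: {sd_loss.item():.4f}")
```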
Multi-perspective Information Fusion Res2Net with RandomSpecmix for Fake Speech Detection
Dong, Shunbo, Xue, Jun, Fan, Cunhang, Zhu, Kang, Chen, Yujie, Lv, Zhao
In this paper, we propose the multi-perspective information fusion (MPIF) Res2Net with random Specmix for fake speech detection (FSD). The main purpose of this system is to improve the model's ability to learn precise forgery information for the FSD task in low-quality scenarios. Random Specmix, a data augmentation strategy, improves the generalization ability of the model and enhances its ability to locate discriminative information. Specmix cuts and pastes frequency-dimension information of the spectrogram among samples in the same batch without introducing other data, which helps the model locate the truly useful information. At the same time, we randomly select samples for augmentation to reduce the impact of data augmentation directly changing all the data. Beyond helping the model locate information, it is also important to reduce unnecessary information, which is the role of MPIF-Res2Net. Deceptive information from a single perspective is always similar, so a model learning this similar information will produce redundant spoofing clues that interfere with truly discriminative information. The proposed MPIF-Res2Net fuses information from different perspectives, making the information learned by the model more diverse, thereby reducing the redundancy caused by similar information and avoiding interference with the learning of discriminative information. The results on the ASVspoof 2021 LA dataset demonstrate the effectiveness of our proposed method, achieving EER and min-tDCF of 3.29% and 0.2557, respectively.
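Since the augmentation is described concretely above, here is a short PyTorch sketch of random Specmix: for a randomly selected subset of samples, a band of frequency bins is replaced with the same band from another sample in the batch, so no external data is introduced; the band width and selection probability are assumptions.

```python
# Hedged sketch of random Specmix: cut-and-paste frequency bands within a
# batch of spectrograms. p and max_band are illustrative choices.
import torch

def random_specmix(specs, p=0.5, max_band=10):
    # specs: (batch, freq_bins, time_frames)
    batch, n_freq, _ = specs.shape
    out = specs.clone()
    perm = torch.randperm(batch)  # partner sample for each element
    for i in range(batch):
        if torch.rand(1).item() < p:  # only some samples are augmented
            width = torch.randint(1, max_band + 1, (1,)).item()
            f0 = torch.randint(0, n_freq - width + 1, (1,)).item()
            out[i, f0:f0 + width] = specs[perm[i], f0:f0 + width]
    return out

mixed = random_specmix(torch.randn(16, 80, 100))
print(mixed.shape)  # (16, 80, 100)
```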