AITopics | Li, Xiang-Yang

Collaborating Authors

Li, Xiang-Yang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond Local Selection: Global Cut Selection for Enhanced Mixed-Integer Programming

Zeng, Shuli, Zhang, Sijia, Li, Shaoang, Wu, Feng, Li, Xiang-Yang

arXiv.org Artificial IntelligenceMar-20-2025

In mixed-integer programming (MIP) solvers, cutting planes are essential for Branch-and-Cut (B&C) algorithms as they reduce the search space and accelerate the solving process. Traditional methods rely on hard-coded heuristics for cut plane selection but fail to leverage problem-specific structural features. Recent machine learning approaches use neural networks for cut selection but focus narrowly on the efficiency of single-node within the B&C algorithm, without considering the broader contextual information. To address this, we propose Global Cut Selection (GCS), which uses a bipartite graph to represent the search tree and combines graph neural networks with reinforcement learning to develop cut selection strategies. Unlike prior methods, GCS applies cutting planes across all nodes, incorporating richer contextual information. Experiments show GCS significantly improves solving efficiency for synthetic and large-scale real-world MIPs compared to traditional and learning-based methods.

artificial intelligence, machine learning, node, (19 more...)

arXiv.org Artificial Intelligence

2503.15847

Country: Europe > France (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

FFCG: Effective and Fast Family Column Generation for Solving Large-Scale Linear Program

Hu, Yi-Xiang, Wu, Feng, Li, Shaoang, Zhao, Yifang, Li, Xiang-Yang

arXiv.org Artificial IntelligenceDec-26-2024

Column Generation (CG) is an effective and iterative algorithm to solve large-scale linear programs (LP). During each CG iteration, new columns are added to improve the solution of the LP. Typically, CG greedily selects one column with the most negative reduced cost, which can be improved by adding more columns at once. However, selecting all columns with negative reduced costs would lead to the addition of redundant columns that do not improve the objective value. Therefore, selecting the appropriate columns to add is still an open problem and previous machine-learning-based approaches for CG only add a constant quantity of columns per iteration due to the state-space explosion problem. To address this, we propose Fast Family Column Generation (FFCG) -- a novel reinforcement-learning-based CG that selects a variable number of columns as needed in an iteration. Specifically, we formulate the column selection problem in CG as an MDP and design a reward metric that balances both the convergence speed and the number of redundant columns. In our experiments, FFCG converges faster on the common benchmarks and reduces the number of CG iterations by 77.1% for Cutting Stock Problem (CSP) and 84.8% for Vehicle Routing Problem with Time Windows (VRPTW), and a 71.4% reduction in computing time for CSP and 84.0% for VRPTW on average compared to several state-of-the-art baselines.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2412.19066

Genre: Research Report > New Finding (0.88)

Industry: Transportation (0.55)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback

DeepCore: Simple Fingerprint Construction for Differentiating Homologous and Piracy Models

Sun, Haifeng, Zhang, Lan, Li, Xiang-Yang

arXiv.org Artificial IntelligenceNov-1-2024

As intellectual property rights, the copyright protection of deep models is becoming increasingly important. Existing work has made many attempts at model watermarking and fingerprinting, but they have ignored homologous models trained with similar structures or training datasets. We highlight challenges in efficiently querying black-box piracy models to protect model copyrights without misidentifying homologous models. To address these challenges, we propose a novel method called DeepCore, which discovers that the classification confidence of the model is positively correlated with the distance of the predicted sample from the model decision boundary and piracy models behave more similarly at high-confidence classified sample points. Then DeepCore constructs core points far away from the decision boundary by optimizing the predicted confidence of a few sample points and leverages behavioral discrepancies between piracy and homologous models to identify piracy models. Finally, we design different model identification methods, including two similarity-based methods and a clustering-based method to identify piracy models using models' predictions of core points. Extensive experiments show the effectiveness of DeepCore in identifying various piracy models, achieving lower missed and false identification rates, and outperforming state-of-the-art methods.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.0038

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.86)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

SGSM: A Foundation-model-like Semi-generalist Sensing Model

Yang, Tianjian, Zhou, Hao, Liu, Shuo, Guo, Kaiwen, Hou, Yiwen, Du, Haohua, Liu, Zhi, Li, Xiang-Yang

arXiv.org Artificial IntelligenceJun-15-2024

Intelligent sensing systems have shown remarkable performance on many environmental perception (e.g., liquid recognition [1], soil moisture estimation [2], temperature monitoring [3]) and human activity (e.g., fall detection [4], vital sign estimation [5], location tracking [6]) tasks, becoming the core component of smart physical-related services, such as smart city and smart manufacturing. However, the current cost of designing intelligent sensing systems is relatively high since the models were designed to solve specific tasks with expensive expert knowledge [7] or a substantial amount of domain-specific data [8], one at a time. Foundation models [9] - the latest generation of artificial intelligence (AI) models - are intuitively used to generalize the model for numerous downstream tasks, which are trained on large multimodal datasets. They can solve entirely new tasks which the models are never explicitly trained for. Although the foundation models paradigm perform well in computer vision or natural language processing area, applying them in the intelligent sensing area is still challenging for two reasons. First, it is difficult to generate or access massive and diverse sensing datasets. Massive high-quality data is crucial for foundation model applications, such as computer vision [10] and natural language processing [9]. However, this requirement is often unmet in the sensing field.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.16933

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Ju, Zeqian, Wang, Yuancheng, Shen, Kai, Tan, Xu, Xin, Detai, Yang, Dongchao, Liu, Yanqing, Leng, Yichong, Song, Kaitao, Tang, Siliang, Wu, Zhizheng, Qin, Tao, Li, Xiang-Yang, Ye, Wei, Zhang, Shikun, Bian, Jiang, He, Lei, Li, Jinyu, Zhao, Sheng

arXiv.org Artificial IntelligenceApr-23-2024

While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.

large language model, machine learning, naturalspeech 3, (19 more...)

arXiv.org Artificial Intelligence

2403.031

Country: Asia > China (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
(2 more...)

Add feedback

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

Leng, Yichong, Tan, Xu, Liu, Wenjie, Song, Kaitao, Wang, Rui, Li, Xiang-Yang, Qin, Tao, Lin, Edward, Liu, Tie-Yan

arXiv.org Artificial IntelligenceDec-20-2023

Error correction in automatic speech recognition (ASR) aims to correct those incorrect words in sentences generated by ASR models. Since recent ASR models usually have low word error rate (WER), to avoid affecting originally correct tokens, error correction models should only modify incorrect words, and therefore detecting incorrect words is important for error correction. Previous works on error correction either implicitly detect error words through target-source attention or CTC (connectionist temporal classification) loss, or explicitly locate specific deletion/substitution/insertion errors. However, implicit error detection does not provide clear signal about which tokens are incorrect and explicit error detection suffers from low detection accuracy. In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection. Specifically, we first detect whether a token is correct or not through a probability produced by a dedicatedly designed language model, and then design a constrained CTC loss that only duplicates the detected incorrect tokens to let the decoder focus on the correction of error tokens. Compared with implicit error detection with CTC loss, SoftCorrect provides explicit signal about which words are incorrect and thus does not need to duplicate every token but only incorrect tokens; compared with explicit error detection, SoftCorrect does not detect specific deletion/substitution/insertion errors but just leaves it to CTC loss. Experiments on AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reduction respectively, outperforming previous works by a large margin, while still enjoying fast speed of parallel generation.

data quality, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2212.01039

Country: Asia (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Secure Transformer Inference

Yuan, Mu, Zhang, Lan, Li, Xiang-Yang

arXiv.org Artificial IntelligenceNov-14-2023

Applications of Transformer models are exploding, e.g., ChatGPT [1]. Security is critical to Transformer-based services, which determines whether applications can be scaled to privacy-sensitive areas like cloud copilot for proprietary code and documents [2]. Existing work [3, 4] studied this problem under the classic secure multi-party computing framework. Using encryption and decryption methods requires approximation of complex nonlinear layers and introduces heavy computational overhead. In this work, we propose a three-party protocol using permutation to protect both model parameters and user data without any approximation of Transformer models.

large language model, layernorm, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2312.00025

Country: Asia > China (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

PromptTTS 2: Describing and Generating Voices with Text Prompt

Leng, Yichong, Guo, Zhifang, Shen, Kai, Tan, Xu, Ju, Zeqian, Liu, Yanqing, Liu, Yufei, Yang, Dongchao, Zhang, Leying, Song, Kaitao, He, Lei, Li, Xiang-Yang, Zhao, Sheng, Qin, Tao, Bian, Jiang

arXiv.org Artificial IntelligenceOct-11-2023

Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.02285

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Tight Memory-Regret Lower Bounds for Streaming Bandits

Li, Shaoang, Zhang, Lan, Wang, Junhao, Li, Xiang-Yang

arXiv.org Artificial IntelligenceJun-13-2023

In this paper, we investigate the streaming bandits problem, wherein the learner aims to minimize regret by dealing with online arriving arms and sublinear arm memory. We establish the tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right), \alpha = 2^{B} / (2^{B+1}-1)$ for any algorithm with a time horizon $T$, number of arms $K$, and number of passes $B$. The result reveals a separation between the stochastic bandits problem in the classical centralized setting and the streaming setting with bounded arm memory. Notably, in comparison to the well-known $\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is unavoidable for any streaming bandits algorithm with sublinear memory permitted. Furthermore, we establish the first instance-dependent lower bound of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$ for streaming bandits. These lower bounds are derived through a unique reduction from the regret-minimization setting to the sample complexity analysis for a sequence of $\epsilon$-optimal arms identification tasks, which maybe of independent interest. To complement the lower bound, we also provide a multi-pass algorithm that achieves a regret upper bound of $\tilde{O} \left( (TB)^{\alpha} K^{1 - \alpha}\right)$ using constant arm memory.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2306.07903

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

MLink: Linking Black-Box Models from Multiple Domains for Collaborative Inference

Yuan, Mu, Zhang, Lan, Zheng, Zimu, Zhang, Yi-Nan, Li, Xiang-Yang

arXiv.org Artificial IntelligenceJun-6-2023

The cost efficiency of model inference is critical to real-world machine learning (ML) applications, especially for delay-sensitive tasks and resource-limited devices. A typical dilemma is: in order to provide complex intelligent services (e.g. smart city), we need inference results of multiple ML models, but the cost budget (e.g. GPU memory) is not enough to run all of them. In this work, we study underlying relationships among black-box ML models and propose a novel learning task: model linking, which aims to bridge the knowledge of different black-box models by learning mappings (dubbed model links) between their output spaces. We propose the design of model links which supports linking heterogeneous black-box ML models. Also, in order to address the distribution discrepancy challenge, we present adaptation and aggregation methods of model links. Based on our proposed model links, we developed a scheduling algorithm, named MLink. Through collaborative multi-model inference enabled by model links, MLink can improve the accuracy of obtained inference results under the cost budget. We evaluated MLink on a multi-modal dataset with seven different ML models and two real-world video analytics systems with six ML models and 3,264 hours of video. Experimental results show that our proposed model links can be effectively built among various black-box models. Under the budget of GPU memory, MLink can save 66.7% inference computations while preserving 94% inference accuracy, which outperforms multi-task learning, deep reinforcement learning-based scheduler and frame filtering baselines.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2209.13883

Country:

Asia > China (0.29)
North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation > Air (1.00)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback