AITopics | Sun, Zeyu

Grammar-Based Code Representation: Is It a Worthy Pursuit for LLMs?

Liang, Qingyuan, Zhang, Zhao, Sun, Zeyu, Lin, Zheng, Luo, Qi, Xiao, Yueyi, Chen, Yizhou, Zhang, Yuqun, Zhang, Haotian, Zhang, Lu, Chen, Bin, Xiong, Yingfei

arXiv.org Artificial Intelligence, Mar-7-2025

Grammar serves as a cornerstone in programming languages and software engineering, providing frameworks to define the syntactic space and program structure. Existing research demonstrates the effectiveness of grammar-based code representations in small-scale models, showing their ability to reduce syntax errors and enhance performance. However, as language models scale to the billion level or beyond, syntax-level errors become rare, making it unclear whether grammar information still provides performance benefits. To explore this, we develop a series of billion-scale GrammarCoder models, incorporating grammar rules in the code generation process. Experiments on HumanEval (+) and MBPP (+) demonstrate a notable improvement in code generation accuracy. Further analysis shows that grammar-based representations enhance LLMs' ability to discern subtle code differences, reducing semantic errors caused by minor variations. These findings suggest that grammar-based code representations remain valuable even in billion-scale models, not only by maintaining syntax correctness but also by improving semantic differentiation.
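To make the idea of a grammar-based code representation concrete, here is a minimal sketch that turns a program into a sequence of grammar-rule-like expansions using Python's standard `ast` module. This is an illustration only, not the GrammarCoder representation: the function name `grammar_rule_sequence` and the "Parent -> children" string format are assumptions made for this example.

```python
import ast
from typing import List


def grammar_rule_sequence(source: str) -> List[str]:
    """Return a pre-order list of 'Parent -> Child1 Child2 ...' expansions.

    Hypothetical helper for illustration; each entry names an AST node type
    and the node types of its direct children.
    """
    tree = ast.parse(source)
    rules: List[str] = []

    def visit(node: ast.AST) -> None:
        children = list(ast.iter_child_nodes(node))
        if children:
            rhs = " ".join(type(child).__name__ for child in children)
            rules.append(f"{type(node).__name__} -> {rhs}")
        for child in children:
            visit(child)

    visit(tree)
    return rules


if __name__ == "__main__":
    for rule in grammar_rule_sequence("x = a + 1"):
        print(rule)
```

In this sketch, `x = a + 1` and `x = a - 1` differ by one character as text, and they also differ in exactly one expansion (`Add` versus `Sub`), which shows how a small textual change appears as an explicit structural token.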

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.05507

Country: Europe > Portugal (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Yuan 2.0-M32: Mixture of Experts with Attention Router

Wu, Shaohua, Luo, Jiangang, Chen, Xi, Li, Lingjun, Zhao, Xudong, Yu, Tong, Wang, Chao, Wang, Yue, Wang, Fei, Qiao, Weixu, He, Houbo, Zhang, Zeru, Sun, Zeyu, Mao, Junxiong, Shen, Chong

arXiv.org Artificial Intelligence, May-29-2024

Yuan 2.0-M32, with a base architecture similar to Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts, of which 2 are active. A new router network, Attention Router, is proposed and adopted for more efficient selection of experts, which improves accuracy compared to the model with a classical router network. Yuan 2.0-M32 is trained from scratch with 2000B tokens, and its training computation consumption is only 9.25% of a dense model at the same parameter scale. Yuan 2.0-M32 demonstrates competitive capability on coding, math, and various domains of expertise, with only 3.7B active parameters out of 40B in total and 7.4 GFlops of forward computation per token, both of which are only 1/19 of Llama3-70B. Yuan 2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.89 and 95.8, respectively. The models and source code of Yuan 2.0-M32 are released on GitHub.
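The "32 experts, 2 active" setup can be illustrated with a minimal sketch of classical top-k gating (softmax over router logits, then keeping the two largest). This is the classical router the abstract compares against, not the proposed Attention Router; the function names and the renormalisation of the two selected weights are assumptions made for this example.

```python
import math
from typing import List, Tuple

NUM_EXPERTS = 32  # number of experts stated in the abstract
TOP_K = 2         # number of active experts stated in the abstract


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Select the k highest-probability experts for one token.

    Returns (expert_index, gate_weight) pairs; the weights are renormalised
    to sum to 1 (an assumption, not confirmed by the abstract).
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


# Fraction of parameters active per token, from the figures in the abstract.
ACTIVE_FRACTION = 3.7 / 40


if __name__ == "__main__":
    logits = [0.0] * NUM_EXPERTS
    logits[5], logits[17] = 2.0, 1.0
    print(route_top_k(logits))
    print(f"active parameter fraction: {ACTIVE_FRACTION:.4f}")
```

Only the selected experts run for a given token, which is why the active parameter count (3.7B) is far below the total (40B).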

large language model, machine learning, yuan 2, (14 more...)

arXiv.org Artificial Intelligence

2405.17976

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

A Large-scale Empirical Study on Improving the Fairness of Deep Learning Models

Yang, Junjie, Jiang, Jiajun, Sun, Zeyu, Chen, Junjie

arXiv.org Artificial Intelligence, Jan-8-2024

Fairness has been a critical issue that affects the adoption of deep learning models in real practice. To improve model fairness, many existing methods have been proposed and evaluated to be effective in their own contexts. However, there is still no systematic evaluation that compares them comprehensively under the same context, which makes it hard to understand the performance differences among them and hinders both research progress and practical adoption. To fill this gap, this paper endeavours to conduct the first large-scale empirical study to comprehensively compare the performance of existing state-of-the-art fairness-improving techniques. Specifically, we target the widely used application scenario of image classification, and utilize three different datasets and five commonly used performance metrics to assess a total of 13 methods from diverse categories. Our findings reveal substantial variations in the performance of each method across different datasets and sensitive attributes, indicating over-fitting on specific datasets by many existing methods. Furthermore, different fairness evaluation metrics, due to their distinct focuses, yield significantly different assessment results. Overall, we observe that pre-processing methods and in-processing methods outperform post-processing methods, with pre-processing methods exhibiting the best performance. Our empirical study offers comprehensive recommendations for enhancing fairness in deep learning models. We approach the problem from multiple dimensions, aiming to provide a uniform evaluation platform and inspire researchers to explore more effective fairness solutions via a set of implications.

artificial intelligence, fairness, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2401.03695

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.93)

Industry:

Law (0.67)
Information Technology (0.67)
Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Channel-Feedback-Free Transmission for Downlink FD-RAN: A Radio Map based Complex-valued Precoding Network Approach

Zhao, Jiwei, Chen, Jiacheng, Sun, Zeyu, Shi, Yuhang, Zhou, Haibo, Shen, Xuemin

arXiv.org Artificial Intelligence, Nov-29-2023

As the demand for high-quality services proliferates, an innovative network architecture, the fully-decoupled RAN (FD-RAN), has emerged for more flexible spectrum resource utilization and lower network costs. However, with the decoupling of uplink base stations and downlink base stations in FD-RAN, the traditional transmission mechanism, which relies on real-time channel feedback, is not suitable because the receiver cannot feed back accurate and timely channel state information to the transmitter. This paper proposes a novel transmission scheme that does not rely on physical layer channel feedback. Specifically, we design a radio map based complex-valued precoding network (RMCPNet) model, which outputs the base station precoding based on user location. RMCPNet comprises multiple subnets, with each subnet responsible for extracting unique modal features from diverse input modalities. Furthermore, the multi-modal embeddings derived from these distinct subnets are integrated within the information fusion layer, culminating in a unified representation. We also develop a specific RMCPNet training algorithm that employs the negative spectral efficiency as the loss function. We evaluate the performance of the proposed scheme on the public DeepMIMO dataset and show that RMCPNet can achieve 16% and 76% performance improvements over the conventional real-valued neural network and statistical codebook approach, respectively.
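The abstract names negative spectral efficiency as the training loss without giving a formula. A minimal sketch, assuming the standard multi-user downlink sum rate (log2 of 1 plus signal-to-interference-plus-noise ratio, summed over users) with complex-valued channel and precoding vectors; the function names and the `noise_power` parameter are hypothetical, and the paper's exact loss may differ.

```python
import math
from typing import Sequence


def hermitian_inner(h: Sequence[complex], w: Sequence[complex]) -> complex:
    """Compute h^H w for two complex vectors of equal length."""
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))


def negative_spectral_efficiency(
    channels: Sequence[Sequence[complex]],
    precoders: Sequence[Sequence[complex]],
    noise_power: float = 1.0,
) -> float:
    """Return minus the sum rate in bit/s/Hz (lower is better, as for a loss).

    channels[k] is the channel vector of user k and precoders[k] is the
    precoding vector intended for user k (assumed setup for illustration).
    """
    total = 0.0
    for k, h in enumerate(channels):
        signal = abs(hermitian_inner(h, precoders[k])) ** 2
        interference = sum(
            abs(hermitian_inner(h, w)) ** 2
            for j, w in enumerate(precoders)
            if j != k
        )
        total += math.log2(1.0 + signal / (interference + noise_power))
    return -total


if __name__ == "__main__":
    channels = [[1 + 0j, 0j], [0j, 1j]]
    precoders = [[1 + 0j, 0j], [0j, 1j]]
    print(negative_spectral_efficiency(channels, precoders))
```

Minimising this quantity is equivalent to maximising the sum rate, so a precoder that aligns with each user's channel and avoids interfering with the others scores lower.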

artificial intelligence, information, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2312.02184

Country:

Asia > China (0.29)
North America > United States (0.28)
North America > Canada > Ontario (0.14)

Genre: Research Report (0.64)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Minimum-Risk Recalibration of Classifiers

Sun, Zeyu, Song, Dogyoon, Hero, Alfred

arXiv.org Artificial Intelligence, May-18-2023

Recalibrating probabilistic classifiers is vital for enhancing the reliability and accuracy of predictive models. Despite the development of numerous recalibration algorithms, there is still a lack of a comprehensive theory that integrates calibration and sharpness (which is essential for maintaining predictive power). In this paper, we introduce the concept of minimum-risk recalibration within the framework of mean-squared-error (MSE) decomposition, offering a principled approach for evaluating and recalibrating probabilistic classifiers. Using this framework, we analyze the uniform-mass binning (UMB) recalibration method and establish a finite-sample risk upper bound of order $\tilde{O}(B/n + 1/B^2)$ where $B$ is the number of bins and $n$ is the sample size. By balancing calibration and sharpness, we further determine that the optimal number of bins for UMB scales with $n^{1/3}$, resulting in a risk bound of approximately $O(n^{-2/3})$. Additionally, we tackle the challenge of label shift by proposing a two-stage approach that adjusts the recalibration function using limited labeled data from the target domain. Our results show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch. We validate our theoretical findings through numerical simulations, which confirm the tightness of the proposed bounds, the optimal number of bins, and the effectiveness of label shift adaptation.
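Uniform-mass binning can be sketched in a few lines: sort the calibration scores, split them into $B$ bins holding (nearly) equal numbers of samples, and map any new score to the mean label of its bin. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the constant factor in $B \approx n^{1/3}$ is taken to be 1 because the abstract only gives the scaling.

```python
import bisect
from typing import Callable, List, Sequence


def choose_num_bins(n: int) -> int:
    """Pick B on the order of n^(1/3); the constant 1 is an assumption."""
    return max(1, round(n ** (1.0 / 3.0)))


def fit_umb(
    scores: Sequence[float], labels: Sequence[int], num_bins: int
) -> Callable[[float], float]:
    """Fit uniform-mass binning and return a recalibration function."""
    n = len(scores)
    if not 1 <= num_bins <= n:
        raise ValueError("num_bins must be between 1 and the sample size")
    order = sorted(range(n), key=lambda i: scores[i])
    upper_edges: List[float] = []  # inclusive upper edge of every bin but the last
    bin_means: List[float] = []
    for b in range(num_bins):
        lo = b * n // num_bins
        hi = (b + 1) * n // num_bins
        members = order[lo:hi]
        bin_means.append(sum(labels[i] for i in members) / len(members))
        if b < num_bins - 1:
            upper_edges.append(scores[order[hi - 1]])

    def recalibrate(score: float) -> float:
        # bisect_left sends a score equal to an edge into the lower bin.
        return bin_means[bisect.bisect_left(upper_edges, score)]

    return recalibrate


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    recalibrate = fit_umb(scores, labels, num_bins=2)
    print(recalibrate(0.25), recalibrate(0.85))
```

The trade-off in the bound is visible here: more bins give a finer (sharper) output but fewer samples per bin, and balancing $B/n$ against $1/B^2$ yields the $n^{1/3}$ scaling quoted in the abstract.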

data mining, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2305.10886

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Performative Federated Learning: A Solution to Model-Dependent and Heterogeneous Distribution Shifts

Jin, Kun, Yin, Tongxin, Chen, Zhongzhu, Sun, Zeyu, Zhang, Xueru, Liu, Yang, Liu, Mingyan

arXiv.org Artificial Intelligence, May-8-2023

Traditional learning problems typically assume data distributions to be static. For applications such as face recognition, this is largely true, and designing algorithms under such an assumption generally does not impact learning efficacy. This, however, is not true in many other domains. In some cases, there may be a natural evolution and shift in the distribution, e.g., in weather and climate data, in which case new data need to be acquired periodically and the algorithm re-trained to remain up to date. In other cases, the distribution shift is the result of the learning outcome itself, when individuals respond to the algorithmic decisions they are subjected to. For instance, when users with certain accents perceive larger-than-acceptable errors from speech recognition software and therefore stop using it, this can directly impact the type of speech samples collected by the software and used for training the next generation of the product. Another example is "gaming the algorithm", where users attempt, through honest or dishonest means, to improve critical features so as to obtain a favorable decision from the algorithm (e.g., in loan approvals or job applications). This again can directly lead to a distributional change in the features and labels that the algorithm relies on for decision making.

artificial intelligence, machine learning, model-dependent and heterogeneous distribution shift, (12 more...)

arXiv.org Artificial Intelligence

2305.0509

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.54)

Incorporating Polar Field Data for Improved Solar Flare Prediction

Aktukmak, Mehmet, Sun, Zeyu, Bobra, Monica, Gombosi, Tamas, Manchester, Ward B., Chen, Yang, Hero, Alfred

arXiv.org Machine Learning, Dec-3-2022

In this paper, we consider incorporating data associated with the sun's north and south polar field strengths to improve solar flare prediction performance using machine learning models. When used to supplement local data from active regions on the photospheric magnetic field of the sun, the polar field data provides global information to the predictor. While such global features have been previously proposed for predicting the next solar cycle's intensity, in this paper we propose using them to help classify individual solar flares. We conduct experiments using HMI data employing four different machine learning algorithms that can exploit polar field information. Additionally, we propose a novel probabilistic mixture of experts model that can simply and effectively incorporate polar field data and provide on-par prediction performance with state-of-the-art solar flare prediction algorithms such as the Recurrent Neural Network (RNN). Our experimental results indicate the usefulness of the polar field data for solar flare prediction, which can improve Heidke Skill Score (HSS2) by as much as 10.1%.
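The abstract reports gains in HSS2 without defining it. A minimal sketch, assuming the Heidke Skill Score form commonly used in the flare-forecasting literature, computed from a 2x2 contingency table of true/false positives and negatives; the function name is hypothetical and the paper's exact definition may differ.

```python
def heidke_skill_score_2(tp: int, fn: int, fp: int, tn: int) -> float:
    """Heidke Skill Score (HSS2 form) from a binary contingency table.

    tp: flares correctly predicted, fn: flares missed,
    fp: false alarms, tn: quiet periods correctly predicted.
    A score of 1 is a perfect forecast; 0 is no better than chance.
    """
    numerator = 2.0 * (tp * tn - fn * fp)
    denominator = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    if denominator == 0:
        raise ValueError("score is undefined for this contingency table")
    return numerator / denominator


if __name__ == "__main__":
    print(heidke_skill_score_2(tp=40, fn=10, fp=20, tn=130))
```

Unlike plain accuracy, this score stays near zero for a classifier that always predicts the majority class, which matters because flaring regions are much rarer than quiet ones.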

artificial intelligence, machine learning, polar field data, (13 more...)

arXiv.org Machine Learning

2212.0173

Country: North America > United States > Michigan (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lyra: A Benchmark for Turducken-Style Code Generation

Liang, Qingyuan, Sun, Zeyu, Zhu, Qihao, Zhang, Wenjie, Yu, Lian, Xiong, Yingfei, Zhang, Lu

arXiv.org Artificial Intelligence, Aug-27-2021

Code generation is crucial to reduce manual software development efforts. Recently, neural techniques have been used to generate source code automatically. While promising, these approaches are evaluated on tasks for generating code in single programming languages. However, in actual development, one programming language is often embedded in another. For example, SQL statements are often embedded as strings in base programming languages such as Python and Java, and JavaScript programs are often embedded in server-side programming languages, such as PHP, Java, and Python. We call this turducken-style programming. In this paper, we define a new code generation task: given a natural language comment, this task aims to generate a program in a base language with an embedded language. To our knowledge, this is the first turducken-style code generation task. For this task, we present Lyra: a dataset in Python with embedded SQL. This dataset contains 2,000 carefully annotated database manipulation programs from real usage projects. Each program is paired with both a Chinese comment and an English comment. In our experiment, we adopted Transformer, a state-of-the-art technique, as the baseline. In the best setting, Transformer achieves 0.5% and 1.5% AST exact matching accuracy using Chinese and English comments, respectively. Therefore, we believe that Lyra provides a new challenge for code generation.
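For readers unfamiliar with the term, here is a small example of turducken-style code: a Python function (the base language) whose body carries an SQL statement (the embedded language) as a string. It is written for this page rather than taken from the Lyra dataset; the use of the standard `sqlite3` module, the `users` table, and the function names are all assumptions made for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ann", 31), ("bob", 19), ("cy", 45)],
    )
    return conn


def count_users_older_than(conn: sqlite3.Connection, min_age: int) -> int:
    """Count the users whose age is greater than min_age."""
    # The SQL below is embedded in Python as a string; the `?` placeholder
    # is filled in by sqlite3 from the Python-side argument.
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE age > ?", (min_age,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(count_users_older_than(make_demo_db(), 30))
```

A generator for this task has to produce two grammars at once: valid Python around the string and valid SQL inside it, with variables such as `min_age` wired correctly across the boundary.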

clojure, cobol, dataset, (22 more...)

arXiv.org Artificial Intelligence

2108.12144

Country:

Europe (0.69)
North America > United States (0.69)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

A Syntax-Guided Edit Decoder for Neural Program Repair

Zhu, Qihao, Sun, Zeyu, Xiao, Yuan-an, Zhang, Wenjie, Yuan, Kang, Xiong, Yingfei, Zhang, Lu

arXiv.org Artificial Intelligence, Jun-15-2021

Automated Program Repair (APR) helps improve the efficiency of software development and maintenance. Recent APR techniques use deep learning, particularly the encoder-decoder architecture, to generate patches. Though existing DL-based APR approaches have proposed different encoder architectures, the decoder remains the standard one, which generates a sequence of tokens one by one to replace the faulty statement. This decoder has multiple limitations: 1) it can generate syntactically incorrect programs, 2) it represents small edits inefficiently, and 3) it cannot generate project-specific identifiers. In this paper, we propose Recoder, a syntax-guided edit decoder with placeholder generation. Recoder is novel in multiple aspects: 1) Recoder generates edits rather than modified code, allowing efficient representation of small edits; 2) Recoder is syntax-guided, with the novel provider/decider architecture to ensure the syntactic correctness of the patched program and accurate generation; 3) Recoder generates placeholders that could be instantiated as project-specific identifiers later. We conduct experiments to evaluate Recoder on 395 bugs from Defects4J v1.2, 420 additional bugs from Defects4J v2.0, 297 bugs from IntroClassJava and 40 bugs from QuixBugs. Our results show that Recoder repairs 53 bugs on Defects4J v1.2, which achieves a 26.2% (11 bugs) improvement over the previous state-of-the-art approach for single-hunk bugs (TBar). Importantly, to our knowledge, Recoder is the first DL-based APR approach that has outperformed the traditional APR approaches on this benchmark.

deep learning, proceedings, software engineering, (17 more...)

arXiv.org Artificial Intelligence

2106.08253

Country:

Asia (0.68)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dynamic Labeling for Unlabeled Graph Neural Networks

Sun, Zeyu, Zhang, Wenjie, Mou, Lili, Zhu, Qihao, Xiong, Yingfei, Zhang, Lu

arXiv.org Artificial Intelligence, Feb-22-2021

Existing graph neural networks (GNNs) largely rely on node embeddings, which represent a node as a vector by its identity, type, or content. However, graphs with unlabeled nodes widely exist in real-world applications (e.g., anonymized social networks). Previous GNNs either assign random labels to nodes (which introduces artefacts to the GNN) or assign one embedding to all nodes (which fails to distinguish one node from another). In this paper, we analyze the limitation of existing approaches in two types of classification tasks, graph classification and node classification. Inspired by our analysis, we propose two techniques, Dynamic Labeling and Preferential Dynamic Labeling, that satisfy desired properties statistically or asymptotically for each type of the task. Experimental results show that we achieve high performance in various graph-related tasks.

artificial intelligence, graph, neural network, (18 more...)

arXiv.org Artificial Intelligence

2102.11485

Country: North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
