AITopics | Zhu, Jiang

Collaborating Authors

Zhu, Jiang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unbiased Evaluation of Large Language Models from a Causal Perspective

Chen, Meilin, Tian, Jian, Ma, Liang, Xie, Di, Chen, Weijie, Zhu, Jiang

arXiv.org Artificial Intelligence, Feb-10-2025

Benchmark contamination has become a significant concern in the LLM evaluation community. Previous Agents-as-an-Evaluator methods address this issue by involving agents in the generation of questions. Despite their success, the biases in Agents-as-an-Evaluator methods remain largely unexplored. In this paper, we present a theoretical formulation of evaluation bias, providing valuable insights into designing unbiased evaluation protocols. Furthermore, we identify two types of bias in Agents-as-an-Evaluator through carefully designed probing tasks on a minimal Agents-as-an-Evaluator setup. To address these issues, we propose the Unbiased Evaluator, an evaluation protocol that delivers a more comprehensive, unbiased, and interpretable assessment of LLMs. Extensive experiments reveal significant room for improvement in current LLMs. Additionally, we demonstrate that the Unbiased Evaluator not only offers strong evidence of benchmark contamination but also provides interpretable evaluation results.

intervention, large language model, machine learning

arXiv.org Artificial Intelligence

2502.06655

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)


DM-SBL: Channel Estimation under Structured Interference

Wang, Yifan, Yu, Chengjie, Zhu, Jiang, Wang, Fangyong, Tu, Xingbin, Wei, Yan, Qu, Fengzhong

arXiv.org Artificial Intelligence, Dec-7-2024

Channel estimation is a fundamental task in communication systems and is critical for effective demodulation. While most works deal with a simple scenario where the measurements are corrupted by additive white Gaussian noise (AWGN), this work addresses the more challenging scenario where both AWGN and structured interference coexist. Such conditions arise, for example, when a sonar/radar transmitter and a communication receiver operate simultaneously within the same bandwidth. To ensure accurate channel estimation in these scenarios, the sparsity of the channel in the delay domain and the complicated structure of the interference are jointly exploited. First, the score of the structured interference is learned via a neural network based on the diffusion model (DM), while the channel prior is modeled as a Gaussian distribution whose variance controls channel sparsity, similar to the setup of sparse Bayesian learning (SBL). Then, two efficient posterior sampling methods are proposed to jointly estimate the sparse channel and the interference. Nuisance parameters, such as the variance of the prior, are estimated via the expectation-maximization (EM) algorithm. The proposed method is termed DM-based SBL (DM-SBL). Numerical simulations demonstrate that DM-SBL significantly outperforms conventional approaches designed for the AWGN-only scenario, particularly under low signal-to-interference ratio (SIR) conditions. Beyond channel estimation, DM-SBL also shows promise for other linear inverse problems involving structured interference.
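The Gaussian-prior-plus-EM idea borrowed from SBL can be illustrated with a deliberately simplified sketch. It is plain SBL, not DM-SBL: it assumes AWGN only (no diffusion-model interference term), a known noise variance, and an identity dictionary (each delay tap is observed directly, y = h + n), so the posterior factorizes per tap. The function name `sbl_em_identity` is hypothetical. Each tap h_i has prior N(0, gamma_i); the E-step computes the Gaussian posterior of h_i, and the M-step sets gamma_i to the posterior second moment. Taps whose observation is weak relative to the noise see gamma_i shrink toward zero, which is how the prior variance "controls channel sparsity".

```python
def sbl_em_identity(y, noise_var, n_iter=200, gamma_init=1.0):
    """Simplified sparse Bayesian learning with EM for y = h + n.

    Assumptions (illustrative only): identity dictionary, real-valued taps,
    known AWGN variance `noise_var`, and no structured interference.
    Prior: h_i ~ N(0, gamma_i). Returns (posterior means, prior variances).
    """
    gamma = [gamma_init] * len(y)
    mu = [0.0] * len(y)
    for _ in range(n_iter):
        for i, yi in enumerate(y):
            g = gamma[i]
            if g == 0.0:
                # A pruned tap stays at zero.
                mu[i] = 0.0
                continue
            # E-step: posterior of h_i is Gaussian with this variance and mean.
            post_var = g * noise_var / (g + noise_var)
            mu[i] = post_var * yi / noise_var
            # M-step: gamma_i <- E[h_i^2 | y_i] = mu_i^2 + posterior variance.
            gamma[i] = mu[i] ** 2 + post_var
    return mu, gamma


mu, gamma = sbl_em_identity([3.0, 0.1, -2.0], noise_var=1.0)
```

For this identity-dictionary case the EM fixed point is gamma_i = max(y_i^2 - noise_var, 0), so the strong taps settle near 8 and 3 while the weak tap's variance decays toward zero. The general DM-SBL setting instead involves a measurement matrix and a learned interference score, which this sketch does not attempt to model.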

artificial intelligence, bayesian inference, machine learning

arXiv.org Artificial Intelligence

2412.05582

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)


LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment

Dou, Shihan, Zhou, Enyu, Liu, Yan, Gao, Songyang, Zhao, Jun, Shen, Wei, Zhou, Yuhao, Xi, Zhiheng, Wang, Xiao, Fan, Xiaoran, Pu, Shiliang, Zhu, Jiang, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence, Dec-18-2023

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. When the models must align with a broader range of downstream tasks, or when a notable improvement on a specific task is desired, a substantial increase in fine-tuning data is often the solution. However, we find that large-scale increases in instruction data can disrupt the world knowledge previously stored in the LLMs, i.e., world knowledge forgetting. In this paper, we introduce LoRAMoE to address this challenge. LoRAMoE is a plugin version of Mixture of Experts (MoE). The plugin form preserves the integrity of world knowledge by freezing the backbone model during training. We then propose localized balancing constraints that coordinate a portion of the experts to handle downstream tasks, while enabling the other experts to fully leverage the world knowledge stored in the model. Experimental results demonstrate that LoRAMoE can reasonably coordinate experts based on data type during inference, and that even dramatically increasing the instruction data does not result in knowledge forgetting. Moreover, LoRAMoE provides additional benefits for downstream-task performance, indicating the potential of our approach for multi-task learning.
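The "plugin MoE over a frozen backbone" idea can be sketched as a single layer: the frozen weight W0 produces the base output, and a router mixes low-rank (LoRA-style) expert updates B_i A_i x on top of it. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a dense softmax router over all experts and a scalar `scale`, uses plain Python lists instead of a tensor library, and omits the localized balancing constraints entirely. All names (`lora_moe_forward`, `W_router`, `scale`) are hypothetical.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    zmax = max(z)
    exps = [math.exp(v - zmax) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def lora_moe_forward(x, W0, experts, W_router, scale=1.0):
    """Frozen base projection plus a gated sum of low-rank expert updates.

    x:        input vector of length d_in
    W0:       frozen backbone weight, d_out rows of length d_in (never trained)
    experts:  list of (A, B) pairs; A is r x d_in, B is d_out x r (trainable)
    W_router: one row of length d_in per expert (trainable, hypothetical)
    """
    out = matvec(W0, x)                    # frozen backbone path
    gates = softmax(matvec(W_router, x))   # assumed dense softmax routing
    for g, (A, B) in zip(gates, experts):
        delta = matvec(B, matvec(A, x))    # low-rank update B @ (A @ x)
        for i, d in enumerate(delta):
            out[i] += scale * g * d
    return out, gates


W0 = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 3.0]
one_expert = [([[1.0, 1.0]], [[1.0], [0.0]])]
out, gates = lora_moe_forward(x, W0, one_expert, W_router=[[0.0, 0.0]])
```

Because only A, B, and the router would be trained, the backbone weights stay untouched; with the common LoRA initialization B = 0, the layer initially reproduces the frozen model's output exactly.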

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2312.09979

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


Image Classifier Based Generative Method for Planar Antenna Design

Zhong, Yang, Dou, Weiping, Cohen, Andrew, Bisharat, Dia'a, Tian, Yuandong, Zhu, Jiang, Liu, Qing Huo

arXiv.org Artificial Intelligence, Dec-16-2023

Designing antennas in the wireless consumer electronics industry is a technical challenge that requires not only considerable effort in simulation and measurement, but also experience in developing initial prototypes. The available antenna space and the surrounding environment vary from product to product. A well-designed antenna that meets the target of one product may not work in another, even though both might come from the same production line. Selecting the initial antenna type (e.g., a monopole, loop, or inverted-F antenna) is critical. In many cases, this choice depends on which antenna engineer is working on the project. For the same project and the same specifications, different antenna engineers may come up with surprisingly different types of antenna designs simply because of their personal experience and taste. In this era of rapid product iterations, there is high demand for creative antenna designs, and antenna expertise is hard to find. Therefore, in this paper, we present a workflow for proposing good prototypes that does not require antenna design experience. Antenna optimization has been widely studied and well presented in previous work, such as the trust region method Koziel and Unnsteinsson [2018], the particle swarm method Jin and Rahmat-Samii [2007], evolutionary strategies Liu et al. [2014], and many types of machine learning methods Sharma et al. [2020], Koziel et al. [2021], Nan et al. [2021]. This project is sponsored by the Meta Internship Program.

dimension, evolutionary algorithm, machine learning

arXiv.org Artificial Intelligence

2401.06149

Country: North America > United States (0.30)

Genre: Research Report (0.40)

Industry:

Semiconductors & Electronics (0.34)
Information Technology > Hardware (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Sample-efficient Surrogate Model for Frequency Response of Linear PDEs using Self-Attentive Complex Polynomials

Cohen, Andrew, Dou, Weiping, Zhu, Jiang, Koziel, Slawomir, Renner, Peter, Mattsson, Jan-Ove, Yang, Xiaomeng, Chen, Beidi, Stone, Kevin, Tian, Yuandong

arXiv.org Artificial Intelligence, Feb-2-2023

Linear partial differential equations (PDEs) govern the spatio-temporal dynamics of physical systems that are essential to building modern technology. When working with linear PDEs, designing a physical system for a specific outcome is difficult and costly due to the slow and expensive explicit simulation of PDEs and the highly nonlinear relationship between a system's configuration and its behavior. In this work, we prove a parametric form, named the CZP (Constant-Zeros-Poles) framework, that certain physical quantities in the Fourier domain must obey in linear PDEs. Applying CZP to antenna design, an industrial application of linear PDEs (i.e., Maxwell's equations), we derive a sample-efficient parametric surrogate model that directly predicts an antenna's scattering coefficients without explicit numerical PDE simulation. Combined with a novel image-based antenna representation and an attention-based neural network architecture, CZP outperforms baselines by 10% to 25% in terms of test loss and is also able to find 2D antenna designs verifiable by commercial software with 33% greater success than baselines, when coupled with sequential search techniques such as reinforcement learning.
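As the name suggests, a constant-zeros-poles parameterization describes a frequency response by a scalar constant, a set of zeros, and a set of poles. A surrogate built on such a form only has to predict those few parameters, which are then evaluated analytically instead of by a PDE solve. The sketch below assumes a generic rational form H(s) = C · ∏(s − z_k) / ∏(s − p_k) evaluated at s = j2πf. The paper's exact parameterization, including the choice of frequency variable and any constraints on the poles and zeros, may differ. The function names are hypothetical.

```python
import math


def czp_response(s, constant, zeros, poles):
    """Evaluate an assumed constant-zeros-poles rational form at complex s:
    H(s) = constant * prod_k (s - zeros[k]) / prod_k (s - poles[k]).
    """
    num = complex(constant)
    for z in zeros:
        num *= s - z
    den = complex(1.0)
    for p in poles:
        den *= s - p
    return num / den


def sweep_db(freqs_hz, constant, zeros, poles):
    """Magnitude in dB of H(j*2*pi*f) over a list of frequencies in Hz.

    The choice s = j*2*pi*f is an assumption for illustration. A frequency
    that lands exactly on a zero gives |H| = 0, and log10 would then fail.
    """
    out = []
    for f in freqs_hz:
        h = czp_response(1j * 2 * math.pi * f, constant, zeros, poles)
        out.append(20 * math.log10(abs(h)))
    return out


h = czp_response(2.0, 2.0, zeros=[1.0], poles=[3.0])  # 2 * (2 - 1) / (2 - 3)
```

In a surrogate of this kind, a network would output `constant`, `zeros`, and `poles` for a given design, and `sweep_db` would turn them into a predicted response curve that can be compared against simulated or measured data.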

artificial intelligence, machine learning, representation

arXiv.org Artificial Intelligence

2301.02747

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
