Collaborating Authors

Wang, Ziteng


ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing

arXiv.org Artificial Intelligence

Sparsely activated Mixture-of-Experts (MoE) models are widely adopted to scale up model capacity without increasing the computation budget. However, vanilla TopK routers are trained in a discontinuous, non-differentiable way, limiting their performance and scalability. To address this issue, we propose ReMoE, a fully differentiable MoE architecture that offers a simple yet effective drop-in replacement for conventional TopK+Softmax routing, using ReLU as the router instead. We further propose methods to regulate the router's sparsity while balancing the load among experts. ReMoE's continuous nature enables efficient dynamic allocation of computation across tokens and layers, while also exhibiting domain specialization. Our experiments demonstrate that ReMoE consistently outperforms vanilla TopK-routed MoE across various model sizes, expert counts, and levels of granularity. Furthermore, ReMoE exhibits superior scalability with respect to the number of experts, surpassing traditional MoE architectures. The implementation based on Megatron-LM is available at https://github.com/thu-ml/ReMoE.

Transformer models (Vaswani et al., 2017) consistently improve performance as the number of parameters increases (Kaplan et al., 2020). However, scaling these models is constrained by computation resources. Sparsely activated Mixture-of-Experts (MoE) (Shazeer et al., 2017) mitigates this challenge by employing a sparse architecture that selectively activates a subset of parameters during both training and inference.
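
The contrast between the two routers is easy to see in code. The sketch below is illustrative only (the reference implementation lives in the Megatron-LM fork linked above): it assumes per-token router logits of shape (num_tokens, num_experts) and omits ReMoE's sparsity regularization and load balancing.

```python
# Illustrative sketch, not the reference ReMoE code: a vanilla TopK+Softmax
# router next to a ReLU router of the kind the abstract describes.
import torch
import torch.nn.functional as F

def topk_softmax_router(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Vanilla MoE routing: keep the top-k logits per token, softmax over them.
    The hard top-k selection is discontinuous in the logits."""
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    gates = torch.zeros_like(logits)
    gates.scatter_(-1, topk_idx, F.softmax(topk_vals, dim=-1))
    return gates  # (num_tokens, num_experts), exactly k nonzeros per token

def relu_router(logits: torch.Tensor) -> torch.Tensor:
    """ReLU routing: nonnegative gates that are continuous in the logits.
    Sparsity emerges when logits go negative and, per the paper, is steered
    by a regularizer during training (not shown here)."""
    return F.relu(logits)

logits = torch.randn(4, 8)  # 4 tokens, 8 experts
print(topk_softmax_router(logits, k=2))
print(relu_router(logits))
```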


FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality

arXiv.org Artificial Intelligence

Eye tracking is a key technology for gaze-based interaction in Extended Reality (XR), but traditional frame-based systems struggle to meet XR's demands for high accuracy, low latency, and power efficiency. Event cameras offer a promising alternative due to their high temporal resolution and low power consumption. In this paper, we present FACET (Fast and Accurate Event-based Eye Tracking), an end-to-end neural network that directly outputs pupil ellipse parameters from event data, optimized for real-time XR applications. The ellipse output can be fed directly into subsequent ellipse-based pupil trackers. We enhance the EV-Eye dataset by expanding the annotated data and converting the original mask labels to ellipse-based annotations for training. In addition, we adopt a novel trigonometric loss to address angle discontinuities and put forward a fast causal event-volume representation. On the enhanced EV-Eye test set, FACET achieves an average pupil center error of 0.20 pixels and an inference time of 0.53 ms, reducing pixel error and inference time by 1.6$\times$ and 1.8$\times$, respectively, compared to the prior art, EV-Eye, with 4.4$\times$ fewer parameters and 11.7$\times$ fewer arithmetic operations. The code is available at https://github.com/DeanJY/FACET.
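
FACET's exact loss is defined in the paper; the sketch below shows only the standard trigonometric trick such a loss builds on: because an ellipse axis at angle $\theta$ is the same axis at $\theta + \pi$, comparing the doubled angle through sine and cosine removes the wrap-around discontinuity. The function name and squared-error form are illustrative assumptions.

```python
# Hedged sketch of a trigonometric angle loss; FACET's formulation may differ.
# theta and theta + pi describe the same ellipse axis, so they incur zero loss.
import torch

def trig_angle_loss(theta_pred: torch.Tensor, theta_true: torch.Tensor) -> torch.Tensor:
    # Ellipse orientation is pi-periodic, so compare the doubled angle.
    return ((torch.sin(2 * theta_pred) - torch.sin(2 * theta_true)) ** 2 +
            (torch.cos(2 * theta_pred) - torch.cos(2 * theta_true)) ** 2).mean()

theta_true = torch.tensor([0.05])
print(trig_angle_loss(torch.tensor([0.05 + torch.pi]), theta_true))  # ~0: same axis
print(trig_angle_loss(torch.tensor([0.80]), theta_true))             # > 0
```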


Efficient Backpropagation with Variance-Controlled Adaptive Sampling

arXiv.org Artificial Intelligence

Sampling-based algorithms, which eliminate "unimportant" computations during forward and/or back propagation (BP), offer potential solutions to accelerate neural network training. However, since sampling introduces approximations into training, such algorithms may not consistently maintain accuracy across various tasks. In this work, we introduce a variance-controlled adaptive sampling (VCAS) method designed to accelerate BP. VCAS computes an unbiased stochastic gradient using fine-grained layerwise importance sampling in the data dimension for activation gradient calculation and leverage score sampling in the token dimension for weight gradient calculation. To preserve accuracy, we control the additional variance by learning the sample ratio jointly with the model parameters during training. We assessed VCAS on multiple fine-tuning and pre-training tasks in both the vision and natural language domains. On all tasks, VCAS preserves the original training loss trajectory and validation accuracy with up to a 73.87% FLOPs reduction in BP and a 49.58% FLOPs reduction over the whole training process. The implementation is available at https://github.com/thu-ml/VCAS.
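
A minimal sketch of the building block involved, assuming importance proportional to row norms and sampling with replacement; VCAS's actual layerwise schedule, leverage-score sampling, and learned sample ratios are not reproduced here.

```python
# Unbiased importance sampling in the data (row) dimension: keep a subset of
# rows of a gradient tensor, rescaled so expectations are preserved.
import torch

def sample_rows_unbiased(grad: torch.Tensor, keep_ratio: float):
    """Keep ~keep_ratio of the rows of `grad`, rescaled to stay unbiased."""
    n = grad.shape[0]
    k = max(1, int(n * keep_ratio))
    # Sampling probabilities proportional to row norms (epsilon keeps them valid).
    probs = grad.norm(dim=1) + 1e-12
    probs = probs / probs.sum()
    idx = torch.multinomial(probs, k, replacement=True)
    # Rescale by 1/(k * p_i) so any linear functional of grad is estimated unbiasedly.
    scale = 1.0 / (k * probs[idx])
    return grad[idx] * scale.unsqueeze(1), idx

g = torch.randn(1024, 64)
g_sampled, idx = sample_rows_unbiased(g, keep_ratio=0.25)
print(g.sum(0)[:4])          # true column sums
print(g_sampled.sum(0)[:4])  # unbiased estimate from 25% of the rows
```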


Fast Second-Order Stochastic Backpropagation for Variational Inference

arXiv.org Machine Learning

We propose a second-order optimization method (Hessian-based or Hessian-free) for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and the variational auto-encoder (VAE). Additionally, we compute bounds on the estimator variance of intractable expectations for the family of Lipschitz continuous functions. Our method is practical, scalable, and model-free. We demonstrate our method on several real-world datasets and provide comparisons with other stochastic gradient methods to show substantial enhancement in convergence rates.
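
A minimal sketch of the Gaussian reparametrization trick the paper generalizes, showing how second-order information falls out of ordinary backpropagation without forming a Hessian; the test function and Monte Carlo batch size are arbitrary illustrations, not the paper's setup.

```python
# Reparametrization trick: rewrite z ~ N(mu, sigma^2) as z = mu + sigma * eps,
# eps ~ N(0, 1), so gradients of E[f(z)] w.r.t. mu and sigma flow through
# standard backpropagation, and differentiating again gives second derivatives.
import torch

mu = torch.tensor(0.5, requires_grad=True)
log_sigma = torch.tensor(-1.0, requires_grad=True)

def f(z):                      # any smooth test function
    return torch.tanh(z) ** 2

eps = torch.randn(10_000)      # one Monte Carlo batch of standard normal noise
z = mu + log_sigma.exp() * eps
loss = f(z).mean()             # stochastic estimate of E[f(z)]

# First order: stochastic gradients of the expectation.
g_mu, g_ls = torch.autograd.grad(loss, (mu, log_sigma), create_graph=True)

# Second order, Hessian-free: differentiate the gradient once more to get
# d^2 loss / d mu^2 without ever materializing a Hessian.
d2_mu = torch.autograd.grad(g_mu, mu, retain_graph=True)[0]
print(g_mu.item(), g_ls.item(), d2_mu.item())
```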


Differentially Private Data Releasing for Smooth Queries with Synthetic Database Output

arXiv.org Machine Learning

We consider accurately answering smooth queries while preserving differential privacy. A query is said to be $K$-smooth if it is specified by a function defined on $[-1,1]^d$ whose partial derivatives up to order $K$ are all bounded. We develop an $\epsilon$-differentially private mechanism for the class of $K$-smooth queries. The major advantage of the algorithm is that it outputs a synthetic database; this is appealing in practice because the synthetic data can be released once and then used to answer arbitrary queries in the class without further privacy cost. Our mechanism achieves an accuracy of $O(n^{-\frac{K}{2d+K}}/\epsilon)$ and runs in polynomial time. We also generalize the mechanism to preserve $(\epsilon, \delta)$-differential privacy with slightly improved accuracy. Extensive experiments on benchmark datasets demonstrate that the mechanisms have good accuracy and are efficient.
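
For contrast, the sketch below is not the paper's synthetic-database mechanism but the textbook Laplace mechanism, which must spend privacy budget per query; it illustrates the per-query baseline a reusable synthetic database improves on. The query, bound, and budget are illustrative choices.

```python
# Standard Laplace mechanism (named plainly: not the paper's algorithm):
# an epsilon-DP answer to ONE bounded query q(D) = mean_i f(x_i).
import numpy as np

def laplace_answer(data: np.ndarray, f, bound: float, epsilon: float) -> float:
    """epsilon-DP answer to q(D) = mean_i f(x_i), assuming |f| <= bound.
    Replacing one record moves the mean by at most 2*bound/n (the sensitivity)."""
    n = len(data)
    true_answer = np.mean([f(x) for x in data])
    sensitivity = 2.0 * bound / n
    return true_answer + np.random.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
data = rng.uniform(-1, 1, size=(5000, 1))
smooth_query = lambda x: np.cos(np.pi * x[0])  # a bounded, smooth query on [-1,1]
print(laplace_answer(data, smooth_query, bound=1.0, epsilon=0.5))
```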


Efficient Algorithm for Privately Releasing Smooth Queries

Neural Information Processing Systems

We study differentially private mechanisms for answering \emph{smooth} queries on databases consisting of data points in $\mathbb{R}^d$. A $K$-smooth query is specified by a function whose partial derivatives up to order $K$ are all bounded. We develop an $\epsilon$-differentially private mechanism which, for the class of $K$-smooth queries, has accuracy $O(n^{-\frac{K}{2d+K}}/\epsilon)$. The mechanism first outputs a summary of the database. To obtain the answer to a query, the user runs a public evaluation algorithm which contains no information about the database. Outputting the summary runs in time $O(n^{1+\frac{d}{2d+K}})$, and the evaluation algorithm for answering a query runs in time $\tilde O(n^{\frac{d+2+\frac{2d}{K}}{2d+K}})$. Our mechanism is based on $L_{\infty}$-approximation of (transformed) smooth functions by low-degree even trigonometric polynomials with small and efficiently computable coefficients.
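
A toy one-dimensional rendering of the idea, under loudly labeled assumptions (the degree, the equal budget split, and the cosine-moment summary are illustrative simplifications, not the paper's tuned construction): the private summary is a vector of noisy trigonometric moments of the database, and a public evaluator, which never touches the raw data, pairs those moments with the query's own trigonometric-polynomial coefficients.

```python
# Toy 1-D illustration: noisy trigonometric (Chebyshev) moments as the private
# summary, plus a public evaluation step for any smooth query on [-1,1].
import numpy as np

def private_summary(data, degree, epsilon):
    """Noisy moments m_k = mean_i cos(k * arccos(x_i)), k = 0..degree."""
    theta = np.arccos(np.clip(data, -1.0, 1.0))
    moments = np.array([np.cos(k * theta).mean() for k in range(degree + 1)])
    # Each moment has sensitivity 2/n; split epsilon equally across moments.
    scale = (2.0 / len(data)) * (degree + 1) / epsilon
    return moments + np.random.laplace(scale=scale, size=degree + 1)

def evaluate(summary, f, degree, grid=2048):
    """Public step, touching no raw data: expand f(cos(theta)) in cosines on
    [0, pi] (midpoint rule), then pair coefficients with the noisy moments."""
    theta = (np.arange(grid) + 0.5) * np.pi / grid
    vals = f(np.cos(theta))
    coeffs = [(2.0 / grid) * np.dot(vals, np.cos(k * theta))
              for k in range(degree + 1)]
    coeffs[0] /= 2.0  # the k = 0 term in a cosine series carries weight 1/2
    return float(np.dot(coeffs, summary))

rng = np.random.default_rng(1)
data = rng.uniform(-1, 1, 10_000)
f = lambda x: np.exp(-x ** 2)                 # a smooth query
summary = private_summary(data, degree=8, epsilon=1.0)
print(evaluate(summary, f, degree=8), f(data).mean())  # private vs. true answer
```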