MemVLT: Vision-Language Tracking with Adaptive Memory-based Prompts
As an extension of the traditional visual single object tracking (SOT) task [2, 3, 4], VLT can harness the complementary advantages of multiple modalities. Therefore, vision-language trackers (VLTs) have the potential to achieve more promising tracking performance, which has recently attracted widespread attention [5, 6, 7, 8].
Ordered Memory
Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron C. Courville
We also introduce a new Gated Recursive Cell to compose lower level representations into higher level representation. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015) and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth.
- North America > Canada > Quebec (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
Scaling Law for Stochastic Gradient Descent in Quadratically Parameterized Linear Regression
Ding, Shihong, Zhang, Haihan, Zhao, Hanzhen, Fang, Cong
In machine learning, a scaling law describes how model performance improves as the model and data size scale up. From a learning theory perspective, this class of results establishes upper and lower generalization bounds for a specific learning algorithm. Here, running a particular algorithm with a particular model parameterization often provides a crucial implicit regularization effect, leading to good generalization. To characterize the scaling law, previous theoretical studies mainly focus on linear models, whereas feature learning, a notable process that contributes to the remarkable empirical success of neural networks, remains unaddressed. This paper studies the scaling law for linear regression with a quadratically parameterized model. We consider infinite-dimensional data and a ground-truth slope, both signals exhibiting certain power-law decay rates. We study convergence rates for Stochastic Gradient Descent and demonstrate that the learning rates for the variables automatically adapt to the ground truth. As a result, in canonical linear regression, we provide explicit separations between the generalization curves of SGD with and without feature learning, and the information-theoretical lower bound that is agnostic to the parametrization method and the algorithm. Our analysis for decaying ground truth provides a new characterization of the learning dynamics of the model.
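As a rough illustration of why a quadratic parameterization makes per-coordinate learning rates adapt, the sketch below runs plain SGD on a model whose effective weights are `w_i = v_i * v_i`. This form of the parameterization, the Gaussian inputs, the noiseless labels, and the names `sgd_quadratic_param`, `v_init`, and `lr` are all assumptions made for illustration; the abstract does not specify them.

```python
import random

def sgd_quadratic_param(w_star, steps, lr, v_init=0.1, seed=0):
    """SGD on squared loss for the model pred = sum_i (v_i ** 2) * x_i.

    Assumption: "quadratically parameterized" is read here as w = v * v
    elementwise. Inputs are standard Gaussian and labels are noiseless.
    Returns the effective weights w_i = v_i ** 2.
    """
    rng = random.Random(seed)
    d = len(w_star)
    v = [v_init] * d
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(w * xi for w, xi in zip(w_star, x))
        pred = sum(vi * vi * xi for vi, xi in zip(v, x))
        err = pred - y
        # d/dv_i of 0.5 * err**2 is 2 * err * v_i * x_i, so the step taken on
        # coordinate i scales with the current v_i: small coordinates move slowly.
        v = [vi - lr * 2.0 * err * vi * xi for vi, xi in zip(v, x)]
    return [vi * vi for vi in v]

effective_w = sgd_quadratic_param([1.0, 0.25, 0.0], steps=100, lr=0.01)
```

Because the gradient with respect to `v_i` carries a factor of `v_i`, the induced step on `w_i` is proportional to `w_i` itself, which is one simple way to see a coordinate-dependent effective learning rate.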
Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
Recently, several studies have considered the stochastic optimization problem in a heavy-tailed noise regime, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say, upper bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1,2]$. This assumption not only generalizes the traditional finite variance assumption ($p=2$) but has also been observed in practice for several different tasks. Under this challenging assumption, much progress has been made for both convex and nonconvex problems; however, most of it considers only smooth objectives. In contrast, the problem is not yet fully explored or well understood when the functions are nonsmooth. This paper aims to fill this crucial gap by providing a comprehensive analysis of stochastic nonsmooth convex optimization with heavy-tailed noises. We revisit a simple clipping-based algorithm, which has so far only been proved to converge in expectation, and only under an additional strong convexity assumption. Under appropriate choices of parameters, for both convex and strongly convex functions, we not only establish the first high-probability rates but also give refined in-expectation bounds compared with existing works. Remarkably, all of our results are optimal (or nearly optimal up to logarithmic factors) with respect to the time horizon $T$ even when $T$ is unknown in advance. Additionally, we show how to make the algorithm parameter-free with respect to $\sigma$; in other words, the algorithm can still guarantee convergence without any prior knowledge of $\sigma$. Furthermore, an initial distance adaptive convergence rate is provided if $\sigma$ is assumed to be known.
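A minimal sketch of a clipping-based subgradient method may make the setting concrete. It minimizes the nonsmooth convex function f(x) = |x| using subgradients corrupted by symmetric Pareto noise with tail index 1.5, which has zero mean but infinite variance. The objective, the noise model, the fixed clipping threshold `tau`, and the `step_size / sqrt(t)` schedule are illustrative assumptions; the parameter choices analysed in the paper are not reproduced here.

```python
import random

def clip(g, tau):
    """Clip a scalar gradient estimate to the interval [-tau, tau]."""
    return max(-tau, min(tau, g))

def clipped_subgradient_descent(x0, steps, step_size, tau, seed=0):
    """Clipped stochastic subgradient descent on f(x) = |x|.

    Returns the running average of the iterates.
    """
    rng = random.Random(seed)
    x = x0
    avg = 0.0
    for t in range(1, steps + 1):
        subgrad = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)
        # Symmetric, zero-mean Pareto noise: the p-th moment is finite
        # only for p < 1.5, so the variance is infinite.
        noise = rng.choice((-1.0, 1.0)) * (rng.paretovariate(1.5) - 1.0)
        g = clip(subgrad + noise, tau)
        x -= step_size / t ** 0.5 * g
        avg += (x - avg) / t
    return avg

x_avg = clipped_subgradient_descent(x0=1.0, steps=2000, step_size=0.1, tau=2.0)
```

Clipping bounds every update by `tau` times the step size, so a single extreme noise sample cannot move the iterate arbitrarily far. The price is a bias in the gradient estimate, which is why the choice of threshold matters.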
Non-IID Quantum Federated Learning with One-shot Communication Complexity
Federated learning refers to the task of machine learning based on decentralized data from multiple clients with secured data privacy. Recent studies show that quantum algorithms can be exploited to boost its performance. However, when the clients' data are not independent and identically distributed (IID), the performance of conventional federated algorithms is known to deteriorate. In this work, we explore the non-IID issue in quantum federated learning with both theoretical and numerical analysis. We further prove that a global quantum channel can be exactly decomposed into local channels trained by each client with the help of local density estimators. This observation leads to a general framework for quantum federated learning on non-IID data with one-shot communication complexity. Numerical simulations show that the proposed algorithm outperforms the conventional ones significantly under non-IID settings.
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan (0.04)
- Asia > China > Beijing > Beijing (0.04)
A Set of Recommendations for Assessing Human-Machine Parity in Language Translation
Läubli, Samuel, Castilho, Sheila, Neubig, Graham, Sennrich, Rico, Shen, Qinlan, Toral, Antonio
The quality of machine translation has increased remarkably over the past years, to the degree that it was found to be indistinguishable from professional human translation in a number of empirical investigations. We reassess Hassan et al.'s 2018 investigation into Chinese to English news translation, showing that the finding of human-machine parity was owed to weaknesses in the evaluation design - which is currently considered best practice in the field. We show that the professional human translations contained significantly fewer errors, and that perceived quality in human evaluation depends on the choice of raters, the availability of linguistic context, and the creation of reference translations. Our results call for revisiting current best practices to assess strong machine translation systems in general and human-machine parity in particular, for which we offer a set of recommendations based on our empirical findings.
- Asia > China > Hong Kong (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > China > Shandong Province > Qingdao (0.04)
- (19 more...)
A Set of Recommendations for Assessing Human–Machine Parity in Language Translation
Läubli, Samuel (University of Zurich) | Castilho, Sheila (Dublin City University) | Neubig, Graham (Carnegie Mellon University) | Sennrich, Rico (University of Edinburgh) | Shen, Qinlan (Carnegie Mellon University) | Toral, Antonio (University of Groningen)
A Variational Time Series Feature Extractor for Action Prediction
Chaveroche, Maxime, Malaisé, Adrien, Colas, Francis, Charpillet, François, Ivaldi, Serena
The problem of recognizing actions or activities has been widely addressed in the computer vision research community: it consists of classifying a fully or partially observed action, typically observed through cameras or external motion capture [2]. In robotics, recognizing human activity is paramount for enabling proper interaction and providing assistance to the human: an assistive device or prosthesis could switch control modes depending on the current human activity (e.g., walking or sitting) [3], [4]; a mobile robot may adapt its navigation depending on the prediction of the human motion [5]. More generally, prediction is important to provide the robot with anticipation capabilities [6]. In collaborative robotics applications in manufacturing, such as in assembly lines, recognizing the current activity of the operator is necessary for ergonomics evaluations [7] and for the optimization of the robot actions. However, there are two critical issues that prevent the direct application of existing techniques in such scenarios. The first issue is the availability of external sensing devices (cameras or motion capture systems), which poses constraints on many tasks and application scenarios.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > France > Hauts-de-France > Oise > Compiègne (0.04)
- Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Federated Learning with Non-IID Data
Zhao, Yue, Li, Meng, Lai, Liangzhen, Suda, Naveen, Civin, Damon, Chandra, Vikas
Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction while keeping the training data local. This decentralized approach to training models provides privacy, security, regulatory and economic benefits. In this work, we focus on the statistical challenge of federated learning when local data is non-IID. We first show that the accuracy of federated learning drops significantly, by up to 55%, for neural networks trained on highly skewed non-IID data, where each client device trains only on a single class of data. We further show that this accuracy reduction can be explained by the weight divergence, which can be quantified by the earth mover's distance (EMD) between the distribution over classes on each device and the population distribution. As a solution, we propose a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices. Experiments show that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
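The distance between a device's class distribution and the population distribution, mentioned in the abstract above, can be sketched as follows. This assumes the distance is computed as the sum of absolute differences between the two label distributions, treating classes as unordered; the function names `class_distribution` and `label_emd` are hypothetical.

```python
from collections import Counter

def class_distribution(labels, num_classes):
    """Empirical probability of each class 0..num_classes-1 in a label list."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(c, 0) / n for c in range(num_classes)]

def label_emd(client_labels, population_labels, num_classes):
    """Sum over classes of |p_client(y = c) - p_population(y = c)|.

    Assumption: the distance is taken as this L1 form between label
    distributions, which is 0 for matching distributions and at most 2.
    """
    p = class_distribution(client_labels, num_classes)
    q = class_distribution(population_labels, num_classes)
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

# A client holding only class 0 against a uniform 4-class population.
skewed = label_emd([0, 0, 0, 0], [0, 1, 2, 3], num_classes=4)
```

A single-class client in this four-class example gives a distance of 1.5, while a client whose labels match the population proportions gives 0.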