Goto

Collaborating Authors

 Asia


Non-convex Finite-Sum Optimization Via SCSG Methods

Neural Information Processing Systems

We develop a class of algorithms, as variants of the stochastically controlled stochastic gradient (SCSG) methods, for the smooth nonconvex finite-sum optimization problem. Only assuming the smoothness of each component, the complexity of SCSG to reach a stationary point with $E \|\nabla f(x)\|^{2}\le \epsilon$ is $O(\min\{\epsilon^{-5/3}, \epsilon^{-1}n^{2/3}\})$, which strictly outperforms the stochastic gradient descent. Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low. A similar acceleration is also achieved when the functions satisfy the Polyak-Lojasiewicz condition. Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layers neural networks in terms of both training and validation loss.


Online control of the false discovery rate with decaying memory

Neural Information Processing Systems

In the online multiple testing problem, p-values corresponding to different null hypotheses are presented one by one, and the decision of whether to reject a hypothesis must be made immediately, after which the next p-value is presented. Alpha-investing algorithms to control the false discovery rate were first formulated by Foster and Stine and have since been generalized and applied to various settings, varying from quality-preserving databases for science to multiple A/B tests for internet commerce. This paper improves the class of generalized alpha-investing algorithms (GAI) in four ways: (a) we show how to uniformly improve the power of the entire class of GAI procedures under independence by awarding more alpha-wealth for each rejection, giving a near win-win resolution to a dilemma raised by Javanmard and Montanari, (b) we demonstrate how to incorporate prior weights to indicate domain knowledge of which hypotheses are likely to be null or non-null, (c) we allow for differing penalties for false discoveries to indicate that some hypotheses may be more meaningful/important than others, (d) we define a new quantity called the \emph{decaying memory false discovery rate, or $\memfdr$} that may be more meaningful for applications with an explicit time component, using a discount factor to incrementally forget past decisions and alleviate some potential problems that we describe and name ``piggybacking'' and ``alpha-death''. Our GAI++ algorithms incorporate all four generalizations (a, b, c, d) simulatenously, and reduce to more powerful variants of earlier algorithms when the weights and decay are all set to unity.



WATCH: Wall-climbing robot swarms crawl US Navy warships as China's fleet surges

FOX News

Navy robots from Gecko Robotics will inspect U.S. warships in $71 million effort to reduce maintenance delays as only 60% of fleet remains operational amid China's naval expansion.


An AI image generator for non-English speakers

AIHub

Although text-to-image generation is rapidly advancing, these AI models are mostly English-centric. Researchers at the University of Amsterdam Faculty of Science have created NeoBabel, an AI image generator that can work in six different languages. By making all elements of their research open source, anyone can build on the model and help push inclusive AI research. When you generate an image with AI, the results are often better when your prompt is in English. This is because many AI models are English at their core: if you use another language, your prompt is translated into English before the image is created.


Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

Neural Information Processing Systems

Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data. Our learning principle, time-contrastive learning (TCL), finds a representation which allows optimal discrimination of time segments (windows). Surprisingly, we show how TCL can be related to a nonlinear ICA model, when ICA is redefined to include temporal nonstationarities. In particular, we show that TCL combined with linear ICA estimates the nonlinear ICA model up to point-wise transformations of the sources, and this solution is unique --- thus providing the first identifiability result for nonlinear ICA which is rigorous, constructive, as well as very general.


How Invisalign Became the World's Biggest User of 3D Printers

WIRED

Joe Hogan, Align Technology's plastics-nerd CEO, says you shouldn't eat with your aligners and that you don't need to wear your retainers every night. Joe Hogan sees a lot of smiles. When people ask him where he works, he responds with "Align Technology," which inevitably prompts the follow up, "What's that?" After months, sometimes years, the discrete rival to braces promises to give people smiles they will want to show off. Hogan gets a look at them all. And he's eager to see more. Align is embarking on its biggest manufacturing overhaul since it was founded by two Stanford Graduate School of Business classmates 29 years ago. The company is preparing to begin directly 3D printing the aligners at the core of its business, ditching what Hogan describes as a longer, more wasteful process that involves making molds. A successful transition could lower costs and make treatment more affordable in the long run, bringing Invisalign to more customers and boosting Align's profits. It also, according to Hogan, would entrench Align as the world's biggest user of 3D printers .


Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning

Neural Information Processing Systems

We investigate the statistical performance and computational efficiency of the alternating minimization procedure for nonparametric tensor learning. Tensor modeling has been widely used for capturing the higher order relations between multimodal data sources. In addition to a linear model, a nonlinear tensor model has been received much attention recently because of its high flexibility. We consider an alternating minimization procedure for a general nonlinear model where the true function consists of components in a reproducing kernel Hilbert space (RKHS). In this paper, we show that the alternating minimization method achieves linear convergence as an optimization algorithm and that the generalization error of the resultant estimator yields the minimax optimality. We apply our algorithm to some multitask learning problems and show that the method actually shows favorable performances.


Unsupervised Domain Adaptation with Residual Transfer Networks

Neural Information Processing Systems

The recent success of deep neural networks relies on massive amounts of labeled data. For a target task where labeled data is unavailable, domain adaptation can transfer a learner from a different source domain. In this paper, we propose a new approach to domain adaptation in deep networks that can jointly learn adaptive classifiers and transferable features from labeled data in the source domain and unlabeled data in the target domain. We relax a shared-classifier assumption made by previous methods and assume that the source classifier and target classifier differ by a residual function. We enable classifier adaptation by plugging several layers into deep network to explicitly learn the residual function with reference to the target classifier. We fuse features of multiple layers with tensor product and embed them into reproducing kernel Hilbert spaces to match distributions for feature adaptation. The adaptation can be achieved in most feed-forward models by extending them with new residual layers and loss functions, which can be trained efficiently via back-propagation. Empirical evidence shows that the new approach outperforms state of the art methods on standard domain adaptation benchmarks.


How to Set Up Your Own NAS Server for Backups and Content Streaming

WIRED

The app reads your email inbox and your meeting calendar, then gives you a short audio summary. It can help you spend less time scrolling, but of course, there are privacy drawbacks to consider.