Chen, Lin
Region Comparison Network for Interpretable Few-shot Image Classification
Xue, Zhiyu, Duan, Lixin, Li, Wen, Chen, Lin, Luo, Jiebo
While deep learning has been successfully applied to many real-world computer vision tasks, training robust classifiers usually requires a large amount of well-labeled data. However, the annotation is often expensive and time-consuming. Few-shot image classification has thus been proposed to effectively use only a limited number of labeled examples to train models for new classes. Recent works based on transferable metric learning methods have achieved promising classification performance through learning the similarity between the features of samples from the query and support sets. However, few of them explicitly consider model interpretability, which can actually be revealed during the training phase. To this end, in this work we propose a metric-learning-based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works in a neural network, as well as to find the specific regions that are related to each other in images from the query and support sets. Moreover, we also present a visualization strategy named Region Activation Mapping (RAM) to intuitively explain what our method has learned by visualizing intermediate variables in our network. We also present a new way to generalize the interpretability from the level of tasks to categories, which can also be viewed as a method to find the prototypical parts that support the final decision of our RCN. Extensive experiments on four benchmark datasets clearly show the effectiveness of our method over existing baselines.
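As a rough illustration of region-level comparison (not the actual RCN architecture or its learned comparison module, which are defined in the paper), the sketch below computes pairwise cosine similarities between the spatial regions of a query feature map and a support feature map; the feature shapes and the cosine-similarity choice are assumptions made only for this example.

```python
# Illustrative sketch only: pairwise region similarity between a query and a
# support convolutional feature map. The real RCN comparison module and its
# training procedure are described in the paper.
import numpy as np

def region_similarity(query_feat, support_feat, eps=1e-8):
    """query_feat, support_feat: (C, H, W) feature maps.
    Returns an (H*W, H*W) matrix of cosine similarities between regions."""
    q = query_feat.reshape(query_feat.shape[0], -1).T      # (H*W, C)
    s = support_feat.reshape(support_feat.shape[0], -1).T  # (H*W, C)
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=1, keepdims=True) + eps)
    return q @ s.T  # entry (i, j): similarity of query region i and support region j

sim = region_similarity(np.random.rand(64, 5, 5), np.random.rand(64, 5, 5))
print(sim.shape)  # (25, 25)
```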
Meta Learning in the Continuous Time Limit
Xu, Ruitu, Chen, Lin, Karbasi, Amin
In this paper, we establish the ordinary differential equation (ODE) that underlies the training dynamics of Model-Agnostic Meta-Learning (MAML). Our continuous-time limit view of the process eliminates the influence of the manually chosen step size of gradient descent and includes the existing gradient descent training algorithm as a special case that results from a specific discretization. We show that the MAML ODE enjoys a linear convergence rate to an approximate stationary point of the MAML loss function for strongly convex task losses, even when the corresponding MAML loss is non-convex. Moreover, through the analysis of the MAML ODE, we propose a new BI-MAML training algorithm that significantly reduces the computational burden associated with existing MAML training methods. To complement our theoretical findings, we perform empirical experiments to showcase the superiority of our proposed methods with respect to the existing work.
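To make the continuous-time view concrete, here is a minimal numerical sketch in which the MAML meta-gradient of quadratic (strongly convex) task losses is followed with small Euler steps, i.e., a discretization of the gradient-flow ODE $\dot{w} = -\nabla F(w)$. The task data, step sizes, and dimensions are illustrative assumptions, and this is not the paper's BI-MAML algorithm.

```python
# Minimal sketch, assuming quadratic task losses f_i(w) = 0.5*(w-b_i)^T A_i (w-b_i).
# Small Euler steps on the MAML loss F(w) = mean_i f_i(w - alpha * grad f_i(w))
# approximate the continuous-time flow; ordinary MAML gradient descent is one
# such discretization.
import numpy as np

rng = np.random.default_rng(0)
d, n_tasks, alpha, eta = 5, 8, 0.05, 0.01
A = [np.diag(rng.uniform(0.5, 2.0, d)) for _ in range(n_tasks)]   # strongly convex tasks
b = [rng.normal(size=d) for _ in range(n_tasks)]

def maml_meta_gradient(w):
    """Gradient of F(w) = mean_i f_i(w - alpha * grad f_i(w))."""
    g = np.zeros(d)
    for Ai, bi in zip(A, b):
        inner = w - alpha * Ai @ (w - bi)                 # one inner gradient step
        g += (np.eye(d) - alpha * Ai) @ (Ai @ (inner - bi))
    return g / n_tasks

w = rng.normal(size=d)
for _ in range(2000):                                     # Euler steps approximating the ODE flow
    w = w - eta * maml_meta_gradient(w)
print(np.linalg.norm(maml_meta_gradient(w)))              # gradient norm; small near a stationary point
```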
Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond
Chen, Lin, Esfandiari, Hossein, Fu, Thomas, Mirrokni, Vahab S.
Computing approximate nearest neighbors in high dimensional spaces is a central problem in large-scale data mining with a wide range of applications in machine learning and data science. A popular and effective technique for computing nearest neighbors approximately is the locality-sensitive hashing (LSH) scheme. In this paper, we aim to develop LSH schemes for distance functions that measure the distance between two probability distributions, particularly for f-divergences as well as a generalization to capture mutual information loss. First, we provide a general framework for designing LSH schemes for f-divergence distance functions and develop LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination in this framework. We show a two-sided approximation result for approximating the generalized Jensen-Shannon divergence by the Hellinger distance, which may be of independent interest. Next, we show a general method of reducing the problem of designing an LSH scheme for a Krein kernel (which can be expressed as the difference of two positive definite kernels) to the problem of maximum inner product search. We exemplify this method by applying it to the mutual information loss, owing to its several important applications, such as model compression.
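As a hedged illustration of one ingredient above: because the Hellinger distance between two distributions equals, up to a constant, the Euclidean distance between their element-wise square roots, a standard p-stable (Gaussian) LSH for $\ell_2$ can be applied after that square-root map. The paper's actual LSH schemes for the generalized Jensen-Shannon divergence and triangular discrimination are more refined; the bucket width and hash count below are arbitrary placeholder choices.

```python
# Illustrative sketch: Euclidean LSH applied after the map p -> sqrt(p), which
# turns the Hellinger distance into an L2 distance. Not the paper's construction.
import numpy as np

class HellingerLSH:
    def __init__(self, dim, n_hashes=8, r=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_hashes, dim))   # Gaussian projections (2-stable)
        self.b = rng.uniform(0, r, size=n_hashes)   # random offsets
        self.r = r                                  # bucket width (illustrative)

    def hash(self, p):
        """p: probability vector. Returns a tuple of integer bucket ids."""
        x = np.sqrt(p)                              # Hellinger -> Euclidean embedding
        return tuple(np.floor((self.a @ x + self.b) / self.r).astype(int))

lsh = HellingerLSH(dim=10)
p = np.random.dirichlet(np.ones(10))
q = np.random.dirichlet(np.ones(10))
print(lsh.hash(p) == lsh.hash(q))   # close distributions collide more often
```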
Online Continuous Submodular Maximization: From Full-Information to Bandit Feedback
Zhang, Mingrui, Chen, Lin, Hassani, Hamed, Karbasi, Amin
In this paper, we propose three online algorithms for submodular maximization. The first one, Mono-Frank-Wolfe, reduces the number of per-function gradient evaluations from $T^{1/2}$ [Chen2018Online] and $T^{3/2}$ [chen2018projection] to 1, and achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. The second one, Bandit-Frank-Wolfe, is the first bandit algorithm for continuous DR-submodular maximization, which achieves a $(1-1/e)$-regret bound of $O(T^{8/9})$. Finally, we extend Bandit-Frank-Wolfe to a bandit algorithm for discrete submodular maximization, Responsive-Frank-Wolfe, which attains a $(1-1/e)$-regret bound of $O(T^{8/9})$ in the responsive bandit setting.
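The following is a heavily simplified sketch of the one-gradient-per-function Frank-Wolfe idea: the iterate is built up from zero with steps toward a linear maximization oracle, while each arriving function contributes a single gradient to a running average. It omits the blocking and variance-reduction machinery that Mono-Frank-Wolfe needs for its stated regret bound; the down-closed simplex constraint and toy linear rewards are illustrative assumptions.

```python
# Simplified one-gradient-per-function online Frank-Wolfe skeleton over the
# down-closed set {x >= 0, sum(x) <= 1}. Not the paper's Mono-Frank-Wolfe.
import numpy as np

def lmo(g):
    """Linear maximization oracle over {x >= 0, sum(x) <= 1}: best vertex for <g, .>."""
    v = np.zeros_like(g)
    i = np.argmax(g)
    if g[i] > 0:
        v[i] = 1.0
    return v

d, T = 5, 200
x = np.zeros(d)                  # built up from 0; stays feasible (T steps of size 1/T)
d_avg = np.zeros(d)              # running average of the single per-function gradients
rng = np.random.default_rng(0)

for t in range(1, T + 1):
    w = rng.uniform(0.5, 1.5, d)             # toy monotone linear objective f_t(x) = <w, x>
    reward = w @ x                           # play x, collect reward
    g = w                                    # the single gradient evaluation of f_t
    d_avg = (1 - 1 / t) * d_avg + (1 / t) * g
    x = x + (1.0 / T) * lmo(d_avg)           # Frank-Wolfe step toward the LMO vertex
print(x, x.sum())
```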
Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition
Chen, Lin, Yu, Qian, Lawrence, Hannah, Karbasi, Amin
We study the problem of switching-constrained online convex optimization (OCO), where the player has a limited number of opportunities to change her action. While the discrete analog of this online learning task has been studied extensively, previous work in the continuous setting has neither established the minimax rate nor algorithmically achieved it. We here show that $ T $-round switching-constrained OCO with fewer than $ K $ switches has a minimax regret of $ \Theta(\frac{T}{\sqrt{K}}) $. In particular, it is at least $ \frac{T}{\sqrt{2K}} $ for one dimension and at least $ \frac{T}{\sqrt{K}} $ for higher dimensions. The lower bound in higher dimensions is attained by an orthogonal subspace argument. The minimax analysis in one dimension is more involved. To establish the one-dimensional result, we introduce the fugal game relaxation, whose minimax regret lower bounds that of switching-constrained OCO. We show that the minimax regret of the fugal game is at least $ \frac{T}{\sqrt{2K}} $ and thereby establish the minimax lower bound in one dimension. We next show that a mini-batching algorithm provides an $ O(\frac{T}{\sqrt{K}}) $ upper bound, and therefore we conclude that the minimax regret of switching-constrained OCO is $ \Theta(\frac{T}{\sqrt{K}}) $ for any $K$. This is in sharp contrast to its discrete counterpart, the switching-constrained prediction-from-experts problem, which exhibits a phase transition in minimax regret between the low-switching and high-switching regimes. In the case of bandit feedback, we first determine a novel linear (in $T$) minimax regret for bandit linear optimization against the strongly adaptive adversary of OCO, implying that a slightly weaker adversary is appropriate. We also establish the minimax regret of switching-constrained bandit convex optimization in dimension $n>2$ to be $\tilde{\Theta}(\frac{T}{\sqrt{K}})$.
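The upper bound in the abstract comes from a mini-batching scheme; the sketch below shows the idea under illustrative assumptions (unit-ball domain, linear losses, a simple step size): split the $T$ rounds into $K$ blocks, play a fixed action within each block so that at most $K-1$ switches occur, and take one projected online-gradient-descent step per block on the block's average gradient.

```python
# Minimal sketch of mini-batching for switching-constrained OCO. The domain,
# losses, and step size are illustrative assumptions, not the paper's setup.
import numpy as np

def project_unit_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

d, T, K = 3, 1000, 10
block = T // K
eta = 1.0 / np.sqrt(K)            # OGD step size over K block-level rounds (illustrative)
x = np.zeros(d)
rng = np.random.default_rng(0)
switches = 0

for k in range(K):
    g_block = np.zeros(d)
    for _ in range(block):
        c = rng.normal(size=d)    # adversary's linear loss f_t(y) = <c, y>
        loss = c @ x              # the same action x is played for the whole block
        g_block += c
    if k < K - 1:                 # update only between blocks: at most K - 1 switches
        x_new = project_unit_ball(x - eta * g_block / block)
        switches += int(not np.allclose(x_new, x))
        x = x_new

print("switches used:", switches, "(at most K - 1 =", K - 1, ")")
```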
Generative Imaging and Image Processing via Generative Encoder
Chen, Lin, Yang, Haizhao
This paper introduces a novel generative encoder (GE) model for generative imaging and image processing with applications in compressed sensing and imaging, image compression, denoising, inpainting, deblurring, and super-resolution. The GE model consists of a pre-training phase and a solving phase. In the pre-training phase, we separately train two deep neural networks: a generative adversarial network (GAN) with a generator $\G$ that captures the data distribution of a given image set, and an auto-encoder (AE) network with an encoder $\EN$ that compresses images following the distribution estimated by the GAN. In the solving phase, given a noisy image $x=\mathcal{P}(x^*)$, where $x^*$ is the target unknown image and $\mathcal{P}$ is an operator adding additive, multiplicative, or convolutional noise, or equivalently given such an image $x$ in the compressed domain, i.e., given $m=\EN(x)$, we solve the optimization problem \[ z^*=\underset{z}{\mathrm{argmin}} \|\EN(\G(z))-m\|_2^2+\lambda\|z\|_2^2 \] to recover the image $x^*$ in a generative way via $\hat{x}:=\G(z^*)\approx x^*$, where $\lambda>0$ is a hyperparameter. The GE model unifies the generative capacity of GANs and the stability of AEs in the optimization framework above, instead of stacking GANs and AEs into a single network or combining their loss functions into one as in the existing literature. Numerical experiments show that the proposed model outperforms several state-of-the-art algorithms.
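The solving phase reduces to an unconstrained latent-space optimization, which can be sketched directly from the displayed objective. In the snippet below, the pretrained generator $\G$ and encoder $\EN$ are stood in by tiny untrained placeholder modules, and the latent size, optimizer, and $\lambda$ are illustrative assumptions.

```python
# Minimal sketch of the GE solving phase: given m = EN(x), optimize the latent
# code z to minimize ||EN(G(z)) - m||_2^2 + lambda * ||z||_2^2 and return G(z).
# The pre-trained GAN generator and AE encoder from the paper are replaced by
# tiny untrained placeholder modules for illustration only.
import torch

latent_dim, code_dim, img_dim, lam = 16, 32, 64, 1e-3
G = torch.nn.Sequential(torch.nn.Linear(latent_dim, 128), torch.nn.ReLU(),
                        torch.nn.Linear(128, img_dim))       # stands in for the pretrained generator
EN = torch.nn.Sequential(torch.nn.Linear(img_dim, code_dim))  # stands in for the pretrained encoder

def solve_ge(m, steps=500, lr=1e-2):
    z = torch.zeros(latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((EN(G(z)) - m) ** 2).sum() + lam * (z ** 2).sum()
        loss.backward()
        opt.step()
    return G(z).detach()                    # the recovered image x_hat = G(z*)

x_noisy = torch.randn(img_dim)              # x = P(x*), the corrupted observation
x_hat = solve_ge(EN(x_noisy).detach())      # recover from the compressed code m = EN(x)
print(x_hat.shape)
```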
Categorical Feature Compression via Submodular Optimization
Bateni, MohammadHossein, Chen, Lin, Esfandiari, Hossein, Fu, Thomas, Mirrokni, Vahab S., Rostamizadeh, Afshin
In the era of big data, learning from categorical features with very large vocabularies (e.g., 28 million for the Criteo click prediction dataset) has become a practical challenge for machine learning researchers and practitioners. We design a highly scalable vocabulary compression algorithm that seeks to maximize the mutual information between the compressed categorical feature and the target binary labels, and we furthermore show that its solution is guaranteed to be within a $1-1/e \approx 63\%$ factor of the global optimal solution. To achieve this, we introduce a novel re-parametrization of the mutual information objective, which we prove is submodular, and design a data structure to query the submodular function in amortized $O(\log n )$ time (where $n$ is the input vocabulary size). Our complete algorithm is shown to operate in $O(n \log n )$ time. Additionally, we design a distributed implementation in which the query data structure is decomposed across $O(k)$ machines such that each machine only requires $O(\frac n k)$ space, while still preserving the approximation guarantee and using only logarithmic rounds of computation. We also provide an analysis of simple alternative heuristic compression methods to demonstrate that they cannot achieve any approximation guarantee. Using the large-scale Criteo learning task, we demonstrate better performance in retaining mutual information and also verify competitive learning performance compared to other baseline methods.
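For intuition only, here is a naive greedy variant of the underlying selection problem: give $k$ vocabulary values their own buckets (merging everything else into a single "other" bucket) so as to maximize the empirical mutual information with the binary label. It runs in $O(nk)$ objective evaluations and does not use the paper's re-parametrization, its $O(\log n)$ query structure, or its distributed decomposition; the toy data is synthetic.

```python
# Naive greedy vocabulary compression maximizing empirical mutual information.
# Illustration of the selection problem only, not the paper's algorithm.
import numpy as np

def mutual_information(joint):
    """joint: dict bucket -> [count_label0, count_label1]. Empirical MI in nats."""
    table = np.array(list(joint.values()), dtype=float)
    p = table / table.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(p > 0, p * np.log(p / (px * py)), 0.0).sum()

def greedy_compress(values, labels, k):
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, [0, 0])[y] += 1
    kept = set()
    for _ in range(k):
        best, best_mi = None, -np.inf
        for v in counts:
            if v in kept:
                continue
            trial = kept | {v}
            joint = {u: counts[u] for u in trial}
            rest = [counts[u] for u in counts if u not in trial]
            if rest:
                joint["OTHER"] = list(np.sum(rest, axis=0))   # merged bucket
            mi = mutual_information(joint)
            if mi > best_mi:
                best, best_mi = v, mi
        kept.add(best)
    return kept

rng = np.random.default_rng(0)
values = rng.integers(0, 50, 5000).tolist()
labels = [int(v % 7 == 0) for v in values]   # toy binary label correlated with the feature
print(greedy_compress(values, labels, k=5))
```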
Quantized Frank-Wolfe: Communication-Efficient Distributed Optimization
Zhang, Mingrui, Chen, Lin, Mokhtari, Aryan, Hassani, Hamed, Karbasi, Amin
How can we efficiently mitigate the overhead of gradient communications in distributed optimization? This problem is at the heart of training scalable machine learning models and has been mainly studied in the unconstrained setting. In this paper, we propose Quantized Frank-Wolfe (QFW), the first projection-free and communication-efficient algorithm for solving constrained optimization problems at scale. We consider both convex and non-convex objective functions, expressed as a finite sum or, more generally, a stochastic optimization problem, and provide strong theoretical guarantees on the convergence rate of QFW. This is done by proposing quantization schemes that efficiently compress gradients while controlling the variance introduced during this process. Finally, we empirically validate the efficiency of QFW in terms of communication and the quality of the returned solution against natural baselines.
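A minimal sketch of the quantize-then-average idea behind a distributed projection-free step, under illustrative assumptions (an $\ell_2$ unit-ball constraint, a one-level unbiased stochastic quantizer, classic Frank-Wolfe step sizes): workers communicate only quantized stochastic gradients, and the server averages them and moves toward the vertex returned by a linear minimization oracle. The actual QFW algorithm specifies particular quantization and variance-reduction schemes; this is not that implementation.

```python
# Sketch of quantized distributed Frank-Wolfe on a toy strongly convex problem.
import numpy as np

rng = np.random.default_rng(0)

def quantize(g):
    """Unbiased 1-level stochastic quantization: keep ||g|| and random sparse signs."""
    norm = np.linalg.norm(g)
    if norm == 0:
        return g
    probs = np.abs(g) / norm
    mask = rng.random(g.shape) < probs          # E[quantize(g)] = g
    return norm * np.sign(g) * mask

def lmo_unit_ball(g):
    """Linear minimization oracle over the L2 unit ball for direction g."""
    n = np.linalg.norm(g)
    return -g / n if n > 0 else np.zeros_like(g)

d, n_workers, T = 20, 4, 200
target = rng.normal(size=d); target /= np.linalg.norm(target)
x = np.zeros(d)
for t in range(1, T + 1):
    grads = [2 * (x - target) + 0.1 * rng.normal(size=d) for _ in range(n_workers)]
    avg = np.mean([quantize(g) for g in grads], axis=0)   # only quantized gradients are communicated
    v = lmo_unit_ball(avg)
    x = x + (2.0 / (t + 2)) * (v - x)                     # classic Frank-Wolfe step size
print(np.linalg.norm(x - target))                          # distance to the minimizer shrinks over the run
```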
Black Box Submodular Maximization: Discrete and Continuous Settings
Chen, Lin, Zhang, Mingrui, Hassani, Hamed, Karbasi, Amin
In this paper, we consider the problem of black box continuous submodular maximization where we only have access to the function values and no information about the derivatives is provided. For a monotone and continuous DR-submodular function, and subject to a bounded convex body constraint, we propose Black-box Continuous Greedy, a derivative-free algorithm that provably achieves the tight $[(1-1/e)OPT-\epsilon]$ approximation guarantee with $O(d/\epsilon^3)$ function evaluations. We then extend our result to the stochastic setting where function values are subject to stochastic zero-mean noise. It is through this stochastic generalization that we revisit the discrete submodular maximization problem and use the multi-linear extension as a bridge between discrete and continuous settings. Finally, we extensively evaluate the performance of our algorithm on continuous and discrete submodular objective functions using both synthetic and real data.
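To illustrate the derivative-free ingredient, the sketch below replaces the gradient in a continuous-greedy (Frank-Wolfe) loop with a randomized-smoothing estimate built only from function evaluations; the box constraint, the separable toy DR-submodular objective, and the smoothing radius and batch size are assumptions for illustration, not the paper's Black-box Continuous Greedy with its tuned parameters and guarantees.

```python
# Derivative-free continuous-greedy sketch: gradients are estimated from
# function values via sphere sampling. Illustration only.
import numpy as np

rng = np.random.default_rng(0)
d, T, delta, batch = 5, 100, 0.01, 20
w = rng.uniform(0.5, 1.5, d)
f = lambda x: np.sum(w * (1.0 - np.exp(-x)))   # toy monotone DR-submodular function

def smoothed_gradient(x):
    """Two-point sphere-sampling estimator of grad f using only function values."""
    g = np.zeros(d)
    for _ in range(batch):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g += (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
    return d * g / batch

def lmo_box(g):
    """Linear maximization oracle over the box [0, 1]^d."""
    return (g > 0).astype(float)

x = np.zeros(d)                    # continuous greedy builds the solution up from 0
for _ in range(T):
    x = x + (1.0 / T) * lmo_box(smoothed_gradient(x))
print(f(x), f(np.ones(d)))         # close to the maximum over the box
```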