Collaborating Authors

 Kuznetsov, Vitaly


Multi-Class Deep Boosting

Neural Information Processing Systems

Our algorithms can use as a base classifier set a family of deep decision trees or other rich or complex families and yet benefit from strong generalization guarantees. We give new data-dependent learning bounds for convex ensembles in the multi-class classification setting expressed in terms of the Rademacher complexities of the sub-families composing the base classifier set, and the mixture weight assigned to each sub-family. These bounds are finer than existing ones both thanks to an improved dependency on the number of classes and, more crucially, by virtue of a more favorable complexity term expressed as an average of the Rademacher complexities based on the ensemble's mixture weights. We introduce and discuss several new multi-class ensemble algorithms benefiting from these guarantees, prove positive results for the H-consistency of several of them, and report the results of experiments showing that their performance compares favorably with that of multi-class versions of AdaBoost and Logistic Regression and their L1-regularized counterparts.
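
To make the "average of the Rademacher complexities based on the ensemble's mixture weights" concrete, here is a hedged sketch of the shape such a complexity term takes (notation mine, not the paper's exact statement): for a convex ensemble f = sum_t alpha_t h_t in which each base classifier h_t is drawn from a sub-family H_{k_t}, the bound scales with the mixture-weighted average of the sub-family complexities rather than with the complexity of the union of all sub-families:

    \sum_{t=1}^{T} \alpha_t \, \mathfrak{R}_m\big(H_{k_t}\big)
    \quad \text{rather than} \quad
    \mathfrak{R}_m\Big(\bigcup_{k=1}^{p} H_k\Big),
    \qquad \alpha_t \ge 0, \ \sum_{t=1}^{T} \alpha_t = 1.

This is why deep decision trees can appear in the base classifier set without degrading the guarantee, as long as the mixture weights concentrate on the simpler sub-families.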


Learning Theory and Algorithms for Forecasting Non-stationary Time Series

Neural Information Processing Systems

Our learning guarantees are expressed in terms of a data-dependent measure of sequential complexity and a discrepancy measure that can be estimated from data under some mild assumptions. We use our learning bounds to devise new algorithms for non-stationary time series forecasting for which we report some preliminary experimental results.
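
As a rough illustration of the kind of discrepancy measure referred to here (a simplified form under my own notation, not the paper's exact definition): for a loss L, a hypothesis set H, and sampling weights q_1, ..., q_T over the observed path Z_1, ..., Z_T, one can compare the conditional expected loss at the forecast time T+1 with the q-weighted conditional losses along the path,

    \mathrm{disc}(\mathbf{q}) \;=\; \sup_{h \in H} \Big(
        \mathbb{E}\big[L(h, Z_{T+1}) \,\big|\, Z_1^{T}\big]
        \;-\; \sum_{t=1}^{T} q_t \, \mathbb{E}\big[L(h, Z_t) \,\big|\, Z_1^{t-1}\big]
    \Big),

a quantity that is zero in the i.i.d. case with uniform weights and that grows with the degree of non-stationarity of the process.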


AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

arXiv.org Machine Learning

AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware.
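
For readers who want a feel for the search procedure this kind of framework automates, here is a schematic sketch of an AdaNet-style loop in plain Python (illustration only; this is not the adanet library's API, and all names below are hypothetical stand-ins): candidate subnetworks are proposed each round, trained against the frozen ensemble, and kept only if they improve a complexity-regularized objective.

    # Schematic AdaNet-style search loop (hypothetical names, not the adanet API).
    def adanet_style_search(generate_candidates, train_candidate, objective, rounds):
        """Grow an ensemble of subnetworks one candidate at a time.

        generate_candidates(ensemble): propose new subnetworks, e.g. a deeper and
            a wider variant of the most recently added one.
        train_candidate(ensemble, candidate): train the candidate with the current
            ensemble held fixed; return the trained candidate.
        objective(ensemble, candidate): complexity-regularized ensemble objective
            (lower is better).
        """
        ensemble, best_value = [], float("inf")
        for _ in range(rounds):
            trained = [train_candidate(ensemble, c) for c in generate_candidates(ensemble)]
            values = [objective(ensemble, c) for c in trained]
            value, winner = min(zip(values, trained), key=lambda vc: vc[0])
            if value >= best_value:      # no candidate improves the objective: stop
                break
            ensemble.append(winner)
            best_value = value
        return ensemble

In the actual framework the candidate generation, training, and objective are supplied by TensorFlow components, and training can be distributed across CPU, GPU, or TPU workers as noted above.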


Efficient Gradient Computation for Structured Output Learning with Rational and Tropical Losses

Neural Information Processing Systems

Many structured prediction problems admit a natural loss function for evaluation such as the edit-distance or n-gram loss. However, existing learning algorithms are typically designed to optimize alternative objectives such as the cross-entropy. This is because a naïve implementation of the natural loss functions often results in intractable gradient computations. In this paper, we design efficient gradient computation algorithms for two broad families of structured prediction loss functions: rational and tropical losses. These families include as special cases the n-gram loss, the edit-distance loss, and many other loss functions commonly used in natural language processing and computational biology tasks that are based on sequence similarity measures. Our algorithms make use of weighted automata and graph operations over appropriate semirings to design efficient solutions. They facilitate efficient gradient computation and hence enable one to train learning models such as neural networks with complex structured losses.
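
As a small, self-contained illustration of the "tropical" side (a toy example of mine, not the paper's automata-based algorithm): the edit-distance is a shortest-distance computation in the tropical (min, +) semiring, which is the structural property that automata and graph operations over semirings exploit.

    # Toy illustration: edit distance as a dynamic program over the tropical
    # (min, +) semiring. The paper works with weighted automata; this only
    # shows the semiring viewpoint on the loss itself.
    def edit_distance(a, b):
        """Levenshtein distance between sequences a and b."""
        dp = list(range(len(b) + 1))          # distances for the previous row
        for i in range(1, len(a) + 1):
            prev_diag, dp[0] = dp[0], i
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                prev_diag, dp[j] = dp[j], min(dp[j] + 1,         # deletion
                                              dp[j - 1] + 1,     # insertion
                                              prev_diag + cost)  # match/substitution
        return dp[-1]

    print(edit_distance("kitten", "sitting"))  # 3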


Foundations of Sequence-to-Sequence Modeling for Time Series

arXiv.org Artificial Intelligence

The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practitioners choosing between different modeling methodologies.
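
For readers unfamiliar with the setup being analyzed, here is a minimal sketch of how a univariate series is typically cast as a sequence-to-sequence learning problem (a generic framing, not the paper's notation): each training example pairs a window of past values with the window of values to be forecast.

    import numpy as np

    def make_seq2seq_examples(series, input_len, output_len):
        """Slice a 1-D series into (past window, future window) training pairs."""
        xs, ys = [], []
        for start in range(len(series) - input_len - output_len + 1):
            xs.append(series[start : start + input_len])
            ys.append(series[start + input_len : start + input_len + output_len])
        return np.array(xs), np.array(ys)

    series = np.sin(np.linspace(0.0, 20.0, 500))             # toy series
    X, Y = make_seq2seq_examples(series, input_len=48, output_len=12)
    print(X.shape, Y.shape)                                   # (441, 48) (441, 12)

A classical autoregressive approach, by contrast, typically predicts one step at a time and is iterated to produce multi-step forecasts.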


Online Non-Additive Path Learning under Full and Partial Information

arXiv.org Machine Learning

We consider the online path learning problem in a graph with non-additive gains/losses. Various settings of full information, semi-bandit, and full bandit are explored. We give an efficient implementation of the EXP3 algorithm for the full bandit setting with any (non-additive) gain. Then, focusing on the large family of non-additive count-based gains, we construct an intermediate graph which has equivalent gains that are additive. By operating on this intermediate graph, we are able to use algorithms like Component Hedge and ComBand for the first time for non-additive gains. Finally, we apply our methods to the important application of ensemble structured prediction.
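
For context, here is a minimal sketch of the standard EXP3 update for a bandit over K arms (the generic textbook version; the paper's contribution is making such updates efficient over the exponentially many paths of a graph with non-additive gains, which this sketch does not attempt):

    import math
    import random

    def exp3(num_arms, gain, rounds, gamma=0.1):
        """Standard EXP3 for adversarial bandits with gains in [0, 1].

        gain(t, arm) -> observed gain of the pulled arm at round t.
        """
        weights = [1.0] * num_arms
        for t in range(rounds):
            total = sum(weights)
            probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
            arm = random.choices(range(num_arms), weights=probs)[0]
            # Importance-weighted gain estimate keeps the update unbiased.
            estimate = gain(t, arm) / probs[arm]
            weights[arm] *= math.exp(gamma * estimate / num_arms)
        total = sum(weights)
        return [w / total for w in weights]

    # Toy usage: arm 2 has slightly higher gains on average.
    dist = exp3(num_arms=3, rounds=5000,
                gain=lambda t, a: min(1.0, random.random() + (0.2 if a == 2 else 0.0)))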


Discriminative State Space Models

Neural Information Processing Systems

In this work, we introduce and study Discriminative State-Space Models (DSSMs). We provide the precise mathematical definition of this class of models in Section 2. Roughly speaking, a DSSM follows the same general structure as a classical state-space model and consists of a state predictor g and an observation predictor h. However, no assumption is made about the form of the stochastic process used to generate observations. This family of models includes existing generative models and other state-based discriminative models (e.g., RNNs) as special cases, and it also covers some novel algorithmic solutions explored in this paper.
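
A minimal sketch of the structure being described (the functions below are generic stand-ins of mine, not the concrete models studied in the paper): a state predictor g updates an internal state from the previous state and the latest observation, an observation predictor h maps the current state to a forecast, and nothing is assumed about how the observations themselves were generated.

    def dssm_forecasts(observations, g, h, initial_state):
        """Run a generic discriminative state-space predictor over a sequence.

        g(state, observation) -> next state       (state predictor)
        h(state)              -> predicted value  (observation predictor)
        """
        state, predictions = initial_state, []
        for y in observations:
            predictions.append(h(state))   # forecast before observing y
            state = g(state, y)            # then fold y into the state
        return predictions

    # Toy instantiation: simple exponential smoothing viewed this way.
    preds = dssm_forecasts(
        observations=[1.0, 2.0, 1.5, 3.0],
        g=lambda s, y: 0.7 * s + 0.3 * y,
        h=lambda s: s,
        initial_state=0.0,
    )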


Structured Prediction Theory Based on Factor Graph Complexity

Neural Information Processing Systems

We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, factor graph complexity, which we show can be estimated from data and bounded in terms of familiar quantities for several commonly used hypothesis sets, and a sparsity measure for features and graphs. Our proof techniques include generalizations of Talagrand's contraction lemma that can be of independent interest. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to devise two new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured Boosting (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.
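
To anchor the terminology, here is a schematic example under my own simplified notation (not the paper's definitions): with a factor graph decomposition, the score of a structured output is a sum of factor-local scores, and the margin of a labeled example is the gap between the score of the correct output and that of the best competing output.

    def factor_score(x, y, factors, score_fn):
        """Score a structured output y as a sum of factor-local scores.

        factors:  index tuples, e.g. [(0, 1), (1, 2)] for a chain over 3 positions.
        score_fn: score_fn(x, factor, y_factor) -> local score of the labels of y
                  restricted to that factor.
        """
        return sum(score_fn(x, f, tuple(y[i] for i in f)) for f in factors)

    def structured_margin(x, y_true, candidates, factors, score_fn):
        """Gap between the true output's score and the best competing candidate."""
        true_score = factor_score(x, y_true, factors, score_fn)
        best_rival = max(factor_score(x, y, factors, score_fn)
                         for y in candidates if y != y_true)
        return true_score - best_rival

The factor graph complexity mentioned above is, roughly, a Rademacher-style complexity of the family of such factor-decomposed scoring functions.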


Structured Prediction Theory Based on Factor Graph Complexity

arXiv.org Machine Learning

We present a general theoretical analysis of structured prediction with a series of new results. We give new data-dependent margin guarantees for structured prediction for a very wide family of loss functions and a general family of hypotheses, with an arbitrary factor graph decomposition. These are the tightest margin bounds known for both standard multi-class and general structured prediction problems. Our guarantees are expressed in terms of a data-dependent complexity measure, factor graph complexity, which we show can be estimated from data and bounded in terms of familiar quantities. We further extend our theory by leveraging the principle of Voted Risk Minimization (VRM) and show that learning is possible even with complex factor graphs. We present new learning bounds for this advanced setting, which we use to design two new algorithms, Voted Conditional Random Field (VCRF) and Voted Structured Boosting (StructBoost). These algorithms can make use of complex features and factor graphs and yet benefit from favorable learning guarantees. We also report the results of experiments with VCRF on several datasets to validate our theory.