AITopics | Kobyzev, Ivan

Kobyzev, Ivan

KronA: Parameter Efficient Tuning with Kronecker Adapter

Edalati, Ali, Tahaei, Marzieh, Kobyzev, Ivan, Nia, Vahid Partovi, Clark, James J., Rezagholizadeh, Mehdi

arXiv.org Artificial Intelligence, Dec-20-2022

Fine-tuning a Pre-trained Language Model (PLM) on a specific downstream task has been a well-known paradigm in Natural Language Processing. However, with the ever-growing size of PLMs, training the entire model on several downstream tasks becomes very expensive and resource-hungry. Recently, different Parameter Efficient Tuning (PET) techniques have been proposed to improve the efficiency of fine-tuning PLMs. One popular category of PET methods is low-rank adaptation, which inserts learnable truncated SVD modules into the original model either sequentially or in parallel. However, low-rank decomposition suffers from limited representation power. In this work, we address this problem by using the Kronecker product instead of the low-rank representation. We introduce KronA, a Kronecker product-based adapter module for efficient fine-tuning of Transformer-based PLMs. We apply the proposed methods to fine-tune T5 on the GLUE benchmark and show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
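The contrast between a low-rank update and a Kronecker-product update can be illustrated with a small NumPy sketch. This is not the KronA module from the paper: the hidden size `d`, the rank `r`, and the 8x8 factor shapes are assumptions chosen for illustration. It only demonstrates the algebraic point that a product `U @ V` is capped at rank `r`, while a Kronecker product of two small full-rank factors has full rank, here with fewer parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64   # hypothetical hidden size (assumption)
r = 2    # hypothetical low-rank bottleneck (assumption)

# Low-rank update: U @ V has rank at most r.
U = rng.standard_normal((d, r))
V = rng.standard_normal((r, d))
low_rank_update = U @ V

# Kronecker update built from two small 8x8 factors.
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
kron_update = np.kron(A, B)            # shape (64, 64)

low_rank_params = U.size + V.size      # 2 * d * r
kron_params = A.size + B.size          # 64 + 64

low_rank_rank = np.linalg.matrix_rank(low_rank_update)
kron_rank = np.linalg.matrix_rank(kron_update)   # equals rank(A) * rank(B)

# The Kronecker update can be applied without building the 64x64 matrix:
# with row-major reshaping, kron(A, B) @ x == (A @ X @ B.T).ravel(),
# where X = x.reshape(8, 8).
x = rng.standard_normal(d)
y_dense = kron_update @ x
y_fast = (A @ x.reshape(8, 8) @ B.T).reshape(-1)

print(low_rank_params, low_rank_rank)
print(kron_params, kron_rank)
```

The reshaping identity in the last lines is what makes Kronecker-structured layers cheap to apply; how the paper places and scales such modules inside a Transformer is not covered by this sketch.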

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2212.10650

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging

Lu, Peng, Kobyzev, Ivan, Rezagholizadeh, Mehdi, Rashid, Ahmad, Ghodsi, Ali, Langlais, Philippe

arXiv.org Artificial Intelligence, Dec-16-2022

Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
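The core mechanic of weight averaging can be sketched on a toy problem. The following is a generic illustration, not the adaptation described in the paper: the quadratic loss, the noise model, and the names `sgd_with_swa` and `swa_start` are assumptions. It runs noisy gradient descent and keeps a running average of the iterates after a warm-up point.

```python
import numpy as np

def sgd_with_swa(grad_fn, w0, lr, steps, swa_start, rng):
    """Noisy gradient descent that also tracks a running average of iterates."""
    w = np.array(w0, dtype=float)
    w_swa = np.zeros_like(w)
    snapshots = []                      # kept only to check the running average
    for t in range(steps):
        noisy_grad = grad_fn(w) + rng.standard_normal(w.shape)
        w = w - lr * noisy_grad
        if t >= swa_start:
            snapshots.append(w.copy())
            # Incremental mean: avg_n = avg_{n-1} + (w - avg_{n-1}) / n
            w_swa += (w - w_swa) / len(snapshots)
    return w, w_swa, snapshots

rng = np.random.default_rng(0)
grad_fn = lambda w: w                   # gradient of the toy loss 0.5 * ||w||^2
w_last, w_swa, snapshots = sgd_with_swa(
    grad_fn, w0=np.ones(10), lr=0.1, steps=1000, swa_start=500, rng=rng
)

# Distance of each solution from the optimum at the origin.
print(np.linalg.norm(w_last), np.linalg.norm(w_swa))
```

On this toy loss the last iterate keeps jittering around the optimum because of gradient noise, while the average smooths the jitter out. Whether the same holds for a given fine-tuning setup depends on the learning-rate schedule and on when averaging starts.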

computational linguistic, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.05956

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

Jafari, Aref, Kobyzev, Ivan, Rezagholizadeh, Mehdi, Poupart, Pascal, Ghodsi, Ali

arXiv.org Artificial Intelligence, Dec-12-2022

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve the generalization of a small model (the student) by transferring knowledge from a larger model (the teacher). Although KD methods achieve state-of-the-art performance in numerous settings, they suffer from several problems that limit their performance. It has been shown in the literature that the capacity gap between the teacher and the student networks can make KD ineffective. Additionally, existing KD techniques do not mitigate the noise in the teacher's output: modeling the noisy behaviour of the teacher can distract the student from learning more useful features. We propose a new KD method that addresses these problems and makes training easier than with previous techniques. Inspired by continuation optimization, we design a training procedure that optimizes the highly non-convex KD objective by starting with a smoothed version of this objective and making it more complex as the training proceeds. Our method (Continuation-KD) achieves state-of-the-art performance across various compact architectures on NLU (GLUE benchmark) and computer vision tasks (CIFAR-10 and CIFAR-100).
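The general idea of continuation optimization, starting from a smoothed objective and gradually restoring the original one, can be shown on a one-dimensional toy function. This sketch is not the Continuation-KD objective or training schedule. The function `f`, the Gaussian smoothing, the linear annealing, and all constants are assumptions chosen so that the smoothed gradient has a closed form.

```python
import math

A_OSC, B_OSC = 2.0, 4.0   # amplitude and frequency of the ripple (assumptions)

def f(x):
    # Toy non-convex objective: a quadratic bowl plus an oscillation.
    return 0.5 * x * x + A_OSC * math.cos(B_OSC * x)

def smoothed_grad(x, sigma):
    # Gradient of E[f(x + sigma * z)] with z ~ N(0, 1).
    # Gaussian smoothing damps cos(B x) by exp(-(B * sigma)^2 / 2).
    damping = math.exp(-0.5 * (B_OSC * sigma) ** 2)
    return x - A_OSC * B_OSC * damping * math.sin(B_OSC * x)

def descend(x0, steps, lr, sigma_start):
    x = x0
    for t in range(steps):
        sigma = sigma_start * (1.0 - t / steps)   # anneal smoothing toward 0
        x -= lr * smoothed_grad(x, sigma)
    return x

x_plain = descend(4.0, steps=2000, lr=0.01, sigma_start=0.0)  # no smoothing
x_cont = descend(4.0, steps=2000, lr=0.01, sigma_start=1.0)   # continuation

print(x_plain, f(x_plain))
print(x_cont, f(x_cont))
```

With these settings, plain gradient descent settles in the nearest ripple, whereas the annealed run first follows the smooth bowl toward the centre and only later resolves the ripples, ending in a lower basin.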

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.05998

Genre: Research Report (0.82)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Equivariant Discrete Normalizing Flows

Bose, Avishek Joey, Kobyzev, Ivan

arXiv.org Artificial Intelligence, Oct-16-2021

At its core, generative modeling seeks to uncover the underlying factors that give rise to observed data. These factors can often be modelled as natural symmetries that manifest themselves through invariances and equivariances to certain transformation laws. However, current approaches are couched in the formalism of continuous normalizing flows, which require the construction of equivariant vector fields, inhibiting their simple application to conventional higher-dimensional generative modelling domains like natural images. In this paper we focus on building equivariant normalizing flows using discrete layers. We first theoretically prove the existence of an equivariant map for compact groups whose actions are on compact spaces. We further introduce two new equivariant flows: $G$-Coupling Flows and $G$-Residual Flows, which elevate classical Coupling and Residual Flows with equivariant maps to a prescribed group $G$. Our construction of $G$-Residual Flows is also universal, in the sense that we prove a $G$-equivariant diffeomorphism can be exactly mapped by a $G$-Residual Flow. Finally, we complement our theoretical insights with experiments, for the first time, on image datasets like CIFAR-10 and show that $G$-Equivariant Discrete Normalizing Flows lead to increased data efficiency, faster convergence, and improved likelihood estimates.
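For readers unfamiliar with the term, equivariance of a map $f$ to a group $G$ means $f(g \cdot x) = g \cdot f(x)$ for every $g$ in $G$. The sketch below checks this property numerically for the group of 90-degree image rotations. It uses plain group averaging to turn an arbitrary map into an equivariant one. That is a standard construction for finite groups and is not the paper's $G$-Coupling or $G$-Residual Flow; in particular, averaging does not guarantee invertibility. The map `f`, its weights, and the 4x4 input size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))   # position-dependent weights (assumption)
b = rng.standard_normal((n, n))

def f(x):
    # An arbitrary elementwise map; not rotation-equivariant on its own.
    return np.tanh(W * x + b)

def symmetrize(func):
    # Average g^{-1} . func(g . x) over the four rotations g.
    def func_G(x):
        terms = [np.rot90(func(np.rot90(x, k)), -k) for k in range(4)]
        return sum(terms) / 4.0
    return func_G

f_G = symmetrize(f)
x = rng.standard_normal((n, n))

plain_ok = np.allclose(f(np.rot90(x)), np.rot90(f(x)))
symm_ok = np.allclose(f_G(np.rot90(x)), np.rot90(f_G(x)))
print(plain_ok, symm_ok)
```

Rotating the input and then applying `f_G` gives the same result as applying `f_G` and then rotating, which is exactly the equivariance condition; the unsymmetrized `f` fails the same check.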

artificial intelligence, diffeomorphism, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2110.08649

Country: North America (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Normalizing Flows: Introduction and Ideas

Kobyzev, Ivan, Prince, Simon, Brubaker, Marcus A.

arXiv.org Machine Learning, Aug-25-2019

Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact. The goal of this survey article is to give a coherent and comprehensive review of the literature around the construction and use of Normalizing Flows for distribution learning. We aim to provide context and explanation of the models, review current state-of-the-art literature, and identify open questions and promising future directions.
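The phrase "sampling and density evaluation can be efficient and exact" rests on the change-of-variables formula: if $x = T(z)$ with $T$ invertible, then $\log p_X(x) = \log p_Z(T^{-1}(x)) + \log |\det J_{T^{-1}}(x)|$. A minimal sketch with a one-dimensional affine flow is given below; the function names and the choice of an affine map are illustrative assumptions, picked because the result can be checked against the closed-form Gaussian density.

```python
import math

def standard_normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def flow_forward(z, mu, sigma):
    # Sampling direction: push a base sample through the flow.
    return mu + sigma * z

def flow_inverse(x, mu, sigma):
    # Density direction: pull a data point back to the base space.
    return (x - mu) / sigma

def flow_logpdf(x, mu, sigma):
    z = flow_inverse(x, mu, sigma)
    log_det_inverse = -math.log(sigma)   # d z / d x = 1 / sigma
    return standard_normal_logpdf(z) + log_det_inverse

def normal_logpdf(x, mu, sigma):
    # Closed-form N(mu, sigma^2) log-density, used only as a reference.
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))

mu, sigma = 1.5, 0.7
for x in (-2.0, 0.0, 1.5, 3.0):
    print(x, flow_logpdf(x, mu, sigma), normal_logpdf(x, mu, sigma))
```

Practical flows replace the affine map with a composition of richer invertible layers, but the density computation keeps this shape: invert, evaluate the base density, add the log-determinant terms.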

arxiv, deep learning, upstream oil & gas, (22 more...)

arXiv.org Machine Learning

1908.09257

Country: North America > United States > California (0.68)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.94)
(2 more...)

Tails of Triangular Flows

Jaini, Priyank, Kobyzev, Ivan, Brubaker, Marcus, Yu, Yaoliang

arXiv.org Machine Learning, Jul-9-2019

Triangular maps are a construct in probability theory that allows the transformation of any source density to any target density. We consider flow-based models that learn these triangular transformations, which we call triangular flows, and study their properties with the goal of capturing heavy-tailed target distributions. In one dimension, we prove that the density quantile functions of the source and target density can characterize properties of the increasing push-forward transformation and show that no Lipschitz continuous increasing map can transform a light-tailed source to a heavy-tailed target density. We further precisely relate the asymptotic behavior of these density quantile functions with the existence of certain function moments of distributions. These results allow us to give a precise asymptotic rate at which an increasing transformation must grow to capture the tail properties of a target given the source distribution. In the multivariate case, we show that for any increasing triangular map transforming a light-tailed source density to a heavy-tailed target density, all eigenvalues of the Jacobian must be unbounded. Our analysis suggests the importance of the source distribution in capturing heavy-tailed distributions, and we discuss the implications for flow-based models.
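The one-dimensional claim can be made tangible with a concrete pair of distributions. In one dimension, the increasing map that pushes a source onto a target is the target quantile function composed with the source CDF. The sketch below picks a standard Gaussian source and a standard Cauchy target (both choices are assumptions for illustration, not taken from the paper) and estimates the slope of the map at a few points, showing that it keeps growing and so cannot be bounded by a single Lipschitz constant.

```python
import math

def gaussian_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cauchy_quantile(u):
    return math.tan(math.pi * (u - 0.5))

def gaussian_to_cauchy(z):
    # Increasing map T with T(Z) ~ Cauchy when Z ~ N(0, 1).
    return cauchy_quantile(gaussian_cdf(z))

def slope(z, h=1e-4):
    # Forward finite-difference estimate of T'(z).
    return (gaussian_to_cauchy(z + h) - gaussian_to_cauchy(z)) / h

points = [0.0, 1.0, 2.0, 3.0, 4.0]
slopes = [slope(z) for z in points]
for z, s in zip(points, slopes):
    print(z, gaussian_to_cauchy(z), s)
```

The slope climbs by orders of magnitude over a few standard deviations, which is the behaviour the abstract's Lipschitz result rules out for any bounded-slope map.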

artificial intelligence, banking & finance, transformation, (19 more...)

arXiv.org Machine Learning

1907.04481

Country: North America (0.14)

Genre: Research Report (0.64)

Industry: Banking & Finance (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Relational Representation Learning for Dynamic (Knowledge) Graphs: A Survey

Kazemi, Seyed Mehran, Goel, Rishab, Jain, Kshitij, Kobyzev, Ivan, Sethi, Akshay, Forsyth, Peter, Poupart, Pascal

arXiv.org Machine Learning, May-27-2019

Graphs arise naturally in many real-world applications including social networks, recommender systems, ontologies, biology, and computational finance. Traditionally, machine learning models for graphs have been mostly designed for static graphs. However, many applications involve evolving graphs. This introduces important challenges for learning and inference since nodes, attributes, and edges change over time. In this survey, we review the recent advances in representation learning for dynamic graphs, including dynamic knowledge graphs. We describe existing models from an encoder-decoder perspective, categorize these encoders and decoders based on the techniques they employ, and analyze the approaches in each category. We also review several prominent applications and widely used datasets, and highlight directions for future research.

deep learning, graph, neural network, (21 more...)

arXiv.org Machine Learning

1905.11485

Country:

North America > United States > California (0.28)
North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Overview (1.00)

Industry:

Education (0.92)
Information Technology > Services (0.66)
Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(5 more...)
