Pokle, Ashwini
Consistency Models Made Easy
Geng, Zhengyang, Pokle, Ashwini, Luo, William, Lin, Justin, Kolter, J. Zico
Consistency models (CMs) are an emerging class of generative models that offer faster sampling than traditional diffusion models. CMs enforce that all points along a sampling trajectory are mapped to the same initial point. But this target leads to resource-intensive training: for example, as of 2024, training a SoTA CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an alternative scheme for training CMs, vastly improving the efficiency of building such models. Specifically, by expressing CM trajectories via a particular differential equation, we argue that diffusion models can be viewed as a special case of CMs with a specific discretization. We can thus fine-tune a consistency model starting from a pre-trained diffusion model and progressively approximate the full consistency condition to stronger degrees over the training process. Our resulting method, which we term Easy Consistency Tuning (ECT), achieves vastly improved training times while indeed improving upon the quality of previous methods: for example, ECT achieves a 2-step FID of 2.73 on CIFAR-10 within 1 hour on a single A100 GPU, matching Consistency Distillation trained using hundreds of GPU hours. Owing to this computational efficiency, we investigate the scaling laws of CMs under ECT, showing that they seem to obey classic power-law scaling, hinting at their ability to improve efficiency and performance at larger scales. Code (https://github.com/locuslab/ect) is available.
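As a rough illustration of the consistency condition described above, the sketch below shows one training step in PyTorch-style code: the model's outputs at two nearby noise levels on the same trajectory are pulled together, with the less-noisy branch treated as a frozen target. The denoiser f_theta, the noise-level tensors, and the plain squared-error loss are assumptions for illustration; the authors' actual implementation is in the linked repository.

    import torch
    import torch.nn.functional as F

    def consistency_tuning_loss(f_theta, x0, t, r):
        """Illustrative consistency-tuning step: the model's outputs at two
        noise levels t > r on the same trajectory should agree. f_theta(x, t)
        is a hypothetical denoiser initialized from a pretrained diffusion
        model; shrinking the gap t - r over training progressively tightens
        the consistency condition."""
        noise = torch.randn_like(x0)
        x_t = x0 + t.view(-1, 1, 1, 1) * noise   # noisier point on the trajectory
        x_r = x0 + r.view(-1, 1, 1, 1) * noise   # less noisy point, same trajectory
        with torch.no_grad():                    # target branch carries no gradient
            target = f_theta(x_r, r)
        return F.mse_loss(f_theta(x_t, t), target)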
One-Step Diffusion Distillation via Deep Equilibrium Models
Geng, Zhengyang, Pokle, Ashwini, Kolter, J. Zico
Diffusion models excel at producing high-quality samples but naively require hundreds of iterations, prompting multiple attempts to distill the generation process into a faster network. However, many existing approaches suffer from a variety of challenges: the distillation training process can be complex, often requiring multiple training stages, and the resulting models perform poorly when utilized in single-step generative applications. In this paper, we introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image. Of particular importance to our approach is the use of a new Deep Equilibrium (DEQ) model as the distilled architecture: the Generative Equilibrium Transformer (GET). Our method enables fully offline training with just noise/image pairs from the diffusion model, while achieving superior performance compared to existing one-step methods on comparable training budgets. We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores while striking a critical balance of computational cost and image quality. Code, checkpoints, and datasets are available.
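At its simplest, the offline distillation setup described above reduces to supervised regression from noise to the teacher's output. The sketch below, with a placeholder get_model standing in for the Generative Equilibrium Transformer and a plain squared-error loss, is only meant to show the noise/image-pair training loop, not the exact objective used in the paper.

    import torch
    import torch.nn.functional as F

    def distill_step(get_model, optimizer, noise, image):
        """One offline distillation step, assuming (noise, image) pairs were
        generated once by the teacher diffusion sampler and stored. get_model
        is a placeholder one-step generator (e.g., a DEQ-style network)."""
        optimizer.zero_grad()
        pred = get_model(noise)            # single forward pass: noise -> image
        loss = F.mse_loss(pred, image)     # regress onto the teacher's sample
        loss.backward()
        optimizer.step()
        return loss.item()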
Deep Equilibrium Based Neural Operators for Steady-State PDEs
Marwah, Tanya, Pokle, Ashwini, Kolter, J. Zico, Lipton, Zachary C., Lu, Jianfeng, Risteski, Andrej
Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We seek to remedy this gap by studying the benefits of weight-tied neural network architectures for steady-state PDEs. To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer, using a black-box root solver and differentiating analytically through this fixed point, resulting in $\mathcal{O}(1)$ training memory. Our experiments indicate that FNO-DEQ-based architectures outperform FNO-based baselines with $4\times$ the number of parameters in predicting the solution to steady-state PDEs such as Darcy Flow and steady-state incompressible Navier-Stokes. Finally, we show that FNO-DEQ is more robust than the FNO-based baselines when trained on datasets with noisier observations, demonstrating the benefits of appropriate inductive biases in the architectural design of neural-network-based PDE solvers. Further, we prove a universal approximation result showing that FNO-DEQ can approximate the solution to any steady-state PDE that can be written as a fixed-point equation.
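To make the $\mathcal{O}(1)$ training-memory claim concrete, the sketch below shows the generic deep-equilibrium pattern it relies on: iterate a weight-tied operator layer to a fixed point without building a computation graph, then attach gradients through one final differentiable application. The placeholder layer stands in for an FNO block, and the one-step gradient is a simple stand-in for the analytic implicit differentiation used in the paper.

    import torch

    def deq_fixed_point(layer, x, z0, n_iters=32, tol=1e-4):
        """Solve z* = layer(z*, x) by fixed-point iteration. The iterations run
        under no_grad, so memory does not grow with depth; gradients flow only
        through one extra application of the layer (a cheap approximation to
        exact implicit differentiation)."""
        z = z0
        with torch.no_grad():
            for _ in range(n_iters):
                z_next = layer(z, x)
                if (z_next - z).norm() < tol * z.norm().clamp_min(1e-8):
                    z = z_next
                    break
                z = z_next
        return layer(z, x)   # single differentiable step attaches the graph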
Training-free Linear Image Inversion via Flows
Pokle, Ashwini, Muckley, Matthew J., Chen, Ricky T. Q., Karrer, Brian
Training-free linear inversion involves the use of a pretrained generative model and -- through appropriate modifications to the generation process -- solving inverse problems without any finetuning of the generative model. While recent prior methods have explored the use of diffusion models, they still require manual tuning of many hyperparameters for different inverse problems. In this work, we propose a training-free method for image inversion using pretrained flow models, leveraging the simplicity and efficiency of Flow Matching models together with theoretically-justified weighting schemes, thereby significantly reducing the amount of manual tuning. In particular, we draw inspiration from two main sources: adapting prior gradient-correction methods to the flow regime, and a solver scheme based on conditional Optimal Transport paths. As pretrained diffusion models are widely accessible, we also show how to practically adapt diffusion models for our method. Empirically, our approach requires no problem-specific tuning across an extensive suite of noisy linear image inversion problems on high-dimensional datasets (ImageNet-64/128 and AFHQ-256), and we observe that our flow-based method for image inversion significantly improves upon closely-related diffusion-based linear inversion methods.
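The following sketch conveys the general recipe such training-free methods follow, under the common flow-matching convention x_t = (1 - t) x0 + t x1 with x1 the noise: integrate the pretrained velocity field from noise to data while nudging each step with the gradient of the measurement error. The constant guidance weight, step rule, and names are illustrative placeholders, not the paper's theoretically-justified weighting scheme.

    import torch

    def guided_flow_inversion(velocity, A, y, x1, n_steps=100, guidance=1.0):
        """Integrate a pretrained flow from noise x1 (t = 1) toward data (t = 0),
        correcting each step toward consistency with measurements y = A(x).
        velocity(x, t) is the pretrained velocity field and A the known linear
        forward operator; both are placeholders for illustration."""
        x = x1
        dt = 1.0 / n_steps
        for i in range(n_steps):
            t = 1.0 - i * dt
            x = x.detach().requires_grad_(True)
            v = velocity(x, t)
            x0_hat = x - t * v                        # crude estimate of the clean image
            err = ((A(x0_hat) - y) ** 2).sum()        # measurement mismatch
            grad = torch.autograd.grad(err, x)[0]
            x = (x - dt * v - guidance * grad).detach()   # ODE step plus correction
        return x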
Path Independent Equilibrium Models Can Better Exploit Test-Time Computation
Anil, Cem, Pokle, Ashwini, Liang, Kaiqu, Treutlein, Johannes, Wu, Yuhuai, Bai, Shaojie, Kolter, Zico, Grosse, Roger
Designing networks capable of attaining better performance with an increased inference budget is important to facilitate generalization to harder problem instances. Recent efforts have shown promising results in this direction by making use of depth-wise recurrent networks. We show that a broad class of architectures named equilibrium models display strong upwards generalization, and find that stronger performance on harder examples (which require more iterations of inference to get correct) strongly correlates with the path independence of the system -- its tendency to converge to the same steady-state behaviour regardless of initialization, given enough computation. Experimental interventions made to promote path independence result in improved generalization on harder problem instances, while those that penalize it degrade this ability. Path independence analyses are also useful on a per-example basis: for equilibrium models that have good in-distribution performance, path independence on out-of-distribution samples strongly correlates with accuracy. Our results help explain why equilibrium models are capable of strong upwards generalization and motivate future work that harnesses path independence as a general modelling principle to facilitate scalable test-time usage.
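Path independence, as used above, admits a simple per-example check: solve the equilibrium model's fixed point from two different initializations and compare the results. The sketch below does exactly that with a placeholder weight-tied cell and naive fixed-point iteration; the particular distance and solver are illustrative choices, not the paper's exact metric.

    import torch

    def path_independence_score(layer, x, shape, n_iters=64):
        """Relative distance between fixed points reached from a zero and a
        random initialization (lower = more path independent). layer is a
        placeholder for the weight-tied cell of an equilibrium model."""
        def solve(z):
            with torch.no_grad():
                for _ in range(n_iters):
                    z = layer(z, x)
            return z
        z_zero = solve(torch.zeros(shape))
        z_rand = solve(torch.randn(shape))
        return ((z_zero - z_rand).norm() / z_zero.norm().clamp_min(1e-8)).item()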
Deep Local Trajectory Replanning and Control for Robot Navigation
Pokle, Ashwini, Martín-Martín, Roberto, Goebel, Patrick, Chow, Vincent, Ewald, Hans M., Yang, Junwei, Wang, Zhenkai, Sadeghian, Amir, Sadigh, Dorsa, Savarese, Silvio, Vázquez, Marynel
We present a navigation system that combines ideas from hierarchical planning and machine learning. The system uses a traditional global planner to compute optimal paths towards a goal, and a deep local trajectory planner and velocity controller to compute motion commands. The latter components of the system adjust the behavior of the robot through attention mechanisms such that it moves towards the goal, avoids obstacles, and respects the space of nearby pedestrians. Both the structure of the proposed deep models and the use of attention mechanisms make the system's execution interpretable. Our simulation experiments suggest that the proposed architecture outperforms baselines that try to map global plan information and sensor data directly to velocity commands. In comparison to a hand-designed traditional navigation system, the proposed approach showed more consistent performance.
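The hierarchy described above can be summarized as a single control cycle: a classical global planner supplies a coarse path, a learned local planner re-plans a short trajectory from current sensor readings, and a learned controller turns that trajectory into velocity commands. The sketch below is purely structural; every object and method name in it is a placeholder rather than a component of the actual system.

    def navigation_step(global_planner, local_planner, controller, robot, goal):
        """One control cycle of a hierarchical navigation pipeline (placeholder
        components): plan globally on the map, re-plan locally from sensor
        data, then emit linear/angular velocity commands."""
        scan = robot.read_lidar()                    # latest range scan
        pose = robot.pose()                          # current robot pose
        path = global_planner.plan(pose, goal)       # coarse path toward the goal
        local_traj = local_planner(scan, path[:10])  # short re-planned trajectory
        v, w = controller(scan, local_traj)          # velocity command
        robot.send_velocity(v, w)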
Translating Navigation Instructions in Natural Language to a High-Level Plan for Behavioral Robot Navigation
Zang, Xiaoxue, Pokle, Ashwini, Vázquez, Marynel, Chen, Kevin, Niebles, Juan Carlos, Soto, Alvaro, Savarese, Silvio
We propose an end-to-end deep learning model for translating free-form natural language instructions to a high-level plan for behavioral robot navigation. The proposed model uses attention mechanisms to connect information from user instructions with a topological representation of the environment. To evaluate this model, we collected a new dataset for the translation problem containing 11,051 pairs of user instructions and navigation plans. Our results show that the proposed model outperforms baseline approaches on the new dataset. Overall, our work suggests that a topological map of the environment can serve as a relevant knowledge base for translating natural language instructions into a sequence of navigation behaviors.