AITopics | Kumar, Abhishek

Collaborating Authors

Kumar, Abhishek

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Early Weight Averaging meets High Learning Rates for LLM Pre-training

Sanyal, Sunny, Neerkaje, Atula, Kaddour, Jean, Kumar, Abhishek, Sanghavi, Sujay

arXiv.org Artificial IntelligenceDec-11-2023

Training Large Language Models (LLMs) incurs significant cost; hence, any strategy that accelerates model convergence is helpful. In this paper, we investigate the ability of a simple idea checkpoint averaging along the trajectory of a training run to improve both convergence and generalization quite early on during training. Here we show that models trained with high learning rates observe higher gains due to checkpoint averaging. Furthermore, these gains are amplified when checkpoints are sampled with considerable spacing in training steps. Our training recipe outperforms conventional training and popular checkpoint averaging baselines such as exponential moving average (EMA) and stochastic moving average (SWA). We evaluate our training recipe by pre-training LLMs, where high learning rates are inherently preferred due to extremely large batch sizes. Specifically, we pre-trained nanoGPT-2 models of varying sizes, small (125M), medium (335M), and large (770M)on the OpenWebText dataset, comprised of 9B tokens. Additionally, we present results for publicly available Pythia LLMs, ranging from 1B to 12B, which were trained on the PILE-deduped dataset containing 207B tokens.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.03241

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion

Rout, Litu, Chen, Yujia, Kumar, Abhishek, Caramanis, Constantine, Shakkottai, Sanjay, Chu, Wen-Sheng

arXiv.org Machine LearningDec-1-2023

Sampling from the posterior distribution poses a major computational challenge in solving inverse problems using latent diffusion models. Common methods rely on Tweedie's first-order moments, which are known to induce a quality-limiting bias. Existing second-order approximations are impractical due to prohibitive computational costs, making standard reverse diffusion processes intractable for posterior sampling. This paper introduces Second-order Tweedie sampler from Surrogate Loss (STSL), a novel sampler that offers efficiency comparable to first-order Tweedie with a tractable reverse process using second-order approximation. Our theoretical results reveal that the second-order approximation is lower bounded by our surrogate loss that only requires $O(1)$ compute using the trace of the Hessian, and by the lower bound we derive a new drift term to make the reverse process tractable. Our method surpasses SoTA solvers PSLD and P2L, achieving 4X and 8X reduction in neural function evaluations, respectively, while notably enhancing sampling quality on FFHQ, ImageNet, and COCO benchmarks. In addition, we show STSL extends to text-guided image editing and addresses residual distortions present from corrupted images in leading text-guided image editing methods. To our best knowledge, this is the first work to offer an efficient second-order approximation in solving inverse problems using latent diffusion and editing real-world images with corruptions.

artificial intelligence, editing, machine learning, (17 more...)

arXiv.org Machine Learning

2312.00852

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Towards Last-layer Retraining for Group Robustness with Fewer Annotations

LaBonte, Tyler, Muthukumar, Vidya, Kumar, Abhishek

arXiv.org Artificial IntelligenceNov-14-2023

Empirical risk minimization (ERM) of neural networks is prone to over-reliance on spurious correlations and poor generalization on minority groups. The recent deep feature reweighting (DFR) technique achieves state-of-the-art group robustness via simple last-layer retraining, but it requires held-out group and class annotations to construct a group-balanced reweighting dataset. In this work, we examine this impractical requirement and find that last-layer retraining can be surprisingly effective with no group annotations (other than for model selection) and only a handful of class annotations. We first show that last-layer retraining can greatly improve worst-group accuracy even when the reweighting dataset has only a small proportion of worst-group data. This implies a "free lunch" where holding out a subset of training data to retrain the last layer can substantially outperform ERM on the entire dataset with no additional data or annotations. To further improve group robustness, we introduce a lightweight method called selective last-layer finetuning (SELF), which constructs the reweighting dataset using misclassifications or disagreements. Our empirical and theoretical results present the first evidence that model disagreement upsamples worst-group data, enabling SELF to nearly match DFR on four well-established benchmarks across vision and language tasks with no group annotations and less than 3% of the held-out class annotations. Our code is available at https://github.com/tmlabonte/last-layer-retraining.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2309.08534

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Enhancing Clustering Representations with Positive Proximity and Cluster Dispersion Learning

Kumar, Abhishek, Lee, Dong-Gyu

arXiv.org Artificial IntelligenceNov-1-2023

Contemporary deep clustering approaches often rely on either contrastive or non-contrastive techniques to acquire effective representations for clustering tasks. Contrastive methods leverage negative pairs to achieve homogenous representations but can introduce class collision issues, potentially compromising clustering performance. On the contrary, non-contrastive techniques prevent class collisions but may produce non-uniform representations that lead to clustering collapse. In this work, we propose a novel end-to-end deep clustering approach named PIPCDR, designed to harness the strengths of both approaches while mitigating their limitations. PIPCDR incorporates a positive instance proximity loss and a cluster dispersion regularizer. The positive instance proximity loss ensures alignment between augmented views of instances and their sampled neighbors, enhancing within-cluster compactness by selecting genuinely positive pairs within the embedding space. Meanwhile, the cluster dispersion regularizer maximizes inter-cluster distances while minimizing within-cluster compactness, promoting uniformity in the learned representations. PIPCDR excels in producing well-separated clusters, generating uniform representations, avoiding class collision issues, and enhancing within-cluster compactness. We extensively validate the effectiveness of PIPCDR within an end-to-end Majorize-Minimization framework, demonstrating its competitive performance on moderate-scale clustering benchmark datasets and establishing new state-of-the-art results on large-scale datasets.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2311.00731

Genre:

Overview (0.93)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.69)
(3 more...)

Add feedback

Small-scale proxies for large-scale Transformer training instabilities

Wortsman, Mitchell, Liu, Peter J., Xiao, Lechao, Everett, Katie, Alemi, Alex, Adlam, Ben, Co-Reyes, John D., Gur, Izzeddin, Kumar, Abhishek, Novak, Roman, Pennington, Jeffrey, Sohl-dickstein, Jascha, Xu, Kelvin, Lee, Jaehoon, Gilmer, Justin, Kornblith, Simon

arXiv.org Artificial IntelligenceOct-16-2023

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training stability and instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the $\mu$Param (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model activation and gradient norms.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2309.14322

Country: Europe > Spain (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Distributionally Robust Post-hoc Classifiers under Prior Shifts

Wei, Jiaheng, Narasimhan, Harikrishna, Amid, Ehsan, Chu, Wen-Sheng, Liu, Yang, Kumar, Abhishek

arXiv.org Artificial IntelligenceSep-15-2023

We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike existing methods, which optimize for either the worst or the average performance over classes or groups, our work is motivated by the need for finer control over the robustness properties of the model. We present an extremely lightweight post-hoc approach that performs scaling adjustments to predictions from a pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. These adjustments are computed by solving a constrained optimization problem on a validation set and applied to the model during test time. Our constrained optimization objective is inspired from a natural notion of robustness to controlled distribution shifts. Our method comes with provable guarantees and empirically makes a strong case for distributional robust post-hoc classifiers.

artificial intelligence, drop, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2309.08825

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

RemoteNet: Remote Sensing Image Segmentation Network based on Global-Local Information

Kumar, Satyawant, Kumar, Abhishek, Lee, Dong-Gyu

arXiv.org Artificial IntelligenceAug-14-2023

Remotely captured images possess an immense scale and object appearance variability due to the complex scene. It becomes challenging to capture the underlying attributes in the global and local context for their segmentation. Existing networks struggle to capture the inherent features due to the cluttered background. To address these issues, we propose a remote sensing image segmentation network, RemoteNet, for semantic segmentation of remote sensing images. We capture the global and local features by leveraging the benefits of the transformer and convolution mechanisms. RemoteNet is an encoder-decoder design that uses multi-scale features. We construct an attention map module to generate channel-wise attention scores for fusing these features. We construct a global-local transformer block (GLTB) in the decoder network to support learning robust representations during a decoding phase. Further, we designed a feature refinement module to refine the fused output of the shallow stage encoder feature and the deepest GLTB feature of the decoder. Experimental findings on the two public datasets show the effectiveness of the proposed RemoteNet.

artificial intelligence, machine learning, segmentation, (18 more...)

arXiv.org Artificial Intelligence

2302.13084

Genre: Research Report (0.64)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Learnable Digital Twin for Efficient Wireless Network Evaluation

Li, Boning, Efimov, Timofey, Kumar, Abhishek, Cortes, Jose, Verma, Gunjan, Swami, Ananthram, Segarra, Santiago

arXiv.org Artificial IntelligenceJun-10-2023

Network digital twins (NDTs) facilitate the estimation of key performance indicators (KPIs) before physically implementing a network, thereby enabling efficient optimization of the network configuration. In this paper, we propose a learning-based NDT for network simulators. The proposed method offers a holistic representation of information flow in a wireless network by integrating node, edge, and path embeddings. Through this approach, the model is trained to map the network configuration to KPIs in a single forward pass. Hence, it offers a more efficient alternative to traditional simulation-based methods, thus allowing for rapid experimentation and optimization. Our proposed method has been extensively tested through comprehensive experimentation in various scenarios, including wired and wireless networks. Results show that it outperforms baseline learning models in terms of accuracy and robustness. Moreover, our approach achieves comparable performance to simulators but with significantly higher computational efficiency.

artificial intelligence, machine learning, node, (19 more...)

arXiv.org Artificial Intelligence

2306.06574

Country: North America > United States > New York (0.14)

Genre: Research Report (1.00)

Industry: Telecommunications > Networks (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Add feedback

Score-based Causal Representation Learning with Interventions

Varici, Burak, Acarturk, Emre, Shanmugam, Karthikeyan, Kumar, Abhishek, Tajer, Ali

arXiv.org Artificial IntelligenceMay-1-2023

This paper studies the causal representation learning problem when the latent causal variables are observed indirectly through an unknown linear transformation. The objectives are: (i) recovering the unknown linear transformation (up to scaling) and (ii) determining the directed acyclic graph (DAG) underlying the latent variables. Sufficient conditions for DAG recovery are established, and it is shown that a large class of non-linear models in the latent space (e.g., causal mechanisms parameterized by two-layer neural networks) satisfy these conditions. These sufficient conditions ensure that the effect of an intervention can be detected correctly from changes in the score. Capitalizing on this property, recovering a valid transformation is facilitated by the following key property: any valid transformation renders latent variables' score function to necessarily have the minimal variations across different interventional environments. This property is leveraged for perfect recovery of the latent DAG structure using only \emph{soft} interventions. For the special case of stochastic \emph{hard} interventions, with an additional hypothesis testing step, one can also uniquely recover the linear transformation up to scaling and a valid causal ordering.

artificial intelligence, intervention, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2301.0823

Country:

North America > United States (1.00)
Asia (0.67)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration

Kokkonen, Henna, Lovén, Lauri, Motlagh, Naser Hossein, Kumar, Abhishek, Partala, Juha, Nguyen, Tri, Pujol, Víctor Casamayor, Kostakos, Panos, Leppänen, Teemu, González-Gil, Alfonso, Sola, Ester, Angulo, Iñigo, Liyanage, Madhusanka, Bennis, Mehdi, Tarkoma, Sasu, Dustdar, Schahram, Pirttikangas, Susanna, Riekki, Jukka

arXiv.org Artificial IntelligenceFeb-17-2023

Future AI applications require performance, reliability and privacy that the existing, cloud-dependant system architectures cannot provide. In this article, we study orchestration in the device-edge-cloud continuum, and focus on edge AI for resource orchestration. We claim that to support the constantly growing requirements of intelligent applications in the device-edge-cloud computing continuum, resource orchestration needs to embrace edge AI and emphasize local autonomy and intelligence. To justify the claim, we provide a general definition for continuum orchestration, and look at how current and emerging orchestration paradigms are suitable for the computing continuum. We describe certain major emerging research themes that may affect future orchestration, and provide an early vision of an orchestration paradigm that embraces those research themes. Finally, we survey current key edge AI methods and look at how they may contribute into fulfilling the vision of future continuum orchestration.

data mining, machine learning, reinforcement learning, (23 more...)

arXiv.org Artificial Intelligence

2205.01423

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.67)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)
Research Report > Promising Solution (0.45)

Industry:

Telecommunications (1.00)
Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
(5 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Game Theory (1.00)
(17 more...)

Add feedback