AITopics | Mishra, Aayush

Collaborating Authors

Mishra, Aayush

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robust Amortized Bayesian Inference with Self-Consistency Losses on Unlabeled Data

Mishra, Aayush, Habermann, Daniel, Schmitt, Marvin, Radev, Stefan T., Bürkner, Paul-Christian

arXiv.org Machine LearningFeb-11-2025

Neural amortized Bayesian inference (ABI) can solve probabilistic inverse problems orders of magnitude faster than classical methods. However, neural ABI is not yet sufficiently robust for widespread and safe applicability. In particular, when performing inference on observations outside of the scope of the simulated data seen during training, for example, because of model misspecification, the posterior approximations are likely to become highly biased. Due to the bad pre-asymptotic behavior of current neural posterior estimators in the out-of-simulation regime, the resulting estimation biases cannot be fixed in acceptable time by just simulating more training data. In this proof-of-concept paper, we propose a semi-supervised approach that enables training not only on (labeled) simulated data generated from the model, but also on unlabeled data originating from any source, including real-world data. To achieve the latter, we exploit Bayesian self-consistency properties that can be transformed into strictly proper losses without requiring knowledge of true parameter values, that is, without requiring data labels. The results of our initial experiments show remarkable improvements in the robustness of ABI on out-of-simulation data. Even if the observed data is far away from both labeled and unlabeled training data, inference remains highly accurate. If our findings also generalize to other scenarios and model classes, we believe that our new method represents a major breakthrough in neural ABI.

artificial intelligence, machine learning, self-consistency loss, (16 more...)

arXiv.org Machine Learning

2501.13483

Country:

Europe (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Transportation (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Add feedback

Automatic Machine Learning Framework to Study Morphological Parameters of AGN Host Galaxies within $z < 1.4$ in the Hyper Supreme-Cam Wide Survey

Tian, Chuan, Urry, C. Megan, Ghosh, Aritra, Nagai, Daisuke, Ananna, Tonima T., Powell, Meredith C., Auge, Connor, Mishra, Aayush, Sanders, David B., Cappelluti, Nico, Schawinski, Kevin

arXiv.org Artificial IntelligenceJan-26-2025

We present a composite machine learning framework to estimate posterior probability distributions of bulge-to-total light ratio, half-light radius, and flux for Active Galactic Nucleus (AGN) host galaxies within $z<1.4$ and $m<23$ in the Hyper Supreme-Cam Wide survey. We divide the data into five redshift bins: low ($0

artificial intelligence, galaxy, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2501.15739

Country:

Asia (0.67)
North America > United States > Hawaii (0.14)
North America > United States > Washington > King County > Seattle (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

On the challenges of detecting MCI using EEG in the wild

Mishra, Aayush, Joffe, David, Telidevara, Sankara Surendra, Oakley, David S, Liu, Anqi

arXiv.org Artificial IntelligenceJan-15-2025

Recent studies have shown promising results in the detection of Mild Cognitive Impairment (MCI) using easily accessible Electroencephalogram (EEG) data which would help administer early and effective treatment for dementia patients. However, the reliability and practicality of such systems remains unclear. In this work, we investigate the potential limitations and challenges in developing a robust MCI detection method using two contrasting datasets: 1) CAUEEG, collected and annotated by expert neurologists in controlled settings and 2) GENEEG, a new dataset collected and annotated in general practice clinics, a setting where routine MCI diagnoses are typically made. We find that training on small datasets, as is done by most previous works, tends to produce high variance models that make overconfident predictions, and are unreliable in practice. Additionally, distribution shifts between datasets make cross-domain generalization challenging. Finally, we show that MCI detection using EEG may suffer from fundamental limitations because of the overlapping nature of feature distributions with control groups. We call for more effort in high-quality data collection in actionable settings (like general practice clinics) to make progress towards this salient goal of non-invasive MCI detection.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.17871

Genre: Research Report > Experimental Study (0.86)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.96)
Health & Medicine > Therapeutic Area > Neurology > Dementia (0.91)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

DECODE: Data-driven Energy Consumption Prediction leveraging Historical Data and Environmental Factors in Buildings

Mishra, Aditya, Lone, Haroon R., Mishra, Aayush

arXiv.org Artificial IntelligenceFeb-6-2024

Energy prediction in buildings plays a crucial role in effective energy management. Precise predictions are essential for achieving optimal energy consumption and distribution within the grid. This paper introduces a Long Short-Term Memory (LSTM) model designed to forecast building energy consumption using historical energy data, occupancy patterns, and weather conditions. The LSTM model provides accurate short, medium, and long-term energy predictions for residential and commercial buildings compared to existing prediction models. We compare our LSTM model with established prediction methods, including linear regression, decision trees, and random forest. Encouragingly, the proposed LSTM model emerges as the superior performer across all metrics. It demonstrates exceptional prediction accuracy, boasting the highest R2 score of 0.97 and the most favorable mean absolute error (MAE) of 0.007. An additional advantage of our developed model is its capacity to achieve efficient energy consumption forecasts even when trained on a limited dataset. We address concerns about overfitting (variance) and underfitting (bias) through rigorous training and evaluation on real-world data. In summary, our research contributes to energy prediction by offering a robust LSTM model that outperforms alternative methods and operates with remarkable efficiency, generalizability, and reliability.

artificial intelligence, energy consumption, machine learning, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.enbuild.2024.113950

2309.02908

Country: Asia > India (0.14)

Genre: Research Report (1.00)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation

Kaushik, Prakhar, Mishra, Aayush, Kortylewski, Adam, Yuille, Alan

arXiv.org Artificial IntelligenceJan-19-2024

We consider the problem of source-free unsupervised category-level pose estimation from only RGB images to a target domain without any access to source domain data or 3D annotations during adaptation. Collecting and annotating real-world 3D data and corresponding images is laborious, expensive, yet unavoidable process, since even 3D pose domain adaptation methods require 3D data in the target domain. We introduce 3DUDA, a method capable of adapting to a nuisance-ridden target domain without 3D or depth data. Our key insight stems from the observation that specific object subparts remain stable across out-of-domain (OOD) scenarios, enabling strategic utilization of these invariant subcomponents for effective model updates. We represent object categories as simple cuboid meshes, and harness a generative model of neural feature activations modeled at each mesh vertex learnt using differential rendering. We focus on individual locally robust mesh vertex features and iteratively update them based on their proximity to corresponding features in the target domain even when the global pose is not correct. Our model is then trained in an EM fashion, alternating between updating the vertex features and the feature extractor. We show that our method simulates fine-tuning on a global pseudo-labeled dataset under mild assumptions, which converges to the target domain asymptotically. Through extensive empirical validation, including a complex extreme UDA setup which combines real nuisances, synthetic noise, and occlusion, we demonstrate the potency of our simple approach in addressing the domain shift challenge and significantly improving pose estimation accuracy.

artificial intelligence, machine learning, pose estimation result, (15 more...)

arXiv.org Artificial Intelligence

2401.10848

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Do pretrained Transformers Really Learn In-context by Gradient Descent?

Shen, Lingfeng, Mishra, Aayush, Khashabi, Daniel

arXiv.org Artificial IntelligenceNov-29-2023

The emergence of In-Context Learning (ICL) in LLMs remains a significant phenomenon with little understanding. To explain ICL, recent studies try to shed light on ICL by connecting it to Gradient Descent (GD). However, the question is, do these hold up in practice in actual pre-trained models? We highlight the limiting assumptions in prior works that make their context considerably different from the practical context in which language models are trained. For example, the theoretical hand-constructed weights used in these studies have properties that don't match those of real LLMs. Furthermore, their experimental verification uses \emph{ICL objective} (training models explicitly for ICL), which differs from the emergent ICL in the wild. We also look for evidence in real models. We observe that ICL and GD have different sensitivity to the order in which they observe demonstrations. Finally, we probe and compare the ICL vs. GD hypothesis in a natural setting. We conduct comprehensive empirical analyses on language models pre-trained on natural data (LLaMa-7B). Our comparisons of three performance metrics highlight the inconsistent behavior of ICL and GD as a function of various factors such as datasets, models, and the number of demonstrations. We observe that ICL and GD modify the output distribution of language models differently. These results indicate that the equivalence between ICL and GD remains an open hypothesis and calls for further studies.

large language model, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2310.0854

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

Add feedback

Stress Testing Chain-of-Thought Prompting for Large Language Models

Mishra, Aayush, Thakkar, Karan

arXiv.org Artificial IntelligenceSep-28-2023

This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs). Inspired by previous studies \cite{Min2022RethinkingWork}, we analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators on the performance of GPT-3 on various tasks. Our findings show that incorrect CoT prompting leads to poor performance on accuracy metrics. Correct values in the CoT is crucial for predicting correct answers. Moreover, incorrect demonstrations, where the CoT operators or the CoT order are wrong, do not affect the performance as drastically when compared to the value based perturbations. This research deepens our understanding of CoT prompting and opens some new questions regarding the capability of LLMs to learn reasoning in context.

large language model, machine learning, natural language, (3 more...)

arXiv.org Artificial Intelligence

2309.16621

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Repeated Environment Inference for Invariant Learning

Mishra, Aayush, Liu, Anqi

arXiv.org Artificial IntelligenceAug-5-2022

We study the problem of invariant learning when the environment labels are unknown. We focus on the invariant representation notion when the Bayes optimal conditional label distribution is the same across different environments. Previous work conducts Environment Inference (EI) by maximizing the penalty term from Invariant Risk Minimization (IRM) framework. The EI step uses a reference model which focuses on spurious correlations to efficiently reach a good environment partition. However, it is not clear how to find such a reference model. In this work, we propose to repeat the EI process and retrain an ERM model on the \textit{majority} environment inferred by the previous EI step. Under mild assumptions, we find that this iterative process helps learn a representation capturing the spurious correlation better than the single step. This results in better Environment Inference and better Invariant Learning. We show that this method outperforms baselines on both synthetic and real-world datasets.

artificial intelligence, environment inference, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2207.12876

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback