Collaborating Authors

 Ahuja, Narendra


R2I-rPPG: A Robust Region of Interest Selection Method for Remote Photoplethysmography to Extract Heart Rate

arXiv.org Artificial Intelligence

The COVID-19 pandemic has underscored the need for low-cost, scalable approaches to contactless measurement of vital signs, whether during initial triage at a healthcare facility or during virtual telemedicine visits. Remote photoplethysmography (rPPG) can accurately estimate heart rate (HR) when applied to close-up videos of healthy volunteers in well-lit laboratory settings. However, results from such highly optimized laboratory studies may not translate readily to healthcare settings. One significant barrier to the practical application of rPPG in health care is accurate localization of the region of interest (ROI): clinical or telemedicine visits may involve sub-optimal lighting, movement artifacts, variable camera angles, and variable subject distance. This paper presents an rPPG ROI selection method based on 3D facial landmarks and the patient's head yaw angle. We then demonstrate the robustness of this ROI selection method when coupled with the Plane-Orthogonal-to-Skin (POS) rPPG method and applied to videos of patients presenting to an Emergency Department with respiratory complaints. Our results demonstrate the effectiveness of the proposed approach in improving the accuracy and robustness of rPPG in a challenging clinical environment.
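For context, the POS stage that the selected ROI feeds into can be sketched as below, assuming the ROI-selection step (3D landmarks plus yaw angle, which is the paper's contribution and not shown here) yields per-frame mean RGB values of the chosen skin region; the window length and HR frequency band are conventional choices, not necessarily the paper's settings.

```python
import numpy as np

def pos_pulse_signal(rgb, fps, win_sec=1.6):
    """Plane-Orthogonal-to-Skin (POS) pulse extraction (Wang et al., 2017).

    rgb: (T, 3) array of mean R, G, B values of the selected facial ROI per frame.
    Returns a 1-D pulse signal of length T.
    """
    T = rgb.shape[0]
    w = int(win_sec * fps)                      # sliding-window length in frames
    P = np.array([[0.0, 1.0, -1.0],             # projection plane orthogonal
                  [-2.0, 1.0, 1.0]])            # to the standardized skin tone
    h = np.zeros(T)
    for t in range(T - w + 1):
        C = rgb[t:t + w].T                      # (3, w) window of color traces
        Cn = C / (C.mean(axis=1, keepdims=True) + 1e-8)        # temporal normalization
        S = P @ Cn                              # (2, w) projected signals
        s = S[0] + (S[0].std() / (S[1].std() + 1e-8)) * S[1]   # alpha tuning
        h[t:t + w] += s - s.mean()              # overlap-add into the output signal
    return h

def estimate_hr_bpm(pulse, fps, lo=0.7, hi=3.5):
    """Pick the dominant spectral peak inside a plausible HR band (42-210 bpm)."""
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    power = np.abs(np.fft.rfft(pulse)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[band][np.argmax(power[band])]
```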


Potential Field Based Deep Metric Learning

arXiv.org Artificial Intelligence

Deep metric learning (DML) involves training a network to learn a semantically meaningful representation space. Many current approaches mine n-tuples of examples and model interactions within each tuple. We present a novel, compositional DML model, inspired by electrostatic fields in physics, that represents the influence of each example (embedding) not within tuples but by a continuous potential field, and superposes the fields to obtain their combined global potential field. We use attractive/repulsive potential fields to represent interactions among embeddings from images of the same/different classes. Contrary to typical learning methods, where the mutual influence of samples is proportional to their distance, we enforce a reduction in such influence with distance, leading to a decaying field. We show that such decay helps improve performance on real-world datasets with large intra-class variations and label noise. Like other proxy-based methods, we also use proxies to succinctly represent sub-populations of examples. We evaluate our method on three standard DML benchmarks, Cars-196, CUB-200-2011, and SOP, where it outperforms state-of-the-art baselines.
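As an illustration of the decaying-field idea, the sketch below uses class proxies as field sources with an exponentially decaying kernel: same-class proxies attract an embedding with an influence that fades at large distance, while different-class proxies repel only when nearby. The kernel, normalization, and loss composition are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def potential_field_loss(embeddings, labels, proxies, proxy_labels, sigma=1.0):
    """Illustrative decaying attractive/repulsive field loss over class proxies."""
    z = F.normalize(embeddings, dim=1)          # (N, D) image embeddings
    p = F.normalize(proxies, dim=1)             # (C, D) class proxies
    d = torch.cdist(z, p)                       # (N, C) pairwise distances
    same = labels.unsqueeze(1).eq(proxy_labels.unsqueeze(0)).float()
    # attraction saturates with distance, so its gradient (influence) decays
    attract = same * (1.0 - torch.exp(-d / sigma))
    # repulsion acts mainly on nearby different-class proxies and decays with distance
    repel = (1.0 - same) * torch.exp(-d / sigma)
    return (attract + repel).sum(dim=1).mean()
```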


Improving Multi-label Recognition using Class Co-Occurrence Probabilities

arXiv.org Artificial Intelligence

Multi-label Recognition (MLR) involves the identification of multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large image-text datasets. These methods learn an independent classifier for each object (class), overlooking correlations between their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between pairs of classes. We propose a framework that extends the independent classifiers by incorporating co-occurrence information for object pairs, improving their performance. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes, refining the initial estimates derived from image and text sources obtained using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.
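The co-occurrence refinement can be pictured with a small sketch: the adjacency is built from row-normalized conditional probabilities (e.g., counted from the multi-hot training labels), and a two-layer graph convolution produces a residual correction to the independent classifier scores. The layer sizes, propagation rule, and residual form are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CoOccurrenceGCN(nn.Module):
    """Refine per-class scores with a GCN whose edges encode class co-occurrence."""

    def __init__(self, cooc_counts, hidden=256):
        super().__init__()
        # row-normalized counts approximate P(class_j present | class_i present)
        A = cooc_counts / (cooc_counts.sum(dim=1, keepdim=True) + 1e-8)
        self.register_buffer("A", A)
        self.fc1 = nn.Linear(1, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, logits):                  # logits: (B, C) independent scores
        x = logits.unsqueeze(-1)                # one scalar feature per class node
        x = torch.relu(self.A @ self.fc1(x))    # propagate along co-occurrence edges
        x = (self.A @ self.fc2(x)).squeeze(-1)
        return logits + x                       # residual refinement of initial scores
```

A co-occurrence count matrix can be formed from multi-hot training labels `Y` of shape (num_images, num_classes) as `Y.T.float() @ Y.float()`.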


Piecewise-Linear Manifolds for Deep Metric Learning

arXiv.org Artificial Intelligence

Unsupervised deep metric learning (UDML) focuses on learning a semantic representation space using only unlabeled data. This challenging problem requires accurately estimating the similarity between data points, which is used to supervise a deep network. For this purpose, we propose to model the high-dimensional data manifold using a piecewise-linear approximation, with each low-dimensional linear piece approximating the data manifold in a small neighborhood of a point. These neighborhoods are used to estimate similarity between data points. We empirically show that this similarity estimate correlates better with the ground truth than the similarity estimates of current state-of-the-art techniques. We also show that proxies, commonly used in supervised metric learning, can be used to model the piecewise-linear manifold in an unsupervised setting, helping improve performance. Our method outperforms existing unsupervised metric learning approaches on standard zero-shot image retrieval benchmarks.
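A minimal sketch of the local linear idea: fit a low-dimensional linear piece (via PCA) to the neighborhood of an anchor point and score a query by its residual to that piece, so points lying near the same piece are deemed similar. The neighborhood size, piece dimensionality, and residual-to-similarity mapping are illustrative choices, not the paper's.

```python
import numpy as np

def local_linear_similarity(x, anchor_idx, X, k=10, d=3):
    """Similarity of query x to the linear piece fit around X[anchor_idx]."""
    dists = np.linalg.norm(X - X[anchor_idx], axis=1)
    nbrs = X[np.argsort(dists)[:k]]             # k nearest neighbors of the anchor
    mu = nbrs.mean(axis=0)
    _, _, Vt = np.linalg.svd(nbrs - mu, full_matrices=False)
    basis = Vt[:d]                              # local linear piece (top-d directions)
    r = x - mu
    residual = r - basis.T @ (basis @ r)        # component off the local subspace
    return np.exp(-np.linalg.norm(residual))    # high similarity = near the piece
```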


Long-Distance Gesture Recognition using Dynamic Neural Networks

arXiv.org Artificial Intelligence

Gestures form an important medium of communication between humans and machines. The overwhelming majority of existing gesture recognition methods are tailored to scenarios in which humans and machines are located very close to each other. This short-distance assumption does not hold for several types of interaction, for example gesture-based interaction with a floor-cleaning robot or with a drone. Methods designed for short-distance recognition perform poorly on long-distance recognition because the gesture occupies only a small portion of the input data. Their performance degrades further in resource-constrained settings, where they cannot effectively focus their limited compute on the gesturing subject. We propose a novel, accurate, and efficient method for recognizing gestures from longer distances. It uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features important for gesture recognition while discarding background features early on, making it more compute-efficient than other techniques. We demonstrate the performance of our method on the LD-ConGR long-distance dataset, where it outperforms previous state-of-the-art methods in both recognition accuracy and compute efficiency.
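The dynamic-selection idea can be sketched as a cheap patch scorer that ranks spatial regions and forwards only the top-scoring (likely gesture-containing) patches to the heavier recognition backbone. The patch size, number of kept patches, and scorer architecture below are placeholders, not the paper's network.

```python
import torch
import torch.nn as nn

class RegionSelector(nn.Module):
    """Score spatial patches cheaply and keep only the top-k for further processing."""

    def __init__(self, in_ch=3, k=4, patch=32):
        super().__init__()
        self.k, self.patch = k, patch
        self.scorer = nn.Sequential(            # lightweight per-patch relevance scorer
            nn.Conv2d(in_ch, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

    def forward(self, frame):                   # frame: (C, H, W), H and W multiples of patch
        C, H, W = frame.shape
        p = self.patch
        patches = frame.unfold(1, p, p).unfold(2, p, p)        # (C, H/p, W/p, p, p)
        patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, C, p, p)
        scores = self.scorer(patches).squeeze(-1)              # one score per patch
        top = torch.topk(scores, self.k).indices
        return patches[top]                     # only these go to the heavy backbone
```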


A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction

arXiv.org Artificial Intelligence

Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena. Prior approaches to this task typically estimate a latent prior characterizing this stochasticity, but do not account for the predictive uncertainty of the (deep learning) model. Such approaches often derive the training signal from the mean-squared error (MSE) between the generated frame and the ground truth, which can lead to sub-optimal training, especially when the predictive uncertainty is high. To this end, we introduce the Neural Uncertainty Quantifier (NUQ), a stochastic quantification of the model's predictive uncertainty, and use it to weigh the MSE loss. We propose a hierarchical, variational framework to derive NUQ in a principled manner using a deep, Bayesian graphical model. Our experiments on four benchmark stochastic video prediction datasets show that our proposed framework trains more effectively than state-of-the-art models (especially when the training sets are small), while demonstrating better video generation quality and diversity on several evaluation metrics.
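The uncertainty weighting can be illustrated with a simplified heteroscedastic-Gaussian analogue, in which a predicted per-frame log-variance down-weights the squared error and is itself penalized for being large. The paper derives its weights from a hierarchical variational model rather than this direct parameterization; the sketch only shows the weighting idea.

```python
import torch

def uncertainty_weighted_mse(pred, target, log_var):
    """Gaussian negative log-likelihood as an uncertainty-weighted MSE.

    pred, target: (B, C, H, W) generated and ground-truth frames
    log_var:      broadcastable predicted log-variance (e.g., one value per frame)
    """
    se = (pred - target) ** 2
    # high predicted uncertainty shrinks the MSE term; the log_var term prevents
    # the model from inflating uncertainty everywhere
    nll = 0.5 * (se * torch.exp(-log_var) + log_var)
    return nll.mean()
```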


Visual Scene Graphs for Audio Source Separation

arXiv.org Artificial Intelligence

State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments. These approaches often ignore the visual context of these sound sources or avoid modeling object interactions that may be useful to better characterize the sources, especially when the same object class may produce varied sounds from distinct interactions. To address this challenging problem, we propose the Audio Visual Scene Graph Segmenter (AVSGS), a novel deep learning model that embeds the visual structure of the scene as a graph and segments this graph into subgraphs, each subgraph being associated with a unique sound obtained by co-segmenting the audio spectrogram. At its core, AVSGS uses a recursive neural network that emits mutually-orthogonal sub-graph embeddings of the visual graph using multi-head attention. These embeddings are used to condition an audio encoder-decoder towards source separation. Our pipeline is trained end-to-end via a self-supervised task consisting of separating audio sources using the visual graph from artificially mixed sounds. In this paper, we also introduce an "in the wild" video dataset for sound source separation that contains multiple non-musical sources, which we call Audio Separation in the Wild (ASIW). This dataset is adapted from the AudioCaps dataset and provides a challenging, natural, daily-life setting for source separation. Thorough experiments on the proposed ASIW and the standard MUSIC datasets demonstrate state-of-the-art sound separation performance of our method compared with recent prior approaches.
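A rough sketch of the conditioning step: a sub-graph embedding modulates the bottleneck of a spectrogram encoder-decoder (here via a FiLM-style scale and shift) to predict a mask for the corresponding source. The architecture sizes and the FiLM choice are illustrative assumptions, not the AVSGS design.

```python
import torch
import torch.nn as nn

class ConditionedSeparator(nn.Module):
    """Audio encoder-decoder whose bottleneck is conditioned on a sub-graph embedding."""

    def __init__(self, emb_dim=128, ch=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(ch, ch, 4, 2, 1), nn.ReLU())
        self.film = nn.Linear(emb_dim, 2 * ch)           # per-channel scale and shift
        self.dec = nn.Sequential(nn.ConvTranspose2d(ch, ch, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(ch, 1, 4, 2, 1), nn.Sigmoid())

    def forward(self, mix_spec, subgraph_emb):           # (B, 1, F, T), (B, emb_dim)
        h = self.enc(mix_spec)
        gamma, beta = self.film(subgraph_emb).chunk(2, dim=1)
        h = gamma[:, :, None, None] * h + beta[:, :, None, None]
        mask = self.dec(h)                               # soft mask for one source
        return mask * mix_spec
```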


A SNoW-Based Face Detector

Neural Information Processing Systems

A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a predefined or incrementally learned feature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore, learning and evaluation using the SNoW-based method are significantly more efficient than with other methods.

1 Introduction
Growing interest in intelligent human computer interactions has motivated a recent surge in research on problems such as face tracking, pose estimation, face expression and gesture recognition. Most methods, however, assume human faces in their input images have been detected and localized.
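For concreteness, one SNoW target node can be sketched as a linear threshold unit over sparse Boolean features with mistake-driven multiplicative (Winnow-style) updates; the initialization, threshold, and promotion/demotion rates below are illustrative, not the values used in the paper.

```python
class SnowUnit:
    """Minimal sketch of one SNoW target node (e.g., "face") over sparse features."""

    def __init__(self, theta=1.0, alpha=1.5, beta=0.8):
        self.w = {}                             # weights stored only for seen features
        self.theta, self.alpha, self.beta = theta, alpha, beta

    def activation(self, active_feats):
        # unseen active features implicitly carry the initial weight 1.0
        return sum(self.w.get(f, 1.0) for f in active_feats)

    def predict(self, active_feats):
        # in practice the threshold is set relative to the number of active
        # features; a fixed theta keeps the sketch short
        return self.activation(active_feats) >= self.theta

    def train_example(self, active_feats, is_positive):
        if self.predict(active_feats) == is_positive:
            return                              # mistake-driven: update only on errors
        factor = self.alpha if is_positive else self.beta   # promote or demote
        for f in active_feats:                  # update only the active features
            self.w[f] = self.w.get(f, 1.0) * factor
```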