AITopics | Hariharan, Bharath

Collaborating Authors

Hariharan, Bharath

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery

Mall, Utkarsh, Phoo, Cheng Perng, Chiquier, Mia, Hariharan, Bharath, Bala, Kavita, Vondrick, Carl

arXiv.org Artificial IntelligenceFeb-14-2025

Visual data is used in numerous different scientific workflows ranging from remote sensing to ecology. As the amount of observation data increases, the challenge is not just to make accurate predictions but also to understand the underlying mechanisms for those predictions. Good interpretation is important in scientific workflows, as it allows for better decision-making by providing insights into the data. This paper introduces an automatic way of obtaining such interpretable-by-design models, by learning programs that interleave neural networks. We propose DiSciPLE (Discovering Scientific Programs using LLMs and Evolution) an evolutionary algorithm that leverages common sense and prior knowledge of large language models (LLMs) to create Python programs explaining visual data. Additionally, we propose two improvements: a program critic and a program simplifier to improve our method further to synthesize good programs. On three different real-world problems, DiSciPLE learns state-of-the-art programs on novel tasks with no prior literature. For example, we can learn programs with 35% lower error than the closest non-interpretable baseline for population density estimation.

evolutionary algorithm, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.1006

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.68)
Education (0.66)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.89)

Add feedback

AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery

Zhou, Hangyu, Kao, Chia-Hsiang, Phoo, Cheng Perng, Mall, Utkarsh, Hariharan, Bharath, Bala, Kavita

arXiv.org Artificial IntelligenceOct-31-2024

Clouds in satellite imagery pose a significant challenge for downstream applications. A major challenge in current cloud removal research is the absence of a comprehensive benchmark and a sufficiently large and diverse training dataset. To address this problem, we introduce the largest public dataset -- $\textit{AllClear}$ for cloud removal, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns, comprising 4 million images in total. Each ROI includes complete temporal captures from the year 2022, with (1) multi-spectral optical imagery from Sentinel-2 and Landsat 8/9, (2) synthetic aperture radar (SAR) imagery from Sentinel-1, and (3) auxiliary remote sensing products such as cloud masks and land cover maps. We validate the effectiveness of our dataset by benchmarking performance, demonstrating the scaling law -- the PSNR rises from $28.47$ to $33.87$ with $30\times$ more data, and conducting ablation studies on the temporal length and the importance of individual modalities. This dataset aims to provide comprehensive coverage of the Earth's surface and promote better cloud removal results.

artificial intelligence, dataset, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.23891

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology (1.00)
Government (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning

Kao, Chia-Hsiang, Hariharan, Bharath

arXiv.org Artificial IntelligenceOct-23-2024

Despite its widespread use in neural networks, error backpropagation has faced criticism for its lack of biological plausibility, suffering from issues such as the backward locking problem and the weight transport problem. These limitations have motivated researchers to explore more biologically plausible learning algorithms that could potentially shed light on how biological neural systems adapt and learn. Inspired by the counter-current exchange mechanisms observed in biological systems, we propose counter-current learning (CCL), a biologically plausible framework for credit assignment in neural networks. This framework employs a feedforward network to process input data and a feedback network to process targets, with each network enhancing the other through anti-parallel signal propagation. By leveraging the more informative signals from the bottom layer of the feedback network to guide the updates of the top layer of the feedforward network and vice versa, CCL enables the simultaneous transformation of source inputs to target outputs and the dynamic mutual influence of these transformations. Experimental results on MNIST, FashionMNIST, CIFAR10, and CIFAR100 datasets using multi-layer perceptrons and convolutional neural networks demonstrate that CCL achieves comparable performance to other biologically plausible algorithms while offering a more biologically realistic learning mechanism. Furthermore, we showcase the applicability of our approach to an autoencoder task, underscoring its potential for unsupervised representation learning. Our work presents a direction for biologically inspired and plausible learning algorithms, offering an alternative mechanism of learning and adaptation in neural networks.

artificial intelligence, feedback network, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2409.19841

Genre: Research Report > Experimental Study (0.93)

Industry:

Energy > Oil & Gas (0.46)
Education (0.46)
Health & Medicine (0.46)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Neural Control Variates with Automatic Integration

Li, Zilu, Yang, Guandao, Zhao, Qingqing, Deng, Xi, Guibas, Leonidas, Hariharan, Bharath, Wetzstein, Gordon

arXiv.org Artificial IntelligenceSep-23-2024

This paper presents a method to leverage arbitrary neural network architecture for control variates. Control variates are crucial in reducing the variance of Monte Carlo integration, but they hinge on finding a function that both correlates with the integrand and has a known analytical integral. Traditional approaches rely on heuristics to choose this function, which might not be expressive enough to correlate well with the integrand. Recent research alleviates this issue by modeling the integrands with a learnable parametric model, such as a neural network. However, the challenge remains in creating an expressive parametric model with a known analytical integral. This paper proposes a novel approach to construct learnable parametric control variates functions from arbitrary neural network architectures. Instead of using a network to approximate the integrand directly, we employ the network to approximate the anti-derivative of the integrand. This allows us to use automatic differentiation to create a function whose integration can be constructed by the antiderivative network. We apply our method to solve partial differential equations using the Walk-on-sphere algorithm. Our results indicate that this approach is unbiased and uses various network architectures to achieve lower variance than other control variate methods.

artificial intelligence, control variate, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3641519.3657395

2409.15394

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.14)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Better Monocular 3D Detectors with LiDAR from the Past

You, Yurong, Phoo, Cheng Perng, Diaz-Ruiz, Carlos Andres, Luo, Katie Z, Chao, Wei-Lun, Campbell, Mark, Hariharan, Bharath, Weinberger, Kilian Q

arXiv.org Artificial IntelligenceApr-9-2024

Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In this work, we seek to improve monocular 3D detectors by leveraging unlabeled historical LiDAR data. Specifically, at inference time, we assume that the camera-based detectors have access to multiple unlabeled LiDAR scans from past traversals at locations of interest (potentially from other high-end vehicles equipped with LiDAR sensors). Under this setup, we proposed a novel, simple, and end-to-end trainable framework, termed AsyncDepth, to effectively extract relevant features from asynchronous LiDAR traversals of the same location for monocular 3D detectors. We show consistent and significant performance gain (up to 9 AP) across multiple state-of-the-art models and datasets with a negligible additional latency of 9.66 ms and a small storage cost.

artificial intelligence, detection, detector, (18 more...)

arXiv.org Artificial Intelligence

2404.05139

Country: North America > United States > New York (0.14)

Genre: Research Report (0.69)

Industry:

Transportation > Ground > Road (0.51)
Information Technology (0.49)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.49)

Add feedback

Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment

Mall, Utkarsh, Phoo, Cheng Perng, Liu, Meilin Kelsey, Vondrick, Carl, Hariharan, Bharath, Bala, Kavita

arXiv.org Artificial IntelligenceDec-11-2023

We introduce a method to train vision-language models for remote-sensing images without using any textual annotations. Our key insight is to use co-located internet imagery taken on the ground as an intermediary for connecting remote-sensing images and language. Specifically, we train an image encoder for remote sensing images to align with the image encoder of CLIP using a large amount of paired internet and satellite images. Our unsupervised approach enables the training of a first-of-its-kind large-scale vision language model (VLM) for remote sensing images at two different resolutions. We show that these VLMs enable zero-shot, open-vocabulary image classification, retrieval, segmentation and visual question answering for satellite images. On each of these tasks, our VLM trained without textual annotations outperforms existing VLMs trained with supervision, with gains of up to 20% for classification and 80% for segmentation. Our planet is constantly captured by an extensive array of remote sensors such as satellites or drones. These earth observation images enable the monitoring of various events on the earth such as deforestation, forest fires, and droughts so that rapid actions can be taken to protect our environment. While these images can shed light on various insights about our planet, the scale of such data is huge. This has prompted the development of automatic analysis models that could extract relevant information from a large amount of remotely sensed images. While useful, these models are often specialized and can only recognize a pre-defined set of concepts. Besides, they could be complex, decreasing their accessibility to experts outside of the domain of artificial intelligence. Researchers developing automatic analysis methods for internet imagery encountered a similar problem a few years ago. One promising solution is to leverage large-scale vision-language models (VLMs) that are trained on millions or even billions of text-image pairs collected on the internet (Radford et al., 2021; Li et al., 2023). These models have demonstrated remarkable abilities to perform open-vocabulary recognition (Gu et al., 2022; Kuo et al., 2023) and enhance accessibility to non-AI experts (Alayrac et al., 2022; Surís et al., 2023). It would be incredibly valuable for a range of applications to replicate the success of openvocabulary recognition for satellite images as well, allowing an analyst to simply query, say, "Where are all the farmlands in the state of Massachusetts?" without requiring any new training or annotation for farms.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2312.0696

Country: North America > United States > Massachusetts (0.24)

Genre: Research Report > Promising Solution (0.34)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Accurate Differential Operators for Hybrid Neural Fields

Chetan, Aditya, Yang, Guandao, Wang, Zichen, Marschner, Steve, Hariharan, Bharath

arXiv.org Artificial IntelligenceDec-10-2023

Neural fields have become widely used in various fields, from shape representation to neural rendering, and for solving partial differential equations (PDEs). With the advent of hybrid neural field representations like Instant NGP that leverage small MLPs and explicit representations, these models train quickly and can fit large scenes. Yet in many applications like rendering and simulation, hybrid neural fields can cause noticeable and unreasonable artifacts. This is because they do not yield accurate spatial derivatives needed for these downstream applications. In this work, we propose two ways to circumvent these challenges. Our first approach is a post hoc operator that uses local polynomial-fitting to obtain more accurate derivatives from pre-trained hybrid neural fields. Additionally, we also propose a self-supervised fine-tuning approach that refines the neural field to yield accurate derivatives directly while preserving the initial signal. We show the application of our method on rendering, collision simulation, and solving PDEs. We observe that using our approach yields more accurate derivatives, reducing artifacts and leading to more accurate simulations in downstream applications.

artificial intelligence, machine learning, neural field, (15 more...)

arXiv.org Artificial Intelligence

2312.05984

Country:

North America > United States (0.28)
Asia > Japan > Honshū > Chūbu (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)

Add feedback

Pre-Training LiDAR-Based 3D Object Detectors Through Colorization

Pan, Tai-Yu, Ma, Chenyang, Chen, Tianle, Phoo, Cheng Perng, Luo, Katie Z, You, Yurong, Campbell, Mark, Weinberger, Kilian Q., Hariharan, Bharath, Chao, Wei-Lun

arXiv.org Artificial IntelligenceOct-23-2023

Accurate 3D object detection and understanding for self-driving cars heavily relies on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To tackle challenges arising from color variations and selection bias, we incorporate color as "context" by providing ground-truth colors as hints during colorization. Even with limited labeled data, GPC significantly improves finetuning performance; notably, on just 20% of the KITTI dataset, GPC outperforms training from scratch with the entire dataset. In sum, we introduce a fresh perspective on pre-training for 3D object detection, aligning the objective with the model's intended role and ultimately advancing the accuracy and efficiency of 3D object detection for autonomous vehicles. Detecting objects such as vehicles and pedestrians in 3D is crucial for self-driving cars to operate safely. Mainstream 3D object detectors (Shi et al., 2019; 2020b; Zhu et al., 2020; He et al., 2020a) take LiDAR point clouds as input, which provide precise 3D signals of the surrounding environment. However, training a detector needs a lot of labeled data. The expensive process of curating annotated data has motivated the community to investigate model pre-training using unlabeled data that can be collected easily. Most of the existing pre-training methods are built upon contrastive learning (Yin et al., 2022; Xie et al., 2020; Zhang et al., 2021; Huang et al., 2021; Liang et al., 2021), inspired by its success in 2D recognition (Chen et al., 2020a; He et al., 2020b). The key novelties, however, are often limited to how the positive and negative data pairs are constructed. This paper attempts to go beyond contrastive learning by providing a new perspective on pre-training 3D object detectors. We rethink pre-training's role in how it could facilitate the downstream fine-tuning with labeled data.

artificial intelligence, gpc, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2310.14592

Country: North America > United States > Ohio (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (0.54)
Information Technology > Robotics & Automation (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features

Zhang, Travis, Luo, Katie, Phoo, Cheng Perng, You, Yurong, Chao, Wei-Lun, Hariharan, Bharath, Campbell, Mark, Weinberger, Kilian Q.

arXiv.org Artificial IntelligenceSep-21-2023

The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detectors to new driving environments. By incorporating statistics computed from repeated LiDAR scans, we guide the adaptation process effectively. Our approach enhances LiDAR-based detection models using spatial quantized historical features and introduces a lightweight regression head to leverage the statistics for feature regularization. Additionally, we leverage the statistics for a novel self-training process to stabilize the training. The framework is detector model-agnostic and experiments on real-world datasets demonstrate significant improvements, achieving up to a 20-point performance gain, especially in detecting pedestrians and distant objects. Code is available at https://github.com/zhangtravis/Hist-DA.

artificial intelligence, machine learning, traversal, (16 more...)

arXiv.org Artificial Intelligence

2309.1214

Country:

North America > United States > California (0.14)
North America > United States > New York (0.14)

Genre: Research Report (0.82)

Industry: Transportation > Ground > Road (0.71)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.91)

Add feedback

Distilling from Similar Tasks for Transfer Learning on a Budget

Borup, Kenneth, Phoo, Cheng Perng, Hariharan, Bharath

arXiv.org Artificial IntelligenceApr-24-2023

We address the challenge of getting efficient yet accurate recognition systems with limited labels. While recognition models improve with model size and amount of data, many specialized applications of computer vision have severe resource constraints both during training and inference. Transfer learning is an effective solution for training with few labels, however often at the expense of a computationally costly fine-tuning of large base models. We propose to mitigate this unpleasant trade-off between compute and accuracy via semi-supervised cross-domain distillation from a set of diverse source models. Initially, we show how to use task similarity metrics to select a single suitable source model to distill from, and that a good selection process is imperative for good downstream performance of a target model. We dub this approach DistillNearest. Though effective, DistillNearest assumes a single source model matches the target task, which is not always the case. To alleviate this, we propose a weighted multi-source distillation method to distill multiple source models trained on different domains weighted by their relevance for the target task into a single efficient model (named DistillWeighted). Our methods need no access to source data, and merely need features and pseudo-labels of the source models. When the goal is accurate recognition under computational constraints, both DistillNearest and DistillWeighted approaches outperform both transfer learning from strong ImageNet initializations as well as state-of-the-art semi-supervised techniques such as FixMatch. Averaged over 8 diverse target tasks our multi-source method outperforms the baselines by 5.6%-points and 4.5%-points, respectively.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2304.12314

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback