Mac Aodha, Oisin
Self-Supervised Multimodal Learning: A Survey
Zong, Yongshuo, Mac Aodha, Oisin, Hospedales, Timothy
Multimodal learning, which aims to understand and analyze information from multiple modalities, has achieved substantial progress in the supervised regime in recent years. However, the heavy dependence on data paired with expensive human annotations impedes scaling up models. Meanwhile, given the availability of large-scale unannotated data in the wild, self-supervised learning has become an attractive strategy to alleviate the annotation bottleneck. Building on these two directions, self-supervised multimodal learning (SSML) provides ways to learn from raw multimodal data. In this survey, we provide a comprehensive review of the state-of-the-art in SSML, in which we elucidate three major challenges intrinsic to self-supervised learning with multimodal data: (1) learning representations from multimodal data without labels, (2) fusion of different modalities, and (3) learning with unaligned data. We then detail existing solutions to these challenges. Specifically, we consider (1) objectives for learning from multimodal unlabeled data via self-supervision, (2) model architectures from the perspective of different multimodal fusion strategies, and (3) pair-free learning strategies for coarse-grained and fine-grained alignment. We also review real-world applications of SSML algorithms in diverse fields such as healthcare, remote sensing, and machine translation. Finally, we discuss challenges and future directions for SSML. A collection of related resources can be found at: https://github.com/ys-zong/awesome-self-supervised-multimodal-learning.
Spatial Implicit Neural Representations for Global-Scale Species Mapping
Cole, Elijah, Van Horn, Grant, Lange, Christian, Shepard, Alexander, Leary, Patrick, Perona, Pietro, Loarie, Scott, Mac Aodha, Oisin
Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. This problem has a long history in ecology, but traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets, which can include tens of millions of records for hundreds of thousands of species. In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical ranges of 47k species. We find that our approach scales gracefully, making increasingly accurate predictions as we increase the number of species and the amount of training data per species. To make this problem accessible to machine learning researchers, we provide four new benchmarks that measure different aspects of species range estimation and spatial representation learning. Using these benchmarks, we demonstrate that noisy and biased crowdsourced data can be combined with implicit neural representations to approximate expert-developed range maps for many species.
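At a high level, a model of this kind maps a geographic coordinate to per-species presence probabilities with a small neural network trained on observation records. The sketch below is only illustrative; the coordinate encoding, network width, and loss shown here are assumptions rather than the paper's exact configuration.

```python
# Minimal sketch of a spatial implicit neural representation for species ranges.
# Architecture and loss details are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class SINR(nn.Module):
    def __init__(self, num_species=47000, hidden=256):
        super().__init__()
        # Input: sinusoidally encoded longitude/latitude -> 4 features.
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_species),
        )

    def forward(self, lon_lat):
        # lon_lat: (B, 2) coordinates scaled to [-1, 1].
        enc = torch.cat([torch.sin(torch.pi * lon_lat),
                         torch.cos(torch.pi * lon_lat)], dim=-1)
        return self.net(enc)  # per-species presence logits

model = SINR()
coords = torch.rand(8, 2) * 2 - 1                              # random locations
labels = torch.zeros(8, 47000)
labels[torch.arange(8), torch.randint(0, 47000, (8,))] = 1.0   # observed species
loss = nn.functional.binary_cross_entropy_with_logits(model(coords), labels)
loss.backward()
```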
Vision Learners Meet Web Image-Text Pairs
Zhao, Bingchen, Cui, Quan, Wu, Hao, Yoshie, Osamu, Yang, Cheng, Mac Aodha, Oisin
Most recent self-supervised learning methods are pre-trained on the well-curated ImageNet-1K dataset. In this work, given the excellent scalability of web data, we consider self-supervised pre-training on noisy, web-sourced image-text paired data. First, we conduct a benchmark study of representative self-supervised pre-training methods on large-scale web data in a like-for-like setting. We compare a range of methods, including single-modal ones that use masked training objectives and multi-modal ones that use image-text contrastive training. We observe that existing multi-modal methods do not outperform their single-modal counterparts on vision transfer learning tasks. We derive an information-theoretic view to explain these benchmark results, which provides insight into how to design a novel vision learner. Inspired by this insight, we present a new visual representation pre-training method, MUlti-modal Generator (MUG), that learns from scalable, web-sourced image-text data. MUG achieves state-of-the-art transfer performance on a variety of tasks and demonstrates promising scaling properties. Pre-trained models and code will be made public upon acceptance.
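For context, the multi-modal baselines referred to above use an image-text contrastive objective of the kind sketched below. This is a generic illustration (a CLIP-style InfoNCE loss), not an implementation of MUG, which is a generative method.

```python
# Generic image-text contrastive objective, shown only to illustrate the
# multi-modal baselines discussed above; this is not MUG's implementation.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (B, D) embeddings from separate encoders.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))            # matching pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(16, 512), torch.randn(16, 512))
```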
Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset
Van Horn, Grant, Qian, Rui, Wilber, Kimberly, Adam, Hartwig, Mac Aodha, Oisin, Belongie, Serge
We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. While our community has made great strides in fine-grained visual categorization on images, the counterparts in audio and video fine-grained categorization are relatively unexplored. To encourage advancements in this space, we have carefully constructed the SSW60 dataset to enable researchers to experiment with classifying the same set of categories in three different modalities: images, audio, and video. The dataset covers 60 species of birds and comprises images from existing datasets together with brand new, expert-curated audio and video datasets. We thoroughly benchmark audiovisual classification performance and modality fusion experiments using state-of-the-art transformer methods. Our findings show that audiovisual fusion methods outperform methods that rely exclusively on images or audio for the task of video classification. We also present interesting modality transfer experiments, enabled by the unique construction of SSW60 to encompass three different modalities. We hope the SSW60 dataset and accompanying baselines spur research in this fascinating area.
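As a point of reference for the fusion experiments, the sketch below shows the simplest possible score-level (late) fusion of per-modality predictions; it is an illustrative baseline only, not one of the transformer-based fusion models benchmarked in the paper.

```python
# Toy example of score-level (late) fusion of audio and visual predictions.
import torch

def late_fusion(visual_logits, audio_logits, alpha=0.5):
    # Average the per-class probabilities from each modality.
    visual_probs = visual_logits.softmax(dim=-1)
    audio_probs = audio_logits.softmax(dim=-1)
    return alpha * visual_probs + (1 - alpha) * audio_probs

fused = late_fusion(torch.randn(4, 60), torch.randn(4, 60))  # 60 bird species
predictions = fused.argmax(dim=-1)
```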
On Label Granularity and Object Localization
Cole, Elijah, Wilber, Kimberly, Van Horn, Grant, Yang, Xuan, Fornoni, Marco, Perona, Pietro, Belongie, Serge, Howard, Andrew, Mac Aodha, Oisin
Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNat-Loc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.
Digging Into Self-Supervised Monocular Depth Estimation
Godard, Clément, Mac Aodha, Oisin, Brostow, Gabriel
Depth-sensing is important for both navigation and scene understanding. However, procuring RGB images with corresponding depth data for training deep models is challenging; large-scale, varied datasets with ground truth training data are scarce. Consequently, several recent methods have proposed treating the training of monocular color-to-depth estimation networks as an image reconstruction problem, thus forgoing the need for ground truth depth. There are multiple concepts and design decisions for these networks that seem sensible, but give mixed or surprising results when tested. For example, binocular stereo as the source of self-supervision seems cumbersome and hard to scale, yet it yields less blurry results than training with monocular videos. Such decisions also interplay with questions about architectures, loss functions, image scales, and motion handling. In this paper, we propose a simple yet effective model, with several general architectural and loss innovations, that surpasses all other self-supervised depth estimation approaches on KITTI.
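For readers unfamiliar with this family of methods, the reconstruction objective typically combines an L1 term with a structural similarity (SSIM) term computed between the warped and target images. The sketch below is a common formulation under assumed defaults (3x3 pooled SSIM, 0.85 weighting), not necessarily the paper's exact loss.

```python
# Sketch of a photometric reconstruction loss (SSIM + L1) of the kind used
# for self-supervised depth training; constants here are common conventions.
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified SSIM using 3x3 average pooling for the local statistics.
    mu_x = F.avg_pool2d(x, 3, 1, padding=1)
    mu_y = F.avg_pool2d(y, 3, 1, padding=1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, padding=1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, padding=1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, padding=1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return ((1 - num / den) / 2).clamp(0, 1)  # per-pixel dissimilarity in [0, 1]

def photometric_loss(pred, target, alpha=0.85):
    # pred: image reconstructed by warping with the predicted depth/disparity.
    l1 = (pred - target).abs().mean(1, keepdim=True)
    return (alpha * ssim(pred, target).mean(1, keepdim=True) +
            (1 - alpha) * l1).mean()

loss = photometric_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```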
Teaching Multiple Concepts to Forgetful Learners
Hunziker, Anette, Chen, Yuxin, Mac Aodha, Oisin, Rodriguez, Manuel Gomez, Krause, Andreas, Perona, Pietro, Yue, Yisong, Singla, Adish
How can we help a forgetful learner learn multiple concepts within a limited time frame? For long-term learning, it is crucial to devise teaching strategies that leverage the underlying forgetting mechanisms of the learners. In this paper, we cast the problem of adaptively teaching a forgetful learner as a novel discrete optimization problem, in which we seek to optimize a natural objective function that characterizes the learner's expected performance throughout the teaching session. We then propose a simple greedy teaching strategy and derive strong performance guarantees based on two intuitive data-dependent parameters, which characterize the degree of diminishing returns of teaching each concept. We show that, under certain assumptions about the learner's memory model, one can efficiently compute the performance bounds. Furthermore, we identify parameter settings of our memory models where the greedy strategy is guaranteed to achieve high performance. We have deployed our approach in two concrete applications, namely (1) an educational app for online vocabulary teaching and (2) an app for teaching novices how to recognize bird species. We demonstrate the effectiveness of our algorithm using simulations along with user studies.
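To make the greedy idea concrete, the toy sketch below picks, at each teaching step, the concept with the largest expected recall gain under a simple exponential forgetting curve. Both the memory model and the gain computation are simplified assumptions for illustration and do not reproduce the paper's objective or guarantees.

```python
# Toy sketch of greedy teaching under an assumed exponential forgetting model.
import math

def recall(strength, time_since_review):
    # Exponential forgetting curve: recall decays with time, more slowly for
    # concepts that have been reviewed more often.
    return math.exp(-time_since_review / max(strength, 1e-6))

def greedy_teacher(num_concepts=5, num_steps=20):
    strength = [1.0] * num_concepts          # per-concept memory strength
    last_review = [0] * num_concepts         # time step of last review
    schedule = []
    for t in range(1, num_steps + 1):
        # Marginal gain of reviewing concept i now: how much recall it recovers.
        gains = [1.0 - recall(strength[i], t - last_review[i])
                 for i in range(num_concepts)]
        pick = max(range(num_concepts), key=lambda i: gains[i])
        strength[pick] += 1.0                # reviewing strengthens the memory
        last_review[pick] = t
        schedule.append(pick)
    return schedule

print(greedy_teacher())
```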
Teaching Categories to Human Learners with Visual Explanations
Mac Aodha, Oisin, Su, Shihan, Chen, Yuxin, Perona, Pietro, Yue, Yisong
We study the problem of computer-assisted teaching with explanations. Conventional approaches for machine teaching typically only provide feedback at the instance level, e.g., the category or label of the instance. However, it is intuitive that clear explanations from a knowledgeable teacher can significantly improve a student's ability to learn a new concept. To address these limitations, we propose a teaching framework that provides interpretable explanations as feedback and models how the learner incorporates this additional information. In the case of images, we show that we can automatically generate explanations that highlight the parts of the image that are responsible for the class label. Experiments on human learners illustrate that, on average, participants achieve better test set performance on challenging categorization tasks when taught with our interpretable approach compared to existing methods.
Context Embedding Networks
Kim, Kun ho, Mac Aodha, Oisin, Perona, Pietro
Low-dimensional embeddings that capture the main variations of interest in collections of data are important for many applications. One way to construct these embeddings is to acquire estimates of similarity from the crowd. However, similarity is a multi-dimensional concept that varies from individual to individual. Existing models for learning embeddings from the crowd typically make simplifying assumptions, such as that all individuals estimate similarity using the same criteria, that the list of criteria is known in advance, or that crowd workers are not influenced by the data that they see. To overcome these limitations we introduce Context Embedding Networks (CENs). In addition to learning interpretable embeddings from images, CENs also model worker biases for different attributes along with the visual context, i.e., the visual attributes highlighted by a set of images. Experiments on two noisy crowd-annotated datasets show that modeling both worker bias and visual context results in more interpretable embeddings compared to existing approaches.
Unsupervised Monocular Depth Estimation with Left-Right Consistency
Godard, Clément, Mac Aodha, Oisin, Brostow, Gabriel J.
Learning-based methods have shown very promising results for the task of depth estimation from single images. However, most existing approaches treat depth prediction as a supervised regression problem and, as a result, require vast quantities of corresponding ground truth depth data for training. Even just recording high-quality depth data in a range of environments is a challenging problem. In this paper, we innovate beyond existing approaches, replacing the use of explicit depth data during training with easier-to-obtain binocular stereo footage. We propose a novel training objective that enables our convolutional neural network to learn to perform single image depth estimation, despite the absence of ground truth depth data. Exploiting epipolar geometry constraints, we generate disparity images by training our network with an image reconstruction loss. We show that solving for image reconstruction alone results in poor quality depth images. To overcome this problem, we propose a novel training loss that enforces consistency between the disparities produced relative to both the left and right images, leading to improved performance and robustness compared to existing approaches. Our method produces state-of-the-art results for monocular depth estimation on the KITTI driving dataset, even outperforming supervised methods that have been trained with ground truth depth.
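To illustrate the left-right consistency idea, the sketch below penalizes disagreement between each view's predicted disparity and the other view's disparity warped into it. The warping convention and scaling here are assumptions rather than the paper's implementation.

```python
# Sketch of a left-right disparity consistency term: the disparity predicted
# for one view, warped into the other view, should match that view's disparity.
import torch
import torch.nn.functional as F

def warp_horizontal(src, disp):
    # Sample src at x - disp for every pixel (disp expressed as a fraction of
    # image width; the factor of 2 converts to grid_sample coordinates).
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    grid = torch.stack([xs.expand(b, -1, -1) - 2 * disp.squeeze(1),
                        ys.expand(b, -1, -1)], dim=-1)
    return F.grid_sample(src, grid, align_corners=True)

def lr_consistency_loss(disp_left, disp_right):
    # disp_*: (B, 1, H, W) disparities predicted for the left and right views.
    right_to_left = warp_horizontal(disp_right, disp_left)
    left_to_right = warp_horizontal(disp_left, disp_right)
    return ((disp_left - right_to_left).abs().mean() +
            (disp_right - left_to_right).abs().mean())

loss = lr_consistency_loss(torch.rand(2, 1, 32, 32) * 0.1,
                           torch.rand(2, 1, 32, 32) * 0.1)
```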