 Hamilton, Mark


Seeing Faces in Things: A Model and Dataset for Pareidolia

arXiv.org Artificial Intelligence

The human visual system is well-tuned to detect faces of all shapes and sizes. While this brings obvious survival advantages, such as a better chance of spotting unknown predators in the bush, it also leads to spurious face detections. "Face pareidolia" describes the perception of face-like structure among otherwise random stimuli: seeing faces in coffee stains or clouds in the sky. In this paper, we study face pareidolia from a computer vision perspective. We present an image dataset of "Faces in Things", consisting of five thousand web images with human-annotated pareidolic faces. Using this dataset, we examine the extent to which a state-of-the-art human face detector exhibits pareidolia, and find a significant behavioral gap between humans and machines. We find that the evolutionary need for humans to detect animal faces, as well as human faces, may explain some of this gap. Finally, we propose a simple statistical model of pareidolia in images. Through studies on human subjects and our pareidolic face detectors, we confirm a key prediction of our model regarding what image conditions are most likely to induce pareidolia.
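
To make the human-machine comparison concrete, here is a minimal sketch of the kind of measurement the abstract describes: run an off-the-shelf face detector over pareidolic images and record how often it fires. It uses OpenCV's Haar cascade as a stand-in (not the state-of-the-art detector studied in the paper), and the dataset directory is a hypothetical placeholder.

```python
# Sketch: fraction of pareidolic images on which a generic face detector fires.
import glob
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

hits, total = 0, 0
for path in glob.glob("faces_in_things/*.jpg"):  # hypothetical dataset path
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    hits += int(len(faces) > 0)
    total += 1

print(f"Pareidolic detection rate: {hits / max(total, 1):.2%}")
```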


Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

arXiv.org Artificial Intelligence

We present DenseAV, a novel dual encoder grounding architecture that learns high-resolution, semantically meaningful, and audio-visually aligned features solely through watching videos. We show that DenseAV can discover the "meaning" of words and the "location" of sounds without explicit localization supervision. Furthermore, it automatically discovers and distinguishes between these two types of associations without supervision. We show that DenseAV's localization abilities arise from a new multi-head feature aggregation operator that directly compares dense image and audio representations for contrastive learning. In contrast, many other systems that learn "global" audio and video representations cannot localize words and sounds. Finally, we contribute two new datasets to improve the evaluation of AV representations through speech- and sound-prompted semantic segmentation. On these and other datasets we show DenseAV dramatically outperforms the prior art on speech- and sound-prompted semantic segmentation. DenseAV outperforms the previous state-of-the-art, ImageBind, on cross-modal retrieval using fewer than half of the parameters. Project Page: https://aka.ms/denseav
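
The sketch below illustrates the core contrast the abstract draws: scoring an audio-video pair by comparing every audio frame against every image patch instead of two global embeddings. The max-over-space / mean-over-time pooling is an illustrative assumption; the paper's actual multi-head aggregation operator differs in detail.

```python
# Hedged sketch of dense audio-visual similarity aggregation.
import torch

def dense_av_similarity(audio: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """audio: (B, T, D) frame features; image: (B, H, W, D) patch features."""
    B, H, W, D = image.shape
    patches = image.reshape(B, H * W, D)                # (B, HW, D)
    sim = torch.einsum("btd,bpd->btp", audio, patches)  # frame-vs-patch scores
    per_frame = sim.max(dim=-1).values                  # best patch per audio frame
    return per_frame.mean(dim=-1)                       # (B,) clip-level score

# For contrastive training, compute this score for all audio/image pairings in
# a batch and apply a standard InfoNCE loss over the resulting score matrix.
```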


FeatUp: A Model-Agnostic Framework for Features at Any Resolution

arXiv.org Artificial Intelligence

Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution. The feedforward variant is a general-purpose upsampling operation that acts as a drop-in module for downstream dense prediction tasks, while the implicit variant is a per-image network. Both approaches use a multi-view consistency loss with deep analogies to NeRFs. Our features retain their original semantics and can be swapped into existing applications to yield resolution and performance gains even without re-training. We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation. Despite their immense success, deep features often sacrifice spatial resolution for semantic quality: ResNet-50 (He et al., 2015), for example, produces deep features at 7×7 resolution, and even Vision Transformers (ViTs) (Dosovitskiy et al., 2020) incur a significant resolution reduction, making it challenging to perform dense prediction tasks such as segmentation or depth estimation using these features alone. To mitigate these issues, FeatUp improves the resolution of any vision model's features without changing their original "meaning" or orientation. Our primary insight, inspired by 3D reconstruction frameworks like NeRF (Mildenhall et al., 2020), is that multi-view consistency of low-resolution signals can supervise the construction of high-resolution signals: FeatUp learns to upsample features through a consistency loss on low-resolution "views" of a model's features that arise from slight transformations ("jitters") of the input image.
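
A minimal sketch of that multi-view consistency idea follows: train an upsampler so that pooling its high-resolution features back down reproduces the backbone's low-resolution features under small input jitters. The `backbone`, `upsampler`, and `jitters` (e.g., small shifts or flips applied identically to images and feature maps) are illustrative placeholders, not the paper's exact components.

```python
# Sketch: multi-view consistency loss for feature upsampling.
import torch.nn.functional as F

def multiview_consistency_loss(backbone, upsampler, image, jitters):
    hr_feats = upsampler(backbone(image), image)    # guided high-res features (B, C, H, W)
    loss = 0.0
    for t in jitters:
        lr_view = backbone(t(image))                # low-res "view" of the jittered input
        pred = F.adaptive_avg_pool2d(t(hr_feats), lr_view.shape[-2:])
        loss = loss + F.mse_loss(pred, lr_view)     # downsampled prediction must match
    return loss / len(jitters)
```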


Large-Scale Automatic Audiobook Creation

arXiv.org Artificial Intelligence

An audiobook can dramatically improve a work of literature's accessibility and increase reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed, style, and emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributes over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection visit https://aka.ms/audiobook.
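
For a sense of the pipeline shape the abstract describes, here is a hedged sketch: strip the readable body text out of an HTML e-book and hand each chunk to a TTS backend. The content heuristic and the `synthesize()` call are assumptions for illustration, not the paper's actual parser or voice stack.

```python
# Sketch: extract narrative text from an e-book page, then synthesize audio.
from bs4 import BeautifulSoup

def extract_readable_text(html: str) -> list[str]:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "table", "header", "footer"]):
        tag.decompose()                                   # drop non-narrative markup
    paras = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return [p for p in paras if len(p.split()) > 3]       # skip page furniture

def synthesize(text: str, voice: str) -> bytes:
    raise NotImplementedError("placeholder for a neural TTS backend")

# for chunk in extract_readable_text(open("book.html").read()):
#     audio = synthesize(chunk, voice="narrator")
```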


MultiEarth 2023 -- Multimodal Learning for Earth and Environment Workshop and Challenge

arXiv.org Artificial Intelligence

The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data that is continuously being collected. The primary objective of this workshop is to bring together the Earth and environmental science communities as well as the multimodal representation learning communities to explore new ways of harnessing technological advancements in support of environmental monitoring. The MultiEarth Workshop also seeks to provide a common benchmark for processing multimodal remote sensing information by organizing public challenges focused on monitoring the Amazon rainforest. These challenges include estimating deforestation, detecting forest fires, translating synthetic aperture radar (SAR) images to the visible domain, and projecting environmental trends. This paper presents the challenge guidelines, datasets, and evaluation metrics. Our challenge website is available at https://sites.google.com/view/rainforest-challenge/multiearth-2023.


Exploring Gender and Race Biases in the NFT Market

arXiv.org Artificial Intelligence

Non-Fungible Tokens (NFTs) are non-interchangeable assets, usually digital art, which are stored on the blockchain. Preliminary studies find that female and darker-skinned NFTs are valued less than their male and lighter-skinned counterparts. However, these studies analyze only the CryptoPunks collection. We test the statistical significance of race and gender biases in the prices of CryptoPunks and present the first study of gender bias in the broader NFT market. We find evidence of racial bias but not gender bias. Our work also introduces a dataset of gender-labeled NFT collections to advance the broader study of social equity in this emerging market.
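
The following is a minimal sketch of the kind of significance test the abstract implies: regress log sale price on attribute dummies and inspect coefficients and p-values. The CSV file, column names, and controls are hypothetical placeholders; the paper's exact specification differs.

```python
# Sketch: testing price differences across NFT attributes with OLS.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

sales = pd.read_csv("cryptopunks_sales.csv")        # hypothetical sales export
sales["log_price"] = np.log(sales["price_eth"])     # log prices tame heavy tails
model = smf.ols("log_price ~ C(skin_tone) + C(gender) + C(punk_type)",
                data=sales).fit()
print(model.summary())                              # coefficients and p-values
```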


It Is Likely That Your Loss Should be a Likelihood

arXiv.org Machine Learning

Many common loss functions such as mean-squared-error, cross-entropy, and reconstruction loss are unnecessarily rigid. Under a probabilistic interpretation, these common losses correspond to distributions with fixed shapes and scales. We instead argue for optimizing full likelihoods that include parameters like the normal variance and softmax temperature. Joint optimization of these "likelihood parameters" with model parameters can adaptively tune the scales and shapes of losses in addition to the strength of regularization. We explore and systematically evaluate how to parameterize and apply likelihood parameters for robust modeling, outlier-detection, and re-calibration. Additionally, we propose adaptively tuning $L_2$ and $L_1$ weights by fitting the scale parameters of normal and Laplace priors and introduce more flexible element-wise regularizers.
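
A minimal sketch of the core move, under the abstract's Gaussian example: replace fixed-scale MSE with a full Gaussian negative log-likelihood whose variance is a trainable "likelihood parameter" optimized jointly with the model weights. The scalar parameterization below is an illustrative choice.

```python
# Sketch: MSE generalized to a Gaussian NLL with a learned variance.
import torch

log_var = torch.zeros(1, requires_grad=True)  # likelihood parameter (log variance)

def gaussian_nll(pred, target, log_var):
    # -log N(target | pred, exp(log_var)), constant term dropped
    return 0.5 * (log_var + (target - pred) ** 2 / log_var.exp()).mean()

# Joint optimization with the model, e.g.:
# optimizer = torch.optim.Adam(list(model.parameters()) + [log_var], lr=1e-3)
# loss = gaussian_nll(model(x), y, log_var)
```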


Conditional Image Retrieval

arXiv.org Machine Learning

This work introduces Conditional Image Retrieval (CIR) systems: image retrieval (IR) methods that can efficiently specialize to specific subsets of images on the fly. These systems broaden the class of queries IR systems support and eliminate the need for expensive re-fitting to specific subsets of data. Specifically, we adapt tree-based K-Nearest Neighbor (KNN) data structures to the conditional setting by introducing additional inverted-index data structures. This speeds conditional queries without slowing unconditioned queries. We present two new datasets for evaluating the performance of CIR systems and evaluate a variety of design choices. As a motivating application, we present an algorithm that can explore shared semantic content between works of art of vastly different media and cultural origin. Finally, we demonstrate that CIR data structures can identify Generative Adversarial Network (GAN) "blind spots": areas where GANs fail to properly model the true data distribution.
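
The sketch below shows the data-structure pairing the abstract describes: a KD-tree answers unconditioned KNN queries, while an inverted index maps each subset label to its members so conditioned queries restrict candidates without refitting. Brute-forcing distances within a subset here is a simplification; the paper adapts the tree structures themselves.

```python
# Sketch: conditional KNN via a KD-tree plus an inverted index over subsets.
from collections import defaultdict
import numpy as np
from scipy.spatial import cKDTree

class ConditionalIndex:
    def __init__(self, vectors: np.ndarray, labels: list[str]):
        self.vectors = vectors
        self.tree = cKDTree(vectors)          # fast unconditioned KNN
        self.inverted = defaultdict(list)     # subset label -> row ids
        for i, lab in enumerate(labels):
            self.inverted[lab].append(i)

    def query(self, q: np.ndarray, k: int, condition: str | None = None):
        if condition is None:
            return self.tree.query(q, k=k)[1]           # usual KNN path
        ids = np.array(self.inverted[condition])        # restrict to the subset
        d = np.linalg.norm(self.vectors[ids] - q, axis=1)
        return ids[np.argsort(d)[:k]]
```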


Large-Scale Intelligent Microservices

arXiv.org Artificial Intelligence

Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies, each with its own restrictive syntax. We introduce an Apache Spark-based microservice orchestration framework that extends database operations to include web service primitives. Our system can orchestrate web services across hundreds of machines and takes full advantage of cluster, thread, and asynchronous parallelism. Using this framework, we provide large-scale clients for intelligent services such as speech, vision, search, anomaly detection, and text analysis. This allows users to integrate ready-to-use intelligence into any datastore with an Apache Spark connector. To eliminate the majority of overhead from network communication, we also introduce a low-latency containerized version of our architecture. Finally, we demonstrate that the services we investigate are competitive on a variety of benchmarks and present two applications of this framework: intelligent search engines and real-time auto race analytics systems.
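
To illustrate the core idea (web services exposed as dataframe operations), here is a plain PySpark sketch using a simple UDF and a placeholder endpoint. The paper's framework provides dedicated service primitives with cluster, thread, and asynchronous parallelism rather than this naive per-row call.

```python
# Sketch: calling a web service as a Spark column operation.
import requests
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

@udf(returnType=StringType())
def call_service(text):
    resp = requests.post("http://example.com/analyze",   # placeholder endpoint
                         json={"text": text}, timeout=10)
    return resp.json().get("sentiment")

df = spark.createDataFrame([("great race",), ("engine failure",)], ["text"])
df.withColumn("sentiment", call_service("text")).show()
```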


MMLSpark: Unifying Machine Learning Ecosystems at Massive Scales

arXiv.org Machine Learning

We introduce Microsoft Machine Learning for Apache Spark (MMLSpark), an ecosystem of enhancements that expand the Apache Spark distributed computing library to tackle problems in Deep Learning, Micro-Service Orchestration, Gradient Boosting, Model Interpretability, and other areas of modern computation. Furthermore, we present a novel system called Spark Serving that allows users to run any Apache Spark program as a distributed, sub-millisecond latency web service backed by their existing Spark Cluster. All MMLSpark contributions have the same API to enable simple composition across frameworks and usage across batch, streaming, and RESTful web serving scenarios on static, elastic, or serverless clusters. We showcase MMLSpark by creating a method for deep object detection capable of learning without human labeled data and demonstrate its effectiveness for Snow Leopard conservation.
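
As a hedged sketch of the "same API" claim, the snippet below shows an MMLSpark estimator slotting into a standard Spark ML Pipeline next to built-in stages. The import path is an assumption and varies by version (the project later shipped as SynapseML); the column names are placeholders.

```python
# Sketch: composing an MMLSpark stage with core Spark ML stages in one pipeline.
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from mmlspark.lightgbm import LightGBMClassifier  # module layout varies by version

pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="features"),
    LightGBMClassifier(featuresCol="features", labelCol="label"),
])
# model = pipeline.fit(train_df)  # behaves like any other Spark ML pipeline
```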