AITopics | Saenko, Kate

Saenko, Kate

NewsStories: Illustrating articles with visual summaries

Tan, Reuben, Plummer, Bryan A., Saenko, Kate, Lewis, JP, Sud, Avneesh, Leung, Thomas

arXiv.org Artificial Intelligence | Aug-14-2022

Recent self-supervised approaches have used large-scale image-text datasets to learn powerful representations that transfer to many tasks without finetuning. These methods often assume a one-to-one correspondence between images and their (short) captions. However, many tasks require reasoning about multiple images and long text narratives, such as describing news articles with visual summaries. Thus, we explore a novel setting where the goal is to learn a self-supervised visual-language representation that is robust to varying text length and the number of images. In addition, unlike prior work which assumed captions have a literal relation to the image, we assume images only contain loose illustrative correspondence with the text. To explore this problem, we introduce a large-scale multimodal dataset containing over 31M articles, 22M images and 1M videos. We show that state-of-the-art image-text alignment methods are not robust to longer narratives with multiple images. Finally, we introduce an intuitive baseline that outperforms these methods on zero-shot image-set retrieval by 10% on the GoodNews dataset.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2207.13061

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > Republic of Türkiye (0.68)
South America (0.67)

Genre:

Personal (1.00)
Research Report > New Finding (0.46)

Industry:

Transportation > Air (1.00)
Media > News (1.00)
Leisure & Entertainment > Sports > Soccer (1.00)
(13 more...)

Technology:

Information Technology > Communications > Social Media (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning

Hessel, Jack, Hwang, Jena D., Park, Jae Sung, Zellers, Rowan, Bhagavatula, Chandra, Rohrbach, Anna, Saenko, Kate, Choi, Yejin

arXiv.org Artificial Intelligence | Jul-25-2022

Humans have a remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. By identifying concrete visual clues scattered throughout a scene, we almost can't help but draw probable inferences beyond the literal scene based on our everyday experience and knowledge about the world. For example, if we see a "20 mph" sign alongside a road, we might assume the street sits in a residential area (rather than on a highway), even if no houses are pictured. Can machines perform similar visual reasoning? We present Sherlock, an annotated corpus of 103K images for testing machine capacity for abductive reasoning beyond literal image contents. We adopt a free-viewing paradigm: participants first observe and identify salient clues within images (e.g., objects, actions) and then provide a plausible inference about the scene, given the clue. In total, we collect 363K (clue, inference) pairs, which form a first-of-its-kind abductive visual reasoning dataset. Using our corpus, we test three complementary axes of abductive reasoning. We evaluate the capacity of models to: i) retrieve relevant inferences from a large candidate corpus; ii) localize evidence for inferences via bounding boxes; and iii) compare plausible inferences to match human judgments on a newly-collected diagnostic corpus of 19K Likert-scale judgments. While we find that fine-tuning CLIP-RN50x64 with a multitask objective outperforms strong baselines, significant headroom exists between model performance and human agreement. Data, models, and leaderboard are available at http://visualabduction.com/

abductive reasoning, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2202.048

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.93)
Transportation > Ground (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

Frost, Julius, Watkins, Olivia, Weiner, Eric, Abbeel, Pieter, Darrell, Trevor, Plummer, Bryan, Saenko, Kate

arXiv.org Artificial Intelligence | Jan-28-2022

In order for humans to confidently decide where to employ RL agents for real-world tasks, a human developer must validate that the agent will perform well at test-time. Some policy interpretability methods facilitate this by capturing the policy's decision making in a set of agent rollouts. However, even the most informative trajectories of training time behavior may give little insight into the agent's behavior out of distribution. In contrast, our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution. We generate these trajectories by guiding the agent to more diverse unseen states and showing the agent's behavior there. In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2201.12462

Genre: Research Report > Experimental Study (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.89)

Extending the WILDS Benchmark for Unsupervised Adaptation

Sagawa, Shiori, Koh, Pang Wei, Lee, Tony, Gao, Irena, Xie, Sang Michael, Shen, Kendrick, Kumar, Ananya, Hu, Weihua, Yasunaga, Michihiro, Marklund, Henrik, Beery, Sara, David, Etienne, Stavness, Ian, Guo, Wei, Leskovec, Jure, Saenko, Kate, Hashimoto, Tatsunori, Levine, Sergey, Finn, Chelsea, Liang, Percy

arXiv.org Artificial Intelligence | Dec-9-2021

Machine learning systems deployed in the wild are often trained on a source distribution but deployed on a different target distribution. Unlabeled data can be a powerful point of leverage for mitigating these distribution shifts, as it is frequently much more available than labeled data. However, existing distribution shift benchmarks for unlabeled data do not reflect the breadth of scenarios that arise in real-world applications. In this work, we present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data that would be realistically obtainable in deployment. To maintain consistency, the labeled training, validation, and test sets, as well as the evaluation metrics, are exactly the same as in the original WILDS benchmark. These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities (photos, satellite images, microscope slides, text, molecular graphs). We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods, and show that their success on WILDS 2.0 is limited. To facilitate method development and evaluation, we provide an open-source package that automates data loading and contains all of the model architectures and methods used in this paper. Code and leaderboards are available at https://wilds.stanford.edu.

artificial intelligence, diagnostic medicine, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2112.0509

Country:

Europe (1.00)
North America > United States > California > Santa Clara County > Palo Alto (0.25)
Asia > Japan > Honshū > Kantō (0.14)
North America > Canada > Saskatchewan > Saskatoon (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.92)
Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

Panda, Rameswar, Chen, Chun-Fu, Fan, Quanfu, Sun, Ximeng, Saenko, Kate, Oliva, Aude, Feris, Rogerio

arXiv.org Artificial Intelligence | May-12-2021

Multi-modal learning, which focuses on utilizing various modalities to improve the performance of a model, is widely used in video recognition. While traditional multi-modal learning offers excellent recognition results, its computational expense limits its impact for many real-world applications. In this paper, we propose an adaptive multi-modal learning framework, called AdaMML, that selects on-the-fly the optimal modalities for each segment conditioned on the input for efficient video recognition. Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency. We efficiently train the policy network jointly with the recognition model using standard back-propagation. Extensive experiments on four challenging diverse datasets demonstrate that our proposed adaptive approach yields 35%-55% reduction in computation when compared to the traditional baseline that simply uses all the modalities irrespective of the input, while also achieving consistent improvements in accuracy over the state-of-the-art methods.

deep learning, modality, neural network, (18 more...)

arXiv.org Artificial Intelligence

2105.05165

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Auxiliary Task Reweighting for Minimum-data Learning

Shi, Baifeng, Hoffman, Judy, Saenko, Kate, Darrell, Trevor, Xu, Huijuan

arXiv.org Machine Learning | Oct-16-2020

Supervised learning requires a large amount of training data, limiting its application where labeled data is scarce. To compensate for data scarcity, one possible method is to utilize auxiliary tasks to provide additional supervision for the main task. Assigning and optimizing the importance weights for different auxiliary tasks remains a crucial and largely understudied research question. In this work, we propose a method to automatically reweight auxiliary tasks in order to reduce the data requirement on the main task. Specifically, we formulate the weighted likelihood function of auxiliary tasks as a surrogate prior for the main task. By adjusting the auxiliary task weights to minimize the divergence between the surrogate prior and the true prior of the main task, we obtain a more accurate prior estimation, achieving the goal of minimizing the required amount of training data for the main task and avoiding a costly grid search. In multiple experimental settings (e.g. semi-supervised learning, multi-label classification), we demonstrate that our algorithm can effectively utilize limited labeled data of the main task with the benefit of auxiliary tasks compared with previous task reweighting methods. We also show that under extreme cases with only a few extra examples (e.g. few-shot domain adaptation), our algorithm results in significant improvement over the baseline.
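
One way to read the surrogate-prior idea in the abstract above is sketched below. The notation is an assumption made for illustration and is not taken from the listing: $\theta$ denotes the model parameters, $\mathcal{T}_k$ the data of the $k$-th auxiliary task, $\alpha_k \ge 0$ its weight, $p^{*}(\theta)$ the true prior of the main task, and $D$ an unspecified divergence.

```latex
% Weighted auxiliary likelihoods acting as a surrogate prior (assumed notation)
p_{\alpha}(\theta) \;\propto\; \prod_{k=1}^{K} p(\mathcal{T}_k \mid \theta)^{\alpha_k}
\quad\Longleftrightarrow\quad
\log p_{\alpha}(\theta) \;=\; \sum_{k=1}^{K} \alpha_k \log p(\mathcal{T}_k \mid \theta) + \text{const}

% Task weights chosen to bring the surrogate prior close to the true prior
\alpha^{*} \;=\; \arg\min_{\alpha} \; D\!\left( p^{*}(\theta),\; p_{\alpha}(\theta) \right)
```

Under this reading, setting $\alpha_k = 0$ removes task $k$ from the prior, and learning $\alpha$ replaces a grid search over task weights.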

auxiliary task, deep learning, neural network, (16 more...)

arXiv.org Machine Learning

2010.08244

Country: North America > United States > California (0.14)

Genre: Research Report (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

Tan, Reuben, Saenko, Kate, Plummer, Bryan A.

arXiv.org Artificial Intelligence | Sep-22-2020

Large-scale dissemination of disinformation online intended to mislead or deceive the general population is a major societal problem. Rapid progression in image, video, and natural language generative models has only exacerbated this situation and intensified our need for an effective defense mechanism. While existing approaches have been proposed to defend against neural fake news, they are generally constrained to the very limited setting where articles only have text and metadata such as the title and authors. In this paper, we introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions. To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles as well as conduct a series of human user study experiments based on this dataset. In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.

artificial intelligence, caption, text processing, (20 more...)

arXiv.org Artificial Intelligence

2009.07698

Country:

Europe > United Kingdom (1.00)
North America > United States > New York (0.28)

Genre:

Questionnaire & Opinion Survey (0.97)
Research Report > New Finding (0.46)

Industry:

Media > News (1.00)
Leisure & Entertainment > Sports > Baseball (1.00)
Government > Regional Government > Europe Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning

Plummer, Bryan A., Dryden, Nikoli, Frost, Julius, Hoefler, Torsten, Saenko, Kate

arXiv.org Machine Learning | Jun-18-2020

We present Shapeshifter Networks (SSNs), a flexible neural network framework that improves performance and reduces memory requirements on a diverse set of scenarios over standard neural networks. Our approach is based on the observation that many neural networks are severely overparameterized, resulting in significant waste in computational resources as well as being susceptible to overfitting. SSNs address this by learning where and how to share parameters between layers in a neural network while avoiding degenerate solutions that result in underfitting. Specifically, we automatically construct parameter groups that identify where parameter sharing is most beneficial. Then, we map each group's weights to construct layers with learned combinations of candidates from a shared parameter pool. SSNs can share parameters across layers even when they have different sizes, perform different operations, and/or operate on features from different modalities. We evaluate our approach on a diverse set of tasks, including image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high performing models even when using as little as 1% of the parameters. We also apply SSNs to knowledge distillation, where we obtain state-of-the-art results when combined with traditional distillation methods.

deep learning, neural network, ssn, (16 more...)

arXiv.org Machine Learning

2006.10598

Country: North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Black-box Explanation of Object Detectors via Saliency Maps

Petsiuk, Vitali, Jain, Rajiv, Manjunatha, Varun, Morariu, Vlad I., Mehra, Ashutosh, Ordonez, Vicente, Saenko, Kate

arXiv.org Artificial Intelligence | Jun-4-2020

We propose D-RISE, a method for generating visual explanations for the predictions of object detectors. D-RISE can be considered "black-box" in the software testing sense: it only needs access to the inputs and outputs of an object detector. Compared to gradient-based methods, D-RISE is more general and agnostic to the particular type of object detector being tested, as it does not need to know about the inner workings of the model. We show that D-RISE can be easily applied to different object detectors, including one-stage detectors such as YOLOv3 and two-stage detectors such as Faster-RCNN. We present a detailed analysis of the generated visual explanations to highlight the utilization of context and the possible biases learned by object detectors.
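
The "black-box" property described above can be illustrated with a minimal sketch of saliency by random masking. This is not the authors' implementation: the abstract does not describe the masking procedure, so the sketch assumes a simplified scheme with blocky binary masks, and `score_fn`, `blackbox_saliency`, and all parameters are hypothetical names. In a detection setting, `score_fn` would stand for some measure of how well the detector still reproduces the target detection on the masked image.

```python
import numpy as np

def blackbox_saliency(image, score_fn, n_masks=200, grid=4, keep_prob=0.5, seed=0):
    """Estimate a saliency map using only the inputs and outputs of score_fn.

    image:    H x W (or H x W x C) float array.
    score_fn: hypothetical callable mapping a masked image to a scalar score.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    cell_h = -(-h // grid)  # ceiling division so the mask covers the image
    cell_w = -(-w // grid)
    saliency = np.zeros((h, w), dtype=float)
    for _ in range(n_masks):
        # Coarse binary grid, upsampled to image size by block repetition.
        coarse = (rng.random((grid, grid)) < keep_prob).astype(float)
        mask = np.kron(coarse, np.ones((cell_h, cell_w)))[:h, :w]
        masked = image * (mask if image.ndim == 2 else mask[..., None])
        # Pixels kept visible in high-scoring masks accumulate more weight.
        saliency += score_fn(masked) * mask
    return saliency / n_masks

# Toy usage: the "model" only responds to the patch at rows 2-3, columns 2-3.
toy_image = np.ones((8, 8))
toy_map = blackbox_saliency(toy_image, lambda x: float(x[2:4, 2:4].mean()))
```

Because the loop never inspects gradients or internal activations, the same code would work with any model that can be called on an image, which is the sense of "black-box" used in the abstract.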

air transportation, neural network, saliency map, (18 more...)

arXiv.org Artificial Intelligence

2006.03204

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Air (0.61)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Learning visual servo policies via planner cloning

Viereck, Ulrich, Saenko, Kate, Platt, Robert

arXiv.org Artificial Intelligence | May-24-2020

This algorithm differs from AGGREVATE because it incorporates the value penalties and from DQfD because it uses supervised targets rather than TD targets. We compare PQC with several baselines and algorithm ablations and show that it outperforms all these variations on two …

Visual servoing in novel environments is an important problem. Given images produced by a camera, a visual servo control policy guides a grasped part into a desired pose relative to the environment. This problem appears in many situations: reaching, grasping, peg insertion, stacking, machine assembly tasks, etc. Whereas classical approaches to the problem [6, 3, 27] typically make strong assumptions about the environment (fiducials, known object geometries, etc.), there has been a surge of interest recently in using deep learning methods to solve these problems in more unstructured settings that incorporate novel objects [29, 14, 26, 8, 21, 28, 12, 13].

algorithm, deep learning, neural network, (22 more...)

arXiv.org Artificial Intelligence

2005.1181

Country: Asia (0.28)

Genre: Research Report (0.83)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
