Cohen, Scott
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Slyman, Eric, Lee, Stefan, Cohen, Scott, Kafle, Kushal
Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset. These results have been based on pruning commonly used image-caption datasets collected from the web -- datasets that are known to harbor harmful social biases that may then be codified in trained models. In this work, we evaluate how deduplication affects the prevalence of these biases in the resulting trained models and introduce an easy-to-implement modification to the recent SemDeDup algorithm that can reduce the negative effects that we observe. When examining CLIP-style models trained on deduplicated variants of LAION-400M, we find our proposed FairDeDup algorithm consistently leads to improved fairness metrics over SemDeDup on the FairFace and FACET datasets while maintaining zero-shot performance on CLIP benchmarks.
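As a rough illustration of the kind of pipeline this abstract describes, the sketch below implements a generic SemDeDup-style pruning loop: cluster L2-normalized embeddings, then greedily drop near-duplicates within each cluster whose cosine similarity to an already-kept example exceeds a threshold. The cluster count, the threshold, and the optional priority hook (where a fairness-aware selection rule in the spirit of FairDeDup could be plugged in) are illustrative assumptions, not the papers' actual settings.

    # Minimal sketch of a SemDeDup-style pruning loop with a pluggable selection
    # hook. n_clusters, sim_thresh, and the `priority` argument are illustrative
    # assumptions, not settings from the paper.
    import numpy as np
    from sklearn.cluster import KMeans

    def semantic_dedup(embeddings, n_clusters=100, sim_thresh=0.95, priority=None):
        """Return indices of examples kept after embedding-space deduplication.

        `priority(member_indices)` reorders a cluster before the greedy pass;
        a fairness-aware ordering (the FairDeDup idea) would be supplied there.
        """
        # L2-normalize so dot products are cosine similarities.
        emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(emb)

        kept = []
        for c in range(n_clusters):
            members = np.where(labels == c)[0]
            order = priority(members) if priority is not None else members
            cluster_kept = []
            for i in order:
                # Drop i if it is a near-duplicate of anything already kept in this cluster.
                if cluster_kept and float(np.max(emb[cluster_kept] @ emb[i])) >= sim_thresh:
                    continue
                cluster_kept.append(i)
            kept.extend(cluster_kept)
        return np.array(sorted(kept))

With priority=None, the first member encountered in each duplicate group survives; choosing that survivor differently is exactly where a fairness-aware rule would act.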
FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hua, Hang, Shi, Jing, Kafle, Kushal, Jenni, Simon, Zhang, Daoan, Collomosse, John, Cohen, Scott, Luo, Jiebo
Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite their impressive ability to perform complex reasoning, current VLMs often struggle to effectively and precisely capture the compositional information on both the image and text sides. To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction. This benchmark introduces a novel task for boosting and evaluating the VLMs' compositionality for aspect-based fine-grained text and image matching. In this task, models are required to identify mismatched aspect phrases within a caption, determine the aspect's class, and propose corrections for an image-text pair that may contain between 0 and 3 mismatches. To evaluate the models' performance on this new task, we propose a new evaluation metric named ITM-IoU, for which our experiments show a high correlation to human evaluation. In addition, we provide a comprehensive experimental analysis of existing mainstream VLMs, covering both fully supervised learning and in-context learning settings. We find that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches. Moreover, models (e.g., GPT-4V, Gemini Pro Vision) with strong abilities to perform multimodal in-context learning are not as skilled at fine-grained compositional image and text matching analysis. With FineMatch, we are able to build a system for text-to-image generation hallucination detection and correction.
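To make the task format concrete, here is a minimal sketch of how a prediction for one image-text pair might be represented and scored with a generic intersection-over-union over mismatch tuples. The Mismatch fields, the example aspect classes, and the exact-match scoring below are assumptions made only for illustration; the paper's ITM-IoU metric is defined in more detail than this sketch.

    # Hypothetical representation of one FineMatch-style prediction and a generic
    # IoU over mismatch tuples. Field names, example aspect classes, and the
    # exact-match scoring are assumptions, not the paper's definitions.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Mismatch:
        phrase: str        # mismatched aspect phrase found in the caption
        aspect_class: str  # assumed aspect taxonomy label
        correction: str    # proposed replacement that matches the image

    def tuple_iou(predicted: set, reference: set) -> float:
        """Intersection-over-union over exact mismatch-tuple matches."""
        if not predicted and not reference:
            return 1.0  # a clean pair (0 mismatches) predicted as clean
        return len(predicted & reference) / len(predicted | reference)

    # One correct detection, one missed mismatch, one spurious prediction -> 1/3.
    reference = {Mismatch("red car", "attribute", "blue car"),
                 Mismatch("two dogs", "counting", "three dogs")}
    predicted = {Mismatch("red car", "attribute", "blue car"),
                 Mismatch("park", "entity", "beach")}
    print(tuple_iou(predicted, reference))  # 0.333...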
Latent Feature-Guided Diffusion Models for Shadow Removal
Mei, Kangfu, Figueroa, Luis, Lin, Zhe, Ding, Zhihong, Cohen, Scott, Patel, Vishal M.
Recovering textures under shadows has remained a challenging problem due to the difficulty of inferring shadow-free scenes from shadow images. In this paper, we propose the use of diffusion models, as they offer a promising approach to gradually refine the details of shadow regions during the diffusion process. Our method improves this process by conditioning on a learned latent feature space that inherits the characteristics of shadow-free images, thus avoiding the limitation of conventional methods that condition on degraded images only. Additionally, we propose to alleviate potential local optima during training by fusing noise features with the diffusion network. We demonstrate the effectiveness of our approach, which outperforms existing methods on shadow removal benchmarks.
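A toy sketch of the conditioning idea, assuming a PyTorch setup: rather than feeding the denoiser only the degraded (shadow) input, a learned latent encoder supplies a feature intended to carry shadow-free characteristics, and that feature is fused into the denoising network at every step. The tiny conv stacks, the pooling to a single latent vector, and the crude timestep map below are placeholders, not the paper's architecture.

    # Toy latent-conditioned diffusion denoiser; layer choices are illustrative only.
    import torch
    import torch.nn as nn

    class LatentConditionedDenoiser(nn.Module):
        def __init__(self, channels=64, latent_dim=128):
            super().__init__()
            self.latent_encoder = nn.Sequential(          # maps the shadow image to a latent
                nn.Conv2d(3, latent_dim, 3, padding=1),   # feature meant to look "shadow-free"
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.denoiser = nn.Sequential(
                nn.Conv2d(3 + latent_dim + 1, channels, 3, padding=1),
                nn.ReLU(),
                nn.Conv2d(channels, 3, 3, padding=1),     # predicts the noise residual
            )

        def forward(self, noisy, shadow_img, t):
            # noisy, shadow_img: (B, 3, H, W); t: (B,) integer diffusion timesteps.
            b, _, h, w = noisy.shape
            latent = self.latent_encoder(shadow_img).expand(-1, -1, h, w)
            t_map = t.view(b, 1, 1, 1).float().expand(-1, -1, h, w)  # crude timestep embedding
            return self.denoiser(torch.cat([noisy, latent, t_map], dim=1))

The point the sketch tries to capture is that the conditioning signal is a learned feature intended to resemble shadow-free content, rather than the raw degraded image itself.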
Answering Questions about Data Visualizations using Efficient Bimodal Fusion
Kafle, Kushal, Shrestha, Robik, Price, Brian, Cohen, Scott, Kanan, Christopher
Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g. bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.
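The fusion idea can be sketched roughly as follows, assuming a PyTorch setup: the question embedding is tiled over every location of a CNN feature map, fused with 1x1 convolutions, and the fused locations are then aggregated by a recurrent layer before answer classification. The layer sizes, the single fusion branch, and the bidirectional GRU aggregator here are simplifications rather than the exact PReFIL architecture.

    # Simplified bimodal fusion sketch; dimensions and components are assumptions.
    import torch
    import torch.nn as nn

    class BimodalFusionSketch(nn.Module):
        def __init__(self, q_vocab=5000, q_dim=256, img_ch=512, fused=256, n_answers=100):
            super().__init__()
            self.q_embed = nn.Embedding(q_vocab, q_dim)
            self.q_lstm = nn.LSTM(q_dim, q_dim, batch_first=True)
            self.fuse = nn.Sequential(                       # 1x1 convs over [image ; question]
                nn.Conv2d(img_ch + q_dim, fused, 1), nn.ReLU(),
                nn.Conv2d(fused, fused, 1), nn.ReLU(),
            )
            self.aggregate = nn.GRU(fused, fused, batch_first=True, bidirectional=True)
            self.classify = nn.Linear(2 * fused, n_answers)

        def forward(self, img_feats, question_tokens):
            # img_feats: (B, C, H, W) from a CNN backbone; question_tokens: (B, T) token ids.
            _, (q, _) = self.q_lstm(self.q_embed(question_tokens))
            q = q[-1]                                               # (B, q_dim) question summary
            B, C, H, W = img_feats.shape
            q_map = q.view(B, -1, 1, 1).expand(-1, -1, H, W)        # tile question over locations
            fused = self.fuse(torch.cat([img_feats, q_map], dim=1)) # (B, fused, H, W)
            seq = fused.flatten(2).transpose(1, 2)                  # (B, H*W, fused)
            _, h = self.aggregate(seq)                              # recurrent aggregation
            return self.classify(torch.cat([h[0], h[1]], dim=1))    # answer logits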
Sherlock: Scalable Fact Learning in Images
Elhoseiny, Mohamed (Rutgers University) | Cohen, Scott (Adobe Research) | Chang, Walter (Adobe Research) | Price, Brian (Adobe Research) | Elgammal, Ahmed (Rutgers University)
We study scalable and uniform understanding of facts in images. Existing visual recognition systems are typically modeled differently for each fact type, such as objects, actions, and interactions. We propose a setting where all these facts can be modeled simultaneously, with the capacity to understand an unbounded number of facts in a structured way. The training data comes as structured facts in images, including (1) objects (e.g., <boy>), (2) attributes (e.g., <boy, tall>), (3) actions (e.g., <boy, playing>), and (4) interactions (e.g., <boy, riding, a horse>). Each fact has a semantic language view (e.g., <boy, playing>) and a visual view (an image exhibiting this fact). We show that learning visual facts in a structured way enables not only uniform but also generalizable visual understanding. We propose and investigate recent and strong approaches from the multiview learning literature and also introduce two representation learning models as potential baselines. We apply the investigated methods to several datasets that we augmented with structured facts, as well as a large-scale dataset of more than 202,000 facts and 814,000 images. Our experiments show the advantage of relating facts through structure: on bidirectional fact retrieval, the proposed models outperform the designed baselines.
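A minimal sketch of the two-view setup, under the assumption of a standard embedding-plus-ranking-loss formulation: the language view of a fact <S, P, O> and the visual view of an image are projected into a shared space, and matching pairs are pulled together relative to in-batch mismatches. The encoders and the hinge-style loss are generic placeholders, not the specific multiview models compared in the paper.

    # Generic two-view fact embedding sketch; encoders and loss are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FactEmbedder(nn.Module):
        def __init__(self, vocab=10000, dim=300, img_feat_dim=2048):
            super().__init__()
            self.word = nn.Embedding(vocab, dim, padding_idx=0)   # index 0 pads missing P/O slots
            self.lang_proj = nn.Linear(3 * dim, dim)              # <S, P, O> slots concatenated
            self.img_proj = nn.Linear(img_feat_dim, dim)          # on top of CNN image features

        def encode_fact(self, s, p, o):
            slots = torch.cat([self.word(s), self.word(p), self.word(o)], dim=-1)
            return F.normalize(self.lang_proj(slots), dim=-1)

        def encode_image(self, img_feat):
            return F.normalize(self.img_proj(img_feat), dim=-1)

    def ranking_loss(fact_emb, img_emb, margin=0.2):
        """Push matching (fact, image) pairs above in-batch mismatches by a margin."""
        sims = fact_emb @ img_emb.t()                 # (B, B) cosine similarities
        pos = sims.diag().unsqueeze(1)                # similarity of each true pair
        loss = torch.clamp(margin + sims - pos, min=0)
        loss.fill_diagonal_(0)                        # do not penalize the true pairs themselves
        return loss.mean()

Because single-word facts like <boy> leave the P and O slots empty, padding those slots (index 0 here) is one simple way to keep objects, attributes, actions, and interactions in a single uniform encoder.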
SURGE: Surface Regularized Geometry Estimation from a Single Image
Wang, Peng, Shen, Xiaohui, Russell, Bryan, Cohen, Scott, Price, Brian, Yuille, Alan L.
This paper introduces an approach to regularize 2.5D surface normal and depth predictions at each pixel given a single input image. The approach infers and reasons about the underlying 3D planar surfaces depicted in the image to snap predicted normals and depths to inferred planar surfaces, all while maintaining fine detail within objects. Our approach comprises two components: (i) a four-stream convolutional neural network (CNN) where depths, surface normals, and likelihoods of planar region and planar boundary are predicted at each pixel, followed by (ii) a dense conditional random field (DCRF) that integrates the four predictions such that the normals and depths are compatible with each other and regularized by the planar region and planar boundary information. The DCRF is formulated such that gradients can be passed to the surface normal and depth CNNs via backpropagation. In addition, we propose new planar-wise metrics to evaluate geometry consistency within planar surfaces, which are more tightly related to downstream 3D editing applications. We show that our regularization yields a 30% relative improvement in planar consistency on the NYU v2 dataset.
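The planar regularization intuition can be illustrated with a small sketch, assuming simple pinhole intrinsics and a single given planar-region mask: fit one 3D plane to the back-projected depth points inside the region, then re-solve each pixel's depth so it lies on that plane. The paper instead couples depths and normals jointly inside a dense CRF with backpropagation through the inference; this is only the snap-to-plane idea in isolation.

    # Toy snap-to-plane illustration; intrinsics format and the single-region
    # assumption are simplifications, not the paper's DCRF formulation.
    import numpy as np

    def snap_depth_to_plane(depth, plane_mask, K):
        """Refit depths inside a planar region onto a single least-squares plane."""
        fx, fy, cx, cy = K                                   # simple pinhole intrinsics
        v, u = np.where(plane_mask)
        z = depth[v, u]
        x = (u - cx) / fx * z                                # back-project to 3D
        y = (v - cy) / fy * z
        pts = np.stack([x, y, np.ones_like(z)], axis=1)
        # Least-squares plane z = a*x + b*y + c over the region's 3D points.
        a, b, c = np.linalg.lstsq(pts, z, rcond=None)[0]
        snapped = depth.copy()
        # Re-solve each pixel's depth so its back-projection lies on the fitted plane.
        denom = 1.0 - a * (u - cx) / fx - b * (v - cy) / fy
        snapped[v, u] = c / np.maximum(denom, 1e-6)
        return snapped

The fitted plane's normal, proportional to (a, b, -1), is likewise the direction toward which the corresponding per-pixel normal predictions would be regularized.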