AITopics | Sketch Understanding

d43621ff2dfe39d298dcd4a41937c912-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Feb-11-2025, 12:50:06 GMT

Sketching is a powerful tool for creating abstract images that are sparse but meaningful. Sketch understanding poses fundamental challenges for general-purpose vision algorithms because it requires robustness to the sparsity of sketches relative to natural visual inputs and because it demands tolerance for semantic ambiguity, as sketches can reliably evoke multiple meanings. While current vision algorithms have achieved high performance on a variety of visual tasks, it remains unclear to what extent they understand sketches in a human-like way. Here we introduce SEVA, a new benchmark dataset containing approximately 90K human-generated sketches of 128 object concepts produced under different time constraints, and thus systematically varying in sparsity. We evaluated a suite of state-of-the-art vision algorithms on their ability to correctly identify the target concept depicted in these sketches and to generate responses that are strongly aligned with human response patterns on the same sketch recognition task. We found that vision algorithms that better predicted human sketch recognition performance also better approximated human uncertainty about sketch meaning, but there remains a sizable gap between model and human response patterns. To explore the potential of models that emulate human visual abstraction in generative tasks, we conducted further evaluations of a recently developed sketch generation algorithm [91] capable of generating sketches that vary in sparsity. We hope that public release of this dataset and evaluation protocol will catalyze progress towards algorithms with enhanced capacities for human-like visual abstraction.

artificial intelligence, image understanding, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding

Forbus, Kenneth D., Chen, Kezhen, Xu, Wangcheng, Usher, Madeline

arXiv.org Artificial Intelligence | Jul-5-2024

One of the purposes of perception is to bridge between sensors and conceptual understanding. Marr's Primal Sketch combined initial edge-finding with multiple downstream processes to capture aspects of visual perception such as grouping and stereopsis. Given the progress made in multiple areas of AI since then, we have developed a new framework inspired by Marr's work, the Hybrid Primal Sketch. It combines computer vision components into an ensemble to produce sketch-like entities, which are then further processed by CogSketch, our model of high-level human vision, to produce both more detailed shape representations and scene representations that can be used for data-efficient learning via analogical generalization. This paper describes our theoretical framework, summarizes several previous experiments, and outlines a new experiment in progress on diagram understanding.

artificial intelligence, qualitative reasoning, representation, (17 more...)

arXiv.org Artificial Intelligence

2407.04859

Country:

North America > United States > Illinois > Cook County > Evanston (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry:

Education > Educational Technology (0.46)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Qualitative Reasoning (0.93)

Hausdorff Distance Matching with Adaptive Query Denoising for Rotated Detection Transformer

Lee, Hakjin, Song, Minki, Koo, Jamyoung, Seo, Junghoon

arXiv.org Artificial Intelligence | Nov-29-2023

The Detection Transformer (DETR) has come to play a pivotal role in object detection, setting new performance benchmarks thanks to its end-to-end design and scalability. Despite these advances, DETR has shown suboptimal performance on rotated objects relative to established oriented object detectors. Our analysis identifies a key limitation: the L1 cost used in Hungarian matching leads to duplicate predictions due to the square-like problem in oriented object detection, thereby obstructing the training process of the detector. We introduce a Hausdorff distance-based cost for Hungarian matching, which more accurately quantifies the discrepancy between predictions and ground truths. Moreover, we note that a static denoising approach hampers the training of rotated DETR, particularly when the detector's predictions surpass the quality of noised ground truths. We propose an adaptive query denoising technique, employing Hungarian matching to selectively filter out superfluous noised queries that no longer contribute to model improvement. Our proposed modifications to DETR yield superior performance, surpassing previous rotated DETR models and other alternatives. This is evidenced by our model's state-of-the-art results on benchmarks such as DOTA-v1.0/v1.5/v2.0 and DIOR-R.
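
The contrast between an L1 cost and a Hausdorff-based cost described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it assumes boxes are parameterized as (cx, cy, w, h, angle in radians), approximates each box by its four corners rather than its full boundary, and uses hypothetical helper names (`box_corners`, `hausdorff`, `l1_cost`). A square rotated by 90 degrees covers exactly the same region, so a parameter-space L1 cost reports a mismatch while the Hausdorff distance between the corner sets is numerically zero.

```python
import math

def box_corners(cx, cy, w, h, angle):
    """Corners of a rotated box; the (cx, cy, w, h, angle) layout is an assumption."""
    c, s = math.cos(angle), math.sin(angle)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]

def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def l1_cost(box_a, box_b):
    """Plain L1 distance between box parameter vectors."""
    return sum(abs(x - y) for x, y in zip(box_a, box_b))

# Illustrative boxes: the same square described with two different angles.
square = (0.0, 0.0, 2.0, 2.0, 0.0)
rotated = (0.0, 0.0, 2.0, 2.0, math.pi / 2)

print("L1 cost:", l1_cost(square, rotated))
print("Hausdorff:", hausdorff(box_corners(*square), box_corners(*rotated)))
```

Under these assumptions the L1 cost equals the angle difference (pi/2), even though the two boxes are geometrically identical, which is the kind of ambiguity a geometry-based cost avoids.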

artificial intelligence, detection, image understanding, (19 more...)

arXiv.org Artificial Intelligence

2305.07598

Country: Europe > Switzerland (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.62)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.62)

DSS: Synthesizing long Digital Ink using Data augmentation, Style encoding and Split generation

Timofeev, Aleksandr, Fadeeva, Anastasiia, Afonin, Andrei, Musat, Claudiu, Maksai, Andrii

arXiv.org Artificial Intelligence | Nov-29-2023

As text generative models can give increasingly long answers, we tackle the problem of synthesizing long text in digital ink. We show that the commonly used models for this task fail to generalize to long-form data, and that this problem can be solved by augmenting the training data and by changing the model architecture and the inference procedure. These methods use a contrastive learning technique and are tailored specifically for the handwriting domain. They can be applied to any encoder-decoder model that works with digital ink. We demonstrate that our method reduces the character error rate on long-form English data by half compared to a baseline RNN and by 16% compared to the previous approach that aims at addressing the same problem. We show that all three parts of the method improve the recognizability of generated inks. In addition, we evaluate synthesized data in a human study and find that people perceive most of the generated data as real.
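
Character error rate, the metric cited in the abstract above, is conventionally the Levenshtein edit distance between a recognizer's output and the reference text, divided by the reference length. The sketch below assumes that standard definition; the abstract does not spell out the exact evaluation setup, and the function names are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)

# Hypothetical recognizer output for a synthesized ink of the word "sketch".
print(character_error_rate("sketch", "skatch"))
```

Because the denominator is the reference length, the value can exceed 1.0 when the recognized text is much longer than the reference.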

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-41685-9_14

2311.17786

Country:

North America > United States (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.82)

Gromov-Hausdorff Distances for Comparing Product Manifolds of Model Spaces

Borde, Haitz Saez de Ocariz, Arroyo, Alvaro, Morales, Ismael, Posner, Ingmar, Dong, Xiaowen

arXiv.org Artificial Intelligence | Sep-9-2023

Recent studies propose enhancing machine learning models by aligning the geometric characteristics of the latent space with the underlying data structure. Instead of relying solely on Euclidean space, researchers have suggested using hyperbolic and spherical spaces with constant curvature, or their combinations (known as product manifolds), to improve model performance. However, there exists no principled technique to determine the best latent product manifold signature, which refers to the choice and dimensionality of manifold components. To address this, we introduce a novel notion of distance between candidate latent geometries using the Gromov-Hausdorff distance from metric geometry. We propose using a graph search space that uses the estimated Gromov-Hausdorff distances to search for the optimal latent geometry. In this work we focus on providing a description of an algorithm to compute the Gromov-Hausdorff distance between model spaces and its computational implementation.

artificial intelligence, gromov-hausdorff distance, machine learning, (2 more...)

arXiv.org Artificial Intelligence

2309.05678

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)

Sampling and Ranking for Digital Ink Generation on a tight computational budget

Afonin, Andrei, Maksai, Andrii, Timofeev, Aleksandr, Musat, Claudiu

arXiv.org Artificial Intelligence | Jun-2-2023

Digital ink (online handwriting) generation has a number of potential applications for creating user-visible content, such as handwriting autocompletion, spelling correction, and beautification. Writing is personal and usually the processing is done on-device. Ink generative models thus need to produce high quality content quickly, in a resource constrained environment. In this work, we study ways to maximize the quality of the output of a trained digital ink generative model, while staying within an inference time budget. We use and compare the effect of multiple sampling and ranking techniques, in the first ablation study of its kind in the digital ink domain. We confirm our findings on multiple datasets - writing in English and Vietnamese, as well as mathematical formulas - using two model types and two common ink data representations. In all combinations, we report a meaningful improvement in the recognizability of the synthetic inks, in some cases more than halving the character error rate metric, and describe a way to select the optimal combination of sampling and ranking techniques for any given computational budget.

artificial intelligence, machine learning, ranking model, (16 more...)

arXiv.org Artificial Intelligence

2306.03103

Country: Europe > Switzerland (0.46)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Document-Level Multi-Event Extraction with Event Proxy Nodes and Hausdorff Distance Minimization

Wang, Xinyu, Gui, Lin, He, Yulan

arXiv.org Artificial Intelligence | May-30-2023

Document-level multi-event extraction aims to extract the structural information from a given document automatically. Most recent approaches involve two steps: (1) modeling entity interactions; (2) decoding entity interactions into events. However, such approaches ignore a global view of the inter-dependency of multiple events. Moreover, an event is decoded by iteratively merging its related entities as arguments, which might suffer from error propagation and is computationally inefficient. In this paper, we propose an alternative approach for document-level multi-event extraction with event proxy nodes and Hausdorff distance minimization. The event proxy nodes, representing pseudo-events, are able to build connections with other event proxy nodes, essentially capturing global information. The Hausdorff distance makes it possible to compare the similarity between the set of predicted events and the set of ground-truth events. By directly minimizing the Hausdorff distance, the model is trained towards the global optimum, which improves performance and reduces training time. Experimental results show that our model outperforms the previous state-of-the-art method in F1-score on two datasets with only a fraction of the training time.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.18926

Country: North America > United States > Minnesota (0.28)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

Neural Information Processing Systems | Apr-6-2023, 14:02:42 GMT

We propose a new sketch recognition framework that combines a rich representation of low level visual appearance with a graphical model for capturing high level relationships between symbols. This joint model of appearance and context allows our framework to be less sensitive to noise and drawing variations, improving accuracy and robustness. The result is a recognizer that is better able to handle the wide range of drawing styles found in messy freehand sketches. We evaluate our work on two real-world domains, molecular diagrams and electrical circuit diagrams, and show that our combined approach significantly improves recognition performance.

artificial intelligence, combining appearance and context, multi-domain sketch recognition, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Human Computer Interaction > Interfaces (0.69)
Information Technology > Artificial Intelligence > Vision > Sketch Understanding (0.69)

GitHub - N0vel/weighted-hausdorff-distance-tensorflow-keras-loss: Weighted Hausdorff Distance Loss: use it as point cloud similarity metric based loss for keras and tf. Useful in keypoint detection.

#artificialintelligence | Jun-2-2022, 18:05:33 GMT

Weighted Hausdorff Distance Loss: use it as a point cloud similarity metric based loss for keras and tf. This loss requires a huge tensor of float values with dimensions number_of_pixels * number_of_keypoints (if I remember correctly). So a high-res picture with thousands of keypoints will consume A LOT of GPU memory (at least 1 GB for 512 pixels x 512 pixels x 1000 keypoints with float32 type). It doesn't matter if you want to detect only several points in an image.
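
The memory figure above can be checked with a little arithmetic. This sketch assumes, as the README tentatively states, one float value per (pixel, keypoint) pair and 4 bytes per float32; the function name `pairwise_tensor_bytes` is hypothetical and not part of the repository.

```python
def pairwise_tensor_bytes(height, width, num_keypoints, bytes_per_value=4):
    """Size of a (pixels x keypoints) tensor; float32 is 4 bytes per value."""
    return height * width * num_keypoints * bytes_per_value

n_bytes = pairwise_tensor_bytes(512, 512, 1000)
print(f"{n_bytes / 1e9:.2f} GB ({n_bytes / 2**30:.2f} GiB)")  # 1.05 GB (0.98 GiB)
```

Under this assumption the footprint grows linearly with both pixel count and keypoint count, so doubling the image side length quadruples the memory needed for this one tensor.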

artificial intelligence, inteligência artificial, weighted hausdorff distance loss, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Vision > Sketch Understanding (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
