AITopics

2410.10143

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Kumar, Ashutosh, Kaushal, Sarthak, Murthy, Shiv Vignesh

MoonMetaSync: Lunar Image Registration Analysis

arXiv.org Artificial Intelligence Oct-14-2024

This paper compares scale-invariant (SIFT) and scale-variant (ORB) feature detection methods, alongside our novel feature detector, IntFeat, specifically applied to lunar imagery. We evaluate these methods using low (128x128) and high-resolution (1024x1024) lunar image patches, providing insights into their performance across scales in challenging extraterrestrial environments. IntFeat combines high-level features from SIFT and low-level features from ORB into a single vector space for robust lunar image registration. We introduce SyncVision, a Python package that compares lunar images using various registration methods, including SIFT, ORB, and IntFeat. Our analysis includes upscaling low-resolution lunar images using bi-linear and bi-cubic interpolation, offering a unique perspective on registration effectiveness across scales and feature detectors in lunar landscapes. This research contributes to computer vision and planetary science by comparing feature detection methods for lunar imagery and introducing a versatile tool for lunar image registration and evaluation, with implications for multi-resolution image analysis in space exploration applications.
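
The bi-linear upscaling step mentioned above can be illustrated with a minimal pure-Python sketch. It is not taken from SyncVision: the function name `bilinear_upscale`, the list-of-rows image layout, and the align-corners coordinate mapping are all assumptions made for illustration, and library resizers may use a different pixel-center convention.

```python
def bilinear_upscale(image, factor):
    """Upscale a 2D grayscale image (list of rows) by an integer factor.

    Hypothetical helper: uses an align-corners mapping, so the four corner
    pixels of the input are reproduced exactly in the output.
    """
    h, w = len(image), len(image[0])
    out_h, out_w = h * factor, w * factor
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional source row.
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            # Map the output column back to a fractional source column.
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


patch = [[0.0, 10.0], [20.0, 30.0]]
upscaled = bilinear_upscale(patch, 2)  # 2x2 -> 4x4
```

Under these assumptions, a 128x128 patch with `factor=8` would yield a 1024x1024 image, matching the two resolutions named in the abstract.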

artificial intelligence, machine learning, pattern recognition

2410.11118

Country: North America > United States > New York > Monroe County > Rochester (0.05)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.86)

arXiv.org Artificial Intelligence Oct-13-2024

ChartKG: A Knowledge-Graph-Based Representation for Chart Images

Zhou, Zhiguang, Wang, Haoxuan, Zhao, Zhengqing, Zheng, Fengling, Wang, Yongheng, Chen, Wei, Wang, Yong

Chart images, such as bar charts, pie charts, and line charts, are being produced in rapidly growing numbers due to the wide usage of data visualizations. Accordingly, knowledge mining from chart images is becoming increasingly important, as it can benefit downstream tasks like chart retrieval and knowledge graph completion. However, existing methods for chart knowledge mining mainly focus on converting chart images into raw data and often ignore their visual encodings and semantic meanings, which can result in information loss for many downstream tasks. In this paper, we propose ChartKG, a novel knowledge graph (KG) based representation for chart images, which can model the visual elements in a chart image and the semantic relations among them, including visual encodings and visual insights, in a unified manner. Further, we develop a general framework to convert chart images to the proposed KG-based representation. It integrates a series of image processing techniques to identify visual elements and relations, e.g., CNNs to classify charts, YOLOv5 and optical character recognition to parse charts, and rule-based methods to construct graphs. We present four cases to illustrate how our knowledge-graph-based representation can model the detailed visual elements and semantic relations in charts, and further demonstrate how our approach can benefit downstream applications such as semantic-aware chart retrieval and chart question answering. We also conduct quantitative evaluations to assess the two fundamental building blocks of our chart-to-KG framework, i.e., object recognition and optical character recognition. The results provide support for the usefulness and effectiveness of ChartKG.

chart image, machine learning, pattern recognition

2410.09761

Country:

Asia > China > Zhejiang Province > Hangzhou (0.05)
Europe > United Kingdom > England (0.04)
Europe > Ukraine (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.68)
Health & Medicine (0.46)

Technology:

Information Technology > Visualization (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)

Neural Information Processing Systems Oct-11-2024, 02:35:47 GMT

Recurrent Registration Neural Networks for Deformable Image Registration

Parametric spatial transformation models have been successfully applied to image registration tasks. In such models, the transformation of interest is parameterized by a fixed set of basis functions, for example B-splines. Because the transformation of interest is not known in advance, each basis function is placed at a fixed position on a regular grid across the image domain. As a consequence, not all basis functions will necessarily contribute to the final transformation, which results in a non-compact representation of the transformation. For each element in the sequence, a local deformation defined by its position, shape, and weight is computed by our recurrent registration neural network.
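
To make the non-compactness point concrete, here is a minimal one-dimensional sketch of a displacement parameterized by uniform cubic B-splines on a fixed regular grid. It is an illustration only, not the paper's recurrent network: the function names, the 1D setting, and the sample weights are assumptions.

```python
def cubic_bspline(t):
    """Uniform cubic B-spline kernel, nonzero only for |t| < 2."""
    t = abs(t)
    if t < 1.0:
        return (4.0 - 6.0 * t * t + 3.0 * t ** 3) / 6.0
    if t < 2.0:
        return (2.0 - t) ** 3 / 6.0
    return 0.0


def displacement(x, weights, spacing):
    """Displacement u(x) = sum_i w_i * B((x - i * spacing) / spacing).

    Control point i sits at the fixed grid position i * spacing.
    """
    return sum(
        w * cubic_bspline((x - i * spacing) / spacing)
        for i, w in enumerate(weights)
    )


# Hypothetical weights: eight grid positions, only one of which is used.
weights = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0]
u_near = displacement(30.0, weights, spacing=10.0)  # at the active control point
u_far = displacement(60.0, weights, spacing=10.0)   # outside its support: 0.0
```

Seven of the eight basis functions here carry zero weight, yet all eight must be stored, which is the kind of non-compact representation the abstract describes.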

deformable image registration, recurrent registration neural network, transformation

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Turnbull, Robert, Fitzgerald, Emily, Thompson, Karen, Birch, Joanne L.

Hespi: A pipeline for automatically detecting information from herbarium specimen sheets

arXiv.org Artificial Intelligence Oct-11-2024

Specimen-associated biodiversity data are sought after for biological, environmental, climate, and conservation sciences. A rate shift is required for the extraction of data from specimen images to eliminate the bottleneck that the reliance on human-mediated transcription of these data represents. We applied advanced computer vision techniques to develop 'Hespi' (HErbarium Specimen sheet PIpeline), which extracts a pre-catalogue subset of collection data on the institutional labels on herbarium specimens from their digital images. The pipeline integrates two object detection models: the first detects bounding boxes around text-based labels and the second detects bounding boxes around text-based data fields on the primary institutional label. The pipeline classifies text-based institutional labels as printed, typed, handwritten, or a combination and applies Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) for data extraction. The recognized text is then corrected against authoritative databases of taxon names. The extracted text is also corrected with the aid of a multimodal Large Language Model (LLM). Hespi accurately detects and extracts text for test datasets including specimen sheet images from international herbaria. The components of the pipeline are modular, and users can train their own models with their own data and use them in place of the models provided.
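
The step of correcting recognized text against a list of authoritative taxon names can be sketched as follows. The abstract does not say how Hespi performs the match, so this uses Python's standard `difflib` as a stand-in; the function name, the cutoff value, and the sample names are assumptions.

```python
import difflib


def correct_taxon_name(recognized, authoritative_names, cutoff=0.8):
    """Return the closest authoritative name, or the input if none is close.

    Stand-in for the correction step: difflib similarity is an assumption,
    not the method used by Hespi.
    """
    matches = difflib.get_close_matches(
        recognized, authoritative_names, n=1, cutoff=cutoff
    )
    return matches[0] if matches else recognized


# Hypothetical reference list and an OCR output with one wrong character.
names = ["Eucalyptus globulus", "Acacia melanoxylon", "Banksia serrata"]
fixed = correct_taxon_name("Eucalyptus globulas", names)
```

Returning the original string when nothing clears the cutoff keeps the sketch from forcing an unrelated name onto badly recognized text.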

large language model, machine learning, pattern recognition

2410.0874

Country:

Europe (0.28)
North America > United States > Texas (0.14)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment (0.46)
Energy > Oil & Gas (0.46)
Media > Photography (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Neural Information Processing Systems Oct-10-2024, 19:08:19 GMT

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

Vision Transformers (ViT) have achieved remarkable success in large-scale image recognition. They split every 2D image into a fixed number of patches, each of which is treated as a token. Generally, representing an image with more tokens would lead to higher prediction accuracy, while it also results in drastically increased computational cost. To achieve a decent trade-off between accuracy and speed, the number of tokens is empirically set to 16x16 or 14x14. In this paper, we argue that every image has its own characteristics, and ideally the token number should be conditioned on each individual input.
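
The trade-off between token count and cost can be made concrete with a little arithmetic. The sketch below assumes a square 224x224 input and non-overlapping square patches (neither is stated in the abstract), and the helper names are hypothetical. It uses the fact that self-attention compares every token with every other token, so the number of pairwise interactions grows with the square of the token count.

```python
def num_tokens(image_size, patch_size):
    """Number of patch tokens for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("patch_size must divide image_size")
    per_side = image_size // patch_size
    return per_side * per_side


def attention_pairs(tokens):
    """Pairwise token interactions in one self-attention layer."""
    return tokens * tokens


# Assumed 224x224 input: smaller patches mean more tokens and more cost.
for patch in (32, 16, 14):
    n = num_tokens(224, patch)
    print(patch, n, attention_pairs(n))
```

Under these assumptions, 16-pixel patches give a 14x14 grid of tokens and 14-pixel patches give a 16x16 grid, the two settings the abstract mentions.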

dynamic transformer, efficient image recognition, github

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.63)

Neural Information Processing Systems Oct-10-2024, 17:49:49 GMT

This Looks Like That: Deep Learning for Interpretable Image Recognition

When we are faced with challenging image classification tasks, we often explain our reasoning by dissecting the image and pointing out prototypical aspects of one class or another. The mounting evidence for each of the classes helps us make our final decision. In this work, we introduce a deep network architecture, the prototypical part network (ProtoPNet), that reasons in a similar way: the network dissects the image by finding prototypical parts, and combines evidence from the prototypes to make a final classification. The model thus reasons in a way that is qualitatively similar to the way ornithologists, physicians, and others would explain to people how to solve challenging image classification tasks. The network uses only image-level labels for training, without any annotations for parts of images.
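
The idea of finding prototypical parts and combining their evidence can be sketched in a few lines. This is a toy illustration, not the ProtoPNet implementation: patch features are plain lists rather than convolutional activations, and the log-ratio similarity, the function names, and the sample values are assumptions.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_similarity(patches, prototype, eps=1e-4):
    """Similarity of the best-matching image patch to one prototype.

    The log-ratio form is an illustrative choice: it is large when the
    closest patch is near the prototype and close to zero when it is far.
    """
    d_min = min(squared_distance(p, prototype) for p in patches)
    return math.log((d_min + 1.0) / (d_min + eps))


def class_scores(patches, prototypes, weights):
    """Combine prototype evidence; weights[c][k] links prototype k to class c."""
    sims = [prototype_similarity(patches, proto) for proto in prototypes]
    return [sum(w * s for w, s in zip(row, sims)) for row in weights]


# Hypothetical 2D patch features, one prototype per class.
patches = [[0.0, 0.0], [1.0, 1.0]]
prototypes = [[1.0, 1.0], [5.0, 5.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]
scores = class_scores(patches, prototypes, weights)
```

Because each score is a weighted sum of per-prototype similarities, the prediction can be traced back to the specific patch that matched each prototype, which is the sense in which such a model is interpretable.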

deep learning, interpretable image recognition, protopnet

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Neural Information Processing Systems Oct-10-2024, 01:36:22 GMT

Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration

This paper concerns the undetermined problem of estimating geometric transformation between image pairs. Recent methods introduce deep neural networks to predict the controlling parameters of hand-crafted geometric transformation models (e.g. However, the low-dimension parametric models are incapable of estimating a highly complex geometric transform, with limited flexibility to model the actual geometric deformation from image pairs. To address this issue, we present an end-to-end trainable deep neural network, named Arbitrary Continuous Geometric Transformation Networks (Arbicon-Net), to directly predict the dense displacement field for pairwise image alignment. Arbicon-Net is generalized from training data to predict the desired arbitrary continuous geometric transformation in a data-driven manner for unseen new pairs of images.

arbicon-net, arbitrary continuous geometric transformation network, image registration

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.45)

Neural Information Processing Systems Oct-9-2024, 23:29:11 GMT

Multiscale Deep Equilibrium Models

We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ), suited to large-scale and highly hierarchical pattern recognition domains. An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously, using implicit differentiation to avoid storing intermediate states (and thus requiring only O(1) memory consumption). These simultaneously-learned multi-resolution features allow us to train a single model on a diverse set of tasks and loss functions, such as using a single MDEQ to perform both image classification and semantic segmentation. We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset. In both settings, MDEQs are able to match or exceed the performance of recent competitive computer vision models: the first time such performance and scale have been achieved by an implicit deep learning approach.
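
Solving for an equilibrium point and differentiating through it implicitly can be shown on a scalar toy problem. This is a sketch under strong simplifications, not MDEQ itself: the layer `tanh(w * z + x)`, the plain fixed-point iteration (substituted here for whatever root solver the paper uses), and all names are assumptions.

```python
import math


def solve_equilibrium(x, w=0.5, tol=1e-12, max_iter=1000):
    """Find z* with z* = tanh(w * z* + x) by fixed-point iteration.

    Converges because |w| < 1 makes the update a contraction.
    """
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


def implicit_gradient(x, w=0.5):
    """dz*/dx from the implicit function theorem.

    With f(z, x) = tanh(w * z + x): dz*/dx = f_x / (1 - f_z), which needs
    only the equilibrium point, not the iterates that led to it.
    """
    z = solve_equilibrium(x, w)
    s = 1.0 - math.tanh(w * z + x) ** 2  # derivative of tanh at equilibrium
    return s / (1.0 - w * s)


z_star = solve_equilibrium(0.3)
grad = implicit_gradient(0.3)
```

The gradient is computed from the equilibrium alone, so no intermediate iterates need to be kept in memory, which is the source of the constant memory cost the abstract refers to.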

classification and semantic segmentation, mdeq, multiscale deep equilibrium model

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.64)

arXiv.org Artificial Intelligence Oct-8-2024

Understanding with toy surrogate models in machine learning

Páez, Andrés

Unlike regular models, these very simple models--often referred to as toy models--are not required to be linked to the real world through structural similarity or resemblance relations. They are not meant to be approximations of the target world system, and in some cases, they are not even required to be representational. In semantic terms, they do not accurately map onto their targets. Despite these limitations, they are still useful in understanding theoretical concepts and possible configurations of the target system. Paradigmatic examples of toy models include Boyle's law and the Ising model in physics, the Lotka-Volterra model in population ecology, and the Schelling model in the social sciences (Weisberg, 2013). In recent years, philosophers of science have become interested in toy models (Grüne-Yanoff, 2009; Luczak, 2017; Reutlinger et al., 2018; Frigg & Nguyen, 2017; Nguyen, 2020). The main purpose of this literature is to explore the nature of these models and examine how they perform their epistemic function. Despite lacking the regular descriptive and predictive features of full-scale scientific models, they often offer an elementary understanding of a phenomenon. These authors' definitions of "toy model" differ, as do their assessments of the importance of representation in modelling generally, but they all agree that toy models play an important epistemic role in scientific research, exploration, and pedagogy.

representation, surrogate model, toy model

2410.05675

Country:

North America > United States > New York (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Endocrinology (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)