AITopics

2408.05894

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
North America > United States > Massachusetts > Norfolk County > Wellesley (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report > New Finding (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Al-Tahan, Haider, Garrido, Quentin, Balestriero, Randall, Bouchacourt, Diane, Hazirbas, Caner, Ibrahim, Mark

UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

arXiv.org Artificial Intelligence, Aug-8-2024

Significant research efforts have been made to scale and improve vision-language model (VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers are tasked with the heavy burden of implementing each protocol, bearing a non-trivial computational cost, and making sense of how all these benchmarks translate into meaningful axes of progress. To facilitate a systematic evaluation of VLM progress, we introduce UniBench: a unified implementation of 50+ VLM benchmarks spanning a comprehensive range of carefully categorized capabilities from object recognition to spatial awareness, counting, and much more. We showcase the utility of UniBench for measuring progress by evaluating nearly 60 publicly available vision-language models, trained on scales of up to 12.8B samples. We find that while scaling training data or model size can boost many vision-language model capabilities, scaling offers little benefit for reasoning or relations. Surprisingly, we also discover that today's best VLMs struggle on simple digit recognition and counting tasks, e.g., MNIST, which much simpler networks can solve. Where scale falls short, we find that more precise interventions, such as data quality or tailored learning objectives, offer more promise. For practitioners, we also offer guidance on selecting a suitable VLM for a given application. Finally, we release an easy-to-run UniBench code-base with the full set of 50+ benchmarks and comparisons across 59 models as well as a distilled, representative set of benchmarks that runs in 5 minutes on a single GPU.

benchmark, digit, unibench, (12 more...)

2408.0481

Country:

Europe > Spain > Andalusia > Granada Province > Granada (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
(2 more...)

arXiv.org Artificial Intelligence, Aug-7-2024

A comparative study of generative adversarial networks for image recognition algorithms based on deep learning and traditional methods

Zhong, Yihao, Wei, Yijing, Liang, Yingbin, Liu, Xiqing, Ji, Rongwei, Cang, Yiru

In this paper, an image recognition algorithm based on the combination of deep learning and generative adversarial network (GAN) is studied, and compared with traditional image recognition methods. The purpose of this study is to evaluate the advantages and application prospects of deep learning technology, especially GAN, in the field of image recognition. Firstly, this paper reviews the basic principles and techniques of traditional image recognition methods, including the classical algorithms based on feature extraction such as SIFT, HOG and their combination with support vector machine (SVM), random forest, and other classifiers. Then, the working principle, network structure, and unique advantages of GAN in image generation and recognition are introduced. In order to verify the effectiveness of GAN in image recognition, a series of experiments are designed and carried out using multiple public image data sets for training and testing. The experimental results show that compared with traditional methods, GAN has excellent performance in processing complex images, recognition accuracy, and anti-noise ability. Specifically, GANs are better able to capture high-dimensional features and details of images, significantly improving recognition performance. In addition, GANs show unique advantages in dealing with image noise, partial missing information, and generating high-quality images.

discriminator, generator, image recognition, (14 more...)

2408.03568

Country: North America > United States > New York (0.05)

Genre:

Research Report > New Finding (0.66)
Overview > Innovation (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Aug-5-2024

A Review on Organ Deformation Modeling Approaches for Reliable Surgical Navigation using Augmented Reality

Han, Zheng, Dou, Qi

Augmented Reality (AR) holds the potential to revolutionize surgical procedures by allowing surgeons to visualize critical structures within the patient's body. This is achieved through superimposing preoperative organ models onto the actual anatomy. Challenges arise from dynamic deformations of organs during surgery, making preoperative models inadequate for faithfully representing intraoperative anatomy. To enable reliable navigation in augmented surgery, modeling of intraoperative deformation to obtain an accurate alignment of the preoperative organ model with the intraoperative anatomy is indispensable. Despite the existence of various methods proposed to model intraoperative organ deformation, there are still few literature reviews that systematically categorize and summarize these approaches. This review aims to fill this gap by providing a comprehensive and technically oriented overview of modeling methods for intraoperative organ deformation in augmented reality in surgery. Through a systematic search and screening process, 112 closely relevant papers were included in this review. By presenting the current status of organ deformation modeling methods and their clinical applications, this review seeks to enhance the understanding of organ deformation modeling in AR-guided surgery and to discuss potential topics for future advancements.

deformation, registration, surgery, (12 more...)

doi: 10.1080/24699322.2024.2357164

2408.02713

Country:

North America > United States (0.04)
Asia > China > Hong Kong (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Surgery (1.00)
(4 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)

arXiv.org Artificial Intelligence, Aug-3-2024

CAR: Contrast-Agnostic Deformable Medical Image Registration with Contrast-Invariant Latent Regularization

Wang, Yinsong, Du, Siyi, Zheng, Shaoming, Luo, Xinzhe, Qin, Chen

Multi-contrast image registration is a challenging task due to the complex intensity relationships between different imaging contrasts. Conventional image registration methods are typically based on iterative optimizations for each input image pair, which is time-consuming and sensitive to contrast variations. While learning-based approaches are much faster during the inference stage, due to generalizability issues, they typically can only be applied to the fixed contrasts observed during the training stage. In this work, we propose a novel contrast-agnostic deformable image registration framework that can be generalized to arbitrary contrast images, without observing them during training. Particularly, we propose a random convolution-based contrast augmentation scheme, which simulates arbitrary contrasts of images over a single image contrast while preserving their inherent structural information. To ensure that the network can learn contrast-invariant representations for facilitating contrast-agnostic registration, we further introduce contrast-invariant latent regularization (CLR) that regularizes representation in latent space through a contrast invariance loss. Experiments show that CAR outperforms the baseline approaches regarding registration accuracy and also possesses better generalization ability to unseen imaging contrasts. Code is available at https://github.com/Yinsong0510/CAR.
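
The abstract does not spell out the augmentation itself, so the following is only a minimal sketch of the general idea of random-convolution contrast augmentation: filtering an image with a freshly drawn random kernel changes its intensity profile while leaving the spatial layout in place. The function name `random_conv_contrast`, the Gaussian kernel weights, the zero padding, and the rescaling to [0, 1] are all illustrative assumptions, not the scheme used in CAR.

```python
import random


def random_conv_contrast(image, kernel_size=3, seed=None):
    """Filter a 2D grayscale image (list of rows of floats) with one random kernel.

    Hypothetical helper for illustration only. Zero padding keeps the output
    the same size as the input; the result is rescaled to [0, 1].
    """
    rng = random.Random(seed)
    k = kernel_size
    pad = k // 2
    # A fresh draw of kernel weights yields a fresh synthetic "contrast".
    kernel = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(k)]
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    # Pixels outside the image contribute zero (zero padding).
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * image[yy][xx]
            out[y][x] = acc
    # Rescale intensities to [0, 1]; a constant response maps to all zeros.
    lo = min(min(row) for row in out)
    hi = max(max(row) for row in out)
    span = hi - lo
    if span == 0:
        return [[0.0] * w for _ in range(h)]
    return [[(v - lo) / span for v in row] for row in out]
```

Calling the function repeatedly with different seeds on the same image would produce several differently "contrasted" views of one anatomy, which is the kind of input a contrast-invariance loss could compare.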

image registration, registration, representation, (14 more...)

2408.05341

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.47)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

arXiv.org Artificial Intelligence, Aug-1-2024

Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval

Zeng, Gangyan, Zhang, Yuan, Wei, Jin, Yang, Dongbao, Zhang, Peng, Gao, Yiwen, Qin, Xugong, Zhou, Yu

Scene text retrieval aims to find all images containing the query text from an image gallery. Current efforts tend to adopt an Optical Character Recognition (OCR) pipeline, which requires complicated text detection and/or recognition processes, resulting in inefficient and inflexible retrieval. In contrast, in this work we propose to explore the intrinsic potential of Contrastive Language-Image Pre-training (CLIP) for OCR-free scene text retrieval. Through empirical analysis, we observe that the main challenges of CLIP as a text retriever are: 1) limited text perceptual scale, and 2) entangled visual-semantic concepts. To this end, a novel model termed FDP (Focus, Distinguish, and Prompt) is developed. FDP first focuses on scene text via shifting the attention to the text area and probing the hidden text knowledge, and then divides the query text into content words and function words for processing, in which a semantic-aware prompting scheme and a distracted queries assistance module are utilized. Extensive experiments show that FDP significantly enhances the inference speed while achieving better or competitive retrieval accuracy compared to existing methods. Notably, on the IIIT-STR benchmark, FDP surpasses the state-of-the-art model by 4.37% while running 4 times faster. Furthermore, additional experiments under phrase-level and attribute-aware scene text retrieval settings validate FDP's particular advantages in handling diverse forms of query text. The source code will be publicly available at https://github.com/Gyann-z/FDP.
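
As a rough illustration of what OCR-free retrieval with a CLIP-style model amounts to at query time, the sketch below ranks gallery images by cosine similarity between a query-text embedding and precomputed image embeddings. It assumes the embeddings already exist as plain lists of floats; the names `cosine` and `rank_gallery` are hypothetical, and none of FDP's focusing, distinguishing, or prompting components are modeled here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_gallery(query_vec, gallery):
    """Return image ids sorted from most to least similar to the query.

    gallery maps an image id to its embedding (assumed to be precomputed).
    """
    scored = [(cosine(query_vec, emb), image_id) for image_id, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Toy two-dimensional embeddings, made up purely for illustration.
toy_gallery = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank_gallery([1.0, 0.1], toy_gallery)
```

Because ranking needs only one similarity per gallery image, no text detection or recognition step runs at query time, which is where the efficiency argument for OCR-free retrieval comes from.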

query text, retrieval, scene text, (13 more...)

2408.00441

Country:

Oceania > Australia > Victoria > Melbourne (0.15)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.54)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.49)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)

A Prior Embedding-Driven Architecture for Long Distance Blind Iris Recognition

Xiong, Qi, Zhang, Xinman, Shen, Jun

Blind iris images, which result from unknown degradation during the process of iris recognition at long distances, often lead to decreased iris recognition rates. Currently, little existing literature offers a solution to this problem. In response, we propose a prior embedding-driven architecture for long distance blind iris recognition. We first propose a blind iris image restoration network called Iris-PPRGAN. To effectively restore the texture of the blind iris, Iris-PPRGAN includes a Generative Adversarial Network (GAN) used as a Prior Decoder, and a DNN used as the encoder. To extract iris features more efficiently, we then propose a robust iris classifier, called Insight-Iris, by modifying the bottleneck module of InsightFace. A low-quality blind iris image is first restored by Iris-PPRGAN, then the restored iris image undergoes recognition via Insight-Iris. Experimental results on the public CASIA-Iris-distance dataset demonstrate that our proposed method achieves significantly superior results to state-of-the-art blind iris restoration methods both quantitatively and qualitatively. Specifically, the recognition rate for long-distance blind iris images reaches 90% after processing with our methods, representing an improvement of approximately ten percentage points compared to images without restoration.

classifier, iris image, recognition, (13 more...)

2408.0021

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Oceania > Australia > New South Wales > Wollongong (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification

Jiang, Fanzhi, Yang, Su, Jones, Mark W., Zhang, Liumei

Text-based person re-identification (Re-ID) is a challenging topic in the field of complex multimodal analysis; its ultimate aim is to recognize specific pedestrians by scrutinizing attributes/natural language descriptions. Despite the wide range of applicable areas such as security surveillance, video retrieval, person tracking, and social media analytics, there is a notable absence of comprehensive reviews dedicated to summarizing text-based person Re-ID from a technical perspective. To address this gap, we introduce a taxonomy spanning Evaluation, Strategy, Architecture, and Optimization dimensions, providing a comprehensive survey of the text-based person Re-ID task. We start by laying the groundwork for text-based person Re-ID, elucidating fundamental concepts related to attribute/natural language-based identification. Then a thorough examination of existing benchmark datasets and metrics is presented. Subsequently, we further delve into prevalent feature extraction strategies employed in text-based person Re-ID research, followed by a concise summary of common network architectures within the domain. Prevalent loss functions utilized for model optimization and modality alignment in text-based person Re-ID are also scrutinized. To conclude, we offer a concise summary of our findings, pinpointing challenges in text-based person Re-ID. In response to these challenges, we outline potential avenues for future open-set text-based person Re-ID and present a baseline architecture for text-based pedestrian image generation-guided re-identification (TBPGR).

computer vision, person re-identification, wang, (12 more...)

2408.00096

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(11 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)

Invariant Discovery of Features Across Multiple Length Scales: Applications in Microscopy and Autonomous Materials Characterization

Raghavan, Aditya, Pratiush, Utkarsh, Valleti, Mani, Liu, Richard, Emery, Reece, Funakubo, Hiroshi, Liu, Yongtao, Rack, Philip, Kalinin, Sergei

Physical imaging is a foundational characterization method in areas from condensed matter physics and chemistry to astronomy and spans length scales from the atomic to the cosmic. Images encapsulate crucial data regarding atomic bonding, materials microstructures, and dynamic phenomena such as microstructural evolution and turbulence. The challenge lies in effectively extracting and interpreting this information. Variational Autoencoders (VAEs) have emerged as powerful tools for identifying underlying factors of variation in image data, providing a systematic approach to distilling meaningful patterns from complex datasets. However, a significant hurdle in their application is the definition and selection of appropriate descriptors reflecting local structure. Here we introduce the scale-invariant VAE approach (SI-VAE) based on the progressive training of the VAE with the descriptors sampled at different length scales. The SI-VAE allows the discovery of the length scale dependent factors of variation in the system. Here, we illustrate this approach using the ferroelectric domain images and generalize it to the movies of the electron-beam induced phenomena in graphene and topography evolution across combinatorial libraries. This approach can further be used to initialize the decision making in automated experiments including structure-property discovery and can be applied across a broad range of imaging methods. This approach is universal and can be applied to any spatially resolved data including both experimental imaging studies and simulations, and can be particularly useful for exploration of phenomena such as turbulence, scale-invariant transformation fronts, etc.
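
One way to picture "descriptors sampled at different length scales" is as square image patches of several window sizes cut around the same set of points. The sketch below does only that sampling step, under the assumption that a descriptor is a raw square patch; the helper names `extract_patch` and `multiscale_descriptors` are hypothetical, and the progressive VAE training described in the abstract is not shown.

```python
def extract_patch(image, cy, cx, window):
    """Return a window x window patch around (cy, cx), or None if it leaves the image.

    image is a list of rows. For even window sizes the patch is offset by
    half a pixel toward the top-left, since no exact center pixel exists.
    """
    half = window // 2
    top, left = cy - half, cx - half
    bottom, right = top + window, left + window
    if top < 0 or left < 0 or bottom > len(image) or right > len(image[0]):
        return None
    return [row[left:right] for row in image[top:bottom]]


def multiscale_descriptors(image, centers, windows):
    """Map each window size to the list of patches that fit at the given centers."""
    result = {}
    for window in windows:
        patches = []
        for cy, cx in centers:
            patch = extract_patch(image, cy, cx, window)
            if patch is not None:
                patches.append(patch)
        result[window] = patches
    return result


# Toy 8 x 8 image whose pixel value encodes its position (row * 8 + column).
toy_image = [[y * 8 + x for x in range(8)] for y in range(8)]
descriptors = multiscale_descriptors(toy_image, [(4, 4), (1, 1)], [3, 5])
```

Points close to the border drop out at larger window sizes, so the number of usable descriptors shrinks as the length scale grows; any training schedule across scales would have to account for that.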

evolution, latent distribution, window size, (15 more...)

2408.00229

Country:

North America > United States > Tennessee > Knox County > Knoxville (0.14)
North America > United States > Washington > Benton County > Richland (0.04)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Energy (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.46)

TRGR: Transmissive RIS-aided Gait Recognition Through Walls

Huang, Yunlong, Liu, Junshuo, Zhang, Jianan, Mi, Tiebin, Shi, Xin, Qiu, Robert Caiming

Gait recognition with radio frequency (RF) signals enables many potential applications requiring accurate identification. However, current systems require individuals to be within a line-of-sight (LOS) environment and struggle with low signal-to-noise ratio (SNR) when signals traverse concrete and thick walls. To address these challenges, we present TRGR, a novel transmissive reconfigurable intelligent surface (RIS)-aided gait recognition system. TRGR can recognize human identities through walls using only the magnitude measurements of channel state information (CSI) from a pair of transceivers. Specifically, by leveraging transmissive RIS alongside a configuration alternating optimization algorithm, TRGR enhances wall penetration and signal quality, enabling accurate gait recognition. Furthermore, a residual convolution network (RCNN) is proposed as the backbone network to learn robust human information. Experimental results confirm the efficacy of transmissive RIS, highlighting the significant potential of transmissive RIS in enhancing RF-based gait recognition systems. Extensive experimental results show that TRGR achieves an average accuracy of 97.88% in identifying persons when signals traverse concrete walls, demonstrating the effectiveness and robustness of TRGR.
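
To make "only the magnitude measurements of CSI" concrete: CSI is commonly represented as one complex gain per subcarrier per time frame, and a magnitude-only system keeps the absolute value and discards the phase. The snippet below is a small sketch under that assumed representation; the function name `csi_magnitudes` and the nested-list layout are illustrative and not taken from TRGR.

```python
def csi_magnitudes(csi_frames):
    """Convert complex CSI samples to magnitudes, dropping phase information.

    csi_frames is assumed to be a list of frames, each a list of complex
    per-subcarrier channel gains. The output has the same nested shape.
    """
    return [[abs(gain) for gain in frame] for frame in csi_frames]


# Two toy frames with two subcarriers each; values are made up for illustration.
toy_csi = [[3 + 4j, 1j], [0j, -5 + 12j]]
magnitudes = csi_magnitudes(toy_csi)
```

The resulting real-valued time-by-subcarrier array is the sort of input a convolutional backbone could consume directly.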

recognition, ris, transmissive ris, (14 more...)

2407.21566

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)