Pattern Recognition
Reviews: Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration
The paper presents a neural network model for image registration that generates an arbitrary displacement field to transform the input image so that it matches the target. The network has several components, including a common feature extraction model that yields a 4D tensor holding the correlations of local features from both images. The tensor is then transformed into a vector representation of the transformation, which is later used to reconstruct a displacement field. COMMENTS: Overall, the work is relatively well presented and provides enough detail to understand most of the formulation and solution. However, some confusing aspects could be clarified or stated more prominently.
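As a rough illustration of the 4D correlation tensor the review mentions, the sketch below correlates every local descriptor of one feature map with every local descriptor of another. The function name `correlation_tensor`, the cosine normalisation, and the random stand-in feature maps are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def correlation_tensor(feat_a, feat_b, eps=1e-8):
    """Correlate all local features of two (C, H, W) feature maps.

    Returns an (H, W, H, W) tensor whose entry [i, j, k, l] is the cosine
    similarity between the descriptor at (i, j) in feat_a and the
    descriptor at (k, l) in feat_b. Hypothetical helper, not the paper's code.
    """
    # L2-normalise each local descriptor along the channel axis.
    a = feat_a / (np.linalg.norm(feat_a, axis=0, keepdims=True) + eps)
    b = feat_b / (np.linalg.norm(feat_b, axis=0, keepdims=True) + eps)
    # Dot product over channels for every pair of spatial locations.
    return np.einsum("cij,ckl->ijkl", a, b)

# Random arrays stand in for the output of a shared feature extractor.
rng = np.random.default_rng(0)
feat_source = rng.standard_normal((8, 4, 5))
feat_target = rng.standard_normal((8, 4, 5))

corr = correlation_tensor(feat_source, feat_target)
self_corr = correlation_tensor(feat_source, feat_source)
print(corr.shape)
```

In a registration network of the kind described, a tensor like `corr` would then be passed to further layers that regress the transformation descriptor; that part is not sketched here.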
Reviews: Arbicon-Net: Arbitrary Continuous Geometric Transformation Networks for Image Registration
This submission received mixed ratings. The most positive reviewer gave a low-confidence rating. R1 and R2 appreciate that the paper is well written and presents an interesting approach to image registration. R1 and R3 point out that the central contribution is not clearly stated in the text. There is also overlap of text in Sections 3.1-3.3
Topological constraints on self-organisation in locally interacting systems
Sacco, Francesco, Sakthivadivel, Dalton A R, Levin, Michael
All intelligence is collective intelligence, in the sense that it is made of parts which must align with respect to system-level goals. Understanding the dynamics which facilitate or limit navigation of problem spaces by aligned parts thus impacts many fields ranging across life sciences and engineering. To that end, consider a system on the vertices of a planar graph, with pairwise interactions prescribed by the edges of the graph. Such systems can sometimes exhibit long-range order, distinguishing one phase of macroscopic behaviour from another. In networks of interacting systems we may view spontaneous ordering as a form of self-organisation, modelling neural and basal forms of cognition. Here, we discuss necessary conditions on the topology of the graph for an ordered phase to exist, with an eye towards finding constraints on the ability of a system with local interactions to maintain an ordered target state. By studying the scaling of free energy under the formation of domain walls in three model systems -- the Potts model, autoregressive models, and hierarchical networks -- we show how the combinatorics of interactions on a graph prevent or allow spontaneous ordering. As an application we are able to analyse why multiscale systems like those prevalent in biology are capable of organising into complex patterns, whereas rudimentary language models are challenged by long sequences of outputs.
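The domain-wall reasoning in the abstract can be made concrete with the textbook free-energy estimate for a single wall on an open one-dimensional chain. This is a minimal sketch of that standard argument, not the paper's derivation: the function name, the Potts coupling `J` (energy cost of one broken bond), the temperature `T` in units where Boltzmann's constant is 1, and the count of wall configurations are all assumptions made here for illustration.

```python
import math

def domain_wall_free_energy(L, J=1.0, T=0.5, q=3):
    """Free-energy change from inserting one domain wall in an open chain.

    Energy cost: one broken bond, J.
    Entropy gain: log of the number of ways to place the wall (L - 1 bonds)
    times the number of alternative states for the new domain (q - 1).
    Hypothetical helper illustrating the textbook estimate only.
    """
    n_configurations = (L - 1) * (q - 1)
    return J - T * math.log(n_configurations)

# The entropy term grows with chain length, so for any T > 0 a long enough
# chain lowers its free energy by forming walls, which destroys order.
short_chain = domain_wall_free_energy(L=2, q=2)
long_chain = domain_wall_free_energy(L=1000, q=3)
print(short_chain, long_chain)
```

On graphs where the energy cost of a wall grows with system size faster than the entropy gain, the sign of this estimate can flip, which is the kind of topological dependence the abstract refers to.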
Why it's so hard to use AI to diagnose cancer
In theory, artificial intelligence should be great at helping out. "Our job is pattern recognition," says Andrew Norgan, a pathologist and medical director of the Mayo Clinic's digital pathology platform. "We look at the slide and we gather pieces of information that have been proven to be important." Visual analysis is something that AI has gotten quite good at since the first image recognition models began taking off nearly 15 years ago. Even though no model will be perfect, you can imagine a powerful algorithm someday catching something that a human pathologist missed, or at least speeding up the process of getting a diagnosis.
Survey on Hand Gesture Recognition from Visual Input
Linardakis, Manousos, Varlamis, Iraklis, Papadopoulos, Georgios Th.
Hand gesture recognition has become an important research area, driven by the growing demand for human-computer interaction in fields such as sign language recognition, virtual and augmented reality, and robotics. Despite the rapid growth of the field, there are few surveys that comprehensively cover recent research developments, available solutions, and benchmark datasets. This survey addresses this gap by examining the latest advancements in hand gesture and 3D hand pose recognition from various types of camera input data including RGB images, depth images, and videos from monocular or multiview cameras, examining the differing methodological requirements of each approach. Furthermore, an overview of widely used datasets is provided, detailing their main characteristics and application domains. Finally, open challenges such as achieving robust recognition in real-world environments, handling occlusions, ensuring generalization across diverse users, and addressing computational efficiency for real-time applications are highlighted to guide future research directions. By synthesizing the objectives, methodologies, and applications of recent studies, this survey offers valuable insights into current trends, challenges, and opportunities for future research in human hand gesture recognition.
Reviews: Dense Associative Memory for Pattern Recognition
The theoretical contribution presented in lines 291--310 is a welcome insight into the computational power of ReLUs. The experimental results for rectified polynomial units reported in Figures 2 and 3 are interesting and apparently novel, even in the context of standard feedforward multi-layer networks. Since lines 291--297 are a central point of the paper, they should be expanded and better justified. Furthermore, the simple capacity analysis developed on p. 3 for the polynomial energy function is invoked here for the rectified polynomial energy function. This has to be justified. The paper starts from and mostly focuses on the associative memory (Hamiltonian) formulation, but then the findings are restricted to one-step retrieval.
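To make the terms "rectified polynomial energy function" and "one-step retrieval" concrete, here is a minimal sketch assuming the usual dense associative memory form, where the energy is minus the sum over stored patterns of F applied to the pattern-state overlap, with F(x) = x^n for x > 0 and 0 otherwise. The function names, the synchronous single update, and the toy sizes are illustrative assumptions rather than the reviewed paper's exact setup.

```python
import numpy as np

def rectified_poly(x, n=3):
    """Rectified polynomial F(x) = x**n for x > 0, else 0."""
    return np.where(x > 0, x, 0.0) ** n

def one_step_retrieve(patterns, state, n=3):
    """Update every unit once, comparing the energy with the unit set to +1 or -1.

    patterns: (K, N) array of +/-1 memories; state: (N,) array of +/-1.
    Hypothetical helper written for illustration.
    """
    patterns = patterns.astype(float)
    state = state.astype(float)
    # Overlap of each pattern with the state, excluding unit i: shape (K, N).
    rest = (patterns @ state)[:, None] - patterns * state[None, :]
    # Energy difference between unit i = -1 and unit i = +1, summed over patterns.
    drive = rectified_poly(rest + patterns, n) - rectified_poly(rest - patterns, n)
    return np.where(drive.sum(axis=0) >= 0, 1, -1)

rng = np.random.default_rng(0)
memories = rng.choice([-1, 1], size=(4, 64))

# Corrupt the first memory by flipping five units, then retrieve in one step.
probe = memories[0].copy()
probe[:5] *= -1
recovered = one_step_retrieve(memories, probe, n=3)
print(int((recovered == memories[0]).sum()), "of", memories.shape[1], "units match")
```

Because all units are updated once from the same probe, the result depends only on a single pass through the patterns, which is why one-step retrieval resembles a feedforward computation.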
Reviews: Generative Shape Models: Joint Text Recognition and Segmentation with Very Little Training Data
Method and Novelty: The authors present a model that has a number of strengths. First, the character-level model is trained on synthetically generated images from a font library, independently of the training corpus. Second, the model converts each training image into a factor graph and learns the spatial relationships between landmarks in each character. This model can readily assign a probability to each candidate character for an image, and the authors provide a description of a two-stage inference algorithm that consists of approximate belief propagation followed by refinement via a backtracking procedure. The candidate characters are then supplied to a word model, which is a fairly standard structured prediction using bigram and trigram features.
Disentangling Voice and Content with Self-Supervision for Speaker Recognition
For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extracts distinct speech components. Notably, a strengthened transition model is specifically designed to model complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without the use of labels other than speaker identities. The efficacy of the proposed framework is validated via experiments conducted on the VoxCeleb and SITW datasets with 9.56% and 8.24% average reductions in EER and minDCF, respectively.
Learning Robust Hierarchical Patterns of Human Brain across Many fMRI Studies
Multi-site fMRI studies face the challenge that pooling data introduces systematic non-biological site-specific variance due to hardware, software, and environment. In this paper, we propose to reduce site-specific variance in the estimation of hierarchical Sparsity Connectivity Patterns (hSCPs) in fMRI data via a simple yet effective matrix factorization while preserving biologically relevant variations. Our method leverages unsupervised adversarial learning to improve the reproducibility of the components. Experiments on simulated datasets show that the proposed method can estimate components with higher accuracy and reproducibility, while preserving age-related variation on a multi-center clinical data set.
Tsetlin Machine for Solving Contextual Bandit Problems
This paper introduces an interpretable contextual bandit algorithm using Tsetlin Machines, which solves complex pattern recognition tasks using propositional (Boolean) logic. The proposed bandit learning algorithm relies on straightforward bit manipulation, thus simplifying computation and interpretation. We then present a mechanism for performing Thompson sampling with Tsetlin Machine, given its non-parametric nature. Our empirical analysis shows that Tsetlin Machine as a base contextual bandit learner outperforms other popular base learners on eight out of nine datasets. We further analyze the interpretability of our learner, investigating how arms are selected based on propositional expressions that model the context.