Identifying and Characterising Higher Order Interactions in Mobility Networks Using Hypergraphs
Sambaturu, Prathyush, Gutierrez, Bernardo, Kraemer, Moritz U. G.
Human mobility data is crucial for understanding patterns of movement across geographical regions, with applications spanning urban planning [1], transportation systems design [2], infectious disease modeling and control [3, 4], and studies of social dynamics [5]. Traditionally, mobility data has been represented using flow networks [6, 7] or colocation matrices [8], both of which encode pairwise interactions. In flow networks, directed edges represent the movement of individuals between two locations; colocation matrices measure the probability that a random individual from one region is colocated with a random individual from another region at the same location. These data types and their pairwise representation structure have been used to identify the spatial scales and regularity of human mobility, but they are inherently limited in their capacity to capture more complex patterns of human movement involving higher-order interactions between locations: that is, groups of locations that are frequently visited by many individuals within a period of time (e.g., a week) and revisited regularly over time. Such higher-order interactions can carry crucial information that pairwise representations miss.
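To make the hyperedge notion concrete, here is a minimal sketch in plain Python (with purely hypothetical toy trajectories): each hyperedge is the group of locations one individual visits within a week, and its weight counts how many individuals share that group. Repeating this per week and tracking which hyperedges recur would capture the regularity of revisits described above.

```python
from collections import Counter

# Toy weekly trajectories: individual -> set of locations visited that week.
# (Hypothetical data for illustration only.)
weekly_visits = {
    "p1": {"home", "work", "gym"},
    "p2": {"home", "work", "gym"},
    "p3": {"home", "work"},
    "p4": {"home", "market"},
}

# Each hyperedge is the group of locations an individual visited in the week;
# its weight is the number of individuals who visited exactly that group.
hyperedges = Counter(frozenset(locs) for locs in weekly_visits.values())

for edge, weight in hyperedges.most_common():
    print(sorted(edge), weight)
```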
Semantic Feature Learning for Universal Unsupervised Cross-Domain Retrieval
Cross-domain retrieval (CDR) is finding increasingly broad applications across various domains. However, existing efforts have several major limitations, the most critical being their reliance on accurate supervision. Recent studies therefore focus on achieving unsupervised CDR, but they typically assume that the category spaces across domains are identical, an assumption that is often unrealistic in real-world scenarios: the category composition of a data domain can only be obtained through dedicated and comprehensive analysis, which contradicts the premise of the unsupervised setting.
A multimodal developmental benchmark for language learning
How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in the language abilities tested, and without direct comparison to behavioral data.
Muharaf: Manuscripts of Handwritten Arabic Dataset for Cursive Text Recognition
Chan, Adrian, Moukarzel, Joseph
We present the Manuscripts of Handwritten Arabic (Muharaf) dataset, a machine learning dataset consisting of more than 1,600 historic handwritten page images transcribed by experts in archival Arabic. Each document image is accompanied by spatial polygonal coordinates of its text lines as well as basic page elements. This dataset was compiled to advance the state of the art in handwritten text recognition (HTR), not only for Arabic manuscripts but also for cursive text in general. The Muharaf dataset includes diverse handwriting styles and a wide range of document types, including personal letters, diaries, notes, poems, church records, and legal correspondence. In this paper, we describe the data acquisition pipeline, notable dataset features, and statistics. We also provide a preliminary baseline result achieved by training convolutional neural networks on this data.
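To illustrate how line-level annotations of this kind might be consumed, here is a small sketch; note that the JSON layout below is a hypothetical stand-in for illustration, not the dataset's actual schema:

```python
import json

# Hypothetical annotation layout: one JSON file per page image, listing
# each text line's bounding polygon and its expert transcription.
example = {
    "image": "page_0001.png",
    "lines": [
        {"polygon": [[10, 20], [400, 20], [400, 55], [10, 55]],
         "text": "<transcription>"},
    ],
}

def load_page(path):
    """Return the image filename and a list of (polygon, transcription) pairs."""
    with open(path, encoding="utf-8") as f:
        page = json.load(f)
    return page["image"], [(ln["polygon"], ln["text"]) for ln in page["lines"]]
```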
Supplementary Material: Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses
In this section, we present details of the improved local properties achieved by the proposed single-step defense, GAT (Guided Adversarial Training), and examine the local properties of networks trained using this methodology. Since we want to obtain the strongest adversary achievable within a single backward pass of the loss, we find x as given in Alg. 1, lines 6 to 9. Imposing the proposed regularizer thus encourages the optimization procedure to produce a network that is locally Lipschitz continuous, with a smaller local Lipschitz constant. The value of λ can be chosen so as to achieve the desired trade-off between clean accuracy and robustness [16]. We run extensive evaluations on the MNIST [10], CIFAR-10 [9], and ImageNet [5] datasets to validate our claims on the proposed attack and defense. MNIST [10] is a handwritten digit recognition dataset consisting of 60,000 training images and 10,000 test images. The images are grayscale, of dimension 28 × 28. We split the training set into a random subset of 50,000 training images and 10,000 validation images.
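For context, the generic single-step building block (an FGSM-style attack that uses one backward pass of the loss) can be sketched as follows in PyTorch; this illustrates the single-backward-pass idea only, not the paper's guided attack or its regularizer:

```python
import torch
import torch.nn.functional as F

def single_step_attack(model, x, y, eps):
    """One backward pass of the loss yields a perturbation direction
    (FGSM-style; a generic illustration, not GAT itself)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    # Step in the sign of the gradient and clamp to the valid image range.
    x_adv = x_adv + eps * grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```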
In Pursuit of Causal Label Correlations for Multi-label Image Recognition
Multi-label image recognition aims to predict all objects present in an input image. A common belief is that modeling the correlations between objects is beneficial for multi-label recognition. However, this belief has recently been challenged, as label correlations may mislead the classifier in testing due to possible contextual bias in training. Accordingly, a few recent works not only discarded label correlation modeling but also advocated removing contextual information for multi-label image recognition. This work explicitly explores label correlations for multi-label image recognition based on a principled causal intervention approach. With causal intervention, we pursue causal label correlations and suppress spurious label correlations, as the former tend to convey useful contextual cues while the latter may mislead the classifier. Specifically, we decouple label-specific features with a Transformer decoder attached to the backbone network, and model the confounders that may give rise to spurious correlations by clustering the spatial features of all training images. Based on the label-specific features and confounders, we employ a cross-attention module to implement causal intervention, quantifying the causal correlations from all object categories to each predicted object category. Finally, we obtain image labels by combining the predictions from the decoupled features and the causal label correlations.
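As a rough sketch of the cross-attention step described above (the module name, feature dimensions, and cluster count are assumptions for illustration, not the authors' implementation), the label-specific features act as queries and the confounder prototypes as keys/values:

```python
import torch
import torch.nn as nn

class ConfounderCrossAttention(nn.Module):
    """Minimal cross-attention between label-specific features (queries)
    and confounder prototypes (keys/values). Shapes are hypothetical."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, label_feats, confounders):
        # label_feats: (batch, num_labels, dim), from the Transformer decoder
        # confounders: (batch, num_clusters, dim), cluster centers of spatial features
        out, _ = self.attn(label_feats, confounders, confounders)
        return out

# Usage with toy tensors (illustrative only):
m = ConfounderCrossAttention(dim=256)
q = torch.randn(2, 80, 256)   # e.g., 80 object categories
c = torch.randn(2, 16, 256)   # e.g., 16 confounder clusters
print(m(q, c).shape)          # torch.Size([2, 80, 256])
```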