Pattern Recognition
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Italy > Veneto > Venice (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.34)
- North America (0.14)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- (2 more...)
You Need Better Attention Priors
We generalize the attention mechanism by viewing it through the lens of Entropic Optimal Transport, revealing that standard attention corresponds to a transport problem regularized by an implicit uniform prior. We introduce Generalized Optimal transport Attention with Trainable priors (GOAT), a new attention mechanism that replaces this naive assumption with a learnable, continuous prior. This prior maintains full compatibility with optimized kernels such as FlashAttention. GOAT also provides an EOT-based explanation of attention sinks and materializes a solution for them, avoiding the representational trade-offs of standard attention. Finally, by absorbing spatial information into the core attention computation, GOAT learns an extrapolatable prior that combines the flexibility of learned positional embeddings with the length generalization of fixed encodings.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.46)
CROCS: A Two-Stage Clustering Framework for Behaviour-Centric Consumer Segmentation with Smart Meter Data
Yerbury, Luke W., Campello, Ricardo J. G. B., Livingston, G. C. Jr, Goldsworthy, Mark, O'Neil, Lachlan
With grid operators confronting rising uncertainty from renewable integration and a broader push toward electrification, Demand-Side Management (DSM) -- particularly Demand Response (DR) -- has attracted significant attention as a cost-effective mechanism for balancing modern electricity systems. Unprecedented volumes of consumption data from a continuing global deployment of smart meters enable consumer segmentation based on real usage behaviours, promising to inform the design of more effective DSM and DR programs. However, existing clustering-based segmentation methods insufficiently reflect the behavioural diversity of consumers, often relying on rigid temporal alignment, and faltering in the presence of anomalies, missing data, or large-scale deployments. To address these challenges, we propose a novel two-stage clustering framework -- Clustered Representations Optimising Consumer Segmentation (CROCS). In the first stage, each consumer's daily load profiles are clustered independently to form a Representative Load Set (RLS), providing a compact summary of their typical diurnal consumption behaviours. In the second stage, consumers are clustered using the Weighted Sum of Minimum Distances (WSMD), a novel set-to-set measure that compares RLSs by accounting for both the prevalence and similarity of those behaviours. Finally, community detection on the WSMD-induced graph reveals higher-order prototypes that embody the shared diurnal behaviours defining consumer groups, enhancing the interpretability of the resulting clusters. Extensive experiments on both synthetic and real Australian smart meter datasets demonstrate that CROCS captures intra-consumer variability, uncovers both synchronous and asynchronous behavioural similarities, and remains robust to anomalies and missing data, while scaling efficiently through natural parallelisation. These results...
- Oceania > Australia > New South Wales (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Europe > Italy (0.04)
- (10 more...)
- Energy > Renewable (1.00)
- Energy > Power Industry (1.00)
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Currently, vision encoder models like Vision Transformers (ViTs) typically excel at image recognition tasks but cannot simultaneously support text recognition like human visual recognition. To address this limitation, we propose UNIT, a novel training framework aimed at UNifying Image and Text recognition within a single model. Starting with a vision encoder pre-trained with image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities. The training process comprises two stages: intra-scale pretraining and inter-scale finetuning. During intra-scale pretraining, UNIT learns unified representations from multi-scale inputs, where images and documents are at their commonly used resolution, to enable fundamental recognition capability. In the inter-scale finetuning stage, the model introduces scale-exchanged data, featuring images and documents at resolutions different from the most commonly used ones, to enhance its scale robustness. Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment. Experiments across multiple benchmarks confirm that our method significantly outperforms existing methods on document-related tasks (e.g., OCR and DocQA) while maintaining the performances on natural images, demonstrating its ability to substantially enhance text recognition without compromising its core image recognition capabilities.
Tsetlin Machine for Solving Contextual Bandit Problems
This paper introduces an interpretable contextual bandit algorithm using Tsetlin Machines, which solves complex pattern recognition tasks using propositional (Boolean) logic. The proposed bandit learning algorithm relies on straightforward bit manipulation, thus simplifying computation and interpretation. We then present a mechanism for performing Thompson sampling with Tsetlin Machine, given its non-parametric nature. Our empirical analysis shows that Tsetlin Machine as a base contextual bandit learner outperforms other popular base learners on eight out of nine datasets. We further analyze the interpretability of our learner, investigating how arms are selected based on propositional expressions that model the context.
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.63)
- Information Technology > Data Science > Data Mining > Big Data (0.43)
Multiscale Deep Equilibrium Models
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ), suited to large-scale and highly hierarchical pattern recognition domains. An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously, using implicit differentiation to avoid storing intermediate states (and thus requiring only O(1) memory consumption). These simultaneously-learned multi-resolution features allow us to train a single model on a diverse set of tasks and loss functions, such as using a single MDEQ to perform both image classification and semantic segmentation. We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset. In both settings, MDEQs are able to match or exceed the performance of recent competitive computer vision models: the first time such performance and scale have been achieved by an implicit deep learning approach. The code and pre-trained models are at https://github.com/locuslab/mdeq.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.61)
SigTime: Learning and Visually Explaining Time Series Signatures
Huang, Yu-Chia, Chen, Juntong, Liu, Dongyu, Ma, Kwan-Liu
Understanding and distinguishing temporal patterns in time series data is essential for scientific discovery and decision-making. For example, in biomedical research, uncovering meaningful patterns in physiological signals can improve diagnosis, risk assessment, and patient outcomes. However, existing methods for time series pattern discovery face major challenges, including high computational complexity, limited interpretability, and difficulty in capturing meaningful temporal structures. To address these gaps, we introduce a novel learning framework that jointly trains two Transformer models using complementary time series representations: shapelet-based representations to capture localized temporal structures and traditional feature engineering to encode statistical properties. The learned shapelets serve as interpretable signatures that differentiate time series across classification labels. Additionally, we develop a visual analytics system -- SigTIme -- with coordinated views to facilitate exploration of time series signatures from multiple perspectives, aiding in useful insights generation. We quantitatively evaluate our learning framework on eight publicly available datasets and one proprietary clinical dataset. Additionally, we demonstrate the effectiveness of our system through two usage scenarios along with the domain experts: one involving public ECG data and the other focused on preterm labor analysis.
- North America > United States > California > Yolo County > Davis (0.14)
- Asia > Taiwan (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- (3 more...)
- Overview (0.92)
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.46)
Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning
Zhang, Yi, Cheng, Chun-Wun, He, Junyi, Yu, Ke, Tang, Yushun, Schönlieb, Carola-Bibiane, He, Zhihai, Aviles-Rivero, Angelica I.
Abstract--Recent research in Vision-Language Models (VLMs) has significantly advanced our capabilities in cross-modal reasoning. However, existing methods suffer from performance degradation with domain changes or require substantial computational resources for fine-tuning in new domains. T o address this issue, we develop a new adaptation method for large vision-language models, called Training-free Dual Hyperbolic Adapters (T -DHA). We characterize vision-language relationship between semantic concepts, which typically has a hierarchical tree structure, in the hyperbolic space instead of the traditional Euclidean space. We find that this unique property is particularly effective for embedding hierarchical data structures using the Poincar e ball model, achieving significantly improved representation and discrimination power . Coupled with negative learning, it provides more accurate and robust classifications with fewer feature dimensions. Our extensive experimental results on various datasets demonstrate that the T -DHA method significantly outperforms existing state-of-the-art methods in few-shot image recognition and domain generalization tasks. ARGE Vision-Language Models (VLMs), such as CLIP [1] and ALIGN [2], are trained on extensive image-text datasets using contrastive learning. These models excel in creating a unified vision-language embedding space by aligning visual and textual modalities, enabling their successful application across a wide range of downstream visual tasks, such as few-shot image recognition [3]-[5].
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > Newfoundland and Labrador > Labrador (0.05)
- (3 more...)
- Research Report > New Finding (0.68)
- Research Report > Promising Solution (0.66)
A Comparative Study of EMG- and IMU-based Gesture Recognition at the Wrist and Forearm
Baghernezhad, Soroush, Mohammadreza, Elaheh, da Fonseca, Vinicius Prado, Zou, Ting, Jiang, Xianta
Gestures are an integral part of our daily interactions with the environment. Hand gesture recognition (HGR) is the process of interpreting human intent through various input modalities, such as visual data (images and videos) and bio-signals. Bio-signals are widely used in HGR due to their ability to be captured non-invasively via sensors placed on the arm. Among these, surface electromyography (sEMG), which measures the electrical activity of muscles, is the most extensively studied modality. However, less-explored alternatives such as inertial measurement units (IMUs) can provide complementary information on subtle muscle movements, which makes them valuable for gesture recognition. In this study, we investigate the potential of using IMU signals from different muscle groups to capture user intent. Our results demonstrate that IMU signals contain sufficient information to serve as the sole input sensor for static gesture recognition. Moreover, we compare different muscle groups and check the quality of pattern recognition on individual muscle groups. We further found that tendon-induced micro-movement captured by IMUs is a major contributor to static gesture recognition. We believe that leveraging muscle micro-movement information can enhance the usability of prosthetic arms for amputees. This approach also offers new possibilities for hand gesture recognition in fields such as robotics, teleoperation, sign language interpretation, and beyond.
- North America > Canada > Newfoundland and Labrador > Newfoundland > St. John's (0.04)
- North America > United States (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Health Care Technology (0.88)
- Health & Medicine > Therapeutic Area > Neurology (0.48)
- Health & Medicine > Therapeutic Area > Musculoskeletal (0.46)
- (2 more...)
- Information Technology > Artificial Intelligence > Vision > Gesture Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)