Pattern Recognition
Synthetic Data Applications in Finance
Potluru, Vamsi K., Borrajo, Daniel, Coletta, Andrea, Dalmasso, Niccolò, El-Laham, Yousef, Fons, Elizabeth, Ghassemi, Mohsen, Gopalakrishnan, Sriram, Gosai, Vikesh, Kreačić, Eleonora, Mani, Ganapathy, Obitayo, Saheed, Paramanand, Deepak, Raman, Natraj, Solonin, Mikhail, Sood, Srijan, Vyetrenko, Svitlana, Zhu, Haibei, Veloso, Manuela, Balch, Tucker
Synthetic data has made tremendous strides in various commercial settings including finance, healthcare, and virtual reality. We present a broad overview of prototypical applications of synthetic data in the financial sector and in particular provide richer details for a few select ones. These cover a wide variety of data modalities including tabular, time-series, event-series, and unstructured arising from both markets and retail financial applications. Since finance is a highly regulated industry, synthetic data is a potential approach for dealing with issues related to privacy, fairness, and explainability. Various metrics are utilized in evaluating the quality and effectiveness of our approaches in these applications. We conclude with open directions in synthetic data in the context of the financial domain.
Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns
Liu, Xin, Li, Zheng, Gao, Yifan, Yang, Jingfeng, Cao, Tianyu, Wang, Zhengyang, Yin, Bing, Song, Yangqiu
The goal of session-based recommendation in E-commerce is to predict the next item that an anonymous user will purchase based on the browsing and purchase history. However, constructing global or local transition graphs to supplement session data can lead to noisy correlations and user intent vanishing. In this work, we propose the Frequent Attribute Pattern Augmented Transformer (FAPAT) that characterizes user intents by building attribute transition graphs and matching attribute patterns. Specifically, the frequent and compact attribute patterns are served as memory to augment session representations, followed by a gate and a transformer block to fuse the whole session information. Through extensive experiments on two public benchmarks and 100 million industrial data in three domains, we demonstrate that FAPAT consistently outperforms state-of-the-art methods by an average of 4.5% across various evaluation metrics (Hits, NDCG, MRR). Besides evaluating the next-item prediction, we estimate the models' capabilities to capture user intents via predicting items' attributes and period-item recommendations.
Deformable Image Registration with Stochastically Regularized Biomechanical Equilibrium
Alvarez, Pablo, Cotin, Stéphane
Numerous regularization methods for deformable image registration aim at enforcing smooth transformations, but are difficult to tune-in a priori and lack a clear physical basis. Physically inspired strategies have emerged, offering a sound theoretical basis, but still necessitating complex discretization and resolution schemes. This study introduces a regularization strategy that does not require discretization, making it compatible with current registration frameworks, while retaining the benefits of physically motivated regularization for medical image registration. The proposed method performs favorably in both synthetic and real datasets, exhibiting an accuracy comparable to current state-of-the-art methods.
Multi-view user representation learning for user matching without personal information
Cao, Hongliu, Baamrani, Ilias El, Thomas, Eoin
As the digitization of travel industry accelerates, analyzing and understanding travelers' behaviors becomes increasingly important. However, traveler data frequently exhibit high data sparsity due to the relatively low frequency of user interactions with travel providers. Compounding this effect the multiplication of devices, accounts and platforms while browsing travel products online also leads to data dispersion. To deal with these challenges, probabilistic traveler matching can be used. Most existing solutions for user matching are not suitable for traveler matching as a traveler's browsing history is typically short and URLs in the travel industry are very heterogeneous with many tokens. To deal with these challenges, we propose the similarity based multi-view information fusion to learn a better user representation from URLs by treating the URLs as multi-view data. The experimental results show that the proposed multi-view user representation learning can take advantage of the complementary information from different views, highlight the key information in URLs and perform significantly better than other representation learning solutions for the user matching task.
KitBit: A New AI Model for Solving Intelligence Tests and Numerical Series
Corsino, Víctor, Gilpérez, José Manuel, Herrera, Luis
The resolution of intelligence tests, in particular numerical sequences, has been of great interest in the evaluation of AI systems. We present a new computational model called KitBit that uses a reduced set of algorithms and their combinations to build a predictive model that finds the underlying pattern in numerical sequences, such as those included in IQ tests and others of much greater complexity. We present the fundamentals of the model and its application in different cases. First, the system is tested on a set of number series used in IQ tests collected from various sources. Next, our model is successfully applied on the sequences used to evaluate the models reported in the literature. In both cases, the system is capable of solving these types of problems in less than a second using standard computing power. Finally, KitBit's algorithms have been applied for the first time to the complete set of entire sequences of the well-known OEIS database. We find a pattern in the form of a list of algorithms and predict the following terms in the largest number of series to date. These results demonstrate the potential of KitBit to solve complex problems that could be represented numerically.
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Wang, Xiao, Rong, Yao, Wang, Shiao, Chen, Yuan, Wu, Zhe, Jiang, Bo, Tian, Yonghong, Tang, Jin
Pattern recognition based on RGB-Event data is a newly arising research topic and previous works usually learn their features using CNN or Transformer. As we know, CNN captures the local features well and the cascaded self-attention mechanisms are good at extracting the long-range global relations. It is intuitive to combine them for high-performance RGB-Event based video recognition, however, existing works fail to achieve a good balance between the accuracy and model parameters, as shown in Fig.~\ref{firstimage}. In this work, we propose a novel RGB-Event based recognition framework termed TSCFormer, which is a relatively lightweight CNN-Transformer model. Specifically, we mainly adopt the CNN as the backbone network to first encode both RGB and Event data. Meanwhile, we initialize global tokens as the input and fuse them with RGB and Event features using the BridgeFormer module. It captures the global long-range relations well between both modalities and maintains the simplicity of the whole model architecture at the same time. The enhanced features will be projected and fused into the RGB and Event CNN blocks, respectively, in an interactive manner using F2E and F2V modules. Similar operations are conducted for other CNN blocks to achieve adaptive fusion and local-global feature enhancement under different resolutions. Finally, we concatenate these three features and feed them into the classification head for pattern recognition. Extensive experiments on two large-scale RGB-Event benchmark datasets (PokerEvent and HARDVS) fully validated the effectiveness of our proposed TSCFormer. The source code and pre-trained models will be released at https://github.com/Event-AHU/TSCFormer.
LaViP:Language-Grounded Visual Prompts
Kunananthaseelan, Nilakshan, Zhang, Jing, Harandi, Mehrtash
We introduce a language-grounded visual prompting method to adapt the visual encoder of vision-language models for downstream tasks. By capitalizing on language integration, we devise a parameter-efficient strategy to adjust the input of the visual encoder, eliminating the need to modify or add to the model's parameters. Due to this design choice, our algorithm can operate even in black-box scenarios, showcasing adaptability in situations where access to the model's parameters is constrained. We will empirically demonstrate that, compared to prior art, grounding visual prompts with language enhances both the accuracy and speed of adaptation. Moreover, our algorithm excels in base-to-novel class generalization, overcoming limitations of visual prompting and exhibiting the capacity to generalize beyond seen classes. We thoroughly assess and evaluate our method across a variety of image recognition datasets, such as EuroSAT, UCF101, DTD, and CLEVR, spanning different learning situations, including few-shot learning, base-to-novel class generalization, and transfer learning.
When Graph Data Meets Multimodal: A New Paradigm for Graph Understanding and Reasoning
Ai, Qihang, Zhou, Jianwu, Jiang, Haiyun, Liu, Lemao, Shi, Shuming
Graph data is ubiquitous in the physical world, and it has always been a challenge to efficiently model graph structures using a unified paradigm for the understanding and reasoning on various graphs. Moreover, in the era of large language models, integrating complex graph information into text sequences has become exceptionally difficult, which hinders the ability to interact with graph data through natural language instructions.The paper presents a new paradigm for understanding and reasoning about graph data by integrating image encoding and multimodal technologies. This approach enables the comprehension of graph data through an instruction-response format, utilizing GPT-4V's advanced capabilities. The study evaluates this paradigm on various graph types, highlighting the model's strengths and weaknesses, particularly in Chinese OCR performance and complex reasoning tasks. The findings suggest new direction for enhancing graph data processing and natural language interaction.
SAT-Based Algorithms for Regular Graph Pattern Matching
Terra-Neves, Miguel, Amaral, José, Lemos, Alexandre, Quintino, Rui, Resende, Pedro, Alegria, Antonio
Graph matching is a fundamental problem in pattern recognition, with many applications such as software analysis and computational biology. One well-known type of graph matching problem is graph isomorphism, which consists of deciding if two graphs are identical. Despite its usefulness, the properties that one may check using graph isomorphism are rather limited, since it only allows strict equality checks between two graphs. For example, it does not allow one to check complex structural properties such as if the target graph is an arbitrary length sequence followed by an arbitrary size loop. We propose a generalization of graph isomorphism that allows one to check such properties through a declarative specification. This specification is given in the form of a Regular Graph Pattern (ReGaP), a special type of graph, inspired by regular expressions, that may contain wildcard nodes that represent arbitrary structures such as variable-sized sequences or subgraphs. We propose a SAT-based algorithm for checking if a target graph matches a given ReGaP. We also propose a preprocessing technique for improving the performance of the algorithm and evaluate it through an extensive experimental evaluation on benchmarks from the CodeSearchNet dataset.
On Mask-based Image Set Desensitization with Recognition Support
Li, Qilong, Liu, Ji, Sun, Yifan, Zhang, Chongsheng, Dou, Dejing
In recent years, Deep Neural Networks (DNN) have emerged as a practical method for image recognition. The raw data, which contain sensitive information, are generally exploited within the training process. However, when the training process is outsourced to a third-party organization, the raw data should be desensitized before being transferred to protect sensitive information. Although masks are widely applied to hide important sensitive information, preventing inpainting masked images is critical, which may restore the sensitive information. The corresponding models should be adjusted for the masked images to reduce the degradation of the performance for recognition or classification tasks due to the desensitization of images. In this paper, we propose a mask-based image desensitization approach while supporting recognition. This approach consists of a mask generation algorithm and a model adjustment method. We propose exploiting an interpretation algorithm to maintain critical information for the recognition task in the mask generation algorithm. In addition, we propose a feature selection masknet as the model adjustment method to improve the performance based on the masked images. Extensive experimentation results based on multiple image datasets reveal significant advantages (up to 9.34% in terms of accuracy) of our approach for image desensitization while supporting recognition.