similarity distance


Causality-driven Sequence Segmentation for Enhancing Multiphase Industrial Process Data Analysis and Soft Sensing

He, Yimeng, Yao, Le, Zhang, Xinmin, Kong, Xiangyin, Song, Zhihuan

arXiv.org Artificial Intelligence

The dynamic characteristics of multiphase industrial processes pose significant challenges for industrial big data modeling. Traditional soft sensing models frequently neglect process dynamics and struggle to capture transient phenomena such as phase transitions. To address this issue, this article introduces a causality-driven sequence segmentation (CDSS) model. The model first identifies the local dynamic properties of the causal relationships between variables, also referred to as causal mechanisms. It then segments the sequence into phases based on the sudden shifts in causal mechanisms that occur during phase transitions. Additionally, a novel metric, similarity distance, is designed to evaluate the temporal consistency of causal mechanisms; it comprises a causal similarity distance and a stable similarity distance. The causal relationships discovered in each phase are represented as a temporal causal graph (TCG). Furthermore, a soft sensing model called the temporal-causal graph convolutional network (TC-GCN) is trained for each phase using the time-extended data and the adjacency matrix of the TCG. Numerical examples are used to validate the proposed CDSS model, and the segmentation results demonstrate that CDSS performs well on both stable and unstable multiphase series; in particular, it separates non-stationary time series more accurately than other methods. The effectiveness of the CDSS and TC-GCN models is also verified on a penicillin fermentation process. Experimental results indicate that the breakpoints discovered by CDSS align well with the reaction mechanisms and that TC-GCN achieves high predictive accuracy.


Zero-Shot Generalization during Instruction Tuning: Insights from Similarity and Granularity

He, Bingxiang, Ding, Ning, Qian, Cheng, Deng, Jia, Cui, Ganqu, Yuan, Lifan, Gao, Huan-ang, Chen, Huimin, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial Intelligence

Understanding alignment techniques begins with comprehending the zero-shot generalization brought about by instruction tuning, but its mechanism remains poorly understood. Existing work has largely been confined to the task level, without considering that tasks are artificially defined and, to LLMs, merely consist of tokens and representations. This line of research has been limited to examining transfer between tasks from a task-pair perspective, with few studies focusing on understanding zero-shot generalization from the perspective of the data itself. To bridge this gap, we first demonstrate through multiple metrics that zero-shot generalization during instruction tuning happens very early. Next, we investigate the facilitation of zero-shot generalization from both data similarity and granularity perspectives, confirming that encountering highly similar and fine-grained training data earlier during instruction tuning, without the constraints of defined "tasks", enables better generalization. Finally, we propose a more grounded training data arrangement method, Test-centric Multi-turn Arrangement, and show its effectiveness in promoting continual learning and further loss reduction. For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level. We hope our analysis will advance the understanding of zero-shot generalization during instruction tuning and contribute to the development of more aligned LLMs.
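The finding that seeing highly similar training data earlier helps generalization suggests a simple ordering heuristic. The sketch below is a hypothetical illustration only, not the paper's Test-centric Multi-turn Arrangement. It assumes each instance already has an embedding vector, scores each training instance by its maximum cosine similarity to any test instance, and places the highest-scoring instances first. The helper names `cosine` and `arrange_by_test_similarity` are made up for this example.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def arrange_by_test_similarity(
    train: List[Sequence[float]], test: List[Sequence[float]]
) -> List[int]:
    """Hypothetical helper: return training indices ordered so that instances
    most similar to any test instance come first."""
    scores = [max(cosine(x, t) for t in test) for x in train]
    return sorted(range(len(train)), key=lambda i: scores[i], reverse=True)


train_emb = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
test_emb = [[1.0, 0.1]]
print(arrange_by_test_similarity(train_emb, test_emb))  # prints [0, 2, 1]
```

Scoring by the maximum rather than the mean similarity favors training instances that closely match at least one test instance, in line with the instance-level view in the abstract; other aggregations are equally plausible.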



Improving Video Retrieval by Adaptive Margin

He, Feng, Wang, Qi, Feng, Zhifan, Jiang, Wenbin, Lv, Yajuan, Zhu, Yong, Tan, Xiao

arXiv.org Artificial Intelligence

Video retrieval is becoming increasingly important owing to the rapid growth of video content on the Internet. The dominant paradigm for video retrieval learns video-text representations by pushing the similarity of positive pairs above that of negative pairs by a fixed margin. However, the negative pairs used for training are sampled randomly, so a negative pair may be semantically related or even equivalent, yet most methods still enforce dissimilar representations that decrease its similarity. This leads to inaccurate supervision and poor performance in learning video-text representations. While most video retrieval methods overlook this phenomenon, we propose an adaptive margin that changes with the distance between positive and negative pairs to solve this issue. First, we design the calculation framework of the adaptive margin, including the method of distance measurement and the function mapping the distance to the margin. Then, we explore a novel implementation called "Cross-Modal Generalized Self-Distillation" (CMGSD), which can be built on top of most video retrieval models with few modifications. Notably, CMGSD adds little computational overhead at training time and none at test time. Experimental results on three widely used datasets demonstrate that the proposed method yields significantly better performance than the corresponding backbone model and outperforms state-of-the-art methods by a large margin.
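The abstract does not give CMGSD's distance measure or margin function, so the sketch below only illustrates the general idea of a margin that varies with the positive/negative distance. It assumes a hinge-style ranking loss and a hypothetical reference ("teacher") similarity for each pair; the margin is the fixed base margin scaled by how far apart the teacher places the positive and the negative. The names `adaptive_margin`, `ranking_loss`, `base_margin`, and the teacher scores are all assumptions for illustration.

```python
from typing import Sequence


def adaptive_margin(teacher_pos: float, teacher_neg: float, base_margin: float = 0.2) -> float:
    """Hypothetical margin function: scale the fixed margin by the gap the
    teacher sees between the positive and the negative pair (clipped to [0, 1]).
    Semantically close negatives therefore get a small margin."""
    gap = max(0.0, teacher_pos - teacher_neg)
    return base_margin * min(1.0, gap)


def ranking_loss(
    s_pos: float,
    s_negs: Sequence[float],
    t_pos: float,
    t_negs: Sequence[float],
    base_margin: float = 0.2,
) -> float:
    """Hinge ranking loss summed over negatives: max(0, m_i - (s_pos - s_neg_i)),
    where m_i is the adaptive margin for negative i."""
    total = 0.0
    for s_neg, t_neg in zip(s_negs, t_negs):
        m = adaptive_margin(t_pos, t_neg, base_margin)
        total += max(0.0, m - (s_pos - s_neg))
    return total


# The first negative looks almost like the positive to the teacher (0.85 vs 0.9),
# so its margin is tiny and the model is not forced to push it far away.
print(ranking_loss(0.8, [0.75, 0.1], 0.9, [0.85, 0.0]))  # prints 0.0
```

With a fixed margin of 0.2, the same near-duplicate negative would contribute a penalty of 0.15; the adaptive version removes that penalty, which is the kind of inaccurate supervision the abstract describes.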


Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness

Korkmaz, Ezgi

arXiv.org Artificial Intelligence

Learning from raw high-dimensional data via interaction with a given environment has been effectively achieved through the use of deep neural networks. Yet the observed degradation in policy performance caused by imperceptible worst-case policy-dependent translations along high-sensitivity directions (i.e., adversarial perturbations) raises concerns about the robustness of deep reinforcement learning policies. In this paper, we show that these high-sensitivity directions do not lie only along particular worst-case directions, but are more abundant in the deep neural policy landscape and can be found via more natural means in a black-box setting. Furthermore, we show that, intriguingly, vanilla training techniques result in more robust policies than state-of-the-art adversarial training techniques. We believe our work lays out intriguing properties of the deep reinforcement learning policy manifold, and our results can help build robust and generalizable deep reinforcement learning policies.


Agglomerative Hierarchical Clustering with Dynamic Time Warping for Household Load Curve Clustering

AlMahamid, Fadi, Grolinger, Katarina

arXiv.org Artificial Intelligence

Energy companies often implement demand response (DR) programs to better match electricity demand and supply by offering consumers incentives to reduce their demand during critical periods. Classifying clients according to their consumption patterns enables DR programs to target specific groups of consumers. Traditional clustering algorithms use standard distance measures to find the distance between two points, and the results produced by algorithms such as K-means, K-medoids, and Gaussian Mixture Models (GMM) depend on the clustering parameters or the initial clusters. In contrast, our methodology uses a shape-based approach that combines Agglomerative Hierarchical Clustering (AHC) with Dynamic Time Warping (DTW) to classify residential households' daily load curves based on their consumption patterns. While DTW seeks the optimal alignment between two load curves, AHC provides realistic initial cluster centers. In this paper, we compare the results with other clustering algorithms such as K-means, K-medoids, and GMM using different distance measures, and we show that AHC using DTW outperforms the other clustering algorithms and needs fewer clusters.
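The combination described above can be sketched in a few lines: compute pairwise DTW distances between load curves, then merge clusters bottom-up until the desired number remains. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the DTW local cost, the linkage criterion, or how the number of clusters is chosen, so this version assumes an absolute-difference cost, average linkage, and a user-supplied cluster count `k`. The toy curves are invented.

```python
import math
from typing import List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def ahc_dtw(curves: List[Sequence[float]], k: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering on pairwise DTW distances,
    merging the closest pair of clusters until k clusters remain."""
    n = len(curves)
    dist = [[dtw(curves[i], curves[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[x] for j in clusters[y]]
                d = sum(pairs) / len(pairs)  # average linkage
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Two flat profiles and two peaked profiles whose peaks are shifted in time.
curves = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 2],
    [0, 0, 6, 6, 0, 0],
    [0, 6, 6, 0, 0, 0],
]
print(ahc_dtw(curves, k=2))  # prints [[0, 1], [2, 3]]
```

Because DTW aligns the shifted peaks exactly (distance 0), the two peaked curves are grouped by shape rather than by timing. The agglomerative procedure is deterministic, so, unlike K-means, it needs no random initial centers.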


The Distribution Family of Similarity Distances

Burghouts, Gertjan, Smeulders, Arnold, Geusebroek, Jan-mark

Neural Information Processing Systems

Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge of the distribution of distances generated by similarity functions is crucial in deciding whether features are similar. Intuitively, one would expect that similarities between features could arise from any distribution. In this paper, we derive the contrary and report the theoretical result that $L_p$-norms (a class of commonly applied distance metrics) from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we experimentally show them to hold for various popular feature extraction algorithms across a diverse range of images.
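To make the claim concrete, the sketch below computes $L_p$ distances from one feature vector to a set of others and fits a Weibull distribution to them. This is a sketch under stated assumptions. It assumes a two-parameter Weibull (shape `k`, scale `lam`); the paper's exact parameterization is not given in the abstract. For convenience it swaps in a least-squares fit on a Weibull probability plot with Bernard's median ranks, which is not necessarily the estimator the authors used. The toy features are independent Gaussians with unequal variances, so they only partly match the paper's assumptions, since they are not correlated.

```python
import math
import random
from typing import List, Sequence, Tuple


def lp_distance(u: Sequence[float], v: Sequence[float], p: float) -> float:
    """L_p distance between two feature vectors (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def distances_to(query: Sequence[float], others: List[Sequence[float]], p: float) -> List[float]:
    """Distances from one feature vector to every other vector."""
    return [lp_distance(query, x, p) for x in others]


def fit_weibull(samples: List[float]) -> Tuple[float, float]:
    """Two-parameter Weibull fit (shape k, scale lam) by least squares on a
    Weibull probability plot: ln(-ln(1 - F)) = k ln x - k ln lam.
    Uses Bernard's median ranks; assumes all samples are strictly positive."""
    xs = sorted(samples)
    n = len(xs)
    X = [math.log(x) for x in xs]
    Y = [math.log(-math.log(1.0 - (i + 1 - 0.3) / (n + 0.4))) for i in range(n)]
    mx, my = sum(X) / n, sum(Y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    k = sxy / sxx                 # slope of the plot is the shape
    lam = math.exp(mx - my / k)   # intercept equals -k * ln(lam)
    return k, lam


random.seed(0)
query = [0.0] * 8
# Toy features: independent but non-identically distributed (growing variance).
others = [[random.gauss(0.0, 1.0 + 0.5 * j) for j in range(8)] for _ in range(500)]
k, lam = fit_weibull(distances_to(query, others, p=2.0))
print(f"shape={k:.2f}, scale={lam:.2f}")
```

Comparing the fitted CDF against the empirical distribution of distances (for example, on a probability plot) is one way to check how well the Weibull model describes a given feature extractor.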


Evaluating approval-based multiwinner voting in terms of robustness to noise

Caragiannis, Ioannis, Kaklamanis, Christos, Karanikolas, Nikos, Krimpas, George A.

arXiv.org Artificial Intelligence

Approval-based multiwinner voting rules have recently received much attention in the Computational Social Choice literature. Such rules aggregate approval ballots and determine a winning committee of alternatives. To assess their effectiveness, we propose employing new noise models that are specifically tailored to approval votes and committees. These models take as input a ground truth committee and return random approval votes, to be thought of as noisy estimates of the ground truth. A minimum robustness requirement for an approval-based multiwinner voting rule is to return the ground truth when applied to profiles with sufficiently many noisy votes. Our results indicate that approval-based multiwinner voting is always robust to reasonable noise. We further refine this finding by presenting a hierarchy of rules in terms of how robust to noise they are.
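The robustness requirement above can be illustrated with one simple noise model and one simple rule. Both are assumptions chosen for illustration, not the paper's specific noise models or its full hierarchy of rules. In this hypothetical noise model, each voter independently approves each ground-truth alternative with probability `p` and each other alternative with probability `q < p`. The rule is Approval Voting (AV), which elects the `k` alternatives with the most approvals; ties are broken by lower index here.

```python
import random
from typing import List, Set


def noisy_votes(
    ground_truth: Set[int], m: int, n: int, p: float, q: float, rng: random.Random
) -> List[Set[int]]:
    """Hypothetical noise model: each of n voters approves each ground-truth
    alternative with probability p and every other alternative with probability q."""
    votes = []
    for _ in range(n):
        ballot = {a for a in range(m) if rng.random() < (p if a in ground_truth else q)}
        votes.append(ballot)
    return votes


def approval_voting(votes: List[Set[int]], m: int, k: int) -> Set[int]:
    """Approval Voting (AV): elect the k alternatives with the most approvals,
    breaking ties in favor of the lower index."""
    scores = [0] * m
    for ballot in votes:
        for a in ballot:
            scores[a] += 1
    ranked = sorted(range(m), key=lambda a: (-scores[a], a))
    return set(ranked[:k])


rng = random.Random(42)
truth = {0, 3, 5}
votes = noisy_votes(truth, m=8, n=500, p=0.7, q=0.3, rng=rng)
print(approval_voting(votes, m=8, k=3))
```

With 500 voters, the expected approval counts (350 for committee members, 150 for the others) are far apart relative to their random fluctuation, so AV recovers the ground truth committee. Shrinking `n` or moving `q` toward `p` is a quick way to see where recovery starts to fail under this particular model.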


optimalFlow: Optimal-transport approach to flow cytometry gating and population matching

del Barrio, Eustasio, Inouzhe, Hristo, Loubes, Jean-Michel, Matrán, Carlos, Mayo-Íscar, Agustín

arXiv.org Machine Learning

Data used in flow cytometry present pronounced variability for biological and technical reasons. Biological variability is a well-known phenomenon produced by measurements on different individuals with different characteristics such as age, sex, etc. The use of different measurement settings, variation of the conditions during experiments, and different types of flow cytometers are some of the technical sources of variability. This high variability makes it difficult to use supervised machine learning to identify cell populations. We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces prototype cytometries for the different groups. We show that supervised learning restricted to the new groups performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show that this procedure can outperform state-of-the-art techniques on the proposed datasets. Our code and data are freely available as R packages at https://github.com/HristoInouzhe/optimalFlow and https://github.com/HristoInouzhe/optimalFlowData.
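The two optimal-transport ingredients named above, a Wasserstein distance and a Wasserstein barycenter, have a simple closed form in one dimension, which makes them easy to illustrate. The Python toy below is independent of the authors' R packages and does not reproduce optimalFlow's similarity distance; real cytometries are multivariate. It assumes equal-size, equally weighted 1-D samples. In that case, optimal transport matches sorted samples, and the equal-weight $W_2$ barycenter is obtained by averaging the sorted samples, which amounts to averaging the empirical quantile functions.

```python
from typing import List, Sequence


def wasserstein_1d(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """W_p between two 1-D empirical distributions with the same number of
    equally weighted points: the optimal plan matches sorted samples."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    xs, ys = sorted(x), sorted(y)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1.0 / p)


def barycenter_1d(samples: List[Sequence[float]]) -> List[float]:
    """Equal-weight W_2 barycenter in 1-D: average the sorted samples
    (equivalently, average the empirical quantile functions)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    sorted_s = [sorted(s) for s in samples]
    return [sum(s[i] for s in sorted_s) / len(sorted_s) for i in range(n)]


print(wasserstein_1d([1.0, 2.0, 3.0], [5.0, 3.0, 4.0]))      # prints 2.0
print(barycenter_1d([[3.0, 1.0, 2.0], [5.0, 4.0, 3.0]]))     # prints [2.0, 3.0, 4.0]
```

The barycenter acts as a "prototype" that sits between its inputs in the Wasserstein sense, which is the role the abstract assigns to prototype cytometries for each group.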


Large Scale Local Online Similarity/Distance Learning Framework based on Passive/Aggressive

Hamdan, Baida, Zabihzadeh, Davood, Monsefi, Reza

arXiv.org Machine Learning

Similarity/distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which has led to the emergence of the metric learning field. Many metric learning algorithms learn from data a global distance function that satisfies the constraints of the problem. However, in many real-world datasets, the discriminative power of features varies across regions of the input space, and a global metric is often unable to capture the complexity of the task. To address this challenge, local metric learning methods have been proposed that learn multiple metrics across different regions of the input space. These methods offer high flexibility and the ability to learn a nonlinear mapping, but typically at the expense of higher time requirements and overfitting. To overcome these challenges, this research presents an online multiple metric learning framework. Each metric in the proposed framework is composed of a global and a local component learned simultaneously. Adding a global component to a local metric efficiently reduces overfitting. The proposed framework is also scalable with both sample size and the dimension of input data. To the best of our knowledge, this is the first local online similarity/distance learning framework based on PA (Passive/Aggressive). In addition, for scalability with the dimension of input data, DRP (Dual Random Projection) is extended to local online learning in the present work. It enables our methods to run efficiently on high-dimensional datasets while maintaining their predictive performance. The proposed framework provides a straightforward local extension to any global online similarity/distance learning algorithm based on PA.
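For readers unfamiliar with Passive/Aggressive metric learning, the sketch below shows a single global PA step for a bilinear similarity $S_W(x, y) = x^\top W y$ on a triplet (query, similar item, dissimilar item). It swaps in the well-known OASIS-style PA-I update. It does not reproduce the paper's framework: there is no local component, no region assignment, and no DRP. The step size `C` and all function names are assumptions for illustration. If the hinge loss is zero, the update is passive; otherwise W moves along $V = p\,(p^{+} - p^{-})^\top$ with step $\tau = \min(C, \ell / \lVert V \rVert_F^2)$.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def bilinear_sim(W: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Bilinear similarity S_W(x, y) = x^T W y."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def pa_triplet_update(
    W: Matrix, p: Sequence[float], pos: Sequence[float], neg: Sequence[float], C: float = 1.0
) -> float:
    """One OASIS-style PA-I step on the triplet (p, pos, neg).
    Updates W in place and returns the hinge loss measured before the update."""
    loss = max(0.0, 1.0 - bilinear_sim(W, p, pos) + bilinear_sim(W, p, neg))
    if loss == 0.0:
        return 0.0  # passive: the margin constraint is already satisfied
    diff = [a - b for a, b in zip(pos, neg)]
    V = [[pi * dj for dj in diff] for pi in p]  # V = p (pos - neg)^T
    norm_sq = sum(v * v for row in V for v in row)
    if norm_sq == 0.0:
        return loss  # no informative direction to move in
    tau = min(C, loss / norm_sq)  # aggressive step, capped by C
    for i in range(len(V)):
        for j in range(len(V[i])):
            W[i][j] += tau * V[i][j]
    return loss


W = [[1.0, 0.0], [0.0, 1.0]]
print(pa_triplet_update(W, [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # prints 2.0
print(W)                                                          # prints [[0.0, 1.0], [0.0, 1.0]]
print(pa_triplet_update(W, [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]))  # prints 0.0
```

When the step is not capped by `C`, the update raises $S_W(p, p^{+}) - S_W(p, p^{-})$ by exactly the current loss, so the triplet constraint is satisfied after one step. This "minimal change that fixes the violation" behavior is what makes PA updates cheap enough for online learning.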