Supervised Learning
Generalized Zero-shot Intent Detection via Commonsense Knowledge
Siddique, A. B., Jamour, Fuad, Xu, Luxun, Hristidis, Vagelis
Identifying user intents from natural language utterances is a crucial step in conversational systems that has been extensively studied as a supervised classification problem. However, in practice, new intents emerge after deploying an intent detection model. Thus, these models should seamlessly adapt and classify utterances with both seen and unseen intents -- unseen intents emerge after deployment and they do not have training data. The few existing models that target this setting rely heavily on the scarcely available training data and overfit to seen intents data, resulting in a bias to misclassify utterances with unseen intents into seen ones. We propose RIDE: an intent detection model that leverages commonsense knowledge in an unsupervised fashion to overcome the issue of training data scarcity. RIDE computes robust and generalizable relationship meta-features that capture deep semantic relationships between utterances and intent labels; these features are computed by considering how the concepts in an utterance are linked to those in an intent label via commonsense knowledge. Our extensive experimental analysis on three widely-used intent detection benchmarks shows that relationship meta-features significantly increase the accuracy of detecting both seen and unseen intents and that RIDE outperforms the state-of-the-art model for unseen intents.
Time Series Classification via Topological Data Analysis
Karan, Alperen, Kaygun, Atabey
In this study, we use persistent homology to perform classification tasks on two publicly available multivariate time series datasets [19, 11] that include physiological data collected during stressful and non-stressful tasks. Instead of directly computing signal-specific features from sliding windows and subwindows on modalities such as electrocardiogram and wrist temperature (Figure 7), we extracted features using persistence diagrams and their statistical properties. The subwindowing method allowed us to reduce noise without incurring an extra computational cost. We then developed machine learning models and assessed their performance by varying window sizes and using different flavors of persistence diagrams. Topological Data Analysis (TDA) techniques usually work with points embedded in an affine space of large enough dimension. However, TDA techniques can still be applied to time series data sets, whether they are univariate or multivariate. One can convert a univariate time series into a finite collection of points in a $d$-dimensional affine space using delay embedding methods, and then compute the persistent homology of that collection. Since Takens' theorem implies that delay embeddings produce topologically invariant subsets for a non-chaotic dynamical system [21], one can reasonably expect that persistent homology produces features that distinguish different time series. A handful of studies address the persistent homology of delay embeddings for time series classification [23, 20, 1].
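The delay embedding step mentioned in this abstract can be illustrated with a minimal sketch. The function name `delay_embedding` and the parameters `dim` and `delay` are hypothetical names chosen for illustration, not taken from the paper; the persistent homology computation itself would need a TDA library and is not shown.

```python
def delay_embedding(series, dim, delay=1):
    """Return delay vectors (x[t], x[t+delay], ..., x[t+(dim-1)*delay]).

    `dim` and `delay` are illustrative parameter names (assumptions).
    """
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    # Number of samples spanned by one delay vector beyond its first entry.
    span = (dim - 1) * delay
    return [
        tuple(series[t + j * delay] for j in range(dim))
        for t in range(len(series) - span)
    ]


signal = [0, 1, 2, 3, 4, 5]
cloud = delay_embedding(signal, dim=3, delay=2)
print(cloud)  # [(0, 2, 4), (1, 3, 5)]
```

Each tuple is one point of the resulting point cloud; a series too short for the chosen `dim` and `delay` yields an empty list.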
Fast rates in structured prediction
Cabannes, Vivien, Rudi, Alessandro, Bach, Francis
Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression. Bounding the original error, between estimate and solution, by the surrogate error endows discrete problems with convergence rates already shown for continuous instances. Yet, current approaches do not leverage the fact that discrete problems essentially predict a discrete output, whereas continuous problems predict a continuous value. In this paper, we tackle this issue for general structured prediction problems, opening the way to "super fast" rates, that is, convergence rates for the excess risk faster than $n^{-1}$, where $n$ is the number of observations, with even exponential rates under the strongest assumptions. We first illustrate it for predictors based on nearest neighbors, generalizing rates known for binary classification to any discrete problem within the framework of structured prediction. We then consider kernel ridge regression, where we improve known rates in $n^{-1/4}$ to arbitrarily fast rates, depending on a parameter characterizing the hardness of the problem, thus making it possible, under smoothness assumptions, to bypass the curse of dimensionality.
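As a rough illustration of a nearest-neighbor predictor for discrete outputs, the sketch below picks the candidate output that minimizes the summed loss against the outputs of the `k` nearest training points. This is a common form of such estimators and is an assumption here, not a reproduction of the paper's construction; all names (`knn_structured_predict`, `zero_one`, `candidates`) are hypothetical.

```python
def knn_structured_predict(x, train, candidates, loss, k=3):
    """Predict argmin over z of the summed loss(z, y_i) on the k nearest points."""
    # Sort training pairs by squared Euclidean distance to the query.
    nearest = sorted(
        train,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    # Decoding step: choose the discrete output with the smallest summed loss.
    return min(candidates, key=lambda z: sum(loss(z, y) for _, y in nearest))


def zero_one(z, y):
    """0-1 loss; with this loss the predictor reduces to a majority vote."""
    return 0.0 if z == y else 1.0


train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"), ((0.2, 0.1), "a"),
]
prediction = knn_structured_predict((0.15, 0.1), train, ["a", "b"], zero_one, k=3)
print(prediction)  # a
```

Swapping `zero_one` for another loss over the same candidate set changes only the decoding step, which is the point of treating the problem within a structured prediction framework.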
Exact Recovery of Clusters in Finite Metric Spaces Using Oracle Queries
Bressan, Marco, Cesa-Bianchi, Nicolò, Lattanzi, Silvio, Paudice, Andrea
We investigate the problem of exact cluster recovery using oracle queries. Previous results show that clusters in Euclidean spaces that are convex and separated with a margin can be reconstructed exactly using only $O(\log n)$ same-cluster queries, where $n$ is the number of input points. In this work, we study this problem in the more challenging non-convex setting. We introduce a structural characterization of clusters, called $(\beta,\gamma)$-convexity, that can be applied to any finite set of points equipped with a metric (or even a semimetric, as the triangle inequality is not needed). Using $(\beta,\gamma)$-convexity, we can translate natural density properties of clusters (which include, for instance, clusters that are strongly non-convex in $R^d$) into a graph-theoretic notion of convexity. By exploiting this convexity notion, we design a deterministic algorithm that recovers $(\beta,\gamma)$-convex clusters using $O(k^2 \log n + k^2 (\frac{6}{\beta\gamma})^{dens(X)})$ same-cluster queries, where $k$ is the number of clusters and $dens(X)$ is the density dimension of the semimetric. We show that an exponential dependence on the density dimension is necessary, and we also show that, if we are allowed to make $O(k^2 + k \log n)$ additional queries to a "cluster separation" oracle, then we can recover clusters that have different and arbitrary scales, even when the scale of each cluster is unknown.
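To give some intuition for why convexity permits a logarithmic number of same-cluster queries, here is a toy sketch for the simplest case: two clusters that form disjoint intervals on the real line. It is not the paper's algorithm and does not handle $(\beta,\gamma)$-convex clusters; the names `recover_two_interval_clusters` and `same_cluster` are hypothetical.

```python
def recover_two_interval_clusters(points, same_cluster):
    """Label 1-D points lying in two interval clusters with O(log n) oracle queries.

    `same_cluster(i, j)` is an assumed oracle that returns True when points
    i and j belong to the same cluster.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    first = order[0]
    # Invariant: order[lo] is in the same cluster as the leftmost point.
    lo, hi = 0, len(order) - 1
    queries = 0
    while lo < hi:
        mid = (lo + hi + 1) // 2
        queries += 1
        if same_cluster(first, order[mid]):
            lo = mid
        else:
            hi = mid - 1
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = 0 if rank <= lo else 1
    return labels, queries


points = [0.1, 5.0, 0.3, 5.2, 0.2, 4.9, 5.5, 0.0]
truth = [0, 1, 0, 1, 0, 1, 1, 0]
labels, queries = recover_two_interval_clusters(
    points, lambda i, j: truth[i] == truth[j]
)
print(labels, queries)  # [0, 1, 0, 1, 0, 1, 1, 0] 3
```

Because each cluster is an interval, the boundary between them can be found by binary search over the sorted points, so eight points need only three queries here.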
Randomized Deep Structured Prediction for Discourse-Level Processing
Widmoser, Manuel, Pacheco, Maria Leonor, Honorio, Jean, Goldwasser, Dan
Expressive text encoders such as RNNs and Transformer Networks have been at the center of NLP models in recent work. Most of the effort has focused on sentence-level tasks, capturing the dependencies between words in a single sentence, or pairs of sentences. However, certain tasks, such as argumentation mining, require accounting for longer texts and complicated structural dependencies between them. Deep structured prediction is a general framework to combine the complementary strengths of expressive neural encoders and structured inference for highly structured domains. Nevertheless, when the need arises to go beyond sentences, most work relies on combining the output scores of independently trained classifiers. One of the main reasons for this is that constrained inference comes at a high computational cost. In this paper, we explore the use of randomized inference to alleviate this concern and show that we can efficiently leverage deep structured prediction and expressive neural encoders for a set of tasks involving complicated argumentative structures.
NEMR: Network Embedding on Metric of Relation
Xie, Luodi, Shen, Hong, Ren, Jiaxin
Network embedding maps the nodes of a given network into a low-dimensional space such that the semantic similarities among the nodes can be effectively inferred. Most existing approaches use the inner product of node embeddings to measure the similarity between nodes and therefore lack the capacity to capture complex relationships among nodes. Moreover, they treat paths in the network merely as auxiliary structural information when inferring node embeddings, although these paths carry rich, semantically relevant user information that should not be ignored. In this paper, we propose a novel method called Network Embedding on the Metric of Relation, abbreviated as NEMR, which can efficiently learn the embeddings of nodes in a relational metric space. First, NEMR models the relationships among nodes in a metric space with deep learning methods, including variational inference that maps the relationship between nodes to a Gaussian distribution so as to capture the uncertainties. Second, NEMR considers not only the equivalence of multiple paths but also the natural order of a single path when inferring node embeddings, which enables NEMR to capture the multiple relationships among nodes, since multiple paths contain rich user information, e.g., age, hobby, and profession. Experimental results on several public datasets show that NEMR outperforms the state-of-the-art methods on relevant inference tasks, including link prediction and node classification.
A Survey on the Explainability of Supervised Machine Learning
Burkart, Nadia (Fraunhofer IOSB) | Huber, Marco F. (Fraunhofer IPA, University of Stuttgart)
Predictions obtained by, e.g., artificial neural networks have a high accuracy, but humans often perceive the models as black boxes. Insights about the decision making are mostly opaque for humans. Understanding the decision making is of paramount importance, particularly in highly sensitive areas such as healthcare or finance. The decision making behind the black boxes needs to be more transparent, accountable, and understandable for humans. This survey paper provides essential definitions and an overview of the different principles and methodologies of explainable Supervised Machine Learning (SML). We conduct a state-of-the-art survey that reviews past and recent explainable SML approaches and classifies them according to the introduced definitions. Finally, we illustrate principles by means of an explanatory case study and discuss important future directions.
Studying Catastrophic Forgetting in Neural Ranking Models
Lovon-Melgarejo, Jesus, Soulier, Laure, Pinel-Sauvagnat, Karen, Tamine, Lynda
Several deep neural ranking models have been proposed in the recent IR literature. While their transferability to one target domain held by a dataset has been widely addressed using traditional domain adaptation strategies, the question of their cross-domain transferability is still under-studied. We study here to what extent neural ranking models catastrophically forget old knowledge acquired from previously observed domains after acquiring new knowledge, leading to a performance decrease on those domains. Our experiments show that the effectiveness of neural IR ranking models is achieved at the cost of catastrophic forgetting and that a lifelong learning strategy using a cross-domain regularizer successfully mitigates the problem. Using an explanatory approach built on a regression model, we also show the effect of domain characteristics on the rise of catastrophic forgetting. We believe that the obtained results can be useful for both theoretical and practical future work in neural IR.
ExpFinder: An Ensemble Expert Finding Model Integrating $N$-gram Vector Space Model and $\mu$CO-HITS
Kang, Yong-Bin, Du, Hung, Forkan, Abdur Rahim Mohammad, Jayaraman, Prem Prakash, Aryani, Amir, Sellis, Timos
Finding an expert plays a crucial role in driving successful collaborations and speeding up high-quality research development and innovations. However, the rapid growth of scientific publications and digital expertise data makes identifying the right experts a challenging problem. Existing approaches for finding experts given a topic can be categorised into information retrieval techniques based on vector space models, document language models, and graph-based models. In this paper, we propose $\textit{ExpFinder}$, a new ensemble model for expert finding that integrates a novel $N$-gram vector space model, denoted as $n$VSM, and a graph-based model, denoted as $\textit{$\mu$CO-HITS}$, which is a proposed variation of the CO-HITS algorithm. The key idea of $n$VSM is to exploit a recent inverse document frequency weighting method for $N$-gram words, and $\textit{ExpFinder}$ incorporates $n$VSM into $\textit{$\mu$CO-HITS}$ to achieve expert finding. We comprehensively evaluate $\textit{ExpFinder}$ on four different datasets from academic domains in comparison with six different expert finding models. The evaluation results show that $\textit{ExpFinder}$ is a highly effective model for expert finding, substantially outperforming all the compared models by 19% to 160.2%.
SHARKS: Smart Hacking Approaches for RisK Scanning in Internet-of-Things and Cyber-Physical Systems based on Machine Learning
Saha, Tanujay, Aaraj, Najwa, Ajjarapu, Neel, Jha, Niraj K.
Cyber-physical systems (CPS) and Internet-of-Things (IoT) devices are increasingly being deployed across multiple functionalities, ranging from healthcare devices and wearables to critical infrastructures, e.g., nuclear power plants, autonomous vehicles, smart cities, and smart homes. These devices are inherently not secure across their comprehensive software, hardware, and network stacks, thus presenting a large attack surface that can be exploited by hackers. In this article, we present an innovative technique for detecting unknown system vulnerabilities, managing these vulnerabilities, and improving incident response when such vulnerabilities are exploited. The novelty of this approach lies in extracting intelligence from known real-world CPS/IoT attacks, representing them in the form of regular expressions, and employing machine learning (ML) techniques on this ensemble of regular expressions to generate new attack vectors and security vulnerabilities. Our results show that 10 new attack vectors and 122 new vulnerability exploits can be successfully generated that have the potential to exploit a CPS or an IoT ecosystem. The ML methodology achieves an accuracy of 97.4% and enables us to predict these attacks efficiently with an 87.2% reduction in the search space. We demonstrate the application of our method to the hacking of the in-vehicle network of a connected car. To defend against the known attacks and possible novel exploits, we discuss a defense-in-depth mechanism for various classes of attacks and the classification of data targeted by such attacks. This defense mechanism optimizes the cost of security measures based on the sensitivity of the protected resource, thus incentivizing its adoption in real-world CPS/IoT by cybersecurity practitioners.