Support Vector Machines
Towards Generalizable Detection of Urgency of Discussion Forum Posts
Švábenský, Valdemar, Baker, Ryan S., Zambrano, Andrés, Zou, Yishan, Slater, Stefan
Students who take an online course, such as a MOOC, use the course's discussion forum to ask questions or reach out to instructors when encountering an issue. However, reading and responding to students' questions is difficult to scale because of the time needed to consider each message. As a result, critical issues may be left unresolved, and students may lose the motivation to continue in the course. To help address this problem, we build predictive models that automatically determine the urgency of each forum post, so that these posts can be brought to instructors' attention. This paper goes beyond previous work by predicting not just a binary decision cut-off but a post's level of urgency on a 7-point scale. First, we train and cross-validate several models on an original data set of 3,503 posts from MOOCs at University of Pennsylvania. Second, to determine the generalizability of our models, we test their performance on a separate, previously published data set of 29,604 posts from MOOCs at Stanford University. While the previous work on post urgency used only one data set, we evaluated the prediction across different data sets and courses. The best-performing model was a support vector regressor trained on the Universal Sentence Encoder embeddings of the posts, achieving an RMSE of 1.1 on the training set and 1.4 on the test set. Understanding the urgency of forum posts enables instructors to focus their time more effectively and, as a result, better support student learning.
Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning
Chia, Nai-Hui, Gilyén, András, Li, Tongyang, Lin, Han-Hsuan, Tang, Ewin, Wang, Chunhao
We present an algorithmic framework for quantum-inspired classical algorithms on close-to-low-rank matrices, generalizing the series of results started by Tang's breakthrough quantum-inspired algorithm for recommendation systems [STOC'19]. Motivated by quantum linear algebra algorithms and the quantum singular value transformation (SVT) framework of Gily\'en, Su, Low, and Wiebe [STOC'19], we develop classical algorithms for SVT that run in time independent of input dimension, under suitable quantum-inspired sampling assumptions. Our results give compelling evidence that in the corresponding QRAM data structure input model, quantum SVT does not yield exponential quantum speedups. Since the quantum SVT framework generalizes essentially all known techniques for quantum linear algebra, our results, combined with sampling lemmas from previous work, suffice to generalize all recent results about dequantizing quantum machine learning algorithms. In particular, our classical SVT framework recovers and often improves the dequantization results on recommendation systems, principal component analysis, supervised clustering, support vector machines, low-rank regression, and semidefinite program solving. We also give additional dequantization results on low-rank Hamiltonian simulation and discriminant analysis. Our improvements come from identifying the key feature of the quantum-inspired input model that is at the core of all prior quantum-inspired results: $\ell^2$-norm sampling can approximate matrix products in time independent of their dimension. We reduce all our main results to this fact, making our exposition concise, self-contained, and intuitive.
Duality in Multi-View Restricted Kernel Machines
Achten, Sonny, Pandey, Arun, De Meulemeester, Hannes, De Moor, Bart, Suykens, Johan A. K.
While kernel methods have shown excellent performance and generalization capabilities, they tend to fall behind when We propose a unifying setting that combines existing it comes to large-scale problems due to their memory and restricted kernel machine methods into a single computational complexity. Additionally, it can be difficult primal-dual multi-view framework for kernel to change their architecture to allow for hierarchical principal component analysis in both supervised representation learning, which is one of the most powerful and unsupervised settings. We derive the primal capabilities of neural networks. Recently, Restricted and dual representations of the framework and Kernel Machines (RKM), were proposed which connect relate different training and inference algorithms least-squares support vector machines and kernel principal from a theoretical perspective. We show how to component analysis (kernel PCA) with Restricted Boltzmann achieve full equivalence in primal and dual formulations machines (Suykens, 2017). RKMs extend the primal by rescaling primal variables. Finally, and dual model representations present in least-squares support we experimentally validate the equivalence and vector machines, from shallow to deep architectures by provide insight into the relationships between different introducing the dual variables as hidden features through methods on a number of time series data conjugate feature duality. This provides a framework of sets by recursively forecasting unseen test data kernel methods represented by visible and hidden units as and visualizing the learned features.
Active Class Selection for Few-Shot Class-Incremental Learning
McClurg, Christopher, Ayub, Ali, Tyagi, Harsh, Rajtmajer, Sarah M., Wagner, Alan R.
For real-world applications, robots will need to continually learn in their environments through limited interactions with their users. Toward this, previous works in few-shot class incremental learning (FSCIL) and active class selection (ACS) have achieved promising results but were tested in constrained setups. Therefore, in this paper, we combine ideas from FSCIL and ACS to develop a novel framework that can allow an autonomous agent to continually learn new objects by asking its users to label only a few of the most informative objects in the environment. To this end, we build on a state-of-the-art (SOTA) FSCIL model and extend it with techniques from ACS literature. We further integrate a potential field-based navigation technique with our model to develop a complete framework that can allow an agent to process and reason on its sensory data through the FIASco model, navigate towards the most informative object in the environment, gather data about the object through its sensors and incrementally update the FIASco model. A primary challenge faced by robots deployed in the real world is continual adaptation to dynamic environments. Central to this challenge is object recognition (Ayub & Wagner, 2020d), a task typically requiring labeled examples. In this work, we address the problem of parsimonious object labelling wherein a robot may request labels for a small number of objects about which it knows least. In recent years, several works have been directed toward the problem of Few-Shot Class Incremental Learning (FSCIL) (Tao et al., 2020; Ayub & Wagner, 2020c) to develop models of incremental object learning that can learn from limited training data for each object class. The literature has made significant progress toward developing robots that can continually learn new objects from limited training data while preserving knowledge of previous objects. However, existing methods make strong assumptions about the training data that are rarely true in the real world. For example, FSCIL assumes that in each increment the robot will receive a fully labeled image dataset for the object classes in that increment, and the robot will not receive more data for these classes again (Tao et al., 2020; Ayub & Wagner, 2020c;d). In real world environments, however, robots will most likely encounter many unlabeled objects in their environment, and they will have to direct their learning toward a smaller subset of those unknown objects. Active learning is a subfield of machine learning that focuses on improving the learning efficiency of models by selectively seeking labels from within a large unlabeled data pool (Settles, 2009; Ayub & Fendley, 2022). Related to active learning is active class selection (ACS) in which a model seeks labels for specific object classes (Lomasky et al., 2007).
Learning ECG signal features without backpropagation
Pósfay, Péter, Kurbucz, Marcell T., Kovács, Péter, Jakovác, Antal
Representation learning has become a crucial area of research in machine learning, as it aims to discover efficient ways of representing raw data with useful features to increase the effectiveness, scope and applicability of downstream tasks such as classification and prediction. In this paper, we propose a novel method to generate representations for time series-type data. This method relies on ideas from theoretical physics to construct a compact representation in a data-driven way, and it can capture both the underlying structure of the data and task-specific information while still remaining intuitive, interpretable and verifiable. This novel methodology aims to identify linear laws that can effectively capture a shared characteristic among samples belonging to a specific class. By subsequently utilizing these laws to generate a classifier-agnostic representation in a forward manner, they become applicable in a generalized setting. We demonstrate the effectiveness of our approach on the task of ECG signal classification, achieving state-of-the-art performance.
Analyzing and Improving Greedy 2-Coordinate Updates for Equality-Constrained Optimization via Steepest Descent in the 1-Norm
Ramesh, Amrutha Varshini, Mishkin, Aaron, Schmidt, Mark, Zhou, Yihan, Lavington, Jonathan Wilder, She, Jennifer
We consider minimizing a smooth function subject to a summation constraint over its variables. By exploiting a connection between the greedy 2-coordinate update for this problem and equality-constrained steepest descent in the 1-norm, we give a convergence rate for greedy selection under a proximal Polyak-Lojasiewicz assumption that is faster than random selection and independent of the problem dimension $n$. We then consider minimizing with both a summation constraint and bound constraints, as arises in the support vector machine dual problem. Existing greedy rules for this setting either guarantee trivial progress only or require $O(n^2)$ time to compute. We show that bound- and summation-constrained steepest descent in the L1-norm guarantees more progress per iteration than previous rules and can be computed in only $O(n \log n)$ time.
Feature-Based Time-Series Analysis in R using the theft Package
Henderson, Trent, Fulcher, Ben D.
Time series are measured and analyzed across the sciences. One method of quantifying the structure of time series is by calculating a set of summary statistics or `features', and then representing a time series in terms of its properties as a feature vector. The resulting feature space is interpretable and informative, and enables conventional statistical learning approaches, including clustering, regression, and classification, to be applied to time-series datasets. Many open-source software packages for computing sets of time-series features exist across multiple programming languages, including catch22 (22 features: Matlab, R, Python, Julia), feasts (42 features: R), tsfeatures (63 features: R), Kats (40 features: Python), tsfresh (779 features: Python), and TSFEL (390 features: Python). However, there are several issues: (i) a singular access point to these packages is not currently available; (ii) to access all feature sets, users must be fluent in multiple languages; and (iii) these feature-extraction packages lack extensive accompanying methodological pipelines for performing feature-based time-series analysis, such as applications to time-series classification. Here we introduce a solution to these issues in an R software package called theft: Tools for Handling Extraction of Features from Time series. theft is a unified and extendable framework for computing features from the six open-source time-series feature sets listed above. It also includes a suite of functions for processing and interpreting the performance of extracted features, including extensive data-visualization templates, low-dimensional projections, and time-series classification operations. With an increasing volume and complexity of time-series datasets in the sciences and industry, theft provides a standardized framework for comprehensively quantifying and interpreting informative structure in time series.
Beyond Neural-on-Neural Approaches to Speaker Gender Protection
van Bemmel, Loes, Liu, Zhuoran, Vaessen, Nik, Larson, Martha
Recent research has proposed approaches that modify speech to defend against gender inference attacks. The goal of these protection algorithms is to control the availability of information about a speaker's gender, a privacy-sensitive attribute. Currently, the common practice for developing and testing gender protection algorithms is "neural-on-neural", i.e., perturbations are generated and tested with a neural network. In this paper, we propose to go beyond this practice to strengthen the study of gender protection. First, we demonstrate the importance of testing gender inference attacks that are based on speech features historically developed by speech scientists, alongside the conventionally used neural classifiers. Next, we argue that researchers should use speech features to gain insight into how protective modifications change the speech signal. Finally, we point out that gender-protection algorithms should be compared with novel "vocal adversaries", human-executed voice adaptations, in order to improve interpretability and enable before-the-mic protection.
Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs
Gao, Lucy L., Ye, Jane J., Yin, Haian, Zeng, Shangzhi, Zhang, Jin
Recently, Ye et al. (Mathematical Programming 2023) designed an algorithm for solving a specific class of bilevel programs with an emphasis on applications related to hyperparameter selection, utilizing the difference of convex algorithm based on the value function approach reformulation. The proposed algorithm is particularly powerful when the lower level problem is fully convex , such as a support vector machine model or a least absolute shrinkage and selection operator model. In this paper, to suit more applications related to machine learning and statistics, we substantially weaken the underlying assumption from lower level full convexity to weak convexity. Accordingly, we propose a new reformulation using Moreau envelope of the lower level problem and demonstrate that this reformulation is a difference of weakly convex program. Subsequently, we develop a sequentially convergent algorithm for solving this difference of weakly convex program. To evaluate the effectiveness of our approach, we conduct numerical experiments on the bilevel hyperparameter selection problem from elastic net, sparse group lasso, and RBF kernel support vector machine models.
Orbit Classification of asteroids using implementation of radial Basis Function on Support Vector Machines
Tiberwal, Yashvir, Dwivedi, Nishchal
This research paper focuses on the implementation of radial Basis Function (RBF) Support Vector Machines (SVM) for classifying asteroid orbits. Asteroids are important astronomical objects, and their orbits play a crucial role in understanding the dynamics of the solar system. The International Astronomical Union maintains data archives that provide a playground to experiment with various machine-learning techniques. In this study, we explore the application of RBF SVM algorithm to classify asteroids. The results show that the RBF SVM algorithm provides a good efficiency and accuracy to the dataset. We also analyze the impact of various parameters on the performance of the RBF SVM algorithm and present the optimal parameter settings. Our study highlights the importance of using machine learning techniques for classifying asteroid orbits and the effectiveness of the RBF SVM algorithm in this regard.