explicit feature
Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features leading to a satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated to the combined kernel (spectral error). The result verifies that when the (best) underlying data model is sparse enough, i.e., the spectral error is negligible, one can control the test error with a small number of explicit features, that can scale poly-logarithmically with data. Our empirical results show that given a fixed number of explicit features, the method can achieve a lower test error with a smaller time cost, compared to the state-of-the-art in data-dependent random features.
Structured Contrastive Learning for Interpretable Latent Representations
Shen, Zhengyang, Tu, Hua, Shi, Mayue
Neural networks exhibit severe brittleness to semantically irrelevant transformations. A mere 75ms electrocardiogram (ECG) phase shift degrades latent cosine similarity from 1.0 to 0.2, while sensor rotations collapse activity recognition performance with inertial measurement units (IMUs). We identify the root cause as "laissez-faire" representation learning, where latent spaces evolve unconstrained provided task performance is satisfied. We propose Structured Contrastive Learning (SCL), a framework that partitions latent space representations into three semantic groups: invariant features that remain consistent under given transformations (e.g., phase shifts or rotations), variant features that actively differentiate transformations via a novel variant mechanism, and free features that preserve task flexibility. This creates controllable push-pull dynamics where different latent dimensions serve distinct, interpretable purposes. The variant mechanism enhances contrastive learning by encouraging variant features to differentiate within positive pairs, enabling simultaneous robustness and interpretability. Our approach requires no architectural modifications and integrates seamlessly into existing training pipelines. Experiments on ECG phase invariance and IMU rotation robustness demonstrate superior performance: ECG similarity improves from 0.25 to 0.91 under phase shifts, while WISDM activity recognition achieves 86.65% accuracy with 95.38% rotation consistency, consistently outperforming traditional data augmentation. This work represents a paradigm shift from reactive data augmentation to proactive structural learning, enabling interpretable latent representations in neural networks.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- South America > Peru > Loreto Department (0.04)
- South America > Brazil (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
Fishing for Phishers: Learning-Based Phishing Detection in Ethereum Transactions
Alghuried, Ahod, Alghamdi, Abdulaziz, Alkinoon, Ali, Choi, Soohyeon, Mohaisen, Manar, Mohaisen, David
Phishing detection on Ethereum has increasingly leveraged advanced machine learning techniques to identify fraudulent transactions. However, limited attention has been given to understanding the effectiveness of feature selection strategies and the role of graph-based models in enhancing detection accuracy. In this paper, we systematically examine these issues by analyzing and contrasting explicit transactional features and implicit graph-based features, both experimentally and analytically. We explore how different feature sets impact the performance of phishing detection models, particularly in the context of Ethereum's transactional network. Additionally, we address key challenges such as class imbalance and dataset composition and their influence on the robustness and precision of detection methods. Our findings demonstrate the advantages and limitations of each feature type, while also providing a clearer understanding of how feature affect model resilience and generalization in adversarial environments.
- North America > United States > New York > Richmond County > New York City (0.04)
- North America > United States > New York > Queens County > New York City (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Trading (1.00)
- Information Technology > Services > e-Commerce Services (0.42)
- Information Technology > e-Commerce > Financial Technology (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Reviews: Link Prediction Based on Graph Neural Networks
Quality: From the technical point of view, the paper seems to be well-prepared. In particular, the authors propose \gamma-heuristic theory for link prediction and show that many common heuristics can be expressed in a general formulation. They build their framework based on this theory and on the previous work (WLNM). However, I would like to touch on some points here: - In Line 115, there is no clear definition of the \textbf{h-order heuristic} although it is mentioned between lines 26 and 31. If \textbf{h-order heuristic} is defined as, for example, "a heuristic requiring to know up to h-hop neighborhood of the target nodes", then do we really need Theorem 1? - In lines 274 and 275, it is stated that "We run all experiments for 5 times and report the average".
Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement
Chang, Zhiyuan, Li, Mingyang, Wang, Junjie, Liu, Yi, Wang, Qing, Liu, Yang
Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by T2I DMs miss key objects mentioned in the prompt. We first conduct an empirical study on this issue, exploring the prevalence of catastrophic-neglect, potential mitigation strategies with feature enhancement, and the insights gained. Guided by the empirical findings, we propose an automated repair approach named Patcher to address catastrophic-neglect in T2I DMs. Specifically, Patcher first determines whether there are any neglected objects in the prompt, and then applies attention-guided feature enhancement to these neglected objects, resulting in a repaired prompt. Experimental results on three versions of Stable Diffusion demonstrate that Patcher effectively repairs the issue of catastrophic-neglect, achieving 10.1%-16.3% higher Correct Rate in image generation compared to baselines.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > China > Beijing > Beijing (0.04)
- (6 more...)
Xu
Hashing, as a popular approximate nearest neighbor search, has been widely used for large-scale similarity search. Recently, a spectrum of machine learning methods are utilized to learn similarity-preserving binary codes. However, most of them directly encode the explicit features, keywords, which fail to preserve the accurate semantic similarities in binary code beyond keyword matching, especially on short texts. Here we propose a novel text hashing framework with convolutional neural networks.
Generalizing to Unseen Domains: A Survey on Domain Generalization
Wang, Jindong, Lan, Cuiling, Liu, Chang, Ouyang, Yidong, Qin, Tao
Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increased interests in recent years. Domain generalization deals with a challenging setting where one or several different but related domain(s) are given, and the goal is to learn a model that can generalize to an unseen test domain. For years, great progress has been achieved. This paper presents the first review for recent advances in domain generalization. First, we provide a formal definition of domain generalization and discuss several related fields. Next, we thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. Then, we categorize recent algorithms into three classes and present them in detail: data manipulation, representation learning, and learning strategy, each of which contains several popular algorithms. Third, we introduce the commonly used datasets and applications. Finally, we summarize existing literature and present some potential research topics for the future.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Asia > China > Beijing > Beijing (0.04)
Learning Bounds for Greedy Approximation with Explicit Feature Maps from Multiple Kernels
Shahrampour, Shahin, Tarokh, Vahid
Nonlinear kernels can be approximated using finite-dimensional feature maps for efficient risk minimization. Due to the inherent trade-off between the dimension of the (mapped) feature space and the approximation accuracy, the key problem is to identify promising (explicit) features leading to a satisfactory out-of-sample performance. In this work, we tackle this problem by efficiently choosing such features from multiple kernels in a greedy fashion. Our method sequentially selects these explicit features from a set of candidate features using a correlation metric. We establish an out-of-sample error bound capturing the trade-off between the error in terms of explicit features (approximation error) and the error due to spectral properties of the best model in the Hilbert space associated to the combined kernel (spectral error).
Graph Bayesian Optimization: Algorithms, Evaluations and Applications
Network structure optimization is a fundamental task in complex network analysis. However, almost all the research on Bayesian optimization is aimed at optimizing the objective functions with vectorial inputs. In this work, we first present a flexible framework, denoted graph Bayesian optimization, to handle arbitrary graphs in the Bayesian optimization community. By combining the proposed framework with graph kernels, it can take full advantage of implicit graph structural features to supplement explicit features guessed according to the experience, such as tags of nodes and any attributes of graphs. The proposed framework can identify which features are more important during the optimization process. We apply the framework to solve four problems including two evaluations and two applications to demonstrate its efficacy and potential applications.
- North America > United States (0.14)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (2 more...)
- Government (0.67)
- Health & Medicine (0.67)
- Transportation > Infrastructure & Services (0.50)
- Transportation > Ground > Road (0.31)
Link Prediction Based on Graph Neural Networks
Traditional methods for link prediction can be categorized into three main types: graph structure feature-based, latent feature-based, and explicit feature-based. Graph structure feature methods leverage some handcrafted node proximity scores, e.g., common neighbors, to estimate the likelihood of links. Latent feature methods rely on factorizing networks' matrix representations to learn an embedding for each node. Explicit feature methods train a machine learning model on two nodes' explicit attributes. Each of the three types of methods has its unique merits. In this paper, we propose SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction), a new framework for link prediction which combines the power of all the three types into a single graph neural network (GNN). GNN is a new type of neural network which directly accepts graphs as input and outputs their labels. In SEAL, the input to the GNN is a local subgraph around each target link. We prove theoretically that our local subgraphs also reserve a great deal of high-order graph structure features related to link existence. Another key feature is that our GNN can naturally incorporate latent features and explicit features. It is achieved by concatenating node embeddings (latent features) and node attributes (explicit features) in the node information matrix for each subgraph, thus combining the three types of features to enhance GNN learning. Through extensive experiments, SEAL shows unprecedentedly strong performance against a wide range of baseline methods, including various link prediction heuristics and network embedding methods.
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia (0.04)