Dimensionality Reduction
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- North America > Canada (0.04)
- Europe > Portugal > Coimbra > Coimbra (0.04)
- (4 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.43)
PCA-Guided Autoencoding for Structured Dimensionality Reduction in Active Infrared Thermography
Salah, Mohammed, Saeed, Numan, Svetinovic, Davor, Sfarra, Stefano, Omar, Mohammed, Abdulrahman, Yusra
Active Infrared thermography (AIRT) is a widely adopted non-destructive testing (NDT) technique for detecting subsurface anomalies in industrial components. Due to the high dimensionality of AIRT data, current approaches employ non-linear autoencoders (AEs) for dimensionality reduction. However, the latent space learned by AIRT AEs lacks structure, limiting their effectiveness in downstream defect characterization tasks. To address this limitation, this paper proposes a principal component analysis guided (PCA-guided) autoencoding framework for structured dimensionality reduction to capture intricate, non-linear features in thermographic signals while enforcing a structured latent space. A novel loss function, PCA distillation loss, is introduced to guide AIRT AEs to align the latent representation with structured PCA components while capturing the intricate, non-linear patterns in thermographic signals. To evaluate the utility of the learned, structured latent space, we propose a neural network-based evaluation metric that assesses its suitability for defect characterization. Experimental results show that the proposed PCA-guided AE outperforms state-of-the-art dimensionality reduction methods on PVC, CFRP, and PLA samples in terms of contrast, signal-to-noise ratio (SNR), and neural network-based metrics.
- Europe > Austria (0.28)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.15)
- Energy (0.68)
- Materials > Construction Materials (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Understanding Bias in Perceiving Dimensionality Reduction Projections
Doh, Seoyoung, Jeon, Hyeon, Shin, Sungbok, Quadri, Ghulam Jilani, Kim, Nam Wook, Seo, Jinwook
We assume a scenario where a practitioner wants to select DR techniques that produce projections suitable for analyzing their dataset. It is crucial to select projections with high faithfulness for reliable analysis, but practitioners tend to be more biased towards the visual interestingness. We verify the existence of such bias and investigate its underlying causes, providing a grounded basis for mitigating the biases. Selecting the dimensionality reduction technique that faithfully represents the structure is essential for reliable visual communication and analytics. In reality, however, practitioners favor projections for other attractions, such as aesthetics and visual saliency, over the projection's structural faithfulness, a bias we define as visual interestingness. In this research, we conduct a user study that (1) verifies the existence of such bias and (2) explains why the bias exists. Our study suggests that visual interestingness biases practitioners' preferences when selecting projections for analysis, and this bias intensifies with color-encoded labels and shorter exposure time. Based on our findings, we discuss strategies to mitigate bias in perceiving and interpreting DR projections.
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)
How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction
Chen, Jun, Chen, Hong, Yu, Yonghua, Ying, Yiming
In recent years, contrastive learning has achieved state-of-the-art performance in the territory of self-supervised representation learning. Many previous works have attempted to provide the theoretical understanding underlying the success of contrastive learning. Almost all of them rely on a default assumption, i.e., the label consistency assumption, which may not hold in practice (the probability of failure is called labeling error) due to the strength and randomness of common augmentation strategies, such as random resized crop (RRC). This paper investigates the theoretical impact of labeling error on the downstream classification performance of contrastive learning. We first reveal several significant negative impacts of labeling error on downstream classification risk. To mitigate these impacts, data dimensionality reduction method (e.g., singular value decomposition, SVD) is applied on original data to reduce false positive samples, and establish both theoretical and empirical evaluations. Moreover, it is also found that SVD acts as a double-edged sword, which may lead to the deterioration of downstream classification accuracy due to the reduced connectivity of the augmentation graph. Based on the above observations, we give the augmentation suggestion that we should use some moderate embedding dimension (such as $512, 1024$ in our experiments), data inflation, weak augmentation, and SVD to ensure large graph connectivity and small labeling error to improve model performance.
- Asia > China > Hubei Province > Wuhan (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > Canada (0.04)
- Africa > Guinea > Kankan Region > Kankan Prefecture > Kankan (0.04)
- Research Report > New Finding (0.34)
- Research Report > Experimental Study (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.56)
Improving Clustering on Occupational Text Data through Dimensionality Reduction
García, Iago Xabier Vázquez, Partanaz, Damla, Yetkin, Emrullah Fatih
In this study, we focused on proposing an optimal clustering mechanism for the occupations defined in the well-known US-based occupational database, O*NET. Even though all occupations are defined according to well-conducted surveys in the US, their definitions can vary for different firms and countries. Hence, if one wants to expand the data that is already collected in O*NET for the occupations defined with different tasks, a map between the definitions will be a vital requirement. We proposed a pipeline using several BERT-based techniques with various clustering approaches to obtain such a map. We also examined the effect of dimensionality reduction approaches on several metrics used in measuring performance of clustering algorithms. Finally, we improved our results by using a specialized silhouette approach. This new clustering-based mapping approach with dimensionality reduction may help distinguish the occupations automatically, creating new paths for people wanting to change their careers.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- (5 more...)
- Education (0.68)
- Information Technology (0.46)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.84)
Enabling stratified sampling in high dimensions via nonlinear dimensionality reduction
Geraci, Gianluca, Schiavazzi, Daniele E., Zanoni, Andrea
We consider the problem of propagating the uncertainty from a possibly large number of random inputs through a computationally expensive model. Stratified sampling is a well-known variance reduction strategy, but its application, thus far, has focused on models with a limited number of inputs due to the challenges of creating uniform partitions in high dimensions. To overcome these challenges, we perform stratification with respect to the uniform distribution defined over the unit interval, and then derive the corresponding strata in the original space using nonlinear dimensionality reduction. We show that our approach is effective in high dimensions and can be used to further reduce the variance of multifidelity Monte Carlo estimators.
- North America > United States > New York (0.04)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (4 more...)
- Government > Regional Government > North America Government > United States Government (0.68)
- Energy (0.67)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.61)
Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures
Gao, Jie, Jayaram, Rajesh, Kolbe, Benedikt, Sapir, Shay, Schwiegelshohn, Chris, Silwal, Sandeep, Waingarten, Erik
Randomized dimensionality reduction is a widely-used algorithmic technique for speeding up large-scale Euclidean optimization problems. In this paper, we study dimension reduction for a variety of maximization problems, including max-matching, max-spanning tree, max TSP, as well as various measures for dataset diversity. For these problems, we show that the effect of dimension reduction is intimately tied to the \emph{doubling dimension} $λ_X$ of the underlying dataset $X$ -- a quantity measuring intrinsic dimensionality of point sets. Specifically, we prove that a target dimension of $O(λ_X)$ suffices to approximately preserve the value of any near-optimal solution,which we also show is necessary for some of these problems. This is in contrast to classical dimension reduction results, whose dependence increases with the dataset size $|X|$. We also provide empirical results validating the quality of solutions found in the projected space, as well as speedups due to dimensionality reduction.
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- Europe > Denmark (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- (13 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.82)
FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction
Federated learning (FL) has rapidly evolved as a promising paradigm that enables collaborative model training across distributed participants without exchanging their local data. Despite its broad applications in fields such as computer vision, graph learning, and natural language processing, the development of a data projection model that can be effectively used to visualize data in the context of FL is crucial yet remains heavily under-explored. Neighbor embedding (NE) is an essential technique for visualizing complex high-dimensional data, but collaboratively learning a joint NE model is difficult. The key challenge lies in the objective function, as effective visualization algorithms like NE require computing loss functions among pairs of data. In this paper, we introduce \textsc{FedNE}, a novel approach that integrates the \textsc{FedAvg} framework with the contrastive NE technique, without any requirements of shareable data.
- Information Technology > Artificial Intelligence > Natural Language (0.99)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.40)
Multi-Criteria Dimensionality Reduction with Applications to Fairness
Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the multi-criteria dimensionality reduction problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as the Fair-PCA problem introduced by Samadi et al. [NeurIPS18] and the Nash Social Welfare (NSW) problem. In the Fair-PCA problem, the input data is divided into k groups, and the goal is to find a single d-dimensional representation for all groups for which the maximum reconstruction error of any one group is minimized.
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (1.00)
ESPACE: Dimensionality Reduction of Activations for Model Compression
We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior works on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. The activation-centrality of the approach enables retraining LLMs with no loss of expressivity; while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal computational accuracy are provided. Experimentally, we find ESPACE enables 50% compression of GPT3, Llama2, and Nemotron4 models with small accuracy degradation, as low as a 0.18 perplexity increase on GPT3-22B.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.65)