AITopics | Learning in High Dimensional Spaces

Learning in High Dimensional Spaces

High-dimensional spaces frequently occur in mathematics and the sciences. They may be parameter spaces or configuration spaces such as in Lagrangian or Hamiltonian mechanics; these are abstract spaces, independent of the physical space we live in. (Wikipedia)

On Probabilistic Embeddings in Optimal Dimension Reduction

Murray, Ryan, Pickarski, Adam

arXiv.org Machine Learning, Aug-5-2024

Dimension reduction algorithms are a crucial part of many data science pipelines, including data exploration, feature creation and selection, and denoising. Despite their wide utilization, many non-linear dimension reduction algorithms are poorly understood from a theoretical perspective. In this work we consider a generalized version of multidimensional scaling, which is posed as an optimization problem in which a mapping from a high-dimensional feature space to a lower-dimensional embedding space seeks to preserve either inner products or norms of the distribution in feature space, and which encompasses many commonly used dimension reduction algorithms. We analytically investigate the variational properties of this problem, leading to the following insights: 1) Solutions found using standard particle descent methods may lead to non-deterministic embeddings, 2) A relaxed or probabilistic formulation of the problem admits solutions with easily interpretable necessary conditions, 3) The globally optimal solutions to the relaxed problem actually must give a deterministic embedding. This progression of results mirrors the classical development of optimal transportation, and in a case relating to the Gromov-Wasserstein distance actually gives explicit insight into the structure of the optimal embeddings, which are parametrically determined and discontinuous. Finally, we illustrate that a standard computational implementation of this task does not learn deterministic embeddings, which means that it learns sub-optimal mappings, and that the embeddings learned in that context have highly misleading clustering structure, underscoring the delicate nature of solving this problem computationally.
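The "particle descent" setting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the inner-product-preserving variant of the objective, treats every embedded point as a free particle, and runs plain gradient descent. The function names (`gram`, `stress`, `descend`), the step size, and the iteration count are hypothetical choices made for illustration.

```python
import random


def gram(points):
    """Matrix of pairwise inner products between the given points."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]


def stress(target, emb):
    """Mean squared mismatch between embedding and target inner products."""
    n = len(emb)
    g = gram(emb)
    return sum((g[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def descend(target, emb, lr=0.05, steps=300):
    """Gradient descent on `stress`, moving each embedded point as a particle.

    Assumes `target` is symmetric, so the gradient with respect to y_i is
    (4 / n^2) * sum_j (<y_i, y_j> - target[i][j]) * y_j.
    """
    n, k = len(emb), len(emb[0])
    emb = [list(y) for y in emb]
    for _ in range(steps):
        g = gram(emb)
        grads = []
        for i in range(n):
            grad = [0.0] * k
            for j in range(n):
                residual = g[i][j] - target[i][j]
                for c in range(k):
                    grad[c] += 4.0 * residual * emb[j][c] / (n * n)
            grads.append(grad)
        emb = [[y[c] - lr * gr[c] for c in range(k)]
               for y, gr in zip(emb, grads)]
    return emb


if __name__ == "__main__":
    rng = random.Random(0)
    # Six hypothetical feature vectors in 3-D, embedded into 2-D.
    features = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
    target = gram(features)
    start = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(6)]
    result = descend(target, start)
    print("stress before:", stress(target, start))
    print("stress after: ", stress(target, result))
```

Because each particle moves independently, nothing in this formulation forces two identical feature vectors to land on the same embedded point, which is one way to read the abstract's remark about non-deterministic embeddings.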

dx dy, marginal problem, minimizer, (13 more...)

arXiv.org Machine Learning

2408.02433

Country: North America > United States > North Carolina (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)

Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

Xue, Jintang, Wang, Yun-Cheng, Wei, Chengwei, Kuo, C. -C. Jay

arXiv.org Artificial Intelligence, Jul-17-2024

As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that, as the vocabulary grows, the dimension of the vector space increases, which can lead to a very large model. Storing and processing word vectors is resource-demanding, especially for applications on mobile edge devices. This paper explores word embedding dimension reduction. To balance computational costs and performance, we propose an efficient and effective weakly-supervised feature selection method, named WordFS. It has two variants, each utilizing novel criteria for feature selection. Experiments conducted on various tasks (e.g., word and sentence similarity and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational costs.

dimension, dimension reduction, feature selection method, (10 more...)

arXiv.org Artificial Intelligence

2407.12342

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Physics-Informed Geometric Operators to Support Surrogate, Dimension Reduction and Generative Models for Engineering Design

Khan, Shahroz, Masood, Zahid, Usama, Muhammad, Kostas, Konstantinos, Kaklis, Panagiotis, Chen, Wei

arXiv.org Artificial Intelligence, Jul-10-2024

In this work, we propose a set of physics-informed geometric operators (GOs) to enrich the geometric data provided for training surrogate/discriminative models, dimension reduction, and generative models, typically employed for performance prediction, dimension reduction, and creating data-driven parameterisations, respectively. However, as both the input and output streams of these models consist of low-level shape representations, they often fail to capture shape characteristics essential for performance analyses. Therefore, the proposed GOs exploit the differential and integral properties of shapes (accessed through Fourier descriptors, curvature integrals, geometric moments, and their invariants) to infuse high-level intrinsic geometric information and physics into the feature vector used for training, even when employing simple model architectures or low-level parametric descriptions. We show that for surrogate modelling, along with the inclusion of the notion of physics, GOs enact regularisation to reduce over-fitting and enhance generalisation to new, unseen designs. Furthermore, through extensive experimentation, we demonstrate that for dimension reduction and generative models, incorporating the proposed GOs enriches the training data with compact global and local geometric features. This significantly enhances the quality of the resulting latent space, thereby facilitating the generation of valid and diverse designs. Lastly, we also show that GOs can enable learning parametric sensitivities to a great extent. Consequently, these enhancements accelerate the convergence rate of shape optimisers towards optimal solutions.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2407.07611

Country:

Europe (0.92)
North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Marine (0.69)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.90)

Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

Li, Matthew T. C., Cui, Tiangang, Li, Fengyi, Marzouk, Youssef, Zahm, Olivier

arXiv.org Machine Learning, Jun-21-2024

Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $\pi$ as a perturbation of a given reference measure $\mu$ along a few significant directions of $\mathbb{R}^{d}$. The reference measure can be a Gaussian or a nonlinear transformation of a Gaussian, as commonly arising in generative modeling. Our method extends prior work on minimizing majorizations of the Kullback-Leibler divergence to identify optimal approximations within this class of measures. Our main contribution unveils a connection between the dimensional logarithmic Sobolev inequality (LSI) and approximations with this ansatz. Specifically, when the target and reference are both Gaussian, we show that minimizing the dimensional LSI is equivalent to minimizing the KL divergence restricted to this ansatz. For general non-Gaussian measures, the dimensional LSI produces majorants that uniformly improve on previous majorants for gradient-based dimension reduction. We further demonstrate the applicability of this analysis to the squared Hellinger distance, where analogous reasoning shows that the dimensional Poincaré inequality offers improved bounds.

dimension reduction, inequality, logarithmic sobolev inequality, (14 more...)

arXiv.org Machine Learning

2406.13036

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Energy (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.36)

Coupled Input-Output Dimension Reduction: Application to Goal-oriented Bayesian Experimental Design and Global Sensitivity Analysis

Chen, Qiao, Arnaud, Elise, Baptista, Ricardo, Zahm, Olivier

arXiv.org Machine Learning, Jun-19-2024

We introduce a new method to jointly reduce the dimension of the input and output space of a high-dimensional function. Choosing a reduced input subspace influences which output subspace is relevant and vice versa. Conventional methods focus on reducing either the input or output space, even though both are often reduced simultaneously in practice. Our coupled approach naturally supports goal-oriented dimension reduction, where either an input or output quantity of interest is prescribed. We consider, in particular, goal-oriented sensor placement and goal-oriented sensitivity analysis, which can be viewed as dimension reduction where the most important output or, respectively, input components are chosen. Both applications present difficult combinatorial optimization problems with expensive objectives such as the expected information gain and Sobol indices. By optimizing gradient-based bounds, we can determine the most informative sensors and most sensitive parameters as the largest diagonal entries of some diagnostic matrices, thus bypassing the combinatorial optimization and objective evaluation.

dimension reduction, inequality, reduction, (15 more...)

arXiv.org Machine Learning

2406.13425

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report (0.87)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing

Xia, Song, Yu, Yi, Jiang, Xudong, Ding, Henghui

arXiv.org Artificial Intelligence, Jun-15-2024

Randomized Smoothing (RS) has been proven a promising method for endowing an arbitrary image classifier with certified robustness. However, the substantial uncertainty inherent in the high-dimensional isotropic Gaussian noise imposes the curse of dimensionality on RS. Specifically, the upper bound of ${\ell_2}$ certified robustness radius provided by RS exhibits a diminishing trend with the expansion of the input dimension $d$, proportionally decreasing at a rate of $1/\sqrt{d}$. This paper explores the feasibility of providing ${\ell_2}$ certified robustness for high-dimensional input through the utilization of dual smoothing in the lower-dimensional space. The proposed Dual Randomized Smoothing (DRS) down-samples the input image into two sub-images and smooths the two sub-images in lower dimensions. Theoretically, we prove that DRS guarantees a tight ${\ell_2}$ certified robustness radius for the original input and reveal that DRS attains a superior upper bound on the ${\ell_2}$ robustness radius, which decreases proportionally at a rate of $(1/\sqrt m + 1/\sqrt n )$ with $m+n=d$. Extensive experiments demonstrate the generalizability and effectiveness of DRS, which exhibits a notable capability to integrate with established methodologies, yielding substantial improvements in both accuracy and ${\ell_2}$ certified robustness baselines of RS on the CIFAR-10 and ImageNet datasets. Code is available at https://github.com/xiasong0501/DRS.
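To make the two rates in the abstract concrete, the following sketch compares $1/\sqrt{d}$ with $1/\sqrt{m} + 1/\sqrt{n}$ under $m+n=d$. It only evaluates the proportionality terms quoted above. All constants are omitted, the function names are hypothetical, and the even split $m = n = d/2$ is an assumption made for illustration, not a statement about how DRS partitions an image.

```python
import math


def rs_rate(d):
    """Proportionality term for the RS radius bound: 1 / sqrt(d)."""
    return 1.0 / math.sqrt(d)


def drs_rate(m, n):
    """Proportionality term for the DRS radius bound: 1/sqrt(m) + 1/sqrt(n)."""
    return 1.0 / math.sqrt(m) + 1.0 / math.sqrt(n)


if __name__ == "__main__":
    d = 3 * 32 * 32          # a CIFAR-10 image has 3072 input dimensions
    m = n = d // 2           # assumed even split, for illustration only
    print("RS  term:", rs_rate(d))
    print("DRS term:", drs_rate(m, n))
    print("ratio   :", drs_rate(m, n) / rs_rate(d))
```

For an even split the ratio of the two terms is $2\sqrt{2}$ regardless of $d$, since $2/\sqrt{d/2} = 2\sqrt{2}/\sqrt{d}$.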

classifier, conference paper, robustness, (13 more...)

arXiv.org Artificial Intelligence

2404.09586

Country: Asia > Singapore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Noisy Data Visualization using Functional Data Analysis

Chen, Haozhe, Correa, Andres Felipe Duque, Wolf, Guy, Moon, Kevin R.

arXiv.org Machine Learning, Jun-5-2024

Data visualization via dimensionality reduction is an important tool in exploratory data analysis. However, when the data are noisy, many existing methods fail to capture the underlying structure of the data. The method called Empirical Intrinsic Geometry (EIG) was previously proposed for performing dimensionality reduction on high dimensional dynamical processes while theoretically eliminating all noise. However, implementing EIG in practice requires the construction of high-dimensional histograms, which suffer from the curse of dimensionality. Here we propose a new data visualization method called Functional Information Geometry (FIG) for dynamical processes that adapts the EIG framework while using approaches from functional data analysis to mitigate the curse of dimensionality. We experimentally demonstrate that the resulting method outperforms a variant of EIG designed for visualization in terms of capturing the true structure, hyperparameter robustness, and computational speed. We then use our method to visualize EEG brain measurements of sleep activity.

basis function, mahalanobis distance, visualization, (12 more...)

arXiv.org Machine Learning

2406.03396

Country:

North America > United States > Utah > Cache County > Logan (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.54)

Enhancing Sufficient Dimension Reduction via Hellinger Correlation

Hong, Seungbeom, Kim, Ilmun, Song, Jun

arXiv.org Machine Learning, May-30-2024

In this work, we develop a new theory and method for sufficient dimension reduction (SDR) in single-index models, where SDR is a sub-field of supervised dimension reduction based on conditional independence. Our work is primarily motivated by the recent introduction of the Hellinger correlation as a dependency measure. Utilizing this measure, we develop a method capable of effectively detecting the dimension reduction subspace, complete with theoretical justification. Through extensive numerical experiments, we demonstrate that our proposed method significantly enhances and outperforms existing SDR methods. This improvement is largely attributed to our proposed method's deeper understanding of data dependencies and the refinement of existing SDR techniques.

enhancing sufficient dimension reduction, hellinger correlation, predictor, (8 more...)

arXiv.org Machine Learning

2405.19704

Country:

Europe > Austria > Vienna (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Minnesota (0.04)
Asia > India > West Bengal > Kolkata (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (1.00)

Sensitivity Analysis for Active Sampling, with Applications to the Simulation of Analog Circuits

Chhaibi, Reda, Gamboa, Fabrice, Oger, Christophe, Oliveira, Vinicius, Pellegrini, Clément, Remot, Damien

arXiv.org Machine Learning, May-13-2024

We propose an active sampling flow, with the use-case of simulating the impact of combined variations on analog circuits. In such a context, given the large number of parameters, it is difficult to fit a surrogate model and to efficiently explore the space of design features. By combining a drastic dimension reduction using sensitivity analysis and Bayesian surrogate modeling, we obtain a flexible active sampling flow. On synthetic and real datasets, this flow outperforms the usual Monte-Carlo sampling which often forms the foundation of design space exploration.

dataset, selection, variation, (13 more...)

arXiv.org Machine Learning

2405.07971

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.07)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)

Super-Bit Locality-Sensitive Hashing

Ji, Jianqiu

Neural Information Processing Systems, Mar-14-2024, 01:22:30 GMT

Sign-random-projection locality-sensitive hashing (SRP-LSH) is a probabilistic dimension reduction method which provides an unbiased estimate of angular similarity, yet suffers from the large variance of its estimation. In this work, we propose Super-Bit locality-sensitive hashing (SBLSH). It is easy to implement: it orthogonalizes the random projection vectors in batches. It is theoretically guaranteed that SBLSH also provides an unbiased estimate of angular similarity, yet with a smaller variance when the angle to estimate is within $(0, \pi/2]$. Extensive experiments on real data validate that, given the same length of binary code, SBLSH may achieve significant mean squared error reduction in estimating pairwise angular similarity. Moreover, SBLSH outperforms SRP-LSH in approximate nearest neighbor (ANN) retrieval experiments.

Neural Information Processing Systems

Country:

North America > United States > Texas > Bexar County > San Antonio (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.30)
