
Collaborating Authors

 Luo, Hengrui


Sharded Bayesian Additive Regression Trees

arXiv.org Artificial Intelligence

In this paper we develop the randomized Sharded Bayesian Additive Regression Trees (SBT) model. We introduce a randomization auxiliary variable and a sharding tree to decide the partitioning of the data, and fit each partition component with a Bayesian Additive Regression Trees (BART) sub-model. Observing that the optimal design of a sharding tree can determine the optimal sharding for sub-models on a product space, we introduce an intersection tree structure that completely specifies both the sharding and the modeling using only tree structures. In addition to experiments, we derive the theoretically optimal weights for minimizing posterior contraction and prove the worst-case complexity of SBT.
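
A minimal Python sketch of the sharding idea, not the authors' implementation: a random tree-style partition assigns observations to shards, each shard is fit with its own tree-ensemble sub-model (scikit-learn's GradientBoostingRegressor standing in for BART, since no particular BART package is assumed), and sub-model predictions are combined with uniform weights in place of the paper's theoretically optimal weights.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

def random_shard(X, depth=2):
    """Assign each row a shard id via random axis-aligned splits (one split per level, shared across nodes for simplicity)."""
    ids = np.zeros(len(X), dtype=int)
    for _ in range(depth):
        j = rng.integers(X.shape[1])          # random split feature
        t = np.median(X[:, j])                # split at the median
        ids = 2 * ids + (X[:, j] > t)
    return ids

X = rng.normal(size=(500, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

shard = random_shard(X)
models = {s: GradientBoostingRegressor().fit(X[shard == s], y[shard == s])
          for s in np.unique(shard)}

# Equal-weight combination of the shard sub-models; the paper derives
# theoretically optimal weights instead of this uniform choice.
preds = np.mean([m.predict(X) for m in models.values()], axis=0)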


Contrastive inverse regression for dimension reduction

arXiv.org Artificial Intelligence

Supervised dimension reduction (SDR) has been a topic of growing interest in data science, as it enables the reduction of high-dimensional covariates while preserving the functional relationship with certain response variables of interest. However, existing SDR methods are not suitable for analyzing datasets collected from case-control studies. In this setting, the goal is to learn and exploit the low-dimensional structure unique to or enriched by the case group, also known as the foreground group. While some unsupervised techniques, such as the contrastive latent variable model and its variants, have been developed for this purpose, they fail to preserve the functional relationship between the dimension-reduced covariates and the response variable. In this paper, we propose a supervised dimension reduction method called contrastive inverse regression (CIR), specifically designed for the contrastive setting. CIR introduces an optimization problem defined on the Stiefel manifold with a non-standard loss function. We prove the convergence of CIR to a local optimum using a gradient descent-based algorithm, and our numerical study empirically demonstrates its improved performance over competing methods on high-dimensional data.
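
The optimization CIR relies on is gradient descent constrained to the Stiefel manifold. Below is an illustrative sketch of that generic routine, with a QR retraction and a placeholder quadratic loss rather than the CIR objective from the paper.

import numpy as np

def qr_retraction(V):
    """Map an arbitrary matrix back onto the Stiefel manifold via thin QR."""
    Q, R = np.linalg.qr(V)
    return Q * np.sign(np.diag(R))           # fix column signs for uniqueness

def stiefel_gradient_descent(grad_f, V0, lr=1e-2, iters=500):
    V = V0
    for _ in range(iters):
        G = grad_f(V)                         # Euclidean gradient
        # Project onto the tangent space of the Stiefel manifold at V.
        G_tan = G - V @ (V.T @ G + G.T @ V) / 2
        V = qr_retraction(V - lr * G_tan)     # descent step, then retract
    return V

# Toy usage: maximize trace(V' A V) for a symmetric A (leading eigenspace).
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 10)); A = (A + A.T) / 2
grad = lambda V: -2 * A @ V                   # gradient of -trace(V' A V)
V_hat = stiefel_gradient_descent(grad, qr_retraction(rng.normal(size=(10, 2))))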


Spherical Rotation Dimension Reduction with Geometric Loss Functions

arXiv.org Artificial Intelligence

Modern datasets often exhibit high dimensionality, yet the data reside in low-dimensional manifolds that can reveal underlying geometric structures critical for data analysis. A prime example of such a dataset is a collection of cell cycle measurements, where the inherently cyclical nature of the process can be represented as a circle or sphere. Motivated by the need to analyze these types of datasets, we propose a nonlinear dimension reduction method, Spherical Rotation Component Analysis (SRCA), that incorporates geometric information to better approximate low-dimensional manifolds. SRCA is a versatile method designed to work in both high-dimensional and small-sample-size settings. By employing spheres or ellipsoids, SRCA provides a low-rank spherical representation of the data with general theoretical guarantees, effectively retaining the geometric structure of the dataset during dimensionality reduction. A comprehensive simulation study, along with a successful application to human cell cycle data, further highlights the advantages of SRCA compared to state-of-the-art alternatives, demonstrating its superior performance in approximating the manifold while preserving inherent geometric structures.
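
To convey the geometric idea of a low-rank spherical representation, here is a crude Python illustration, not the SRCA estimator itself: reduce the data to a d-dimensional subspace with PCA, fit a sphere in that subspace, and radially project each reduced point onto it.

import numpy as np
from sklearn.decomposition import PCA

def spherical_representation(X, d=2):
    Z = PCA(n_components=d).fit_transform(X)      # linear low-rank step
    center = Z.mean(axis=0)
    radius = np.mean(np.linalg.norm(Z - center, axis=1))
    # Radial projection of each reduced point onto the fitted sphere.
    dirs = (Z - center) / np.linalg.norm(Z - center, axis=1, keepdims=True)
    return center + radius * dirs

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 300)            # noisy circle embedded in 10-D
X = np.c_[np.cos(theta), np.sin(theta)] @ rng.normal(size=(2, 10))
X += 0.05 * rng.normal(size=X.shape)
Z_sphere = spherical_representation(X, d=2)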


Nonparametric Multi-shape Modeling with Uncertainty Quantification

arXiv.org Artificial Intelligence

The modeling and uncertainty quantification of closed curves is an important problem in the field of shape analysis, and can have significant ramifications for subsequent statistical tasks. Many of these tasks involve collections of closed curves, which often exhibit structural similarities at multiple levels. Modeling multiple closed curves in a way that efficiently incorporates such between-curve dependence remains a challenging problem. In this work, we propose and investigate a multiple-output (a.k.a. multi-output), multi-dimensional Gaussian process modeling framework. We illustrate the proposed methodological advances, and demonstrate the utility of meaningful uncertainty quantification, on several curve and shape-related tasks. This model-based approach not only addresses the problem of inference on closed curves (and their shapes) with kernel constructions, but also opens doors to nonparametric modeling of multi-level dependence for functional objects in general.
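
A toy sketch of the basic ingredient, assuming scikit-learn: a Gaussian process over a periodic parameter models a single closed planar curve with pointwise uncertainty. The paper's multi-output, multi-dimensional framework additionally models dependence between curves, which this example does not attempt.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ExpSineSquared, WhiteKernel

rng = np.random.default_rng(3)
t = np.linspace(0, 2 * np.pi, 40)[:, None]              # curve parameter
curve = np.c_[np.cos(t[:, 0]), 0.6 * np.sin(t[:, 0])]   # noisy ellipse in the plane
curve += 0.03 * rng.normal(size=curve.shape)

# A periodic kernel in t enforces closure: f(0) and f(2*pi) coincide.
kernel = ExpSineSquared(length_scale=1.0, periodicity=2 * np.pi,
                        periodicity_bounds="fixed") + WhiteKernel(noise_level=1e-3)
gp = GaussianProcessRegressor(kernel=kernel).fit(t, curve)

t_new = np.linspace(0, 2 * np.pi, 200)[:, None]
mean, std = gp.predict(t_new, return_std=True)           # mean curve + uncertainty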


Non-smooth Bayesian Optimization in Tuning Problems

arXiv.org Machine Learning

Building surrogate models is a common approach to learning unknown black-box functions. Bayesian optimization provides a framework for building such surrogates from sequential samples drawn from the function and for locating its optimum. Tuning algorithmic parameters to optimize the performance of large, complicated "black-box" application codes is one important application that amounts to finding the optima of such black-box functions. Within the Bayesian optimization framework, the Gaussian process model produces smooth, continuous sample paths, yet the black-box function in tuning problems is often non-smooth. The difficulty is compounded by the fact that we usually have only a limited number of sequential samples from the black-box function. Motivated by these issues, we propose a novel additive Gaussian process model called the clustered Gaussian process (cGP), in which the additive components are induced by clustering; this surrogate is designed to capture the non-smoothness of the black-box function. In addition to an algorithm for constructing this model, we apply the model to several artificial and real applications to evaluate it. In the examples we studied, the performance improved by as much as 90% across repeated experiments.
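
A simplified Python reading of the clustering idea, not the paper's cGP construction: cluster the inputs, give each cluster its own Gaussian process, and let each test point be explained by the GP of its cluster, which mimics piecewise-smooth (hence globally non-smooth) behavior.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.where(X[:, 0] < 0, np.sin(3 * X[:, 0]), 2 + np.cos(X[:, 0]))  # jump at 0
y += 0.05 * rng.normal(size=200)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gps = [GaussianProcessRegressor(RBF() + WhiteKernel()).fit(X[km.labels_ == k],
                                                           y[km.labels_ == k])
       for k in range(2)]

X_test = np.linspace(-3, 3, 100)[:, None]
labels = km.predict(X_test)                       # route each point to a cluster
y_pred = np.array([gps[l].predict(x[None, :])[0] for x, l in zip(X_test, labels)])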


A Distance-preserving Matrix Sketch

arXiv.org Machine Learning

Visualizing very large matrices involves many formidable problems. Various popular solutions to these problems involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how to preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in a lower-dimensional space. This aspect is important because conclusions based on faulty visual reasoning can be harmful: judging dissimilar points as similar, or similar points as dissimilar, on the basis of a visualization can lead to false conclusions. To ameliorate this bias and to make visualizations of very large datasets feasible, we introduce a new algorithm that selects a subset of rows and columns of a rectangular matrix. This selection is designed to preserve relative distances as closely as possible. We compare our matrix sketch to more traditional alternatives on a variety of artificial and real datasets.
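
As a generic illustration of distance-aware row selection (a standard heuristic, not the sketching algorithm proposed in the paper), greedy farthest-point sampling keeps rows that are spread out, so pairwise distances among the kept rows roughly reflect those of the full matrix.

import numpy as np

def farthest_point_rows(X, k, seed=0):
    rng = np.random.default_rng(seed)
    chosen = [rng.integers(len(X))]                  # arbitrary first row
    d = np.linalg.norm(X - X[chosen[0]], axis=1)     # distance to the chosen set
    for _ in range(k - 1):
        nxt = int(np.argmax(d))                      # farthest remaining row
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 30))
rows = farthest_point_rows(X, k=50)
X_sketch = X[rows]                                   # 50 representative rows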


Generalized Penalty for Circular Coordinate Representation

arXiv.org Machine Learning

Topological Data Analysis (TDA) provides novel approaches that allow us to analyze the geometrical shapes and topological structures of a dataset. As one important application, TDA can be used for data visualization and dimension reduction. We follow the framework of circular coordinate representation, which allows us to perform dimension reduction and visualization of high-dimensional datasets on a torus using persistent cohomology. In this paper, we propose a method to adapt the circular coordinate framework to take sparsity into account in high-dimensional applications. We use a generalized penalty function instead of the $L_{2}$ penalty in the traditional circular coordinate algorithm. We provide simulation experiments and real data analysis to support our claim that circular coordinates with a generalized penalty accommodate sparsity in high-dimensional datasets under different sampling schemes while preserving the topological structures.
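
A sketch of the penalty step only, assuming the integer 1-cocycle z has already been obtained via persistent cohomology (that computation is not shown): the circular coordinate is recovered by smoothing z with a real-valued vertex function f minimizing ||z - D f||_p^p, where D is the edge-vertex coboundary matrix; p = 2 is the classical choice, and p = 1 is one generalized penalty that promotes sparsity.

import numpy as np
from scipy.optimize import minimize

def smooth_cocycle(edges, z, n_vertices, p=1.0):
    # Coboundary matrix: one row per edge (u, v), entries -1 at u and +1 at v.
    D = np.zeros((len(edges), n_vertices))
    for i, (u, v) in enumerate(edges):
        D[i, u], D[i, v] = -1.0, 1.0
    loss = lambda f: np.sum(np.abs(z - D @ f) ** p)   # generalized L_p penalty
    res = minimize(loss, np.zeros(n_vertices), method="Powell")
    return res.x % 1.0                                # circle-valued coordinate

# Toy graph: a 6-cycle whose cocycle winds once around the circle.
edges = [(i, (i + 1) % 6) for i in range(6)]
z = np.array([0, 0, 0, 0, 0, 1], dtype=float)         # integer cocycle on the edges
theta = smooth_cocycle(edges, z, n_vertices=6, p=1.0)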


Combining Geometric and Topological Information in Image Segmentation

arXiv.org Machine Learning

A fundamental problem in computer vision is image segmentation, where the goal is to delineate the boundary of an object in the image. The focus of this work is on the segmentation of grayscale images, and its purpose is twofold. First, we conduct an in-depth study comparing active contour and topologically-based methods, two popular approaches for boundary detection in 2-dimensional images. Certain properties of the image dataset may favor one method over the other, both from an interpretability perspective and through evaluation of performance measures. Second, we propose the use of topological knowledge to assist an active contour method, which can potentially incorporate prior shape information. Active contour methods are known to be extremely sensitive to initialization, and thus we use a topological model to provide an automatic initialization. In addition, our proposed model can handle objects in images with more complex topological structures. We demonstrate this on artificially-constructed image datasets from computer vision, as well as on real medical image data.
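
A crude stand-in for topology-driven initialization, not the persistence-based model in the paper: threshold the grayscale image, label connected components, and place one circular initial contour around each sufficiently large component; the resulting contours could then be handed to any active contour routine.

import numpy as np
from scipy import ndimage

def initial_contours(image, thresh=0.5, min_size=50, n_points=100):
    mask = image > thresh
    labels, n = ndimage.label(mask)                    # connected components
    contours = []
    for k in range(1, n + 1):
        ys, xs = np.nonzero(labels == k)
        if len(ys) < min_size:
            continue                                   # skip tiny components
        cy, cx = ys.mean(), xs.mean()                  # component centroid
        r = 1.2 * max(ys.max() - ys.min(), xs.max() - xs.min()) / 2 + 1
        t = np.linspace(0, 2 * np.pi, n_points)
        contours.append(np.c_[cy + r * np.sin(t), cx + r * np.cos(t)])
    return contours

img = np.zeros((128, 128))
img[30:60, 30:60] = 1.0                                # two bright objects
img[80:110, 70:120] = 1.0
snakes = initial_contours(img)                         # one initial curve per object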