AITopics

2212.01046

Country:

North America > United States > Missouri (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Li, Jichao, Du, Xiaosong, Martins, Joaquim R. R. A.

Machine Learning in Aerodynamic Shape Optimization

Machine learning (ML) has been increasingly used to aid aerodynamic shape optimization (ASO), thanks to the availability of aerodynamic data and continued developments in deep learning. We review the applications of ML in ASO to date and provide a perspective on the state-of-the-art and future directions. We first introduce conventional ASO and current challenges. Next, we introduce ML fundamentals and detail ML algorithms that have been successful in ASO. Then, we review ML applications to ASO addressing three aspects: compact geometric design space, fast aerodynamic analysis, and efficient optimization architecture. In addition to providing a comprehensive summary of the research, we comment on the practicality and effectiveness of the developed methods. We show how cutting-edge ML approaches can benefit ASO and address challenging demands, such as interactive design optimization. Practical large-scale design optimizations remain a challenge because of the high cost of ML training. Further research on coupling ML model construction with prior experience and knowledge, such as physics-informed ML, is recommended to solve large-scale ASO problems.

evolutionary algorithm, machine learning, reinforcement learning, (21 more...)

doi: 10.1016/j.paerosci.2022.100849

2202.07141

Country:

Europe (1.00)
Asia (1.00)
North America > United States > California (0.45)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Transportation > Air (1.00)
Health & Medicine (1.00)
Energy > Oil & Gas > Upstream (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Pesce, Luca, Loureiro, Bruno, Krzakala, Florent, Zdeborová, Lenka

Subspace clustering in high-dimensions: Phase transitions & Statistical-to-Computational gap

With the growing size of modern data, clustering techniques play an important role in reducing the dimensionality of the features used in modern Machine Learning pipelines. Indeed, in many tasks of interest ranging from DNA sequence analysis to image classification, the relevant features are known to live in a lower-dimensional space (intrinsic dimension) than their raw acquisition format (extrinsic dimension) [1]. In these cases, identifying these features can help saving computational resources while significantly improving on learning performance. But given a corrupted embedding of low-dimensional features in a high-dimensional space, is it always statistically possible to retrieve them? And if yes - how can reconstruction be achieved efficiently in practice? In this manuscript we address these two fundamental questions in a simple model for subspace clustering: a k-cluster Gaussian mixture model with sparse centroids. In this model, the low-dimensional hidden features are given by the sparse centroids, which are embedded in a higher dimensional space and corrupted by additive Gaussian noise. We assume that the number of non-zero components of the centroids as well as the number of samples scales linearly with the dimension of the embedding space. Given a finite sample from the mixture, the goal of the statistician is to cluster the data, i.e. estimate the centroids (or features) as well as possible.

algorithm, artificial intelligence, machine learning, (14 more...)

2205.13527

Country:

North America > United States (0.28)
Europe > Switzerland > Vaud > Lausanne (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.54)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)

Sub-quadratic Algorithms for Kernel Matrices via Kernel Density Estimation

Bakshi, Ainesh, Indyk, Piotr, Kacham, Praneeth, Silwal, Sandeep, Zhou, Samson

Kernel matrices, as well as weighted graphs represented by them, are ubiquitous objects in machine learning, statistics and other related fields. The main drawback of using kernel methods (learning and inference using kernel matrices) is efficiency -- given $n$ input points, most kernel-based algorithms need to materialize the full $n \times n$ kernel matrix before performing any subsequent computation, thus incurring $\Omega(n^2)$ runtime. Breaking this quadratic barrier for various problems has therefore, been a subject of extensive research efforts. We break the quadratic barrier and obtain $\textit{subquadratic}$ time algorithms for several fundamental linear-algebraic and graph processing primitives, including approximating the top eigenvalue and eigenvector, spectral sparsification, solving linear systems, local clustering, low-rank approximation, arboricity estimation and counting weighted triangles. We build on the recent Kernel Density Estimation framework, which (after preprocessing in time subquadratic in $n$) can return estimates of row/column sums of the kernel matrix. In particular, we develop efficient reductions from $\textit{weighted vertex}$ and $\textit{weighted edge sampling}$ on kernel graphs, $\textit{simulating random walks}$ on kernel graphs, and $\textit{importance sampling}$ on matrices to Kernel Density Estimation and show that we can generate samples from these distributions in $\textit{sublinear}$ (in the support of the distribution) time. Our reductions are the central ingredient in each of our applications and we believe they may be of independent interest. We empirically demonstrate the efficacy of our algorithms on low-rank approximation (LRA) and spectral sparsification, where we observe a $\textbf{9x}$ decrease in the number of kernel evaluations over baselines for LRA and a $\textbf{41x}$ reduction in the graph size for spectral sparsification.

algorithm, artificial intelligence, machine learning, (17 more...)

2212.00642

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Workflow (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.81)

Kapp-Joswig, Jan-Oliver Felix, Keller, Bettina G.

Clustering -- Basic concepts and methods

We review clustering as an analysis tool and the underlying concepts from an introductory perspective. What is clustering and how can clusterings be realised programmatically? How can data be represented and prepared for a clustering task? And how can clustering results be validated? Connectivity-based versus prototype-based approaches are reflected in the context of several popular methods: single-linkage, spectral embedding, k-means, and Gaussian mixtures are discussed as well as the density-based protocols (H)DBSCAN, Jarvis-Patrick, CommonNN, and density-peaks.

artificial intelligence, data mining, machine learning, (18 more...)

2212.01248

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Ventura County > Thousand Oaks (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Richardson, Ashlin, Leckie, Donald

Locally Adaptive Hierarchical Cluster Termination With Application To Individual Tree Delineation

Abstract--A clustering termination procedure which is locally adaptive (with respect to the hierarchical tree of sets representative of the agglomerative merging) is proposed, for agglomerative hierarchical clustering on a set equipped with a distance function. It represents a multi-scale alternative to conventional scale dependent threshold based termination criteria. We trim the tree at specific locations by studying cumulative extreme values of rates of change of parameters along paths of the agglomeration hierarchy, each path representing the "history" of successive merges with respect to an initial set. Thus the method considers the smallest localities. We refer to this qualitative phenomenon as geometric "paradigm shift".

artificial intelligence, isol, machine learning, (18 more...)

2212.00288

Country: North America > Canada > British Columbia > Vancouver Island > Capital Regional District > Victoria (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Roberts, Manley, Mani, Pranav, Garg, Saurabh, Lipton, Zachary C.

Unsupervised Learning under Latent Label Shift

What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where we have access to unlabeled data from multiple domains such that the label marginals $p_d(y)$ can shift across domains but the class conditionals $p(\mathbf{x}|y)$ do not. This work instantiates a new principle for identifying classes: elements that shift together group together. For finite input spaces, we establish an isomorphism between LLS and topic modeling: inputs correspond to words, domains to documents, and labels to topics. Addressing continuous data, we prove that when each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and $p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|x) \; \forall d$. With semi-synthetic experiments, we show that our algorithm can leverage domain information to improve upon competitive unsupervised classification methods. We reveal a failure mode of standard unsupervised classification methods when feature-space similarity does not indicate true groupings, and show empirically that our method better handles this case. Our results establish a deep connection between distribution shift and topic modeling, opening promising lines for future work.

artificial intelligence, ddfa, machine learning, (15 more...)

2207.13179

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Arnaiz-Rodriguez, Adrian, Begga, Ahmed, Escolano, Francisco, Oliver, Nuria

DiffWire: Inductive Graph Rewiring via the Lov\'asz Bound

arXiv.org Artificial IntelligenceNov-30-2022

Graph Neural Networks (GNNs) have been shown to achieve competitive results to tackle graph-related tasks, such as node and graph classification, link prediction and node and graph clustering in a variety of domains. Most GNNs use a message passing framework and hence are called MPNNs. Despite their promising results, MPNNs have been reported to suffer from over-smoothing, over-squashing and under-reaching. Graph rewiring and graph pooling have been proposed in the literature as solutions to address these limitations. However, most state-of-the-art graph rewiring methods fail to preserve the global topology of the graph, are neither differentiable nor inductive, and require the tuning of hyper-parameters. In this paper, we propose DiffWire, a novel framework for graph rewiring in MPNNs that is principled, fully differentiable and parameter-free by leveraging the Lov\'asz bound. The proposed approach provides a unified theory for graph rewiring by proposing two new, complementary layers in MPNNs: CT-Layer, a layer that learns the commute times and uses them as a relevance function for edge re-weighting; and GAP-Layer, a layer to optimize the spectral gap, depending on the nature of the network and the task at hand. We empirically validate the value of each of these layers separately with benchmark datasets for graph classification. We also perform preliminary studies on the use of CT-Layer for homophilic and heterophilic node classification tasks. DiffWire brings together the learnability of commute times to related definitions of curvature, opening the door to creating more expressive MPNNs.

artificial intelligence, graph, machine learning, (16 more...)

2206.07369

Country:

Europe > Spain (0.14)
North America > United States > Wisconsin (0.04)

Genre:

Research Report (0.82)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Sitokonstantinou, Vasileios, Koukos, Alkiviadis, Tsoumas, Ilias, Bartsotas, Nikolaos S., Kontoes, Charalampos, Karathanassi, Vassilia

Fuzzy clustering for the within-season estimation of cotton phenology

arXiv.org Artificial IntelligenceNov-30-2022

Crop phenology is crucial information for crop yield estimation and agricultural management. Traditionally, phenology has been observed from the ground; however Earth observation, weather and soil data have been used to capture the physiological growth of crops. In this work, we propose a new approach for the within-season phenology estimation for cotton at the field level. For this, we exploit a variety of Earth observation vegetation indices (derived from Sentinel-2) and numerical simulations of atmospheric and soil parameters. Our method is unsupervised to address the ever-present problem of sparse and scarce ground truth data that makes most supervised alternatives impractical in real-world scenarios. We applied fuzzy c-means clustering to identify the principal phenological stages of cotton and then used the cluster membership weights to further predict the transitional phases between adjacent stages. In order to evaluate our models, we collected 1,285 crop growth ground observations in Orchomenos, Greece. We introduced a new collection protocol, assigning up to two phenology labels that represent the primary and secondary growth stage in the field and thus indicate when stages are transitioning. Our model was tested against a baseline model that allowed to isolate the random agreement and evaluate its true competence. The results showed that our model considerably outperforms the baseline one, which is promising considering the unsupervised nature of the approach. The limitations and the relevant future work are thoroughly discussed. The ground observations are formatted in an ready-to-use dataset and will be available at https://github.com/Agri-Hub/cotton-phenology-dataset upon publication.

artificial intelligence, machine learning, phenology, (18 more...)

2211.14099

Country:

Europe > Greece > Attica > Athens (0.04)
Asia > India (0.04)
Asia > China (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

arXiv.org Artificial IntelligenceNov-30-2022

High-Dimensional Wide Gap $k$-Means Versus Clustering Axioms

Kłopotek, Mieczysław A.

Kleinberg's axioms for distance based clustering proved to be contradictory. Various efforts have been made to overcome this problem. Here we make an attempt to handle the issue by embedding in high-dimensional space and granting wide gaps between clusters.

algorithm, artificial intelligence, machine learning, (18 more...)

2211.17036

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)