AITopics

Technology:

Information Technology > Data Science > Data Mining (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.57)

Zheng, Peng, Aravkin, Aleksandr Y., Ramamurthy, Karthikeyan Natesan, Thiagarajan, Jayaraman Jayaraman

Learning Robust Representations for Computer Vision

arXiv.org Machine Learning Jul-31-2017

Unsupervised learning techniques in computer vision often require learning latent representations, such as low-dimensional linear and non-linear subspaces. Noise and outliers in the data can frustrate these approaches by obscuring the latent spaces. Our main goal is a deeper understanding and new development of robust approaches for representation learning. We provide a new interpretation for existing robust approaches and present two specific contributions: a new robust PCA approach, which can separate foreground features from dynamic background, and a novel robust spectral clustering method that can cluster facial images with high accuracy. Both contributions show superior performance to standard methods on real-world test sets.

artificial intelligence, machine learning, representation, (15 more...)

1708.00069

Country: North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Revillon, Guillaume, Mohammad-Djafari, Ali, Enderli, Cyrille

A generalized multivariate Student-t mixture model for Bayesian classification and clustering of radar waveforms

arXiv.org Machine Learning Jul-29-2017

In this paper, a generalized multivariate Student-t mixture model is developed for classification and clustering of Low Probability of Intercept radar waveforms. A Low Probability of Intercept radar signal is characterized by a pulse compression waveform which is either frequency-modulated or phase-modulated. The proposed model can classify and cluster different modulation types such as linear frequency modulation, nonlinear frequency modulation, polyphase Barker, polyphase P1, P2, P3, P4, Frank and Zadoff codes. The classification method focuses on the introduction of a new prior distribution for the model hyper-parameters that makes it possible to handle the sensitivity of mixture models to initialization and allows less restrictive modeling of the data. Inference is performed through a Variational Bayes method and a Bayesian treatment is adopted for model learning, supervised classification and clustering. Moreover, the novel prior distribution is not a well-known probability distribution, and both deterministic and stochastic methods are employed to estimate its expectations. Numerical experiments show that the proposed method is less sensitive to initialization and provides more accurate results than the previous state-of-the-art mixture models.

mixture model, posterior distribution, student-t mixture model, (15 more...)

1707.09548

Country:

Europe > France (0.04)
North America > United States > New York > Suffolk County > Deer Park (0.04)
North America > United States > Massachusetts > Norfolk County > Dedham (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Campbell, Trevor, Kulis, Brian, How, Jonathan

Dynamic Clustering Algorithms via Small-Variance Analysis of Markov Chain Mixture Models

arXiv.org Machine Learning Jul-26-2017

Bayesian nonparametrics are a class of probabilistic models in which the model size is inferred from data. A recently developed methodology in this field is small-variance asymptotic analysis, a mathematical technique for deriving learning algorithms that capture much of the flexibility of Bayesian nonparametric inference algorithms, but are simpler to implement and less computationally expensive. Past work on small-variance analysis of Bayesian nonparametric inference algorithms has exclusively considered batch models trained on a single, static dataset, which are incapable of capturing time evolution in the latent structure of the data. This work presents a small-variance analysis of the maximum a posteriori filtering problem for a temporally varying mixture model with a Markov dependence structure, which captures temporally evolving clusters within a dataset. Two clustering algorithms result from the analysis: D-Means, an iterative clustering algorithm for linearly separable, spherical clusters; and SD-Means, a spectral clustering algorithm derived from a kernelized, relaxed version of the clustering problem. Empirical results from experiments demonstrate the advantages of using D-Means and SD-Means over contemporary clustering algorithms, in terms of both computational cost and clustering accuracy.

artificial intelligence, data mining, machine learning, (20 more...)

1707.08493

Country: North America > United States > Massachusetts (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.82)

#artificialintelligence Jul-25-2017, 19:41:30 GMT

How Machines Make Sense of Big Data: an Introduction to Clustering Algorithms

While there's not necessarily a "correct" answer here, it's most likely you split the bugs into four clusters. That wasn't too bad, was it? You could probably do the same with twice as many bugs, right? If you had a bit of time to spare -- or a passion for entomology -- you could probably even do the same with a hundred bugs. For a machine though, grouping ten objects into however many meaningful clusters is no small task, thanks to a mind-bending branch of maths called combinatorics, which tells us that there are 115,975 different possible ways you could have grouped those ten insects together. Had there been twenty bugs, there would have been over fifty trillion possible ways of clustering them. With a hundred bugs -- there'd be many times more solutions than there are particles in the known universe. In fact, there are more than four million billion googol solutions (what's a googol?).
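The counts quoted above are Bell numbers: the number of ways to partition a set of n items into non-empty groups. As a minimal sketch (the function name `bell_numbers` is illustrative and not taken from the article), they can be reproduced with the Bell triangle recurrence:

```python
def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max], where B_n counts the partitions of n items.

    Uses the Bell triangle: each row starts with the last entry of the
    previous row, and each further entry adds the entry above-left.
    """
    bells = [1]   # B_0 = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of the new row is the next Bell number
    return bells


bells = bell_numbers(100)
print(bells[10])             # ways to group ten bugs
print(bells[20])             # ways to group twenty bugs
print(len(str(bells[100])))  # number of digits in the count for a hundred bugs
```

Python integers have arbitrary precision, so the hundred-bug count is computed exactly rather than approximated.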

data mining, machine learning, vertex, (20 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligence Jul-24-2017, 05:35:28 GMT

R Clustering – A Tutorial for Cluster Analysis with R

Clustering is a data segmentation technique that divides huge datasets into different groups on the basis of similarity in the data. It is a statistical operation of grouping objects. The resulting groups are clusters.

artificial intelligence, clustering, machine learning, (15 more...)

Country: Europe (0.05)

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Industry: Banking & Finance (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Haslbeck, Jonas M. B., Wulff, Dirk U.

Estimating the Number of Clusters via Normalized Cluster Instability

arXiv.org Machine Learning Jul-24-2017

We improve existing instability-based methods for the selection of the number of clusters $k$ in cluster analysis by normalizing instability. In contrast to existing instability methods which only perform well for bounded sequences of small $k$, our method performs well across the whole sequence of possible $k$. In addition, we compare for the first time model-based and model-free variants of $k$ selection via cluster instability and find that their performance is similar. We make our method available in the R package cstab.

artificial intelligence, instability, machine learning, (16 more...)

1608.07494

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

#artificialintelligence Jul-14-2017, 21:41:02 GMT

Which Spark machine learning API should you use?

Remember, just because you get the algorithm to run doesn't mean the result isn't nonsense. If you're new to all of this, then the Machine Learning Foundations course on Coursera is a good place to start -- despite the creepy floating half-professor.

artificial intelligence, classification, machine learning, (9 more...)

Country: Asia > China (0.05)

Industry:

Education (0.97)
Automobiles & Trucks > Manufacturer (0.52)
Transportation > Passenger (0.40)
Transportation > Ground > Road (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.30)

arXiv.org Machine Learning Jul-12-2017

Sequential geophysical and flow inversion to characterize fracture networks in subsurface systems

Mudunuru, M. K., Karra, S., Makedonska, N., Chen, T.

Subsurface applications including geothermal, geological carbon sequestration, oil and gas, etc., typically involve maximizing either the extraction of energy or the storage of fluids. Characterizing the subsurface is extremely complex due to heterogeneity and anisotropy. Due to this complexity, there are uncertainties in the subsurface parameters, which need to be estimated from multiple diverse as well as fragmented data streams. In this paper, we present a non-intrusive sequential inversion framework for integrating data from geophysical and flow sources to constrain subsurface Discrete Fracture Networks (DFN). In this approach, we first estimate bounds on the statistics for the DFN fracture orientations using microseismic data. These bounds are estimated through a combination of a focal mechanism (physics-based approach) and clustering analysis (statistical approach) of seismic data. Then, the fracture lengths are constrained based on the flow data. The efficacy of this multi-physics based sequential inversion is demonstrated through a representative synthetic example.

fracture, survey article, upstream oil & gas, (18 more...)

doi: 10.1002/sam.11356

1606.04464

Country:

North America > United States > Texas (0.28)
North America > United States > New Mexico > Los Alamos County (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(7 more...)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

arXiv.org Machine Learning Jul-12-2017

Density Level Set Estimation on Manifolds with DBSCAN

Jiang, Heinrich

We show that DBSCAN can estimate the connected components of the $\lambda$-density level set $\{ x : f(x) \ge \lambda\}$ given $n$ i.i.d. samples from an unknown density $f$. We characterize the regularity of the level set boundaries using parameter $\beta > 0$ and analyze the estimation error under the Hausdorff metric. When the data lies in $\mathbb{R}^D$ we obtain a rate of $\widetilde{O}(n^{-1/(2\beta + D)})$, which matches known lower bounds up to logarithmic factors. When the data lies on an embedded unknown $d$-dimensional manifold in $\mathbb{R}^D$, then we obtain a rate of $\widetilde{O}(n^{-1/(2\beta + d\cdot \max\{1, \beta \})})$. Finally, we provide adaptive parameter tuning in order to attain these rates with no a priori knowledge of the intrinsic dimension, density, or $\beta$.
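For readers unfamiliar with the algorithm being analyzed, the following is a minimal sketch of the textbook DBSCAN procedure, not the paper's estimator or its adaptive parameter tuning. The names `dbscan`, `eps`, and `min_pts` are illustrative assumptions. A point is treated as "core" when at least `min_pts` points (itself included) lie within distance `eps`, which serves as a rough proxy for lying in a high-density region; clusters are then the connected groups of core points plus the border points they reach.

```python
from collections import deque
from math import dist


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    # Brute-force eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]

    labels = [-1] * n
    cluster = 0
    for i in range(n):
        # Only an unlabeled core point can start a new cluster.
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Border points join the cluster but are not expanded.
                    if is_core[q]:
                        queue.append(q)
        cluster += 1
    return labels


# Two small squares far apart, plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The brute-force neighbor search is quadratic in the number of points; it is used here only to keep the sketch self-contained.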

artificial intelligence, data mining, machine learning, (15 more...)

1703.03503

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)