AITopics

doi: 10.1109/TITS.2018.2877572

1810.09568

Country: North America > United States > California (0.68)

Genre: Research Report (0.50)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

arXiv.org Artificial IntelligenceOct-22-2018

Biomedical Document Clustering and Visualization based on the Concepts of Diseases

Shah, Setu, Luo, Xiao

Document clustering is a text mining technique used to provide better document search and browsing in digital libraries or online corpora. A lot of research has been done on biomedical document clustering that is based on using existing ontology. But, associations and co-occurrences of the medical concepts are not well represented by using ontology. In this research, a vector representation of concepts of diseases and similarity measurement between concepts are proposed. They identify the closest concepts of diseases in the context of a corpus. Each document is represented by using the vector space model. A weight scheme is proposed to consider both local content and associations between concepts. A Self-Organizing Map is used as document clustering algorithm. The vector projection and visualization features of SOM enable visualization and analysis of the clusters distributions and relationships on the two dimensional space. The experimental results show that the proposed document clustering framework generates meaningful clusters and facilitate visualization of the clusters based on the concepts of diseases.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

1810.09597

Country: North America > United States > Indiana (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Duan, Leo L, Dunson, David B

Bayesian Distance Clustering

arXiv.org Machine LearningOct-19-2018

Model-based clustering is widely-used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging on properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some information in the data is discarded, we gain substantial robustness to modeling assumptions. The proposed approach represents an appealing middle ground between distance- and model-based clustering, drawing advantages from each of these canonical approaches. We illustrate dramatic gains in the ability to infer clusters that are not well represented by the usual choices of kernel. A simulation study is included to assess performance relative to competitors, and we apply the approach to clustering of brain genome expression data. Keywords: Distance-based clustering; Mixture model; Model-based clustering; Model misspecification; Pairwise distance matrix; Partial likelihood; Robustness.

artificial intelligence, likelihood, machine learning, (16 more...)

1810.08537

Country: North America (0.46)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

#artificialintelligenceOct-17-2018, 15:38:11 GMT

Customer Profiling and Segmentation in Python An Overview & Demo

While most marketing managers understand that all customers have different preferences, these differences still tend to raise quite a challenge when it comes time to develop new offers. Not every product or service that your company makes will be right for every customer, nor will every customer be equally responsive to each of your company's marketing campaigns. That's why when I prepare custom training plans, I usually recommend that my clients get familiar with how they can use customer profiling and segmentation to organize their customer base into different groups. Simply put, segmentation is a way of organizing your customer base into groups. For marketing purposes, these groups are formed on the basis of people having similar product or service preferences, although segments can be constructed on any variety of other factors.

artificial intelligence, customer, machine learning, (15 more...)

#artificialintelligence

Genre: Questionnaire & Opinion Survey (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)

A Disease Diagnosis and Treatment Recommendation System Based on Big Data Mining and Cloud Computing

Chen, Jianguo, Li, Kenli, Rong, Huigui, Bilal, Kashif, Yang, Nan, Li, Keqin

It is crucial to provide compatible treatment schemes for a disease according to various symptoms at different stages. However, most classification methods might be ineffective in accurately classifying a disease that holds the characteristics of multiple treatment stages, various symptoms, and multi-pathogenesis. Moreover, there are limited exchanges and cooperative actions in disease diagnoses and treatments between different departments and hospitals. Thus, when new diseases occur with atypical symptoms, inexperienced doctors might have difficulty in identifying them promptly and accurately. Therefore, to maximize the utilization of the advanced medical technology of developed hospitals and the rich medical knowledge of experienced doctors, a Disease Diagnosis and Treatment Recommendation System (DDTRS) is proposed in this paper. First, to effectively identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering. In addition, association analyses on Disease-Diagnosis (D-D) rules and Disease-Treatment (D-T) rules are conducted by the Apriori algorithm separately. The appropriate diagnosis and treatment schemes are recommended for patients and inexperienced doctors, even if they are in a limited therapeutic environment. Moreover, to reach the goals of high performance and low latency response, we implement a parallel solution for DDTRS using the Apache Spark cloud platform. Extensive experimental results demonstrate that the proposed DDTRS realizes disease-symptom clustering effectively and derives disease treatment recommendations intelligently and accurately.

artificial intelligence, data mining, machine learning, (20 more...)

1810.07762

Country:

Asia (0.46)
North America > United States (0.45)

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.70)

A Self-Organizing Tensor Architecture for Multi-View Clustering

He, Lifang, Lu, Chun-ta, Chen, Yong, Zhang, Jiawei, Shen, Linlin, Yu, Philip S., Wang, Fei

In many real-world applications, data are often unlabeled and comprised of different representations/views which often provide information complementary to each other. Although several multi-view clustering methods have been proposed, most of them routinely assume one weight for one view of features, and thus inter-view correlations are only considered at the view-level. These approaches, however, fail to explore the explicit correlations between features across multiple views. In this paper, we introduce a tensor-based approach to incorporate the higher-order interactions among multiple views as a tensor structure. Specifically, we propose a multi-linear multi-view clustering (MMC) method that can efficiently explore the full-order structural information among all views and reveal the underlying subspace structure embedded within the tensor. Extensive experiments on real-world datasets demonstrate that our proposed MMC algorithm clearly outperforms other related state-of-the-art methods.

artificial intelligence, machine learning, matrix, (17 more...)

1810.07874

Country: North America > United States (0.28)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

Ruffini, Matteo, Rabusseau, Guillaume, Balle, Borja

Hierarchical Methods of Moments

Spectral methods of moments provide a powerful tool for learning the parameters of latent variable models. Despite their theoretical appeal, the applicability of these methods to real data is still limited due to a lack of robustness to model misspecification. In this paper we present a hierarchical approach to methods of moments to circumvent such limitations. Our method is based on replacing the tensor decomposition step used in previous algorithms with approximate joint diagonalization. Experiments on topic modeling show that our method outperforms previous tensor decomposition methods in terms of speed and model quality.

algorithm, artificial intelligence, machine learning, (16 more...)

1810.07468

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Raffo, Andrea, Barrowclough, Oliver J. D., Muntingh, Georg

Reverse engineering of CAD models via clustering and approximate implicitization

In applications like computer aided design, geometric models are often represented numerically as polynomial splines or NURBS, even when they originate from primitive geometry. For purposes such as redesign and isogeometric analysis, it is of interest to extract information about the underlying geometry through reverse engineering. In this work we develop a novel method to determine these primitive shapes by combining clustering analysis with approximate implicitization. The proposed method is automatic and can recover algebraic hypersurfaces of any degree in any dimension. In exact arithmetic, the algorithm returns exact results. All the required parameters, such as the implicit degree of the patches and the number of clusters of the model, are inferred using numerical approaches in order to obtain an algorithm that requires as little manual input as possible. The effectiveness, efficiency and robustness of the method are shown both in a theoretical analysis and in numerical examples implemented in Python.

artificial intelligence, implicitization, machine learning, (16 more...)

1810.07451

Country: Europe (0.67)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Balcan, Maria-Florina, Nagarajan, Vaishnavh, Vitercik, Ellen, White, Colin

Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems

arXiv.org Artificial IntelligenceOct-16-2018

Max-cut, clustering, and many other partitioning problems that are of significant importance to machine learning and other scientific fields are NP-hard, a reality that has motivated researchers to develop a wealth of approximation algorithms and heuristics. Although the best algorithm to use typically depends on the specific application domain, a worst-case analysis is often used to compare algorithms. This may be misleading if worst-case instances occur infrequently, and thus there is a demand for optimization methods which return the algorithm configuration best suited for the given application's typical inputs. We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample efficient learning algorithms which receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance. Our algorithms learn over common integer quadratic programming and clustering algorithm families: SDP rounding algorithms and agglomerative clustering algorithms with dynamic programming. For our sample complexity analysis, we provide tight bounds on the pseudodimension of these algorithm classes, and show that surprisingly, even for classes of algorithms parameterized by a single parameter, the pseudo-dimension is superconstant. In this way, our work both contributes to the foundations of algorithm configuration and pushes the boundaries of learning theory, since the algorithm classes we analyze consist of multi-stage optimization procedures and are significantly more complex than classes typically studied in learning theory.

algorithm, artificial intelligence, machine learning, (20 more...)

arXiv.org Artificial Intelligence

1611.04535

Genre:

Research Report (0.63)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Maggioni, Mauro, Murphy, James M.

Learning by Unsupervised Nonlinear Diffusion

arXiv.org Machine LearningOct-15-2018

This paper proposes and analyzes a novel clustering algorithm that combines graph-based diffusion geometry with density estimation. The proposed method is suitable for data generated from mixtures of distributions with densities that are both multimodal and have nonlinear shapes. A crucial aspect of this algorithm is to introduce time of a data-adapted diffusion process as a scale parameter that is different from the local spatial scale parameter used in many clustering and learning algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. This analysis is leveraged to prove sufficient conditions guaranteeing the accuracy of the proposed learning by unsupervised nonlinear diffusion (LUND) algorithm. We implement the LUND algorithm numerically and confirm its theoretical properties on illustrative datasets, showing that the proposed method enjoys both theoretical and empirical advantages over current spectral clustering and density-based clustering techniques.

artificial intelligence, diffusion distance, machine learning, (18 more...)

1810.06702

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)