AITopics

1205.1828

Country: Asia (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.39)

Qian, Jing, Saligrama, Venkatesh, Zhao, Manqi

Graph-based Learning with Unbalanced Clusters

arXiv.org Machine LearningMay-8-2012

Graph construction is a crucial step in spectral clustering (SC) and graph-based semi-supervised learning (SSL). Spectral methods applied on standard graphs such as full-RBF, $\epsilon$-graphs and $k$-NN graphs can lead to poor performance in the presence of proximal and unbalanced data. This is because spectral methods based on minimizing RatioCut or normalized cut on these graphs tend to put more importance on balancing cluster sizes over reducing cut values. We propose a novel graph construction technique and show that the RatioCut solution on this new graph is able to handle proximal and unbalanced data. Our method is based on adaptively modulating the neighborhood degrees in a $k$-NN graph, which tends to sparsify neighborhoods in low density regions. Our method adapts to data with varying levels of unbalancedness and can be naturally used for small cluster detection. We justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.

artificial intelligence, graph-based learning, machine learning, (17 more...)

1205.1496

Country: North America > United States (0.28)

Genre:

Research Report (0.64)
Workflow (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Abdolali, Behrouz, Sameti, Hossein

A Novel Method For Speech Segmentation Based On Speakers' Characteristics

arXiv.org Artificial IntelligenceMay-8-2012

Speech Segmentation is the process change point detection for partitioning an input audio stream into regions each of which corresponds to only one audio source or one speaker. One application of this system is in Speaker Diarization systems. There are several methods for speaker segmentation; however, most of the Speaker Diarization Systems use BIC-based Segmentation methods. The main goal of this paper is to propose a new method for speaker segmentation with higher speed than the current methods - e.g. BIC - and acceptable accuracy. Our proposed method is based on the pitch frequency of the speech. The accuracy of this method is similar to the accuracy of common speaker segmentation methods. However, its computation cost is much less than theirs. We show that our method is about 2.4 times faster than the BIC-based method, while the average accuracy of pitch-based method is slightly higher than that of the BIC-based method.

artificial intelligence, change point, machine learning, (16 more...)

doi: 10.5121/sipij.2012.3205

1205.1794

Country: North America > United States (0.93)

Genre: Research Report > Promising Solution (0.40)

Industry: Media (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.71)

Plu, Julien, Scharffe, François

Publishing and linking transport data on the Web

arXiv.org Artificial IntelligenceMay-8-2012

Without Linked Data, transport data is limited to applications exclusively around transport. In this paper, we present a workflow for publishing and linking transport data on the Web. So we will be able to develop transport applications and to add other features which will be created from other datasets. This will be possible because transport data will be linked to these datasets. We apply this workflow to two datasets: NEPTUNE, a French standard describing a transport line, and Passim, a directory containing relevant information on transport services, in every French city.

artificial intelligence, information management, ontology, (18 more...)

1205.1645

Country: Europe > France > Provence-Alpes-Côte d'Azur (0.28)

Genre:

Workflow (0.54)
Research Report (0.40)

Industry: Transportation > Infrastructure & Services (0.70)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

S, Aji, Kaimal, Ramachandra

Document summarization using positive pointwise mutual information

arXiv.org Artificial IntelligenceMay-8-2012

The degree of success in document summarization processes depends on the performance of the method used in identifying significant sentences in the documents. The collection of unique words characterizes the major signature of the document, and forms the basis for Term-Sentence-Matrix (TSM). The Positive Pointwise Mutual Information, which works well for measuring semantic similarity in the Term-Sentence-Matrix, is used in our method to assign weights for each entry in the Term-Sentence-Matrix. The Sentence-Rank-Matrix generated from this weighted TSM, is then used to extract a summary from the document. Our experiments show that such a method would outperform most of the existing methods in producing summaries from large documents.

artificial intelligence, natural language, text processing, (11 more...)

1205.1638

Country:

Asia (0.70)
North America > United States > Louisiana (0.14)
North America > Canada > British Columbia (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

Obozinski, Guillaume, Bach, Francis

Convex Relaxation for Combinatorial Penalties

arXiv.org Machine LearningMay-6-2012

In this paper, we propose an unifying view of several recently proposed structured sparsity-inducing norms. We consider the situation of a model simultaneously (a) penalized by a set- function de ned on the support of the unknown parameter vector which represents prior knowledge on supports, and (b) regularized in Lp-norm. We show that the natural combinatorial optimization problems obtained may be relaxed into convex optimization problems and introduce a notion, the lower combinatorial envelope of a set-function, that characterizes the tightness of our relaxations. We moreover establish links with norms based on latent representations including the latent group Lasso and block-coding, and with norms obtained from submodular functions.

artificial intelligence, optimization problem, relaxation, (16 more...)

1205.124

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Haury, Anne-Claire, Mordelet, Fantine, Vera-Licona, Paola, Vert, Jean-Philippe

TIGRESS: Trustful Inference of Gene REgulation using Stability Selection

arXiv.org Machine LearningMay-6-2012

Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investigate the performance of a popular feature selection method, least angle regression (LARS) combined with stability selection. We introduce a novel, robust and accurate scoring technique for stability selection, which improves the performance of feature selection with LARS. The resulting method, which we call TIGRESS (Trustful Inference of Gene REgulation using Stability Selection), was ranked among the top methods in the DREAM5 gene network reconstruction challenge. We investigate in depth the influence of the various parameters of the method and show that a fine parameter tuning can lead to significant improvements and state-of-the-art performance for GRN inference. TIGRESS reaches state-of-the-art performance on benchmark data. This study confirms the potential of feature selection techniques for GRN inference. Code and data are available on http://cbio.ensmp.fr/~ahaury. Running TIGRESS online is possible on GenePattern: http://www.broadinstitute.org/cancer/software/genepattern/.

artificial intelligence, machine learning, regulation, (17 more...)

1205.1181

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Maseleno, Andino, Hasan, Md. Mahmud

Poultry Diseases Expert System using Dempster-Shafer Theory

arXiv.org Artificial IntelligenceMay-6-2012

Based on World Health Organization (WHO) fact sheet in the 2011, outbreaks of poultry diseases especially Avian Influenza in poultry may raise global public health concerns due to their effect on poultry populations, their potential to cause serious disease in people, and their pandemic potential. In this research, we built a Poultry Diseases Expert System using Dempster-Shafer Theory. In this Poultry Diseases Expert System We describe five symptoms which include depression, combs, wattle, bluish face region, swollen face region, narrowness of eyes, and balance disorders. The result of the research is that Poultry Diseases Expert System has been successfully identifying poultry diseases.

artificial intelligence, dempster-shafer theory, poultry disease expert system

1205.0243

Genre: Research Report (0.69)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)

Kim, Dongwoo, Chung, Yeonseung, Oh, Alice

Variable Selection for Latent Dirichlet Allocation

arXiv.org Machine LearningMay-3-2012

In latent Dirichlet allocation (LDA), topics are multinomial distributions over the entire vocabulary. However, the vocabulary usually contains many words that are not relevant in forming the topics. We adopt a variable selection method widely used in statistical modeling as a dimension reduction tool and combine it with LDA. In this variable selection model for LDA (vsLDA), topics are multinomial distributions over a subset of the vocabulary, and by excluding words that are not informative for finding the latent topic structure of the corpus, vsLDA finds topics that are more robust and discriminative. We compare three models, vsLDA, LDA with symmetric priors, and LDA with asymmetric priors, on heldout likelihood, MCMC chain consistency, and document classification. The performance of vsLDA is better than symmetric LDA for likelihood and classification, better than asymmetric LDA for consistency and classification, and about the same in the other comparisons.

artificial intelligence, machine learning, natural language, (16 more...)

1205.1053

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Bornn, Luke, Caron, François

Bayesian clustering in decomposable graphs

arXiv.org Machine LearningMay-3-2012

This paper is concerned with the inference of the conditional independence graph G of a multivariate random vector Y of dimension n, a problem sometimes referred to as structure learning. We focus here on undirected decomposable graphs, whose popularity is mainly due to the tractable factorization they allow for the likelihood ([9, 20]); related work for directed graphical models can be found in [18]. Learning the conditional 1 independence graph G is an onerous task due to the large number of graphs on a set of n nodes, or variables. It is possible using optimization methods to find the graph which best fits the data according to some metric [23, 30, 13]; alternatively Bayesian model averaging may be used to accommodate for uncertainty in the estimated graph, or maximum a posteriori estimation may be used to select a given model from the posterior over graphs. Such an approach relies on a prior distribution π(G) over the set of decomposable graphs of a given size; through Bayes theorem, this prior is updated based on the data to give an a posteriori estimate of the distribution over graphs.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1005.5081

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Industry:

Food & Agriculture > Agriculture (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
Government > Voting & Elections (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)