AITopics | Genre

Genre

Benefits of Semantics on Web Service Composition from a Complex Network Perspective

Cherifi, Chantal, Labatut, Vincent, Santucci, Jean-François

arXiv.org Artificial Intelligence, May-1-2013

The number of publicly available Web services (WS) is continuously growing, and in parallel, we are witnessing a rapid development in semantic-related web technologies. The intersection of the semantic web and WS allows the development of semantic WS. In this work, we adopt a complex network perspective to perform a comparative analysis of the syntactic and semantic approaches used to describe WS. From a collection of publicly available WS descriptions, we extract syntactic and semantic WS interaction networks. We take advantage of tools from the complex network field to analyze them and determine their properties. We show that WS interaction networks exhibit some of the typical characteristics observed in real-world networks, such as short average distance between nodes and community structure. By comparing syntactic and semantic networks through their properties, we show that the introduction of semantics in WS descriptions should improve the composition process.

artificial intelligence, operation, semantic web, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-642-14306-9_9

1305.0191

Country:

Europe (0.68)
North America > United States (0.68)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (0.50)

Industry: Media (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Web > Semantic Web (0.92)

Semi-Supervised Information-Maximization Clustering

Calandriello, Daniele, Niu, Gang, Sugiyama, Masashi

arXiv.org Machine Learning, May-1-2013

Semi-supervised clustering aims to introduce prior knowledge in the decision process of a clustering algorithm. In this paper, we propose a novel semi-supervised clustering algorithm based on the information-maximization principle. The proposed method is an extension of a previous unsupervised information-maximization clustering algorithm based on squared-loss mutual information to effectively incorporate must-links and cannot-links. The proposed method is computationally efficient because the clustering solution can be obtained analytically via eigendecomposition. Furthermore, the proposed method allows systematic optimization of tuning parameters such as the kernel width, given the degree of belief in the must-links and cannot-links. The usefulness of the proposed method is demonstrated through experiments.

data mining, information, machine learning, (18 more...)

arXiv.org Machine Learning

1304.802

Country:

North America > United States (0.95)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Joint Training Deep Boltzmann Machines for Classification

Goodfellow, Ian J., Courville, Aaron, Bengio, Yoshua

arXiv.org Machine Learning, May-1-2013

We introduce a new method for training deep Boltzmann machines jointly. Prior methods of training DBMs require an initial learning pass that trains the model greedily, one layer at a time, or do not perform well on classification tasks. In our approach, we train all layers of the DBM simultaneously, using a novel training procedure called multi-prediction training. The resulting model can either be interpreted as a single generative model trained to maximize a variational approximation to the generalized pseudolikelihood, or as a family of recurrent networks that share parameters and may be approximately averaged together using a novel technique we call the multi-inference trick. We show that our approach performs competitively for classification and outperforms previous methods in terms of accuracy of approximate inference and classification with missing inputs.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1301.3568

Country: North America > United States (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.73)

NuMVC: An Efficient Local Search Algorithm for Minimum Vertex Cover

Cai, S., Su, K., Luo, C., Sattar, A.

Journal of Artificial Intelligence Research, Apr-30-2013

The Minimum Vertex Cover (MVC) problem is a prominent NP-hard combinatorial optimization problem of great importance in both theory and application. Local search has proved successful for this problem. However, there are two main drawbacks in state-of-the-art MVC local search algorithms. First, they select a pair of vertices to exchange simultaneously, which is time-consuming. Second, although they use edge weighting techniques to diversify the search, these algorithms lack mechanisms for decreasing the weights. To address these issues, we propose two new strategies: two-stage exchange and edge weighting with forgetting. The two-stage exchange strategy selects two vertices to exchange separately and performs the exchange in two stages. The strategy of edge weighting with forgetting not only increases weights of uncovered edges, but also decreases some weights for each edge periodically. These two strategies are used in designing a new MVC local search algorithm, which is referred to as NuMVC. We conduct extensive experimental studies on the standard benchmarks, namely DIMACS and BHOSLIB. The experiments comparing NuMVC with state-of-the-art heuristic algorithms show that NuMVC is at least competitive with the nearest competitor, namely PLS, on the DIMACS benchmark, and clearly dominates all competitors on the BHOSLIB benchmark. Also, experimental results indicate that NuMVC finds an optimal solution much faster than the current best exact algorithm for Maximum Clique on random instances as well as some structured ones. Moreover, we study the effectiveness of the two strategies and the run-time behaviour through experimental analysis.
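The edge weighting with forgetting strategy described in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `update_edge_weights`, the `threshold` trigger on the mean weight, and the decay factor `rho` are assumptions chosen for illustration, and the abstract itself only says that weights of uncovered edges increase and that weights are decreased periodically.

```python
import math


def update_edge_weights(weights, uncovered_edges, threshold, rho):
    """One weighting step: reward uncovered edges, then optionally forget.

    weights         -- dict mapping an edge (u, v) to a positive integer weight
    uncovered_edges -- edges not covered by the current candidate solution
    threshold       -- hypothetical trigger: forget when the mean weight exceeds it
    rho             -- hypothetical decay factor in (0, 1)
    """
    # Increase the weight of every edge that is currently uncovered.
    for edge in uncovered_edges:
        weights[edge] += 1

    # Forgetting: once weights have grown large on average, shrink all of them
    # so that old weighting decisions gradually lose influence.
    mean_weight = sum(weights.values()) / len(weights)
    if mean_weight > threshold:
        for edge in weights:
            weights[edge] = math.floor(rho * weights[edge])
    return weights


# Mean weight stays below the threshold, so only the increment happens.
w_small = update_edge_weights({(0, 1): 1, (1, 2): 1}, [(0, 1)], threshold=10, rho=0.5)

# Mean weight exceeds the threshold, so every weight is scaled down.
w_large = update_edge_weights({(0, 1): 10, (1, 2): 10}, [(0, 1)], threshold=5, rho=0.5)
```

Scaling every weight by the same factor keeps the relative ordering of edges roughly intact while limiting how long past increments dominate the search.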

algorithm, benchmark, numvc, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3907

AI Access Foundation

10812

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Queensland > Brisbane (0.04)
Asia > China > Beijing > Beijing (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Revealing social networks of spammers through spectral clustering

Xu, Kevin S., Kliger, Mark, Chen, Yilun, Woolf, Peter J., Hero, Alfred O. III

arXiv.org Machine Learning, Apr-30-2013

Previous studies on spam have mostly focused on studying its content or its source. Likewise, currently used anti-spam methods mostly involve filtering emails based on their content or by their email server IP address. More recently, there have been studies on the network-level behavior of spammers [1], [2]. However, very little attention has been devoted to studying how spammers acquire the email addresses that they send spam to, a process commonly referred to as harvesting. Harvesting is the first phase of the spam cycle; sending the spam emails to the acquired addresses is the second phase. Spammers send spam emails using spam servers, which are typically compromised computers or open proxies, both of which allow spammers to hide their identities. On the other hand, it has been observed that spammers do not make the same effort to conceal their identities during the harvesting phase [3], indicating that harvesters, which are individuals or bots that collect email addresses, are closely related to the spammers who are sending the spam emails. The harvester and spam server are the two intermediaries in the path of spam, illustrated in Figure 1. In this paper we try to reveal social networks of spammers by identifying communities of harvesters using data from both phases of the spam cycle.

harvester, machine learning, spam filtering, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/ICC.2009.5199418

1305.0051

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Industry:

Information Technology > Services (0.73)
Information Technology > Security & Privacy (0.57)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Inferring ground truth from multi-annotator ordinal data: a probabilistic approach

Lakshminarayanan, Balaji, Teh, Yee Whye

arXiv.org Machine Learning, Apr-30-2013

A popular approach for large-scale data annotation tasks is crowdsourcing, wherein each data point is labeled by multiple noisy annotators. We consider the problem of inferring ground truth from noisy ordinal labels obtained from multiple annotators of varying and unknown expertise levels. Annotation models for ordinal data have been proposed mostly as extensions of their binary/categorical counterparts and have received little attention in the crowdsourcing literature. We propose a new model for crowdsourced ordinal data that accounts for instance difficulty as well as annotator expertise, and derive a variational Bayesian inference algorithm for parameter estimation. We analyze the ordinal extensions of several state-of-the-art annotator models for binary/categorical labels and evaluate the performance of all the models on two real-world datasets containing ordinal query-URL relevance scores, collected through Amazon's Mechanical Turk. Our results indicate that the proposed model performs as well as or better than existing state-of-the-art methods and is more resistant to 'spammy' annotators (i.e., annotators who assign labels randomly without actually looking at the instance) than popular baselines such as mean, median, and majority vote, which do not account for annotator expertise. Part of the work was done while at Yandex Labs.
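The three baselines named in this abstract (mean, median, and majority vote) can be sketched in a few lines of Python. This is only an illustration of those baselines, not of the proposed model; the function name `aggregate_labels` and the tie-breaking rule for the majority vote are assumptions.

```python
from collections import Counter
from statistics import mean, median


def aggregate_labels(labels):
    """Aggregate one instance's ordinal labels with three simple baselines.

    Every annotator counts equally, which is exactly why these baselines
    cannot discount an annotator who labels at random.
    """
    counts = Counter(labels)
    top = max(counts.values())
    # Assumption: ties in the vote are broken toward the smallest label.
    majority = min(label for label, c in counts.items() if c == top)
    return {"mean": mean(labels), "median": median(labels), "majority": majority}


# Five annotators rate one query-URL pair on an ordinal relevance scale.
result = aggregate_labels([3, 3, 4, 1, 3])
```

Here the single outlying label 1 pulls the mean down to 2.8, while the median and the majority vote both stay at 3.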

annotator, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1305.0015

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Clustering processes

Ryabko, Daniil

arXiv.org Machine Learning, Apr-30-2013

The problem of clustering is considered for the case when each data point is a sample generated by a stationary ergodic process. We propose a very natural asymptotic notion of consistency, and show that simple consistent algorithms exist under the most general non-parametric assumptions. The notion of consistency is as follows: two samples should be put into the same cluster if and only if they were generated by the same distribution. With this notion of consistency, clustering generalizes such classical statistical problems as homogeneity testing and process classification. We show that, for the case of a known number of clusters, consistency can be achieved under the only assumption that the joint distribution of the data is stationary ergodic (no parametric or Markovian assumptions, and no assumptions of independence, either between or within the samples). If the number of clusters is unknown, consistency can be achieved under appropriate assumptions on the mixing rates of the processes (again, no parametric or independence assumptions). In both cases we give examples of simple (at most quadratic in each argument) algorithms which are consistent.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1005.0826

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Spectral Compressed Sensing via Structured Matrix Completion

Chen, Yuxin, Chi, Yuejie

arXiv.org Machine Learning, Apr-30-2013

The paper studies the problem of recovering a spectrally sparse object from a small number of time domain samples. Specifically, the object of interest with ambient dimension $n$ is assumed to be a mixture of $r$ complex multi-dimensional sinusoids, while the underlying frequencies can assume any value in the unit disk. Conventional compressed sensing paradigms suffer from the basis mismatch issue when imposing a discrete dictionary on the Fourier representation. To address this problem, we develop a novel nonparametric algorithm, called enhanced matrix completion (EMaC), based on structured matrix completion. The algorithm starts by arranging the data into a low-rank enhanced form with multi-fold Hankel structure, then attempts recovery via nuclear norm minimization. Under mild incoherence conditions, EMaC allows perfect recovery as soon as the number of samples exceeds the order of $\mathcal{O}(r\log^{2} n)$. We also show that, in many instances, accurate completion of a low-rank multi-fold Hankel matrix is possible when the number of observed entries is proportional to the information-theoretic limit (except for a logarithmic gap). The robustness of EMaC against bounded noise and its applicability to super resolution are further demonstrated by numerical experiments.
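The low-rank Hankel structure this abstract relies on can be checked numerically in the simplest setting. The sketch below is a one-dimensional simplification, not the multi-fold construction or the nuclear norm recovery step of EMaC; the helper `hankel_from_samples` and the chosen frequencies, amplitudes, and sizes are assumptions for illustration.

```python
import numpy as np


def hankel_from_samples(x, n_rows):
    """Arrange samples x[0..n-1] into a Hankel matrix with H[i, j] = x[i + j]."""
    x = np.asarray(x)
    n_cols = len(x) - n_rows + 1
    idx = np.arange(n_rows)[:, None] + np.arange(n_cols)[None, :]
    return x[idx]


# A signal made of r = 3 complex sinusoids with off-grid frequencies.
n = 32
t = np.arange(n)
freqs = np.array([0.10, 0.25, 0.40])   # hypothetical frequencies in [0, 1)
amps = np.array([1.0, 0.7, 0.5])       # hypothetical amplitudes
x = (amps[None, :] * np.exp(2j * np.pi * np.outer(t, freqs))).sum(axis=1)

H = hankel_from_samples(x, n_rows=16)

# The Hankel matrix factors through the r sinusoids, so its rank equals r
# even though the frequencies do not lie on any discrete Fourier grid.
rank = np.linalg.matrix_rank(H, tol=1e-8)
```

Because the rank depends only on the number of sinusoids and not on where their frequencies fall, a low-rank completion of this matrix does not require discretizing the frequency axis.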

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1304.461

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

On the convergence of the IRLS algorithm in Non-Local Patch Regression

Chaudhury, Kunal N.

arXiv.org Machine Learning, Apr-29-2013

Recently, it was demonstrated in [CS2012,CS2013] that the robustness of the classical Non-Local Means (NLM) algorithm [BCM2005] can be improved by incorporating $\ell^p (0 < p \leq 2)$ regression into the NLM framework. This general optimization framework, called Non-Local Patch Regression (NLPR), contains NLM as a special case. Denoising results on synthetic and natural images show that NLPR consistently performs better than NLM beyond a moderate noise level, and significantly so when $p$ is close to zero. An iteratively reweighted least-squares (IRLS) algorithm was proposed for solving the regression problem in NLPR, where the NLM output was used to initialize the iterations. Based on exhaustive numerical experiments, we observe that the IRLS algorithm is globally convergent (for arbitrary initialization) in the convex regime $1 \leq p \leq 2$, and locally convergent (fails very rarely using NLM initialization) in the non-convex regime $0 < p < 1$. In this letter, we adapt the "majorize-minimize" framework introduced in [Voss1980] to explain these observations. [CS2012] Chaudhury et al. (2012), "Non-local Euclidean medians," IEEE Signal Processing Letters. [CS2013] Chaudhury et al. (2013), "Non-local patch regression: Robust image denoising in patch space," IEEE ICASSP. [BCM2005] Buades et al. (2005), "A review of image denoising algorithms, with a new one," Multiscale Modeling and Simulation. [Voss1980] Voss et al. (1980), "Linear convergence of generalized Weiszfeld's method," Computing.
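An iteratively reweighted least-squares iteration of the kind this abstract discusses can be sketched as follows. The sketch assumes the regression problem has the form $\min_x \sum_i w_i \lVert x - y_i \rVert^p$ over a set of patches $y_i$; the function name `irls_lp_center`, the smoothing constant `eps`, and the fixed iteration count are assumptions and are not taken from the cited papers.

```python
import numpy as np


def irls_lp_center(points, p=1.0, weights=None, n_iter=100, eps=1e-8):
    """Approximate argmin_x sum_i w_i * ||x - y_i||^p by IRLS.

    points  -- array of shape (n, d), one row per patch y_i
    weights -- optional non-negative weights w_i (defaults to all ones)
    eps     -- small floor on distances to avoid division by zero (assumption)
    """
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    # Initialize with the weighted mean, i.e. the least-squares (p = 2) solution.
    x = np.average(points, axis=0, weights=weights)
    for _ in range(n_iter):
        dist = np.linalg.norm(points - x, axis=1)
        # Reweighting: each term ||x - y_i||^p is majorized by a quadratic
        # with coefficient proportional to ||x - y_i||^(p - 2).
        mu = weights * np.maximum(dist, eps) ** (p - 2)
        # Solve the resulting weighted least-squares problem in closed form.
        x = (mu[:, None] * points).sum(axis=0) / mu.sum()
    return x


# p = 2 reproduces the plain mean; p = 1 moves toward the geometric median.
mean_like = irls_lp_center([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]], p=2.0)
median_like = irls_lp_center([[0.0], [0.0], [10.0]], p=1.0)
```

For $p = 2$ the reweighting factors are constant, so the iteration stays at the weighted mean; smaller $p$ downweights distant points more strongly, which is the source of the added robustness.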

artificial intelligence, convergence, machine learning, (15 more...)

arXiv.org Machine Learning

doi: 10.1109/LSP.2013.2268248

1303.0417

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Optimal amortized regret in every interval

Panigrahy, Rina, Popat, Preyas

arXiv.org Machine Learning, Apr-29-2013

Consider the classical problem of predicting the next bit in a sequence of bits. A standard performance measure is regret (loss in payoff) with respect to a set of experts. For example, if we measure performance with respect to two constant experts, one that always predicts 0's and another that always predicts 1's, it is well known that one can get regret $O(\sqrt T)$ with respect to the best expert by using, say, the weighted majority algorithm. But this algorithm does not provide a performance guarantee in any interval. There are other algorithms that ensure regret $O(\sqrt {x \log T})$ in any interval of length $x$. In this paper we show a randomized algorithm that in an amortized sense gets a regret of $O(\sqrt x)$ for any interval when the sequence is partitioned into intervals arbitrarily. We empirically estimated the constant in the $O()$ for $T$ up to 2000 and found it to be small, around 2.1. We also experimentally evaluate the efficacy of this algorithm in predicting high-frequency stock data.
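The $O(\sqrt T)$ regret against two constant experts mentioned in this abstract can be illustrated with a randomized exponential-weights predictor, a standard relative of the weighted majority algorithm. This is a sketch of that classical baseline only, not of the paper's interval algorithm; the function name `hedge_regret` and the default learning rate are assumptions.

```python
import math


def hedge_regret(bits, eta=None):
    """Expected regret of exponential weights against two constant experts.

    Expert 0 always predicts 0 and expert 1 always predicts 1. The learner
    predicts 1 with probability proportional to expert 1's weight, and the
    expected number of mistakes is tracked exactly (no sampling).
    """
    T = len(bits)
    if eta is None:
        # Standard tuning for N = 2 experts over a known horizon T.
        eta = math.sqrt(8 * math.log(2) / T)
    loss = [0, 0]          # cumulative mistakes of the two experts
    expected_loss = 0.0    # learner's expected cumulative mistakes
    for b in bits:
        w0 = math.exp(-eta * loss[0])
        w1 = math.exp(-eta * loss[1])
        p1 = w1 / (w0 + w1)                      # probability of predicting 1
        expected_loss += (1 - p1) if b == 1 else p1
        loss[0] += int(b != 0)
        loss[1] += int(b != 1)
    return expected_loss - min(loss)


# A biased sequence of length T = 200 in which 1's are more common.
bits = [1, 0, 1, 1, 0, 1, 1, 1] * 25
regret = hedge_regret(bits)
bound = math.sqrt(len(bits) * math.log(2) / 2)   # classical sqrt(T ln N / 2) bound
```

The guarantee holds for the whole sequence of length $T$; it says nothing about how the predictor fares on an arbitrary sub-interval, which is the gap the abstract addresses.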

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1304.7577

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)
