Goto

Collaborating Authors

 Country


Design of Automatically Adaptable Web Wrappers

arXiv.org Artificial Intelligence

Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches of Web data extraction, for example using techniques of artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision of information extracted from Web pages, and, at the same time, have to prove robustness in order not to compromise quality and reliability of data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different version of a Web page, in order to handle modifications, avoiding the failure of data extraction tasks and ensuring reliability of information extracted. Our purpose is to evaluate performances, advantages and draw-backs of our novel system of automatic wrapper adaptation.


Automatic Wrapper Adaptation by Tree Edit Distance Matching

arXiv.org Artificial Intelligence

Information distributed through the Web keeps growing faster day by day, and for this reason, several techniques for extracting Web data have been suggested during last years. Often, extraction tasks are performed through so called wrappers, procedures extracting information from Web pages, e.g. implementing logic-based techniques. Many fields of application today require a strong degree of robustness of wrappers, in order not to compromise assets of information or reliability of data extracted. Unfortunately, wrappers may fail in the task of extracting data from a Web page, if its structure changes, sometimes even slightly, thus requiring the exploiting of new techniques to be automatically held so as to adapt the wrapper to the new structure of the page, in case of failure. In this work we present a novel approach of automatic wrapper adaptation based on the measurement of similarity of trees through improved tree edit distance matching techniques.


Efficient neuro-fuzzy system and its Memristor Crossbar-based Hardware Implementation

arXiv.org Artificial Intelligence

In this paper a novel neuro-fuzzy system is proposed where its learning is based on the creation of fuzzy relations by using new implication method without utilizing any exact mathematical techniques. Then, a simple memristor crossbar-based analog circuit is designed to implement this neuro-fuzzy system which offers very interesting properties. In addition to high connectivity between neurons and being fault-tolerant, all synaptic weights in our proposed method are always non-negative and there is no need to precisely adjust them. Finally, this structure is hierarchically expandable and can compute operations in real time since it is implemented through analog circuits. Simulation results show the efficiency and applicability of our neuro-fuzzy computing system. They also indicate that this system can be a good candidate to be used for creating artificial brain.


Estimating $\beta$-mixing coefficients

arXiv.org Machine Learning

The literature on statistical learning for time series assumes the asymptotic independence or ``mixing' of the data-generating process. These mixing assumptions are never tested, nor are there methods for estimating mixing rates from data. We give an estimator for the $\beta$-mixing rate based on a single stationary sample path and show it is $L_1$-risk consistent.


C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework

arXiv.org Machine Learning

Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is performed by solving an L1-regularized linear regression problem, commonly referred to as Lasso or Basis Pursuit. In this work we combine the sparsity-inducing property of the Lasso model at the individual feature level, with the block-sparsity property of the Group Lasso model, where sparse groups of features are jointly encoded, obtaining a sparsity pattern hierarchically structured. This results in the Hierarchical Lasso (HiLasso), which shows important practical modeling advantages. We then extend this approach to the collaborative case, where a set of simultaneously coded signals share the same sparsity pattern at the higher (group) level, but not necessarily at the lower (inside the group) level, obtaining the collaborative HiLasso model (C-HiLasso). Such signals then share the same active groups, or classes, but not necessarily the same active set. This model is very well suited for applications such as source identification and separation. An efficient optimization procedure, which guarantees convergence to the global optimum, is developed for these new models. The underlying presentation of the new framework and optimization approach is complemented with experimental examples and theoretical results regarding recovery guarantees for the proposed models.


The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning

arXiv.org Machine Learning

We derive an upper bound on the local Rademacher complexity of $\ell_p$-norm multiple kernel learning, which yields a tighter excess risk bound than global approaches. Previous local approaches aimed at analyzed the case $p=1$ only while our analysis covers all cases $1\leq p\leq\infty$, assuming the different feature mappings corresponding to the different kernels to be uncorrelated. We also show a lower bound that shows that the bound is tight, and derive consequences regarding excess loss, namely fast convergence rates of the order $O(n^{-\frac{\alpha}{1+\alpha}})$, where $\alpha$ is the minimum eigenvalue decay rate of the individual kernels.


A Wiki for Business Rules in Open Vocabulary, Executable English

arXiv.org Artificial Intelligence

The problem of business-IT alignment is of widespread economic concern. As one way of addressing the problem, this paper describes an online system that functions as a kind of Wiki -- one that supports the collaborative writing and running of business and scientific applications, as rules in open vocabulary, executable English, using a browser. Since the rules are in English, they are indexed by Google and other search engines. This is useful when looking for rules for a task that one has in mind. The design of the system integrates the semantics of data, with a semantics of an inference method, and also with the meanings of English sentences. As such, the system has functionality that may be useful for the Rules, Logic, Proof and Trust requirements of the Semantic Web. The system accepts rules, and small numbers of facts, typed or copy-pasted directly into a browser. One can then run the rules, again using a browser. For larger amounts of data, the system uses information in the rules to automatically generate and run SQL over networked databases. From a few highly declarative rules, the system typically generates SQL that would be too complicated to write reliably by hand. However, the system can explain its results in step-by-step hypertexted English, at the business or scientific level As befits a Wiki, shared use of the system is free.


An Agent Based Architecture (Using Planning) for Dynamic and Semantic Web Services Composition in an EBXML Context

arXiv.org Artificial Intelligence

The process-based semantic composition of Web Services is gaining a considerable momentum as an approach for the effective integration of distributed, heterogeneous, and autonomous applications. To compose Web Services semantically, we need an ontology. There are several ways of inserting semantics in Web Services. One of them consists of using description languages like OWL-S. In this paper, we introduce our work which consists in the proposition of a new model and the use of semantic matching technology for semantic and dynamic composition of ebXML business processes.


Multi-label Learning via Structured Decomposition and Group Sparsity

arXiv.org Machine Learning

In multi-label learning, each sample is associated with several labels. Existing works indicate that exploring correlations between labels improve the prediction performance. However, embedding the label correlations into the training process significantly increases the problem size. Moreover, the mapping of the label structure in the feature space is not clear. In this paper, we propose a novel multi-label learning method "Structured Decomposition + Group Sparsity (SDGS)". In SDGS, we learn a feature subspace for each label from the structured decomposition of the training data, and predict the labels of a new sample from its group sparse representation on the multi-subspace obtained from the structured decomposition. In particular, in the training stage, we decompose the data matrix $X\in R^{n\times p}$ as $X=\sum_{i=1}^kL^i+S$, wherein the rows of $L^i$ associated with samples that belong to label $i$ are nonzero and consist a low-rank matrix, while the other rows are all-zeros, the residual $S$ is a sparse matrix. The row space of $L_i$ is the feature subspace corresponding to label $i$. This decomposition can be efficiently obtained via randomized optimization. In the prediction stage, we estimate the group sparse representation of a new sample on the multi-subspace via group \emph{lasso}. The nonzero representation coefficients tend to concentrate on the subspaces of labels that the sample belongs to, and thus an effective prediction can be obtained. We evaluate SDGS on several real datasets and compare it with popular methods. Results verify the effectiveness and efficiency of SDGS.


Regularization Strategies and Empirical Bayesian Learning for MKL

arXiv.org Machine Learning

Multiple kernel learning (MKL), structured sparsity, and multi-task learning have recently received considerable attention. In this paper, we show how different MKL algorithms can be understood as applications of either regularization on the kernel weights or block-norm-based regularization, which is more common in structured sparsity and multi-task learning. We show that these two regularization strategies can be systematically mapped to each other through a concave conjugate operation. When the kernel-weight-based regularizer is separable into components, we can naturally consider a generative probabilistic model behind MKL. Based on this model, we propose learning algorithms for the kernel weights through the maximization of marginal likelihood. We show through numerical experiments that $\ell_2$-norm MKL and Elastic-net MKL achieve comparable accuracy to uniform kernel combination. Although uniform kernel combination might be preferable from its simplicity, $\ell_2$-norm MKL and Elastic-net MKL can learn the usefulness of the information sources represented as kernels. In particular, Elastic-net MKL achieves sparsity in the kernel weights.