Combining Ontology Development Methodologies and Semantic Web Platforms for E-government Domain Ontology Development
Fonou-Dombeu, Jean Vincent, Huisman, Magda
One of the key challenges in electronic government (e-government) is the development of systems that can be easily integrated and interoperated to provide seamless service delivery to citizens. In recent years, Semantic Web technologies based on ontology have emerged as promising solutions to these engineering problems. However, current research on semantic development in e-government pays little attention to the application of available methodologies and platforms for developing government domain ontologies, and few studies provide detailed guidelines for developing semantic ontology models of a government service domain. This research presents a case study combining an ontology building methodology with two state-of-the-art Semantic Web platforms, namely Protege and the Java Jena ontology API, for semantic ontology development in e-government. Firstly, a framework adapted from the Uschold and King ontology building methodology is employed to build a domain ontology describing the semantic content of a government service domain. Thereafter, UML is used to semi-formally represent the domain ontology. Finally, Protege and the Jena API are employed to create the Web Ontology Language (OWL) and Resource Description Framework (RDF) representations of the domain ontology, respectively, to make it machine-processable. The study aims at (1) providing e-government developers, particularly those from the developing world, with detailed guidelines for practicing semantic content development in their e-government projects and (2) strengthening the adoption of semantic technologies in e-government. The study would also be of interest to novice Semantic Web developers who might use it as a starting point for further investigations.
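To make the workflow concrete, the following is a minimal sketch of encoding a small government-service ontology in RDF/OWL. The paper itself works with Protege and the Java Jena API; here Python's rdflib serves as a lightweight stand-in, and the namespace, classes, and properties are hypothetical illustrations rather than the paper's actual model.

```python
# A minimal sketch of a government-service domain ontology in RDF/OWL.
# The paper uses Protege and the Java Jena API; Python's rdflib is used
# here as a lightweight stand-in, and the namespace, classes, and
# properties are hypothetical illustrations, not the paper's model.
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

EGOV = Namespace("http://example.org/egov#")  # hypothetical namespace

g = Graph()
g.bind("egov", EGOV)

# Declare OWL classes for the service domain.
for cls in ("PublicService", "Citizen", "GovernmentAgency"):
    g.add((EGOV[cls], RDF.type, OWL.Class))

# An object property linking agencies to the services they provide.
g.add((EGOV.provides, RDF.type, OWL.ObjectProperty))
g.add((EGOV.provides, RDFS.domain, EGOV.GovernmentAgency))
g.add((EGOV.provides, RDFS.range, EGOV.PublicService))

# A sample individual and a human-readable label.
g.add((EGOV.PassportApplication, RDF.type, EGOV.PublicService))
g.add((EGOV.PassportApplication, RDFS.label, Literal("Passport application")))

print(g.serialize(format="turtle"))  # Turtle rendering of the RDF graph
```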
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
Hassanzadeh, Hamed, Keyvanpour, MohammadReza
The Semantic Web is an extension of the current web in which information is given well-defined meaning. The perspective of Semantic Web is to promote the quality and intelligence of the current web by changing its contents into machine understandable form. Therefore, semantic level information is one of the cornerstones of the Semantic Web. The process of adding semantic metadata to web resources is called Semantic Annotation. There are many obstacles against the Semantic Annotation, such as multilinguality, scalability, and issues which are related to diversity and inconsistency in content of different web pages. Due to the wide range of domains and the dynamic environments that the Semantic Annotation systems must be performed on, the problem of automating annotation process is one of the significant challenges in this domain. To overcome this problem, different machine learning approaches such as supervised learning, unsupervised learning and more recent ones like, semi-supervised learning and active learning have been utilized. In this paper we present an inclusive layered classification of Semantic Annotation challenges and discuss the most important issues in this field. Also, we review and analyze machine learning applications for solving semantic annotation problems. For this goal, the article tries to closely study and categorize related researches for better understanding and to reach a framework that can map machine learning techniques into the Semantic Annotation challenges and requirements.
Compressive Network Analysis
Jiang, Xiaoye, Yao, Yuan, Liu, Han, Guibas, Leonidas
Modern data acquisition routinely produces massive amounts of network data. Though many methods and models have been proposed to analyze such data, research on network data remains largely disconnected from the classical theory of statistical learning and signal processing. In this paper, we present a new framework for modeling network data, which connects two seemingly different areas: network data analysis and compressed sensing. From a nonparametric perspective, we model an observed network using a large dictionary. In particular, we consider the network clique detection problem and show connections between our formulation and a new algebraic tool, namely Radon basis pursuit in homogeneous spaces. Such a connection allows us to identify rigorous recovery conditions for clique detection problems. Though this paper is mainly conceptual, we also develop practical approximation algorithms for solving empirical problems and demonstrate their usefulness on real-world datasets.
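The paper's specialized Radon basis pursuit is not reproduced here, but the generic basis pursuit formulation underlying it is easy to state as a convex program. The sketch below, with an invented random dictionary and a synthetic sparse "clique indicator" vector, uses cvxpy to recover the sparse coefficients from linear measurements.

```python
# Generic basis pursuit: recover a sparse coefficient vector x from
# measurements b = A @ x by l1 minimization. This is only the standard
# compressed-sensing formulation the paper builds on; the Radon basis
# pursuit in homogeneous spaces itself is not reproduced here.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 30, 100, 3                  # measurements, dictionary size, sparsity
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[rng.choice(p, k, replace=False)] = 1.0   # a sparse "clique indicator"
b = A @ x_true

x = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b])
prob.solve()
print(np.flatnonzero(np.abs(x.value) > 1e-6))   # indices of recovered support
```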
Robust Clustering Using Outlier-Sparsity Regularization
Forero, Pedro A., Kekatos, Vassilis, Giannakis, Georgios B.
Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures, rendering their outcome unreliable. This paper develops robust clustering algorithms that aim not only to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Capitalizing on this sparsity, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies in identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with respect to their non-robust counterparts. Kernelized versions of the robust clustering algorithms are also developed to efficiently handle high-dimensional data, identify nonlinearly separable clusters, or even cluster objects that are not represented by vectors. Numerical tests on both synthetic and real datasets validate the performance and applicability of the novel algorithms.
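A minimal sketch of the outlier-aware K-means idea follows: each data point gets its own outlier vector, the sum of outlier norms is penalized (a group-sparsity regularizer), and the three blocks (assignments, centroids, outliers) are cycled in block coordinate descent, with the outlier update reducing to group soft-thresholding of the residuals. The regularization weight and initialization below are illustrative choices, not the paper's.

```python
# Sketch of outlier-aware robust K-means: minimize over assignments c,
# centroids M, and outlier vectors O the cost
#   sum_i ||x_i - m_{c_i} - o_i||^2 + lam * sum_i ||o_i||_2
# by block coordinate descent. lam and the initialization are illustrative.
import numpy as np

def robust_kmeans(X, k, lam, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    M = X[rng.choice(n, k, replace=False)]        # initial centroids
    O = np.zeros((n, d))                          # outlier vectors
    for _ in range(iters):
        # (1) assign each "cleaned" point x_i - o_i to its nearest centroid
        D = np.linalg.norm((X - O)[:, None, :] - M[None, :, :], axis=2)
        c = D.argmin(axis=1)
        # (2) update centroids from the cleaned points
        for j in range(k):
            if np.any(c == j):
                M[j] = (X - O)[c == j].mean(axis=0)
        # (3) update outliers by group soft-thresholding the residuals
        R = X - M[c]
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        O = R * np.maximum(0.0, 1.0 - lam / (2.0 * np.maximum(norms, 1e-12)))
    return c, M, np.linalg.norm(O, axis=1) > 1e-8  # outlier flags

# Points whose residuals survive the thresholding are flagged as outliers.
```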
Convex Approaches to Model Wavelet Sparsity Patterns
Rao, Nikhil S, Nowak, Robert D., Wright, Stephen J., Kingsbury, Nick G.
Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees (HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-convex optimizations. Past work has dealt with this issue by resorting to greedy or suboptimal iterative reconstruction methods. In this paper, we propose new modeling approaches based on group-sparsity penalties that lead to convex optimizations which can be solved exactly and efficiently. We show that the methods we develop perform significantly better in deconvolution and compressed sensing applications, while being as computationally efficient as standard coefficient-wise approaches such as the lasso.
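Schematically, and with notation assumed here rather than taken from the paper ($y$ observations, $A$ sensing matrix, $\Phi$ wavelet synthesis operator, $\theta$ wavelet coefficients, $\mathcal{G}$ a collection of overlapping parent-child coefficient groups), the convex surrogate replaces the HMT prior with a group penalty:
$$\min_{\theta} \; \tfrac{1}{2}\,\| y - A\Phi\theta \|_2^2 \;+\; \lambda \sum_{g \in \mathcal{G}} \| \theta_g \|_2,$$
where the group terms encourage the persistence-across-scale behavior that HMTs capture while keeping the objective convex.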
Margin-adaptive model selection in statistical learning
Arlot, Sylvain, Bartlett, Peter L.
A classical condition for fast learning rates is the margin condition, first introduced by Mammen and Tsybakov. In this paper we tackle the problem of adaptivity to this condition in the context of model selection, in a general learning framework. In fact, we consider a weaker version of this condition that takes into account that learning within a small model can be much easier than within a large one. Requiring this "strong margin adaptivity" makes the model selection problem more challenging. We first prove, in a general framework, that some penalization procedures (including local Rademacher complexities) exhibit this adaptivity when the models are nested. Contrary to previous results, this holds with penalties that depend only on the data. Our second main result is that strong margin adaptivity is not always possible when the models are not nested: for every model selection procedure (even a randomized one), there is a problem for which it does not demonstrate strong margin adaptivity.
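For reference, the margin condition in its classical binary-classification form (stated here from Tsybakov's formulation, not from the paper's more general learning framework): there exist $C > 0$ and $\alpha \ge 0$ such that $P\left(|\eta(X) - 1/2| \le t\right) \le C\,t^{\alpha}$ for all $t \ge 0$, where $\eta(x) = P(Y = 1 \mid X = x)$. Larger $\alpha$ means less probability mass near the decision boundary, which is what permits rates faster than $n^{-1/2}$.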
Algorithms and Complexity Results for Persuasive Argumentation
Kim, Eun Jung, Ordyniak, Sebastian, Szeider, Stefan
The study of arguments as abstract entities and their interaction, as introduced by Dung (Artificial Intelligence 77, 1995), has become one of the most active research branches within Artificial Intelligence and Reasoning. A main issue for abstract argumentation systems is the selection of acceptable sets of arguments. Value-based argumentation, as introduced by Bench-Capon (J. Logic Comput. 13, 2003), extends Dung's framework by taking into account the relative strength of arguments with respect to some ranking representing an audience: an argument is subjectively accepted if it is accepted with respect to some audience, and objectively accepted if it is accepted with respect to all audiences. Deciding whether an argument is subjectively accepted, and deciding whether it is objectively accepted, are both computationally intractable problems. In fact, the problems remain intractable under structural restrictions that render the main computational problems for non-value-based argumentation systems tractable. In this paper we identify nontrivial classes of value-based argumentation systems for which the acceptance problems are polynomial-time tractable. The classes are defined by means of structural restrictions in terms of the underlying graphical structure of the value-based system. Furthermore, we show that the acceptance problems are intractable for two classes of value-based systems that were conjectured to be tractable by Dunne (Artificial Intelligence 171, 2007).
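To make the two acceptance notions concrete, the brute-force sketch below enumerates all audiences (total orders over values), removes attacks that fail against a preferred value, and tests acceptance under each audience. For brevity it checks membership in the grounded extension, a simplification of the preferred-extension semantics the paper studies, and the three-argument framework is invented.

```python
# Brute-force sketch of subjective/objective acceptance in a value-based
# argumentation framework (VAF). Grounded semantics is used per audience
# as a simplification; the toy framework below is invented.
from itertools import permutations

args = {"a", "b", "c"}
attacks = {("a", "b"), ("b", "c")}
value = {"a": "v1", "b": "v2", "c": "v1"}   # value promoted by each argument

def grounded(A, R):
    """Grounded extension of an abstract framework (A, R)."""
    S, changed = set(), True
    while changed:
        changed = False
        for x in A - S:
            attackers = {y for (y, z) in R if z == x}
            # x is acceptable if every attacker of x is attacked by S
            if all(any((s, y) in R for s in S) for y in attackers):
                S.add(x)
                changed = True
    return S

def accepted_audiences(arg):
    """Audiences (total value orders, most preferred first) accepting arg."""
    out = []
    for order in permutations(sorted(set(value.values()))):
        rank = {v: i for i, v in enumerate(order)}
        # an attack fails when the attacked argument's value is preferred
        R = {(x, y) for (x, y) in attacks if rank[value[x]] <= rank[value[y]]}
        if arg in grounded(args, R):
            out.append(order)
    return out

aud = accepted_audiences("c")
print("subjectively accepted:", bool(aud))   # True: some audience accepts c
print("objectively accepted:", len(aud) == 2)  # False: not all 2 audiences do
```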
Fast redshift clustering with the Baire (ultra) metric
Murtagh, Fionn, Contreras, Pedro
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, in contrast to the standard quadratic-time agglomerative hierarchical clustering algorithm. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey, using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we develop a clusterwise nearest neighbor regression procedure for this purpose.
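A small sketch of the Baire idea: treating redshifts as digit strings, the distance between two values is $10^{-k}$ with $k$ the length of their longest common prefix, and clustering amounts to a single linear hashing pass over prefixes. The base-10 convention and the toy redshift values are illustrative; the paper's exact normalization may differ.

```python
# The Baire (ultra)metric on decimal-digit strings: distance 10**-k where
# k is the length of the longest shared prefix. Hashing points by digit
# prefixes is what gives linear-time clustering. The base-10 convention
# here is one common choice, not necessarily the paper's normalization.
from collections import defaultdict

def baire_distance(x: str, y: str) -> float:
    k = 0
    for a, b in zip(x, y):
        if a != b:
            break
        k += 1
    return 10.0 ** (-k)

def baire_clusters(values, prefix_len):
    """One linear pass: points sharing a digit prefix land in one cluster."""
    clusters = defaultdict(list)
    for v in values:
        clusters[v[:prefix_len]].append(v)
    return clusters

redshifts = ["0.1523", "0.1527", "0.1601", "0.2488"]   # illustrative values
print(baire_distance("0.1523", "0.1527"))   # 1e-05: digits agree up to "0.152"
print(dict(baire_clusters(redshifts, 4)))   # prefix buckets "0.15", "0.16", ...
```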
Learning invariant features through local space contraction
Rifai, Salah, Muller, Xavier, Glorot, Xavier, Mesnil, Gregoire, Bengio, Yoshua, Vincent, Pascal
We present in this paper a novel approach for training deterministic auto-encoders. We show that by adding a well-chosen penalty term to the classical reconstruction cost function, we can achieve results that equal or surpass those attained by other regularized auto-encoders as well as denoising auto-encoders on a range of datasets. This penalty term corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input. We show that this penalty term results in a localized space contraction, which in turn yields robust features on the activation layer. Furthermore, we show how this penalty term is related to both regularized auto-encoders and denoising auto-encoders, and how it can be seen as a link between deterministic and non-deterministic auto-encoders. We find empirically that this penalty helps to carve a representation that better captures the local directions of variation dictated by the data, corresponding to a lower-dimensional non-linear manifold, while being more invariant to the vast majority of directions orthogonal to the manifold. Finally, we show that by using the learned features to initialize an MLP, we achieve state-of-the-art classification error on a range of datasets, surpassing other methods of pre-training.
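A sketch of the contractive penalty follows. For a sigmoid encoder $h = s(Wx + b)$, the squared Frobenius norm of the Jacobian has a cheap closed form, which is added to the reconstruction loss; the tied-weight decoder, layer sizes, and penalty weight below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of the contractive penalty: reconstruction loss plus the squared
# Frobenius norm of the encoder's Jacobian. Tied weights, sigmoid units,
# and all sizes/weights here are illustrative assumptions.
import torch

n_in, n_hid, lam = 20, 8, 0.1                    # illustrative sizes
W = torch.nn.Parameter(0.01 * torch.randn(n_hid, n_in))
b = torch.nn.Parameter(torch.zeros(n_hid))
b_dec = torch.nn.Parameter(torch.zeros(n_in))

def loss(x):
    h = torch.sigmoid(x @ W.T + b)               # encoder activations
    x_rec = torch.sigmoid(h @ W + b_dec)         # tied-weight decoder
    rec = ((x - x_rec) ** 2).sum(dim=1).mean()
    # Closed-form ||J||_F^2 for a sigmoid encoder:
    # sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2, averaged over the batch.
    jac = ((h * (1 - h)) ** 2 @ (W ** 2).sum(dim=1)).mean()
    return rec + lam * jac

x = torch.rand(32, n_in)                         # a dummy minibatch
loss(x).backward()                               # gradients for SGD training
```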
A sufficient condition for monotonic increase of the number of nonzero entries in the optimizer of the L1-norm penalized least-squares problem
Duan, J., Soussen, Charles, Brie, David, Idier, Jerome, Wang, Y. -P.
The $\ell_1$-norm based optimization is widely used in signal processing, especially in recent compressed sensing theory. This paper studies the solution path of the $\ell_1$-norm penalized least-squares problem, whose constrained form is known as the Least Absolute Shrinkage and Selection Operator (LASSO). A solution path is the set of all optimizers with respect to the evolution of the hyperparameter (Lagrange multiplier). The study of the solution path is of great significance for viewing and understanding the profile of the tradeoff between the approximation and regularization terms. If the solution path of a given problem is known, it can help us find the optimal hyperparameter under a given criterion such as the Akaike Information Criterion. In this paper we present a sufficient condition on the $\ell_1$-norm penalized least-squares problem. Under this sufficient condition, the number of nonzero entries in the optimizer or solution vector increases monotonically as the hyperparameter decreases. We also generalize the result to the often used total variation case, where the $\ell_1$ norm is taken over the first-order derivative of the solution vector. We prove that the proposed condition has intrinsic connections with the condition given by Donoho et al. \cite{Donoho08} and the positive cone condition of Efron et al. \cite{Efron04}. However, the proposed condition does not need to assume the sparsity level of the signal as required by Donoho et al.'s condition, and is easier to verify than Efron et al.'s positive cone condition when used in practical applications.
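The object of study can be inspected directly with an exact lasso path solver. The sketch below uses sklearn's lars_path on an invented random design and tracks the support size at each breakpoint as the hyperparameter decreases; whether it grows monotonically is exactly what the paper's sufficient condition governs.

```python
# Inspect how the lasso support evolves along the regularization path.
# lars_path computes the exact piecewise-linear solution path; the random
# design below is invented, so monotone support growth may or may not hold,
# which is precisely the behavior the paper's sufficient condition governs.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
y = A[:, :3] @ np.array([2.0, -1.5, 1.0]) + 0.05 * rng.standard_normal(50)

alphas, _, coefs = lars_path(A, y, method="lasso")  # decreasing hyperparameter
support_sizes = (np.abs(coefs) > 1e-12).sum(axis=0)
print(list(zip(np.round(alphas, 3), support_sizes)))
# Monotone increase of support_sizes as alpha decreases means no variable
# ever leaves the active set along the path.
```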