AITopics

doi: 10.1504/IJAIP.2010.030531

0803.4074

Country:

North America > United States (0.28)
Asia > Japan > Honshū > Kantō (0.16)

Industry: Education (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Data Science > Data Mining (0.68)

arXiv.org Artificial Intelligence, Jan-15-2009

On Introspection, Metacognitive Control and Augmented Data Mining Live Cycles

Sonntag, Daniel

We discuss metacognitive modelling as an enhancement to cognitive modelling and computing. Metacognitive control mechanisms should enable AI systems to self-reflect, to reason about their actions, and to adapt to new situations. In this respect, we propose implementation details for a knowledge taxonomy and for an augmented data mining life cycle that supports the live integration of obtained models.
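The abstract describes the architecture only at a high level. The Python below is a minimal sketch of the monitor-and-control idea on a toy streaming classification task. An object-level model is wrapped by a meta level that watches its recent accuracy. When that accuracy drops below a threshold, the meta level retrains and swaps in a new model while the stream keeps running. The names `MetaController` and `train_threshold`, the accuracy threshold and the window size are all hypothetical. None of them come from the paper's knowledge taxonomy or life cycle.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (feature, label)
Model = Callable[[float], int]


def train_threshold(data: List[Example]) -> Model:
    """Hypothetical object-level learner: threshold halfway between class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    if not pos or not neg:
        label = 1 if pos else 0
        return lambda x: label
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > t)


class MetaController:
    """Hypothetical meta level: monitors the object-level model and swaps it live."""

    def __init__(self, train: Callable[[List[Example]], Model],
                 threshold: float = 0.8, window: int = 20) -> None:
        self.train = train
        self.threshold = threshold       # assumed minimum acceptable accuracy
        self.window = window             # assumed size of the monitoring window
        self.history: List[Example] = []
        self.model: Model = lambda x: 0  # trivial initial object-level model
        self.swaps = 0

    def observe(self, x: float, y: int) -> None:
        self.history.append((x, y))
        recent = self.history[-self.window:]
        if len(recent) < self.window:
            return
        # Introspection: the meta level evaluates the object level's recent accuracy.
        acc = sum(self.model(a) == b for a, b in recent) / len(recent)
        if acc < self.threshold:
            # Control: retrain on the recent window and integrate the new model live.
            self.model = self.train(recent)
            self.swaps += 1


if __name__ == "__main__":
    rng = random.Random(0)
    ctl = MetaController(train_threshold)
    for boundary in (0.5, 0.2):          # the target concept drifts mid-stream
        for _ in range(200):
            x = rng.random()
            ctl.observe(x, int(x > boundary))
    print("model swaps:", ctl.swaps)
```

In this demo the decision boundary moves from 0.5 to 0.2 partway through the stream, so at least one swap should occur after the drift. A fixed accuracy threshold is the simplest possible introspection signal. A real system would keep richer meta-level state.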

artificial intelligence, knowledge, neural network

0807.4417

Country: North America > United States > California (0.28)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.89)

Zhou, Shuheng, Ligett, Katrina, Wasserman, Larry

Differential Privacy with Compression

arXiv.org Machine Learning, Jan-10-2009

This work studies formal utility and privacy guarantees for a simple multiplicative database transformation, in which the data are compressed by a random linear or affine transformation that substantially reduces the number of data records while preserving the number of original input variables. We provide an analysis framework inspired by a recent concept known as differential privacy (Dwork 06). Our goal is to show that, despite the general difficulty of achieving the differential privacy guarantee, it is possible to publish synthetic data that are useful for a number of common statistical learning applications. These include high-dimensional sparse regression (Zhou et al. 07), principal component analysis (PCA), and other statistical measures (Liu et al. 06) based on the covariance of the initial data.
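The pure-Python sketch below makes the compression step concrete. It assumes a Gaussian random matrix with i.i.d. $N(0, 1/m)$ entries, which is one common choice of random linear map; the abstract does not fix the distribution. The map sends an $n \times p$ data matrix $X$ to $Z = \Phi X$, which has $m \ll n$ rows and the same $p$ columns. Because $E[\Phi^\top \Phi] = I_n$, the Gram matrix $Z^\top Z$ is an unbiased estimate of $X^\top X$, and that is the quantity covariance-based methods such as PCA consume. The sketch leaves out the affine variant and the differential-privacy calibration. `compress`, `matmul` and `transpose` are hypothetical helper names.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def compress(x: Matrix, m: int, seed: int = 0) -> Matrix:
    """Return Z = Phi X, where Phi is m x n with i.i.d. N(0, 1/m) entries.

    Z keeps all p columns (variables) of the n x p matrix X but has only m rows.
    Since E[Phi^T Phi] = I_n, E[Z^T Z] = X^T X."""
    rng = random.Random(seed)
    sd = (1.0 / m) ** 0.5
    phi = [[rng.gauss(0.0, sd) for _ in range(len(x))] for _ in range(m)]
    return matmul(phi, x)


if __name__ == "__main__":
    rng = random.Random(1)
    x = []
    for _ in range(1000):
        a = rng.gauss(0.0, 1.0)
        x.append([a, a + 0.5 * rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])  # correlated columns
    z = compress(x, m=200)
    print(len(x), "records ->", len(z), "records")
    print("X^T X:", matmul(transpose(x), x))
    print("Z^T Z:", matmul(transpose(z), z))
```

The relative error of $Z^\top Z$ shrinks roughly like $1/\sqrt{m}$, so $m$ trades utility against the number of released rows.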

artificial intelligence, machine learning, privacy

arXiv.org Machine Learning

0901.1365

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)

Oba, Shigeyuki, Kawanabe, Motoaki, Müller, Klaus-Robert, Ishii, Shin

Heterogeneous Component Analysis

In bioinformatics it is often desirable to combine data from various measurement sources, and thus to analyze structured feature vectors that possess different intrinsic blocking characteristics (e.g., different patterns of missing values, observation noise levels, and effective intrinsic dimensionalities). We propose a new machine learning tool, heterogeneous component analysis (HCA), for feature extraction in order to better understand the factors that underlie such complex structured heterogeneous data. HCA is a linear block-wise sparse Bayesian PCA based not only on a probabilistic model with block-wise residual variance terms but also on a Bayesian treatment of a block-wise sparse factor-loading matrix. We study various algorithms that implement our HCA concept, extracting sparse heterogeneous structure by obtaining common components for the blocks and specific components within each block. Simulations on toy and bioinformatics data underline the usefulness of the proposed structured matrix factorization concept.
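The sketch below illustrates the structure HCA models, and nothing more. It samples from a linear factor model $x = Wz + \varepsilon$ with a separate residual variance for each block. The loading matrix $W$ has two kinds of columns:

- common columns, which are nonzero on every block;
- block-specific columns, which are zero outside a single block.

This is an assumed reading of the abstract's description. The Bayesian inference algorithms the paper studies are not reproduced, and all function names are hypothetical.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


def block_sparse_loadings(block_sizes: List[int], n_common: int,
                          n_specific: Dict[int, int], rng: random.Random) -> Matrix:
    """Return a d x k loading matrix W whose first n_common columns load on every
    block and whose remaining columns load only on the rows of a single block."""
    d = sum(block_sizes)
    starts = [sum(block_sizes[:b]) for b in range(len(block_sizes))]
    columns = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_common)]
    for b, count in n_specific.items():
        lo, hi = starts[b], starts[b] + block_sizes[b]
        for _ in range(count):
            columns.append([rng.gauss(0.0, 1.0) if lo <= r < hi else 0.0
                            for r in range(d)])
    return [list(row) for row in zip(*columns)]


def sample(w: Matrix, block_sizes: List[int], noise_sd: List[float],
           rng: random.Random) -> List[float]:
    """Draw one observation x = W z + eps, z ~ N(0, I), with a separate
    residual standard deviation for each block."""
    k = len(w[0])
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    row_sd = [sd for size, sd in zip(block_sizes, noise_sd) for _ in range(size)]
    return [sum(wr[j] * z[j] for j in range(k)) + rng.gauss(0.0, sd)
            for wr, sd in zip(w, row_sd)]
```

For example, `block_sparse_loadings([3, 4], 1, {0: 1, 1: 2}, rng)` gives a 7 x 4 matrix. Column 0 is common, column 1 is specific to the first block and columns 2 and 3 are specific to the second block. An HCA-style method would try to recover this zero pattern, together with the per-block noise levels, from samples.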

artificial intelligence, health & medicine, matrix

Country:

Asia > Japan (0.14)
Europe > Germany (0.14)
North America > United States (0.14)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Data Science > Data Mining > Feature Extraction (0.55)

Langford, John, Zhang, Tong

The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

We present Epoch-Greedy, an algorithm for multi-armed bandits with observable side information. Epoch-Greedy has the following properties: No knowledge of a time horizon $T$ is necessary. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. The regret scales as $O(T^{2/3} S^{1/3})$ or better (sometimes, much better). Here $S$ is the complexity term in a sample complexity bound for standard supervised learning.
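The abstract lists properties of Epoch-Greedy rather than its steps. The simulation below is a minimal sketch that assumes a finite hypothesis class containing all maps from discrete contexts to arms. Each epoch has two phases:

- one exploration round, in which an arm is chosen uniformly at random;
- a run of exploitation rounds, which follow the hypothesis with the highest importance-weighted empirical reward.

The exploitation length $s(\ell) = \lceil \sqrt{\ell} \rceil$ for epoch $\ell$ is an assumed schedule chosen for illustration; the paper analyzes a data-dependent choice instead. The horizon `T` is used only to stop the simulation, which mirrors the property that no knowledge of $T$ is needed.

```python
import itertools
import math
import random
from typing import Callable, List, Tuple

Policy = Tuple[int, ...]  # policy[c] = arm played in context c


def epoch_greedy(T: int, n_contexts: int, n_arms: int,
                 draw_context: Callable[[random.Random], int],
                 draw_reward: Callable[[int, int, random.Random], float],
                 rng: random.Random) -> Tuple[Policy, float]:
    """Simulate Epoch-Greedy for T rounds; T only stops the simulation."""
    # Hypothesis class: every deterministic map from contexts to arms.
    hypotheses = list(itertools.product(range(n_arms), repeat=n_contexts))
    explored: List[Tuple[int, int, float]] = []
    best: Policy = hypotheses[0]
    total, t, epoch = 0.0, 0, 0
    while t < T:
        epoch += 1
        # Exploration: one round with a uniformly random arm.
        c = draw_context(rng)
        a = rng.randrange(n_arms)
        r = draw_reward(c, a, rng)
        explored.append((c, a, r))
        total, t = total + r, t + 1
        # Learning: importance-weighted empirical reward (weight n_arms = 1/p(a)).
        best = max(hypotheses, key=lambda h: sum(
            n_arms * rr for cc, aa, rr in explored if h[cc] == aa))
        # Exploitation: follow `best` for s(epoch) rounds (assumed s = ceil(sqrt(epoch))).
        for _ in range(min(math.ceil(math.sqrt(epoch)), T - t)):
            c = draw_context(rng)
            total += draw_reward(c, best[c], rng)
            t += 1
    return best, total / T
```

Consider three contexts and three arms, with reward probability 0.9 when the arm matches the context and 0.1 otherwise. In that setting the learned policy should settle on the identity map `(0, 1, 2)`, and most rounds are spent exploiting it.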

artificial intelligence, big data, epoch-greedy

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Burghouts, Gertjan, Smeulders, Arnold, Geusebroek, Jan-mark

The Distribution Family of Similarity Distances

Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge of the distribution of distances generated by similarity functions is crucial in deciding whether features are similar or not. Intuitively, one would expect that similarities between features could arise from any distribution. In this paper, we derive the contrary and report the theoretical result that $L_p$-norms (a class of commonly applied distance metrics) from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. These assumptions are realistic for images, and we experimentally show them to hold for various popular feature extraction algorithms on a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms. Erratum: The authors of the paper have declared that they have become convinced that the reasoning in the reference is too simple to serve as a proof of their claims. As a consequence, they withdraw their theorems.
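Because the authors have withdrawn their theorems, the Weibull claim can only be treated as an empirical hypothesis. The sketch below shows one way to check it, using hypothetical correlated feature vectors:

1. Compute $L_p$ distances from one query vector to many other vectors.
2. Fit a Weibull distribution by maximum likelihood, using bisection on the standard profile-likelihood equation for the shape parameter.

A good fit on real features would be evidence, not proof. `lp_distance` and `fit_weibull` are illustrative names.

```python
import math
import random
from typing import List, Sequence, Tuple


def lp_distance(u: Sequence[float], v: Sequence[float], p: float) -> float:
    """L_p distance between two feature vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def fit_weibull(xs: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood (shape k, scale lam) of a Weibull fit to positive samples."""
    top = max(xs)
    ys = [x / top for x in xs]            # rescale to (0, 1]; the shape is scale-invariant
    logs = [math.log(y) for y in ys]
    mean_log = sum(logs) / len(logs)

    def score(k: float) -> float:
        # Profile-likelihood equation for k; it is increasing in k.
        w = [y ** k for y in ys]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 100.0
    for _ in range(100):                  # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = (sum(y ** k for y in ys) / len(ys)) ** (1.0 / k)
    return k, lam * top                   # undo the rescaling for the scale


if __name__ == "__main__":
    rng = random.Random(0)

    def feature_vector() -> List[float]:
        # Hypothetical correlated, non-identically distributed features.
        g = rng.gauss(0.0, 1.0)
        return [(i + 1) * (g + rng.gauss(0.0, 1.0)) for i in range(16)]

    query = feature_vector()
    dists = [lp_distance(query, feature_vector(), p=2.0) for _ in range(1000)]
    print("Weibull fit (shape, scale):", fit_weibull(dists))
```

A fuller check would compare the fitted distribution against the empirical one, for example with a Kolmogorov–Smirnov statistic, across several feature extractors and values of $p$.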

artificial intelligence, data mining, statistics

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Data Science > Data Mining > Feature Extraction (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.42)

Kearns, Michael, Tan, Jinsong, Wortman, Jennifer

Privacy-Preserving Belief Propagation and Sampling

We provide provably privacy-preserving versions of belief propagation, Gibbs sampling, and other local algorithms: distributed multiparty protocols in which each party or vertex learns only its final local value and absolutely nothing else.
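The privacy protocols themselves are too involved for a short example, but the local computation they protect is easy to show. The sketch below is plain, non-private sum-product belief propagation on a chain of discrete variables. Each vertex's outgoing message depends only on its own potential and the message it receives from its neighbor. A brute-force enumeration checks the result. The chain structure, the shared edge potential and the function names are assumptions for illustration.

```python
import itertools
from typing import List

Table = List[List[float]]


def chain_marginals(unary: Table, pair: Table) -> Table:
    """Sum-product belief propagation on a chain x_0 - x_1 - ... - x_{n-1}.

    unary[i][s] is vertex i's local potential for state s; pair[s][t] is the
    edge potential between neighbors in states s and t (shared by all edges).
    Each message depends only on a neighbor's message and local potential."""
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]   # fwd[i]: message from i-1 into i
    bwd = [[1.0] * k for _ in range(n)]   # bwd[i]: message from i+1 into i
    for i in range(1, n):
        fwd[i] = [sum(fwd[i - 1][s] * unary[i - 1][s] * pair[s][t] for s in range(k))
                  for t in range(k)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(bwd[i + 1][s] * unary[i + 1][s] * pair[t][s] for s in range(k))
                  for t in range(k)]
    beliefs = []
    for i in range(n):
        b = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs


def brute_force_marginals(unary: Table, pair: Table) -> Table:
    """Reference marginals obtained by enumerating every joint configuration."""
    n, k = len(unary), len(unary[0])
    acc = [[0.0] * k for _ in range(n)]
    for xs in itertools.product(range(k), repeat=n):
        w = 1.0
        for i, s in enumerate(xs):
            w *= unary[i][s]
        for i in range(n - 1):
            w *= pair[xs[i]][xs[i + 1]]
        for i, s in enumerate(xs):
            acc[i][s] += w
    return [[v / sum(row) for v in row] for row in acc]
```

On a tree such as a chain, sum-product is exact, so the two functions should agree up to rounding. The paper's contribution is carrying out these message computations so that each party learns only its own final belief.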

bayesian inference, belief revision, protocol

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)

Industry: Information Technology > Security & Privacy (0.94)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Lelu, Alain, Cuxac, Pascal, Johansson, Joel

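Dynamic Classification of a Document Stream: A Preliminary Static Evaluation of the GERMEN Algorithm (French title: Classification dynamique d'un flux documentaire : une évaluation statique préalable de l'algorithme GERMEN)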

arXiv.org Artificial Intelligence, Nov-4-2008

Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most past and present research effort aims at scaling up efficiently to huge data repositories. Our approach focuses on qualitative improvement, mainly for detecting "weak signals" and precisely tracking topical evolution in the context of information watch, although scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively identifies the whole set of density peaks of the data at time t by detecting the local perturbations induced by the current document vector, such as changing cluster borders or new/vanishing clusters. Optimality follows from the uniqueness of 1) the density landscape for any value of our zoom parameter, and 2) the cluster allocation performed by our border propagation rule. As a result, the output is rigorously independent of the order in which the data are presented and of any initialization parameter. As a first step, we present here only an assessment of a static view resulting from one year of the CNRS/INIST Pascal database in the field of geotechnics.
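The abstract does not spell out GERMEN's incremental update or its border propagation rule. The sketch below therefore illustrates only the two properties it emphasizes:

- density peaks are found for a given zoom parameter, modelled here as the width of a Gaussian kernel;
- the cluster allocation does not depend on the order in which the data are presented.

Each point links to the densest point within distance `zoom`. Ties are broken by coordinates, and points are processed in a canonical sorted order. This generic density-peak scheme is swapped in for illustration; it is not the GERMEN algorithm.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


def density_peak_clusters(points: Sequence[Point], zoom: float) -> Dict[Point, Point]:
    """Map every distinct point to the density peak it climbs to.

    Density uses a Gaussian kernel of width `zoom`; each point links to the
    densest point within distance `zoom` (possibly itself); points linked to
    themselves are density peaks."""
    pts = sorted(set(points))             # canonical order: result ignores input order

    def dist2(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    dens = {p: sum(math.exp(-dist2(p, q) / zoom ** 2) for q in pts) for p in pts}
    parent = {}
    for p in pts:
        nbrs = [q for q in pts if dist2(p, q) <= zoom ** 2]
        parent[p] = max(nbrs, key=lambda q: (dens[q], q))   # ties broken by coordinates

    def peak(p: Point) -> Point:
        while parent[p] != p:             # (density, coordinates) strictly increases
            p = parent[p]
        return p

    return {p: peak(p) for p in pts}
```

Changing `zoom` changes the density landscape and therefore the number of peaks. For a fixed `zoom`, however, the allocation is unique, which is the kind of order independence the abstract claims for GERMEN.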

artificial intelligence, class, data mining

0811.0602

Country:

North America > United States (0.28)
Europe (0.28)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Lelu, Alain, Cadot, Martine, Cuxac, Pascal

Document stream clustering: experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends

arXiv.org Artificial Intelligence, Nov-3-2008

We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm that is independent of any initial conditions and of the ordering of the data-vector stream; 2) the cognitive challenge: we have implemented a stringent selection process of association rules between clusters at time t-1 and time t for directly generating the main conclusions about the dynamics of a data stream. We illustrate these points with an application to a scientific information database covering two years and 2,600 documents.
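The sketch below is a hypothetical example of rules between clusters at time $t-1$ and time $t$. It assumes that each cluster is a set of document identifiers and that documents present at $t-1$ remain in the stream at $t$. A rule $A \Rightarrow B$ then has support $|A \cap B|$ and confidence $|A \cap B| / |A|$, and only rules that clear both thresholds are kept. The thresholds and the set-of-documents representation are assumptions; the paper's own selection criteria may differ.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[str, str, int, float]  # (cluster at t-1, cluster at t, support, confidence)


def transition_rules(prev: Dict[str, Set[int]], curr: Dict[str, Set[int]],
                     min_support: int = 2, min_conf: float = 0.5) -> List[Rule]:
    """Keep rules A => B between clusters at t-1 and t above both thresholds."""
    rules = []
    for a, docs_a in prev.items():
        for b, docs_b in curr.items():
            shared = len(docs_a & docs_b)          # documents in both clusters
            if docs_a and shared >= min_support:
                conf = shared / len(docs_a)        # share of A's documents now in B
                if conf >= min_conf:
                    rules.append((a, b, shared, conf))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

Suppose a cluster's documents divide evenly between two clusters at time $t$. Two rules of confidence 0.5 then appear, which reads as a split. A single rule of confidence 1.0 reads as a stable topic.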

artificial intelligence, itemset, upstream oil & gas

0811.0340

Country:

Europe (0.68)
North America > United States > California > San Mateo County > Menlo Park (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, Oct-30-2008

A Novel Clustering Algorithm Based on a Modified Model of Random Walk

Li, Qiang, He, Yan, Jiang, Jing-ping

We introduce a modified model of random walk and then develop two novel clustering algorithms based on it. In the algorithms, each data point in a dataset is considered as a particle that can move at random in space according to the preset rules of the modified model. Further, each data point may also be viewed as a local control subsystem, in which the controller adjusts its transition probability vector according to the feedback from all data points, and its transition direction is then identified by an event-generating function. Finally, the positions of all data points are updated. As they move in space, data points gradually gather and separations between groups emerge automatically. As a consequence, data points that belong to the same class end up at the same position, whereas those that belong to different classes stay far from one another. Moreover, the experimental results demonstrate that data points in the test datasets are clustered reasonably and efficiently, and the comparison with other algorithms also indicates the effectiveness of the proposed algorithms.
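The abstract does not specify the preset movement rules, the feedback controller or the event-generating function. The sketch below is deliberately simplified. At each step, every point picks another point with probability proportional to a Gaussian kernel of width `sigma` and moves a fraction `step` of the way toward it. This reproduces the behavior the abstract describes: points of the same group collapse to one position, while well-separated groups stay apart. All parameters and names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def random_walk_cluster(points: Sequence[Vec], sigma: float = 1.0, step: float = 0.5,
                        iters: int = 50, seed: int = 0) -> List[Vec]:
    """Move every point toward a randomly chosen other point; nearer points are
    chosen with higher probability (Gaussian kernel of width sigma)."""
    rng = random.Random(seed)
    pos = [tuple(p) for p in points]
    for _ in range(iters):
        new = []
        for i, p in enumerate(pos):
            weights = [0.0 if j == i else math.exp(-math.dist(p, q) ** 2 / sigma ** 2)
                       for j, q in enumerate(pos)]
            if sum(weights) == 0.0:        # isolated point: stay put
                new.append(p)
                continue
            q = pos[rng.choices(range(len(pos)), weights=weights)[0]]
            new.append((p[0] + step * (q[0] - p[0]), p[1] + step * (q[1] - p[1])))
        pos = new                          # synchronous update of all positions
    return pos


def group_labels(pos: Sequence[Vec], tol: float = 1e-3) -> List[int]:
    """Points that ended up at (numerically) the same position share a label."""
    reps: List[Vec] = []
    labels = []
    for p in pos:
        for k, r in enumerate(reps):
            if math.dist(p, r) < tol:
                labels.append(k)
                break
        else:
            reps.append(p)
            labels.append(len(reps) - 1)
    return labels
```

Take two groups of four points about 14 units apart and set `sigma = 1.0`. Points then almost never choose a partner in the other group, so `group_labels` should return one label per group. If `sigma` is comparable to the gap between groups, they merge.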

artificial intelligence, data mining, particle

0810.5484

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.74)