PAC-Bayesian Analysis of the Exploration-Exploitation Trade-off

arXiv.org Machine Learning

We develop a coherent framework for integrative simultaneous analysis of the exploration-exploitation and model order selection trade-offs. We improve over our preceding results on the same subject (Seldin et al., 2011) by combining PAC-Bayesian analysis with a Bernstein-type inequality for martingales. Such a combination is also of independent interest for studies of multiple simultaneously evolving martingales.
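
For orientation, a standard Bernstein-type (Freedman-style) inequality for martingales has roughly the following form; this is a generic textbook statement, not the paper's exact bound, and the notation (X_i, b, v) is ours:

\Pr\left[\sum_{i=1}^{n} X_i \ge t \;\wedge\; \sum_{i=1}^{n} \mathbb{E}\!\left[X_i^2 \mid \mathcal{F}_{i-1}\right] \le v\right] \le \exp\!\left(-\frac{t^2}{2\,(v + bt/3)}\right),

where the X_i form a martingale difference sequence with |X_i| \le b almost surely. Roughly speaking, PAC-Bayesian analysis then turns such per-martingale bounds into bounds that hold simultaneously over a distribution (posterior) of policies or models.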


Graph-Based Knowledge Discovery: Compression versus Frequency

AAAI Conferences

There are two primary types of graph-based data miners: frequent subgraph miners and compression-based miners. For frequent subgraph miners, the most interesting substructure is the largest one (or ones) meeting the minimum support. Compression-based graph miners, in contrast, discover the subgraphs that maximize the amount of compression a particular substructure provides for a graph. The algorithms associated with these two approaches are not only different, but can also lead to dramatic differences in performance as well as in the normative patterns being discovered. To compare these two types of graph-based approaches to knowledge discovery, the following sections evaluate two publicly available applications: GASTON and SUBDUE.
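
As a point of reference for the compression-based criterion, SUBDUE-style miners typically score a candidate substructure S by the minimum-description-length compression it yields for the input graph G (this is the standard MDL measure, not necessarily the exact variant evaluated here):

\mathrm{value}(S) = \frac{DL(G)}{DL(S) + DL(G \mid S)},

where DL(\cdot) denotes description length and G \mid S is G with every instance of S collapsed to a single node. A frequent subgraph miner such as GASTON instead ranks substructures by their support.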


When Optimal Is Just Not Good Enough: Learning Fast Informative Action Cost Partitionings

AAAI Conferences

Several recent heuristics for domain-independent planning adopt some action cost partitioning scheme to derive admissible heuristic estimates. Given a state, two methods for obtaining an action cost partitioning have been proposed: optimal cost partitioning, which results in the best possible heuristic estimate for that state but requires substantial computational effort, and ad-hoc (uniform) cost partitioning, which is much faster but usually less informative. These two methods represent almost opposite points in the trade-off between heuristic accuracy and heuristic computation time. One compromise that has been proposed between the two is to use an optimal cost partitioning computed for the initial state to evaluate all states. In this paper, we propose a novel method for deriving a fast, informative cost-partitioning scheme that is based on computing optimal action cost partitionings for a small set of states and using these to derive heuristic estimates for all states. Our method provides greater control over the accuracy/computation-time trade-off, which, as our empirical evaluation shows, can result in better performance.
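
For context, the admissibility argument behind action cost partitioning is the standard one (notation ours): if the partitioned cost functions never exceed the original costs in total, the component heuristics can be summed admissibly,

\sum_{i=1}^{k} c_i(a) \le c(a) \ \text{ for all actions } a \quad\Longrightarrow\quad \sum_{i=1}^{k} h_i(s; c_i) \le h^{*}(s; c).

Uniform (ad-hoc) partitioning simply splits each action's cost evenly among the k heuristics (or among those heuristics to which the action is relevant), whereas optimal partitioning chooses the c_i per state by solving a linear program.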


Using Decision Trees to Find Patterns in an Ophthalmology Dataset

AAAI Conferences

We present research in decision tree analysis that studies a data set and finds new patterns that were not obvious using statistical methods. Our method is applied to a database of accommodative esotropic patients. Accommodative esotropia is an eye disease that, when left untreated, leads to blindness. Patients whose muscles deteriorate often need corrective surgery, since less invasive methods of treatment tend to fail in these patients. Using a learn-and-prune methodology, decision tree analysis of 354 accommodative esotropic patients led to the discovery of two conjunctive variables that predicted deterioration in the initial year of treatment better than what had previously been determined using standard statistical methods.
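
As a rough illustration of a learn-and-prune workflow (not the authors' code; the synthetic features, the outcome label, and the cost-complexity pruning strategy below are assumptions made purely for the sketch), one could proceed as follows in scikit-learn:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the clinical data: 354 patients, a few features,
# and a binary "deteriorated" outcome (all hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(354, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Learn a full tree, then prune it back via cost-complexity pruning,
# keeping the pruning level that generalizes best on held-out data.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
best_tree, best_acc = None, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    acc = tree.score(X_te, y_te)
    if acc > best_acc:
        best_tree, best_acc = tree, acc

The pruned tree's top splits can then be read off as candidate conjunctive predictors, in the spirit of the two conjunctive variables reported above.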


Efficient Descriptive Community Mining

AAAI Conferences

Community mining is applied in order to identify groups of users that share, for example, common interests or expertise. This paper presents an approach for mining descriptive patterns that characterize communities in terms of their distinctive features. For efficient discovery, we introduce optimistic estimates that provide an upper bound on the community quality. We present an evaluation using data from the real-world social bookmarking system BibSonomy.
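
To sketch the role of the optimistic estimates (the quality function, the bound, and the pattern refinement below are hypothetical placeholders, not the paper's definitions), top-k descriptive pattern mining can prune a branch whenever even the optimistic estimate cannot beat the current k-th best quality:

import heapq

def mine_top_k(root, refine, quality, optimistic_estimate, k):
    """Generic top-k pattern mining with optimistic-estimate pruning."""
    top_k, stack = [], [root]
    while stack:
        pattern = stack.pop()
        q = quality(pattern)
        entry = (q, id(pattern), pattern)      # id() breaks ties between patterns
        if len(top_k) < k:
            heapq.heappush(top_k, entry)
        elif q > top_k[0][0]:
            heapq.heapreplace(top_k, entry)
        # Expand only if some refinement could still enter the top k.
        threshold = top_k[0][0] if len(top_k) == k else float("-inf")
        if optimistic_estimate(pattern) > threshold:
            stack.extend(refine(pattern))
    return sorted(top_k, reverse=True)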


Co-Occurrence-Based Error Correction Approach to Word Segmentation

AAAI Conferences

To overcome the problems in Thai word segmentation, a number of word segmentation approaches have been proposed over the years. We propose a novel Thai word segmentation approach called Co-occurrence-Based Error Correction (CBEC). CBEC generates all possible segmentation candidates using the classical maximal matching algorithm and then selects the most accurate segmentation based on co-occurrence and an error correction algorithm. CBEC was trained and evaluated on the BEST 2009 corpus.
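
The candidate-generation step can be sketched as follows (a toy Latin-script dictionary stands in for a Thai lexicon, and CBEC's co-occurrence scoring and error correction are not shown):

def segmentations(text, dictionary, prefix=()):
    """Enumerate every segmentation of `text` into dictionary words."""
    if not text:
        yield list(prefix)
        return
    for end in range(1, len(text) + 1):
        word = text[:end]
        if word in dictionary:
            yield from segmentations(text[end:], dictionary, prefix + (word,))

print(list(segmentations("thisis", {"this", "is", "th", "isis"})))
# -> [['th', 'is', 'is'], ['th', 'isis'], ['this', 'is']]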


Doubly Robust Policy Evaluation and Learning

arXiv.org Artificial Intelligence

We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is the evaluation of a new policy given historical data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent the proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by large bias, whereas the latter have large variance. In this work, we leverage the strengths and overcome the weaknesses of the two approaches by applying the doubly robust technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that the doubly robust approach uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice.
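
In this contextual-bandit setting, the doubly robust value estimate for a deterministic policy \pi takes the standard form below (notation ours), combining a reward model \hat r with an estimate \hat p of the logging policy:

\hat V_{\mathrm{DR}}(\pi) = \frac{1}{n} \sum_{t=1}^{n} \left[ \hat r\big(x_t, \pi(x_t)\big) + \frac{\mathbf{1}\{\pi(x_t) = a_t\}}{\hat p(a_t \mid x_t)} \big(r_t - \hat r(x_t, a_t)\big) \right],

which remains accurate as long as either \hat r or \hat p is accurate, hence "doubly robust".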


GANC: Greedy Agglomerative Normalized Cut

arXiv.org Artificial Intelligence

This paper describes a graph clustering algorithm that aims to minimize the normalized cut criterion and has a model order selection procedure. The performance of the proposed algorithm is comparable to spectral approaches in terms of minimizing normalized cut. However, unlike spectral approaches, the proposed algorithm scales to graphs with millions of nodes and edges. The algorithm consists of three components that are processed sequentially: a greedy agglomerative hierarchical clustering procedure, model order selection, and a local refinement step. For a graph of n nodes and O(n) edges, the computational complexity of the algorithm is O(n log^2 n), a major improvement over the O(n^3) complexity of spectral methods. Experiments are performed on real and synthetic networks to demonstrate the scalability of the proposed approach, the effectiveness of the model order selection procedure, and the performance of the proposed algorithm in terms of minimizing the normalized cut metric.
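
The normalized cut objective being minimized is the standard one: for a partition of the node set into clusters A_1, ..., A_k,

\mathrm{NCut}(A_1, \dots, A_k) = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \bar{A}_i)}{\mathrm{vol}(A_i)},

where \mathrm{cut}(A_i, \bar{A}_i) is the total weight of edges leaving A_i and \mathrm{vol}(A_i) is the total degree of its nodes. A greedy agglomerative scheme of this kind typically merges, at each step, the pair of clusters whose merge is most favorable under this criterion.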


Robust Clustering Using Outlier-Sparsity Regularization

arXiv.org Machine Learning

Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures, rendering their outcome unreliable. This paper develops robust clustering algorithms that not only aim to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Capitalizing on the sparsity in the outlier domain, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies in identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with respect to their non-robust counterparts. Kernelized versions of the robust clustering algorithms are also developed to efficiently handle high-dimensional data, identify nonlinearly separable clusters, or even cluster objects that are not represented by vectors. Numerical tests on both synthetic and real datasets validate the performance and applicability of the novel algorithms.
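
A representative form of such an outlier-aware K-means objective (our paraphrase of the general idea, not necessarily the paper's exact formulation) augments each data point x_i with an outlier vector o_i and penalizes the o_i with a group-sparsity term:

\min_{\{m_k\}, \{o_i\}} \; \sum_{i=1}^{N} \min_{k} \big\| x_i - m_k - o_i \big\|_2^2 \; + \; \lambda \sum_{i=1}^{N} \| o_i \|_2,

so that o_i = 0 for most points, while points whose o_i is driven away from zero are flagged as outliers; block coordinate descent then alternates over cluster assignments, centroids m_k, and outlier vectors o_i.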


Convex Approaches to Model Wavelet Sparsity Patterns

arXiv.org Machine Learning

Statistical dependencies among wavelet coefficients are commonly represented by graphical models such as hidden Markov trees (HMTs). However, in linear inverse problems such as deconvolution, tomography, and compressed sensing, the presence of a sensing or observation matrix produces a linear mixing of the simple Markovian dependency structure. This leads to reconstruction problems that are non-convex optimizations. Past work has dealt with this issue by resorting to greedy or suboptimal iterative reconstruction methods. In this paper, we propose new modeling approaches based on group-sparsity penalties that lead to convex optimizations which can be solved exactly and efficiently. We show that the methods we develop perform significantly better in deconvolution and compressed sensing applications, while being as computationally efficient as standard coefficient-wise approaches such as the lasso.
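
A generic group-sparsity reconstruction of this flavor can be written as the convex program below (notation ours; the particular grouping over related wavelet coefficients is one natural choice, not necessarily the paper's exact penalty):

\min_{\theta} \; \tfrac{1}{2} \| y - \Phi W \theta \|_2^2 \; + \; \lambda \sum_{g \in \mathcal{G}} \| \theta_g \|_2,

where \Phi is the sensing or observation matrix, W the wavelet synthesis operator, \theta the wavelet coefficients, and each group g \in \mathcal{G} ties together coefficients (e.g., parent-child pairs in the wavelet tree) that tend to be jointly active. Unlike the coefficient-wise lasso penalty \lambda \|\theta\|_1, the group norm couples the coefficients within each group, encouraging them to be zero or nonzero together.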