AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.58)

AAAI Conferences, May-8-2016

TAO: System for Table Detection and Extraction from PDF Documents

Perez-Arriaga, Martha O. (University of New Mexico) | Estrada, Trilce (University of New Mexico) | Abad-Mota, Soraya (University of New Mexico)

Digital documents present knowledge in most areas of study, exchanging and communicating information in a portable way. To better use the knowledge embedded in an ever-growing information source, effective tools for automatic information extraction are needed. Tables are crucial information elements in documents of scientific nature. Most publications use tables to represent and report concrete findings of research. Current methods used to extract table data from PDF documents lack precision in detecting, extracting, and representing data from diverse layouts. We present the system TAble Organization (TAO) to automatically detect, extract, and organize information from tables in PDF documents. TAO uses a processing step, based on the k-nearest neighbor method and layout heuristics, to detect tables within a document and to extract table information. This system generates an enriched representation of the data extracted from tables in the PDF documents. TAO’s performance is comparable to other table extraction methods, but it overcomes some limitations of related work and proves to be more robust in experiments with diverse document layouts.

pdf document, table detection and extraction, tao

The Twenty-Ninth International FLAIRS Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.53)

AAAI Conferences, May-8-2016

Parallelizing Instance-Based Data Classifiers

Rahal, Imad (College of Saint Benedict and Saint John's University) | Furst, Emily (University of Washington) | Haraty, Ramzi (Lebanese American University)

In the age of big data, producing results quickly while operating over vast volumes of data has become a vital requirement for data mining and machine learning applications, to a degree that traditional serial algorithms can no longer keep up with these constraints. This paper applies different forms of parallelization techniques to popular instance-based classifiers (namely, a special form of naive Bayes and k-nearest neighbors) in an attempt to compare performance and draw broad conclusions applicable to instance-based classifiers. Overall, our experimental results strongly indicate that parallelism over test instances provides the most speedup in most cases compared to other forms of parallelism.
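The idea of parallelism over test instances can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy data, and the choice of a thread pool are all assumptions made for the example. Each test instance is classified independently, so the work divides naturally across workers.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def knn_predict(train, query, k=3):
    """Classify one query by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_parallel(train, queries, k=3, workers=4):
    """Parallelism over test instances: every query is an independent task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the queries in its results.
        return list(pool.map(lambda q: knn_predict(train, q, k), queries))


# Hypothetical toy data: (features, label) pairs in two well-separated groups.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")]
queries = [(0.05, 0.05), (5.0, 5.1), (0.3, 0.3)]
predictions = classify_parallel(train, queries)
```

Threads keep the sketch short and runnable, but in CPython they will not speed up this CPU-bound loop; a process pool or a compiled implementation would be needed to observe real speedup.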

parallelizing instance-based data classifier

The Twenty-Ninth International FLAIRS Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.53)

AAAI Conferences, May-8-2016

EigenTransitions with Hypothesis Testing: The Anatomy of Urban Mobility

Zhang, Ke (University of Pittsburgh) | Lin, Yu-Ru (University of Pittsburgh) | Pelechrinis, Konstantinos (University of Pittsburgh)

Identifying the patterns in urban mobility is important for a variety of tasks such as transportation planning, urban resource allocation, and emergency planning. This is evident from the large body of research on the topic, which has exploded with the vast amount of geo-tagged user-generated content from online social media. However, most of the existing work focuses on a specific setting, taking a statistical approach to describe and model the observed patterns. In contrast, in this work we introduce EigenTransitions, a spectrum-based, generic framework for analyzing spatio-temporal mobility datasets. EigenTransitions capture the anatomy of the aggregate and/or individuals’ mobility as a compact set of latent mobility patterns. Using a large corpus of geo-tagged content collected from Twitter, we utilize EigenTransitions to analyze the structure of urban mobility. In particular, we identify the EigenTransitions of a flow network between urban areas and derive a hypothesis testing framework to evaluate urban mobility from both temporal and demographic perspectives. We further show how EigenTransitions not only identify latent mobility patterns, but also have the potential to support applications such as mobility prediction and inter-city comparisons. In particular, by identifying neighbors with similar latent mobility patterns and incorporating their historical transition behaviors, we propose an EigenTransitions-based k-nearest neighbor algorithm, which can significantly improve the performance of individual mobility prediction. The proposed method is especially effective in “cold-start” scenarios where traditional methods are known to perform poorly.

hypothesis testing, machine learning, scientific discovery, (4 more...)

Tenth International AAAI Conference on Web and Social Media

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.60)

#artificialintelligence, May-6-2016, 13:46:23 GMT

Recognizing Snacks using SimpleCV

This article aims to provide the basic knowledge of how to recognize snacks by using Python and SimpleCV. Readers will gain practical programming knowledge via experimentation with the Python scripts included in the Snack Classifier open source project. As an illustration of a snack recognition app, the Snack Watcher watches any snacks present on the snack table. For Snack Watcher to determine if there was an interesting event, it needs to process the image into a set of image "Blobs". For each "Blob", Snack Watcher compares the "Blob" with its previous state to determine if the "Blob" was added, removed, or stationary.
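The added/removed/stationary comparison described above might look roughly like the sketch below. It deliberately avoids the SimpleCV API: blobs are reduced to hypothetical (x, y) centroid tuples, and the function name and pixel tolerance are assumptions for illustration, not taken from the Snack Classifier project.

```python
import math


def classify_blobs(previous, current, tolerance=10.0):
    """Label blobs as added, removed, or stationary by matching centroids
    between the previous frame and the current frame."""
    events = {"added": [], "removed": [], "stationary": []}
    unmatched = list(previous)  # previous blobs not yet seen in the current frame
    for blob in current:
        match = next((p for p in unmatched if math.dist(p, blob) <= tolerance), None)
        if match is None:
            events["added"].append(blob)       # no earlier blob nearby
        else:
            events["stationary"].append(blob)  # same place as before
            unmatched.remove(match)
    events["removed"] = unmatched              # earlier blobs that disappeared
    return events


# Hypothetical centroids from two consecutive frames.
previous_frame = [(100, 120), (300, 80)]
current_frame = [(102, 119), (450, 200)]
events = classify_blobs(previous_frame, current_frame)
```

A real watcher would likely compare more than position (area, color, shape), but centroid matching is enough to show the three-way decision.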

artificial intelligence, classifier, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.32)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.30)

#artificialintelligence, May-1-2016, 15:51:11 GMT

K Nearest Neighbors Application - Practical Machine Learning Tutorial with Python p.14

In the last part we introduced Classification, which is a supervised form of machine learning, and explained the K Nearest Neighbors algorithm intuition. In this tutorial, we're actually going to apply a simple example of the algorithm using Scikit-Learn, and then in the subsequent tutorials we'll build our own algorithm to learn more about how it works under the hood. To exemplify classification, we're going to use a Breast Cancer Dataset, which is a dataset donated to the University of California, Irvine (UCI) collection from the University of Wisconsin-Madison. UCI has a large Machine Learning Repository.
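The train, predict, and score workflow behind such a tutorial can be sketched in plain Python. This is not the tutorial's Scikit-Learn code and it does not use the UCI data: the two synthetic clusters, their labels, and the 80/20 split below are invented purely to make the example self-contained.

```python
import math
import random
from collections import Counter


def knn_predict(train, features, k=3):
    """Majority vote among the k training rows closest to `features`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out rows whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Synthetic, well-separated stand-in data (NOT the breast cancer dataset).
rng = random.Random(0)
benign = [((rng.uniform(1, 3), rng.uniform(1, 3)), "benign") for _ in range(20)]
malignant = [((rng.uniform(7, 9), rng.uniform(7, 9)), "malignant") for _ in range(20)]
rows = benign + malignant
rng.shuffle(rows)

# Hold out 20% of the rows for testing.
split = int(0.8 * len(rows))
train, test = rows[:split], rows[split:]
score = accuracy(train, test)
```

Because the synthetic clusters do not overlap, the held-out accuracy here is perfect; real data such as the UCI set would not separate this cleanly.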

artificial intelligence, nearest neighbor application, practical machine learning tutorial, (2 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.32)
North America > United States > California > Orange County > Irvine (0.32)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

@machinelearnbot, Apr-22-2016, 10:00:27 GMT

Clustering idea for very large datasets

Let's say you have to cluster 10 million points, for instance keywords. So, in short, you can perform k-NN (k-nearest neighbors) clustering or some other type of clustering, which is typically O(n^2) or worse in computational complexity. Has anyone ever used a clustering method based on sampling? The idea is to start by sampling 1% (or less) of the 100,000,000 entries, and perform clustering on these pairs of keywords, to create a "seed" or "baseline" cluster structure. The next step is to browse your 10,000,000 keywords sequentially and, for each keyword, find the closest cluster in the baseline cluster structure.
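A rough sketch of the two-step idea (cluster a small sample, then assign everything to the nearest baseline cluster) is shown below. Several things are assumptions for illustration: the points are synthetic 2-D coordinates rather than keyword pairs, the seed step uses a simple greedy "leader" clustering as a stand-in for whatever method is run on the sample, and the radius is an arbitrary parameter.

```python
import math
import random


def seed_clusters(sample, radius):
    """Greedy 'leader' clustering on the sample: a point starts a new cluster
    unless an existing cluster center is already within `radius`."""
    centers = []
    for point in sample:
        if all(math.dist(point, c) > radius for c in centers):
            centers.append(point)
    return centers


def assign(points, centers):
    """Sequential pass: attach each point to its closest baseline center."""
    return [min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points]


# Synthetic data: two bounded, well-separated groups of 5,000 points each.
rng = random.Random(42)
points = ([(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5000)] +
          [(rng.uniform(9, 11), rng.uniform(9, 11)) for _ in range(5000)])

sample = rng.sample(points, len(points) // 100)  # 1% sample
centers = seed_clusters(sample, radius=5.0)      # baseline cluster structure
labels = assign(points, centers)                 # one pass over all points
```

The full pass costs on the order of n times the number of baseline clusters, rather than n^2, which is the point of seeding from a sample.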

artificial intelligence, keyword, machine learning, (7 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.58)

#artificialintelligence, Apr-20-2016, 05:55:25 GMT

DIY Recommendation Engines for Mom and Pop Ecommerce Shops

Of course we have all heard about machine learning and recommendation engines in big business ecommerce. For quite some time, massive ecommerce businesses like Netflix, Amazon, and eBay have been leveraging the power of data science to improve customer service and boost sales. Where once this technology was cost-prohibitive to all but the major players, recently things have changed. Thanks to multi-channel ecommerce platforms like Shopify, and the developers who are building custom machine learning add-ons, mom and pop online businesses now get the chance to infuse their operations with the power of data science. In this article I introduce how machine learning algorithms work to produce recommendation systems for small business ecommerce.

artificial intelligence, machine learning, recommendation system, (13 more...)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.38)

AAAI Conferences, Apr-19-2016

Flattening the Density Gradient for Eliminating Spatial Centrality to Reduce Hubness

Hara, Kazuo (National Institute of Genetics) | Suzuki, Ikumi (Yamagata University) | Kobayashi, Kei (The Institute of Statistical Mathematics) | Fukumizu, Kenji (The Institute of Statistical Mathematics) | Radovanovic, Milos (University of Novi Sad)

Spatial centrality, whereby samples closer to the center of a dataset tend to be closer to all other samples, is regarded as one source of hubness. Hubness is well known to degrade k-nearest-neighbor (k-NN) classification. Spatial centrality can be removed by centering, i.e., shifting the origin to the global center of the dataset, in cases where inner product similarity is used. However, when Euclidean distance is used, centering has no effect on spatial centrality because the distance between the samples is the same before and after centering. As described in this paper, we propose a solution for the hubness problem when Euclidean distance is considered. We provide a theoretical explanation to demonstrate how the solution eliminates spatial centrality and reduces hubness. We then present some discussion of the reason the proposed solution works, from a viewpoint of density gradient, which is regarded as the origin of spatial centrality and hubness. We demonstrate that the solution corresponds to flattening the density gradient. Using real-world datasets, we demonstrate that the proposed method improves k-NN classification performance and outperforms an existing hub-reduction method.
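The observation that centering changes inner products but leaves Euclidean distances untouched can be checked numerically. The snippet below only illustrates that observation on made-up points; it does not implement the paper's proposed method, and the helper names are hypothetical.

```python
import math


def center(points):
    """Shift the origin to the global centroid of the dataset."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return [tuple(p[d] - centroid[d] for d in range(dim)) for p in points]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Made-up 2-D samples; their centroid is (4, 3).
points = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0), (7.0, 6.0)]
centered = center(points)

# The Euclidean distance between the first two samples is the same either way...
dist_before = math.dist(points[0], points[1])
dist_after = math.dist(centered[0], centered[1])

# ...while their inner product similarity changes (11 before, 2 after).
dot_before = dot(points[0], points[1])
dot_after = dot(centered[0], centered[1])
```

This is why centering can affect neighbor rankings under inner product similarity but cannot do so under Euclidean distance.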

artificial intelligence, hubness, machine learning, (15 more...)

Thirtieth AAAI Conference on Artificial Intelligence

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

@machinelearnbot, Apr-18-2016, 00:30:20 GMT

Variable dimension data? • /r/MachineLearning

You could do k-nearest-neighbor interpolation to give the empty 0 values a "guess" based on how they look in the nearest neighbors. How well this would work really just depends on the properties of the data. If dimension k can be predicted by some association with a dimension j, and this relationship between k and j is fairly strong throughout the data, then it's worth trying. If it's all over the place, this hack won't help at all; it might even produce very unreliable predictions.
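One way such a fill-in could work is sketched below. The details are assumptions rather than anything specified in the thread: 0.0 marks a missing value, similarity is Euclidean distance over the dimensions both rows have, and the guess is the mean of the k nearest rows' values in the missing dimension.

```python
import math


def knn_impute(rows, k=2, missing=0.0):
    """Fill each missing entry with the mean of that column over the k rows
    that are closest on the dimensions both rows have observed."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value != missing:
                continue
            candidates = []
            for other_idx, other in enumerate(rows):
                # A neighbor is only useful if it has a value in column j.
                if other_idx == i or other[j] == missing:
                    continue
                shared = [d for d in range(len(row))
                          if row[d] != missing and other[d] != missing]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[d] - other[d]) ** 2 for d in shared))
                candidates.append((dist, other[j]))
            if candidates:
                nearest = sorted(candidates)[:k]
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled


# Made-up rows; the second row is missing its third dimension.
rows = [
    (1.0, 2.0, 3.0),
    (1.1, 2.1, 0.0),
    (0.9, 1.9, 2.8),
    (8.0, 9.0, 10.0),
]
filled = knn_impute(rows)
```

Here the two closest rows supply 3.0 and 2.8, so the gap is filled with roughly 2.9, while the distant fourth row is ignored. As the comment above warns, the guess is only as good as the relationship between dimensions.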

artificial intelligence, machine learning, variable dimension data, (2 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.75)