AITopics

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

AAAI Conferences, Mar-6-2015

Mining User Intents in Twitter: A Semi-Supervised Approach to Inferring Intent Categories for Tweets

Wang, Jinpeng (Peking University) | Cong, Gao (Nanyang Technological University) | Zhao, Xin Wayne (Renmin University of China) | Li, Xiaoming (Peking University)

In this paper, we study the problem of identifying intent tweets and classifying them into intent categories. For example, the tweet “I wanna buy a new car” indicates the user’s intent to buy a car. Identifying such intent tweets has great commercial value, among other applications. In particular, it is important to be able to distinguish between different types of intent tweets. We propose to classify intent tweets into six categories, namely Food & Drink, Travel, Career & Education, Goods & Services, Event and Activities, and Trifle. We propose a semi-supervised learning approach to categorizing intent tweets into these six categories. We construct a test collection using a bootstrap method. Our experimental results show that our approach is effective in inferring intent categories for tweets.
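The abstract does not say which semi-supervised method is used. As a rough illustration only, the sketch below shows generic self-training: a multinomial naive Bayes classifier is trained on a few labeled tweets, then repeatedly pseudo-labels the unlabeled tweets it is most confident about and retrains on them. The base classifier, the confidence `threshold`, the function names and the example tweets are all hypothetical choices, not the authors' approach; only the category names come from the abstract.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with Laplace smoothing `alpha`."""
    vocab = {w for d in docs for w in d.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d.split())
    params = {
        c: (math.log(n / len(labels)), word_counts[c], sum(word_counts[c].values()))
        for c, n in class_counts.items()
    }
    return params, vocab, alpha


def predict_nb(model, doc):
    """Return (best class, posterior probability of that class)."""
    params, vocab, alpha = model
    scores = {}
    for c, (log_prior, counts, total) in params.items():
        s = log_prior
        for w in doc.split():
            if w in vocab:  # words never seen in training are ignored
                s += math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
        scores[c] = s
    best = max(scores, key=scores.get)
    # Posterior of the best class: 1 / sum_c exp(score_c - score_best).
    z = sum(math.exp(v - scores[best]) for v in scores.values())
    return best, 1.0 / z


def self_train(labeled, unlabeled, threshold=0.7, max_rounds=5):
    """Hypothetical self-training loop: pseudo-label confident tweets, retrain."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(docs, labels)
        confident, rest = [], []
        for d in pool:
            y, p = predict_nb(model, d)
            (confident if p >= threshold else rest).append((d, y))
        if not confident:
            break
        for d, y in confident:
            docs.append(d)
            labels.append(y)
        pool = [d for d, _ in rest]
    return train_nb(docs, labels)
```

For example, training on `("i wanna eat pizza", "Food & Drink")` and `("want to fly to paris", "Travel")` with the unlabeled tweet `"fly to rome soon"` lets the final model assign `"rome"` to Travel, a word no labeled tweet contained. The threshold trades pseudo-label coverage against label noise.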

information retrieval, machine learning, natural language

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Asia (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

AAAI Conferences, Mar-6-2015

Kernel Density Estimation for Text-Based Geolocation

Hulden, Mans (University of Colorado Boulder) | Silfverberg, Miikka (University of Helsinki) | Francom, Jerid (Wake Forest University)

Text-based geolocation classifiers often operate with a grid-based view of the world. Predicting a document's location of origin from its text content on a geodesic grid is computationally attractive, since many standard methods for supervised document classification carry over unchanged to geolocation in the form of predicting the most probable grid cell for a document. However, the grid-based approach suffers from data sparsity if one tries to improve classification accuracy by moving to smaller cell sizes. In this paper, we investigate an enhancement of common methods for determining the geographic point of origin of a text document by kernel density estimation. For geolocation of tweets, we obtain improvements over non-kernel methods on datasets of U.S. and global Twitter content.
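To illustrate the general idea, here is a minimal sketch, not the authors' exact model, of smoothing word distributions over a grid with a Gaussian kernel: each training tweet spreads its words' mass over nearby cell centers instead of only its own cell, which is what eases the sparsity problem for small cells. Treating latitude/longitude as planar coordinates, the `bandwidth` value, the `eps` floor and the function names are simplifying assumptions; a real system would use geodesic distances.

```python
import numpy as np


def fit_word_densities(docs, coords, grid, bandwidth=1.0):
    """Kernel-smoothed distribution of each word over grid cell centers."""
    coords = np.asarray(coords, dtype=float)  # (n_docs, 2) training locations
    grid = np.asarray(grid, dtype=float)      # (n_cells, 2) cell centers
    # Squared planar distance from every training point to every cell center.
    d2 = ((coords[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    kern = np.exp(-0.5 * d2 / bandwidth ** 2)  # Gaussian kernel, (n_docs, n_cells)
    dens = {}
    for i, doc in enumerate(docs):
        for w in set(doc.split()):
            dens[w] = dens.get(w, 0.0) + kern[i]
    # Normalize so each word's density sums to 1 over the grid.
    return {w: v / v.sum() for w, v in dens.items()}


def predict_cell(doc, dens, n_cells, eps=1e-6):
    """Index of the cell maximizing the sum of log word densities."""
    score = np.zeros(n_cells)
    for w in doc.split():
        if w in dens:  # unseen words contribute nothing
            score += np.log(dens[w] + eps)
    return int(np.argmax(score))
```

A larger `bandwidth` shares evidence across more distant cells, so it plays the role of a smoothing parameter that would be tuned on held-out data.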

machine learning, natural language, tweet

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States > Colorado (0.14)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Waegeman, Willem, Dembczynski, Krzysztof, Jachnik, Arkadiusz, Cheng, Weiwei, Hullermeier, Eyke

On the Bayes-optimality of F-measure maximizers

arXiv.org Machine Learning, Mar-6-2015

The F-measure, which was originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure is statistically and computationally challenging, since no closed-form solution exists. Adopting a decision-theoretic perspective, this article provides a formal and experimental analysis of different approaches for maximizing the F-measure. We start with a Bayes-risk analysis of related loss functions, such as Hamming loss and subset zero-one loss, showing that optimizing such losses as a surrogate of the F-measure leads to a high worst-case regret. Subsequently, we perform a similar type of analysis for F-measure maximizing algorithms, showing that such algorithms are approximate and rely on additional assumptions regarding the statistical distribution of the binary response variables. Furthermore, we present a new algorithm that is not only computationally efficient but also Bayes-optimal, regardless of the underlying distribution. To this end, the algorithm requires only a number of parameters of the joint distribution that is quadratic in the number of binary responses. We illustrate the practical performance of all analyzed methods by means of experiments with multi-label classification problems.
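The quadratic parameterization can be made concrete. The sketch below assumes the m-by-m parameters are the joint probabilities P(y_i = 1, |y| = s), plus P(y = 0); we believe this corresponds to the general F-measure maximizer (GFM) from this line of work, but it should be read as an illustration, not the paper's reference implementation. For a prediction with k positive labels, the expected F1 is a sum of per-label terms, so the best such prediction takes the k labels with the largest terms. The helper `gfm_params`, which derives the parameters from an explicit distribution, exists only for checking.

```python
import numpy as np


def gfm_params(dist, m):
    """Quadratic parameterization from an explicit joint distribution.

    dist maps label tuples y in {0,1}^m to probabilities. Returns P with
    P[i, s-1] = P(y_i = 1, sum(y) = s) and p_zero = P(y = 0).
    """
    P = np.zeros((m, m))
    p_zero = 0.0
    for y, prob in dist.items():
        s = sum(y)
        if s == 0:
            p_zero += prob
        for i, yi in enumerate(y):
            if yi:
                P[i, s - 1] += prob
    return P, p_zero


def gfm(P, p_zero):
    """Return (prediction h, expected F1 of h) maximizing expected F1."""
    m = P.shape[0]
    sizes = np.arange(1, m + 1)
    W = 2.0 / (sizes[:, None] + sizes[None, :])  # W[s-1, k-1] = 2 / (s + k)
    delta = P @ W  # delta[i, k-1]: contribution of label i to E[F1] when |h| = k
    # The all-zero prediction scores F1 = 1 only when y = 0 as well.
    best_h, best_val = np.zeros(m, dtype=int), p_zero
    for k in range(1, m + 1):
        top = np.argsort(-delta[:, k - 1])[:k]  # k largest terms for this size
        val = delta[top, k - 1].sum()
        if val > best_val:
            best_h = np.zeros(m, dtype=int)
            best_h[top] = 1
            best_val = val
    return best_h, float(best_val)
```

The search costs one m-by-m matrix product plus a sort per column, instead of enumerating all 2^m candidate label vectors.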

information retrieval, machine learning, natural language

arXiv.org Machine Learning

1310.4849

Country:

Europe (0.92)
North America > United States (0.27)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)

arXiv.org Machine Learning, Mar-3-2015

The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification

Kim, Been, Rudin, Cynthia, Shah, Julie

We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the "quintessential" observations that best represent clusters in a dataset, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.

artificial intelligence, machine learning, prototype

arXiv.org Machine Learning

1503.01161

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (0.69)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

A Noise Scaled Semi Parametric Gaussian Process Model for Real Time Water Network Leak Detection in the Presence of Heteroscedasticity

Malik, Obaid (University of Southampton) | Ghosh, Siddhartha (University of Southampton) | Rogers, Alex (University of Southampton)

The timely detection of leaks in water distribution systems is critical to the sustainable provision of clean water to consumers. Increasingly, water companies are deploying remote sensors to measure water flow in real time in order to detect such leaks. However, in practice, for typical District Metering Zones (DMZ), financial constraints limit the number of deployable real-time flow sensors or meters to one or two, so leak detection must be based on the aggregated flow monitored at these points. Such aggregated flow data typically exhibits input signal dependence, whereby both noise and leaks depend on the flow being measured. This limited monitoring and input signal dependence make conventional approaches based on simple thresholds unreliable for real-time leak detection. To address this, we propose a Gaussian process (GP) model with an additive diagonal noise covariance that can handle the input-dependent noise observed in this setting. A parameterised step-change mean function is used to detect leaks and to estimate their size. Using prior knowledge of water distribution systems (WDS), we dynamically bound and discretize the detection parameters of the step-change mean function, considerably reducing and pruning the parameter search space. We evaluate the proposed noise-scaled GP (NSGP) against both the latest research on GP-based fault detection methods and current state-of-the-art, applied leak detection approaches in water distribution systems. We show that our method outperforms the other approaches on real water network data with synthetically generated time-varying leaks, achieving a detection accuracy of 99%, almost zero false positive detections and the lowest root mean squared error in leak magnitude estimation (0.065 l/s).
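A heavily simplified sketch of the ingredients named in the abstract: a GP with a squared-exponential kernel, a diagonal noise term whose variance grows with the measured flow, and a step-change mean function searched over a discretized set of change points. Several choices here are assumptions, not the paper's: noise standard deviation proportional to the measured flow (`rel_noise`), fixed kernel hyperparameters, and a closed-form generalized least squares estimate of the offset and leak size at each candidate change point in place of a discretized size grid.

```python
import numpy as np


def rbf_kernel(t, variance=1.0, lengthscale=10.0):
    """Squared-exponential covariance between all pairs of time points."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gls_fit(X, y, K_inv):
    """Generalized least squares weights and the quadratic form r^T K^-1 r."""
    beta = np.linalg.solve(X.T @ K_inv @ X, X.T @ K_inv @ y)
    r = y - X @ beta
    return beta, float(r @ K_inv @ r)


def detect_step(t, y, rel_noise=0.02, variance=1.0, lengthscale=10.0, min_seg=5):
    """Search change points for the step mean that best explains y.

    Returns (change time, step size, log-likelihood gain over a no-step model).
    """
    n = len(t)
    # Heteroscedastic diagonal noise: std assumed proportional to measured flow.
    noise_var = (rel_noise * np.abs(y)) ** 2 + 1e-9
    K = rbf_kernel(t, variance, lengthscale) + np.diag(noise_var)
    K_inv = np.linalg.inv(K)
    ones = np.ones(n)
    _, q0 = gls_fit(ones[:, None], y, K_inv)  # constant mean, no leak
    best_tau, best_size, best_q = None, 0.0, q0
    for idx in range(min_seg, n - min_seg):
        step = (np.arange(n) >= idx).astype(float)
        beta, q = gls_fit(np.column_stack([ones, step]), y, K_inv)
        if q < best_q:
            best_tau, best_size, best_q = t[idx], float(beta[1]), q
    # K is shared by both models, so log|K| cancels in the likelihood ratio.
    return best_tau, best_size, 0.5 * (q0 - best_q)
```

The returned gain is a log-likelihood ratio against a no-leak model; a detection threshold on it would have to be calibrated on real data.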

leak, survey article, upstream oil & gas

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Europe > United Kingdom (0.14)

Industry:

Water & Waste Management > Water Management (1.00)
Energy > Oil & Gas > Upstream (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.55)

Discovering Hotspots and Coldspots of Species Richness in eBird Data

Moore, Travis (Oregon State University) | Wong, Weng-Keen (Oregon State University)

Quantifying biodiversity is an important task in ecological research. One way to measure biodiversity is through species richness, the number of unique species found in an area. Citizen-science biodiversity datasets such as eBird have recently made it possible to calculate species richness over an unprecedented spatial and temporal extent. However, several confounding factors associated with the unstructured observation process, such as observer effort, affect the number of species reported by citizen scientists. In this work, we develop an algorithm for discovering hotspots and coldspots of species richness in eBird data while accounting for these confounding factors.
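Species richness as defined here is a count of distinct species, but raw counts grow with observer effort. One standard way to compare areas at equal effort, shown here as a swapped-in illustration and not as the authors' algorithm, is sample-based rarefaction: the expected number of species in k checklists drawn at random without replacement from an area's N checklists. Representing each checklist as a set of species names, and the function names, are assumptions for the sketch.

```python
from collections import Counter
from math import comb


def observed_richness(checklists):
    """Number of unique species reported across all checklists."""
    return len(set().union(*checklists))


def rarefied_richness(checklists, k):
    """Expected richness of k checklists sampled without replacement.

    A species seen on n_s of the N checklists is missed by a random subset of
    size k with probability C(N - n_s, k) / C(N, k).
    """
    n = len(checklists)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of checklists")
    incidence = Counter(sp for cl in checklists for sp in set(cl))
    return sum(1 - comb(n - n_s, k) / comb(n, k) for n_s in incidence.values())
```

Comparing two areas by `rarefied_richness` with the same k (at most the smaller area's checklist count) removes the advantage the more heavily surveyed area gets from simply having more checklists; it does not correct for other confounders such as checklist duration.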

artificial intelligence, checklist, machine learning

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Oregon (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Bari, Nima, Vichr, Roman, Kowsari, Kamran, Berkovich, Simon Y.

Novel Metaknowledge-based Processing Technique for Multimedia Big Data clustering challenges

arXiv.org Artificial Intelligence, Mar-1-2015

Past research has challenged us with the task of showing relational patterns between text-based data and then clustering them for predictive analysis using the Golay code technique. We focus on a novel approach to extracting metaknowledge in multimedia datasets. Our collaboration has been an ongoing study of the relational patterns between data points based on metafeatures extracted from metaknowledge in multimedia datasets. The selected metafeatures are those that suit the mining technique we applied, the Golay code algorithm. In this research paper we summarize findings in the optimization of metaknowledge representation for a 23-bit representation of structured and unstructured multimedia data in order to

artificial intelligence, data mining, machine learning

arXiv.org Artificial Intelligence

doi: 10.1109/BigMM.2015.78

1503.00245

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Virginia > Fairfax County > Fairfax (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.75)

What Predicts Media Coverage of Health Science Articles?

Wallace, Byron C. (University of Texas at Austin) | Paul, Michael J. (Johns Hopkins University) | Elhadad, Noémie (Columbia University)

An important aspect of health science is communicating research findings to the public. The media is a critical instrument in disseminating research. Yet the process by which a scientific article becomes “newsworthy” is not well understood. In this study, we use large-scale text analysis to characterize the content features of articles that are predictive of newsworthiness. We experiment with two novel corpora: (i) 28,910 articles from a diverse range of biomedical and health journals, of which 1,343 were covered by the news agency Reuters, and (ii) 10,760 articles from the JAMA journals, of which 846 were given press releases by the journal editors. We show that media coverage can be predicted reasonably well: logistic regression achieves mean AUCs of 0.783 and 0.882 on the Reuters and JAMA datasets, respectively. We present and discuss interesting findings concerning the most predictive content features.
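The AUC reported here can be read as the probability that a randomly chosen covered article is scored above a randomly chosen uncovered one. A minimal sketch of that pairwise (Mann-Whitney) computation follows; the function names and the averaging over cross-validation folds are assumptions, since the abstract does not say how the mean was taken.

```python
def auc(scores, labels):
    """Pairwise AUC: ties between a positive and a negative count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average AUC over (scores, labels) pairs, e.g. one per CV fold."""
    values = [auc(s, y) for s, y in folds]
    return sum(values) / len(values)
```

The quadratic pair loop is fine for illustration; for corpora the size of the Reuters set, a rank-based formula gives the same value in O(n log n) time.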

corpus, dataset, press release

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Privacy-Utility Trade-Off for Time-Series with Application to Smart-Meter Data

Erdogdu, Murat A. (Stanford University) | Fawaz, Nadia (Technicolor) | Montanari, Andrea (Stanford University)

We consider the online setting where a user would like to continuously release a time-series of data that is correlated with his private data to a service provider, in the hope of deriving some utility. Due to these correlations, continual observation of the released time-series puts the user at risk of an adversary inferring his private data. To protect the user from inference attacks on his private data, the time-series is randomized prior to its release according to a probabilistic privacy mapping. The privacy mapping should be designed in a way that balances privacy and utility requirements over time. Our contributions are threefold. First, we formalize the framework for the design of utility-aware privacy mappings for time-series data, under both online and batch models. We provide a sequential scheme that allows online privacy mappings to be designed at scale, accounting for privacy risk from both the history of released data and future releases. Second, we prove the equivalence of the optimal mappings under the batch and the online models in the case where the time-series samples are independent across time. We further show that there exists a gap between optimal batch and online privacy mappings when certain conditions are not satisfied. Finally, we evaluate the performance of the framework on synthetic and real-world time-series data. In particular, we show that smart-meter data can be randomized for privacy purposes to prevent disaggregation of per-device energy consumption, while preserving utility.
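To make "probabilistic privacy mapping" concrete for a single time step: the mapping is a conditional distribution P(Y | X) from the data X to the released value Y. The sketch below rests on assumptions not stated in the abstract (finite alphabets, privacy leakage measured as the mutual information I(S; Y) between the private data S and the release, utility measured as expected distortion, and S reaching Y only through X). It evaluates a given mapping; it does not design the optimal one.

```python
import numpy as np


def mutual_information(p_joint):
    """I(A; B) in bits for a joint probability table p_joint[a, b]."""
    p_joint = np.asarray(p_joint, dtype=float)
    pa = p_joint.sum(axis=1, keepdims=True)
    pb = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0  # zero-probability cells contribute nothing
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / (pa @ pb)[mask])))


def evaluate_mapping(p_sx, mapping, distortion):
    """Leakage I(S; Y) and expected distortion of a privacy mapping.

    p_sx[s, x] is the joint law of private S and data X,
    mapping[x, y] = P(Y = y | X = x), distortion[x, y] is the utility cost.
    """
    p_sy = p_sx @ mapping                # P(s, y), since S -> X -> Y
    p_x = p_sx.sum(axis=0)
    p_xy = p_x[:, None] * mapping        # P(x, y)
    leakage = mutual_information(p_sy)
    expected_distortion = float(np.sum(p_xy * distortion))
    return leakage, expected_distortion
```

Releasing X unchanged (the identity mapping) has zero distortion but leaks all of I(S; X), while a constant release leaks nothing and is useless; a mapping such as flipping each bit with probability 0.2 sits between the two, which is the trade-off the framework optimizes.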

inference, privacy mapping, time-series data

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Los Altos (0.04)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Energy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)