AITopics

arXiv.org Machine Learning, Aug-10-2009

Statistical ranking and combinatorial Hodge theory

Jiang, Xiaoye, Lim, Lek-Heng, Yao, Yuan, Ye, Yinyu

We propose a number of techniques for obtaining a global ranking from data that may be incomplete and imbalanced, characteristics almost universal to modern datasets coming from e-commerce and internet applications. We are primarily interested in score- or rating-based cardinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method uses the graph Helmholtzian, the graph-theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way that the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We study the graph Helmholtzian using combinatorial Hodge theory: we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the L2-optimal global ranking and a divergence-free (cyclic) flow that measures the validity of the global ranking obtained; if the latter is large, the data do not have a meaningful global ranking. This divergence-free flow can be further decomposed orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these components indicate whether inconsistency arises locally or globally. An obvious advantage over the NP-hard Kemeny optimization is that the discrete Hodge decomposition may be computed via linear least squares regression. We also investigate the L1-projection of edge flows, showing that it is dual to correlation maximization over bounded divergence-free flows, and the L1-approximate sparse cyclic ranking, showing that it is dual to correlation maximization over bounded curl-free flows. We discuss relations with Kemeny optimization, Borda count, and the Kendall-Smith consistency index from social choice theory and statistics.
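A minimal sketch of the least-squares step described above, assuming the pairwise data have already been turned into an edge flow Y on a connected comparison graph, where Y_ij = -Y_ji is how much item j is preferred over item i (for example, a mean score difference). The global scores s solve the weighted graph Laplacian system L s = -div Y, which are the normal equations of minimizing the sum over edges of w_ij (s_j - s_i - Y_ij)^2; the weighted squared norm of the residual Y - grad s is the size of the divergence-free (cyclic) component. The function names, the dense Gaussian elimination, and the trick of adding the all-ones matrix divided by n to fix the free additive constant are illustrative choices, not the authors' code, and the sketch does not split the residual further into curl and harmonic parts.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def solve_dense(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hodge_rank(n: int, flows: Dict[Edge, float],
               weights: Optional[Dict[Edge, float]] = None) -> Tuple[List[float], float]:
    """Return (scores s, weighted squared norm of the residual Y - grad s).

    flows[(i, j)] = Y_ij, the amount item j is preferred over item i.
    Y_ji = -Y_ij is implied, so each compared pair should appear only once."""
    if weights is None:
        weights = {e: 1.0 for e in flows}
    lap = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n  # rhs = -div Y
    for (i, j), y in flows.items():
        w = weights[(i, j)]
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
        rhs[j] += w * y  # flow Y_ij enters node j ...
        rhs[i] -= w * y  # ... and, since Y_ji = -Y_ij, leaves node i
    # L is singular (constants lie in its kernel); adding J/n selects the
    # solution with sum(s) == 0, provided the comparison graph is connected.
    a = [[lap[r][c] + 1.0 / n for c in range(n)] for r in range(n)]
    s = solve_dense(a, rhs)
    residual = sum(weights[(i, j)] * (y - (s[j] - s[i])) ** 2
                   for (i, j), y in flows.items())
    return s, residual


# Consistent data: item 1 beats 0 by 1, item 2 beats 1 by 1, item 2 beats 0 by 2.
scores, cyclic_part = hodge_rank(3, {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0})
```

Here the scores are (-1, 0, 1) up to rounding and the residual is essentially zero. On the 3-cycle Y_01 = Y_12 = 1, Y_02 = -1 (equivalently Y_20 = 1), every score is zero and the entire flow is residual, which is the "no meaningful global ranking" case the abstract describes.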

artificial intelligence, banking & finance, ranking

arXiv:0811.1067 (arXiv.org Machine Learning)

Country:

North America > United States > California > Santa Clara County (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Banking & Finance (1.00)
Leisure & Entertainment (0.94)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.34)

arXiv.org Machine Learning, Aug-9-2009

Discrete Temporal Models of Social Networks

Hanneke, Steve, Fu, Wenjie, Xing, Eric

We propose a family of statistical models for social network evolution over time, which extends Exponential Random Graph Models (ERGMs). Many of the methods for ERGMs, including maximum likelihood estimation algorithms, are readily adapted to these models. We discuss models of this type and their properties, give examples, and demonstrate their use for hypothesis testing and classification. We believe our temporal ERG models represent a useful new framework for modeling time-evolving social networks, as well as rewiring networks from other domains such as gene regulation circuitry and communication networks.
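A minimal sketch of the kind of model described above, assuming the usual temporal ERGM form P(N^t | N^{t-1}) = exp(theta · psi(N^t, N^{t-1})) / Z(theta, N^{t-1}), where psi is a vector of statistics computed on the current and previous networks. The two statistics here (density and tie stability) are illustrative choices, and the normalizing constant Z is computed by brute-force enumeration of every directed graph, which is only feasible for a handful of nodes; larger networks need approximations such as MCMC, and fitting theta by maximum likelihood is not shown.

```python
import itertools
import math
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Edge = Tuple[int, int]
Net = FrozenSet[Edge]                      # set of directed ties present
Stat = Callable[[Net, Net, int], float]    # psi_k(current, previous, n)


def dyads(n: int) -> List[Edge]:
    return [(i, j) for i in range(n) for j in range(n) if i != j]


def density(cur: Net, prev: Net, n: int) -> float:
    """Fraction of possible directed ties present at time t."""
    return len(cur) / (n * (n - 1))


def stability(cur: Net, prev: Net, n: int) -> float:
    """Fraction of dyads whose tie status is unchanged from t-1 to t."""
    pairs = dyads(n)
    return sum((p in cur) == (p in prev) for p in pairs) / len(pairs)


def transition_probs(prev: Net, n: int, theta: Sequence[float],
                     stats: Sequence[Stat]) -> Dict[Net, float]:
    """P(N^t = g | N^{t-1} = prev) for every directed graph g on n nodes."""
    pairs = dyads(n)
    graphs = [frozenset(p for p, bit in zip(pairs, bits) if bit)
              for bits in itertools.product((0, 1), repeat=len(pairs))]
    weights = [math.exp(sum(t * f(g, prev, n) for t, f in zip(theta, stats)))
               for g in graphs]
    z = sum(weights)  # normalizing constant Z(theta, prev)
    return {g: w / z for g, w in zip(graphs, weights)}


def log_likelihood(sequence: Sequence[Net], n: int, theta: Sequence[float],
                   stats: Sequence[Stat]) -> float:
    """Sum of log P(N^t | N^{t-1}) over consecutive pairs in an observed sequence."""
    return sum(math.log(transition_probs(prev, n, theta, stats)[cur])
               for prev, cur in zip(sequence, sequence[1:]))


prev = frozenset({(0, 1), (1, 2)})
probs = transition_probs(prev, 3, theta=[-1.0, 4.0], stats=[density, stability])
most_likely_next = max(probs, key=probs.get)
```

A positive stability coefficient favours networks that change little between steps; maximizing log_likelihood over theta would give the MLE for this toy model.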

artificial intelligence, bayesian inference, statistics

arXiv:0908.1258 (arXiv.org Machine Learning)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (0.82)
Government (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

arXiv.org Artificial Intelligence, Aug-7-2009

Node discovery problem for a social network

Maeno, Yoshiharu

Methods to solve a node discovery problem for a social network are presented. Covert nodes are nodes that cannot be observed directly. They transmit influence and affect the resulting collaborative activities among the people in a social network, but they do not appear in the surveillance logs that record the participants of those activities. Discovering the covert nodes amounts to identifying the suspicious logs in which the covert nodes would appear if they became overt. The performance of the methods is demonstrated on test datasets generated from computationally synthesized networks and from a real organization.
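The abstract does not spell out its scoring method, so the following is a deliberately simple stand-in and not Maeno's algorithm: a hypothetical co-occurrence heuristic that flags logs whose participants rarely act together elsewhere, on the assumption that an unusual grouping may be explained by an unobserved (covert) participant linking them. Each log is modelled as a set of observed participant names.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List

Log = FrozenSet[str]  # observed participants of one collaborative activity


def pair_counts(logs: List[Log]) -> Counter:
    """How often each unordered pair of participants appears in the same log."""
    counts: Counter = Counter()
    for log in logs:
        for pair in combinations(sorted(log), 2):
            counts[pair] += 1
    return counts


def suspicion_scores(logs: List[Log]) -> List[float]:
    """1 minus the mean normalised co-occurrence of each log's participant pairs.

    High scores mark groupings that are rare elsewhere (hypothetical heuristic)."""
    counts = pair_counts(logs)
    top = max(counts.values(), default=1)
    scores = []
    for log in logs:
        pairs = list(combinations(sorted(log), 2))
        if not pairs:  # single-participant logs carry no pair evidence
            scores.append(0.0)
            continue
        scores.append(1.0 - sum(counts[p] for p in pairs) / (len(pairs) * top))
    return scores


def rank_suspicious(logs: List[Log]) -> List[int]:
    """Log indices ordered from most to least suspicious."""
    scores = suspicion_scores(logs)
    return sorted(range(len(logs)), key=lambda i: scores[i], reverse=True)


logs = [frozenset({"a", "b"})] * 3 + [frozenset({"c", "d"}), frozenset({"a", "d"})]
ranking = rank_suspicious(logs)
```

In this toy data the one-off groupings {c, d} and {a, d} rank above the habitual pair {a, b}; a method like the paper's would instead model how influence flows through the network.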

law enforcement, node, public safety

arXiv:0710.4975 (arXiv.org Artificial Intelligence)

Country: North America > United States (0.46)

Industry:

Information Technology > Services (0.84)
Law Enforcement & Public Safety > Terrorism (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Social Media (0.94)

arXiv.org Artificial Intelligence, Jul-27-2009

Fact Sheet on Semantic Web

Sure, York

The report gives an overview of activities on the topic of the Semantic Web. It was released in 2003 as a technical report for the project "KTweb -- Connecting Knowledge Technologies Communities".

application, semantic web, semanticweb

arXiv:0907.4561 (arXiv.org Artificial Intelligence)

Country:

North America > United States (0.29)
Europe > Austria > Vienna (0.14)

Industry: Information Technology (0.48)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Practical Attacks Against Authorship Recognition Techniques

Brennan, Michael Robert (Drexel University) | Greenstadt, Rachel (Drexel University)

The use of statistical AI techniques in authorship recognition (or stylometry) has contributed to literary and historical breakthroughs. These successes have led to the use of these techniques in criminal investigations and prosecutions. However, few have studied adversarial attacks and their devastating effect on the robustness of existing classification methods. This paper presents a framework for adversarial attacks, including obfuscation attacks, where a subject attempts to hide their identity, and imitation attacks, where a subject attempts to frame another subject by imitating their writing style. The major contribution of this research is to demonstrate that both attacks work very well. The obfuscation attack reduces the effectiveness of the techniques to the level of random guessing, and the imitation attack succeeds with 68-91% probability, depending on the stylometric technique used. These results are made more significant by the fact that the experimental subjects were unfamiliar with stylometric techniques, had no specialized knowledge of linguistics, and spent little time on the attacks. This paper also makes another significant contribution to the field by using human subjects to empirically validate the claim of high accuracy for current techniques (without attacks), reproducing the results for three representative stylometric methods.
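To make the attack surface concrete, here is a toy stylometric classifier of the general kind such attacks target: it profiles each text by the relative frequencies of common function words and attributes an unknown text to the author with the most cosine-similar centroid. The word list and the nearest-centroid rule are assumptions for illustration; they are not the three methods evaluated in the paper. In these terms, an obfuscation attack rewrites a text so that its profile drifts away from the author's centroid, and an imitation attack pushes it toward another author's centroid.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Illustrative function-word list (an assumption, not the paper's feature set).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "i", "but", "not", "with", "for", "as"]


def profile(text: str) -> List[float]:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def attribute(text: str, training: Dict[str, List[str]]) -> str:
    """Attribute text to the author whose mean profile is most cosine-similar."""
    centroids = {}
    for author, texts in training.items():
        profiles = [profile(t) for t in texts]
        centroids[author] = [sum(col) / len(profiles) for col in zip(*profiles)]
    query = profile(text)
    return max(centroids, key=lambda a: cosine(query, centroids[a]))


training = {"A": ["the cat and the dog and the bird"],
            "B": ["it is not that i but it is not"]}
guess = attribute("the fish and the frog", training)
```

Because function words are used largely unconsciously, they are a common stylometric signal, which is also why deliberately altering them is an effective way to mislead this kind of classifier.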

stylometry, survey article, text processing

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Real-time Automatic Price Prediction for eBay Online Trading

Raykhel, Ilya (Brigham Young University) | Ventura, Dan (Brigham Young University)

We develop a system for attribute-based prediction of final online auction prices, focusing on the eBay laptop category. The system implements a feature-weighted k-NN algorithm, using evolutionary computation to determine the feature weights, with prior trades used as training data. The resulting average prediction error is 16%. Largely automated trading with the system greatly reduces the time a reseller needs to spend on trading activities, since the bulk of market research is now done automatically with the help of the learned model. The result is a 562% increase in trading efficiency (measured as profit per hour).
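A minimal sketch of a feature-weighted k-NN price predictor whose weights are tuned by a simple evolutionary loop, assuming numeric listing features and positive final prices. The (1+1)-style mutate-and-keep-if-better search, the leave-one-out mean absolute percentage error used as fitness, and all parameter defaults are illustrative assumptions; the paper's exact evolutionary algorithm and error measure may differ.

```python
import random
from typing import List, Sequence, Tuple

Listing = Tuple[List[float], float]  # (numeric features, final price > 0)


def weighted_dist(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    return sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)) ** 0.5


def knn_predict(x: Sequence[float], train: List[Listing], w: Sequence[float],
                k: int = 3) -> float:
    """Mean final price of the k nearest prior trades under the weighted distance."""
    nearest = sorted(train, key=lambda item: weighted_dist(x, item[0], w))[:k]
    return sum(price for _, price in nearest) / len(nearest)


def loo_error(data: List[Listing], w: Sequence[float], k: int = 3) -> float:
    """Leave-one-out mean absolute percentage error (assumed fitness measure)."""
    errors = []
    for i, (x, price) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors.append(abs(knn_predict(x, rest, w, k) - price) / price)
    return sum(errors) / len(errors)


def evolve_weights(data: List[Listing], n_features: int, generations: int = 200,
                   sigma: float = 0.3, k: int = 3, seed: int = 0):
    """(1+1)-style evolution: mutate the weights, keep the child only if it is better."""
    rng = random.Random(seed)
    best_w = [1.0] * n_features
    best_err = loo_error(data, best_w, k)
    for _ in range(generations):
        child = [max(0.0, wi + rng.gauss(0.0, sigma)) for wi in best_w]
        err = loo_error(data, child, k)
        if err < best_err:
            best_w, best_err = child, err
    return best_w, best_err


# Synthetic trades: price depends on feature 0 only; feature 1 is noise.
gen = random.Random(1)
trades = []
for _ in range(30):
    a, b = gen.uniform(0, 10), gen.uniform(0, 10)
    trades.append(([a, b], 100.0 + 50.0 * a))
weights, error = evolve_weights(trades, n_features=2, generations=50)
```

Because a child replaces the parent only when its error is strictly lower, the returned error never exceeds that of the initial all-ones weighting; on data like the above, the search tends to shrink the weight on the irrelevant feature.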

artificial intelligence, banking & finance, laptop

Industry:

Information Technology > Services (1.00)
Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Enabling Data Quality with Lightweight Ontologies

Bidlack, Clint R. (ActivePrime Inc.)

As the volume and interconnectedness of corporate data grow, data quality is becoming a business competency essential to success. Existing methods for managing data quality do not scale to large volumes of data in a way that is directly manageable by the owner of the data. For the past two years, a new breed of data quality products built on applied AI techniques has been empowering non-technical users. Over 150 businesses are benefiting from these products, including NASDAQ, Visa, Experian, Oracle, Fidelity, Bank of America, Volvo, Dell, Sabic, and Dassault Systems. The applied AI techniques described include lightweight ontologies for efficiently finding inexact textual matches in large data sets.
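A minimal sketch of ontology-assisted inexact matching, assuming the "lightweight ontology" can be approximated by a table that maps variant tokens (abbreviations, symbols) to canonical terms. The table contents and the 0.85 threshold are hypothetical; the string similarity after normalization comes from Python's standard difflib.SequenceMatcher. The products described in the abstract are not claimed to work this way.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical lightweight ontology: variant token -> canonical token.
ONTOLOGY: Dict[str, str] = {
    "intl": "international",
    "corp": "corporation",
    "inc": "incorporated",
    "co": "company",
    "ltd": "limited",
    "&": "and",
}


def normalize(name: str) -> str:
    """Lower-case, tokenize, and map each token through the ontology."""
    tokens = re.findall(r"[a-z0-9&]+", name.lower())
    return " ".join(ONTOLOGY.get(t, t) for t in tokens)


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_matches(records: List[str], threshold: float = 0.85) -> List[Tuple[int, int]]:
    """Index pairs whose normalized names are at least `threshold` similar."""
    return [(i, j) for i, j in combinations(range(len(records)), 2)
            if similarity(records[i], records[j]) >= threshold]


records = ["Intl Business Machines Corp.",
           "International Business Machines Corporation",
           "Dell Inc."]
duplicates = find_matches(records)
```

The ontology handles systematic variation (abbreviations, symbols) cheaply, leaving the fuzzy string comparison to absorb typos; comparing every pair is quadratic, so a large data set would also need blocking or indexing.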

information technology software, it software, ontology

Country: North America > United States > California (0.14)

Industry: Information Technology > Software (0.68)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Trading Robustness for Privacy in Decentralized Recommender Systems

Cheng, Zunping (University College Dublin) | Hurley, Neil (University College Dublin)

Collaborative filtering (CF) recommender systems are very popular and successful in commercial applications. One end-user concern is the privacy of the personal data that such systems require in order to make personalized recommendations. Recently, peer-to-peer decentralized architectures have been proposed to address this privacy issue. On the other hand, system managers must be concerned with system robustness. In particular, it has been shown that recommender systems are vulnerable to profile injection, although model-based CF algorithms show greater stability against the malicious attacks studied so far in the literature. In this paper we generalize the generic model for decentralized recommendation and discuss the trade-off between robustness and privacy. In this context, we argue that exposing knowledge of the model parameters opens the door to new, highly effective model-based attack strategies. We conclude that the security concerns of privacy and robustness stand in opposition to each other and are difficult to satisfy simultaneously.
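To illustrate the profile-injection vulnerability mentioned above, the sketch below mounts an "average attack" on a small user-based k-NN recommender: fake profiles rate every other item at its mean (so they look similar to genuine users) and give the target item the maximum rating. This is a memory-based predictor chosen for brevity; the paper's argument concerns model-based and decentralized CF, which the sketch does not model, and the data and names are hypothetical.

```python
import math
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine_on_common(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity restricted to items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0


def predict(ratings: Ratings, user: str, item: str, k: int = 5) -> Optional[float]:
    """Similarity-weighted mean rating from the k most similar users who rated item."""
    neighbours = sorted(
        ((cosine_on_common(ratings[user], prof), prof[item])
         for other, prof in ratings.items() if other != user and item in prof),
        reverse=True)[:k]
    positive = [(s, r) for s, r in neighbours if s > 0]
    total = sum(s for s, _ in positive)
    return sum(s * r for s, r in positive) / total if total else None


def average_attack(ratings: Ratings, target: str, n_fake: int,
                   max_rating: float = 5.0) -> Ratings:
    """Copy of ratings plus n_fake profiles: item means as filler, max on target."""
    items = {i for prof in ratings.values() for i in prof if i != target}
    means = {}
    for i in items:
        vals = [prof[i] for prof in ratings.values() if i in prof]
        means[i] = sum(vals) / len(vals)
    attacked = dict(ratings)
    for f in range(n_fake):
        attacked[f"fake{f}"] = {**means, target: max_rating}
    return attacked


ratings = {"alice": {"i1": 5.0, "i2": 3.0},
           "bob": {"i1": 4.0, "i2": 2.0, "t": 1.0},
           "carol": {"i1": 2.0, "i2": 5.0, "t": 2.0}}
before = predict(ratings, "alice", "t")
after = predict(average_attack(ratings, "t", n_fake=3), "alice", "t")
```

With three fake profiles the predicted rating of item "t" for alice rises from about 1.4 to about 3.7, even though no genuine user rated it above 2.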

algorithm, artificial intelligence, mobasher

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)

Automating Art Print Authentication Using Metric Learning

Parker, Charles Lincoln (Eastman Kodak Company) | Messier, Paul (Paul Messier, LLC)

An important problem for art historians is determining the type of paper on which a photograph is printed. One way to determine the paper type is to capture a highly magnified image of the paper and then compare this image to a database of known paper images. Traditionally, this process is carried out by a human and is generally time-intensive. Here we propose an automated solution to this problem, using wavelet decomposition techniques from image processing together with metric learning from machine learning. We show, on a collection of real-world images of photographic paper, that using machine learning techniques produces a much better solution than image processing alone.
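A minimal sketch of the two ingredients named above, under simplifying assumptions: texture features are the energies of the subbands from a two-level 2-D Haar wavelet decomposition (image sides divisible by 4), and the "learned metric" is a diagonal one that weights each feature by the inverse of its pooled within-class variance before nearest-neighbour matching. Both are illustrative stand-ins; the abstract does not specify the paper's wavelet family or metric-learning method.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]


def haar_level(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One level of an averaging 2-D Haar transform over 2x2 blocks."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        rows = ([], [], [], [])
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows[0].append((a + b + d + e) / 4)  # approximation
            rows[1].append((a + b - d - e) / 4)  # top minus bottom
            rows[2].append((a - b + d - e) / 4)  # left minus right
            rows[3].append((a - b - d + e) / 4)  # diagonal
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def energy(band: Image) -> float:
    values = [v for row in band for v in row]
    return sum(v * v for v in values) / len(values)


def wavelet_features(img: Image, levels: int = 2) -> List[float]:
    """Detail-band energies at each level plus the final approximation energy."""
    feats: List[float] = []
    approx = img
    for _ in range(levels):
        approx, lh, hl, hh = haar_level(approx)
        feats += [energy(lh), energy(hl), energy(hh)]
    feats.append(energy(approx))
    return feats


def learn_diagonal_metric(X: List[List[float]], y: List[str], eps: float = 1e-6) -> List[float]:
    """Weight each feature by 1 / (pooled within-class variance + eps)."""
    means: Dict[str, List[float]] = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(col) / len(members) for col in zip(*members)]
    var = [sum((x[d] - means[lab][d]) ** 2 for x, lab in zip(X, y)) / len(X)
           for d in range(len(X[0]))]
    return [1.0 / (v + eps) for v in var]


def nearest_label(query: Sequence[float], X: List[List[float]], y: List[str],
                  w: Sequence[float]) -> str:
    """Label of the closest reference image under the weighted metric."""
    dists = [sum(wd * (q - xd) ** 2 for wd, q, xd in zip(w, query, x)) for x in X]
    return y[dists.index(min(dists))]


def flat(v: float) -> Image:           # hypothetical smooth paper patch
    return [[v] * 4 for _ in range(4)]


def stripes(amp: float) -> Image:      # hypothetical textured paper patch
    return [[0.0, amp, 0.0, amp] for _ in range(4)]


X = [wavelet_features(im) for im in (flat(0.3), flat(0.6), stripes(1.0), stripes(0.8))]
y = ["smooth", "smooth", "textured", "textured"]
w = learn_diagonal_metric(X, y)
label = nearest_label(wavelet_features(stripes(0.9)), X, y, w)
```

Down-weighting features that vary a lot within a class (here, overall brightness) is what lets the metric focus on texture; a real system would learn a richer metric from many labelled paper samples.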

artificial intelligence, machine learning, query

Country: North America > United States (0.46)

Industry: Information Technology > Security & Privacy (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)