Faloutsos, Christos
Real-Time Streaming Anomaly Detection in Dynamic Graphs
Bhatia, Siddharth, Liu, Rui, Hooi, Bryan, Yoon, Minji, Shin, Kijung, Faloutsos, Christos
Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. We further propose MIDAS-F, to address the problem that anomalies are incorporated into the algorithm's internal states, creating a 'poisoning' effect which can allow future anomalies to slip through undetected. MIDAS-F introduces two modifications: 1) We modify the anomaly scoring function, aiming to reduce the 'poisoning' effect of newly arriving edges; 2) We introduce a conditional merge step, which updates the algorithm's data structures after each time tick, but only if the anomaly score is below a threshold value, also to reduce the 'poisoning' effect. Experiments show that MIDAS-F has significantly higher accuracy than MIDAS. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 130 to 929 times faster than state-of-the-art approaches; (c) it provides 41% to 55% higher accuracy (in terms of ROC-AUC) than state-of-the-art approaches.
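As a minimal sketch of the kind of streaming scoring loop described above (not the authors' implementation): plain dictionaries stand in for the constant-memory count structures, the score is a chi-squared-style comparison of an edge's current-tick count against its historical mean, and the conditional merge only folds a tick's counts into history when their scores stay below a threshold, mirroring the 'poisoning' mitigation described for MIDAS-F.

from collections import defaultdict

class StreamingEdgeScorer:
    def __init__(self, threshold=50.0):
        self.threshold = threshold
        self.cur = defaultdict(float)    # counts of (u, v) in the current time tick
        self.total = defaultdict(float)  # historical counts of (u, v)
        self.tick = 1

    def _score(self, a, s, t):
        # Chi-squared-style deviation of the current-tick count `a`
        # from the mean implied by the historical total `s` over `t` ticks.
        if t <= 1 or s == 0:
            return 0.0
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))

    def process(self, u, v, t):
        if t > self.tick:                # new time tick: conditionally merge, then reset
            for e, c in self.cur.items():
                if self._score(c, self.total[e] + c, self.tick) < self.threshold:
                    self.total[e] += c   # only absorb edges that did not look anomalous
            self.cur.clear()
            self.tick = t
        self.cur[(u, v)] += 1
        return self._score(self.cur[(u, v)], self.total[(u, v)] + self.cur[(u, v)], t)

scorer = StreamingEdgeScorer()
for u, v, t in [(1, 2, 1), (1, 2, 1), (3, 4, 2)] + [(1, 2, 3)] * 100:
    score = scorer.process(u, v, t)
print(f"last score: {score:.1f}")        # the sudden burst of (1, 2) edges gets a high score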
AutoKnow: Self-Driving Knowledge Collection for Products of Thousands of Types
Dong, Xin Luna, He, Xiang, Kan, Andrey, Li, Xian, Liang, Yan, Ma, Jun, Xu, Yifan Ethan, Zhang, Chenwei, Zhao, Tong, Saldana, Gabriel Blanco, Deshpande, Saurabh, Manduca, Alexandre Michetti, Ren, Jay, Singh, Surender Pal, Xiao, Fan, Chang, Haw-Shiuan, Karamanolakis, Giannis, Mao, Yuning, Wang, Yaqing, Faloutsos, Christos, McCallum, Andrew, Han, Jiawei
Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs have firmly established themselves as valuable sources of information for search and question answering, and it is natural to wonder if a KG can contain information about products offered at online retail sites. There have been several successful examples of generic KGs, but organizing information about products poses many additional challenges, including sparsity and noise of structured data for products, complexity of the domain with millions of product types and thousands of attributes, heterogeneity across a large number of categories, as well as a large and constantly growing number of products. We describe AutoKnow, our automatic (self-driving) system that addresses these challenges. The system includes a suite of novel techniques for taxonomy construction, product property identification, knowledge extraction, anomaly detection, and synonym discovery. AutoKnow is (a) automatic, requiring little human intervention, (b) multi-scalable, i.e., scalable in multiple dimensions (many domains, many products, and many attributes), and (c) integrative, exploiting rich customer behavior logs. AutoKnow has been operational in collecting product knowledge for over 11K product types.
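The components above could be wired together as a staged pipeline over product records. The following Python sketch is purely illustrative: the stage names mirror the abstract, but the bodies are placeholders, not AutoKnow's actual techniques.

from typing import Callable, Dict, List

ProductRecord = Dict[str, str]      # e.g. {"title": "...", "type": "...", "flavor": "..."}
Stage = Callable[[List[ProductRecord]], List[ProductRecord]]

def enrich_taxonomy(records):       # attach/refine product types (placeholder)
    for r in records:
        r.setdefault("type", "unknown")
    return records

def extract_attributes(records):    # pull attribute values out of free text (placeholder)
    for r in records:
        if "chocolate" in r.get("title", "").lower():
            r["flavor"] = "chocolate"
    return records

def clean_anomalies(records):       # drop values that look inconsistent or noisy (placeholder)
    return [r for r in records if r.get("flavor") != "n/a"]

PIPELINE: List[Stage] = [enrich_taxonomy, extract_attributes, clean_anomalies]

def run(records: List[ProductRecord]) -> List[ProductRecord]:
    for stage in PIPELINE:
        records = stage(records)
    return records

print(run([{"title": "Dark Chocolate Bar 70%"}]))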
MultiImport: Inferring Node Importance in a Knowledge Graph from Multiple Input Signals
Park, Namyong, Kan, Andrey, Dong, Xin Luna, Zhao, Tong, Faloutsos, Christos
Given multiple input signals, how can we infer node importance in a knowledge graph (KG)? Node importance estimation is a crucial and challenging task that can benefit many applications, including recommendation, search, and query disambiguation. A key challenge towards this goal is how to effectively use input from different sources. On the one hand, a KG is a rich source of information, with multiple types of nodes and edges. On the other hand, there are external input signals, such as the number of votes or pageviews, which can directly tell us about the importance of entities in a KG. While several methods have been developed to tackle this problem, their use of these external signals has been limited, as they are not designed to consider multiple signals simultaneously. In this paper, we develop an end-to-end model, MultiImport, which infers latent node importance from multiple, potentially overlapping, input signals. MultiImport is a latent variable model that captures the relation between node importance and input signals, and effectively learns from multiple signals with potential conflicts. Also, MultiImport provides an effective estimator based on attentive graph neural networks. We ran experiments on real-world KGs to show that MultiImport handles several challenges involved with inferring node importance from multiple input signals, and consistently outperforms existing methods, achieving up to 23.7% higher NDCG@100 than the state-of-the-art method.
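The following toy Python sketch illustrates the core data problem only, not the MultiImport model itself: several partially observed, differently scaled importance signals (the signal names and values below are hypothetical) must be reconciled into one per-node estimate. Here each signal is log-scaled and standardized, then the available signals for each node are averaged; MultiImport instead learns the reconciliation jointly with an attentive GNN over the KG.

import math

signals = {                      # hypothetical signal -> {node: raw value}
    "pageviews": {"a": 12000, "b": 300, "c": 4500},
    "votes":     {"a": 85,    "c": 2,   "d": 40},
}

def standardize(values):
    # Log-scale, then z-score, so signals on different scales become comparable.
    logs = {n: math.log1p(v) for n, v in values.items()}
    mean = sum(logs.values()) / len(logs)
    std = (sum((x - mean) ** 2 for x in logs.values()) / len(logs)) ** 0.5 or 1.0
    return {n: (x - mean) / std for n, x in logs.items()}

normalized = {name: standardize(vals) for name, vals in signals.items()}
nodes = {n for vals in signals.values() for n in vals}
importance = {
    n: sum(norm[n] for norm in normalized.values() if n in norm)
       / sum(1 for norm in normalized.values() if n in norm)
    for n in nodes
}
print(sorted(importance, key=importance.get, reverse=True))  # nodes ranked by combined importance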
Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks
Park, Namyong, Kan, Andrey, Dong, Xin Luna, Zhao, Tong, Faloutsos, Christos
How can we estimate the importance of nodes in a knowledge graph (KG)? A KG is a multi-relational graph that has proven valuable for many tasks including question answering and semantic search. In this paper, we present GENI, a method for tackling the problem of estimating node importance in KGs, which enables several downstream applications such as item recommendation and resource allocation. While a number of approaches have been developed to address this problem for general graphs, they do not fully utilize information available in KGs, or lack the flexibility needed to model the complex relationship between entities and their importance. To address these limitations, we explore supervised machine learning algorithms. In particular, building upon recent advances in graph neural networks (GNNs), we develop GENI, a GNN-based method designed to deal with the distinctive challenges involved in predicting node importance in KGs. Our method aggregates importance scores, rather than node embeddings, via a predicate-aware attention mechanism and a flexible centrality adjustment. In our evaluation of GENI and existing methods on predicting node importance in real-world KGs with different characteristics, GENI achieves 5-17% higher NDCG@100 than the state of the art.
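A toy, hedged Python sketch of the key idea highlighted above: aggregating importance scores (rather than embeddings) from neighbors, with weights that depend on the connecting predicate. The KG triples, initial scores, and per-predicate weights below are hypothetical, and the softmax over incident edges is a stand-in for GENI's learned predicate-aware attention and centrality adjustment.

import math

edges = [                      # (head, predicate, tail) triples of a tiny hypothetical KG
    ("film_a", "directed_by", "person_x"),
    ("film_b", "directed_by", "person_x"),
    ("film_a", "in_genre", "genre_y"),
]
scores = {"film_a": 3.0, "film_b": 1.5, "person_x": 0.0, "genre_y": 0.0}
pred_weight = {"directed_by": 1.0, "in_genre": 0.2}   # hypothetical per-predicate weights

def aggregate(node):
    # Collect incident edges in both directions, then pool neighbor scores with
    # a softmax over predicate weights (a stand-in for learned attention).
    nbrs = [(p, h) for h, p, t in edges if t == node] + \
           [(p, t) for h, p, t in edges if h == node]
    if not nbrs:
        return scores[node]
    logits = [pred_weight[p] for p, _ in nbrs]
    z = sum(math.exp(l) for l in logits)
    attn = [math.exp(l) / z for l in logits]
    return sum(a * scores[n] for a, (_, n) in zip(attn, nbrs))

print({n: round(aggregate(n), 2) for n in scores})    # person_x pools its films' scores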
LinkNBed: Multi-Graph Representation Learning with Entity Linkage
Trivedi, Rakshit, Sisman, Bunyamin, Ma, Jun, Faloutsos, Christos, Zha, Hongyuan, Dong, Xin Luna
Knowledge graphs have emerged as an important model for studying complex multi-relational data. This has given rise to the construction of numerous large-scale but incomplete knowledge graphs encoding information extracted from various resources. An effective and scalable approach to jointly learn over multiple graphs and eventually construct a unified graph is a crucial next step for the success of knowledge-based inference for many downstream applications. To this end, we propose LinkNBed, a deep relational learning framework that learns entity and relationship representations across multiple graphs. We identify entity linkage across graphs as a vital component to achieve our goal. We design a novel objective that leverages entity linkage and build an efficient multi-task training procedure. Experiments on link prediction and entity linkage demonstrate substantial improvements over the state-of-the-art relational learning approaches.
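An illustrative sketch, under assumptions, of what a multi-graph training objective of this kind can look like; this is not the LinkNBed objective itself. It combines a translational (TransE-style) link-prediction term for each graph with a linkage term that pulls together the embeddings of entities known to refer to the same real-world object. The graph names, entities, and weighting factor are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
emb = {g: {e: rng.normal(size=8) for e in ("u", "v", "w")} for g in ("g1", "g2")}
rel = {"r": rng.normal(size=8)}

def link_loss(graph, triples):
    # TransE-style scoring: head + relation should land near tail.
    return sum(np.linalg.norm(emb[graph][h] + rel[r] - emb[graph][t]) for h, r, t in triples)

def linkage_loss(pairs):
    # Entities linked across graphs should have similar representations.
    return sum(np.linalg.norm(emb["g1"][a] - emb["g2"][b]) for a, b in pairs)

triples_g1 = [("u", "r", "v")]
triples_g2 = [("w", "r", "v")]
linked = [("u", "w")]                     # cross-graph entity linkage
total = link_loss("g1", triples_g1) + link_loss("g2", triples_g2) + 0.5 * linkage_loss(linked)
print(round(float(total), 3))             # a single joint objective over both graphs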
FairJudge: Trustworthy User Prediction in Rating Platforms
Kumar, Srijan, Hooi, Bryan, Makhija, Disha, Kumar, Mohit, Faloutsos, Christos, Subrahmanian, V. S.
Rating platforms enable large-scale collection of user opinion about items (products, other users, etc.). However, many untrustworthy users give fraudulent ratings for excessive monetary gains. In this paper, we present FairJudge, a system to identify such fraudulent users. We propose three metrics: (i) the fairness of a user, which quantifies how trustworthy the user is in rating the products; (ii) the reliability of a rating, which measures how reliable the rating is; and (iii) the goodness of a product, which measures the quality of the product. Intuitively, a user is fair if they provide reliable ratings that are close to the goodness of the products they rate. We formulate a mutually recursive definition of these metrics, and further address cold start problems and incorporate behavioral properties of users and products in the formulation. We propose an iterative algorithm, FairJudge, to predict the values of the three metrics. We prove that FairJudge is guaranteed to converge in a bounded number of iterations, with linear time complexity. By conducting five different experiments on five rating platforms, we show that FairJudge significantly outperforms nine existing algorithms in predicting fair and unfair users. We reported the 100 most unfair users in the Flipkart network to their review fraud investigators, and 80 users were correctly identified (80% accuracy). The FairJudge algorithm is already being deployed at Flipkart.
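A simplified, hedged Python sketch of the mutually recursive fixed-point iteration described above. Ratings are assumed to be scaled to [-1, 1]; the update rules below capture the intuition (fair users give reliable ratings close to a product's goodness) but omit the paper's cold-start handling and behavioral features, and the data is made up.

ratings = [  # (user, product, rating in [-1, 1]); the users and products are hypothetical
    ("honest1", "p1", 1.0), ("honest1", "p2", -1.0),
    ("honest2", "p1", 0.8), ("honest2", "p2", -0.9),
    ("shill",   "p2", 1.0),   # rates the bad product highly
]
users = {u for u, _, _ in ratings}
products = {p for _, p, _ in ratings}
fairness = {u: 1.0 for u in users}
goodness = {p: 0.0 for p in products}
reliability = {}

for _ in range(20):  # iterate until the three metrics stabilize
    for i, (u, p, r) in enumerate(ratings):
        # A rating is reliable if its author is fair and it agrees with the product's goodness.
        reliability[i] = 0.5 * fairness[u] + 0.5 * (1 - abs(r - goodness[p]) / 2)
    for p in products:
        rs = [(i, r) for i, (_, q, r) in enumerate(ratings) if q == p]
        # Goodness is a reliability-weighted average of the product's ratings.
        goodness[p] = sum(reliability[i] * r for i, r in rs) / sum(reliability[i] for i, _ in rs)
    for u in users:
        idx = [i for i, (v, _, _) in enumerate(ratings) if v == u]
        # A user's fairness is the average reliability of their ratings.
        fairness[u] = sum(reliability[i] for i in idx) / len(idx)

print({u: round(f, 2) for u, f in sorted(fairness.items())})  # the shill ends up least fair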
Tensor Decomposition for Signal Processing and Machine Learning
Sidiropoulos, Nicholas D., De Lathauwer, Lieven, Fu, Xiao, Huang, Kejun, Papalexakis, Evangelos E., Faloutsos, Christos
Tensors or {\em multi-way arrays} are functions of three or more indices $(i,j,k,\cdots)$ -- similar to matrices (two-way arrays), which are functions of two indices $(r,c)$ for (row,column). Tensors have a rich history, stretching over almost a century, and touching upon numerous disciplines; but they have only recently become ubiquitous in signal and data analytics at the confluence of signal processing, statistics, data mining and machine learning. This overview article aims to provide a good starting point for researchers and practitioners interested in learning about and working with tensors. As such, it focuses on fundamentals and motivation (using various application examples), aiming to strike an appropriate balance of breadth {\em and depth} that will enable someone having taken first graduate courses in matrix algebra and probability to get started doing research and/or developing tensor algorithms and software. Some background in applied optimization is useful but not strictly required. The material covered includes tensor rank and rank decomposition; basic tensor factorization models and their relationships and properties (including fairly good coverage of identifiability); broad coverage of algorithms ranging from alternating optimization to stochastic gradient; statistical performance analysis; and applications ranging from source separation to collaborative filtering, mixture and topic modeling, classification, and multilinear subspace learning.
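A small numpy illustration of the CP (rank) decomposition model discussed in the article: a third-order tensor expressed as a sum of R rank-one terms, T[i,j,k] = sum_r A[i,r]*B[j,r]*C[k,r]. This only demonstrates the model with made-up factor matrices; fitting A, B, C to data (e.g. by alternating optimization or stochastic gradient, as surveyed in the article) is the actual decomposition problem.

import numpy as np

I, J, K, R = 4, 5, 3, 2                      # tensor dimensions and CP rank (arbitrary choices)
rng = np.random.default_rng(1)
A, B, C = rng.normal(size=(I, R)), rng.normal(size=(J, R)), rng.normal(size=(K, R))

# Build the tensor from its factor matrices (a sum of R rank-one outer products).
T = np.einsum("ir,jr,kr->ijk", A, B, C)

# Sanity check against the elementwise definition of the CP model.
T_check = sum(np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r]) for r in range(R))
print(T.shape, np.allclose(T, T_check))      # (4, 5, 3) True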
BIRDNEST: Bayesian Inference for Ratings-Fraud Detection
Hooi, Bryan, Shah, Neil, Beutel, Alex, Gunnemann, Stephan, Akoglu, Leman, Kumar, Mohit, Makhija, Disha, Faloutsos, Christos
Review fraud is a pervasive problem in online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including 1) they occur in short bursts of time; 2) fraudulent user accounts have skewed rating distributions. However, these signs may not both be present in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews which combines these two signs in a principled manner, allowing successful detection even when one of them is absent. To combine these two signs, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on our model, we formulate a likelihood-based suspiciousness metric, Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference using our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated and all identified as fraudulent by domain experts at Flipkart.
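A hedged, simplified Python illustration of combining the two signs above into one suspiciousness score: how surprising a user's rating histogram is relative to the global one (a KL-divergence term), plus how bursty their rating times are. The distributions and the combination rule below are made up; this is only a stand-in for the paper's Bayesian BIRD model and NEST metric.

import math

global_dist = [0.05, 0.05, 0.10, 0.30, 0.50]          # hypothetical global share of 1-5 star ratings

def kl(p, q):
    # Kullback-Leibler divergence of the user's rating distribution from the global one.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def suspiciousness(star_counts, timestamps):
    total = sum(star_counts)
    user_dist = [c / total for c in star_counts]
    rating_surprise = kl(user_dist, global_dist)
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])] or [1.0]
    burstiness = 1.0 / (1e-9 + sum(gaps) / len(gaps))  # short average gaps -> high burstiness
    return rating_surprise + math.log1p(burstiness)

normal = suspiciousness([1, 0, 1, 1, 2], [10, 400, 900, 5000, 9000])
fraud  = suspiciousness([0, 0, 0, 0, 5], [10, 11, 12, 13, 14])
print(round(normal, 2), round(fraud, 2))               # the bursty, skewed user scores higher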
TribeFlow: Mining & Predicting User Trajectories
Figueiredo, Flavio, Ribeiro, Bruno, Almeida, Jussara, Faloutsos, Christos
Which song will Smith listen to next? Which restaurant will Alice go to tomorrow? Which product will John click next? These applications have in common the prediction of user trajectories that are in a constant state of flux over a hidden network (e.g. website links, geographic location). What users are doing now may be unrelated to what they will be doing an hour from now. Mindful of these challenges, we propose TribeFlow, a method designed to cope with the complex challenges of learning personalized predictive models of non-stationary, transient, and time-heterogeneous user trajectories. TribeFlow is a general method that can perform next-product recommendation, next-song recommendation, next-location prediction, and general arbitrary-length user trajectory prediction without domain-specific knowledge. TribeFlow is more accurate and up to 413x faster than its top competitors.
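To make the task concrete, here is a minimal Python baseline for next-item prediction: a per-user first-order transition model over hypothetical listening histories. This is not TribeFlow (which learns latent environments shared across users and handles non-stationarity); it only illustrates the input and output of trajectory prediction.

from collections import Counter, defaultdict

trajectories = {                                  # hypothetical listening histories
    "smith": ["song_a", "song_b", "song_a", "song_c", "song_a", "song_b"],
    "alice": ["song_b", "song_c", "song_b", "song_c"],
}

transitions = defaultdict(Counter)                # (user, current item) -> next-item counts
for user, items in trajectories.items():
    for cur, nxt in zip(items, items[1:]):
        transitions[(user, cur)][nxt] += 1

def predict_next(user, current):
    counts = transitions[(user, current)]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("smith", "song_a"))            # 'song_b'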
OMNI-Prop: Seamless Node Classification on Arbitrary Label Correlation
Yamaguchi, Yuto (University of Tsukuba) | Faloutsos, Christos (Carnegie Mellon University) | Kitagawa, Hiroyuki (University of Tsukuba)
If we know most of Smith’s friends are from Boston, what can we say about the rest of Smith’s friends? In this paper, we focus on the node classification problem on networks, which is one of the most important topics in the AI and Web communities. Our proposed algorithm, referred to as OMNI-Prop, has the following properties: (a) seamless and accurate; it works well under any label correlation (i.e., homophily, heterophily, and mixtures of them); (b) fast; it is efficient and guaranteed to converge on arbitrary graphs; (c) quasi-parameter free; it has just one well-interpretable parameter with a heuristic default value of 1. We also prove the theoretical connections of our algorithm to semi-supervised learning (SSL) algorithms and to random walks. Experiments on four real, different network datasets demonstrate the benefits of the proposed algorithm, where OMNI-Prop outperforms the top competitors.
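A hedged, simplified Python sketch in the spirit of a two-quantity propagation that can cope with both homophily and heterophily: each node keeps an estimate of its own label distribution and an estimate of its neighbors' label distribution. The update rules below are illustrative, not the exact OMNI-Prop equations; lam plays the role of the single parameter with default value 1, and the tiny graph is made up.

import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # a 4-cycle: classes alternate (heterophily)
labels = {0: 0, 1: 1}                              # observed labels for nodes 0 and 1
n, k, lam = 4, 2, 1.0                              # nodes, classes, the single parameter

nbrs = {i: [] for i in range(n)}
for a, b in edges:
    nbrs[a].append(b); nbrs[b].append(a)

prior = np.full(k, 1.0 / k)
s = np.tile(prior, (n, 1))                         # belief about each node's own label
t = np.tile(prior, (n, 1))                         # belief about each node's neighbors' labels
for i, y in labels.items():
    s[i] = np.eye(k)[y]                            # clamp labeled nodes

for _ in range(10):
    # What do my neighbors look like? Pool their self-beliefs (plus the prior, weighted by lam).
    t = np.array([(s[nbrs[i]].sum(0) + lam * prior) / (len(nbrs[i]) + lam) for i in range(n)])
    # What do my neighbors expect me to look like? Pool their neighbor-beliefs.
    s_new = np.array([(t[nbrs[i]].sum(0) + lam * prior) / (len(nbrs[i]) + lam) for i in range(n)])
    for i in range(n):
        if i not in labels:
            s[i] = s_new[i]

print(s.argmax(1))                                 # expect [0 1 0 1] despite heterophily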