Faloutsos, Christos
GLEMOS: Benchmark for Instantaneous Graph Learning Model Selection
Park, Namyong, Rossi, Ryan, Wang, Xing, Simoulin, Antoine, Ahmed, Nesreen, Faloutsos, Christos
The choice of a graph learning (GL) model (i.e., a GL algorithm and its hyperparameter settings) has a significant impact on the performance of downstream tasks. However, selecting the right GL model becomes increasingly difficult and time-consuming as more and more GL models are developed. Accordingly, it is of great significance and practical value to equip users of GL with the ability to perform a near-instantaneous selection of an effective GL model without manual intervention. Despite recent attempts to tackle this important problem, there has been no comprehensive benchmark environment to evaluate the performance of GL model selection methods. To bridge this gap, we present GLEMOS, a comprehensive benchmark for instantaneous GL model selection.
NetInfoF Framework: Measuring and Exploiting Network Usable Information
Lee, Meng-Chieh, Yu, Haiyang, Zhang, Jian, Ioannidis, Vassilis N., Song, Xiang, Adeshina, Soji, Zheng, Da, Faloutsos, Christos
Given a node-attributed graph and a graph task (link prediction or node classification), can we tell whether a graph neural network (GNN) will perform well? More specifically, do the graph structure and the node features carry enough usable information for the task? Our goals are (1) to develop a fast tool to measure how much information is in the graph structure and in the node features, and (2) to exploit that information to solve the task when there is enough of it. We propose NetInfoF, a framework including NetInfoF_Probe and NetInfoF_Act for the measurement and the exploitation of network usable information (NUI), respectively. Given a graph, NetInfoF_Probe measures NUI without any model training, and NetInfoF_Act solves link prediction and node classification, with the two modules sharing the same backbone. In summary, NetInfoF has the following notable advantages: (a) General: it handles both link prediction and node classification; (b) Principled: it comes with a theoretical guarantee and a closed-form solution; (c) Effective: thanks to the proposed adjustment to node similarity; (d) Scalable: it scales linearly with the input size. On our carefully designed synthetic datasets, NetInfoF correctly identifies the ground truth of NUI and is the only method robust to all graph scenarios. Applied to real-world datasets, NetInfoF wins 11 out of 12 times on link prediction compared to general GNN baselines.
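As a rough illustration of probing a graph for usable information before training any GNN, here is a minimal sketch (not NetInfoF_Probe itself, which comes with a theoretical guarantee and a closed-form solution): it compares how well raw node features and neighbor-averaged features predict labels with a cheap linear model. The probe design, thresholds, and toy data below are assumptions for the sketch.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def quick_probe(features, edges, labels, n_nodes):
    # Neighbor-averaged features: a crude stand-in for structural information.
    agg = np.zeros_like(features)
    deg = np.zeros(n_nodes)
    for u, v in edges:
        agg[u] += features[v]
        agg[v] += features[u]
        deg[u] += 1
        deg[v] += 1
    agg /= np.maximum(deg, 1)[:, None]

    clf = LogisticRegression(max_iter=1000)
    return {
        "features_only": cross_val_score(clf, features, labels, cv=3).mean(),
        "neighbor_averaged": cross_val_score(clf, agg, labels, cv=3).mean(),
        "majority_baseline": np.bincount(labels).max() / len(labels),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 16))                    # toy node features
    y = rng.integers(0, 3, size=300)                  # toy labels (3 classes)
    E = [tuple(e) for e in rng.integers(0, 300, size=(900, 2))]
    print(quick_probe(X, E, y, n_nodes=300))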
McCatch: Scalable Microcluster Detection in Dimensional and Nondimensional Datasets
Vinces, Braulio V. Sánchez, Cordeiro, Robson L. F., Faloutsos, Christos
How could we have an outlier detector that works even with nondimensional data and ranks together both singleton microclusters ('one-off' outliers) and nonsingleton microclusters by their anomaly scores? How can we obtain scores that are principled, in a scalable and 'hands-off' manner? Microclusters of outliers indicate coalition or repetition in fraud activities, among others; their identification is thus highly desirable. This paper presents McCatch: a new algorithm that detects microclusters by leveraging our proposed 'Oracle' plot (1NN Distance versus Group 1NN Distance). We study 31 real and synthetic datasets with up to 1M data elements to show that McCatch is the only method that answers both of the questions above, and that it outperforms 11 other methods, especially when the data has nonsingleton microclusters or is nondimensional. We also showcase McCatch's ability to detect meaningful microclusters in graphs, fingerprints, logs of network connections, text data, and satellite imagery. For example, it found a 30-element microcluster of confirmed 'Denial of Service' attacks in the network logs, taking only ~3 minutes for 222K data elements on a stock desktop.
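To make the 'Oracle' plot coordinates concrete, here is an illustrative sketch for dimensional data only: each point's 1NN distance is paired with the 1NN distance of the small group it belongs to. The grouping rule and distance cutoff below are simplifying assumptions, not McCatch's actual procedure.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def oracle_plot_coordinates(X, group_cutoff_quantile=0.9):
    D = cdist(X, X)                       # pairwise distances
    np.fill_diagonal(D, np.inf)
    nn_dist = D.min(axis=1)               # per-point 1NN distance

    # Assumed grouping rule: link points closer than a distance cutoff.
    cutoff = np.quantile(nn_dist, group_cutoff_quantile)
    _, labels = connected_components(csr_matrix(D <= cutoff), directed=False)

    # Group 1NN distance: nearest distance from a point's group to any
    # point outside that group.
    group_nn_dist = np.empty(len(X))
    for g in np.unique(labels):
        inside = labels == g
        if inside.all():                  # one single group: no outside points
            group_nn_dist[inside] = np.inf
            continue
        group_nn_dist[inside] = D[np.ix_(inside, ~inside)].min()
    return nn_dist, group_nn_dist, labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)),     # inliers
                   rng.normal(8, 0.05, (5, 2)),    # nonsingleton microcluster
                   [[15.0, 15.0]]])                # 'one-off' outlier
    nn, gnn, labels = oracle_plot_coordinates(X)
    # Points whose group sits far from everything else stand out on both axes.
    print(np.argsort(-np.minimum(nn, gnn))[:10])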
Automatic Question-Answer Generation for Long-Tail Knowledge
Kumar, Rohan, Kim, Youngmin, Ravi, Sunitha, Sun, Haitian, Faloutsos, Christos, Salakhutdinov, Ruslan, Yoon, Minji
Pretrained Large Language Models (LLMs) have gained significant attention for addressing open-domain Question Answering (QA). While they exhibit high accuracy in answering questions about common knowledge, LLMs struggle to learn uncommon, long-tail knowledge (tail entities). Since manually constructing QA datasets demands substantial human resources, existing QA datasets are limited in variety, leaving a scarcity of datasets for studying LLM performance on tail entities. In this paper, we propose an automatic approach to generate specialized QA datasets for tail entities and present the associated research challenges. We conduct extensive experiments with pretrained LLMs on our newly generated long-tail QA datasets, comparing their performance with and without external resources, including Wikipedia and Wikidata knowledge graphs.
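As a very rough, hedged sketch of the general idea of turning knowledge-graph facts about tail entities into question-answer pairs, the snippet below templatizes (subject, relation, object) triples. The relations, templates, and entities are invented for illustration; the paper's generation pipeline is more involved than this.

# The relations and templates here are made up for the sketch.
TEMPLATES = {
    "occupation": "What is the occupation of {subject}?",
    "place_of_birth": "Where was {subject} born?",
    "author": "Who is the author of {subject}?",
}

def triples_to_qa(triples):
    """triples: iterable of (subject, relation, object) strings."""
    qa_pairs = []
    for subject, relation, obj in triples:
        template = TEMPLATES.get(relation)
        if template is None:
            continue                      # skip relations without a template
        qa_pairs.append({"question": template.format(subject=subject),
                         "answer": obj})
    return qa_pairs

if __name__ == "__main__":
    sample = [("Alice Example", "occupation", "botanist"),   # made-up tail entity
              ("An Obscure Novel", "author", "B. Writer")]
    for qa in triples_to_qa(sample):
        print(qa)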
EBV: Electronic Bee-Veterinarian for Principled Mining and Forecasting of Honeybee Time Series
Hossain, Mst. Shamima, Faloutsos, Christos, Baer, Boris, Kim, Hyoseung, Tsotras, Vassilis J.
Honeybees are vital for pollination and food production. Among many factors, extreme temperature (e.g., due to climate change) is particularly dangerous for bee health. Anticipating such extremes would allow beekeepers to take early preventive action. Thus, given sensor (temperature) time series data from beehives, how can we find patterns and do forecasting? Forecasting is crucial, as it helps spot unexpected behavior and thus issue warnings to beekeepers. In that case, what are the right models for forecasting: ARIMA, RNNs, or something else? We propose the EBV (Electronic Bee-Veterinarian) method, which has the following desirable properties: (i) principled: it is based on (a) diffusion equations from physics and (b) control theory for feedback-loop controllers; (ii) effective: it works well on multiple real-world time sequences; (iii) explainable: it needs only a handful of parameters (e.g., bee strength) that beekeepers can easily understand and trust; and (iv) scalable: it runs in time linear in the input size. We applied our method to multiple real-world time sequences and found that it yields accurate forecasting (up to 49% improvement in RMSE over baselines) and segmentation. Specifically, discontinuities detected by EBV mostly coincide with domain experts' opinions, showcasing our approach's potential and practical feasibility. Moreover, EBV is scalable and fast, taking about 20 minutes on a stock laptop to reconstruct two months of sensor data.
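To make the two ingredients concrete, here is a toy simulation that combines a heat-exchange (cooling/diffusion) term with a feedback-loop control term standing in for the bees' thermoregulation. The parameter names and the proportional controller are assumptions for this sketch, not EBV's actual model.

import numpy as np

def simulate_hive_temperature(t_ambient, t0=35.0, t_target=35.0,
                              k_diffusion=0.05, bee_strength=0.8, dt=1.0):
    """Forward-simulate hive temperature given an ambient-temperature series."""
    t_hive = np.empty(len(t_ambient))
    t_hive[0] = t0
    for i in range(1, len(t_ambient)):
        leak = k_diffusion * (t_ambient[i - 1] - t_hive[i - 1])   # heat exchange with outside
        control = bee_strength * (t_target - t_hive[i - 1])       # bees push back toward target
        t_hive[i] = t_hive[i - 1] + dt * (leak + control)
    return t_hive

if __name__ == "__main__":
    hours = np.arange(0, 48)
    ambient = 20 + 10 * np.sin(2 * np.pi * hours / 24)   # daily temperature cycle
    print(simulate_hive_temperature(ambient)[:5])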
Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection
Mavromatis, Costas, Srinivasan, Balasubramaniam, Shen, Zhengyuan, Zhang, Jiani, Rangwala, Huzefa, Faloutsos, Christos, Karypis, George
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it requires no parameter updates to the trained LLM, only a few annotated examples as input. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive, optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about and performs semantic-diversity-based example selection. Diversity-based sampling improves overall effectiveness, while uncertainty sampling improves budget efficiency and helps the LLM learn new information. Moreover, AdaICL poses its sampling strategy as a Maximum Coverage problem that dynamically adapts based on the model's feedback and can be approximately solved via greedy algorithms. Extensive experiments on nine datasets and seven LLMs show that AdaICL improves performance by 4.4 accuracy points over SOTA (a 7.7% relative improvement), is up to 3x more budget-efficient than annotating examples uniformly at random, and outperforms SOTA with 2x fewer ICL examples.
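A minimal sketch of the "uncertainty plus diversity as greedy maximum coverage" idea follows: each candidate annotation covers the uncertain examples that are semantically close to it, and we greedily pick the candidate that covers the most still-uncovered examples. The similarity threshold, uncertainty cutoff, and scoring are assumptions, not AdaICL's exact formulation.

import numpy as np

def greedy_coverage_selection(embeddings, uncertainty, budget,
                              sim_threshold=0.8, uncertainty_quantile=0.5):
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb.T
    # Only examples the model is uncertain about need to be "covered".
    uncertain = uncertainty >= np.quantile(uncertainty, uncertainty_quantile)
    covers = (sim >= sim_threshold) & uncertain[None, :]

    selected, covered = [], np.zeros(len(embeddings), dtype=bool)
    for _ in range(budget):
        gains = (covers & ~covered[None, :]).sum(axis=1)
        gains[selected] = -1              # never pick the same example twice
        best = int(np.argmax(gains))
        if gains[best] <= 0:              # nothing left to cover
            break
        selected.append(best)
        covered |= covers[best]
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(500, 32))      # stand-in sentence embeddings
    unc = rng.uniform(size=500)           # stand-in model uncertainty scores
    print(greedy_coverage_selection(emb, unc, budget=10))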
Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space
Zhang, Hengrui, Zhang, Jiani, Srinivasan, Balasubramaniam, Shen, Zhengyuan, Qin, Xiao, Faloutsos, Christos, Rangwala, Huzefa, Karypis, George
Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and the mix of data types in tabular data. This paper introduces TABSYN, a methodology that synthesizes tabular data by leveraging a diffusion model within a latent space crafted by a variational autoencoder (VAE). The key advantages of the proposed TABSYN include (1) Generality: the ability to handle a broad spectrum of data types by converting them into a single unified space and to explicitly capture inter-column relations; (2) Quality: optimizing the distribution of latent embeddings to enhance the subsequent training of diffusion models, which helps generate high-quality synthetic data; (3) Speed: far fewer reverse steps and faster synthesis than existing diffusion-based methods. Extensive experiments on six datasets with five metrics demonstrate that TABSYN outperforms existing methods. Specifically, it reduces the error rates by 86% and 67% for column-wise distribution and pair-wise column correlation estimation, respectively, compared with the most competitive baselines.
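A conceptual sketch of the two-stage recipe follows: a small VAE maps preprocessed rows (numeric plus one-hot categorical columns) into one continuous latent space, and a noise-prediction network is then trained on those latents. The network sizes, noise schedule, and loss weights are placeholders for the sketch, not TABSYN's architecture.

import torch
import torch.nn as nn

class TabularVAE(nn.Module):
    def __init__(self, in_dim, latent_dim=16, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def train_latent_diffusion(rows, latent_dim=16, steps=200):
    """rows: (N, D) tensor of numeric + one-hot-encoded columns."""
    # Stage 1: fit the latent space.
    vae = TabularVAE(rows.shape[1], latent_dim)
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for _ in range(steps):
        recon, mu, logvar = vae(rows)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        loss = (recon - rows).pow(2).mean() + 1e-3 * kl
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        latents = vae.enc(rows).chunk(2, dim=-1)[0]   # use posterior means

    # Stage 2: train a noise-prediction network on the latents.
    denoiser = nn.Sequential(nn.Linear(latent_dim + 1, 64), nn.ReLU(),
                             nn.Linear(64, latent_dim))
    opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
    for _ in range(steps):
        t = torch.rand(latents.shape[0], 1)           # noise level in (0, 1)
        noise = torch.randn_like(latents)
        noisy = (1 - t) * latents + t * noise         # simple interpolation schedule
        pred = denoiser(torch.cat([noisy, t], dim=-1))
        loss = (pred - noise).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vae, denoiser

if __name__ == "__main__":
    fake_rows = torch.randn(256, 10)                  # stand-in for preprocessed table rows
    vae, denoiser = train_latent_diffusion(fake_rows)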
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
Zhu, Jing, Song, Xiang, Ioannidis, Vassilis N., Koutra, Danai, Faloutsos, Christos
How can we enhance the node features acquired from Pretrained Models (PMs) to better suit downstream graph learning tasks? Graph Neural Networks (GNNs) have become the state-of-the-art approach for many high-impact, real-world graph applications. For feature-rich graphs, a prevalent practice is to use a PM directly to generate features, without incorporating any domain adaptation techniques. Nevertheless, this practice is suboptimal: the node features extracted from the PM are graph-agnostic and prevent GNNs from fully exploiting the potential correlations between the graph structure and node features, leading to a decline in GNN performance. In this work, we seek to improve the node features obtained from a PM for downstream graph tasks and introduce TOUCHUP-G, which has several advantages. It is (a) General: applicable to any downstream graph task, including link prediction, which is often employed in recommender systems; (b) Multi-modal: able to improve raw features of any modality (e.g., images, texts, audio); (c) Principled: closely tied to a novel metric, feature homophily, which we propose to quantify the potential correlation between the graph structure and node features; we show that TOUCHUP-G can effectively shrink the discrepancy between the two; (d) Effective: achieving state-of-the-art results on four real-world datasets spanning different tasks and modalities.
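One simple way to make the spirit of feature homophily concrete is to compare feature similarity across edges with feature similarity across random node pairs, as sketched below; the metric actually proposed in the paper may be defined differently, so treat this only as an illustrative proxy.

import numpy as np

def feature_structure_agreement(features, edges, n_random=10000, seed=0):
    """features: (N, d) array; edges: (E, 2) array of node index pairs."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    edge_sim = (f[edges[:, 0]] * f[edges[:, 1]]).sum(axis=1).mean()

    rng = np.random.default_rng(seed)
    u = rng.integers(0, len(f), n_random)
    v = rng.integers(0, len(f), n_random)
    random_sim = (f[u] * f[v]).sum(axis=1).mean()
    return edge_sim - random_sim          # > 0: features align with the structure

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 8))                     # toy node features
    E = rng.integers(0, 100, size=(400, 2))           # toy edge list
    print(feature_structure_agreement(X, E))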
Discovery and Exploitation of Generalized Network Effects
Lee, Meng-Chieh, Shekhar, Shubhranshu, Yoo, Jaemin, Faloutsos, Christos
Given a large graph with few node labels, how can we (a) identify whether the graph exhibits generalized network effects (GNE), (b) estimate GNE to explain the interrelations among node classes, and (c) exploit GNE to improve downstream tasks such as predicting unknown labels accurately and efficiently? Knowledge of GNE is valuable for various tasks like node classification and targeted advertising. However, identifying and understanding GNE, such as homophily, heterophily, or their combination, is challenging in real-world graphs due to the limited availability of node labels and noisy edges. We propose NetEffect, a graph mining approach that addresses the above issues and enjoys the following properties: (i) Principled: a statistical test to determine the presence of GNE in a graph with few node labels; (ii) General and Explainable: a closed-form solution to estimate the specific type of GNE observed; and (iii) Accurate and Scalable: the integration of GNE for accurate and fast node classification. Applied to public, real-world graphs, NetEffect discovers an unexpected absence of GNE in numerous graphs that were previously thought to exhibit heterophily. Further, we show that incorporating GNE is effective for node classification. On a large real-world graph with 1.6M nodes and 22.3M edges, NetEffect achieves a speedup of over 7x (14 minutes vs. 2 hours) over most competitors.
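The following small sketch estimates a class-compatibility matrix from the few labeled nodes: a diagonal-heavy matrix suggests homophily, an off-diagonal-heavy one suggests heterophily. It is a plain empirical count, not NetEffect's closed-form estimator or its statistical test.

import numpy as np

def empirical_compatibility(edges, labels, n_classes):
    """labels[i] is the class of node i, or -1 if unlabeled."""
    H = np.zeros((n_classes, n_classes))
    for u, v in edges:
        if labels[u] >= 0 and labels[v] >= 0:         # use only fully labeled edges
            H[labels[u], labels[v]] += 1
            H[labels[v], labels[u]] += 1
    row_sums = H.sum(axis=1, keepdims=True)
    return np.divide(H, row_sums, out=np.zeros_like(H), where=row_sums > 0)

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    labels = np.array([0, 1, 0, 1, -1])               # one node left unlabeled
    print(empirical_compatibility(edges, labels, n_classes=2))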
Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs
Huang, Zijie, Wang, Daheng, Huang, Binxuan, Zhang, Chenwei, Shang, Jingbo, Liang, Yan, Wang, Zhengyang, Li, Xian, Faloutsos, Christos, Sun, Yizhou, Wang, Wei
Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact that many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between the two views and lacks probabilistic semantics for concept granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjointness among them. Box volumes can be interpreted as concepts' granularity. Unlike concepts, we model entities as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly created industrial KG show the effectiveness of Concept2Box.
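To make the geometry concrete, the sketch below represents a concept box by per-dimension (min, max) intervals, uses box volume as a proxy for granularity, and computes a simple vector-to-box distance (zero inside the box, distance to the nearest face outside). The actual metric learned in Concept2Box differs; this is only illustrative.

import numpy as np

def box_volume(box_min, box_max):
    return float(np.prod(np.maximum(box_max - box_min, 0.0)))

def vector_to_box_distance(point, box_min, box_max):
    # Component-wise distance to the interval [box_min, box_max].
    below = np.maximum(box_min - point, 0.0)
    above = np.maximum(point - box_max, 0.0)
    return float(np.linalg.norm(below + above))

if __name__ == "__main__":
    concept_min = np.array([0.0, 0.0])                # a toy 2-D "concept" box
    concept_max = np.array([2.0, 1.0])
    entity_inside = np.array([1.0, 0.5])
    entity_outside = np.array([3.0, 2.0])
    print(box_volume(concept_min, concept_max))                               # 2.0
    print(vector_to_box_distance(entity_inside, concept_min, concept_max))    # 0.0
    print(vector_to_box_distance(entity_outside, concept_min, concept_max))   # sqrt(1^2 + 1^2)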