
Collaborating Authors

Tung, Anthony K. H.


A General Framework for Producing Interpretable Semantic Text Embeddings

arXiv.org Artificial Intelligence

Semantic text embedding is essential to many tasks in Natural Language Processing (NLP). While black-box models are capable of generating high-quality embeddings, their lack of interpretability limits their use in tasks that demand transparency. Recent approaches have improved interpretability by leveraging domain-expert-crafted or LLM-generated questions, but these methods rely heavily on expert input or carefully designed prompts, which restricts their generalizability and their ability to generate discriminative questions across a wide range of tasks. To address these challenges, we introduce CQG-MBQA (Contrastive Question Generation - Multi-task Binary Question Answering), a general framework for producing interpretable semantic text embeddings across diverse tasks. Our framework systematically generates highly discriminative, low-cognitive-load yes/no questions through the CQG method and answers them efficiently with the MBQA model, yielding interpretable embeddings in a cost-effective manner. We validate the effectiveness and interpretability of CQG-MBQA through extensive experiments and ablation studies, demonstrating that it delivers embedding quality comparable to many advanced black-box models while remaining inherently interpretable. Additionally, CQG-MBQA outperforms other interpretable text embedding methods across various downstream tasks.

Text embedding is a cornerstone of Natural Language Processing (NLP), transforming texts--whether sentences, paragraphs, or full documents--into embedding vectors that capture their semantic meaning. In semantic embedding spaces, the similarity between texts is represented by the proximity of their embedding vectors, typically measured with Euclidean distance, cosine similarity, or inner product. Black-box text embedding methods, such as Sentence-BERT (Reimers & Gurevych, 2019), SimCSE (Gao et al., 2021), WhitenedCSE (Zhuo et al., 2023), and AnglE (Li & Li, 2024), excel at generating high-quality embeddings by training on vast amounts of data. These models are highly effective at capturing semantic similarities, making them indispensable for a variety of NLP tasks (Muennighoff et al., 2023). However, their black-box nature leaves the embeddings opaque to human users.
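The core idea is straightforward to illustrate: each embedding dimension is the binary answer to one human-readable question, so every dimension can be inspected directly. Below is a minimal, self-contained sketch of such a question-based embedding; the three questions and the keyword-matching answerer are illustrative stand-ins for the LLM-driven CQG step and the trained MBQA model described in the abstract.

```python
# Sketch of question-based interpretable embeddings in the spirit of
# CQG-MBQA. The questions and the toy "answerer" are assumptions for
# illustration; the actual framework generates discriminative questions
# contrastively (CQG) and answers them with a multi-task binary QA
# model (MBQA).
import numpy as np

QUESTIONS = [  # hypothetical yes/no questions; CQG would generate thousands
    "Does the text discuss sports?",
    "Does the text mention a price or cost?",
    "Is the text written as a question?",
]

def answer(question: str, text: str) -> int:
    """Toy binary answerer: returns 1 for yes, 0 for no."""
    keywords = {
        "Does the text discuss sports?": ("game", "team", "score"),
        "Does the text mention a price or cost?": ("$", "price", "cost"),
        "Is the text written as a question?": ("?",),
    }
    return int(any(k in text.lower() for k in keywords[question]))

def embed(text: str) -> np.ndarray:
    """Dimension i is directly readable: 1 means 'yes' to QUESTIONS[i]."""
    return np.array([answer(q, text) for q in QUESTIONS], dtype=float)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

a = embed("The team won the game with a late score.")  # -> [1, 0, 0]
b = embed("How much does the ticket cost?")            # -> [0, 1, 1]
print(cosine(a, b))  # similarity is explainable question by question
```

Unlike a dense black-box vector, any similarity score here can be traced back to the specific questions on which two texts agree.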


Towards Controllable Time Series Generation

arXiv.org Artificial Intelligence

Time Series Generation (TSG) has emerged as a pivotal technique for synthesizing data that accurately mirrors real-world time series, becoming indispensable in numerous applications. Despite significant advancements in TSG, its efficacy frequently hinges on large training datasets. This dependency presents a substantial challenge in data-scarce scenarios, especially when dealing with rare or unique conditions. To confront these challenges, we explore a new problem, Controllable Time Series Generation (CTSG), which aims to produce synthetic time series that can adapt to various external conditions, thereby tackling the data scarcity issue. In this paper, we propose Controllable Time Series (CTS), an innovative VAE-agnostic framework tailored for CTSG. A key feature of CTS is that it decouples the mapping process from standard VAE training, enabling precise learning of the complex interplay between latent features and external conditions. Moreover, we develop a comprehensive evaluation scheme for CTSG. Extensive experiments across three real-world time series datasets showcase CTS's exceptional capability to generate high-quality, controllable outputs, underscoring its adeptness at seamlessly integrating latent features with external conditions. Extending CTS to the image domain highlights its remarkable potential for explainability and further reinforces its versatility across modalities.
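The decoupling can be sketched concretely: the VAE is trained (or taken) as usual, and a separate condition-to-latent mapping is fitted afterwards on top of the frozen encoder, so generation under a new condition is decode(map(condition)). In the toy sketch below, the encoder/decoder stubs and the MLP mapper are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of the decoupling idea behind CTS: learn a mapping from
# external conditions to VAE latent codes *after* (and independently of)
# VAE training. The linear encode/decode stubs stand in for a pretrained
# VAE; any VAE variant could be plugged in, which is the "VAE-agnostic"
# point of the framework.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
ENC = rng.standard_normal((16, 4))   # fake frozen encoder weights
DEC = rng.standard_normal((4, 16))   # fake frozen decoder weights

def encode(x: np.ndarray) -> np.ndarray:
    """Stub for a pretrained VAE encoder (series -> 4-d latent)."""
    return x @ ENC

def decode(z: np.ndarray) -> np.ndarray:
    """Stub for a pretrained VAE decoder (latent -> length-16 series)."""
    return z @ DEC

# Training data: time series plus their external conditions (e.g., season).
series = rng.standard_normal((200, 16))
conditions = rng.standard_normal((200, 3))

# Decoupled step: fit condition -> latent on top of the frozen VAE.
latents = encode(series)
mapper = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
mapper.fit(conditions, latents)

# Controllable generation for an unseen condition.
new_condition = rng.standard_normal((1, 3))
synthetic = decode(mapper.predict(new_condition))
print(synthetic.shape)  # (1, 16)
```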


TSGBench: Time Series Generation Benchmark

arXiv.org Artificial Intelligence

Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) they often benchmark against similar model types, constraining a holistic view of performance capabilities; (2) the use of specialized synthetic and private datasets introduces biases and hampers generalizability; (3) ambiguous evaluation measures, often tied to custom networks or downstream tasks, hinder consistent and fair comparison. To overcome these limitations, we introduce TSGBench, the inaugural Time Series Generation Benchmark, designed for a unified and comprehensive assessment of TSG methods. It comprises three modules: (1) a curated collection of publicly available, real-world datasets tailored for TSG, together with a standardized preprocessing pipeline; (2) a comprehensive suite of evaluation measures, including vanilla measures, new distance-based assessments, and visualization tools; (3) a pioneering generalization test rooted in Domain Adaptation (DA), compatible with all methods. We have conducted comprehensive experiments using TSGBench across ten real-world datasets from diverse domains, utilizing ten advanced TSG methods and twelve evaluation measures. The results highlight the reliability and efficacy of TSGBench in evaluating TSG methods. Crucially, TSGBench delivers a statistical analysis of the performance rankings of these methods, illuminating their varying performance across different datasets and measures and offering nuanced insights into the effectiveness of each method.
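To make the "distance-based assessment" idea concrete, here is a minimal sketch of one plausible measure of this kind: the average per-timestep Wasserstein distance between the marginal distributions of real and synthetic series. This is an illustrative instance, not TSGBench's exact measure suite.

```python
# One example of a distance-based TSG evaluation measure: compare the
# marginal distribution of real vs. synthetic values at each timestep
# with the 1-d Wasserstein distance, then average over timesteps.
import numpy as np
from scipy.stats import wasserstein_distance

def marginal_wasserstein(real: np.ndarray, synth: np.ndarray) -> float:
    """real, synth: arrays of shape (n_series, seq_len). Lower is better."""
    assert real.shape[1] == synth.shape[1], "sequence lengths must match"
    per_step = [
        wasserstein_distance(real[:, t], synth[:, t])
        for t in range(real.shape[1])
    ]
    return float(np.mean(per_step))

rng = np.random.default_rng(1)
real = rng.standard_normal((500, 24))
good = rng.standard_normal((500, 24))            # same distribution
bad = 2.0 * rng.standard_normal((500, 24)) + 1.0  # shifted and scaled
print(marginal_wasserstein(real, good))  # small
print(marginal_wasserstein(real, bad))   # noticeably larger
```

Measures of this form are model-agnostic, which is what allows a benchmark to compare generators of very different architectures on equal footing.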


From Zero to Hero: Detecting Leaked Data through Synthetic Data Injection and Model Querying

arXiv.org Artificial Intelligence

Safeguarding the Intellectual Property (IP) of data has become critically important as machine learning applications proliferate, and their success relies heavily on the quality of training data. While various mechanisms exist to secure data during storage, transmission, and consumption, far fewer studies address detecting whether data have already been leaked for model training without authorization. This issue is particularly challenging due to the absence of information about, and control over, the training process conducted by potential attackers. In this paper, we concentrate on the domain of tabular data and introduce a novel methodology, Local Distribution Shifting Synthesis (LDSS), to detect leaked data used to train classification models. The core concept behind LDSS is to inject a small volume of synthetic data--characterized by local shifts in class distribution--into the owner's dataset. This enables the effective identification of models trained on leaked data through model querying alone, as the injection produces a pronounced disparity between the predictions of models trained on the leaked, modified dataset and those trained independently. LDSS is model-oblivious and hence compatible with a diverse range of classification models, such as Naive Bayes, Decision Tree, and Random Forest. We have conducted extensive experiments on seven types of classification models across five real-world datasets. The comprehensive results affirm the reliability, robustness, fidelity, security, and efficiency of LDSS.
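The detection mechanism can be sketched in a few lines: plant synthetic points whose labels contradict their local neighborhood, then decide whether a suspect model saw the leaked data by how often its predictions agree with the planted labels. The neighborhood construction, injection volume, and decision threshold below are illustrative assumptions, not LDSS's exact procedure.

```python
# Toy sketch of the LDSS idea: inject label-shifted synthetic points,
# then detect leakage through black-box model querying alone.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X, y = make_classification(n_samples=1000, n_features=8, random_state=2)

# Inject a small volume of synthetic points near real ones, with flipped
# labels (a local shift in class distribution).
idx = rng.choice(len(X), size=30, replace=False)
X_syn = X[idx] + 0.05 * rng.standard_normal((30, X.shape[1]))
y_syn = 1 - y[idx]
X_mod = np.vstack([X, X_syn])
y_mod = np.concatenate([y, y_syn])

leaked = RandomForestClassifier(random_state=0).fit(X_mod, y_mod)  # saw leak
independent = RandomForestClassifier(random_state=0).fit(X, y)     # did not

def agreement(model) -> float:
    """Fraction of planted points the model labels with the planted label,
    obtained purely by querying the model."""
    return float(np.mean(model.predict(X_syn) == y_syn))

# A model trained on the leaked data agrees with the planted labels far
# more often than an independently trained one.
print(agreement(leaked), agreement(independent))
```

Because the test only queries predictions on the planted points, it needs no access to the suspect model's parameters or training pipeline, which is what "model-oblivious" buys in this setting.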


SAH: Shifting-aware Asymmetric Hashing for Reverse $k$-Maximum Inner Product Search

arXiv.org Artificial Intelligence

This paper investigates a new yet challenging problem called Reverse $k$-Maximum Inner Product Search (R$k$MIPS). Given a query (item) vector, a set of item vectors, and a set of user vectors, R$k$MIPS aims to find the user vectors whose inner product with the query vector ranks among the $k$ largest of their inner products with the query and item vectors. We propose the first subquadratic-time algorithm, Shifting-aware Asymmetric Hashing (SAH), to tackle the R$k$MIPS problem. To speed up the Maximum Inner Product Search (MIPS) on item vectors, we design a shifting-invariant asymmetric transformation and develop a novel sublinear-time Shifting-Aware Asymmetric Locality Sensitive Hashing (SA-ALSH) scheme. Furthermore, we devise a new blocking strategy based on the Cone-Tree to effectively prune user vectors in batches. We prove that SAH achieves a theoretical guarantee for solving the R$k$MIPS problem. Experimental results on five real-world datasets show that SAH runs 4 to 8 times faster than the state-of-the-art methods for R$k$MIPS while achieving F1-scores of over 90%. The code is available at https://github.com/HuangQiang/SAH.
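The problem definition is easiest to pin down with a brute-force reference implementation: a user $u$ is an answer iff fewer than $k$ item vectors have a larger inner product with $u$ than the query does. The sketch below encodes exactly that definition; it is a naive baseline, not the SAH algorithm.

```python
# Brute-force reference for the RkMIPS problem definition (not SAH).
# A user u is returned iff <u, q> is among the k largest inner products
# of u with the item set extended by q.
import numpy as np

def rkmips_bruteforce(q, items, users, k):
    """q: (d,), items: (m, d), users: (n, d). Returns answer user indices."""
    answers = []
    for i, u in enumerate(users):
        uq = float(u @ q)
        # q ranks in u's top-k iff fewer than k items strictly beat it.
        if np.sum(items @ u > uq) < k:
            answers.append(i)
    return answers

rng = np.random.default_rng(3)
items = rng.standard_normal((1000, 16))
users = rng.standard_normal((200, 16))
q = rng.standard_normal(16)
print(rkmips_bruteforce(q, items, users, k=10))
```

This baseline costs O(nmd) per query, i.e., a full MIPS over all items for every user; SAH's contribution is to bring this below quadratic via asymmetric hashing on the item side and Cone-Tree blocking on the user side.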


Robust Federated Recommendation System

arXiv.org Machine Learning

Federated recommendation systems can deliver good performance without collecting users' private data, which makes them attractive. However, they are susceptible to low-cost poisoning attacks that can degrade their performance. In this paper, we develop a novel federated recommendation technique that is robust against poisoning attacks in which Byzantine clients prevail. We argue that the key to Byzantine detection is monitoring the gradients of clients' model parameters. We then propose a robust learning strategy in which, instead of using model parameters directly, the central server computes and utilizes the gradients to filter out Byzantine clients. Theoretically, we justify this strategy with our proposed definition of Byzantine resilience. Empirically, we confirm its efficacy on four datasets in a federated recommendation system.
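The server-side idea can be sketched as follows: reconstruct each client's effective gradient from the parameters it returns, drop the clients whose gradients sit far from the consensus, and aggregate the rest. The median-distance filter and the keep ratio below are illustrative assumptions, not the paper's exact Byzantine-resilience rule.

```python
# Sketch of gradient-based Byzantine filtering at the central server.
# Clients submit updated model parameters; the server recovers the
# implied gradients and discards outliers before averaging.
import numpy as np

def robust_aggregate(global_params, client_params, lr, keep_ratio=0.8):
    """client_params: (n_clients, d). Returns updated global parameters."""
    # Effective gradient implied by each client's returned parameters,
    # assuming one local SGD step: p_client = p_global - lr * g.
    grads = (global_params - client_params) / lr
    # Score clients by distance to the coordinate-wise median gradient.
    median = np.median(grads, axis=0)
    scores = np.linalg.norm(grads - median, axis=1)
    # Keep the clients closest to the consensus; drop suspected Byzantine ones.
    n_keep = max(1, int(keep_ratio * len(grads)))
    kept = np.argsort(scores)[:n_keep]
    return global_params - lr * grads[kept].mean(axis=0)

rng = np.random.default_rng(4)
d, lr = 32, 0.1
global_params = rng.standard_normal(d)
# 8 honest clients take a small, consistent gradient step; 2 Byzantine
# clients return wildly poisoned parameters.
honest = global_params - lr * (1.0 + 0.1 * rng.standard_normal((8, d)))
byzantine = global_params + lr * 50.0 * rng.standard_normal((2, d))
clients = np.vstack([honest, byzantine])
print(robust_aggregate(global_params, clients, lr))
```

Filtering in gradient space rather than parameter space matters because poisoned parameters can look individually plausible while their implied gradients deviate sharply from the honest majority.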