AITopics | Wang, Yanhao

Collaborating Authors

Wang, Yanhao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models

Zhang, Qian, Chen, Panfeng, Li, Jiali, Feng, Linkun, Liu, Shuyu, Zhao, Heng, Chen, Mei, Li, Hui, Wang, Yanhao

arXiv.org Artificial IntelligenceDec-11-2024

The emergence of Large Language Models (LLMs) in the medical domain has stressed a compelling need for standard datasets to evaluate their question-answering (QA) performance. Although there have been several benchmark datasets for medical QA, they either cover common knowledge across different departments or are specific to another department rather than pediatrics. Moreover, some of them are limited to objective questions and do not measure the generation capacity of LLMs. Therefore, they cannot comprehensively assess the QA ability of LLMs in pediatrics. To fill this gap, we construct PediaBench, the first Chinese pediatric dataset for LLM evaluation. Specifically, it contains 4,565 objective questions and 1,632 subjective questions spanning 12 pediatric disease groups. It adopts an integrated scoring criterion based on different difficulty levels to thoroughly assess the proficiency of an LLM in instruction following, knowledge understanding, clinical case analysis, etc. Finally, we validate the effectiveness of PediaBench with extensive experiments on 20 open-source and commercial LLMs. Through an in-depth analysis of experimental results, we offer insights into the ability of LLMs to answer pediatric questions in the Chinese context, highlighting their limitations for further improvements. Our code and data are published at https://github.com/ACMISLab/PediaBench.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.06287

Country: Asia > China (0.46)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

DPSW-Sketch: A Differentially Private Sketch Framework for Frequency Estimation over Sliding Windows (Technical Report)

Wang, Yiping, Wang, Yanhao, Chen, Cen

arXiv.org Artificial IntelligenceJun-12-2024

The sliding window model of computation captures scenarios in which data are continually arriving in the form of a stream, and only the most recent $w$ items are used for analysis. In this setting, an algorithm needs to accurately track some desired statistics over the sliding window using a small space. When data streams contain sensitive information about individuals, the algorithm is also urgently needed to provide a provable guarantee of privacy. In this paper, we focus on the two fundamental problems of privately (1) estimating the frequency of an arbitrary item and (2) identifying the most frequent items (i.e., \emph{heavy hitters}), in the sliding window model. We propose \textsc{DPSW-Sketch}, a sliding window framework based on the count-min sketch that not only satisfies differential privacy over the stream but also approximates the results for frequency and heavy-hitter queries within bounded errors in sublinear time and space w.r.t.~$w$. Extensive experiments on five real-world and synthetic datasets show that \textsc{DPSW-Sketch} provides significantly better utility-privacy trade-offs than state-of-the-art methods.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3637528.3671694

2406.07953

Country: Asia > China (0.47)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Add feedback

Spectral Normalized-Cut Graph Partitioning with Fairness Constraints

Li, Jia, Wang, Yanhao, Merchant, Arpit

arXiv.org Artificial IntelligenceJul-22-2023

Normalized-cut graph partitioning aims to divide the set of nodes in a graph into $k$ disjoint clusters to minimize the fraction of the total edges between any cluster and all other clusters. In this paper, we consider a fair variant of the partitioning problem wherein nodes are characterized by a categorical sensitive attribute (e.g., gender or race) indicating membership to different demographic groups. Our goal is to ensure that each group is approximately proportionally represented in each cluster while minimizing the normalized cut value. To resolve this problem, we propose a two-phase spectral algorithm called FNM. In the first phase, we add an augmented Lagrangian term based on our fairness criteria to the objective function for obtaining a fairer spectral node embedding. Then, in the second phase, we design a rounding scheme to produce $k$ clusters from the fair embedding that effectively trades off fairness and partition quality. Through comprehensive experiments on nine benchmark datasets, we demonstrate the superior performance of FNM compared with three baseline methods.

fairness constraint, spectral normalized-cut graph partitioning

arXiv.org Artificial Intelligence

doi: 10.3233/FAIA230416

2307.12065

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)

Add feedback

Balancing Utility and Fairness in Submodular Maximization (Technical Report)

Wang, Yanhao, Li, Yuchen, Bonchi, Francesco, Wang, Ying

arXiv.org Artificial IntelligenceJun-19-2023

Submodular function maximization is a fundamental combinatorial optimization problem with plenty of applications -- including data summarization, influence maximization, and recommendation. In many of these problems, the goal is to find a solution that maximizes the average utility over all users, for each of whom the utility is defined by a monotone submodular function. However, when the population of users is composed of several demographic groups, another critical problem is whether the utility is fairly distributed across different groups. Although the \emph{utility} and \emph{fairness} objectives are both desirable, they might contradict each other, and, to the best of our knowledge, little attention has been paid to optimizing them jointly. To fill this gap, we propose a new problem called \emph{Bicriteria Submodular Maximization} (BSM) to balance utility and fairness. Specifically, it requires finding a fixed-size solution to maximize the utility function, subject to the value of the fairness function not being below a threshold. Since BSM is inapproximable within any constant factor, we focus on designing efficient instance-dependent approximation schemes. Our algorithmic proposal comprises two methods, with different approximation factors, obtained by converting a BSM instance into other submodular optimization problem instances. Using real-world and synthetic datasets, we showcase applications of our proposed methods in three submodular maximization problems: maximum coverage, influence maximization, and facility location.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.48786/edbt.2024.01

2211.0098

Country:

Asia (1.00)
North America > United States > New York (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Customized Text Sanitization Mechanism with Differential Privacy

Chen, Huimin, Mo, Fengran, Wang, Yanhao, Chen, Cen, Nie, Jian-Yun, Wang, Chengyu, Cui, Jamie

arXiv.org Artificial IntelligenceMay-23-2023

As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good trade-offs between privacy and utility. To address the above limitations, we propose a novel Customized Text (CusText) sanitization mechanism based on the original $\epsilon$-differential privacy (DP) definition, which is compatible with any similarity measure. Furthermore, CusText assigns each input token a customized output set of tokens to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms. The code is available at https://github.com/sai4july/CusText.

machine learning, mechanism, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.findings-acl.355

2207.01193

Country:

North America (0.14)
Asia (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Add feedback

Max-Min Diversification with Fairness Constraints: Exact and Approximation Algorithms

Wang, Yanhao, Mathioudakis, Michael, Li, Jia, Fabbri, Francesco

arXiv.org Artificial IntelligenceJan-5-2023

This has raised concerns about the possibility that algorithms may produce unfair and discriminatory decisions for specific population groups, particularly in sensitive socio-computational domains such as voting, hiring, banking, education, and criminal justice [12, 25]. To alleviate such concerns, there has been a lot of research devoted to incorporating fairness into the algorithms for automated decision tasks, including classification [14], clustering [10], ranking [24, 32], matching [28], and data summarization [8, 20]. This paper considers the diversity maximization problem and addresses its fairness-aware variant. The problem consists in selecting a diverse subset of items from a given dataset and is encountered in data summarization [8, 23], web search [2], recommendation [21], feature selection [31], and elsewhere [34]. Existing literature on the problem of diversity maximization primarily focuses on two objectives, namely max-min diversification (MMD), which aims to maximize the minimum distance between any pair of selected items, and max-sum diversification (MSD), which seeks to maximize the sum of pairwise distances between selected items. As shown in Figure 1, MMD tends to cover the data range uniformly, while MSD tends to pick "outliers" and may include highly similar items in the solution. Since the notion of diversity captured by MMD better represents the property that data summarization, feature selection, and many other tasks target with their solutions, we will only consider MMD in this paper. To be precise, given a set V of n items in a metric space and a positive integer k n, MMD asks for a size-k subset S of V to maximize the minimum pairwise distance within S. In particular, we study the fair max-min diversification (FMMD) problem, a variant of MMD that aims not only to maximize the diversity measure defined above but also to guarantee the satisfaction of group fairness constraints as described below.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1137/1.9781611977653.ch11

2301.02053

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

SAH: Shifting-aware Asymmetric Hashing for Reverse $k$-Maximum Inner Product Search

Huang, Qiang, Wang, Yanhao, Tung, Anthony K. H.

arXiv.org Artificial IntelligenceNov-23-2022

This paper investigates a new yet challenging problem called Reverse $k$-Maximum Inner Product Search (R$k$MIPS). Given a query (item) vector, a set of item vectors, and a set of user vectors, the problem of R$k$MIPS aims to find a set of user vectors whose inner products with the query vector are one of the $k$ largest among the query and item vectors. We propose the first subquadratic-time algorithm, i.e., Shifting-aware Asymmetric Hashing (SAH), to tackle the R$k$MIPS problem. To speed up the Maximum Inner Product Search (MIPS) on item vectors, we design a shifting-invariant asymmetric transformation and develop a novel sublinear-time Shifting-Aware Asymmetric Locality Sensitive Hashing (SA-ALSH) scheme. Furthermore, we devise a new blocking strategy based on the Cone-Tree to effectively prune user vectors (in a batch). We prove that SAH achieves a theoretical guarantee for solving the RMIPS problem. Experimental results on five real-world datasets show that SAH runs 4$\sim$8$\times$ faster than the state-of-the-art methods for R$k$MIPS while achieving F1-scores of over 90\%. The code is available at \url{https://github.com/HuangQiang/SAH}.

data mining, information retrieval, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/aaai.v37i4.25550

2211.12751

Country: Asia (0.68)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
(2 more...)

Add feedback

Graph Summarization via Node Grouping: A Spectral Algorithm

Merchant, Arpit, Mathioudakis, Michael, Wang, Yanhao

arXiv.org Artificial IntelligenceNov-8-2022

Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.

data mining, eigenvector, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3539597.3570441

2211.04169

Country:

Asia (0.46)
North America > United States > New York (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

Streaming Algorithms for Diversity Maximization with Fairness Constraints

Wang, Yanhao, Fabbri, Francesco, Mathioudakis, Michael

arXiv.org Artificial IntelligenceJul-30-2022

Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum \emph{diversity}, as quantified by the dissimilarities among the elements in $S$. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset $S$ that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some sensitive attribute, e.g., sex or race, ensuring \emph{fairness} requires that the selected subset $S$ contains $k_i$ elements from each group $i \in [1,m]$. A streaming algorithm should process $X$ sequentially in one pass and return a subset with maximum \emph{diversity} while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is $\frac{1-\varepsilon}{4}$-approximate and specific for $m=2$, where $\varepsilon \in (0,1)$, and the second of which achieves a $\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.

artificial intelligence, information management, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICDE53745.2022.00008

2208.00194

Country:

Europe > Spain (0.28)
Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Information Management (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement

Zhang, Zheng, Xu, Ying, Wang, Yanhao, Yao, Bingsheng, Ritchie, Daniel, Wu, Tongshuang, Yu, Mo, Wang, Dakuo, Li, Toby Jia-Jun

arXiv.org Artificial IntelligenceFeb-12-2022

Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child due to limited availability or challenges in coming up with appropriate questions. While recent advances made AI generation of questions from stories possible, the fully-automated approach excludes parent involvement, disregards educational goals, and underoptimizes for child engagement. Informed by need-finding interviews and participatory design (PD) results, we developed StoryBuddy, an AI-enabled system for parents to create interactive storytelling experiences. StoryBuddy's design highlighted the need for accommodating dynamic user needs between the desire for parent involvement and parent-child bonding and the goal of minimizing parent intervention when busy. The PD revealed varied assessment and educational goals of parents, which StoryBuddy addressed by supporting configuring question types and tracking child progress. A user study validated StoryBuddy's usability and suggested design insights for future parent-AI collaboration systems.

machine learning, natural language, storybuddy, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3491102.3517479

2202.06205

Country:

Europe > United Kingdom (0.67)
North America > United States > Pennsylvania (0.28)
North America > United States > New York > New York County > New York City (0.15)
North America > United States > California > Santa Clara County (0.14)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Instructional Material (1.00)
(2 more...)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(2 more...)

Add feedback