AITopics | Yu, Zhe

Collaborating Authors

Yu, Zhe

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Testing Relative Fairness in Human Decisions With Machine Learning

Yu, Zhe, Xi, Xiaoyin

arXiv.org Artificial IntelligenceDec-17-2023

Fairness in decision-making has been a long-standing issue in our society. Compared to algorithmic fairness, fairness in human decisions is even more important since there are processes where humans make the final decisions and that machine learning models inherit bias from the human decisions they were trained on. However, the standard for fairness in human decisions are highly subjective and contextual. This leads to the difficulty for testing "absolute" fairness in human decisions. To bypass this issue, this work aims to test relative fairness in human decisions. That is, instead of defining what are "absolute" fair decisions, we check the relative fairness of one decision set against another. An example outcome can be: Decision Set A favors female over male more than Decision Set B. Such relative fairness has the following benefits: (1) it avoids the ambiguous and contradictory definition of "absolute" fair decisions; (2) it reveals the relative preference and bias between different human decisions; (3) if a reference set of decisions is provided, relative fairness of other decision sets against this reference set can reflect whether those decision sets are fair by the standard of that reference set. We define the relative fairness with statistical tests (null hypothesis and effect size tests) of the decision differences across each sensitive group. Furthermore, we show that a machine learning model trained on the human decisions can inherit the bias/preference and therefore can be utilized to estimate the relative fairness between two decision sets made on different data.

artificial intelligence, machine learning, relative fairness, (17 more...)

arXiv.org Artificial Intelligence

2112.11279

Country: North America > United States (0.14)

Genre:

Research Report > Experimental Study (0.95)
Research Report > New Finding (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

A Data-driven Region Generation Framework for Spatiotemporal Transportation Service Management

Chen, Liyue, Fang, Jiangyi, Yu, Zhe, Tong, Yongxin, Cao, Shaosheng, Wang, Leye

arXiv.org Artificial IntelligenceJun-5-2023

MAUP (modifiable areal unit problem) is a fundamental problem for spatial data management and analysis. As an instantiation of MAUP in online transportation platforms, region generation (i.e., specifying the areal unit for service operations) is the first and vital step for supporting spatiotemporal transportation services such as ride-sharing and freight transport. Most existing region generation methods are manually specified (e.g., fixed-size grids), suffering from poor spatial semantic meaning and inflexibility to meet service operation requirements. In this paper, we propose RegionGen, a data-driven region generation framework that can specify regions with key characteristics (e.g., good spatial semantic meaning and predictability) by modeling region generation as a multi-objective optimization problem. First, to obtain good spatial semantic meaning, RegionGen segments the whole city into atomic spatial elements based on road networks and obstacles (e.g., rivers). Then, it clusters the atomic spatial elements into regions by maximizing various operation characteristics, which is formulated as a multi-objective optimization problem. For this optimization problem, we propose a multi-objective co-optimization algorithm. Extensive experiments verify that RegionGen can generate more suitable regions than traditional methods for spatiotemporal service management.

artificial intelligence, machine learning, proceedings, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3580305.3599760

2306.02806

Country:

Asia > China (1.00)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

FairBalance: How to Achieve Equalized Odds With Data Pre-processing

Yu, Zhe, Chakraborty, Joymallya, Menzies, Tim

arXiv.org Artificial IntelligenceApr-26-2023

This research seeks to benefit the software engineering society by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. Amongst all the existing fairness notions, this work specifically targets "equalized odds" given its advantage in always allowing perfect classifiers. Equalized odds requires that members of every demographic group do not receive disparate mistreatment. Prior works either optimize for an equalized odds related metric during the learning process like a black-box, or manipulate the training data following some intuition. This work studies the root cause of the violation of equalized odds and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we proposed FairBalance, a pre-processing algorithm which balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, the proposed pre-processing algorithm FairBalance can significantly improve equalized odds without much, if any damage to the utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. To facilitate reuse, reproduction, and validation, we made our scripts available at https://github.com/hil-se/FairBalance.

artificial intelligence, equalized odds, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2107.0831

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.88)

Industry:

Education (0.54)
Health & Medicine (0.46)
Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Better Technical Debt Detection via SURVEYing

Fahid, Fahmid M., Yu, Zhe, Menzies, Tim

arXiv.org Artificial IntelligenceMay-20-2019

Software analytics can be improved by surveying; i.e. rechecking and (possibly) revising the labels offered by prior analysis. Surveying is a time-consuming task and effective surveyors must carefully manage their time. Specifically, they must balance the cost of further surveying against the additional benefits of that extra effort. This paper proposes SURVEY0, an incremental Logistic Regression estimation method that implements cost/benefit analysis. Some classifier is used to rank the as-yet-unvisited examples according to how interesting they might be. Humans then review the most interesting examples, after which their feedback is used to update an estimator for estimating how many examples are remaining. This paper evaluates SURVEY0 in the context of self-admitted technical debt. As software project mature, they can accumulate "technical debt" i.e. developer decisions which are sub-optimal and decrease the overall quality of the code. Such decisions are often commented on by programmers in the code; i.e. it is self-admitted technical debt (SATD). Recent results show that text classifiers can automatically detect such debt. We find that we can significantly outperform prior results by SURVEYing the data. Specifically, for ten open-source JAVA projects, we can find 83% of the technical debt via SURVEY0 using just 16% of the comments (and if higher levels of recall are required, SURVEY0can adjust towards that with some additional effort).

artificial intelligence, software engineering, technical debt, (20 more...)

arXiv.org Artificial Intelligence

1905.08297

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Finding Better Active Learners for Faster Literature Reviews

Yu, Zhe, Kraft, Nicholas A., Menzies, Tim

arXiv.org Artificial IntelligenceFeb-2-2018

Literature reviews can be time-consuming and tedious to complete. By cataloging and refactoring three state-of-the-art active learning techniques from evidence-based medicine and legal electronic discovery, this paper finds and implements FASTREAD, a faster technique for studying a large corpus of documents. This paper assesses FASTREAD using datasets generated from existing SE literature reviews (Hall, Wahono, Radjenovi\'c, Kitchenham et al.). Compared to manual methods, FASTREAD lets researchers find 95% relevant studies after reviewing an order of magnitude fewer papers. Compared to other state-of-the-art automatic methods, FASTREAD reviews 20-50% fewer studies while finding same number of relevant primary studies in a systematic literature review.

dataset, litigation, survey article, (18 more...)

arXiv.org Artificial Intelligence

1612.03224

Country:

North America > United States > North Carolina (0.14)
North America > United States > Wisconsin (0.14)
North America > United States > Texas (0.14)

Industry: Law > Litigation (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback