culotta
Learning from Label Proportions and Covariate-shifted Instances
Singh, Sagalpreet, Sharma, Navodita, Havaldar, Shreyas, Saket, Rishi, Raghuveer, Aravindan
In many applications, especially due to lack of supervision or privacy concerns, the training data is grouped into bags of instances (feature-vectors) and for each bag we have only an aggregate label derived from the instance-labels in the bag. In learning from label proportions (LLP) the aggregate label is the average of the instance-labels in a bag, and a significant body of work has focused on training models in the LLP setting to predict instance-labels. In practice, however, the training data may have fully supervised albeit covariate-shifted source data, along with the usual target data with bag-labels, and we wish to train a good instance-level predictor on the target domain. We call this the covariate-shifted hybrid LLP problem. Fully supervised, covariate-shifted data often has useful training signals, and the goal is to leverage them for better predictive performance in the hybrid LLP setting. To achieve this, we develop methods for hybrid LLP which naturally incorporate the target bag-labels along with the source instance-labels within the domain adaptation framework. Apart from proving theoretical guarantees bounding the target generalization error, we also conduct experiments on several publicly available datasets showing that our methods outperform LLP and domain adaptation baselines as well as techniques from previous related work.
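A toy objective can illustrate the hybrid idea of combining source instance-labels with target bag-label proportions. The sketch below makes several assumptions:

- a linear logistic model;
- the common LLP "proportion loss", which is the cross-entropy between a bag's mean predicted probability and its label proportion;
- a hypothetical weight `lam` that balances the source and target terms.

It is not the authors' method. It omits the domain-adaptation machinery and the guarantees described above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Instance-level probability from a linear model (an illustrative choice)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce(p, y, eps=1e-12):
    """Binary cross-entropy; y may be a hard label or a proportion in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def bag_loss(w, b, bag, proportion):
    # LLP proportion loss: compare the bag's mean predicted probability
    # with its observed label proportion.
    mean_p = sum(predict(w, b, x) for x in bag) / len(bag)
    return bce(mean_p, proportion)


def hybrid_loss(w, b, source, target_bags, lam=1.0):
    # Supervised loss on labeled (covariate-shifted) source instances ...
    src = sum(bce(predict(w, b, x), y) for x, y in source) / len(source)
    # ... plus a weighted proportion loss on target-domain bags.
    tgt = sum(bag_loss(w, b, bag, p) for bag, p in target_bags) / len(target_bags)
    return src + lam * tgt


def train(source, target_bags, dim, lam=1.0, lr=0.1, steps=100, h=1e-5):
    """Plain gradient descent using forward finite differences (for brevity)."""
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        base = hybrid_loss(w, b, source, target_bags, lam)
        grad_w = []
        for i in range(dim):
            w_h = list(w)
            w_h[i] += h
            grad_w.append((hybrid_loss(w_h, b, source, target_bags, lam) - base) / h)
        grad_b = (hybrid_loss(w, b + h, source, target_bags, lam) - base) / h
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: labeled source instances and (bag, proportion) pairs.
source = [([1.0], 1), ([-1.0], 0)]
bags = [([[2.0], [-2.0]], 0.5), ([[1.5], [2.5]], 1.0)]
w, b = train(source, bags, dim=1)
```

Finite differences keep the sketch dependency-free. A real implementation would use automatic differentiation and a richer model.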
- North America > United States (0.05)
- Europe > France (0.05)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Enhancing Model Robustness and Fairness with Causality: A Regularization Approach
Wang, Zhao, Shu, Kai, Culotta, Aron
Recent work has raised concerns about the risk of spurious correlations and unintended biases in statistical machine learning models that threaten model robustness and fairness. In this paper, we propose a simple and intuitive regularization approach to integrate causal knowledge during model training and build a robust and fair model by emphasizing causal features and de-emphasizing spurious features. Specifically, we first manually identify causal and spurious features with principles inspired by the counterfactual framework of causal inference. Then, we propose a regularization approach to penalize causal and spurious features separately. By adjusting the strength of the penalty for each type of feature, we build a predictive model that relies more on causal features and less on non-causal features. We conduct experiments to evaluate model robustness and fairness on three datasets with multiple metrics. Empirical results show that the new models built with causal awareness significantly improve model robustness with respect to counterfactual texts and model fairness with respect to sensitive attributes.
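One simple way to penalize causal and spurious features separately is a per-feature L2 penalty whose strength depends on the feature's type. The sketch below makes three assumptions:

- logistic regression without an intercept;
- features already labeled causal or spurious;
- hypothetical penalty strengths.

The paper's exact regularizer may differ.

```python
import math


def train_weighted_l2_logreg(X, y, penalties, lr=0.1, steps=500):
    """Logistic regression with a separate L2 strength for each feature.

    Minimizes mean log-loss + sum_j penalties[j] * w[j] ** 2 by gradient descent.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        for j in range(d):
            # Gradient of the per-feature penalty term.
            grad[j] += 2.0 * penalties[j] * w[j]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy data: feature 0 (treated as causal) and feature 1 (treated as spurious)
# are equally predictive, so only the penalty decides which one the model uses.
X = [[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]
causal_penalty, spurious_penalty = 0.01, 1.0  # hypothetical strengths
w = train_weighted_l2_logreg(X, y, [causal_penalty, spurious_penalty])
```

The two features are indistinguishable in the data. As a result, the lightly penalized (causal) feature ends up with a larger weight than the heavily penalized (spurious) one.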
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Media > Film (0.47)
- Education > Curriculum > Subject-Specific Education (0.31)
AI might be the new electricity
Someday you might have a significant relationship with your toaster. With a few silicon chips and the right programming, it'll use its considerable downtime to compose original musical interludes to play while your English muffin is browning. It'll text you haikus designed to make you smile. This change won't happen by itself. Students are working hard to master the art and science of designing machines that learn, make decisions, create, and think. Starting this fall, the Illinois Institute of Technology -- in recent years branding itself as the more brawny "Illinois Tech" -- became the only college in the Midwest to offer an undergraduate major in artificial intelligence, creating the systems that will guide everything from robots to trucks to medical care.
- North America > United States > Illinois > Cook County > Chicago (0.40)
- Asia > India (0.05)
Illinois Tech Becomes 1st University in Midwest to Offer Degree in Artificial Intelligence
This fall, students at the Illinois Institute of Technology will be among the first in the country to have the option of pursuing an undergraduate degree in AI. "We want to train a workforce that can tackle the challenges and opportunities of the future, which includes AI and machine learning," said Aron Culotta, associate professor of computer science and director of Illinois Tech's Bachelor of Science in Artificial Intelligence program. Historically, AI has been taught at the graduate level because it was more of a research area than a core component of computer science. But as the field has matured, Illinois Tech decided it was time to offer an undergraduate degree. "We thought it was time to move some of these courses and concepts down to the undergraduate level so that when they graduate they will have both the traditional computational and design aspects as well as a good command of a number of these AI approaches," said Culotta.
Predicting Demographics of High-Resolution Geographies with Geotagged Tweets
Montasser, Omar (Pennsylvania State University) | Kifer, Daniel (Pennsylvania State University)
In this paper, we consider the problem of predicting demographics of geographic units given geotagged Tweets that are composed within these units. Traditional survey methods that offer demographic estimates are usually limited in terms of geographic resolution, geographic boundaries, and time intervals. Thus, it would be highly useful to develop computational methods that can complement traditional survey methods by offering demographic estimates at finer geographic resolutions, with flexible geographic boundaries (i.e., not confined to administrative boundaries), and at different time intervals. While prior work has focused on predicting demographics and health statistics at relatively coarse geographic resolutions such as the county level or state level, we introduce an approach to predict demographics at finer geographic resolutions such as the block-group level. For the task of predicting gender and race/ethnicity counts at the block-group level, an approach adapted to our problem from prior work achieves an average correlation of 0.389 (gender) and 0.569 (race) on a held-out test dataset. Our approach outperforms this prior approach, with an average correlation of 0.671 (gender) and 0.692 (race).
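The reported "average correlation" can be read as a Pearson correlation between predicted and true counts across held-out block groups, averaged over demographic categories. The sketch below assumes that reading: one correlation per category, then a plain mean. The paper may aggregate differently, and the category names and counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def average_correlation(true_counts, pred_counts):
    """Mean of per-category correlations across geographic units.

    Both arguments map a category name to a list of counts, one per block group.
    """
    rs = [pearson(true_counts[c], pred_counts[c]) for c in true_counts]
    return sum(rs) / len(rs)


# Hypothetical counts for two categories over four held-out block groups.
truth = {"group_a": [12, 30, 7, 19], "group_b": [40, 5, 22, 16]}
preds = {"group_a": [10, 28, 9, 20], "group_b": [35, 8, 25, 14]}
score = average_correlation(truth, preds)
```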
- North America > United States > Minnesota (0.04)
- Europe > Ireland (0.04)
- Asia > Middle East > Jordan (0.04)
Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data
Culotta, Aron, Ravi, Nirmal Kumar, Cutler, Jennifer
Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. Whereas most prior approaches rely on supervised learning, in which individual users are labeled with demographics for training, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics from information about the followers of each website on Twitter. Using patterns derived both from textual content and the social network of each user, our final model produces an average held-out correlation of .77 across seven different variables (age, gender, education, ethnicity, income, parental status, and political preference). We then apply this model to classify individual Twitter users by ethnicity, gender, and political preference, finding performance that is surprisingly competitive with a fully supervised approach.
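The distant-supervision setup fits a regression at the website level. The features summarize each website's Twitter followers, and the target is the site's audience share for a demographic group. The sketch below makes three assumptions, and the authors' features and model are richer than this:

- two hypothetical follower-derived features;
- a ridge-penalized linear model trained by gradient descent;
- a 0.5 threshold that turns the website-level model into an individual-user classifier.

```python
def fit_ridge_gd(X, y, alpha=0.01, lr=0.1, steps=2000):
    """Linear regression with an L2 penalty on the weights (not the bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        residuals = [
            sum(wj * xj for wj, xj in zip(w, xi)) + b - yi for xi, yi in zip(X, y)
        ]
        grad_w = [
            sum(r * xi[j] for r, xi in zip(residuals, X)) / n + alpha * w[j]
            for j in range(d)
        ]
        grad_b = sum(residuals) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Website level (hypothetical numbers): each row is the fraction of a site's
# followers who also follow each of two hypothetical indicator accounts; the
# target is the estimated share of the site's visitors in one demographic group.
site_features = [[0.9, 0.1], [0.7, 0.3], [0.3, 0.7], [0.1, 0.9]]
site_shares = [0.8, 0.65, 0.35, 0.2]
w, b = fit_ridge_gd(site_features, site_shares)


def classify_user(x, threshold=0.5):
    """User level: x holds 0/1 indicators for following each indicator account."""
    return 1 if predict(w, b, x) > threshold else 0
```

The same weights serve both levels. A website's features are averages of its followers' indicators, so a single user's 0/1 indicators can be scored by the same model.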
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
- Information Technology > Services (1.00)
- Leisure & Entertainment > Games > Computer Games (0.46)