Collaborating Authors

 culotta


Learning from Label Proportions and Covariate-shifted Instances

Singh, Sagalpreet, Sharma, Navodita, Havaldar, Shreyas, Saket, Rishi, Raghuveer, Aravindan

arXiv.org Artificial Intelligence

In many applications, especially due to lack of supervision or privacy concerns, the training data is grouped into bags of instances (feature-vectors) and for each bag we have only an aggregate label derived from the instance-labels in the bag. In learning from label proportions (LLP) the aggregate label is the average of the instance-labels in a bag, and a significant body of work has focused on training models in the LLP setting to predict instance-labels. In practice, however, the training data may have fully supervised albeit covariate-shifted source data, along with the usual target data with bag-labels, and we wish to train a good instance-level predictor on the target domain. We call this the covariate-shifted hybrid LLP problem. Fully supervised covariate-shifted data often has useful training signals and the goal is to leverage them for better predictive performance in the hybrid LLP setting. To achieve this, we develop methods for hybrid LLP which naturally incorporate the target bag-labels along with the source instance-labels, in the domain adaptation framework. Apart from proving theoretical guarantees bounding the target generalization error, we also conduct experiments on several publicly available datasets showing that our methods outperform LLP and domain adaptation baselines as well as techniques from previous related work.
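As an illustration of the hybrid setting described in this abstract, the sketch below combines an instance-level log loss on labeled source data with a bag-level loss on target bags. It is not the authors' method: the logistic model, the squared-error bag loss, the weighting parameter `lam`, and all function names are assumptions chosen for illustration, and no covariate-shift correction is included.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Probability of the positive label under a linear-logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def bag_proportion_loss(weights, bags):
    """Mean squared gap between each bag's label proportion and the
    average predicted probability over the instances in that bag.

    `bags` is a list of (instances, proportion) pairs.
    """
    total = 0.0
    for instances, proportion in bags:
        mean_pred = sum(predict(weights, x) for x in instances) / len(instances)
        total += (mean_pred - proportion) ** 2
    return total / len(bags)


def hybrid_loss(weights, source, bags, lam=1.0):
    """Source instance-level log loss plus `lam` times the target bag loss.

    `source` is a list of (x, y) pairs with y in {0, 1}.
    The additive combination and `lam` are illustrative assumptions.
    """
    eps = 1e-12  # guards against log(0)
    src = 0.0
    for x, y in source:
        p = predict(weights, x)
        src -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    src /= len(source)
    return src + lam * bag_proportion_loss(weights, bags)
```

With all-zero weights every prediction is 0.5, so a bag whose label proportion is 0.5 contributes zero bag loss, while a bag with proportion 1.0 contributes 0.25.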


Enhancing Model Robustness and Fairness with Causality: A Regularization Approach

Wang, Zhao, Shu, Kai, Culotta, Aron

arXiv.org Artificial Intelligence

Recent work has raised concerns about the risk of spurious correlations and unintended biases in statistical machine learning models that threaten model robustness and fairness. In this paper, we propose a simple and intuitive regularization approach to integrate causal knowledge during model training and build a robust and fair model by emphasizing causal features and de-emphasizing spurious features. Specifically, we first manually identify causal and spurious features with principles inspired by the counterfactual framework of causal inference. Then, we propose a regularization approach to penalize causal and spurious features separately. By adjusting the strength of the penalty for each type of feature, we build a predictive model that relies more on causal features and less on non-causal features. We conduct experiments to evaluate model robustness and fairness on three datasets with multiple metrics. Empirical results show that the new models built with causal awareness significantly improve model robustness with respect to counterfactual texts and model fairness with respect to sensitive attributes.
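One way to picture "penalizing causal and spurious features separately" is a group-wise L2 penalty in which each feature group has its own strength. The sketch below is a hypothetical illustration, not the paper's formulation: the function name, the choice of a squared-L2 penalty, and the third "other" group are all assumptions.

```python
def grouped_l2_penalty(weights, causal_idx, spurious_idx,
                       lam_causal, lam_spurious, lam_other=1.0):
    """Squared-L2 penalty with a separate strength per feature group.

    A small `lam_causal` lets the model rely on causal features, while a
    large `lam_spurious` shrinks the weights of spurious features.
    Features in neither index set get `lam_other` (an assumption).
    """
    causal = set(causal_idx)
    spurious = set(spurious_idx)
    penalty = 0.0
    for j, w in enumerate(weights):
        if j in causal:
            lam = lam_causal
        elif j in spurious:
            lam = lam_spurious
        else:
            lam = lam_other
        penalty += lam * w * w
    return penalty
```

In training, this term would be added to the usual classification loss; raising `lam_spurious` relative to `lam_causal` shifts the fitted model toward the causal features.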


AI might be the new electricity

#artificialintelligence

Someday you might have a significant relationship with your toaster. With a few silicon chips and the right programming, it'll use its considerable downtime to compose original musical interludes to play while your English muffin is browning. It'll text you haikus designed to make you smile. This change won't happen by itself. Students are working hard to master the art and science of designing machines that learn, make decisions, create, think. Starting this fall, the Illinois Institute of Technology -- in recent years branding itself as the more brawny "Illinois Tech" -- became the only college in the Midwest to offer an undergraduate major in artificial intelligence, creating the systems that will guide everything from robots to trucks to medical care.


Illinois Tech Becomes 1st University in Midwest to Offer Degree in Artificial Intelligence

#artificialintelligence

This fall, students at the Illinois Institute of Technology will be among the first in the country to have the option of pursuing an undergraduate degree in AI. "We want to train a workforce that can tackle the challenges and opportunities of the future, which includes AI and machine learning," said Aron Culotta, associate professor of computer science and director of Illinois Tech's Bachelor of Science in Artificial Intelligence program. Historically, AI has been taught at the graduate level because it was more of a research area rather than a core component of computer science. But as the field has matured, Illinois Tech decided it was time to offer an undergraduate degree course. "We thought it was time to move some of these courses and concepts down to the undergraduate level so that when they graduate they will have both the traditional computational and design aspects as well as a good command of a number of these AI approaches," said Culotta.


Predicting Demographics of High-Resolution Geographies with Geotagged Tweets

Montasser, Omar (Pennsylvania State University) | Kifer, Daniel (Pennsylvania State University)

AAAI Conferences

In this paper, we consider the problem of predicting demographics of geographic units given geotagged Tweets that are composed within these units. Traditional survey methods that offer demographics estimates are usually limited in terms of geographic resolution, geographic boundaries, and time intervals. Thus, it would be highly useful to develop computational methods that can complement traditional survey methods by offering demographics estimates at finer geographic resolutions, with flexible geographic boundaries (i.e., not confined to administrative boundaries), and at different time intervals. While prior work has focused on predicting demographics and health statistics at relatively coarse geographic resolutions such as the county-level or state-level, we introduce an approach to predict demographics at finer geographic resolutions such as the blockgroup-level. For the task of predicting gender and race/ethnicity counts at the blockgroup-level, an approach adapted from prior work to our problem achieves an average correlation of 0.389 (gender) and 0.569 (race) on a held-out test dataset. Our approach outperforms this prior approach with an average correlation of 0.671 (gender) and 0.692 (race).


Predicting Twitter User Demographics using Distant Supervision from Website Traffic Data

Culotta, Aron, Ravi, Nirmal Kumar, Cutler, Jennifer

Journal of Artificial Intelligence Research

Understanding the demographics of users of online social networks has important applications for health, marketing, and public messaging. Whereas most prior approaches rely on a supervised learning approach, in which individual users are labeled with demographics for training, we instead create a distantly labeled dataset by collecting audience measurement data for 1,500 websites (e.g., 50% of visitors to gizmodo.com are estimated to have a bachelor's degree). We then fit a regression model to predict these demographics from information about the followers of each website on Twitter. Using patterns derived both from textual content and the social network of each user, our final model produces an average held-out correlation of .77 across seven different variables (age, gender, education, ethnicity, income, parental status, and political preference). We then apply this model to classify individual Twitter users by ethnicity, gender, and political preference, finding performance that is surprisingly competitive with a fully supervised approach.
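The distant-supervision idea in this abstract (fit a regression on site-level aggregates, then apply it to individual users) can be sketched with a toy single-feature example. Everything here is a simplifying assumption for illustration: the binary follower feature, the one-variable least-squares fit, the function names, and the made-up numbers, none of which come from the paper.

```python
def site_feature(followers):
    """Fraction of a site's followers that have a (hypothetical) binary feature."""
    return sum(followers) / len(followers)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def score_user(feature, slope, intercept):
    """Apply the site-level model to one user's feature value."""
    return intercept + slope * feature


# Toy, made-up data: (binary feature per follower, site-level demographic proportion).
sites = [
    ([1, 1, 1, 0], 0.70),
    ([1, 0, 0, 0], 0.30),
    ([1, 1, 0, 0], 0.50),
]
xs = [site_feature(followers) for followers, _ in sites]
ys = [proportion for _, proportion in sites]
slope, intercept = fit_line(xs, ys)
```

No individual user is ever labeled: the only supervision is the per-site proportion, and `score_user` reuses the fitted line on a single user's feature value.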