32e54441e6382a7fbacbbbaf3c450059-Supplemental.pdf

Neural Information Processing Systems

We only included a candidate variable if the nearest neighbor match was exact. … We compared the "fnlwgt" data to all weight variables, including "UH_WGTS_A1", which has a similar distribution. Since we did not identify an exact match for "fnlwgt", and the variable is not a property of an individual, we do not utilize it further. … We vary the threshold from 6,000 to 72,000. … In our experiments, as the "unconstrained" base classifier, we use the gradient boosted decision tree. … B.1 ACSIncome: Predict whether US working adults' yearly income is above $50,000. Target: PINCP (total person's income); an individual's label is 1 if PINCP > 50000, otherwise 0. Features: AGEP (age); range of values: 0-99 (integers), where 0 indicates less than 1 year old.
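The ACSIncome labeling rule described above (label 1 iff PINCP exceeds $50,000) can be sketched as a small Python helper. The function name and toy records are illustrative, not part of the paper's released code; PINCP and AGEP follow the ACS PUMS variable names quoted in the text:

```python
# Illustrative labeling for the ACSIncome task: the label is 1 iff total
# person's income (PINCP) exceeds $50,000, mirroring the UCI Adult cutoff.
INCOME_THRESHOLD = 50_000

def acs_income_label(record: dict) -> int:
    """Return 1 if the person's total income exceeds the threshold, else 0."""
    return 1 if record["PINCP"] > INCOME_THRESHOLD else 0

# Toy records; AGEP ranges over 0-99, with 0 meaning under one year old.
people = [
    {"AGEP": 34, "PINCP": 62_000},
    {"AGEP": 51, "PINCP": 48_500},
]
labels = [acs_income_label(p) for p in people]
print(labels)  # [1, 0]
```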


Vehicle Localization in GPS-Denied Scenarios Using Arc-Length-Based Map Matching

Javed, Nur Uddin, Singh, Yuvraj, Ahmed, Qadeer

arXiv.org Artificial Intelligence

Automated driving systems face challenges in GPS-denied situations. To address this issue, kinematic dead reckoning is implemented using measurements from the steering angle, steering rate, yaw rate, and wheel speed sensors onboard the vehicle. However, dead reckoning methods suffer from drift. This paper presents an arc-length-based map matching method that uses a digital 2D map of the scenario to correct drift in the dead reckoning estimate. The kinematic model's prediction is used to introduce a temporal notion to the spatial information available in the map data. Results show a reliable reduction in drift across all GPS-denied scenarios tested in this study. This approach ensures that automated vehicles can maintain continuous and reliable navigation, significantly enhancing their safety and operational reliability in environments where GPS signals are compromised or unavailable.
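The kinematic dead reckoning described above can be illustrated with a minimal Euler-integration step, assuming a planar unicycle model driven by wheel-speed and yaw-rate measurements. This is a simplified stand-in, not the authors' implementation, which also incorporates steering angle and steering rate:

```python
import math

def dead_reckon(pose, v, yaw_rate, dt):
    """One Euler step of planar dead reckoning.

    pose     -- (x, y, heading) in meters and radians
    v        -- speed from wheel-speed sensors, m/s
    yaw_rate -- measured yaw rate, rad/s
    dt       -- time step, s
    """
    x, y, theta = pose
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += yaw_rate * dt
    return (x, y, theta)

# Drive straight east at 10 m/s for 1 s in 0.1 s steps.
pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = dead_reckon(pose, v=10.0, yaw_rate=0.0, dt=0.1)
print(pose)  # approximately (10.0, 0.0, 0.0)
```

Because sensor noise accumulates at every step, the estimate drifts over time, which is exactly what the paper's arc-length-based map matching corrects.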


The 2020 United States Decennial Census Is More Private Than You (Might) Think

Su, Buxin, Su, Weijie J., Wang, Chendi

arXiv.org Machine Learning

The U.S. Decennial Census serves as the foundation for many high-profile policy decision-making processes, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could sharper privacy guarantees be obtained for the 2020 U.S. Census compared to their published guarantees, or equivalently, had the nominal privacy budgets been fully utilized? In this paper, we affirmatively address this open problem by demonstrating that between 8.50% and 13.76% of the privacy budget for the 2020 U.S. Census remains unused for each of the eight geographical levels, from the national level down to the block level. This finding is made possible through our precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across various geographical levels. Our analysis indicates that the Census Bureau introduced unnecessarily high levels of injected noise to achieve the claimed privacy guarantee for the 2020 U.S. Census. Consequently, our results enable the Bureau to reduce noise variances by 15.08% to 24.82% while maintaining the same privacy budget for each geographical level, thereby enhancing the accuracy of privatized census statistics. We empirically demonstrate that reducing noise injection into census statistics mitigates distortion caused by privacy constraints in downstream applications of private census data, illustrated through a study examining the relationship between earnings and education.
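The precise privacy-loss tracking mentioned above can be illustrated with the Gaussian special case of $f$-differential privacy. The formulas below are standard results of the $f$-DP framework, not a derivation from the paper itself:

```latex
% Gaussian mechanism: adding N(0, \sigma^2) noise to a statistic with
% sensitivity \Delta satisfies \mu-GDP with
\mu = \frac{\Delta}{\sigma}.
% Composing n Gaussian mechanisms with parameters \mu_1, \dots, \mu_n
% yields the tight overall guarantee
\mu_{\mathrm{comp}} = \sqrt{\mu_1^2 + \mu_2^2 + \cdots + \mu_n^2},
% so at a fixed budget \mu, the required noise variance scales as
\sigma^2 = \frac{\Delta^2}{\mu^2}.
```

Because the variance scales inversely with $\mu^2$, any slack found in the composed budget translates directly into a proportional reduction in injected noise, which is how the reported 15.08%-24.82% variance reductions follow from the 8.50%-13.76% of unused budget.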


Exploring the Effects of Population and Employment Characteristics on Truck Flows: An Analysis of NextGen NHTS Origin-Destination Data

Uddin, Majbah, Liu, Yuandong, Lim, Hyeonsup

arXiv.org Artificial Intelligence

Truck transportation remains the dominant mode of US freight transportation because of advantages such as the flexibility of accessing pickup and drop-off points and faster delivery. Because of the massive freight volume transported by trucks, understanding the effects of population and employment characteristics on truck flows is critical for better transportation planning and investment decisions. The US Federal Highway Administration published a truck travel origin-destination data set as part of the Next Generation National Household Travel Survey program. This data set contains the total number of truck trips in 2020 within and between 583 predefined zones encompassing metropolitan and nonmetropolitan statistical areas within each state and Washington, DC. In this study, the origin-destination-level truck trip flow data was augmented with zone-level population and employment characteristics from the US Census Bureau, drawing on Census population and County Business Patterns data. The final data set was used to train an Extreme Gradient Boosting (XGBoost) model, where the target variable is the total number of truck trips. SHapley Additive exPlanations (SHAP) was adopted to explain the model results. Results showed that the distance between the zones was the most important variable and had a nonlinear relationship with truck flows.
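The SHAP attributions used above approximate Shapley values, which assign each feature its average marginal contribution across all feature coalitions. For intuition, here is a pure-Python sketch that computes them exactly for a toy two-feature model; the model, instance, and baseline are made up for illustration (real SHAP approximates this efficiently for trained XGBoost models):

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n):
    """Exact Shapley values for a coalition value function over n features.

    Exponential in n, so only viable for tiny examples; SHAP
    approximates this computation for real models.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                s = len(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(s) * factorial(n - s - 1) / factorial(n)
                phi[i] += weight * (value(set(subset) | {i}) - value(set(subset)))
    return phi

# Toy linear "model" in which feature 0 (think: inter-zone distance) dominates.
x = [4.0, 1.0]          # instance to explain
baseline = [0.0, 0.0]   # reference input
def value(S):
    z = [x[j] if j in S else baseline[j] for j in range(len(x))]
    return 3.0 * z[0] + 2.0 * z[1]  # stand-in for the trained model

print(shapley_values(value, 2))  # [12.0, 2.0]
```

For an additive model like this one, each feature's Shapley value equals its own contribution relative to the baseline, which makes the toy case easy to verify by hand.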


The Census Is Broken. Can AI Fix It?

WIRED

Getting a census count wrong can cost communities big. A March 10 report from the US Census Bureau showed an overcount of white and Asian people and an undercount of people who identify as Black, Hispanic or Latino, or multiracial in 2020, a failure that has led to renewed calls to modernize the census. Progress reaching historically undercounted groups has been slow, and the stakes are high. The once-a-decade endeavor informs the distribution of federal tax dollars and apportions members of the House of Representatives for each state, potentially redrawing the political map. According to emails obtained through a records request, Trump administration officials interfered in the population count to produce outcomes beneficial to Republicans, but problems with the census go back much further.


Retiring Adult: New Datasets for Fair Machine Learning

Ding, Frances, Hardt, Moritz, Miller, John, Schmidt, Ludwig

arXiv.org Machine Learning

Although the fairness community has recognized the importance of data, researchers in the area primarily rely on UCI Adult when it comes to tabular data. Derived from a 1994 US Census survey, this dataset has appeared in hundreds of research papers where it served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets derived from US Census surveys that extend the existing data ecosystem for research on fair machine learning. We create prediction tasks relating to income, employment, health, transportation, and housing. The data span multiple years and all states of the United States, allowing researchers to study temporal shift and geographic variation. We highlight a broad initial sweep of new empirical insights relating to trade-offs between fairness criteria, performance of algorithmic interventions, and the role of distribution shift based on our new datasets. Our findings inform ongoing debates, challenge some existing narratives, and point to future research directions. Our datasets are available at https://github.com/zykls/folktables.