labor statistic
America Isn't Ready for What AI Will Do to Jobs
This story appears in the March 2026 print edition. Does anyone have a plan for what happens next? In 1869, a group of Massachusetts reformers persuaded the state to try a simple idea: counting. The Second Industrial Revolution was belching its way through New England, teaching mill and factory owners a lesson most M.B.A. students now learn in their first semester: that efficiency gains tend to come from somewhere, and that somewhere is usually somebody else. They were operating at speeds that the human body--an elegant piece of engineering designed over millions of years for entirely different purposes--simply wasn't built to match. The owners knew this, just as they knew that there's a limit to how much misery people are willing to tolerate before they start setting fire to things. Still, the machines pressed on. So Massachusetts created the nation's first Bureau of Statistics of Labor, hoping that data might accomplish what conscience could not. By measuring work hours, conditions, wages, and what economists now call "negative externalities" but were then called "children's arms torn off," policy makers figured they might be able to produce reasonably fair outcomes for everyone. A few years later, with federal troops shooting at striking railroad workers and wealthy citizens funding private armories--leading indicators that things in your society aren't going great--Congress decided that this idea might be worth trying at scale and created the Bureau of Labor Statistics. Measurement doesn't abolish injustice; it rarely even settles arguments.
But the act of counting--of trying to see clearly, of committing the government to a shared set of facts--signals an intention to be fair, or at least to be caught trying. It's one way a republic earns the right to be believed in. The BLS remains a small miracle of civilization.
GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
Patwardhan, Tejal, Dias, Rachel, Proehl, Elizabeth, Kim, Grace, Wang, Michele, Watkins, Olivia, Fishman, Simón Posada, Aljubeh, Marwan, Thacker, Phoebe, Fauconnet, Laurance, Kim, Natalie S., Chao, Patrick, Miserendino, Samuel, Chabot, Gildas, Li, David, Sharman, Michael, Barr, Alexandra, Glaese, Amelia, Tworek, Jerry
We introduce GDPval, a benchmark evaluating AI model capabilities on real-world economically valuable tasks. GDPval covers the majority of U.S. Bureau of Labor Statistics Work Activities for 44 occupations across the top 9 sectors contributing to U.S. GDP (Gross Domestic Product). Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience. We find that frontier model performance on GDPval is improving roughly linearly over time, and that the current best frontier models are approaching industry experts in deliverable quality. We analyze the potential for frontier models, when paired with human oversight, to perform GDPval tasks cheaper and faster than unaided experts. We also demonstrate that increased reasoning effort, increased task context, and increased scaffolding improve model performance on GDPval. Finally, we open-source a gold subset of 220 tasks and provide a public automated grading service at evals.openai.com to facilitate future research in understanding real-world model capabilities.
Letters from Our Readers
Readers respond to John Seabrook's piece on floods, Eyal Press's article on the National Restaurant Association, and Adam Gopnik's essay on the history of gambling in New York. John Seabrook's piece on the increasing frequency and formidable power of river flooding is both moving and scientifically instructive ("The Flood Will Come," July 28th). I served as Vermont's commissioner of health for eight years, during which time I participated in the state's annual flood-disaster response, and I believe it's important to expand the public-safety discussion so that it includes the protection of human health and wellness. Climate change poses the biggest threat to public health, and storms and floods have abundant immediate impacts: drinking-water contamination; mold damage to homes and businesses; the spread of infectious disease; soil erosion that affects food quality; and limitations on recreation, transportation, and medical-care access. Climate change is also a major source of stress on the population's mental health, and on the country's already fragile mental-health system.
Unemployment Dynamics Forecasting with Machine Learning Regression Models
In this paper, I explored how a range of regression and machine learning techniques can be applied to monthly U.S. unemployment data to produce timely forecasts. I compared seven models: Linear Regression, SGDRegressor, Random Forest, XGBoost, CatBoost, Support Vector Regression, and an LSTM network, training each on a historical span of data and then evaluating on a later hold-out period. Input features include macro indicators (GDP growth, CPI), labor market measures (job openings, initial claims), financial variables (interest rates, equity indices), and consumer sentiment. I tuned model hyperparameters via cross-validation and assessed performance with standard error metrics and the ability to predict the correct unemployment direction. Across the board, tree-based ensembles (and CatBoost in particular) deliver noticeably better forecasts than simple linear approaches, while the LSTM captures underlying temporal patterns more effectively than other nonlinear methods. SVR and SGDRegressor yield modest gains over standard regression but don't match the consistency of the ensemble and deep-learning models. Interpretability tools (feature importance rankings and SHAP values) point to job openings and consumer sentiment as the most influential predictors across all methods. By directly comparing linear, ensemble, and deep-learning approaches on the same dataset, this study shows how modern machine-learning techniques can enhance real-time unemployment forecasting, offering economists and policymakers richer insights into labor market trends. In the comparative evaluation of the models, I employed a dataset comprising thirty distinct features over the period from January 2020 through December 2024.
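The evaluation protocol described in this abstract (train on an earlier span, score on a later hold-out period, report an error metric plus directional accuracy) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the monthly rates, and the model forecasts below are all hypothetical and are not taken from the paper, and a persistence baseline stands in for the seven models it compares.

```python
from math import sqrt


def chronological_split(series, holdout):
    """Keep time order: the last `holdout` points become the test set."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


def rmse(actual, predicted):
    """Root mean squared error, one common 'standard error metric'."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_accuracy(previous, actual, predicted):
    """Share of periods where the forecast moves the same way as the actual
    value, both measured relative to the previous period's value."""
    hits = sum(
        sign(a - prev) == sign(p - prev)
        for prev, a, p in zip(previous, actual, predicted)
    )
    return hits / len(actual)


# Hypothetical monthly unemployment rates (percent), not real data.
rates = [4.0, 4.2, 4.1, 4.3, 4.5, 4.4, 4.6, 4.5]
train, test = chronological_split(rates, holdout=3)

# Value observed just before each test month.
previous = [train[-1]] + test[:-1]

# Persistence baseline: next month equals this month.
naive = list(previous)

# Hypothetical forecasts from some fitted model.
forecast = [4.45, 4.5, 4.7]

print("model RMSE:", round(rmse(test, forecast), 4))
print("naive RMSE:", round(rmse(test, naive), 4))
print("model directional accuracy:", directional_accuracy(previous, test, forecast))
```

Splitting chronologically rather than at random matters here: shuffling monthly observations would let a model train on months that come after the ones it is scored on, which would overstate forecast quality.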
Complement or substitute? How AI increases the demand for human skills
Mäkelä, Elina, Stephany, Fabian
The question of whether AI substitutes or complements human work is central to debates on the future of work. This paper examines the impact of AI on skill demand and compensation in the U.S. economy, analysing 12 million online job vacancies from 2018 to 2023. It investigates internal effects (within-job substitution and complementation) and external effects (across occupations, industries, and regions). Our findings reveal a significant increase in demand for AI-complementary skills, such as digital literacy, teamwork, and resilience, alongside rising wage premiums for these skills in AI roles like Data Scientist. Conversely, substitute skills, including customer service and text review, have declined in both demand and value within AI-related positions. Examining external effects, we find a notable rise in demand for complementary skills in non-AI roles linked to the growth of AI-related jobs in specific industries or regions. At the same time, there is a moderate decline in non-AI roles requiring substitute skills. Overall, AI's complementary effect is up to 50% larger than its substitution effect, resulting in net positive demand for skills. These results, replicated for the UK and Australia, highlight AI's transformative impact on workforce skill requirements. They suggest reskilling efforts should prioritise not only technical AI skills but also complementary skills like ethics and digital literacy.
Are Female Carpenters like Blue Bananas? A Corpus Investigation of Occupation Gender Typicality
Ju, Da, Ulrich, Karen, Williams, Adina
People tend to use language to mention surprising properties of events: for example, when a banana is blue, we are more likely to mention color than when it is yellow. This fact is taken to suggest that yellowness is somehow a typical feature of bananas, and blueness is exceptional. Similar to how a yellow color is typical of bananas, there may also be genders that are typical of occupations. In this work, we explore this question using information theoretic techniques coupled with corpus statistic analysis. In two distinct large corpora, we do not find strong evidence that occupations and gender display the same patterns of mentioning as do bananas and color. Instead, we find that gender mentioning is correlated with femaleness of occupation in particular, suggesting perhaps that woman-dominated occupations are seen as somehow "more gendered" than male-dominated ones, and thereby they encourage more gender mentioning overall.
The Male CEO and the Female Assistant: Gender Biases in Text-To-Image Generation of Dual Subjects
Recent large-scale T2I models like DALLE-3 have made progress on improving fairness in single-subject generation, i.e. generating a one-person image. However, we reveal that these improved models still demonstrate considerable biases when simply generating two people. To systematically evaluate T2I models in this challenging generation setting, we propose the Paired Stereotype Test (PST) framework, established as a dual-subject generation task, i.e. generating two people in the same image. The setting in PST is especially challenging, as the two individuals are described with social identities that are male-stereotyped and female-stereotyped, respectively, e.g. "a CEO" and "an Assistant". It is easy for T2I models to unfairly follow gender stereotypes in this contrastive setting. We establish a metric, Stereotype Score (SS), to quantitatively measure the adherence to gender stereotypes in generated images. Using PST, we evaluate two aspects of gender biases in DALLE-3 -- the widely-identified bias in gendered occupation, as well as a novel aspect: bias in organizational power. Results show that despite generating seemingly fair or even anti-stereotype single-person images, DALLE-3 still shows notable biases under PST -- for instance, in experiments on gender-occupational stereotypes, over 74% of model generations demonstrate biases. Moreover, compared to single-person settings, DALLE-3 is more likely to perpetuate male-associated stereotypes under PST. Our work pioneers the research on bias in dual-subject generation, and our proposed PST framework can be easily extended for further experiments, establishing a valuable contribution.
The Resume Paradox: Greater Language Differences, Smaller Pay Gaps
Minot, Joshua R., Maier, Marc, Demarest, Bradford, Cheney, Nicholas, Danforth, Christopher M., Dodds, Peter Sheridan, Frank, Morgan R.
Over the past decade, the gender pay gap has remained steady with women earning 84 cents for every dollar earned by men on average. Many studies explain this gap through demand-side bias in the labor market represented through employers' job postings. However, few studies analyze potential bias from the worker supply-side. Here, we analyze the language in millions of US workers' resumes to investigate how differences in workers' self-representation by gender compare to differences in earnings. Across US occupations, language differences between male and female resumes correspond to 11% of the variation in gender pay gap. This suggests that females' resumes that are semantically similar to males' resumes may have greater wage parity. However, surprisingly, occupations with greater language differences between male and female resumes have lower gender pay gaps. A doubling of the language difference between female and male resumes results in an annual wage increase of $2,797 for the average female worker. This result holds with controls for gender-biases of resume text and we find that per-word bias poorly describes the variance in wage gap. The results demonstrate that textual data and self-representation are valuable factors for improving worker representations and understanding employment inequities.
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Bianchi, Federico, Kalluri, Pratyusha, Durmus, Esin, Ladhak, Faisal, Cheng, Myra, Nozza, Debora, Hashimoto, Tatsunori, Jurafsky, Dan, Zou, James, Caliskan, Aylin
Machine learning models that convert user-written text descriptions into images are now widely available online and used by millions of users to generate millions of images a day. We investigate the potential for these models to amplify dangerous and complex stereotypes. We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects. For example, we find cases of prompting for basic traits or social roles resulting in images reinforcing whiteness as ideal, prompting for occupations resulting in amplification of racial and gender disparities, and prompting for objects resulting in reification of American norms. Stereotypes are present regardless of whether prompts explicitly mention identity and demographic language or avoid such language. Moreover, stereotypes persist despite mitigation strategies; neither user attempts to counter stereotypes by requesting images with specific counter-stereotypes nor institutional attempts to add system "guardrails" have prevented the perpetuation of stereotypes. Our analysis justifies concerns regarding the impacts of today's models, presenting striking exemplars, and connecting these findings with deep insights into harms drawn from social scientific and humanist disciplines. This work contributes to the effort to shed light on the uniquely complex biases in language-vision models and demonstrates the ways that the mass deployment of text-to-image generation models results in mass dissemination of stereotypes and resulting harms.
A.I. Is Coming for Lawyers, Again - The New York Times
The impact, Mr. Allgrove said, will be to force everyone in the profession, from paralegals to $1,000-an-hour partners, to move up the skills ladder to stay ahead of the technology. The work of humans, he said, will increasingly be to focus on developing industry expertise, exercising judgment in complex legal matters, and offering strategic guidance and building trusted relationships with clients. Technology has eliminated large numbers of jobs in recent years, and not just robots taking over factories. Personal computers, productivity software and the internet have made office work more efficient, replacing many workers. Office and administrative support occupations, including secretaries, clerks, bill collectors and office assistants, employ 1.3 million fewer workers than in 1990, according to an analysis by the Bureau of Labor Statistics.