Harrisburg


Inside the Dirty, Dystopian World of AI Data Centers

The Atlantic - Technology

This story appears in the April 2026 print edition. The race to power AI is already remaking the physical world. Three Mile Island's cooling towers have until recently served as grave markers for America's nuclear-power industry. As we drove through southwest Memphis, KeShaun Pearson told me to keep my window down: our destination was best tasted, not viewed. Along the way, we passed an abandoned coal plant to our right, then an active power plant to our left, equipped with enormous natural-gas turbines. Pearson, who directs the nonprofit Memphis Community Against Pollution, was bringing me to his hometown's latest industrial megaproject.


Three Mile Island nuclear plant makes comeback with $1B in federal backing to meet increasing energy demands

FOX News

Microsoft and Constellation Energy partner to restart Three Mile Island nuclear reactor with $1 billion federal loan to power artificial intelligence operations.


That New Hit Song on Spotify? It Was Made by A.I.

The New Yorker

Aspiring musicians are churning out tracks using generative artificial intelligence. Some are topping the charts. Nick Arter, a thirty-five-year-old in Washington, D.C., never quite managed to become a professional musician the old-fashioned way. He grew up in Harrisburg, Pennsylvania, in a music-loving family.


The study of short texts in digital politics: Document aggregation for topic modeling

arXiv.org Artificial Intelligence

Statistical topic modeling is widely used in political science to study text. Researchers examine documents of varying lengths, from tweets to speeches. There is ongoing debate on how document length affects the interpretability of topic models. We investigate the effects of aggregating short documents into larger ones based on natural units that partition the corpus. In our study, we analyze one million tweets by U.S. state legislators from April 2016 to September 2020. We find that for documents aggregated at the account level, topics are more associated with individual states than when using individual tweets. This finding is replicated with Wikipedia pages aggregated by birth cities, showing how document definitions can impact topic modeling results.
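As a minimal sketch of the aggregation step described above, assuming each tweet arrives as an (account, text) pair, the function below concatenates all documents that share a partition key into one pseudo-document. The account handles and tweet texts are hypothetical placeholders, and this is not the authors' pipeline; the aggregated strings would then be passed to whatever topic model is in use.

```python
from collections import defaultdict


def aggregate_by_unit(docs):
    """Merge short documents that share a partition key (e.g. an account).

    `docs` is an iterable of (unit, text) pairs; the result maps each unit
    to a single pseudo-document, keeping the original document order.
    """
    grouped = defaultdict(list)
    for unit, text in docs:
        grouped[unit].append(text)
    return {unit: " ".join(texts) for unit, texts in grouped.items()}


# Hypothetical tweets from two hypothetical legislator accounts.
tweets = [
    ("@rep_a", "Voting rights bill passes committee"),
    ("@rep_b", "Visiting farmers in the district today"),
    ("@rep_a", "Thanks to everyone who testified"),
]
corpus = aggregate_by_unit(tweets)
for account, document in corpus.items():
    print(account, "->", document)
```

Swapping the key (for example, to a birth city for Wikipedia pages) changes the document definition without touching the topic model itself.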


Short-term Streamflow and Flood Forecasting based on Graph Convolutional Recurrent Neural Network and Residual Error Learning

arXiv.org Artificial Intelligence

Accurate short-term streamflow and flood forecasting are critical for mitigating river flood impacts, especially given increasing climate variability. Machine learning-based streamflow forecasting relies on large streamflow datasets derived from rating curves. Uncertainties in rating curve modeling could introduce errors into the streamflow data and affect forecasting accuracy. This study proposes a streamflow forecasting method that addresses these data errors, enhancing the accuracy of river flood forecasting and flood modeling and thereby reducing flood-related risk. A graph convolutional recurrent neural network is used to capture spatiotemporal patterns, coupled with residual error learning and forecasting. The neural network outperforms commonly used forecasting models over forecasting horizons of 1-6 hours, and the residual error learners can further correct its residual errors. This provides a more reliable tool for river flood forecasting and climate adaptation in the critical 1-6 hour window for flood risk mitigation efforts.
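The idea of residual error learning can be illustrated with a deliberately simple stand-in. The sketch below assumes a persistence-style base forecast in place of the graph convolutional recurrent network and an AR(1) model on past residuals in place of the paper's residual error learners; both choices, and all function names, are hypothetical and only show how a second model corrects the first model's errors.

```python
def fit_ar1(residuals):
    """Least-squares slope a for residual[t] ~ a * residual[t-1] (no intercept)."""
    num = sum(prev * cur for prev, cur in zip(residuals[:-1], residuals[1:]))
    den = sum(prev * prev for prev in residuals[:-1])
    return num / den if den else 0.0


def corrected_forecast(base_forecasts, observations, next_base):
    """Add the predicted next residual to the base model's next forecast."""
    # Residuals of the (hypothetical) base model on past time steps.
    residuals = [obs - fc for obs, fc in zip(observations, base_forecasts)]
    a = fit_ar1(residuals)
    return next_base + a * residuals[-1]


# Hypothetical hourly streamflow: the base model keeps under-predicting,
# and its error halves each hour.
base = [10.0, 10.0, 10.0, 10.0]
observed = [18.0, 14.0, 12.0, 11.0]
print(corrected_forecast(base, observed, next_base=10.0))  # 10.5
```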


A BERT-based Empirical Study of Privacy Policies' Compliance with GDPR

arXiv.org Artificial Intelligence

Since its implementation in May 2018, the General Data Protection Regulation (GDPR) has prompted businesses to revisit and revise their data handling practices to ensure compliance. The privacy policy, which serves as the primary means of informing users about their privacy rights and the data practices of companies, has been significantly updated by numerous businesses since GDPR took effect. However, many privacy policies remain packed with technical jargon, lengthy explanations, and vague descriptions of data practices and user rights. This makes it challenging for users and regulatory authorities to manually verify the GDPR compliance of these privacy policies. In this study, we address the challenge of analyzing compliance between GDPR (Article 13) and the privacy policies of 5G networks. We manually collected privacy policies from almost 70 different 5G mobile network operators (MNOs) and used an automated BERT-based model for classification. We show that an encouraging 51% of companies demonstrate strong adherence to GDPR. In addition, we present the first study that provides current empirical evidence on the readability of privacy policies for 5G networks. We adopted a readability analysis toolset that incorporates various established readability metrics. The findings show empirically that the readability of most current privacy policies remains a significant challenge. Hence, 5G providers need to invest considerable effort in revising these documents to enhance both their utility and the overall user experience.
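The abstract does not name the readability metrics used. As one example of an established metric, the Flesch Reading Ease score is computed from word, sentence, and syllable counts. The sketch below assumes those counts are already available (syllable counting is left to a separate tokenizer), uses hypothetical counts, and is not necessarily part of the authors' toolset. Lower scores indicate harder text.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text (roughly 0-100)."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for one privacy-policy excerpt.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 3))  # 59.635
```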


Analyzing the factors that are involved in length of inpatient stay at the hospital for diabetes patients

arXiv.org Artificial Intelligence

The paper investigates the escalating concerns surrounding the surge in diabetes cases, exacerbated by the COVID-19 pandemic, and the subsequent strain on medical resources. The research aims to construct a predictive model quantifying factors influencing inpatient hospital stay durations for diabetes patients, offering insights to hospital administrators for improved patient management strategies. The literature review highlights the increasing prevalence of diabetes, emphasizing the need for continued attention and analysis of urban-rural disparities in healthcare access. International studies underscore the financial implications and healthcare burden associated with diabetes-related hospitalizations and complications, emphasizing the significance of effective management strategies. The methodology involves a quantitative approach, utilizing a dataset comprising 10,000 observations of diabetic inpatient encounters in U.S. hospitals from 1999 to 2008. Predictive modeling techniques, particularly Generalized Linear Models (GLM), are employed to develop a model predicting hospital stay durations based on patient demographics, admission types, medical history, and treatment regimen. The results highlight the influence of age, medical history, and treatment regimen on hospital stay durations for diabetes patients. Despite model limitations, such as heteroscedasticity and deviations from normality in residual analysis, the findings offer valuable insights for hospital administrators in patient management. The paper concludes with recommendations for future research to address model limitations and explore the implications of predictive models on healthcare management strategies, ensuring equitable patient care and resource allocation.
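The abstract does not specify the GLM family or link. As a minimal sketch, assuming length of stay in days is treated as a count with a Poisson family and log link, and using a single hypothetical predictor, the function below fits the GLM by iteratively reweighted least squares (IRLS). A real analysis would include the demographic, admission, history, and treatment covariates described above and would typically rely on a statistics library.

```python
import math


def poisson_glm_irls(x, y, iters=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x with a Poisson GLM via IRLS."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        # Accumulate the 2x2 weighted normal equations X^T W X b = X^T W z.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = math.exp(eta)
            z = eta + (yi - mu) / mu  # working response
            w = mu  # IRLS weight for a Poisson family with log link
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 system with Cramer's rule.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


# Hypothetical noise-free data generated from log(E[stay]) = 0.5 + 0.3 * x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.exp(0.5 + 0.3 * xv) for xv in xs]
b0, b1 = poisson_glm_irls(xs, ys)
print(round(b0, 6), round(b1, 6))  # 0.5 0.3
```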


CORI: CJKV Benchmark with Romanization Integration -- A step towards Cross-lingual Transfer Beyond Textual Scripts

arXiv.org Artificial Intelligence

Naively assuming English as a source language may hinder cross-lingual transfer for many languages by failing to consider the importance of language contact. Some languages are better connected than others, and target languages can benefit from transfer from closely related languages; for many languages, the set of closely related languages does not include English. In this work, we study the impact of the source language on cross-lingual transfer, demonstrating the importance of selecting source languages that have high contact with the target language. We also construct a novel benchmark dataset for the Chinese, Japanese, Korean, and Vietnamese (CJKV) languages, which are in close contact, to further encourage in-depth studies of language contact. To comprehensively capture contact between these languages, we propose integrating Romanized transcription beyond textual scripts via contrastive learning objectives, leading to enhanced cross-lingual representations and effective zero-shot cross-lingual transfer.
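The abstract does not give the exact contrastive objective. A common choice for aligning two views of the same item is the InfoNCE loss. The sketch below assumes one embedding per word in its native script and one for its Romanized transcription, treats the matching pair as the positive and all other Romanizations in the batch as negatives, and uses hypothetical 2-D vectors. It illustrates the general technique rather than CORI's implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(script_embs, roman_embs, temperature=0.1):
    """Mean InfoNCE loss where script_embs[i] should match roman_embs[i]."""
    total = 0.0
    for i, anchor in enumerate(script_embs):
        logits = [cosine(anchor, r) / temperature for r in roman_embs]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
        total += log_denom - logits[i]  # -log softmax probability of the true pair
    return total / len(script_embs)


# Hypothetical embeddings: the two script vectors point in different directions.
script = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(script, aligned) < info_nce(script, shuffled))  # True
```

Minimizing this loss pulls each script embedding toward its own Romanization and away from the others in the batch; the temperature controls how sharply mismatches are penalized.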


Variance Reduction in Monte-Carlo Tree Search

Neural Information Processing Systems

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates. We demonstrate how these techniques can be applied to MCTS and explore their efficacy on three different stochastic, single-agent settings: Pig, Can't Stop and Dominion.
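Antithetic variates, one of the techniques named above, can be shown outside a tree search on a toy problem. The sketch below assumes a rollout whose return is a monotone function of a single uniform random number (here `math.exp`, a hypothetical stand-in for a game simulation) and compares the spread of a plain Monte-Carlo mean with that of a mean over antithetic pairs (u, 1 - u) that use the same number of evaluations. It is not the paper's MCTS integration.

```python
import math
import random
import statistics


def plain_estimate(f, n, rng):
    """Mean of f over n independent uniform draws."""
    return sum(f(rng.random()) for _ in range(n)) / n


def antithetic_estimate(f, n, rng):
    """Mean of f over n // 2 antithetic pairs (u, 1 - u): 2 * (n // 2) evaluations."""
    half = n // 2
    total = 0.0
    for _ in range(half):
        u = rng.random()
        total += f(u) + f(1.0 - u)
    return total / (2 * half)


def estimator_variance(estimator, f, n, trials, seed):
    """Empirical variance of an estimator across repeated independent runs."""
    rng = random.Random(seed)
    return statistics.variance([estimator(f, n, rng) for _ in range(trials)])


plain_var = estimator_variance(plain_estimate, math.exp, n=100, trials=500, seed=0)
anti_var = estimator_variance(antithetic_estimate, math.exp, n=100, trials=500, seed=0)
print(f"plain: {plain_var:.2e}  antithetic: {anti_var:.2e}")
```

Antithetic pairing helps here because the return is monotone in u, so f(u) and f(1 - u) are negatively correlated; for a non-monotone return it can help less or not at all.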


LLMs Among Us: Generative AI Participating in Digital Discourse

arXiv.org Artificial Intelligence

The emergence of Large Language Models (LLMs) has great potential to reshape the landscape of many social media platforms. While this can bring promising opportunities, it also poses many threats, such as biases and privacy concerns, and may contribute to the spread of propaganda by malicious actors. We developed the "LLMs Among Us" experimental framework on top of the Mastodon social media platform, in which bot and human participants communicate without knowing the ratio or nature of the other participants. We built 10 personas with three different LLMs: GPT-4, Llama 2 Chat, and Claude. We conducted three rounds of the experiment and surveyed participants after each round to measure how well LLMs could pass as human participants without being detected. We found that participants correctly identified the nature of other users in the experiment only 42% of the time, despite knowing that both bots and humans were present. We also found that the choice of persona had substantially more impact on human perception than the choice of mainstream LLM.