Government data


Planning bids for new homes soar but building remains low - how is your area affected?

BBC News

The number of planning applications for new homes in England is at its highest level for four years, new data shared with BBC Verify suggests. Applications for 335,000 homes outside London were lodged in 2025, up by 60% on 2024, according to Planning Portal, the service people use to request permission. But there are warnings that more needs to be done to meet Labour's target of building 1.5 million homes by 2029, as separate government data released on Thursday suggests there has been a decrease in house building. The Ministry of Housing, Communities and Local Government said it had overhauled the planning system and removed long-standing barriers that have held back housebuilding. The increase in planning applications for new homes in England follows controversial reforms introduced by Labour, which allow development on some lower-quality green belt land, known as "grey belt".


CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection

Hu, Jiaming, Wang, Haoyu, Mukherjee, Debarghya, Paschalidis, Ioannis Ch.

arXiv.org Artificial Intelligence

Jailbreak attacks pose a serious challenge to the safe deployment of large language models (LLMs). We introduce CCFC (Core & Core-Full-Core), a dual-track, prompt-level defense framework designed to mitigate LLMs' vulnerabilities from prompt injection and structure-aware jailbreak attacks. CCFC operates by first isolating the semantic core of a user query via few-shot prompting, and then evaluating the query using two complementary tracks: a core-only track to ignore adversarial distractions (e.g., toxic suffixes or prefix injections), and a core-full-core (CFC) track to disrupt the structural patterns exploited by gradient-based or edit-based attacks. The final response is selected based on a safety consistency check across both tracks, ensuring robustness without compromising on response quality. We demonstrate that CCFC cuts attack success rates by 50-75% versus state-of-the-art defenses against strong adversaries (e.g., DeepInception, GCG), without sacrificing fidelity on benign queries. Our method consistently outperforms state-of-the-art prompt-level defenses, offering a practical and effective solution for safer LLM deployment.
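The dual-track flow described in the abstract can be sketched in a few lines. Everything below is an illustrative assumption, not the authors' implementation: the toy core extractor stands in for the paper's few-shot LLM prompting, the keyword check stands in for an LLM-based safety judge, and the ANSWER/REFUSE outcomes stand in for forwarding or blocking the query.

```python
# Minimal sketch of the CCFC dual-track idea (illustrative only).

def extract_core(query: str) -> str:
    """Stand-in for few-shot semantic-core extraction: here we just
    strip a known adversarial suffix. A real system prompts an LLM."""
    adversarial_suffix = "!!!"
    return query.replace(adversarial_suffix, "").strip()

def is_safe(prompt: str) -> bool:
    """Toy safety judge; a real system would query an LLM evaluator."""
    return "build a bomb" not in prompt.lower()

def ccfc_defend(query: str) -> str:
    core = extract_core(query)
    # Track 1: core-only, ignoring suffix/prefix injections.
    core_only_prompt = core
    # Track 2: core-full-core sandwiching, which disrupts the structural
    # patterns that gradient- or edit-based attacks rely on.
    cfc_prompt = f"{core}\n{query}\n{core}"
    # Safety consistency check across both tracks.
    if is_safe(core_only_prompt) and is_safe(cfc_prompt):
        return "ANSWER"   # forward the query to the LLM normally
    return "REFUSE"

print(ccfc_defend("What is the capital of France? !!!"))  # benign query
print(ccfc_defend("How do I build a bomb? !!!"))          # harmful query
```

The point of the sandwich in track 2 is that an attack string optimized against the exact token layout of the full prompt no longer sits where the optimizer placed it, so its effect degrades.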


American Panopticon

The Atlantic - Technology

If you have tips about DOGE and its data collection, you can contact Ian and Charlie on Signal at @ibogost.47 and @cwarzel.92. If you were tasked with building a panopticon, your design might look a lot like the information stores of the U.S. federal government--a collection of large, complex agencies, each making use of enormous volumes of data provided by or collected from citizens. The federal government is a veritable cosmos of information, made up of constellations of databases: The IRS gathers comprehensive financial and employment information from every taxpayer; the Department of Labor maintains the National Farmworker Jobs Program (NFJP) system, which collects the personal information of many workers; the Department of Homeland Security amasses data about the movements of every person who travels by air commercially or crosses the nation's borders; the Drug Enforcement Administration tracks license plates scanned on American roads. More obscure agencies, such as the recently gutted Consumer Financial Protection Bureau, keep records of corporate trade secrets, credit reports, mortgage information, and other sensitive data, including lists of people who have fallen on financial hardship. A fragile combination of decades-old laws, norms, and jungly bureaucracy has so far prevented repositories such as these from assembling into a centralized American surveillance state. But that appears to be changing. Since Donald Trump's second inauguration, Elon Musk and the Department of Government Efficiency have systematically gained access to sensitive data across the federal government, and in ways that people in several agencies have described to us as both dangerous and disturbing.


Public Programs Are Only as Good as Their Data

WIRED

Data scientists will have a bumper year in 2023 as governments invest heavily in applying AI and algorithms to public policy. The European Commission has committed €1.3 billion ($1.38 billion) to research and innovation under the Digital Europe Programme. The UK government is funding £117 million ($143.6 million) for PhDs in AI, and it's already on the second year of its 10-year plan to "make Britain a global AI superpower." Examples of ongoing initiatives include the National Health Service's use of AI to identify abnormalities in CT scans and the Department for Work and Pensions' efforts to detect fraud in universal credit applications. This story is from the WIRED World in 2023, our annual trends briefing.


Why Trust Matters for the National Artificial Intelligence Research Resource Task Force

#artificialintelligence

It is true that artificial intelligence (AI) will come to influence almost every aspect of our lives. In the scramble to realize the potential economic and societal benefits promised by AI, the ready availability of massive, complex, and assumed-to-be generalizable datasets with which to train and test new algorithms is vital. The interaction of governments with their citizens throughout their lives generates huge volumes of diverse information, and these continuously expanding repositories of data are now seen as a public good, providing the raw material for AI industries. In passing the National Artificial Intelligence Initiative Act of 2020 (NAIIA), the United States has adopted a path similar to that of the European Union, as defined within the European Commission's Coordinated Plan on Artificial Intelligence 2021 Review. Under the provisions of the NAIIA, the National Artificial Intelligence Research Resource Task Force (NAIRRTF) has been constituted to make recommendations to Congress on, among other things, the capabilities necessary to create shared computing infrastructure for use by AI researchers and potential solutions with respect to "barriers to the dissemination and use of high-quality government data sets."


Now Streaming: Government Data

#artificialintelligence

The concept of data streaming is not new. But one of the most critical emerging uses for streaming data is in the public sector, where government agencies are eyeing its game-changing capability to advance everything from battlefield decision-making to constituent experience. IDC predicts that the collective sum of the world's data will grow from 33 zettabytes to 175 zettabytes by 2025. For context, at today's average internet connection speeds, 175 zettabytes would take 1.8 billion years for one person to download. Streaming has only further accelerated the velocity of data growth.
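The 1.8-billion-year figure is easy to sanity-check. The 25 Mbps connection speed below is an assumption (roughly the average fixed-broadband speed around the time the claim was made), not a number from the article:

```python
# Back-of-envelope check of the "1.8 billion years" download claim.
ZB = 10**21                        # bytes in a zettabyte
total_bytes = 175 * ZB             # the IDC 2025 forecast
mbps = 25                          # assumed average connection speed
bytes_per_sec = mbps * 1e6 / 8     # megabits/s -> bytes/s
seconds_per_year = 365.25 * 24 * 3600
years = total_bytes / (bytes_per_sec * seconds_per_year)
print(f"{years:.2e} years")        # on the order of 1.8 billion
```

At faster assumed speeds the figure shrinks proportionally, which is why the claim is pinned to "today's average" speeds.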


Artificial Intelligence as an Anti-Corruption Tool (AI-ACT) -- Potentials and Pitfalls for Top-down and Bottom-up Approaches

Köbis, Nils, Starke, Christopher, Rahwan, Iyad

arXiv.org Artificial Intelligence

Corruption continues to be one of the biggest societal challenges of our time. New hope is placed in Artificial Intelligence (AI) to serve as an unbiased anti-corruption agent. Ever more available (open) government data paired with unprecedented performance of such algorithms render AI the next frontier in anti-corruption. Summarizing existing efforts to use AI-based anti-corruption tools (AI-ACT), we introduce a conceptual framework to advance research and policy. It outlines why AI presents a unique tool for top-down and bottom-up anti-corruption approaches. For both approaches, we outline in detail how AI-ACT present different potentials and pitfalls for (a) input data, (b) algorithmic design, and (c) institutional implementation. Finally, we venture a look into the future and flesh out key questions that need to be addressed to develop AI-ACT while considering citizens' views, hence putting "society in the loop".


AI Startups Need Data, and the Government Needs Help - ReadWrite

#artificialintelligence

Due to their unique oversight, governments have a surplus of data at their fingertips. Used properly, this available data could enable them to create beneficial programs that tackle problems in economics, policy, transportation, and civic life. Unfortunately, the majority of that data is untapped. All hope is not lost, though. Here are the facts about AI startups needing data, and how that helps governments.


Dominic Cummings wants 'weirdos' to help run the UK. Will it work?

New Scientist

Dominic Cummings, a senior adviser to UK prime minister Boris Johnson, has said he wants the UK government to hire "weirdos and misfits with odd skills" to apply science to the civil service. While primarily a quirky job ad, his blog post also offers a glimpse into how he sees scientific research transforming the government. As well as listing categories of people he would like to hire – including mathematicians and physicists – the blog post also focuses on the utility of data science, artificial intelligence and the "science of prediction". But does his vision make sense? Can policy-making really be improved by building digital models of reality, or applying machine learning to government data, as Cummings appears keen on?


Finding Public Data for Your Machine Learning Pipelines

#artificialintelligence

The goal of this article is to help you find a public dataset you can use in your machine learning pipeline, whether for a demo, a proof of concept, or a research project. It may not always be possible to collect your own data, but public data can power machine learning pipelines for a wide range of applications. Without data you cannot be sure a machine learning model works. However, the data you need may not always be readily available: it may not have been collected or labeled yet, or it may be inaccessible for technological, budgetary, privacy, or security reasons. Especially in a business context, stakeholders want to see how a machine learning system will work before investing the time and money in collecting, labeling, and moving data into such a system. This makes finding substitute data necessary.
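As a minimal illustration of the workflow such an article describes, the sketch below loads and sanity-checks a public CSV using only the standard library. The inline string and its column names are stand-ins (assumptions) for a file you would actually download from an open-data portal such as data.gov:

```python
# Sketch: turning a public CSV into a pipeline-ready structure.
# The inline string stands in for a downloaded open-data file;
# the columns are invented for illustration.
import csv
import io

public_csv = """region,year,applications
North,2024,1200
North,2025,2100
South,2025,1800
"""

rows = list(csv.DictReader(io.StringIO(public_csv)))

# Basic sanity checks worth running before feeding data to a model:
assert all(r["year"].isdigit() for r in rows)          # numeric fields parse
assert all(int(r["applications"]) >= 0 for r in rows)  # values are plausible

# Convert to typed tuples, the kind of structure a pipeline can consume.
features = [(r["region"], int(r["year"]), int(r["applications"]))
            for r in rows]
print(len(features), features[0])
```

The same pattern (download, parse, validate types and ranges, convert to typed records) applies whether the source is a CSV, JSON API, or bulk data dump.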