AITopics

2412.09947

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Mexico > Yucatán > Mérida (0.04)
Europe > Slovakia (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Clavell, Gemma Galdon, González-Sendino, Rubén

What we learned while automating bias detection in AI hiring systems for compliance with NYC Local Law 144

arXiv.org Artificial Intelligence Dec-13-2024

Since July 5, 2023, New York City's Local Law 144 requires employers to conduct independent bias audits for any automated employment decision tools (AEDTs) used in hiring processes. The law outlines a minimum set of bias tests that AI developers and implementers must perform to ensure compliance. Over the past few months, we have collected and analyzed audits conducted under this law, identified best practices, and developed a software tool to streamline employer compliance. Our tool, ITACA_144, tailors our broader bias auditing framework to meet the specific requirements of Local Law 144. While automating these legal mandates, we identified several critical challenges that merit attention to ensure AI bias regulations and audit methodologies are both effective and practical. This document presents the insights gained from automating compliance with NYC Local Law 144. It aims to support other cities and states in crafting similar legislation while addressing the limitations of the NYC framework. The discussion focuses on key areas including data requirements, demographic inclusiveness, impact ratios, effective bias, metrics, and data reliability.
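As a rough illustration of the impact-ratio metric named above, the sketch below assumes the selection-rate form: each category's selection rate divided by the rate of the most-selected category. The function names and the counts are hypothetical, chosen only for illustration; this is not the ITACA_144 implementation.

```python
def selection_rates(selected, total):
    """Selection rate per category: candidates selected / candidates assessed."""
    rates = {}
    for group, n in total.items():
        if n <= 0:
            raise ValueError(f"category {group!r} has no candidates")
        rates[group] = selected.get(group, 0) / n
    return rates


def impact_ratios(selected, total):
    """Each category's selection rate divided by the highest selection rate."""
    rates = selection_rates(selected, total)
    best = max(rates.values())
    if best == 0:
        raise ValueError("no category has any selected candidates")
    return {group: rate / best for group, rate in rates.items()}


# Hypothetical counts, invented for illustration only.
example_selected = {"group_a": 50, "group_b": 30}
example_total = {"group_a": 100, "group_b": 100}
example_ratios = impact_ratios(example_selected, example_total)
```

A ratio well below 1 for a category means it is selected at a lower rate than the most-selected category. How large a gap matters, and how reliable the ratio is for small categories, are among the questions the abstract raises.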

artificial intelligence, audit, local law 144, (15 more...)

2501.10371

Country:

North America > United States > New York (0.25)
North America > United States > Alaska (0.05)

Genre: Research Report (0.50)

Industry: Law > Government & the Courts (1.00)

Technology: Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.34)

arXiv.org Artificial Intelligence Dec-13-2024

AI and the Future of Digital Public Squares

Goldberg, Beth, Acosta-Navas, Diana, Bakker, Michiel, Beacock, Ian, Botvinick, Matt, Buch, Prateek, DiResta, Renée, Donthi, Nandika, Fast, Nathanael, Iyer, Ravi, Jalan, Zaria, Konya, Andrew, Danciu, Grace Kwak, Landemore, Hélène, Marwick, Alice, Miller, Carl, Ovadya, Aviv, Saltz, Emily, Schirch, Lisa, Shalom, Dalit, Siddarth, Divya, Sieker, Felix, Small, Christopher, Stray, Jonathan, Tang, Audrey, Tessler, Michael Henry, Zhang, Amy

Two substantial technological advances have reshaped the public square in recent decades: first with the advent of the internet and second with the recent introduction of large language models (LLMs). LLMs offer opportunities for a paradigm shift towards more decentralized, participatory online spaces that can be used to facilitate deliberative dialogues at scale, but also create risks of exacerbating societal schisms. Here, we explore four applications of LLMs to improve digital public squares: collective dialogue systems, bridging systems, community moderation, and proof-of-humanity systems. Building on the input from over 70 civil society experts and technologists, we argue that LLMs both afford promising opportunities to shift the paradigm for conversations at scale and pose distinct risks for digital public squares. We lay out an agenda for future research and investments in AI that will strengthen digital public squares and safeguard against potential misuses of AI.

moderation, participant, platform, (16 more...)

2412.09988

Country:

Asia > Middle East > Republic of Türkiye > Konya Province > Konya (0.04)
North America > United States > New Jersey (0.04)
North America > United States > New York (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Engadget Dec-12-2024, 13:30:30 GMT

ACLU highlights the rise of AI-generated police reports -- what could go wrong?

The American Civil Liberties Union (ACLU) is sounding a warning about the use of AI in creating police reports, saying the tech could produce errors that affect evidence and court cases. The nonprofit highlighted the dangers of the tech in a white paper, following news that police departments in California are using a program called Draft One from Axon to transcribe body camera recordings and create a first draft of police reports. One police department in Fresno said that it's using Draft One under a pilot program, but only for misdemeanor reports. "It's nothing more than a template," deputy chief Rob Beckwith told Industry Insider. "It's not designed to have an officer push a button and generate a report." He said that the department hasn't seen any errors with transcriptions and that it consulted with the Fresno County DA's office in training the force. However, the ACLU noted four issues with the use of AI.

ai-generated police report, artificial intelligence, police report, (3 more...)

Engadget

Country: North America > United States > California > Fresno County (0.27)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

Fenoaltea, Enrico Maria, Mazzilli, Dario, Patelli, Aurelio, Sbardella, Angelica, Tacchella, Andrea, Zaccaria, Andrea, Trombetti, Marco, Pietronero, Luciano

Follow the money: a startup-based measure of AI exposure across occupations, industries and regions

The integration of artificial intelligence (AI) into the workplace is advancing rapidly, necessitating robust metrics to evaluate its tangible impact on the labour market. Existing measures of AI occupational exposure largely focus on AI's theoretical potential to substitute or complement human labour on the basis of technical feasibility, providing limited insight into actual adoption and offering inadequate guidance for policymakers. To address this gap, we introduce the AI Startup Exposure (AISE) index, a novel metric based on occupational descriptions from O*NET and AI applications developed by startups funded by the Y Combinator accelerator. Our findings indicate that while high-skilled professions are theoretically highly exposed according to conventional metrics, they are heterogeneously targeted by startups. Roles involving routine organizational tasks -- such as data analysis and office management -- display significant exposure, while occupations whose tasks are less amenable to AI automation because of ethical or high-stakes considerations, more than feasibility -- such as judges or surgeons -- present lower AISE scores. By focusing on venture-backed AI applications, our approach offers a nuanced perspective on how AI is reshaping the labour market. It challenges the conventional assumption that high-skilled jobs uniformly face high AI risks, highlighting instead the role of today's AI players' societal desirability-driven and market-oriented choices as critical determinants of AI exposure. Contrary to fears of widespread job displacement, our findings suggest that AI adoption will be gradual and shaped by social factors as much as by the technical feasibility of AI applications. This framework provides a dynamic, forward-looking tool for policymakers and stakeholders to monitor AI's evolving impact and navigate the changing labour landscape.

large language model, machine learning, natural language, (18 more...)

2412.04924

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Wisconsin (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(4 more...)

Allouah, Youssef, Kazdan, Joshua, Guerraoui, Rachid, Koyejo, Sanmi

The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning

Machine unlearning, the process of selectively removing data from trained models, is increasingly crucial for addressing privacy concerns and knowledge gaps post-deployment. Despite this importance, existing approaches are often heuristic and lack formal guarantees. In this paper, we analyze the fundamental utility, time, and space complexity trade-offs of approximate unlearning, providing rigorous certification analogous to differential privacy. For in-distribution forget data -- data similar to the retain set -- we show that a surprisingly simple and general procedure, empirical risk minimization with output perturbation, achieves tight unlearning-utility-complexity trade-offs, addressing a previous theoretical gap on the separation from unlearning "for free" via differential privacy, which inherently facilitates the removal of such data. However, such techniques fail with out-of-distribution forget data -- data significantly different from the retain set -- where unlearning time complexity can exceed that of retraining, even for a single sample. To address this, we propose a new robust and noisy gradient descent variant that provably amortizes unlearning time complexity without compromising utility.

artificial intelligence, complexity, machine learning, (15 more...)

2412.09119

Country:

North America > United States > California (0.04)
Europe > Switzerland (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

LCFO: Long Context and Long Form Output Dataset and Benchmarking

Costa-jussà, Marta R., Andrews, Pierre, Meglioli, Mariano Coria, Chen, Joy, Chuang, Joe, Dale, David, Ropers, Christophe, Mourachko, Alexandre, Sánchez, Eduardo, Schwenk, Holger, Tran, Tuan, Turkatenko, Arina, Wood, Carleigh

This paper presents the Long Context and Long Form Output (LCFO) benchmark, a novel evaluation framework for assessing gradual summarization and summary expansion capabilities across diverse domains. LCFO consists of long input documents (5k words average length), each of which comes with three summaries of different lengths (20%, 10%, and 5% of the input text), as well as approximately 15 questions and answers (QA) related to the input content. Notably, LCFO also provides alignments between specific QA pairs and corresponding summaries in 7 domains. The primary motivation behind providing summaries of different lengths is to establish a controllable framework for generating long texts from shorter inputs, i.e. summary expansion. To establish an evaluation metric framework for summarization and summary expansion, we provide human evaluation scores for human-generated outputs, as well as results from various state-of-the-art large language models (LLMs). GPT-4o-mini achieves the best human scores among automatic systems in both summarization and summary expansion tasks (~ +10% and +20%, respectively). It even surpasses human output quality in the case of short summaries (~ +7%). Overall, automatic metrics achieve low correlations with human evaluation scores (~ 0.4) but moderate correlation on specific evaluation aspects such as fluency and attribution (~ 0.6). The LCFO benchmark offers a standardized platform for evaluating summarization and summary expansion performance, as well as corresponding automatic metrics, thereby providing an important evaluation framework to advance generative AI.

large language model, machine learning, natural language, (19 more...)

2412.08268

Country:

North America > United States (1.00)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
(10 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.45)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

AI Red-Teaming is a Sociotechnical System. Now What?

Gillespie, Tarleton, Shaw, Ryland, Gray, Mary L., Suh, Jina

Whether tapped directly on the web, or embedded in software suites, search engines, and social media platforms, LLMs are everywhere. When a technology jumps this quickly from theoretical plaything to consumer service, many other elements are also settling in around it, without much forethought: interfaces, policies, business models, labor arrangements, infrastructural assurances, complementary technologies, public claims, advertising campaigns, regulations. Researchers studying the workings and implications of these technologies, across computer science, engineering, the social sciences, humanities, and law, must gear up just as fast to study not just the core technology, but the sociotechnical system taking shape around it [19]. Many of these decisions, arrangements, and infrastructures may turn out to be as consequential for users and the broader public as the core technology itself. But the boisterous promises and debates that surround a new technology can obscure these other essential elements that make technologies always more than the sum of their engineered parts. In this essay, we hope to call upon computer scientists and social scientists alike to pay closer, critical attention to the phenomenon of "red-teaming."

large language model, machine learning, natural language, (18 more...)

2412.09751

Country:

North America > United States > California (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Washington > King County > Redmond (0.04)
(7 more...)

Genre: Research Report (0.51)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Information Technology > Security & Privacy (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.30)

Perron, Yohann, Sydorov, Vladyslav, Wijker, Adam P., Evans, Damian, Pottier, Christophe, Landrieu, Loic

Archaeoscape: Bringing Aerial Laser Scanning Archaeology to the Deep Learning Era

Airborne Laser Scanning (ALS) technology has transformed modern archaeology by unveiling hidden landscapes beneath dense vegetation. However, the lack of expert-annotated, open-access resources has hindered the analysis of ALS data using advanced deep learning techniques. We address this limitation with Archaeoscape (available at https://archaeoscape.ai/data/2024/), a novel large-scale archaeological ALS dataset spanning 888 km$^2$ in Cambodia with 31,141 annotated archaeological features from the Angkorian period. Archaeoscape is over four times larger than comparable datasets, and the first ALS archaeology resource with open-access data, annotations, and models. We benchmark several recent segmentation models to demonstrate the benefits of modern vision techniques for this problem and highlight the unique challenges of discovering subtle human-made structures under dense jungle canopies. By making Archaeoscape available in open access, we hope to bridge the gap between traditional archaeology and modern computer vision methods.

artificial intelligence, deep learning, machine learning, (18 more...)

2412.05203

Country:

Asia > Cambodia (0.34)
Asia > Thailand (0.14)
Europe > Netherlands (0.05)
(16 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Law (1.00)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

Zhou, Ruiwen, Hua, Wenyue, Pan, Liangming, Cheng, Sitao, Wu, Xiaobao, Yu, En, Wang, William Yang

This paper introduces RuleArena, a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning. Covering three practical domains -- airline baggage fees, NBA transactions, and tax regulations -- RuleArena assesses LLMs' proficiency in handling intricate natural language instructions that demand long-context understanding, logical reasoning, and accurate mathematical computation. Two key attributes distinguish RuleArena from traditional rule-based reasoning benchmarks: (1) it extends beyond standard first-order logic representations, and (2) it is grounded in authentic, practical scenarios, providing insights into the suitability and reliability of LLMs for real-world applications. Our findings reveal several notable limitations in LLMs: (1) they struggle to identify and apply the appropriate rules, frequently becoming confused by similar but distinct regulations, (2) they cannot consistently perform accurate mathematical computations, even when they correctly identify the relevant rules, and (3) in general, they perform poorly in the benchmark. These results highlight significant challenges in advancing LLMs' rule-guided reasoning capabilities in real-life applications.

large language model, machine learning, natural language, (20 more...)

2412.08972

Country:

North America > Canada (0.14)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Passenger (1.00)
Transportation > Air (1.00)
Law > Taxation Law (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)