Law
My Precious Crash Data: Barriers and Opportunities in Encouraging Autonomous Driving Companies to Share Safety-Critical Data
Sandhaus, Hauke, Hwang, Angel Hsing-Chi, Ju, Wendy, Yang, Qian
Safety-critical data, such as crash and near-crash records, are crucial to improving autonomous vehicle (AV) design and development. Sharing such data across AV companies, academic researchers, regulators, and the public can help make all AVs safer. However, AV companies rarely share safety-critical data externally. This paper aims to pinpoint why AV companies are reluctant to share safety-critical data, with an eye on how these barriers can inform new approaches to promote sharing. We interviewed twelve AV company employees who actively work with such data in their day-to-day work. Findings suggest two key, previously unknown barriers to data sharing: (1) Datasets inherently embed salient knowledge that is key to improving AV safety and are resource-intensive. Therefore, data sharing, even within a company, is fraught with politics. (2) Interviewees believed AV safety knowledge is private knowledge that brings competitive edges to their companies, rather than public knowledge for social good. We discuss the implications of these findings for incentivizing and enabling safety-critical AV data sharing, specifically, implications for new approaches to (1) debating and stratifying public and private AV safety knowledge; (2) innovating data tools and data sharing pipelines that enable easier sharing of public AV safety data and knowledge; and (3) offsetting costs of curating safety-critical data and incentivizing data sharing.
Learning High-dimensional Gaussians from Censored Data
Bhattacharyya, Arnab, Daskalakis, Constantinos, Gouleakis, Themis, Wang, Yuhao
We provide efficient algorithms for the problem of distribution learning from high-dimensional Gaussian data where in each sample, some of the variable values are missing. We suppose that the variables are missing not at random (MNAR). The missingness model, denoted by $S(y)$, is the function that maps any point $y \in \mathbb{R}^d$ to the subsets of its coordinates that are seen. In this work, we assume that it is known. We study the following two settings: (i) Self-censoring: An observation $x$ is generated by first sampling the true value $y$ from a $d$-dimensional Gaussian $N(\mu^*, \Sigma^*)$ with unknown $\mu^*$ and $\Sigma^*$. For each coordinate $i$, there exists a set $S_i \subseteq \mathbb{R}^d$ such that $x_i = y_i$ if and only if $y_i \in S_i$. Otherwise, $x_i$ is missing and takes a generic value (e.g., "?"). We design an algorithm that learns $N(\mu^*, \Sigma^*)$ up to total variation (TV) distance $\epsilon$, using $\mathrm{poly}(d, 1/\epsilon)$ samples, assuming only that each pair of coordinates is observed with sufficiently high probability. (ii) Linear thresholding: An observation $x$ is generated by first sampling $y$ from a $d$-dimensional Gaussian $N(\mu^*, \Sigma)$ with unknown $\mu^*$ and known $\Sigma$, and then applying the missingness model $S$ where $S(y) = \{i \in [d] : v_i^T y \le b_i\}$ for some $v_1, \ldots, v_d \in \mathbb{R}^d$ and $b_1, \ldots, b_d \in \mathbb{R}$. We design an efficient mean estimation algorithm, assuming that none of the possible missingness patterns is very rare conditioned on the values of the observed coordinates and that any small subset of coordinates is observed with sufficiently high probability.
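To make the two missingness models above concrete, here is a minimal simulation sketch. It is not the authors' learning algorithm; it only generates censored observations. Several simplifications are assumptions of this sketch rather than anything stated in the abstract: the covariance is taken to be the identity, each self-censoring set $S_i$ is represented by a hypothetical predicate on the scalar $y_i$, and a missing coordinate is encoded as `None` instead of "?". The function names are likewise illustrative.

```python
import random


def sample_self_censoring(mu, in_S, rng):
    """Draw y ~ N(mu, I) and censor each coordinate by itself.

    in_S[i] is a hypothetical predicate standing in for the set S_i:
    coordinate i is reported (x_i = y_i) iff in_S[i](y_i) is true.
    Identity covariance is an assumption of this sketch.
    """
    y = [rng.gauss(m, 1.0) for m in mu]
    return [yi if keep(yi) else None for yi, keep in zip(y, in_S)]


def sample_linear_thresholding(mu, V, b, rng):
    """Draw y ~ N(mu, I) and apply S(y) = {i : v_i^T y <= b_i}.

    V[i] is the vector v_i and b[i] the threshold b_i. Returns the
    censored observation x and the list of observed flags.
    """
    d = len(mu)
    y = [rng.gauss(mu[i], 1.0) for i in range(d)]
    observed = [
        sum(V[i][j] * y[j] for j in range(d)) <= b[i]
        for i in range(d)
    ]
    x = [y[i] if observed[i] else None for i in range(d)]
    return x, observed


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.0, 1.0]
    # Self-censoring: only values below 1.5 are ever reported.
    print(sample_self_censoring(mu, [lambda t: t < 1.5] * 2, rng))
    # Linear thresholding: coordinate i is seen iff y_1 + y_2 <= b_i.
    print(sample_linear_thresholding(mu, [[1.0, 1.0]] * 2, [1.0, 2.0], rng))
```

In the linear thresholding case, whether a coordinate is seen can depend on the entire vector $y$, including coordinates that end up hidden, which is what makes the data missing not at random.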
Mitigating Bias in Facial Recognition Systems: Centroid Fairness Loss Optimization
Conti, Jean-Rémy, Clémençon, Stéphan
The urgent societal demand for fair AI systems has put pressure on the research community to develop predictive models that are not only globally accurate but also meet new fairness criteria, reflecting the lack of disparate mistreatment with respect to sensitive attributes ($\textit{e.g.}$ gender, ethnicity, age). In particular, the variability of the errors made by certain Facial Recognition (FR) systems across specific segments of the population compromises the deployment of the latter, and was judged unacceptable by regulatory authorities. Designing fair FR systems is a very challenging problem, mainly due to the complex and functional nature of the performance measure used in this domain ($\textit{i.e.}$ ROC curves) and because of the huge heterogeneity of the face image datasets usually available for training. In this paper, we propose a novel post-processing approach to improve the fairness of pre-trained FR models by optimizing a regression loss which acts on centroid-based scores. Beyond the computational advantages of the method, we present numerical experiments providing strong empirical evidence of the gain in fairness and of the ability to preserve global accuracy.
Pressure grows on State Bar of California to revert to national exam format in July after botched exam
An influential California legislator is pressuring the State Bar of California to ditch its new multiple-choice questions after a February bar exam debacle and revert to the traditional test format in July. "Given the catastrophe of the February bar, I think that going back to the methods that have been used for the last 50 years -- until we can adequately test what new methods may be employed -- is the appropriate way to go," Sen. Tom Umberg (D-Orange), chair of the state Senate Judiciary Committee, told The Times. Thousands of test takers seeking to practice law in California typically take the two-day bar exam in July. Reverting to the national system by the National Conference of Bar Examiners, which California has used since 1972, would be a major retreat for the embattled State Bar. Its new exam was rolled out this year as a cost-cutting measure and "historic agreement" that would offer test takers the choice of remote testing.
Heartbreaking: Elon Musk Just Made a Great Point About Free Speech
"Free speech" was the battering ram that Elon Musk used to justify his pursuit of Twitter in 2022. He talked about the platform as the new digital town square. He said social media companies' moderation policies should be no more restrictive than national laws. "I hope that even my worst critics remain on Twitter, because that is what free speech means," he wrote after agreeing to a $44 billion takeover. In the three years since making the deal, Musk has continued to cloak himself in the armor of a free speech warrior, out there fighting for the rest of us.
The vultures are circling for Chrome
Google has a monopoly, and that's the official line of the US federal government. In fact, it has two of them, losing two separate antitrust cases that threaten to cripple the tech giant. The Department of Justice has proposed forcing Google to sell or otherwise divest itself of the Chrome browser as its first and preferred remedy. But who would buy it? Unsurprisingly, there are beaucoup business beaus lining up around the block for this browser bachelorette.
Elon Musk's xAI accused of pollution over Memphis supercomputer
Elon Musk's artificial intelligence company is stirring controversy in Memphis, Tennessee. That's where he's building a massive supercomputer to power his company xAI. Community residents and environmental activists say that since the supercomputer was fired up last summer it has become one of the biggest air polluters in the county. But some local officials have championed the billionaire, saying he's investing in Memphis. The first public hearing with the health department is scheduled for Friday, where county officials will hear from all sides of the debate.
Japan's Lower House passes AI promotion bill
The House of Representatives, Japan's lower chamber of parliament, passed a bill on Thursday to promote the development of artificial intelligence technology and take steps to mitigate its risks. The legislation is expected to be enacted during the current parliamentary session set to end in June after deliberations at the House of Councilors, the upper chamber. AI "will be the foundation of economic and social development and is an important technology from the viewpoint of security," the bill said.
The Malicious Technical Ecosystem: Exposing Limitations in Technical Governance of AI-Generated Non-Consensual Intimate Images of Adults
Ding, Michelle L., Suresh, Harini
In this paper, we adopt a survivor-centered approach to locate and dissect the role of sociotechnical AI governance in preventing AI-Generated Non-Consensual Intimate Images (AIG-NCII) of adults, colloquially known as "deep fake pornography." We identify a "malicious technical ecosystem" or "MTE," comprising open-source face-swapping models and nearly 200 "nudifying" software programs that allow non-technical users to create AIG-NCII within minutes. Then, using the National Institute of Standards and Technology (NIST) AI 100-4 report as a reflection of current synthetic content governance methods, we show how the current landscape of practices fails to effectively regulate the MTE for adult AIG-NCII, and we identify the flawed assumptions that explain these gaps.
Towards a comprehensive taxonomy of online abusive language informed by machine learning
Moghaddam, Samaneh Hosseini, Lyons, Kelly, Regehr, Cheryl, Goel, Vivek, Regehr, Kaitlyn
The proliferation of abusive language in online communications has posed significant risks to the health and wellbeing of individuals and communities. The growing concern regarding online abuse and its consequences necessitates methods for identifying and mitigating harmful content and facilitating continuous monitoring, moderation, and early intervention. This paper presents a taxonomy for distinguishing key characteristics of abusive language within online text. Our approach uses a systematic method for taxonomy development, integrating classification systems of 18 existing multi-label datasets to capture key characteristics relevant to online abusive language classification. The resulting taxonomy is hierarchical and faceted, comprising 5 categories and 17 dimensions. It classifies various facets of online abuse, including context, target, intensity, directness, and theme of abuse. This shared understanding can lead to more cohesive efforts, facilitate knowledge exchange, and accelerate progress in the field of online abuse detection and mitigation among researchers, policy makers, online platform owners, and other stakeholders.