Law
EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning
Expressing universal semantics common to all languages is helpful in understanding the meanings of complex and culture-specific sentences. The research theme underlying this scenario focuses on learning universal representations across languages with the usage of massive parallel corpora. However, due to the sparsity and scarcity of parallel data, there is still a big challenge in learning authentic "universals" for any two languages. In this paper, we propose EMMA-X: an EM-like Multilingual pre-training Algorithm, to learn (X)Cross-lingual universals with the aid of excessive multilingual non-parallel data.
Demographic Parity Constrained Minimax Optimal Regression under Linear Model
We explore the minimax optimal error associated with a demographic parityconstrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder [6]. Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by ฮ(dM/n), where ndenotes the sample size, d represents the dimensionality, and M signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.
Demographic Parity Constrained Minimax Optimal Regression under Linear Model
We explore the minimax optimal error associated with a demographic parityconstrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder [6]. Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by ฮ(dM/n), where ndenotes the sample size, d represents the dimensionality, and M signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.
Regulating algorithmic filtering on social media
By filtering the content that users see, social media platforms have the ability to influence users' perceptions and decisions, from their dining choices to their voting preferences. This influence has drawn scrutiny, with many calling for regulations on filtering algorithms, but designing and enforcing regulations remains challenging. In this work, we examine three questions. First, given a regulation, how would one design an audit to enforce it? Second, does the audit impose a performance cost on the platform?
Discord Sleuths Gained Unauthorized Access to Anthropic's Mythos
Plus: Spy firms tap into a global telecom weakness to track targets, 500,000 UK health records go up for sale on Alibaba, Apple patches a revealing notification bug, and more. As researchers and practitioners debate the impact that new AI models will have on cybersecurity, Mozilla said on Tuesday it used early access to Anthropic's Mythos Preview to find and fix 271 vulnerabilities in its new Firefox 150 browser release. Meanwhile, researchers identified a group of moderately successful North Korean hackers using AI for everything from vibe coding malware to creating fake company websites--stealing up to $12 million in three months. Researchers have finally cracked disruptive malware known as Fast16 that predates Stuxnet and may have been used to target Iran's nuclear program. It was created in 2005 and was likely deployed by the US or an ally.
Retiring Adult: New Datasets for Fair Machine Learning
Although the fairness community has recognized the importance of data, re-searchers in the area primarily rely on UCIAdult when it comes to tabular data. Derived from a 1994 USCensus survey, this dataset has appeared in hundreds of research papers where it served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available USCensus sources and reveal idiosyncrasies of the UCIAdult dataset that limit its external validity. Our primary contribution is asuite of new datasets derived from USCensus surveys that extend the existing data ecosystem for research on fair machine learning. We create prediction tasks relating to income, employment, health, transportation, and housing. The data span multiple years and all states of the United States, allowing researchers to studytemporal shift and geographic variation. We highlight a broad initial sweep of new empirical insights relating to trade-offs between fairness criteria, performance of algorithmic interventions, and the role of distribution shift based on our new datasets. Our findings inform ongoing debates, challenge some existing narratives, and point to future research directions.