Law
Fair Sequential Selection Using Supervised Learning Models
We consider a selection problem in which sequentially arriving applicants apply for a limited number of positions/jobs. At each time step, a decision maker accepts or rejects the given applicant using a pre-trained supervised learning model until all the vacant positions are filled. In this paper, we discuss whether the fairness notions commonly used in classification problems (e.g., equal opportunity, statistical parity) are suitable for sequential selection problems. In particular, we show that even with a pre-trained model that satisfies these common fairness notions, the selection outcomes may still be biased against certain demographic groups. This observation implies that the fairness notions used in classification problems are not suitable for a selection problem in which applicants compete for a limited number of positions. We introduce a new fairness notion, "Equal Selection (ES)," suited to sequential selection problems and propose a post-processing approach to satisfy it. We also consider a setting in which applicants have privacy concerns and the decision maker only has access to a noisy version of the sensitive attributes. In this setting, we show that perfect ES fairness can still be attained under certain conditions.
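To make the setting concrete, the following is a minimal, hypothetical sketch of the sequential selection loop described above: applicants arrive one at a time, a pre-trained classifier scores them, and the decision maker accepts until the k positions are filled. The threshold, the toy score function, and the group labels are illustrative assumptions, not the paper's construction; the run simply shows how arrival order plus a limited number of positions can skew outcomes across groups even when the classifier treats both groups identically.

```python
import random

def sequential_selection(applicants, score_fn, k, threshold=0.5):
    """Accept applicants in arrival order until k positions are filled.

    applicants: iterable of (features, group) pairs in arrival order.
    score_fn:   pre-trained model returning a score in [0, 1].
    k:          number of vacant positions.
    """
    selected, counts = [], {}
    for features, group in applicants:
        if len(selected) == k:                      # all positions filled
            break
        if score_fn(features) >= threshold:         # accept/reject decision
            selected.append((features, group))
            counts[group] = counts.get(group, 0) + 1
    return selected, counts

# Toy run: the score function is identical for both groups, but group A
# happens to arrive first, so it can fill every position before group B
# is even considered -- the selection outcome is still skewed.
random.seed(0)
applicants = [((random.random(),), "A") for _ in range(30)] + \
             [((random.random(),), "B") for _ in range(30)]
_, counts = sequential_selection(applicants, lambda x: x[0], k=10)
print(counts)  # e.g. {'A': 10}
```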
DARE: Disentanglement-Augmented Rationale Extraction
Rationale extraction can be considered a straightforward way to improve model explainability: rationales are a subsequence of the original input that can be extracted to support the prediction results. Existing methods mainly cascade a selector, which extracts the rationale tokens, with a predictor, which makes the prediction based on the selected tokens. Since previous works fail to fully exploit the original input, ignoring the information carried by non-selected tokens, in this paper we propose a Disentanglement-Augmented Rationale Extraction (DARE) method, which encapsulates more information from the input to extract rationales. Specifically, it first disentangles the input into rationale representations and non-rationale ones, and then learns more comprehensive rationale representations for extraction by minimizing the mutual information (MI) between the two disentangled representations. Besides, to improve the performance of MI minimization, we develop a new MI estimator by exploring existing MI estimation methods. Extensive experimental results on three real-world datasets and simulation studies clearly validate the effectiveness of our proposed method. Code is released at https://github.com/yuelinan/DARE.
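As a rough illustration of the disentangle-then-minimize-MI idea (not DARE's actual architecture or its new MI estimator), the sketch below splits an encoder's output into rationale and non-rationale representations and penalizes a CLUB-style variational upper bound on the MI between them; all module names, layer sizes, and the 0.1 penalty weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ClubUpperBound(nn.Module):
    """CLUB-style variational upper bound on I(z_r; z_n), used here only as a
    stand-in MI proxy (the paper develops its own estimator)."""
    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Linear(dim, dim)        # predicts the mean of q(z_n | z_r)
        self.logvar = nn.Linear(dim, dim)    # predicts its log-variance

    def forward(self, z_r, z_n):
        mu, logvar = self.mu(z_r), self.logvar(z_r)
        pos = -((z_n - mu) ** 2) / logvar.exp()                           # matched pairs
        neg = -((z_n[torch.randperm(len(z_n))] - mu) ** 2) / logvar.exp() # shuffled pairs
        return pos.mean() - neg.mean()                                    # MI proxy to minimize

class Disentangler(nn.Module):
    """Encodes the input and splits it into rationale / non-rationale views."""
    def __init__(self, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(input_size=32, hidden_size=hidden, batch_first=True)
        self.rationale_head = nn.Linear(hidden, hidden)
        self.non_rationale_head = nn.Linear(hidden, hidden)

    def forward(self, x):
        h, _ = self.encoder(x)                # (batch, seq, hidden)
        pooled = h.mean(dim=1)
        return self.rationale_head(pooled), self.non_rationale_head(pooled)

model, club = Disentangler(), ClubUpperBound(64)
x = torch.randn(8, 20, 32)                    # dummy batch: 8 texts, 20 tokens
z_r, z_n = model(x)
mi_penalty = club(z_r, z_n)
task_loss = torch.tensor(0.0)                 # placeholder for the selector/predictor loss
loss = task_loss + 0.1 * mi_penalty           # task objective + MI penalty between the views
loss.backward()
```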
A Appendix
A.1 Comparison with existing datasets
We compare our proposed MACD with existing Indic and non-Indic datasets in detail in Table 10. We note that large-scale datasets containing more than 50K samples exist for some non-Indic languages such as English, Greek, and Turkish; these datasets enable large-scale study of abuse detection for those languages. For other languages, however, large-scale datasets are still lacking. Comparing with Indic datasets, we note that they are small-scale relative to the non-Indic ones. This shows an immediate need for a dataset like MACD to fill this gap and foster advances in abuse detection for Indic languages. Overall, and at the individual-language level, MACD is one of the largest datasets for studying Indic languages.
A.2 MACD dataset
Explicit warning: We urge the community to be mindful that our dataset MACD contains comments expressing abusive behaviour towards religion, region, gender, etc., which may be distressing to researchers.
MACD: Multilingual Abusive Comment Detection at Scale for Indic Languages
Social media platforms were conceived to act as online 'town squares' where people could get together, share information, and communicate with each other peacefully. However, harmful content produced by bad actors is constantly plaguing these platforms, slowly converting them into 'mosh pits' where those actors take the liberty to extensively abuse various marginalised groups. Accurate and timely detection of abusive content on social media platforms is therefore very important for facilitating safe interactions between users. However, due to the small scale and sparse linguistic coverage of Indic abusive speech datasets, the development of such algorithms for Indic social media users (one-sixth of the global population) is severely impeded.
Generative Forests
We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data. Our paper introduces two key contributions: a new, powerful class of forest-based models fit for such tasks, and a simple training algorithm with strong convergence guarantees in a boosting model that parallels the original weak/strong supervised learning setting. This algorithm can be implemented with a few tweaks to the most popular induction scheme for decision tree induction.
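For context, the induction scheme referenced above is greedy top-down recursive splitting. Below is a minimal, generic sketch of that standard supervised scheme (not the paper's generative variant or its specific tweaks), using Gini impurity as the split criterion; all names and the toy data are illustrative.

```python
from collections import Counter

def induce_tree(rows, labels, depth=0, max_depth=3, min_leaf=2):
    """Generic greedy top-down tree induction: choose the split that most
    reduces impurity, then recurse on each side."""
    if depth >= max_depth or len(set(labels)) <= 1 or len(rows) < 2 * min_leaf:
        return {"leaf": Counter(labels).most_common(1)[0][0]}

    def gini(ys):
        n = len(ys)
        return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values()) if n else 0.0

    best = None
    for feat in range(len(rows[0])):
        for threshold in sorted({r[feat] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[feat] <= threshold]
            right = [i for i in range(len(rows)) if i not in set(left)]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, feat, threshold, left, right)

    if best is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, feat, threshold, left, right = best
    return {"feature": feat, "threshold": threshold,
            "left": induce_tree([rows[i] for i in left], [labels[i] for i in left],
                                depth + 1, max_depth, min_leaf),
            "right": induce_tree([rows[i] for i in right], [labels[i] for i in right],
                                 depth + 1, max_depth, min_leaf)}

# Tiny illustrative run on toy tabular data
rows = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (4.0, 1.0), (5.0, 1.0), (6.0, 1.0)]
labels = [0, 0, 0, 1, 1, 1]
print(induce_tree(rows, labels))
```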
Congress Passed a Sweeping Free-Speech Crackdown--and No One's Talking About It
Had you scanned any of the latest headlines around the TAKE IT DOWN Act, legislation that President Donald Trump signed into law Monday, you would have come away with a deeply mistaken impression of the bill and its true purpose. The surface-level pitch is that this is a necessary law for addressing nonconsensual intimate images--known more widely as revenge porn. Obfuscating its intent with a classic congressional acronym (Tools to Address Known Exploitation by Immobilizing Technological Deepfakes on Websites and Networks), the TAKE IT DOWN Act purports to help scrub the internet of exploitative, nonconsensual sexual media, whether real or digitally mocked up, at a time when artificial intelligence tools and automated image generators have supercharged its spread. Enforcement is delegated to the Federal Trade Commission, which will give online communities that specialize primarily in user-generated content (e.g., social media, message boards) a heads-up and a 48-hour takedown deadline whenever an appropriate example is reported.
Leak reveals what Sam Altman and Jony Ive are cooking up: 100 million AI companion devices
OpenAI and Jony Ive's vision for their AI device is a screenless companion that knows everything about you. Details leaked to the Wall Street Journal give us a clearer picture of OpenAI's acquisition of io, cofounded by Ive, the iconic iPhone designer. The ChatGPT maker reportedly plans to ship 100 million AI devices designed to fit into users' everyday lives. "The product will be capable of being fully aware of a user's surroundings and life, will be unobtrusive, able to rest in one's pocket or on one's desk," according to a recording of an OpenAI staff meeting reviewed by the Journal. The device "will be a third core device a person would put on a desk after a MacBook Pro and an iPhone," per the meeting, which occurred the same day (Wednesday) that OpenAI announced its acquisition of Ive's company.
Politico's Newsroom Is Starting a Legal Battle With Management Over AI
Politico became one of the first newsrooms last year to win a union contract that included rules on how the media outlet can deploy artificial intelligence. The PEN Guild, which represents Politico and its sister publication, environment and energy site E&E News, is now gearing up for another first. The union's members allege that the AI provisions in their contract have been violated, and they're preparing for a groundbreaking legal dispute with management. The outcome could set a precedent for how much input journalists ultimately have over how AI is used in their newsrooms. Last year, Politico began publishing AI-generated live news summaries during big political events like the Democratic National Convention and the US vice presidential debates.
Who's to Blame When AI Agents Screw Up?
Over the past year, veteran software engineer Jay Prakash Thakur has spent his nights and weekends prototyping AI agents that could, in the near future, order meals and engineer mobile apps almost entirely on their own. His agents, while surprisingly capable, have also exposed new legal questions that await companies trying to capitalize on Silicon Valley's hottest new technology. Agents are AI programs that can act mostly independently, allowing companies to automate tasks such as answering customer questions or paying invoices. While ChatGPT and similar chatbots can draft emails or analyze bills upon request, Microsoft and other tech giants expect that agents will tackle more complex functions--and most importantly, do it with little human oversight. The tech industry's most ambitious plans involve multi-agent systems, with dozens of agents someday teaming up to replace entire workforces.
Interview with Gillian Hadfield: Normative infrastructure for AI alignment
During the 33rd International Joint Conference on Artificial Intelligence (IJCAI), held in Jeju, I had the opportunity to meet with one of the keynote speakers, Gillian Hadfield. We spoke about her interdisciplinary research, her career trajectory, her path into AI alignment, law, and her general thoughts on AI systems.
Transcript:
Note: the transcript has been lightly edited for clarity. This is an interview with Professor Gillian Hadfield, who was a keynote speaker at IJCAI 2024. She gave a very insightful talk about normative infrastructures and how they can guide our search for AI alignment.
Kumar Kshitij Patel (KKP): Could you talk a bit about your background and career trajectory? I want our readers to understand how much interdisciplinary work you've done over the years.
Gillian Hadfield (GH): I did a PhD in economics and a law degree, a JD, at Stanford, originally motivated by wanting to think about the big questions about the world. I read John Rawls' A Theory of Justice when I was an undergraduate, and those are the big questions: how do we organize the world and build just institutions? But I was very interested in using more formal methods and social scientific approaches, and that's why I decided to do that joint degree. This was in the 1980s, in the early days of starting to use a lot of game theory. I studied information economics as a student of Ken Arrow and Paul Milgrom in the economics department at Stanford. I did work on contract theory and bargaining theory, but I was still very interested in going to law school, not to practice law, but to learn about legal institutions and how they work. Early in my career I was part of the emerging field of law and economics, which, of course, was interdisciplinary, using economics to think about law and legal institutions.