screener

Signs of dyslexia and reading troubles can be spotted in kindergarten -- or even preschool

Los Angeles Times

Vanessa Silver, who tutors young children with dyslexia, works with Liina Yerro, 9, in Granada Hills. California is to begin universal screening of kindergarten through second-grade students for reading difficulties, including dyslexia.


RTBAS: Defending LLM Agents Against Prompt Injection and Privacy Leakage

Zhong, Peter Yong, Chen, Siyuan, Wang, Ruiqi, McCall, McKenna, Titzer, Ben L., Miller, Heather, Gibbons, Phillip B.

arXiv.org Artificial Intelligence

Tool-Based Agent Systems (TBAS) allow Language Models (LMs) to use external tools for tasks beyond their standalone capabilities, such as searching websites, booking flights, or making financial transactions. However, these tools greatly increase the risks of prompt injection attacks, where malicious content hijacks the LM agent to leak confidential data or trigger harmful actions. Existing defenses (OpenAI GPTs) require user confirmation before every tool call, placing onerous burdens on users. We introduce Robust TBAS (RTBAS), which automatically detects and executes tool calls that preserve integrity and confidentiality, requiring user confirmation only when these safeguards cannot be ensured. RTBAS adapts Information Flow Control to the unique challenges presented by TBAS. We present two novel dependency screeners, using LM-as-a-judge and attention-based saliency, to overcome these challenges. Experimental results on the AgentDojo Prompt Injection benchmark show RTBAS prevents all targeted attacks with only a 2% loss of task utility when under attack, and further tests confirm its ability to obtain near-oracle performance on detecting both subtle and direct privacy leaks.
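
To make the Information Flow Control idea concrete, here is a minimal sketch of label propagation over tool calls. The label names, policy tables, and screen_tool_call function are hypothetical illustrations rather than the RTBAS implementation, and a real dependency screener (LM-as-a-judge or attention-based saliency) would decide which prior outputs an argument actually depends on instead of tainting everything in context.

    from enum import IntEnum
    from dataclasses import dataclass

    class Integrity(IntEnum):
        UNTRUSTED = 0   # e.g., content fetched from an external website
        TRUSTED = 1     # e.g., text typed by the user

    class Secrecy(IntEnum):
        PUBLIC = 0
        CONFIDENTIAL = 1

    @dataclass
    class Labeled:
        value: str
        integrity: Integrity
        secrecy: Secrecy

    # Hypothetical per-tool policy: minimum integrity required to run, and
    # whether the tool can exfiltrate data outside the session.
    REQUIRED_INTEGRITY = {"send_money": Integrity.TRUSTED, "search_web": Integrity.UNTRUSTED}
    EXFILTRATES = {"send_money": True, "search_web": True}

    def screen_tool_call(tool, args, confirm):
        """Execute automatically only when integrity and confidentiality hold."""
        integrity = min((a.integrity for a in args), default=Integrity.TRUSTED)
        secrecy = max((a.secrecy for a in args), default=Secrecy.PUBLIC)
        integrity_ok = integrity >= REQUIRED_INTEGRITY.get(tool, Integrity.TRUSTED)
        secrecy_ok = not (EXFILTRATES.get(tool, True) and secrecy == Secrecy.CONFIDENTIAL)
        if integrity_ok and secrecy_ok:
            return "execute"
        return confirm(tool, args)   # fall back to asking the user

    # A tool call whose arguments depend on untrusted web content is not run silently:
    tainted = Labeled("attacker-supplied IBAN", Integrity.UNTRUSTED, Secrecy.PUBLIC)
    print(screen_tool_call("send_money", [tainted], lambda t, a: "ask user"))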


Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data

Xie, Xinhong, Li, Tao, Zhu, Quanyan

arXiv.org Artificial Intelligence

Text detoxification, a variant of style transfer, finds useful applications in online social media. This work presents a fine-tuning method that uses only non-parallel data to turn a large language model (LLM) into a detoxification rewriter. We model the fine-tuning process as a Stackelberg game between an LLM (leader) and a toxicity screener (follower), which is a binary style classifier (toxic or non-toxic). The LLM aims to align its preferences with the screener and generate paraphrases that pass the screening. The primary challenge of non-parallel data fine-tuning is incomplete preference. In the case of unsuccessful paraphrases, the classifier cannot establish a preference between the input and the paraphrase, as they belong to the same toxic style. Hence, preference-alignment fine-tuning methods such as direct preference optimization (DPO) no longer apply. To address this challenge, we propose Stackelberg response optimization (SRO), adapted from DPO, to enable the LLM to learn from the follower's response. The gist is that SRO decreases the likelihood of generating the paraphrase if it fails the follower's screening, while performing DPO on the pair of the toxic input and its paraphrase when the latter passes the screening. Experiments indicate that the SRO-fine-tuned LLM achieves performance comparable to state-of-the-art models in style accuracy, content similarity, and fluency. The overall detoxification performance surpasses other computational methods and matches the human reference. Additional empirical evidence suggests that SRO is sensitive to the screener's feedback, and a slight perturbation leads to a significant performance drop. We release the code and LLM models at \url{https://github.com/XXXinhong/Detoxification_LLM}.
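
A minimal sketch of the objective as the abstract describes it: the sigmoid-margin form follows standard DPO, while the unlikelihood weight alpha is a hypothetical knob, so the paper's exact formulation may differ.

    import math

    def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
        """Standard DPO: -log sigmoid(beta * (policy margin - reference margin))."""
        margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    def sro_loss(logp_input, logp_para, ref_logp_input, ref_logp_para,
                 passes_screener, beta=0.1, alpha=1.0):
        """Stackelberg response optimization, per the abstract: DPO treating the
        paraphrase as preferred over the toxic input when it passes the screener,
        otherwise an unlikelihood term that pushes the paraphrase down."""
        if passes_screener:
            return dpo_loss(logp_para, logp_input, ref_logp_para, ref_logp_input, beta)
        return alpha * logp_para  # minimizing this decreases the paraphrase likelihood

    # Example: a paraphrase that failed screening is penalized directly.
    print(sro_loss(-42.0, -37.5, -41.0, -38.0, passes_screener=False))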


SCREENER: A general framework for task-specific experiment design in quantitative MRI

Zheng, Tianshu, Wang, Zican, Bray, Timothy, Alexander, Daniel C., Wu, Dan, Zhang, Hui

arXiv.org Artificial Intelligence

Quantitative magnetic resonance imaging (qMRI) is increasingly investigated for a variety of clinical tasks, from diagnosis through staging to treatment monitoring. However, experiment design in qMRI, the identification of optimal acquisition protocols, has focused on obtaining the most precise parameter estimates, with no regard for the specific requirements of downstream tasks. Here we propose SCREENER, a general framework for task-specific experiment design in quantitative MRI. SCREENER incorporates a task-specific objective and seeks the optimal protocol with a deep reinforcement learning (DRL)-based optimization strategy. To illustrate the framework, we consider the task of classifying the inflammation status of bone marrow from diffusion MRI data with intravoxel incoherent motion (IVIM) modelling. Results demonstrate that SCREENER outperforms previous ad hoc and optimized protocols under clinical signal-to-noise ratio (SNR) conditions, achieving significant improvements both in binary classification tasks, e.g. from 67% to 89%, and in a multi-class classification task, from 46% to 59%. Additionally, we show this improvement is robust across SNRs. Lastly, we demonstrate the advantage of the DRL-based optimization strategy, which enables zero-shot discovery of near-optimal protocols for a range of SNRs not used in training. In conclusion, SCREENER has the potential to enable wider uptake of qMRI in the clinic.
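
As a toy illustration of a task-specific objective, the sketch below scores candidate diffusion protocols (sets of b-values) by how well noisy IVIM signals separate two hypothetical inflammation classes, then searches for a good protocol. Random search stands in for the paper's DRL optimizer, and all parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def ivim_signal(b, f, d_star, d):
        """IVIM model: S(b)/S0 = f*exp(-b*D*) + (1-f)*exp(-b*D)."""
        return f * np.exp(-b * d_star) + (1.0 - f) * np.exp(-b * d)

    def task_score(b_values, snr=20.0, n=200):
        """Separability of two hypothetical inflammation classes from noisy
        signals acquired at the candidate b-values (a stand-in task objective)."""
        classes = [(0.10, 0.02, 0.0010), (0.25, 0.02, 0.0012)]  # (f, D*, D)
        samples = []
        for f, d_star, d in classes:
            s = ivim_signal(b_values, f, d_star, d)
            samples.append(s + rng.normal(0.0, 1.0 / snr, size=(n, b_values.size)))
        mu0, mu1 = samples[0].mean(axis=0), samples[1].mean(axis=0)
        spread = 0.5 * (samples[0].std() + samples[1].std())
        return float(np.linalg.norm(mu0 - mu1) / spread)

    # Random search over 4-measurement protocols drawn from a b-value grid (s/mm^2).
    grid = np.arange(0, 1001, 50, dtype=float)
    best, best_score = None, -np.inf
    for _ in range(500):
        protocol = np.sort(rng.choice(grid, size=4, replace=False))
        score = task_score(protocol)
        if score > best_score:
            best, best_score = protocol, score
    print("best protocol (s/mm^2):", best, "score:", round(best_score, 2))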


The Initial Screening Order Problem

Alvarez, Jose M., Ruggieri, Salvatore

arXiv.org Artificial Intelligence

In this paper we present the initial screening order problem, a crucial step within candidate screening. It involves a human-like screener whose objective is to find the first k suitable candidates, rather than the best k, in a candidate pool arranged under an initial screening order. The initial screening order represents the way in which the human-like screener arranges the candidate pool prior to screening, and its choice has considerable effects on the selected set of k candidates. We prove that under an unbalanced candidate pool (e.g., one with more male than female candidates), the human-like screener can suffer from uneven effort that hinders its decision-making over the protected, under-represented group relative to the non-protected, over-represented group. We prove further fairness results for the human-like screener. This research is based on a collaboration with a large company seeking to better understand its hiring process with a view to automation. Our main contribution is the formalization of the initial screening order problem which, we argue, opens the path for future extensions of current work on ranking algorithms, fairness, and automation in screening procedures.
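
To illustrate why the initial screening order matters, here is a small simulation contrasting a random order with one that places the over-represented group first; the pool sizes, suitability rule, and group labels are hypothetical.

    import random

    def first_k_suitable(ordered_pool, k, is_suitable):
        """Screen candidates in order, stopping at the first k suitable ones."""
        selected = []
        for candidate in ordered_pool:
            if is_suitable(candidate):
                selected.append(candidate)
                if len(selected) == k:
                    break
        return selected

    random.seed(0)
    # Unbalanced pool: 80 over-represented ('A') and 20 under-represented ('B')
    # candidates, with the same suitability distribution in both groups.
    pool = [("A", random.random()) for _ in range(80)] + \
           [("B", random.random()) for _ in range(20)]
    is_suitable = lambda c: c[1] > 0.5

    orders = {"random": random.sample(pool, len(pool)), "majority-first": pool}
    for name, order in orders.items():
        picked = first_k_suitable(order, 10, is_suitable)
        share = sum(1 for g, _ in picked if g == "B") / len(picked)
        print(f"{name:14s} order -> {share:.0%} of the selected k are from group B")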


Sibyl: Explaining Machine Learning Models for High-Stakes Decision Making

#artificialintelligence

As machine learning is applied to an increasingly large number of domains, the need for an effective way to explain its predictions grows apace. In the domain of child welfare screening, machine learning offers a promising method of consolidating the large amount of data that screeners must look at, potentially improving outcomes for children reported to child welfare departments. Interviews and case studies suggest that adding an explanation alongside the model prediction may result in better outcomes, but it is not obvious what kind of explanation would be most useful in this context. Through a series of interviews and user studies, we developed Sibyl, a machine learning explanation dashboard specifically designed to aid child welfare screeners' decision making. When testing Sibyl, we evaluated four different explanation types and, based on this evaluation, concluded that a local feature contribution approach was most useful to screeners.
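
The summary does not specify the attribution method, so below is a generic sketch of one simple local feature contribution scheme (resetting each feature to a baseline and measuring the change in predicted risk) on synthetic data; Sibyl's actual dashboard and method may differ.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic stand-in for screening data (three hypothetical features).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 3))
    y = (X @ np.array([1.5, -1.0, 0.5]) + rng.normal(size=500) > 0).astype(int)
    model = LogisticRegression().fit(X, y)

    def local_contributions(model, x, baseline):
        """Per-feature contribution for one case: the drop in predicted risk
        when that feature is reset to its baseline value."""
        risk = model.predict_proba(x[None])[0, 1]
        contribs = {}
        for j in range(x.size):
            x_reset = x.copy()
            x_reset[j] = baseline[j]
            contribs[f"feature_{j}"] = risk - model.predict_proba(x_reset[None])[0, 1]
        return risk, contribs

    risk, contribs = local_contributions(model, X[0], X.mean(axis=0))
    print(f"predicted risk: {risk:.2f}")
    print({k: round(v, 3) for k, v in contribs.items()})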


Algorithmic Hiring Needs a Human Face

Communications of the ACM

The way we apply for jobs has changed radically over the last 20 years, thanks to the arrival of sprawling online job-posting boards like LinkedIn, Indeed, and ZipRecruiter, and the use by hiring organizations of artificial intelligence (AI) algorithms to screen the tsunami of résumés that now gush forth from such sites into human resources (HR) departments. With video-based online job interviews now harnessing AI to analyze candidates' use of language and their performance in gamified aptitude tests, recruitment is becoming a decidedly algorithmic affair. Yet all is not well in HR's brave new world. After quizzing 8,000 job applicants and 2,250 hiring managers in the U.S., Germany, and Great Britain, researchers at Harvard Business School, working with the consultancy Accenture, discovered that many tens of millions of people are being barred from consideration for employment by résumé screening algorithms that throw out applicants who do not meet an unfeasibly large number of requirements, many of which are utterly irrelevant to the advertised job. For instance, says Joe Fuller, the Harvard professor of management practice who led the algorithmic hiring research, nurses and graphic designers who need merely to use computers have been barred from progressing to job interviews for not having experience, or degrees, in computer programming.


Making machine learning more useful to high-stakes decision makers

#artificialintelligence

The U.S. Centers for Disease Control and Prevention estimates that one in seven children in the United States experienced abuse or neglect in the past year. Child protective services agencies around the nation receive a high number of reports each year (about 4.4 million in 2019) of alleged neglect or abuse. With so many cases, some agencies are implementing machine learning models to help child welfare specialists screen cases and determine which to recommend for further investigation. But these models don't do any good if the humans they are intended to help don't understand or trust their outputs. Researchers at MIT and elsewhere launched a research project to identify and tackle machine learning usability challenges in child welfare screening.

