identifying
Identifying, Evaluating, and Mitigating Risks of AI Thought Partnerships
Oktar, Kerem, Collins, Katherine M., Hernandez-Orallo, Jose, Coyle, Diane, Cave, Stephen, Weller, Adrian, Sucholutsky, Ilia
Artificial Intelligence (AI) systems have historically been used as tools that execute narrowly defined tasks. Yet recent advances in AI have unlocked possibilities for a new class of models that genuinely collaborate with humans in complex reasoning, from conceptualizing problems to brainstorming solutions. Such AI thought partners enable novel forms of collaboration and extended cognition, yet they also pose major risks-including and beyond risks of typical AI tools and agents. In this commentary, we systematically identify risks of AI thought partners through a novel framework that identifies risks at multiple levels of analysis, including Real-time, Individual, and Societal risks arising from collaborative cognition (RISc). We leverage this framework to propose concrete metrics for risk evaluation, and finally suggest specific mitigation strategies for developers and policymakers. As AI thought partners continue to proliferate, these strategies can help prevent major harms and ensure that humans actively benefit from productive thought partnerships.
- North America > United States (0.15)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Government (1.00)
- Banking & Finance > Economy (0.46)
Identifying the Best Transition Law
Ahmadipour, Mehrasa, Crepon, élise, Garivier, Aurélien
Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance { reached by strategies including notably LUCB without and with use of this knowledge. } In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
RSV Can Be a Killer. New Tools Are Identifying the Most At-Risk Kids
After 25 years as a pediatric infectious diseases specialist, Asunción Mejías is too familiar with the deadly unpredictability of respiratory syncytial virus (RSV), an infection that hospitalizes up to 80,000 children under the age of 5 every year in the US. "It's a disease which can change very quickly," says Mejías, who works at St. Jude Children's Research Hospital in Memphis, Tennessee. "I've always told my colleagues that for every two children that are admitted, one can go to the ICU in the next three hours and the other one may go home the next day. RSV infections are very common, to the point that nearly every child will have one before they turn 2 years old. Most children experience symptoms similar to a cold, like coughing and sneezing, but some can develop severe lung disease: RSV is responsible for more than 100,000 infant deaths globally every year, nearly half of which are in babies under 6 months of age.
- South America > Paraguay > Asunción > Asunción (0.26)
- North America > United States > Tennessee > Shelby County > Memphis (0.26)
Raising the Bar(ometer): Identifying a User's Stair and Lift Usage Through Wearable Sensor Data Analysis
Karande, Hrishikesh Balkrishna, Shivalingappa, Ravikiran Arasur Thippeswamy, Yaici, Abdelhafid Nassim, Haghbin, Iman, Bavadiya, Niravkumar, Burchard, Robin, Van Laerhoven, Kristof
Many users are confronted multiple times daily with the choice of whether to take the stairs or the elevator. Whereas taking the stairs could be beneficial for cardiovascular health and wellness, taking the elevator might be more convenient but it also consumes energy. By precisely tracking and boosting users' stairs and elevator usage through their wearable, users might gain health insights and motivation, encouraging a healthy lifestyle and lowering the risk of sedentary-related health problems. This research describes a new exploratory dataset, to examine the patterns and behaviors related to using stairs and lifts. We collected data from 20 participants while climbing and descending stairs and taking a lift in a variety of scenarios. The aim is to provide insights and demonstrate the practicality of using wearable sensor data for such a scenario. Our collected dataset was used to train and test a Random Forest machine learning model, and the results show that our method is highly accurate at classifying stair and lift operations with an accuracy of 87.61% and a multi-class weighted F1-score of 87.56% over 8-second time windows. Furthermore, we investigate the effect of various types of sensors and data attributes on the model's performance. Our findings show that combining inertial and pressure sensors yields a viable solution for real-time activity detection.
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Siegen (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Identifying the Hierarchical Emotional Areas in the Human Brain Through Information Fusion
Huang, Zhongyu, Du, Changde, Li, Chaozhuo, Fu, Kaicheng, He, Huiguang
The brain basis of emotion has consistently received widespread attention, attracting a large number of studies to explore this cutting-edge topic. However, the methods employed in these studies typically only model the pairwise relationship between two brain regions, while neglecting the interactions and information fusion among multiple brain regions$\unicode{x2014}$one of the key ideas of the psychological constructionist hypothesis. To overcome the limitations of traditional methods, this study provides an in-depth theoretical analysis of how to maximize interactions and information fusion among brain regions. Building on the results of this analysis, we propose to identify the hierarchical emotional areas in the human brain through multi-source information fusion and graph machine learning methods. Comprehensive experiments reveal that the identified hierarchical emotional areas, from lower to higher levels, primarily facilitate the fundamental process of emotion perception, the construction of basic psychological operations, and the coordination and integration of these operations. Overall, our findings provide unique insights into the brain mechanisms underlying specific emotions based on the psychological constructionist hypothesis.
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
Identifying the Source of Generation for Large Language Models
Large language models (LLMs) memorize text from several sources of documents. In pretraining, LLM trains to maximize the likelihood of text but neither receives the source of the text nor memorizes the source. Accordingly, LLM can not provide document information on the generated content, and users do not obtain any hint of reliability, which is crucial for factuality or privacy infringement. This work introduces token-level source identification in the decoding step, which maps the token representation to the reference document. We propose a bi-gram source identifier, a multi-layer perceptron with two successive token representations as input for better generalization. We conduct extensive experiments on Wikipedia and PG19 datasets with several LLMs, layer locations, and identifier sizes. The overall results show a possibility of token-level source identifiers for tracing the document, a crucial problem for the safe use of LLMs.
Transfer Learning of Semantic Segmentation Methods for Identifying Buried Archaeological Structures on LiDAR Data
Sech, Gregory, Soleni, Paolo, der Vaart, Wouter B. Verschoof-van, Kokalj, Žiga, Traviglia, Arianna, Fiorucci, Marco
When applying deep learning to remote sensing data in archaeological research, a notable obstacle is the limited availability of suitable datasets for training models. The application of transfer learning is frequently employed to mitigate this drawback. However, there is still a need to explore its effectiveness when applied across different archaeological datasets. This paper compares the performance of various transfer learning configurations using two semantic segmentation deep neural networks on two LiDAR datasets. The experimental results indicate that transfer learning-based approaches in archaeology can lead to performance improvements, although a systematic enhancement has not yet been observed. We provide specific insights about the validity of such techniques that can serve as a baseline for future works.
Identifying the Risks of LM Agents with an LM-Emulated Sandbox
Ruan, Yangjun, Dong, Honghua, Wang, Andrew, Pitis, Silviu, Zhou, Yongchao, Ba, Jimmy, Dubois, Yann, Maddison, Chris J., Hashimoto, Tatsunori
Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, manually setting up the environment for each test scenario, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios, without manual instantiation. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes tools and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment.
Developing A Fair Individualized Polysocial Risk Score (iPsRS) for Identifying Increased Social Risk of Hospitalizations in Patients with Type 2 Diabetes (T2D)
Huang, Yu, Guo, Jingchuan, Donahoo, William T, Fan, Zhengkang, Lu, Ying, Chen, Wei-Han, Tang, Huilin, Bilello, Lori, Shenkman, Elizabeth A, Bian, Jiang
Background: Racial and ethnic minority groups and individuals facing social disadvantages, which often stem from their social determinants of health (SDoH), bear a disproportionate burden of type 2 diabetes (T2D) and its complications. It is therefore crucial to implement effective social risk management strategies at the point of care. Objective: To develop an EHR-based machine learning (ML) analytical pipeline to identify the unmet social needs associated with hospitalization risk in patients with T2D. Methods: We identified 10,192 T2D patients from the EHR data (from 2012 to 2022) from the University of Florida Health Integrated Data Repository, including contextual SDoH (e.g., neighborhood deprivation) and individual-level SDoH (e.g., housing stability). We developed an electronic health records (EHR)-based machine learning (ML) analytic pipeline, namely individualized polysocial risk score (iPsRS), to identify high social risk associated with hospitalizations in T2D patients, along with explainable AI (XAI) techniques and fairness assessment and optimization. Results: Our iPsRS achieved a C statistic of 0.72 in predicting 1-year hospitalization after fairness optimization across racial-ethnic groups. The iPsRS showed excellent utility for capturing individuals at high hospitalization risk; the actual 1-year hospitalization rate in the top 5% of iPsRS was ~13 times as high as the bottom decile. Conclusion: Our ML pipeline iPsRS can fairly and accurately screen for patients who have increased social risk leading to hospitalization in T2D patients.
Whats New? Identifying the Unfolding of New Events in Narratives
Mousavi, Seyed Mahed, Tanaka, Shohei, Roccabruna, Gabriel, Yoshino, Koichiro, Nakamura, Satoshi, Riccardi, Giuseppe
Narratives include a rich source of events unfolding over time and context. Automatic understanding of these events provides a summarised comprehension of the narrative for further computation (such as reasoning). In this paper, we study the Information Status (IS) of the events and propose a novel challenging task: the automatic identification of new events in a narrative. We define an event as a triplet of subject, predicate, and object. The event is categorized as new with respect to the discourse context and whether it can be inferred through commonsense reasoning. We annotated a publicly available corpus of narratives with the new events at sentence level using human annotators. We present the annotation protocol and study the quality of the annotation and the difficulty of the task. We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.