Maharaj, Tegan
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Anwar, Usman, Saparov, Abulhair, Rando, Javier, Paleka, Daniel, Turpin, Miles, Hase, Peter, Lubana, Ekdeep Singh, Jenner, Erik, Casper, Stephen, Sourbut, Oliver, Edelman, Benjamin L., Zhang, Zhaowei, Günther, Mario, Korinek, Anton, Hernandez-Orallo, Jose, Hammond, Lewis, Bigelow, Eric, Pan, Alexander, Langosco, Lauro, Korbak, Tomasz, Zhang, Heidi, Zhong, Ruiqi, hÉigeartaigh, Seán Ó, Recchia, Gabriel, Corsi, Giulio, Chan, Alan, Anderljung, Markus, Edwards, Lilian, Bengio, Yoshua, Chen, Danqi, Albanie, Samuel, Maharaj, Tegan, Foerster, Jakob, Tramer, Florian, He, He, Kasirzadeh, Atoosa, Choi, Yejin, Krueger, David
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose $200+$ concrete research questions.
Beyond Predictive Algorithms in Child Welfare
Moon, Erina Seh-Young, Saxena, Devansh, Maharaj, Tegan, Guha, Shion
Caseworkers in the child welfare (CW) sector use predictive decision-making algorithms built on risk assessment (RA) data to guide and support CW decisions. Researchers have highlighted that RAs can contain biased signals which flatten CW case complexities and that the algorithms may benefit from incorporating contextually rich case narratives, i.e., casenotes written by caseworkers. To investigate this hypothesized improvement, we quantitatively deconstructed two commonly used RAs from a United States CW agency. We trained classifier models to compare the predictive validity of RAs with and without casenote narratives and applied computational text analysis on casenotes to highlight topics uncovered in the casenotes. Our study finds that common risk metrics used to assess families and build CWS predictive risk models (PRMs) are unable to predict discharge outcomes for children who are not reunified with their birth parent(s). We also find that although casenotes cannot predict discharge outcomes, they contain contextual case signals. Given the lack of predictive validity of RA scores and casenotes, we propose moving beyond quantitative risk assessments for public sector algorithms and towards using contextual sources of information such as narratives to study public sociotechnical systems.
Managing AI Risks in an Era of Rapid Progress
Bengio, Yoshua, Hinton, Geoffrey, Yao, Andrew, Song, Dawn, Abbeel, Pieter, Harari, Yuval Noah, Zhang, Ya-Qin, Xue, Lan, Shalev-Shwartz, Shai, Hadfield, Gillian, Clune, Jeff, Maharaj, Tegan, Hutter, Frank, Baydin, Atılım Güneş, McIlraith, Sheila, Gao, Qiqi, Acharya, Ashwin, Krueger, David, Dragan, Anca, Torr, Philip, Russell, Stuart, Kahneman, Daniel, Brauner, Jan, Mindermann, Sören
In this short consensus paper, we outline risks from upcoming, advanced AI systems. We examine large-scale social harms and malicious uses, as well as an irreversible loss of human control over autonomous AI systems. In light of rapid and continuing AI progress, we propose urgent priorities for AI R&D and governance.
Filling gaps in trustworthy development of AI
Avin, Shahar, Belfield, Haydn, Brundage, Miles, Krueger, Gretchen, Wang, Jasmine, Weller, Adrian, Anderljung, Markus, Krawczuk, Igor, Krueger, David, Lebensold, Jonathan, Maharaj, Tegan, Zilberman, Noa
The range of application of artificial intelligence (AI) is vast, as is the potential for harm. Growing awareness of potential risks from AI systems has spurred action to address those risks, while eroding confidence in AI systems and the organizations that develop them. A 2019 study found over 80 organizations that published and adopted "AI ethics principles", and more have joined since. But the principles often leave a gap between the "what" and the "how" of trustworthy AI development. Such gaps have enabled questionable or ethically dubious behavior, which casts doubts on the trustworthiness of specific organizations, and the field more broadly. There is thus an urgent need for concrete methods that both enable AI developers to prevent harm and allow them to demonstrate their trustworthiness through verifiable behavior. Below, we explore mechanisms (drawn from arXiv:2004.07213) for creating an ecosystem where AI developers can earn trust - if they are trustworthy. Better assessment of developer trustworthiness could inform user choice, employee actions, investment decisions, legal recourse, and emerging governance regimes.
Predicting Infectiousness for Proactive Contact Tracing
Bengio, Yoshua, Gupta, Prateek, Maharaj, Tegan, Rahaman, Nasim, Weiss, Martin, Deleu, Tristan, Muller, Eilif, Qu, Meng, Schmidt, Victor, St-Charles, Pierre-Luc, Alsdurf, Hannah, Bilanuik, Olexa, Buckeridge, David, Caron, Gáetan Marceau, Carrier, Pierre-Luc, Ghosn, Joumana, Ortiz-Gagne, Satya, Pal, Chris, Rish, Irina, Schölkopf, Bernhard, Sharma, Abhinav, Tang, Jian, Williams, Andrew
The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdowns for emergency containment. Various digital contact tracing (DCT) methods have been proposed, each making tradeoffs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or preexisting medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). Similarly to other works, we find that compared to no tracing, all DCT methods tested are able to reduce spread of the disease and thus save lives, even at low adoption rates, strongly supporting a role for DCT methods in managing the pandemic. Further, we find a deep-learning-based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe reopening and second-wave prevention.
Until pharmaceutical interventions such as a vaccine become available, control of the COVID-19 pandemic relies on nonpharmaceutical interventions such as lockdown and social distancing. While these have often been successful in limiting spread of the disease in the short term, these restrictive measures have important negative social, mental health, and economic impacts. Digital contact tracing (DCT), a technique to track the spread of the virus among individuals in a population using smartphones, is an attractive potential solution to help reduce growth in the number of cases and thereby allow more economic and social activities to resume while keeping the number of cases low.
Hidden Incentives for Auto-Induced Distributional Shift
Krueger, David, Maharaj, Tegan, Leike, Jan
Decisions made by machine learning systems have increasing influence on the world, yet it is common for machine learning algorithms to assume that no such influence exists. An example is the use of the i.i.d. assumption in content recommendation. In fact, the (choice of) content displayed can change users' perceptions and preferences, or even drive them away, causing a shift in the distribution of users. We introduce the term auto-induced distributional shift (ADS) to describe the phenomenon of an algorithm causing a change in the distribution of its own inputs. Our goal is to ensure that machine learning systems do not leverage ADS to increase performance when doing so could be undesirable. We demonstrate that changes to the learning algorithm, such as the introduction of meta-learning, can cause hidden incentives for auto-induced distributional shift (HI-ADS) to be revealed. To address this issue, we introduce "unit tests" and a mitigation strategy for HI-ADS, as well as a toy environment for modelling real-world issues with HI-ADS in content recommendation, where we demonstrate that strong meta-learners achieve gains in performance via ADS. We show meta-learning and Q-learning both sometimes fail unit tests, but pass when using our mitigation strategy.
COVI White Paper
Alsdurf, Hannah, Belliveau, Edmond, Bengio, Yoshua, Deleu, Tristan, Gupta, Prateek, Ippolito, Daphne, Janda, Richard, Jarvie, Max, Kolody, Tyler, Krastev, Sekoul, Maharaj, Tegan, Obryk, Robert, Pilat, Dan, Pisano, Valerie, Prud'homme, Benjamin, Qu, Meng, Rahaman, Nasim, Rish, Irina, Rousseau, Jean-Francois, Sharma, Abhinav, Struck, Brooke, Tang, Jian, Weiss, Martin, Yu, Yun William
The SARS-CoV-2 (Covid-19) pandemic has caused significant strain on public health institutions around the world. Contact tracing is an essential tool to change the course of the Covid-19 pandemic. Manual contact tracing of Covid-19 cases has significant challenges that limit the ability of public health authorities to minimize community infections. Personalized peer-to-peer contact tracing through the use of mobile apps has the potential to shift the paradigm. Some countries have deployed centralized tracking systems, but more privacy-protecting decentralized systems offer much of the same benefit without concentrating data in the hands of a state authority or for-profit corporations. Machine learning methods can circumvent some of the limitations of standard digital tracing by incorporating many clues and their uncertainty into a more graded and precise estimation of infection risk. The estimated risk can provide early risk awareness, personalized recommendations and relevant information to the user. Finally, non-identifying risk data can inform epidemiological models trained jointly with the machine learning predictor. These models can provide statistical evidence for the importance of factors involved in disease transmission. They can also be used to monitor, evaluate and optimize health policy and (de)confinement scenarios according to medical and economic productivity indicators. However, such a strategy based on mobile apps and machine learning should proactively mitigate potential ethical and privacy risks, which could have substantial impacts on society (not only impacts on health but also impacts such as stigmatization and abuse of personal data). Here, we present an overview of the rationale, design, ethical considerations and privacy strategy of "COVI," a Covid-19 public peer-to-peer contact tracing and risk awareness mobile application developed in Canada.
Tackling Climate Change with Machine Learning
Rolnick, David, Donti, Priya L., Kaack, Lynn H., Kochanski, Kelly, Lacoste, Alexandre, Sankaran, Kris, Ross, Andrew Slavin, Milojevic-Dupont, Nikola, Jaques, Natasha, Waldman-Brown, Anna, Luccioni, Alexandra, Maharaj, Tegan, Sherwin, Evan D., Mukkavilli, S. Karthik, Kording, Konrad P., Gomes, Carla, Ng, Andrew Y., Hassabis, Demis, Platt, John C., Creutzig, Felix, Chayes, Jennifer, Bengio, Yoshua
Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change.
ExtremeWeather: A large-scale climate dataset for semi-supervised detection, localization, and understanding of extreme weather events
Racah, Evan, Beckham, Christopher, Maharaj, Tegan, Kahou, Samira Ebrahimi, Prabhat, Mr., Pal, Chris
The detection and identification of extreme weather events in large-scale climate simulations is an important problem for risk management, informing governmental policy decisions and advancing our basic understanding of the climate system. Recent work has shown that fully supervised convolutional neural networks (CNNs) can yield acceptable accuracy for classifying well-known types of extreme weather events when large amounts of labeled data are available. However, many different types of spatially localized climate patterns are of interest, including hurricanes, extra-tropical cyclones, weather fronts, and blocking events, among others. Existing labeled data for these patterns can be incomplete in various ways, such as covering only certain years or geographic areas and having false negatives. This type of climate data therefore poses a number of interesting machine learning challenges. We present a multichannel spatiotemporal CNN architecture for semi-supervised bounding box prediction and exploratory data analysis. We demonstrate that our approach is able to leverage temporal information and unlabeled data to improve the localization of extreme weather events. Further, we explore the representations learned by our model in order to better understand this important data. We present a dataset, ExtremeWeather, to encourage machine learning research in this area and to help facilitate further work in understanding and mitigating the effects of climate change. The dataset is available at extremeweatherdataset.github.io and the code is available at https://github.com/eracah/hur-detect.