Collaborating Authors

Margaret Mitchell


Getting Your Indices in a Row: Full-Text Search for LLM Training Data for Real World

Marinas, Ines Altemir, Kucherenko, Anastasiia, Sternfeld, Alexander, Kucharavy, Andrei

arXiv.org Artificial Intelligence

The performance of Large Language Models (LLMs) is determined by their training data. Despite the proliferation of open-weight LLMs, access to LLM training data has remained limited. Even for fully open LLMs, the scale of the data makes it all but inscrutable to the general scientific community, despite potentially containing critical data scraped from the internet. In this paper, we present the full-text indexing pipeline for the Apertus LLM training data. Leveraging Elasticsearch parallel indices and the Alps infrastructure, a state-of-the-art, highly energy-efficient arm64 supercluster, we were able to index 8.6T tokens out of 15.2T used to train the Apertus LLM family, creating both a critical LLM safety tool and effectively an offline, curated, open web search engine. Our contribution is threefold. First, we demonstrate that Elasticsearch can be successfully ported onto next-generation arm64-based infrastructure. Second, we demonstrate that full-text indexing at the scale of modern LLM training datasets and the entire open web is feasible and accessible. Finally, we demonstrate that such indices can be used to ensure previously inaccessible jailbreak-agnostic LLM safety. We hope that our findings will be useful to other teams attempting large-scale data indexing and facilitate the general transition towards greener computation.
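To make the parallel-indices idea concrete, here is a minimal, hedged sketch of bulk-indexing text shards into per-shard Elasticsearch indices with the elasticsearch Python client. The cluster URL, index naming scheme, mapping, and document fields are illustrative assumptions, not the authors' actual pipeline or configuration.

```python
# Hedged sketch: bulk-index text shards into per-shard Elasticsearch
# indices, echoing the parallel-indices approach described above.
# Cluster URL, index names, mapping, and fields are assumptions for
# illustration, not the Apertus pipeline's actual configuration.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

es = Elasticsearch("http://localhost:9200")

def index_shard(shard_id: int, documents: list[dict]) -> None:
    """Index one data shard into its own dedicated index."""
    index_name = f"training-data-{shard_id:04d}"  # hypothetical naming scheme
    if not es.indices.exists(index=index_name):
        es.indices.create(
            index=index_name,
            settings={"number_of_shards": 1, "number_of_replicas": 0},
            mappings={"properties": {"text": {"type": "text"}}},
        )
    # Build bulk actions lazily; parallel_bulk streams them to the
    # cluster in chunks across multiple threads.
    actions = (
        {"_index": index_name, "_source": {"text": doc["text"]}}
        for doc in documents
    )
    for ok, info in parallel_bulk(es, actions, chunk_size=1000):
        if not ok:
            print("indexing error:", info)
```

With one index per shard, a query such as es.search(index="training-data-*", query={"match_phrase": {"text": "suspect string"}}) fans out across all shard indices at once, which is what makes exact-match membership checks over a multi-trillion-token corpus practical.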


AI Is Spreading Old Stereotypes to New Languages and Cultures

WIRED

Margaret Mitchell is a pioneer when it comes to testing generative AI tools for bias. She founded the Ethical AI team at Google, alongside another well-known researcher, Timnit Gebru, before they were later both fired from the company. She now works as the AI ethics leader at Hugging Face, a software startup focused on open source tools. We spoke about a new dataset she helped create to test how AI models continue perpetuating stereotypes. Unlike most bias-mitigation efforts that prioritize English, this dataset is malleable, with human translations for testing a wider breadth of languages and cultures.


The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources

Longpre, Shayne, Biderman, Stella, Albalak, Alon, Schoelkopf, Hailey, McDuff, Daniel, Kapoor, Sayash, Klyman, Kevin, Lo, Kyle, Ilharco, Gabriel, San, Nay, Rauh, Maribeth, Skowron, Aviya, Vidgen, Bertie, Weidinger, Laura, Narayanan, Arvind, Sanh, Victor, Adelani, David, Liang, Percy, Bommasani, Rishi, Henderson, Peter, Luccioni, Sasha, Jernite, Yacine, Soldaini, Luca

arXiv.org Artificial Intelligence

Foundation model development attracts a rapidly expanding body of contributors, scientists, and applications. To help shape responsible development practices, we introduce the Foundation Model Development Cheatsheet: a growing collection of 250+ tools and resources spanning text, vision, and speech modalities. We draw on a large body of prior work to survey resources (e.g. software, documentation, frameworks, guides, and practical tools) that support informed data selection, processing, and understanding, precise and limitation-aware artifact documentation, efficient model training, advance awareness of the environmental impact from training, careful model evaluation of capabilities, risks, and claims, as well as responsible model release, licensing and deployment practices. We hope this curated collection of resources helps guide more responsible development. The process of curating this list enabled us to review the AI development ecosystem, revealing what tools are critically missing, misused, or over-used in existing practices. We find that (i) tools for data sourcing, model evaluation, and monitoring are critically under-serving ethical and real-world needs, (ii) evaluations for model safety, capabilities, and environmental impact all lack reproducibility and transparency, (iii) text and particularly English-centric analyses continue to dominate over multilingual and multi-modal analyses, and (iv) evaluation of systems, rather than just models, is needed so that capabilities and impact are assessed in context.


'There was all sorts of toxic behaviour': Timnit Gebru on her sacking by Google, AI's dangers and big tech's biases

The Guardian

"It feels like a gold rush," says Timnit Gebru. "In fact, it is a gold rush. And a lot of the people who are making money are not the people actually in the midst of it. But it's humans who decide whether all this should be done or not. We should remember that we have the agency to do that." Gebru is talking about her specialised field: artificial intelligence. On the day we speak via a video call, she is in Kigali, Rwanda, preparing to host a workshop and chair a panel at an international conference on AI. It will address the huge growth in AI's capabilities, as well as something that the frenzied conversation about AI misses out: the fact that many of its systems may well be built on a huge mess of biases, inequalities and imbalances of power. This gathering, the clunkily titled International Conference on Learning Representations, marks the first time people in the field have come together in an African country – which makes a powerful point about big tech's neglect of the global south. When Gebru talks about the way that AI "impacts people all over the world and they don't get to have a say on how they should shape it", the issue is thrown into even sharper relief by her backstory. In her teens, Gebru was a refugee from the war between Ethiopia, where she grew up, and Eritrea, where her parents were born. After a year in Ireland, she made it to the outskirts of Boston, Massachusetts, and from there to Stanford University in northern California, which opened the way to a career at the cutting edge of the computing industry: Apple, then Microsoft, followed by Google. But in late 2020, her work at Google came to a sudden end. As the co-leader of Google's small ethical AI team, Gebru was one of the authors of an academic paper that warned about the kind of AI that is increasingly built into our lives, taking internet searches and user recommendations to apparently new levels of sophistication and threatening to master such human talents as writing, composing music and analysing images. The clear danger, the paper said, is that such supposed "intelligence" is based on huge data sets that "overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalised populations". Put more bluntly, AI threatens to deepen the dominance of a way of thinking that is white, male, comparatively affluent and focused on the US and Europe. In response, senior managers at Google demanded that Gebru either withdraw the paper, or take her name and those of her colleagues off it. This triggered a run of events that led to her departure. Google says she resigned; Gebru insists that she was fired. What all this told her, she says, is that big tech is consumed by a drive to develop AI and "you don't want someone like me who's going to get in your way."


A Human Rights-Based Approach to Responsible AI

Prabhakaran, Vinodkumar, Mitchell, Margaret, Gebru, Timnit, Gabriel, Iason

arXiv.org Artificial Intelligence

On the other hand, these research insights are meant to intervene on platforms that are globally present, serving a global population from diverse societies, cultures and values, with their own forms of injustices. A core concern in this arrangement is that of value imposition, where local values, i.e., values that are local to the regions where the interventions are built, implicitly shape and inform global systems without any or much room for discussion or contestation from those affected by those interventions. More specifically, interventions designed to address FATE failures necessarily impart a normative value system, but the values that guide the proposed solutions are rarely recognized as sites of contestation. This is problematic because while there may be ethical principles for ML that garner a degree of consensus across different value systems, in a pluralistic world this consensus is not something that should be assumed. Instead, we need to be explicit about the values that underpin the quest for ethical and just AI, and to cultivate an active debate about those values, critically examining and evaluating claims about them [28]. Another shortcoming of not being explicit about what normative value systems shape the interventions is the vagueness it entails, making it harder to arrive at a common vocabulary and shared understanding between computer scientists and civil society. Such a shared understanding is crucial to bridge the gap between research and practice, especially in a way that effectively supports the priorities of the latter constituency.


Google engineer goes public after suspension: warned AI is sentient

#artificialintelligence

A senior software engineer at Google who signed up to test Google's artificial intelligence tool called LaMDA (Language Model for Dialog Applications) has claimed that the AI robot is in fact sentient and has thoughts and feelings. During a series of conversations with LaMDA, 41-year-old Blake Lemoine presented the computer with various scenarios through which analyses could be made. They included religious themes and whether the artificial intelligence could be goaded into using discriminatory or hateful speech. Lemoine came away with the perception that LaMDA was indeed sentient and was endowed with sensations and thoughts all of its own. "If I didn't know exactly what it was, which is this computer program we built recently, I'd think it was a 7-year-old, 8-year-old kid that happens to know physics," he told the Washington Post.


AI ethics champion Margaret Mitchell on self-regulation and 'foresight'

#artificialintelligence

All the sessions from Transform 2021 are available on-demand now. Ethics and artificial intelligence have become increasingly intertwined due to the pervasiveness of AI. But researchers, creators, corporations, and governments still face major challenges if they hope to address some of the more pressing concerns around AI's impact on society. Much of this comes down to foresight -- being able to adequately predict what problems a new AI product, feature, or technology could create down the line, rather than focusing purely on short-term benefits. "If you do believe in foresight, then it should become part of what you do before you make the product," AI researcher and former Googler Margaret Mitchell said during a fireside chat at VentureBeat's Transform 2021 event today.


A researcher turned down a $60k grant from Google because it ousted 2 top AI ethics leaders: 'I don't think this is going to blow over'

#artificialintelligence

In a sign of continued blowback from Google's controversial ousting of two top artificial intelligence leaders, a researcher just publicly turned down a major grant from the company. Late last year, Luke Stark, an assistant professor at the University of Western Ontario researching the social and ethical impacts of artificial intelligence, applied for a Google Research Scholar award. Each year, the company offers grants to early-career professors pursuing topics relevant to Google's fields of interest. Stark applied with plans to put any funding towards his further research into how technologies such as mood-tracking apps and facial recognition are used to monitor human emotions. "My impression was that Google was really pulling together a top ethical AI team," he told Insider.


Google fires top AI ethics researcher Margaret Mitchell – TechCrunch

#artificialintelligence

Google has fired Margaret Mitchell, the founder and former co-lead of the company's ethical AI team. Mitchell announced the news via a tweet. Google confirmed Mitchell's firing in a statement to TechCrunch: "After conducting a review of this manager's conduct, we confirmed that there were multiple violations of our code of conduct, as well as of our security policies, which included the exfiltration of confidential business-sensitive documents and private data of other employees." In January, Google had revoked Mitchell's corporate access for reportedly using automated scripts to find examples of mistreatment of Dr. Timnit Gebru, according to Axios. Gebru says she was fired from Google, while Google has maintained that she resigned.


Why Is Google Investigating Its Ethical AI Lead?

#artificialintelligence

Google is at it again. After firing Timnit Gebru over her research paper's 'unacceptable' content, the company is now investigating Margaret Mitchell, co-leader of Google's Ethical AI team. According to Axios, Google found that Mitchell had been using automated scripts to go through her messages to find examples showing discriminatory treatment of Gebru before her account was locked. She had allegedly been documenting critical issues surrounding Gebru's firing. As per the statement Google provided to Axios, its systems detected that an account had exfiltrated thousands of files and shared them with multiple external accounts.