 Personal


RFK Jr. speaks candidly about his gravelly voice

Los Angeles Times

There was a time before the turn of the millennium when Robert F. Kennedy Jr. gave a full-throated accounting of himself and the things he cared about. He recalls his voice then as "unusually strong," so much so that he could fill large auditoriums with his words. The independent presidential candidate recounts those times somewhat wistfully, telling interviewers that he "can't stand" the sound of his voice today -- sometimes choked, halting and slightly tremulous. He has spasmodic dysphonia, a rare neurological condition in which an abnormality in the brain's neural network results in involuntary spasms of the muscles that open or close the vocal cords. "My voice doesn't really get tired." "I feel sorry for the people who have to listen to me," Kennedy said in a phone interview with The Times, his voice sounding as strained as it does in his public appearances.


A conversation with Dragoș Tudorache, the politician behind the AI Act

MIT Technology Review

A former interior minister, Tudorache is one of the most important players in European AI policy. He is one of the two lead negotiators of the AI Act in the European Parliament. The bill, the first sweeping AI law of its kind in the world, will enter into force this year. We first met two years ago, when Tudorache was appointed to his position as negotiator. But Tudorache's interest in AI started much earlier, in 2015.


Language Models as Critical Thinking Tools: A Case Study of Philosophers

arXiv.org Artificial Intelligence

Current work in language models (LMs) helps us speed up or even skip thinking by accelerating and automating cognitive work. But can LMs help us with critical thinking -- thinking in deeper, more reflective ways which challenge assumptions, clarify ideas, and engineer new concepts? We treat philosophy as a case study in critical thinking, and interview 21 professional philosophers about how they engage in critical thinking and on their experiences with LMs. We find that philosophers do not find LMs to be useful because they lack a sense of selfhood (memory, beliefs, consistency) and initiative (curiosity, proactivity). We propose the selfhood-initiative model for critical thinking tools to characterize this gap. Using the model, we formulate three roles LMs could play as critical thinking tools: the Interlocutor, the Monitor, and the Respondent. We hope that our work inspires LM researchers to further develop LMs as critical thinking tools and philosophers and other 'critical thinkers' to imagine intellectually substantive uses of LMs.


'Suite Life of Zack & Cody' star recalls choosing video games over a conversation with Matt Damon

FOX News

Cole was a guest on Wednesday's episode of "Let's Talk Off Camera with Kelly Ripa." Twins Dylan and Cole Sprouse were stars of the Disney television series "The Suite Life of Zack & Cody." During the episode, Cole told the story of how he and his brother were so obsessed with video games as kids that even an Oscar-winning actor couldn't tear them away. "So my brother and I were in set school at the time, and we were really into 'World of Warcraft,'" Cole explained. "One of the PAs comes knocking on the set school door and goes, 'Oh, you guys, you won't believe it. Matt Damon is going to be here today. His kids love the show, so he's gonna be here in like thirty minutes,' and I remember Dylan and I turning to each other and just going, 'Ugh. I can't believe we have to get off of 'World of Warcraft' right now.' We must have been fifteen," the "Riverdale" actor explained.


Balancing Progress and Responsibility: A Synthesis of Sustainability Trade-Offs of AI-Based Systems

arXiv.org Artificial Intelligence

Recent advances in artificial intelligence (AI) capabilities have increased the eagerness of companies to integrate AI into software systems. While AI can be used to have a positive impact on several dimensions of sustainability, this is often overshadowed by its potential negative influence. While many studies have explored sustainability factors in isolation, there is insufficient holistic coverage of potential sustainability benefits or costs that practitioners need to consider during decision-making for AI adoption. We therefore aim to synthesize trade-offs related to sustainability in the context of integrating AI into software systems. We want to make the sustainability benefits and costs of integrating AI more transparent and accessible for practitioners. The study was conducted in collaboration with a Dutch financial organization. We first performed a rapid review that led to the inclusion of 151 research papers. Afterward, we conducted six semi-structured interviews to enrich the data with industry perspectives. The combined results showcase the potential sustainability benefits and costs of integrating AI. The labels synthesized from the review regarding potential sustainability benefits were clustered into 16 themes, with "energy management" being the most frequently mentioned one. 11 themes were identified in the interviews, with the top mentioned theme being "employee wellbeing". Regarding sustainability costs, the review discovered seven themes, with "deployment issues" being the most popular one, followed by "ethics & society". "Environmental issues" was the top theme from the interviews. Our results provide valuable insights to organizations and practitioners for understanding the potential sustainability implications of adopting AI.


Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo

arXiv.org Artificial Intelligence

Neural Table-to-Text models tend to hallucinate, producing texts that contain factual errors. We investigate whether such errors in the output can be traced back to problems with the input. We manually annotated 1,837 texts generated by multiple models in the politics domain of the ToTTo dataset. We identify the input problems that are responsible for many output errors and show that fixing these inputs reduces factual errors by between 52% and 76% (depending on the model). In addition, we observe that models struggle in processing tabular inputs that are structured in a non-standard way, particularly when the input lacks distinct row and column values or when the column headers are not correctly mapped to corresponding values.


How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?

arXiv.org Artificial Intelligence

By leveraging the retrieval of information from external knowledge databases, Large Language Models (LLMs) exhibit enhanced capabilities for accomplishing many knowledge-intensive tasks. However, due to the inherent flaws of current retrieval systems, irrelevant information may appear within the retrieved top-ranked passages. In this work, we present a comprehensive investigation into the robustness of LLMs to different types of irrelevant information under various conditions. We first introduce a framework to construct high-quality irrelevant information that ranges from semantically unrelated, to partially related, to related to the question. Furthermore, our analysis demonstrates that the constructed irrelevant information not only scores highly on similarity metrics, and is therefore readily retrieved by existing systems, but also bears semantic connections to the context. Our investigation reveals that current LLMs still face challenges in discriminating highly semantically related information and can be easily distracted by this irrelevant yet misleading content. We also find that current solutions for handling irrelevant information have limitations in improving the robustness of LLMs to such distractions.


Tragic cancer loss inspires New York tech entrepreneur to address 'urgent medical need'

FOX News

After losing his wife to colon cancer, a New York man has dedicated his life to fighting the disease and trying to protect other families from the same tragedy. Roy de Souza, now 54, and his wife, Aisha de Sequeira, had three young children when she was diagnosed with cancer in 2017. At the time, the family was living in India, where de Souza ran a technology company and his wife headed up an investment banking firm.


Empowering Biomedical Discovery with AI Agents

arXiv.org Artificial Intelligence

A long-standing ambition for artificial intelligence (AI) in biomedicine is the development of AI systems that could eventually make major scientific discoveries, with the potential to be worthy of a Nobel Prize--fulfilling the Nobel Turing Challenge [1]. While the concept of an "AI scientist" is aspirational, advances in agent-based AI pave the way to the development of AI agents as conversable systems capable of skeptical learning and reasoning that coordinate large language models (LLMs), machine learning (ML) tools, experimental platforms, or even combinations of them [2-5] (Figure 1). The complexity of biological problems requires a multistage approach, where decomposing complex questions into simpler tasks is necessary. AI agents can break down a problem into manageable subtasks, which can then be addressed by agents with specialized functions for targeted problem-solving and integration of scientific knowledge, paving the way toward a future in which a major biomedical discovery is made solely by AI [2, 6].


Long-form factuality in large language models

arXiv.org Artificial Intelligence

Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall). Empirically, we demonstrate that LLM agents can outperform crowdsourced human annotators - on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
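
The extended F1 metric described in the abstract above can be illustrated with a small sketch. This is not the authors' code (that lives in the repository linked above). The function name `f1_at_k` and its parameters are hypothetical, and two details are assumptions not spelled out in the abstract: recall is computed from the number of supported facts divided by the preferred-length hyperparameter and capped at 1, and a response with no supported facts scores 0.

```python
def f1_at_k(num_supported: int, num_not_supported: int, k: int) -> float:
    """Illustrative length-aware F1 for one long-form response.

    num_supported:      facts judged supported by the evaluator
    num_not_supported:  facts judged not supported
    k:                  hyperparameter for the user's preferred number of facts
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if num_supported == 0:
        # Assumption: no supported facts means a score of 0.
        return 0.0

    # Precision: share of the response's facts that are supported.
    precision = num_supported / (num_supported + num_not_supported)
    # Recall (assumed form): supported facts relative to k, capped at 1.
    recall = min(num_supported / k, 1.0)

    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 8 supported and 2 unsupported facts, preferred length of 16 facts.
    print(round(f1_at_k(8, 2, 16), 3))
```

Under this form, a short but fully accurate response is penalized on recall when `k` is large, while padding a response with unsupported facts lowers precision.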