Netflix's New Movie Takes On a Suddenly Controversial Reproductive Treatment. Does It Get It Right?
The grinding trial-and-error process that precedes world-changing scientific discoveries doesn't really lend itself to dramatization. Instead of our heroes chasing bad guys down dark alleys, the exciting story action involves them standing in front of a blackboard or gazing into a microscope. So dramatic tension is injected by financial or political forces threatening to derail a project of urgent importance (Oppenheimer); the scientists fighting for credibility in the face of belonging to a marginalized group (Hidden Figures, The Imitation Game, any biopic of a female scientist); or the old reliable of the main scientist being a difficult, maverick genius (Oppenheimer again). Joy: The Birth of IVF, Ben Taylor's new film out now on Netflix, about the arduous path to develop a viable technique for fertilizing human eggs outside the body and implanting them in the womb, aka in vitro fertilization, hits many of these notes. There's the irascible pioneer, here played by Bill Nighy at his most crotchety but sympathetic as gynecologist Patrick Steptoe, who introduced laparoscopy to the U.K. He's teamed with the driven visionary--physiologist Robert Edwards, played by James Norton, who, like Jude Law, is always required to conceal his innate gorgeousness under an unbecoming wig or glasses to convince as an ordinary guy.
The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations
Worledge, Theodora, Hashimoto, Tatsunori, Guestrin, Carlos
Across all fields of academic study, experts cite their sources when sharing information. While large language models (LLMs) excel at synthesizing information, they do not provide reliable citation to sources, making it difficult to trace and verify the origins of the information they present. In contrast, search engines make sources readily accessible to users and place the burden of synthesizing information on the user. Through a survey, we find that users prefer search engines over LLMs for high-stakes queries, where concerns regarding information provenance outweigh the perceived utility of LLM responses. To examine the interplay between verifiability and utility of information-sharing tools, we introduce the extractive-abstractive spectrum, in which search engines and LLMs are extreme endpoints encapsulating multiple unexplored intermediate operating points. Search engines are extractive because they respond to queries with snippets of sources with links (citations) to the original webpages. LLMs are abstractive because they address queries with answers that synthesize and logically transform relevant information from training and in-context sources without reliable citation. We define five operating points that span the extractive-abstractive spectrum and conduct human evaluations on seven systems across four diverse query distributions that reflect real-world QA settings: web search, language simplification, multi-step reasoning, and medical advice. As outputs become more abstractive, we find that perceived utility improves by as much as 200%, while the proportion of properly cited sentences decreases by as much as 50% and users take up to 3 times as long to verify cited information. Our findings recommend distinct operating points for domain-specific LLM systems and our failure analysis informs approaches to high-utility LLM systems that empower users to verify information.
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
Yang, Sohee, Kassner, Nora, Gribovskaya, Elena, Riedel, Sebastian, Geva, Mor
We evaluate how well Large Language Models (LLMs) latently recall and compose facts to answer multi-hop queries like "In the year Scarlett Johansson was born, the Summer Olympics were hosted in the country of". One major challenge in evaluating this ability is that LLMs may have developed shortcuts from encountering the head entity "Scarlett Johansson" and the answer entity "United States" in the same training sequences, or may merely guess the answer from frequency-based priors. To prevent shortcuts, we exclude test queries where the head and answer entities co-appear in pretraining corpora. Through careful selection of relations and facts and systematic removal of cases where models might guess answers or exploit partial matches, we construct an evaluation dataset SOCRATES (ShOrtCut-fRee lATent rEaSoning). We observe that LLMs demonstrate promising latent multi-hop reasoning abilities without exploiting shortcuts, but only for certain types of queries. For queries requiring latent recall of countries as the intermediate answer, the best models achieve 80% latent composability, but this drops to just 5% for the recall of years. Comparisons with Chain-of-Thought composability highlight a significant gap between the ability of models to reason latently versus explicitly. Analysis reveals that latent representations of the intermediate answer are constructed more often in queries with higher latent composability, and shows the emergence of latent multi-hop reasoning during pretraining.
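The exclusion step described in this abstract (dropping test queries whose head and answer entities co-appear in the training data) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the query dictionary layout, the case-insensitive substring matching, and the two-sentence toy corpus are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List


def co_occur(head: str, answer: str, corpus: Iterable[str]) -> bool:
    """Return True if head and answer appear together in any one sequence.

    Assumption: plain case-insensitive substring matching stands in for
    whatever entity matching a real pipeline would use.
    """
    h, a = head.lower(), answer.lower()
    return any(h in seq.lower() and a in seq.lower() for seq in corpus)


def filter_shortcut_free(
    queries: List[Dict[str, str]], corpus: Iterable[str]
) -> List[Dict[str, str]]:
    """Keep only queries whose head and answer entities never co-appear."""
    sequences = list(corpus)  # materialize so the corpus can be scanned per query
    return [
        q for q in queries
        if not co_occur(q["head"], q["answer"], sequences)
    ]


# Hypothetical toy corpus and queries, for illustration only.
toy_corpus = [
    "Scarlett Johansson is an actress from the United States.",
    "A sentence that mentions neither entity of interest.",
]
toy_queries = [
    {"head": "Scarlett Johansson", "answer": "United States"},  # co-appears
    {"head": "Scarlett Johansson", "answer": "Los Angeles"},    # does not
]

kept = filter_shortcut_free(toy_queries, toy_corpus)
print([q["answer"] for q in kept])
```

At the scale of a real pretraining corpus, a linear scan per query would be impractical; an inverted index over entity mentions would be a more plausible design, but that too is an assumption rather than something the abstract specifies.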
XAI and Android Malware Models
Kulkarni, Maithili, Stamp, Mark
Android malware detection based on machine learning (ML) and deep learning (DL) models is widely used for mobile device security. Such models offer benefits in terms of detection accuracy and efficiency, but it is often difficult to understand how such learning models make decisions. As a result, these popular malware detection strategies are generally treated as black boxes, which can result in a lack of trust in the decisions made and can make adversarial attacks more difficult to detect. The field of eXplainable Artificial Intelligence (XAI) attempts to shed light on such black box models. In this paper, we apply XAI techniques to ML and DL models that have been trained on a challenging Android malware classification problem. Specifically, the classic ML models considered are Support Vector Machines (SVM), Random Forest, and $k$-Nearest Neighbors ($k$-NN), while the DL models we consider are Multi-Layer Perceptrons (MLP) and Convolutional Neural Networks (CNN). The state-of-the-art XAI techniques that we apply to these trained models are Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Partial Dependence Plots (PDP), ELI5, and Class Activation Mapping (CAM). We obtain global and local explanation results, and we discuss the utility of XAI techniques in this problem domain. We also provide a literature review of XAI work related to Android malware.
AI-Native Multi-Access Future Networks -- The REASON Architecture
Katsaros, Konstantinos, Mavromatis, Ioannis, Antonakoglou, Kostantinos, Ghosh, Saptarshi, Kaleshi, Dritan, Mahmoodi, Toktam, Asgari, Hamid, Karousos, Anastasios, Tavakkolnia, Iman, Safi, Hossein, Hass, Harald, Vrontos, Constantinos, Emami, Amin, Ullauri, Juan Parra, Moazzeni, Shadi, Simeonidou, Dimitra
The development of the sixth generation of communication networks (6G) has been gaining momentum over the past years, with a target of being introduced by 2030. Several initiatives worldwide are developing innovative solutions and setting the direction for the key features of these networks. Some common emerging themes are the tight integration of AI, the convergence of multiple access technologies and sustainable operation, aiming to meet stringent performance and societal requirements. To that end, we are introducing REASON - Realising Enabling Architectures and Solutions for Open Networks. The REASON project aims to address technical challenges in future network deployments, such as E2E service orchestration, sustainability, security and trust management, and policy management, utilising AI-native principles, considering multiple access technologies and cloud-native solutions. This paper presents REASON's architecture and the identified requirements for future networks. The architecture is meticulously designed for modularity, interoperability, scalability, simplified troubleshooting, flexibility, and enhanced security, taking into consideration current and future standardisation efforts, and the ease of implementation and training. It is structured into four horizontal layers: Physical Infrastructure, Network Service, Knowledge, and End-User Application, complemented by two vertical layers: Management and Orchestration, and E2E Security. This layered approach ensures a robust, adaptable framework to support the diverse and evolving requirements of 6G networks, fostering innovation and facilitating seamless integration of advanced technologies.
In Memoriam: E. Allen Emerson
E. Allen Emerson was the first graduate student of Edmund M. Clarke at Harvard University. After discussing several ideas for Allen's dissertation, they identified a promising candidate: verifying a finite-state system against a formal specification. According to Martha Clarke, Edmund's widow, it was during a walk across Harvard Yard that they decided to call it "model checking." Emerson received his Ph.D. in applied mathematics for this work in 1981. Twenty-five years later, he and Clarke (along with Joseph Sifakis) shared the ACM A.M. Turing Award in 2007 for this and related work.
'An AI Fukushima is inevitable': scientists discuss technology's immense potential and dangers
When better to hold a conference on artificial intelligence and the countless ways it is advancing science than in those brief days between the first Nobel prizes being awarded in the field and the winners heading to Stockholm for the lavish white tie ceremony? It was fortuitous timing for Google DeepMind and the Royal Society who this week convened the AI for Science Forum in London. Last month, Google DeepMind bagged the Nobel prize in chemistry a day after AI took the physics prize. Scientists have worked with AI for years, but the latest generation of algorithms has brought us to the brink of transformation, Demis Hassabis, the chief executive officer of Google DeepMind, told the meeting. "If we get it right, it should be an incredible new era of discovery and a new golden age, maybe even a kind of new renaissance," he said.
Dalí, Basquiat, Haring, and Hockney at Luna Luna
I don't know what Werner Herzog is up to these days, but if he's between projects, I humbly suggest that he make a documentary about Luna Luna, the Hamburg amusement park that took more than ten years to put together, included attractions designed by Dalí and Basquiat and Haring and Hockney, and spent thirty-five years in shipping containers. It's now been partly reassembled at the Shed, for the exhibition "Luna Luna, Forgotten Fantasy," through Jan. 5. The park's Fitzcarraldo, a poet-songwriter-pop star named André Heller, was born in Vienna in 1947 and spent much of his thirties persuading artists to decorate rides. Haring slathered a merry-go-round in melty cartoons; Basquiat dressed a Ferris wheel in his customary graffiti. The park opened to the public in 1987, largely funded by a gossip rag, and stayed that way for a summer.
Scaling up the Evaluation of Collaborative Problem Solving: Promises and Challenges of Coding Chat Data with ChatGPT
Hao, Jiangang, Cui, Wenju, Kyllonen, Patrick, Kerzabi, Emily, Liu, Lei, Flor, Michael
Collaborative problem solving (CPS) is widely recognized as a critical 21st century skill. Efficiently coding communication data is a major challenge in scaling up research on assessing CPS. This paper reports findings on using ChatGPT to directly code CPS chat data by benchmarking performance across multiple datasets and coding frameworks. We found that ChatGPT-based coding outperformed human coding in tasks where the discussions were characterized by colloquial language but fell short in tasks where the discussions dealt with specialized scientific terminology and contexts. The findings offer practical guidelines for researchers to develop strategies for efficient and scalable analysis of communication data from CPS tasks.