South America
Towards an AI co-scientist
Gottweis, Juraj, Weng, Wei-Hung, Daryin, Alexander, Tu, Tao, Palepu, Anil, Sirkovic, Petar, Myaskovsky, Artiom, Weissenberger, Felix, Rong, Keran, Tanno, Ryutaro, Saab, Khaled, Popovici, Dan, Blum, Jacob, Zhang, Fan, Chou, Katherine, Hassidim, Avinatan, Gokturk, Burak, Vahdat, Amin, Kohli, Pushmeet, Matias, Yossi, Carroll, Andrew, Kulkarni, Kavita, Tomasev, Nenad, Guan, Yuan, Dhillon, Vikram, Vaishnav, Eeshit Dhaval, Lee, Byron, Costa, Tiago R D, Penadés, José R, Peltz, Gary, Xu, Yunhan, Pawlosky, Annalisa, Karthikesalingam, Alan, Natarajan, Vivek
Scientific discovery relies on scientists generating novel hypotheses that undergo rigorous experimental validation. To augment this process, we introduce an AI co-scientist, a multi-agent system built on Gemini 2.0. The AI co-scientist is intended to help uncover new, original knowledge and to formulate demonstrably novel research hypotheses and proposals, building upon prior evidence and aligned to scientist-provided research objectives and guidance. The system's design incorporates a generate, debate, and evolve approach to hypothesis generation, inspired by the scientific method and accelerated by scaling test-time compute. Key contributions include: (1) a multi-agent architecture with an asynchronous task execution framework for flexible compute scaling; (2) a tournament evolution process for self-improving hypotheses generation. Automated evaluations show continued benefits of test-time compute, improving hypothesis quality. While general purpose, we focus development and validation in three biomedical areas: drug repurposing, novel target discovery, and explaining mechanisms of bacterial evolution and anti-microbial resistance. For drug repurposing, the system proposes candidates with promising validation findings, including candidates for acute myeloid leukemia that show tumor inhibition in vitro at clinically applicable concentrations. For novel target discovery, the AI co-scientist proposed new epigenetic targets for liver fibrosis, validated by anti-fibrotic activity and liver cell regeneration in human hepatic organoids. Finally, the AI co-scientist recapitulated unpublished experimental results via a parallel in silico discovery of a novel gene transfer mechanism in bacterial evolution. These results, detailed in separate, co-timed reports, demonstrate the potential to augment biomedical and scientific discovery and usher an era of AI empowered scientists.
Mixture models for data with unknown distributions
We describe and analyze a broad class of mixture models for real-valued multivariate data in which the probability density of observations within each component of the model is represented as an arbitrary combination of basis functions. Fits to these models give us a way to cluster data with distributions of unknown form, including strongly non-Gaussian or multimodal distributions, and return both a division of the data and an estimate of the distributions, effectively performing clustering and density estimation within each cluster at the same time. We describe two fitting methods, one using an expectation-maximization (EM) algorithm and the other a Bayesian non-parametric method using a collapsed Gibbs sampler. The former is numerically efficient, but gives only point estimates of the probability densities. The latter is more computationally demanding but returns a full Bayesian posterior and also an estimate of the number of components. We demonstrate our methods with a selection of illustrative applications and give code implementing both algorithms.
Assassin's Creed maker confirms leaked game footage is real
Assassin's Creed maker confirms leaked game footage is real 38 minutes agoTom RichardsonBBC NewsbeatUbisoftAssassin's Creed Shadows is seen as a pivotal release for Ubisoft The makers of Assassin's Creed Shadows - the forthcoming entry in one of video gaming's biggest franchises - have confirmed footage leaked online is real. Some players managed to get their hands on the game - due to be released on 20 March - ahead of its official release. Developer Ubisoft said gameplay videos shared online "did not represent the final quality of the game". In a statement posted online, the company said it was "still working on patches" and urged fans not to share spoilers. Shadows will be the first Assassin's Creed instalment set in Japan - something fans have long been asking for.
AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools
Riche, Nathalie, Offenwanger, Anna, Gmeiner, Frederic, Brown, David, Romat, Hugo, Pahud, Michel, Marquardt, Nicolai, Inkpen, Kori, Hinckley, Ken
Chat-based prompts respond with verbose linear-sequential texts, making it difficult to explore and refine ambiguous intents, back up and reinterpret, or shift directions in creative AI-assisted design work. AI-Instruments instead embody "prompts" as interface objects via three key principles: (1) Reification of user-intent as reusable direct-manipulation instruments; (2) Reflection of multiple interpretations of ambiguous user-intents (Reflection-in-intent) as well as the range of AI-model responses (Reflection-in-response) to inform design "moves" towards a desired result; and (3) Grounding to instantiate an instrument from an example, result, or extrapolation directly from another instrument. Further, AI-Instruments leverage LLM's to suggest, vary, and refine new instruments, enabling a system that goes beyond hard-coded functionality by generating its own instrumental controls from content. We demonstrate four technology probes, applied to image generation, and qualitative insights from twelve participants, showing how AI-Instruments address challenges of intent formulation, steering via direct manipulation, and non-linear iterative workflows to reflect and resolve ambiguous intents.
Medical Hallucinations in Foundation Models and Their Impact on Healthcare
Kim, Yubin, Jeong, Hyewon, Chen, Shan, Li, Shuyue Stella, Lu, Mingyu, Alhamoud, Kumail, Mun, Jimin, Grau, Cristina, Jung, Minseok, Gameiro, Rodrigo, Fan, Lizhou, Park, Eugene, Lin, Tristan, Yoon, Joonsik, Yoon, Wonjin, Sap, Maarten, Tsvetkov, Yulia, Liang, Paul, Xu, Xuhai, Liu, Xin, McDuff, Daniel, Lee, Hyeonhoon, Park, Hae Won, Tulebaev, Samir, Breazeal, Cynthia
Foundation Models that are capable of processing and generating multi-modal data have transformed AI's role in medicine. However, a key limitation of their reliability is hallucination, where inaccurate or fabricated information can impact clinical decisions and patient safety. We define medical hallucination as any instance in which a model generates misleading medical content. This paper examines the unique characteristics, causes, and implications of medical hallucinations, with a particular focus on how these errors manifest themselves in real-world clinical scenarios. Our contributions include (1) a taxonomy for understanding and addressing medical hallucinations, (2) benchmarking models using medical hallucination dataset and physician-annotated LLM responses to real medical cases, providing direct insight into the clinical impact of hallucinations, and (3) a multi-national clinician survey on their experiences with medical hallucinations. Our results reveal that inference techniques such as Chain-of-Thought (CoT) and Search Augmented Generation can effectively reduce hallucination rates. However, despite these improvements, non-trivial levels of hallucination persist. These findings underscore the ethical and practical imperative for robust detection and mitigation strategies, establishing a foundation for regulatory policies that prioritize patient safety and maintain clinical integrity as AI becomes more integrated into healthcare. The feedback from clinicians highlights the urgent need for not only technical advances but also for clearer ethical and regulatory guidelines to ensure patient safety. A repository organizing the paper resources, summaries, and additional information is available at https://github.com/mitmedialab/medical hallucination.
Complex Networks for Pattern-Based Data Classification
Chire, Josimar, Mahmood, Khalid, Liang, Zhao
Data classification techniques partition the data or feature space into smaller sub-spaces, each corresponding to a specific class. To classify into subspaces, physical features e.g., distance and distributions are utilized. This approach is challenging for the characterization of complex patterns that are embedded in the dataset. However, complex networks remain a powerful technique for capturing internal relationships and class structures, enabling High-Level Classification. Although several complex network-based classification techniques have been proposed, high-level classification by leveraging pattern formation to classify data has not been utilized. In this work, we present two network-based classification techniques utilizing unique measures derived from the Minimum Spanning Tree and Single Source Shortest Path. These network measures are evaluated from the data patterns represented by the inherent network constructed from each class. We have applied our proposed techniques to several data classification scenarios including synthetic and real-world datasets. Compared to the existing classic high-level and machine-learning classification techniques, we have observed promising numerical results for our proposed approaches. Furthermore, the proposed models demonstrate the following distinguished features in comparison to the previous high-level classification techniques: (1) A single network measure is introduced to characterize the data pattern, eliminating the need to determine weight parameters among network measures. Therefore, the model is largely simplified, while obtaining better classification results. (2) The metrics proposed are sensitive and used for classification with competitive results.
Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research
Storey, Veda C., Yue, Wei Thoo, Zhao, J. Leon, Lukyanenko, Roman
The continuing, explosive developments in generative artificial intelligence (GenAI), built on large language models and related algorithms, has led to much excitement and speculation about the potential impact of this new technology. Claims include AI being poised to revolutionize business and society and dramatically change personal life. However, it remains unclear exactly how this technology, with its significantly distinct features from past AI technologies, has transformative potential. Nor is it clear how researchers in information systems (IS) should respond. In this paper, we consider the evolving and emerging trends of AI in order to examine its present and predict its future impacts. Many existing papers on GenAI are either too technical for most IS researchers or lack the depth needed to appreciate the potential impacts of GenAI. We, therefore, attempt to bridge the technical and organizational communities of GenAI from a system-oriented sociotechnical perspective. Specifically, we explore the unique features of GenAI, which are rooted in the continued change from symbolism to connectionism, and the deep systemic and inherent properties of human-AI ecosystems. We retrace the evolution of AI that proceeded the level of adoption, adaption, and use found today, in order to propose future research on various impacts of GenAI in both business and society within the context of information systems research. Our efforts are intended to contribute to the creation of a well-structured research agenda in the IS community to support innovative strategies and operations enabled by this new wave of AI.
Data Augmentation for Instruction Following Policies via Trajectory Segmentation
Höpner, Niklas, Tiddi, Ilaria, van Hoof, Herke
The scalability of instructable agents in robotics or gaming is often hindered by limited data that pairs instructions with agent trajectories. However, large datasets of unannotated trajectories containing sequences of various agent behaviour (play trajectories) are often available. In a semi-supervised setup, we explore methods to extract labelled segments from play trajectories. The goal is to augment a small annotated dataset of instruction-trajectory pairs to improve the performance of an instruction-following policy trained downstream via imitation learning. Assuming little variation in segment length, recent video segmentation methods can effectively extract labelled segments. To address the constraint of segment length, we propose Play Segmentation (PS), a probabilistic model that finds maximum likely segmentations of extended subsegments, while only being trained on individual instruction segments. Our results in a game environment and a simulated robotic gripper setting underscore the importance of segmentation; randomly sampled segments diminish performance, while incorporating labelled segments from PS improves policy performance to the level of a policy trained on twice the amount of labelled data.
Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs
Yang, Xiulin, Aoyama, Tatsuya, Yao, Yuekun, Wilcox, Ethan
Do LLMs offer insights into human language learning? A common argument against this idea is that because their architecture and training paradigm are so vastly different from humans, LLMs can learn arbitrary inputs as easily as natural languages. In this paper, we test this claim by training LMs to model impossible and typologically unattested languages. Unlike previous work, which has focused exclusively on English, we conduct experiments on 12 natural languages from 4 language families. Our results show that while GPT-2 small can primarily distinguish attested languages from their impossible counterparts, it does not achieve perfect separation between all the attested languages and all the impossible ones. We further test whether GPT-2 small distinguishes typologically attested from unattested languages with different NP orders by manipulating word order based on Greenberg's Universal 20. We find that the model's perplexity scores do not distinguish attested vs. unattested word orders, as long as the unattested variants maintain constituency structure. These findings suggest that language models exhibit some human-like inductive biases, though these biases are weaker than those found in human learners.
Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
Peng, Xueqing, Papadopoulos, Triantafillos, Soufleri, Efstathia, Giannouris, Polydoros, Xiang, Ruoyu, Wang, Yan, Qian, Lingfei, Huang, Jimin, Xie, Qianqian, Ananiadou, Sophia
Despite Greece's pivotal role in the global economy, large language models (LLMs) remain underexplored for Greek financial context due to the linguistic complexity of Greek and the scarcity of domain-specific datasets. Previous efforts in multilingual financial natural language processing (NLP) have exposed considerable performance disparities, yet no dedicated Greek financial benchmarks or Greek-specific financial LLMs have been developed until now. To bridge this gap, we introduce Plutus-ben, the first Greek Financial Evaluation Benchmark, and Plutus-8B, the pioneering Greek Financial LLM, fine-tuned with Greek domain-specific data. Plutus-ben addresses five core financial NLP tasks in Greek: numeric and textual named entity recognition, question answering, abstractive summarization, and topic classification, thereby facilitating systematic and reproducible LLM assessments. To underpin these tasks, we present three novel, high-quality Greek financial datasets, thoroughly annotated by expert native Greek speakers, augmented by two existing resources. Our comprehensive evaluation of 22 LLMs on Plutus-ben reveals that Greek financial NLP remains challenging due to linguistic complexity, domain-specific terminology, and financial reasoning gaps. These findings underscore the limitations of cross-lingual transfer, the necessity for financial expertise in Greek-trained models, and the challenges of adapting financial LLMs to Greek text. We release Plutus-ben, Plutus-8B, and all associated datasets publicly to promote reproducible research and advance Greek financial NLP, fostering broader multilingual inclusivity in finance.