scientific knowledge
- Europe > Switzerland (0.04)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- Africa > Ghana (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > Switzerland (0.04)
- Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
The Denario project: Deep knowledge AI agents for scientific discovery
Villaescusa-Navarro, Francisco, Bolliet, Boris, Villanueva-Domingo, Pablo, Bayer, Adrian E., Acquah, Aidan, Amancharla, Chetana, Barzilay-Siegal, Almog, Bermejo, Pablo, Bilodeau, Camille, Ramírez, Pablo Cárdenas, Cranmer, Miles, França, Urbano L., Hahn, ChangHoon, Jiang, Yan-Fei, Jimenez, Raul, Lee, Jun-Young, Lerario, Antonio, Mamun, Osman, Meier, Thomas, Ojha, Anupam A., Protopapas, Pavlos, Roy, Shimanto, Spergel, David N., Tarancón-Álvarez, Pedro, Tiwari, Ujjwal, Viel, Matteo, Wadekar, Digvijay, Wang, Chi, Wang, Bonny Y., Xu, Licong, Yovel, Yossi, Yue, Shuwen, Zhou, Wen-Han, Zhu, Qiyao, Zou, Jiajun, Zubeldia, Íñigo
We present Denario, an AI multi-agent system designed to serve as a scientific research assistant. Denario can perform many different tasks, such as generating ideas, checking the literature, developing research plans, writing and executing code, making plots, and drafting and reviewing a scientific paper. The system has a modular architecture, allowing it to handle specific tasks, such as generating an idea, or to carry out end-to-end scientific analysis using Cmbagent as a deep-research backend. In this work, we describe Denario and its modules in detail, and illustrate its capabilities by presenting multiple papers generated by it in many different scientific disciplines, such as astrophysics, biology, biophysics, biomedical informatics, chemistry, material science, mathematical physics, medicine, neuroscience and planetary science. Denario also excels at combining ideas from different disciplines, and we illustrate this by showing a paper that applies methods from quantum physics and machine learning to astrophysical data. We report the evaluations performed on these papers by domain experts, who provided both numerical scores and review-like feedback. We then highlight the strengths, weaknesses, and limitations of the current system. Finally, we discuss the ethical implications of AI-driven research and reflect on how such technology relates to the philosophy of science. We publicly release the code at https://github.com/AstroPilot-AI/Denario. A Denario demo can also be run directly on the web at https://huggingface.co/spaces/astropilot-ai/Denario, and the full app will be deployed on the cloud.
- North America > United States > Illinois > Cook County > Chicago (0.76)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.63)
- North America > United States > Texas > Travis County > Austin (0.14)
- Workflow (1.00)
- Research Report > New Finding (1.00)
ScienceMeter: Tracking Scientific Knowledge Updates in Language Models
Wang, Yike, Feng, Shangbin, Tsvetkov, Yulia, Hajishirzi, Hannaneh
Large Language Models (LLMs) are increasingly used to support scientific research, but their knowledge of scientific advancements can quickly become outdated. We introduce ScienceMeter, a new framework for evaluating scientific knowledge update methods over scientific knowledge spanning the past, present, and future. ScienceMeter defines three metrics: knowledge preservation, the extent to which models' understanding of previously learned papers is preserved; knowledge acquisition, how well scientific claims from newly introduced papers are acquired; and knowledge projection, the ability of the updated model to anticipate or generalize to related scientific claims that may emerge in the future. Using ScienceMeter, we examine the scientific knowledge of LLMs on claim judgment and generation tasks across a curated dataset of 15,444 scientific papers and 30,888 scientific claims from ten domains including medicine, biology, materials science, and computer science. We evaluate five representative knowledge update approaches, including training- and inference-time methods. With extensive experiments, we find that the best-performing knowledge update methods can preserve only 85.9% of existing knowledge, acquire 71.7% of new knowledge, and project 37.7% of future knowledge. Inference-based methods work for larger models, whereas smaller models require training to achieve comparable performance. Cross-domain analysis reveals that performance on these objectives is correlated. Even when applied to specialized scientific LLMs, existing knowledge update methods fail to achieve these objectives collectively, underscoring that developing robust scientific knowledge update mechanisms is both crucial and challenging.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Colorado > Douglas County > Highlands Ranch (0.04)
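The three ScienceMeter metrics described in the abstract above can be illustrated with a minimal sketch. This is a simplified reading of the abstract, not the paper's formal definitions: it assumes each claim carries a time period label and a correct/incorrect judgment before and after the knowledge update. The names `ClaimResult`, `update_scores`, and the period labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimResult:
    """One judged claim (hypothetical record layout, not from the paper)."""
    period: str           # "past", "present" (newly introduced) or "future"
    correct_before: bool  # model judged the claim correctly before the update
    correct_after: bool   # model judged the claim correctly after the update


def fraction(hits: int, total: int) -> float:
    """Return hits / total, or 0.0 when there is nothing to count."""
    return hits / total if total else 0.0


def update_scores(results: list[ClaimResult]) -> dict[str, float]:
    """Compute simplified preservation, acquisition and projection rates."""
    # Preservation: of past claims the model already knew, how many survive.
    known_past = [r for r in results if r.period == "past" and r.correct_before]
    # Acquisition: claims from newly introduced papers judged correctly.
    present = [r for r in results if r.period == "present"]
    # Projection: claims from later papers judged correctly after the update.
    future = [r for r in results if r.period == "future"]
    return {
        "preservation": fraction(sum(r.correct_after for r in known_past), len(known_past)),
        "acquisition": fraction(sum(r.correct_after for r in present), len(present)),
        "projection": fraction(sum(r.correct_after for r in future), len(future)),
    }


if __name__ == "__main__":
    toy = [
        ClaimResult("past", True, True),
        ClaimResult("past", True, False),
        ClaimResult("present", False, True),
        ClaimResult("future", False, False),
    ]
    print(update_scores(toy))
```

Under this reading, the reported 85.9%, 71.7%, and 37.7% would correspond to the three rates for the best-performing update method; the paper itself should be consulted for the exact scoring rules.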
SciMantify -- A Hybrid Approach for the Evolving Semantification of Scientific Knowledge
John, Lena, Farfar, Kheir Eddine, Auer, Sören, Karras, Oliver
Scientific publications, primarily digitized as PDFs, remain static and unstructured, limiting the accessibility and reusability of the contained knowledge. At best, scientific knowledge from publications is provided in tabular formats, which lack semantic context. A more flexible, structured, and semantic representation is needed to make scientific knowledge understandable and processable by both humans and machines. We propose an evolution model of knowledge representation, inspired by the 5-star Linked Open Data (LOD) model, with five stages and defined criteria to guide the stepwise transition from a digital artifact, such as a PDF, to a semantic representation integrated in a knowledge graph (KG). Based on an exemplary workflow implementing the entire model, we developed a hybrid approach, called SciMantify, leveraging tabular formats of scientific knowledge, e.g., results from secondary studies, to support its evolving semantification. In the approach, humans and machines collaborate closely by performing semantic annotation tasks (SATs) and refining the results to progressively improve the semantic representation of scientific knowledge. We implemented the approach in the Open Research Knowledge Graph (ORKG), an established platform for improving the findability, accessibility, interoperability, and reusability of scientific knowledge. A preliminary user experiment showed that the approach simplifies the preprocessing of scientific knowledge, reduces the effort for the evolving semantification, and enhances the knowledge representation through better alignment with the KG structures.
NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models
Yagoubi, Mouadh, Dahou, Yasser, Mokeddem, Billel, Belkada, Younes, Le-Khac, Phuc H., Boussaha, Basma El Amel, Alami, Reda, Zuo, Jingwei, Marsili, Damiano, Farooq, Mugariya, Lalmas, Mounia, Gkioxari, Georgia, Gallinari, Patrick, Torr, Philip, Hacid, Hakim
Existing benchmarks have proven effective for assessing the performance of fully trained large language models. However, we find striking differences in the early training stages of small models, where benchmarks often fail to provide meaningful or discriminative signals. To explore how these differences arise, this competition tackles the challenge of designing scientific knowledge evaluation tasks specifically tailored for measuring early training progress of language models. Participants are invited to develop novel evaluation methodologies or adapt existing benchmarks to better capture performance differences among language models. To support this effort, we provide three pre-trained small models (0.5B, 1B, and 3B parameters), along with intermediate checkpoints sampled during training up to 200B tokens. All experiments and development work can be run on widely available free cloud-based GPU platforms, making participation accessible to researchers with limited computational resources. Submissions will be evaluated based on three criteria: the quality of the performance signal they produce, the consistency of model rankings at 1 trillion tokens of training, and their relevance to the scientific knowledge domain. By promoting the design of tailored evaluation strategies for early training, this competition aims to attract a broad range of participants from various disciplines, including those who may not be machine learning experts or have access to dedicated GPU resources. Ultimately, this initiative seeks to make foundational LLM research more systematic and benchmark-informed from the earliest phases of model development.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Research Report (0.82)
- Personal (0.67)
- Information Technology (0.66)
- Education > Educational Setting > Higher Education (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
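One of the E2LM criteria above is the consistency of model rankings between early checkpoints and later training. The abstract does not say how this is scored, so the sketch below is only one plausible way to quantify it: a Spearman rank correlation between benchmark scores at an early checkpoint and at a late one. The function names and the example scores are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # all scores tied: no ranking information
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Made-up scores for the 0.5B, 1B and 3B models on a candidate task.
    early_scores = [0.26, 0.31, 0.35]  # at an early checkpoint
    late_scores = [0.40, 0.52, 0.61]   # at a late checkpoint
    print(spearman(early_scores, late_scores))
```

A value near 1 would mean the candidate task orders the models the same way early and late in training; a value near 0 or below would suggest the early signal is not predictive of the later ranking.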
On the definition and importance of interpretability in scientific machine learning
Rowan, Conor, Doostan, Alireza
Though neural networks trained on large datasets have been successfully used to describe and predict many physical phenomena, there is a sense among scientists that, unlike traditional scientific models comprising simple mathematical expressions, their findings cannot be integrated into the body of scientific knowledge. Critics of machine learning's inability to produce human-understandable relationships have converged on the concept of "interpretability" as its point of departure from more traditional forms of science. As the growing interest in interpretability has shown, researchers in the physical sciences seek not just predictive models, but also to uncover the fundamental principles that govern a system of interest. However, clarity around a definition of interpretability and the precise role that it plays in science is lacking in the literature. In this work, we argue that researchers in equation discovery and symbolic regression tend to conflate the concept of sparsity with interpretability. We review key papers on interpretable machine learning from outside the scientific community and argue that, though the definitions and methods they propose can inform questions of interpretability for scientific machine learning (SciML), they are inadequate for this new purpose. Noting these deficiencies, we propose an operational definition of interpretability for the physical sciences. Our notion of interpretability emphasizes understanding of the mechanism over mathematical sparsity. Innocuous though it may seem, this emphasis on mechanism shows that sparsity is often unnecessary. It also questions the possibility of interpretable scientific discovery when prior knowledge is lacking. We believe a precise and philosophically informed definition of interpretability in SciML will help focus research efforts toward the most significant obstacles to realizing a data-driven scientific future.
- North America > United States > Colorado > Boulder County > Boulder (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > India > Tripura (0.04)
- Overview (0.88)
- Research Report (0.64)
- Health & Medicine (1.00)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.88)
Agentic Publications: An LLM-Driven Framework for Interactive Scientific Publishing, Supplementing Traditional Papers with AI-Powered Knowledge Systems
Pugliese, Roberto, Kourousias, George, Venier, Francesco, Costa, Grazia Garlatti
The exponential growth of scientific literature presents significant challenges for researchers navigating the complex knowledge landscape. We propose "Agentic Publications", a novel LLM-driven framework complementing traditional publishing by transforming papers into interactive knowledge systems. Our architecture integrates structured data with unstructured content through retrieval-augmented generation and multi-agent verification. The framework offers interfaces for both humans and machines, combining narrative explanations with machine-readable outputs while addressing ethical considerations through automated validation and transparent governance. Key features include continuous knowledge updates, automatic integration of new findings, and customizable detail levels. Our proof-of-concept demonstrates multilingual interaction, API accessibility, and structured knowledge representation through vector databases, knowledge graphs, and verification agents. This approach enhances scientific communication across disciplines, improving efficiency and collaboration while preserving traditional publishing pathways, and is particularly valuable for interdisciplinary fields where knowledge integration remains challenging.
- North America > United States (0.14)
- Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.05)
- Africa > Comoros > Grande Comore > Moroni (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.92)
- Education > Educational Setting (0.67)