Scientific Discovery
Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery
Bold claims about AI's role in science-from "AGI will cure all diseases" to promises of radically accelerated discovery-raise a central epistemic question: do large language models (LLMs) truly generate new knowledge, or do they merely remix memorized fragments? We propose unlearning-as-ablation as a falsifiable probe of constructive scientific discovery. The idea is to systematically remove a target result together with its forget-closure (supporting lemmas, paraphrases, and multi-hop entailments) and then evaluate whether the model can re-derive the result from only permitted axioms and tools. Success would indicate generative capability beyond recall; failure would expose current limits. Unlike prevailing motivations for unlearning-privacy, copyright, or safety-our framing repositions it as an epistemic probe for AI-for-Science. We outline a minimal pilot in mathematics and algorithms to illustrate feasibility, and sketch how the same approach could later be extended to domains such as physics or chemistry. This is a position paper: our contribution is conceptual and methodological, not empirical. We aim to stimulate discussion on how principled ablation tests could help distinguish models that reconstruct knowledge from those that merely retrieve it, and how such probes might guide the next generation of AI-for-Science benchmarks.
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.62)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.57)
MA-COIR: Leveraging Semantic Search Index and Generative Models for Ontology-Driven Biomedical Concept Recognition
Liu, Shanshan, Nishida, Noriki, Munne, Rumana Ferdous, Tokunaga, Narumi, Yamagata, Yuki, Kozaki, Kouji, Matsumoto, Yuji
Recognizing biomedical concepts in the text is vital for ontology refinement, knowledge graph construction, and concept relationship discovery. However, traditional concept recognition methods, relying on explicit mention identification, often fail to capture complex concepts not explicitly stated in the text. To overcome this limitation, we introduce MA-COIR, a framework that reformulates concept recognition as an indexing-recognition task. By assigning semantic search indexes (ssIDs) to concepts, MA-COIR resolves ambiguities in ontology entries and enhances recognition efficiency. Using a pretrained BART-based model fine-tuned on small datasets, our approach reduces computational requirements to facilitate adoption by domain experts. Furthermore, we incorporate large language models (LLMs)-generated queries and synthetic data to improve recognition in low-resource settings. Experimental results on three scenarios (CDR, HPO, and HOIP) highlight the effectiveness of MA-COIR in recognizing both explicit and implicit concepts without the need for mention-level annotations during inference, advancing ontology-driven concept recognition in biomedical domain applications. Our code and constructed data are available at https://github.com/sl-633/macoir-master.
- Media > News (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (3 more...)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > France (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.42)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.41)
Active Measurement: Efficient Estimation at Scale
Hamilton, Max, Lai, Jinlin, Zhao, Wenlong, Maji, Subhransu, Sheldon, Daniel
AI has the potential to transform scientific discovery by analyzing vast datasets with little human effort. However, current workflows often do not provide the accuracy or statistical guarantees that are needed. We introduce active measurement, a human-in-the-loop AI framework for scientific measurement. An AI model is used to predict measurements for individual units, which are then sampled for human labeling using importance sampling. With each new set of human labels, the AI model is improved and an unbiased Monte Carlo estimate of the total measurement is refined. Active measurement can provide precise estimates even with an imperfect AI model, and requires little human effort when the AI model is very accurate. We derive novel estimators, weighting schemes, and confidence intervals, and show that active measurement reduces estimation error compared to alternatives in several measurement tasks.
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (0.46)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.34)
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.52)
On robust hypothesis testing with respect to Hellinger distance
We study the hypothesis testing problem where the observed samples need not come from either of the specified hypotheses (distributions). In such a situation, we would like our test to be robust to this misspecification and output the distribution closer in Hellinger distance. If the underlying distribution is close to being equidistant from the hypotheses, then this would not be possible. Our main result is quantifying how close the underlying distribution has to be to either of the hypotheses. We also study the composite testing problem, where each hypothesis is a Hellinger ball around a fixed distribution. A generalized likelihood ratio test is known to work for this problem. We give an alternate test for the same.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India > Maharashtra > Mumbai (0.04)
Structural Enforcement of Statistical Rigor in AI-Driven Discovery: A Functional Architecture
Sequential statistical protocols require meticulous state management and robust error handling -- challenges naturally suited to functional programming. We present a functional architecture for structural enforcement of statistical rigor in automated research systems (AI-Scientists). These LLM-driven systems risk generating spurious discoveries through dynamic hypothesis testing. We introduce the Research monad, a Haskell eDSL that enforces sequential statistical protocols (e.g., Online FDR (false discovery rate) control) using a monad transformer stack. To address risks in hybrid architectures where LLMs generate imperative code, we employ Declarative Scaffolding -- generating rigid harnesses that structurally constrain execution and prevent methodological errors like data leakage. We validate this approach through large-scale simulation (N=2000 hypotheses) and an end-to-end case study, demonstrating essential defense-in-depth for automated science integrity.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.70)
Medieval duke's remains recount his grisly murder
Science Archaeology Medieval duke's remains recount his grisly murder In 1272, Hungary's Béla of Macsó received over 23 sword gashes-and more. Breakthroughs, discoveries, and DIY tips sent every weekday. In 1272 CE, a Hungarian duke was murdered in cold blood. Details surrounding the grisly killing of the 13th century Hungarian duke named Béla of Macsó have remained murky for centuries. The duke met his demise at the hand of enemies, but far less is known about what motivated his killers or how the attack really unfolded.
- Retail (0.71)
- Health & Medicine (0.50)
- Media > Photography (0.50)
Accelerating scientific discovery with the common task framework
Kutz, J. Nathan, Battaglia, Peter, Brenner, Michael, Carlberg, Kevin, Hagberg, Aric, Ho, Shirley, Hoyer, Stephan, Lange, Henning, Lipson, Hod, Mahoney, Michael W., Noe, Frank, Welling, Max, Zanna, Laure, Zhu, Francis, Brunton, Steven L.
Machine learning (ML) and artificial intelligence (AI) algorithms are transforming and empowering the characterization and control of dynamic systems in the engineering, physical, and biological sciences. These emerging modeling paradigms require comparative metrics to evaluate a diverse set of scientific objectives, including forecasting, state reconstruction, generalization, and control, while also considering limited data scenarios and noisy measurements. We introduce a common task framework (CTF) for science and engineering, which features a growing collection of challenge data sets with a diverse set of practical and common objectives. The CTF is a critically enabling technology that has contributed to the rapid advance of ML/AI algorithms in traditional applications such as speech recognition, language processing, and computer vision. There is a critical need for the objective metrics of a CTF to compare the diverse algorithms being rapidly developed and deployed in practice today across science and engineering.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (9 more...)
- Information Technology (0.68)
- Leisure & Entertainment > Games (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
- Energy (0.46)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Robots (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.64)