Scientific Discovery
MA-COIR: Leveraging Semantic Search Index and Generative Models for Ontology-Driven Biomedical Concept Recognition
Liu, Shanshan, Nishida, Noriki, Munne, Rumana Ferdous, Tokunaga, Narumi, Yamagata, Yuki, Kozaki, Kouji, Matsumoto, Yuji
Recognizing biomedical concepts in the text is vital for ontology refinement, knowledge graph construction, and concept relationship discovery. However, traditional concept recognition methods, relying on explicit mention identification, often fail to capture complex concepts not explicitly stated in the text. To overcome this limitation, we introduce MA-COIR, a framework that reformulates concept recognition as an indexing-recognition task. By assigning semantic search indexes (ssIDs) to concepts, MA-COIR resolves ambiguities in ontology entries and enhances recognition efficiency. Using a pretrained BART-based model fine-tuned on small datasets, our approach reduces computational requirements to facilitate adoption by domain experts. Furthermore, we incorporate large language models (LLMs)-generated queries and synthetic data to improve recognition in low-resource settings. Experimental results on three scenarios (CDR, HPO, and HOIP) highlight the effectiveness of MA-COIR in recognizing both explicit and implicit concepts without the need for mention-level annotations during inference, advancing ontology-driven concept recognition in biomedical domain applications. Our code and constructed data are available at https://github.com/sl-633/macoir-master.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- (3 more...)
- Media > News (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- (3 more...)
Unlearning as Ablation: Toward a Falsifiable Benchmark for Generative Scientific Discovery
Bold claims about AI's role in science-from "AGI will cure all diseases" to promises of radically accelerated discovery-raise a central epistemic question: do large language models (LLMs) truly generate new knowledge, or do they merely remix memorized fragments? We propose unlearning-as-ablation as a falsifiable probe of constructive scientific discovery. The idea is to systematically remove a target result together with its forget-closure (supporting lemmas, paraphrases, and multi-hop entailments) and then evaluate whether the model can re-derive the result from only permitted axioms and tools. Success would indicate generative capability beyond recall; failure would expose current limits. Unlike prevailing motivations for unlearning-privacy, copyright, or safety-our framing repositions it as an epistemic probe for AI-for-Science. We outline a minimal pilot in mathematics and algorithms to illustrate feasibility, and sketch how the same approach could later be extended to domains such as physics or chemistry. This is a position paper: our contribution is conceptual and methodological, not empirical. We aim to stimulate discussion on how principled ablation tests could help distinguish models that reconstruct knowledge from those that merely retrieve it, and how such probes might guide the next generation of AI-for-Science benchmarks.
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Asia > Singapore (0.04)
- (2 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.62)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.57)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > France (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.42)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.41)
Active Measurement: Efficient Estimation at Scale
Hamilton, Max, Lai, Jinlin, Zhao, Wenlong, Maji, Subhransu, Sheldon, Daniel
AI has the potential to transform scientific discovery by analyzing vast datasets with little human effort. However, current workflows often do not provide the accuracy or statistical guarantees that are needed. We introduce active measurement, a human-in-the-loop AI framework for scientific measurement. An AI model is used to predict measurements for individual units, which are then sampled for human labeling using importance sampling. With each new set of human labels, the AI model is improved and an unbiased Monte Carlo estimate of the total measurement is refined. Active measurement can provide precise estimates even with an imperfect AI model, and requires little human effort when the AI model is very accurate. We derive novel estimators, weighting schemes, and confidence intervals, and show that active measurement reduces estimation error compared to alternatives in several measurement tasks.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Austria (0.04)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (0.46)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.34)
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.52)
Nonzero-sum Adversarial Hypothesis Testing Games
Sarath Yasodharan, Patrick Loiseau
We study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian as well as the Neyman-Pearson frameworks. We first show that these games admit mixed strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria.
- Information Technology > Security & Privacy (1.00)
- Leisure & Entertainment > Games (0.88)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.65)
work is the first analysis of a nonzero-sum adversarial hypothesis testing model bridging multiple areas and is meant to
We thank the reviewers for their positive feedback and valuable suggestions. NE and we hope that our work is a first step in that direction. We have indeed not looked into the problem of computing a NE in our game. In the numerical example of Section 5 in one dimension (see Appendix C.1 for As the reviewers pointed out, we agree that this is in fact a strong assumption. Lemma A.6 may always hold.
- Information Technology > Game Theory (0.76)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.41)
- North America > United States > Georgia > Fulton County > Atlanta (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.52)
A unified framework for bandit multiple testing
In bandit multiple hypothesis testing, each arm corresponds to a different null hypothesis that we wish to test, and the goal is to design adaptive algorithms that correctly identify large set of interesting arms (true discoveries), while only mistakenly identifying a few uninteresting ones (false discoveries).
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Data Science > Data Mining > Big Data (0.47)