Machine Understanding of Scientific Language
–arXiv.org Artificial Intelligence
Scientific information expresses human understanding of nature. This knowledge is largely disseminated in different forms of text, including scientific papers, news articles, and discourse among people on social media. While important for accelerating our pursuit of knowledge, not all scientific text is faithful to the underlying science. As the volume of this text has burgeoned online in recent years, it has become a problem of societal importance to be able to identify the faithfulness of a given piece of scientific text automatically. This thesis is concerned with the cultivation of datasets, methods, and tools for machine understanding of scientific language, in order to analyze and understand science communication at scale. To this end, I present several contributions in three areas of natural language processing and machine learning: automatic fact checking, learning with limited data, and scientific text processing. These contributions include new methods and resources for identifying check-worthy claims, adversarial claim generation, multi-source domain adaptation, learning from crowd-sourced labels, cite-worthiness detection, zero-shot scientific fact checking, detecting exaggerated scientific claims, and modeling degrees of information change in science communication. Critically, I demonstrate how the research outputs of this thesis are useful for effectively learning from limited amounts of scientific text in order to identify misinformative scientific statements and generate new insights into the science communication process.
Jul-1-2025
- Country:
- Africa
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Uganda (0.04)
- Asia
- China > Hong Kong (0.04)
- Japan (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Turkmenistan > Aspheron Ridge (0.04)
- Europe
- Czechia > Prague (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Mediterranean Sea (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Middle East > Cyprus (0.04)
- Italy
- France
- Occitanie > Hérault
- Montpellier (0.04)
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- United Kingdom > England
- Hampshire > Southampton (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- Netherlands (0.04)
- Germany > Berlin (0.04)
- Bulgaria > Varna Province
- Varna (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.14)
- Quebec > Montreal (0.04)
- Dominican Republic (0.04)
- United States
- Colorado > Boulder County
- Boulder (0.04)
- Alaska > Anchorage Municipality
- Anchorage (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Massachusetts > Hampshire County
- Amherst (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Michigan (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- California > San Francisco County
- San Francisco (0.13)
- Maryland > Baltimore (0.14)
- Arkansas (0.04)
- Wyoming > Campbell County (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.13)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Victoria > Melbourne (0.04)
- South America > Colombia
- Meta Department > Villavicencio (0.04)
- Genre:
- Instructional Material (0.92)
- Overview (1.00)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Promising Solution (0.92)
- Industry:
- Transportation (0.67)
- Education (1.00)
- Media > News (1.00)
- Health & Medicine
- Consumer Health (1.00)
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Cardiology/Vascular Diseases (0.92)
- Endocrinology > Diabetes (0.67)
- Energy (0.67)
- Information Technology (0.67)
- Leisure & Entertainment (0.67)
- Consumer Products & Services (0.67)
- Government > Regional Government
- Technology:
- Information Technology
- Artificial Intelligence
- Machine Learning
- Inductive Learning (1.00)
- Learning Graphical Models > Directed Networks
- Bayesian Learning (0.92)
- Neural Networks > Deep Learning (1.00)
- Statistical Learning (1.00)
- Natural Language
- Grammars & Parsing (0.92)
- Large Language Model (0.88)
- Text Processing (1.00)
- Representation & Reasoning > Uncertainty
- Bayesian Inference (0.67)
- Communications > Social Media (1.00)