
Collaborating Authors

Musen, Mark A.


Use of a Structured Knowledge Base Enhances Metadata Curation by Large Language Models

arXiv.org Artificial Intelligence

Metadata play a crucial role in ensuring the findability, accessibility, interoperability, and reusability of datasets. This paper investigates the potential of large language models (LLMs), specifically GPT-4, to improve adherence to metadata standards. We conducted experiments on 200 randomly selected data records describing human samples relating to lung cancer from the NCBI BioSample repository, evaluating GPT-4's ability to suggest edits for adherence to metadata standards. We computed the adherence accuracy of field name-field value pairs through a peer review process, and we observed a marginal average improvement in adherence to the standard data dictionary, from 79% to 80% (p<0.5). We then prompted GPT-4 with domain information in the form of the textual descriptions of CEDAR templates and recorded a significant improvement, from 79% to 97% (p<0.01). These results indicate that, while LLMs may not be able to correct legacy metadata to ensure satisfactory adherence to standards when unaided, they do show promise for use in automated metadata curation when integrated with a structured knowledge base.
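The experimental contrast described above (unaided GPT-4 versus GPT-4 grounded in the textual description of a CEDAR template) can be pictured with a minimal sketch. The record, template text, and prompts below are invented placeholders, not the paper's actual materials or evaluation pipeline:

    # Minimal sketch of the two prompting conditions; all field values,
    # template text, and prompt wording here are hypothetical.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    record = "tissue: Lung cancer; sex: M; age: sixty-five"        # hypothetical BioSample record
    template = ("Field 'tissue' must name an anatomical site (e.g. 'lung'); "
                "field 'sex' must be one of {male, female}; "
                "field 'age' must be an integer number of years.")  # hypothetical CEDAR template text

    def suggest_edits(record, domain_text=None):
        """Ask GPT-4 to propose standards-adherent edits, optionally grounded
        in the textual description of a metadata template."""
        system = "You correct metadata records so that field values adhere to the data dictionary."
        if domain_text:
            system += " Use this template description as the standard: " + domain_text
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": record}],
        )
        return resp.choices[0].message.content

    print(suggest_edits(record))            # unaided condition
    print(suggest_edits(record, template))  # template-informed condition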


Making Metadata More FAIR Using Large Language Models

arXiv.org Artificial Intelligence

As experimental data artifacts proliferate globally, harnessing them in a unified fashion runs into a major stumbling block: bad metadata. To address this problem, this work presents a Natural Language Processing (NLP)-informed application, called FAIRMetaText, that compares metadata. Specifically, FAIRMetaText analyzes the natural-language descriptions of metadata and provides a mathematical similarity measure between two terms. This measure can then be used to analyze varied metadata, suggesting terms for compliance or grouping similar terms to identify replaceable ones. The efficacy of the algorithm is demonstrated qualitatively and quantitatively on publicly available research artifacts, showing large gains across metadata-related tasks in an in-depth study of a wide variety of large language models (LLMs). This software can drastically reduce the human effort of sifting through natural-language metadata when working with multiple experimental datasets on the same topic.
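A minimal sketch of the core idea: embed the natural-language descriptions of two metadata terms and compare them with cosine similarity. The model name and descriptions below are stand-ins, and FAIRMetaText's actual models and scoring differ:

    # Embedding-based similarity between two metadata-term descriptions;
    # the model and example descriptions are assumptions for illustration.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

    desc_a = "age of the donor at the time of sample collection, in years"
    desc_b = "donor_age: number of years since the donor's birth"

    emb = model.encode([desc_a, desc_b])
    score = util.cos_sim(emb[0], emb[1]).item()
    print(f"similarity: {score:.3f}")  # a high score suggests the terms are interchangeable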


An Empirical Meta-analysis of the Life Sciences (Linked?) Open Data on the Web

arXiv.org Artificial Intelligence

While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 publicly available biomedical linked data graphs into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.
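The schema-extraction step can be illustrated with a small SPARQL probe that lists the classes instantiated at a single endpoint. The endpoint URL below is hypothetical, and the study's actual pipeline extracts far more (predicates, mappings, and cross-links across more than 80 graphs):

    # Sketch of a schema probe for one linked-data endpoint: which classes
    # are instantiated, and how often. Endpoint URL is a placeholder.
    from SPARQLWrapper import SPARQLWrapper, JSON

    endpoint = SPARQLWrapper("https://sparql.example.org/sparql")  # hypothetical endpoint
    endpoint.setQuery("""
        SELECT ?class (COUNT(?s) AS ?instances)
        WHERE { ?s a ?class }
        GROUP BY ?class
        ORDER BY DESC(?instances)
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["class"]["value"], row["instances"]["value"])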


WebProtégé: A Cloud-Based Ontology Editor

arXiv.org Artificial Intelligence

We present WebProtégé, a tool to develop ontologies represented in the Web Ontology Language (OWL). WebProtégé is a cloud-based application that allows users to collaboratively edit OWL ontologies, and it is available for use at https://webprotege.stanford.edu. WebProtégé currently hosts more than 68,000 OWL ontology projects and has over 50,000 user accounts. In this paper, we detail the main new features of the latest version of WebProtégé.


The Variable Quality of Metadata About Biological Samples Used in Biomedical Experiments

arXiv.org Artificial Intelligence

We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample, a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples, a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4M sample metadata records in the two repositories are populated with values that fulfill the stated requirements. Our study revealed multiple anomalies in the metadata. Most metadata field names and their values are not standardized or controlled. Even simple binary or numeric fields are often populated with inadequate values of different data types. By clustering metadata field names, we discovered that there are often many distinct ways to represent the same aspect of a sample. Overall, the metadata we analyzed reveal a lack of principled mechanisms to enforce and validate metadata requirements. The significant aberrancies that we found in the metadata are likely to impede search and secondary use of the associated datasets.
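The field-name clustering idea can be sketched in a few lines: normalize names and group the variants that collapse to the same key. The field names and normalization rules below are invented for illustration; the study's actual clustering is more elaborate:

    # Group metadata field-name variants by a normalized key; names and the
    # toy stop list are assumptions, not the study's real data or rules.
    import re
    from collections import defaultdict

    field_names = ["Age", "age", "AGE_years", "age (years)", "donor age",
                   "Sex", "sex", "gender"]

    def normalize(name):
        name = re.sub(r"[^a-z0-9]+", " ", name.lower())  # drop case and punctuation
        stop = {"years", "donor"}                         # toy stop list
        return " ".join(t for t in name.split() if t not in stop)

    clusters = defaultdict(list)
    for name in field_names:
        clusters[normalize(name)].append(name)

    for key, members in clusters.items():
        print(f"{key!r}: {members}")
    # 'age': ['Age', 'age', 'AGE_years', 'age (years)', 'donor age'] ...

Note that pure string normalization leaves synonym pairs such as "sex" and "gender" in separate clusters, one illustration of why many distinct representations of the same sample aspect persist.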


NCBO Ontology Recommender 2.0: An Enhanced Approach for Biomedical Ontology Recommendation

arXiv.org Artificial Intelligence

Biomedical researchers use ontologies to annotate their data with ontology terms, enabling better data integration and interoperability. However, the number, variety, and complexity of current biomedical ontologies make it cumbersome for researchers to determine which ones to reuse for their specific needs. To overcome this problem, in 2010 the National Center for Biomedical Ontology (NCBO) released the Ontology Recommender, a service that receives a biomedical text corpus or a list of keywords and suggests ontologies appropriate for referencing the indicated terms. We developed a new version of the NCBO Ontology Recommender. Called Ontology Recommender 2.0, it uses a new recommendation approach that evaluates the relevance of an ontology to biomedical text data according to four criteria: (1) the extent to which the ontology covers the input data; (2) the acceptance of the ontology in the biomedical community; (3) the level of detail of the ontology classes that cover the input data; and (4) the specialization of the ontology to the domain of the input data. Our evaluation shows that the enhanced recommender provides higher-quality suggestions than the original approach: better coverage of the input data, more detailed information about their concepts, increased specialization for the domain of the input data, and greater acceptance and use in the community. In addition, it provides users with more explanatory information, along with suggestions of not only individual ontologies but also groups of ontologies, and it can be customized to fit the needs of different scenarios. Ontology Recommender 2.0 combines the strengths of its predecessor with a range of adjustments and new features that improve its reliability and usefulness. It recommends over 500 biomedical ontologies from the NCBO BioPortal platform, where it is openly available.
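One plausible way to read the four criteria is as a weighted aggregate per candidate ontology. The weights and per-criterion scores below are hypothetical placeholders, not the recommender's published scoring functions:

    # Sketch of folding the four criteria into one relevance score;
    # weights, candidate names, and scores are invented for illustration.
    CRITERIA = ("coverage", "acceptance", "detail", "specialization")
    WEIGHTS = {"coverage": 0.55, "acceptance": 0.15,
               "detail": 0.15, "specialization": 0.15}  # hypothetical weights

    def relevance(scores):
        """Weighted aggregate of normalized per-criterion scores in [0, 1]."""
        return sum(WEIGHTS[c] * scores[c] for c in CRITERIA)

    candidates = {
        "ONTO_A": {"coverage": 0.9, "acceptance": 0.4, "detail": 0.7, "specialization": 0.8},
        "ONTO_B": {"coverage": 0.6, "acceptance": 0.9, "detail": 0.5, "specialization": 0.3},
    }
    for onto, scores in sorted(candidates.items(), key=lambda kv: -relevance(kv[1])):
        print(onto, round(relevance(scores), 3))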


Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

arXiv.org Artificial Intelligence

Biomedical taxonomies, thesauri, and ontologies, such as the International Classification of Diseases (ICD) taxonomy and the OWL-based National Cancer Institute Thesaurus, play a critical role in acquiring, representing, and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also significantly increased in size. For example, the 11th revision of the ICD, which is currently under active development by the WHO, contains nearly 50,000 classes representing a vast variety of diseases and causes of death. This growth in size was accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners, and other stakeholders. Understanding how these stakeholders collaborate will enable us to improve the editing environments that support such collaborations. We uncover how large ontology-engineering projects, such as the ICD in its 11th revision, unfold by using Markov chains to analyze the usage logs of five biomedical ontology-engineering projects of varying sizes and scopes. We discover intriguing interaction patterns (e.g., which properties users subsequently change) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences between projects that have implications for project managers, ontology editors, developers, and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain.
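The analytical core, a first-order Markov chain over an edit log, can be sketched as counting consecutive action pairs and row-normalizing the counts into transition probabilities. The action names below are invented:

    # Estimate first-order Markov transition probabilities from a toy edit log.
    from collections import Counter, defaultdict

    log = ["edit_title", "edit_definition", "edit_definition",
           "add_synonym", "edit_title", "edit_definition"]  # invented usage log

    pair_counts = Counter(zip(log, log[1:]))   # consecutive action pairs
    totals = defaultdict(int)
    for (src, _), n in pair_counts.items():
        totals[src] += n

    # P(dst | src) = count(src -> dst) / count(src -> *)
    P = {(src, dst): n / totals[src] for (src, dst), n in pair_counts.items()}
    for (src, dst), p in sorted(P.items()):
        print(f"P({dst} | {src}) = {p:.2f}")

Patterns in such transition matrices (e.g., which property users change next) are what the analysis compares across the five projects.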


Ontology Quality Assurance with the Crowd

AAAI Conferences

The Semantic Web has the potential to change the Web as we know it. However, the community faces a significant challenge in managing, aggregating, and curating the massive amount of data and knowledge involved. Human computation is only beginning to serve an essential role in the curation of these Web-based data. Ontologies, which facilitate data integration and search, serve as a central component of the Semantic Web, but they are large and complex and typically require extensive expert curation. Furthermore, ontology-engineering tasks require more knowledge than a typical crowdsourcing task does. We have developed ontology-engineering methods that leverage the crowd. In this work, we describe our general crowdsourcing workflow. We then highlight our work on applying this workflow to ontology verification and quality assurance. In a pilot study, this method approaches expert ability, finding the same errors that experts identified with 86% accuracy, in a faster and more scalable fashion. The work provides a general framework with which to develop crowdsourcing methods for the Semantic Web. In addition, it highlights opportunities for future research in human computation and crowdsourcing.
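The aggregation step of such a workflow can be sketched as majority voting over worker judgments on microtasks, one per ontology assertion. The assertions and ballots below are invented examples, not the pilot study's data:

    # Majority-vote aggregation over crowdsourced verification microtasks;
    # assertions and worker ballots are hypothetical.
    from collections import Counter

    votes = {
        "is-a(appendicitis, inflammatory disease)": ["yes", "yes", "yes", "no", "yes"],
        "is-a(femur, organ)":                       ["no", "no", "yes", "no", "no"],
    }

    for assertion, ballots in votes.items():
        verdict, count = Counter(ballots).most_common(1)[0]
        agreement = count / len(ballots)
        print(f"{assertion}: {verdict} (agreement {agreement:.0%})")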


Graph-Grammar Assistance for Automated Generation of Influence Diagrams

arXiv.org Artificial Intelligence

One of the most difficult aspects of modeling complex dilemmas in decision-analytic terms is composing a diagram of relevance relations from a set of domain concepts. Decision models in domains such as medicine, however, exhibit certain prototypical patterns that can guide the modeling process. Medical concepts can be classified according to semantic types that have characteristic positions and typical roles in an influence-diagram model. We have developed a graph-grammar production system that uses such inherent interrelationships among medical terms to facilitate the modeling of medical decisions.
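A graph-grammar production of this kind can be sketched as a rule table keyed on the semantic types of concept pairs, with each matching rule proposing an arc in the influence diagram. The types, rules, and concepts below are illustrative only, not the system's actual grammar:

    # Toy production system: semantic-type pairs trigger rules that add arcs.
    RULES = {
        ("disease", "finding"):   "disease node influences finding node",
        ("treatment", "disease"): "treatment decision influences disease node",
    }  # invented productions

    concepts = [("pneumonia", "disease"), ("fever", "finding"),
                ("antibiotics", "treatment")]  # invented domain concepts

    # Apply each production to every ordered pair whose semantic types match.
    for a, type_a in concepts:
        for b, type_b in concepts:
            rule = RULES.get((type_a, type_b))
            if rule:
                print(f"{a} -> {b}  ({rule})")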


Pragmatic Analysis of Crowd-Based Knowledge Production Systems with iCAT Analytics: Visualizing Changes to the ICD-11 Ontology

AAAI Conferences

While taxonomic and ontological knowledge was traditionally produced by small groups of co-located experts, today the production of such knowledge has a radically different shape and form. For example, potentially thousands of health professionals, scientists, and ontology experts will collaboratively construct, evaluate, and maintain the most recent version of the International Classification of Diseases (ICD-11), a large ontology of diseases and causes of death managed by the World Health Organization. In this work, we present a novel web-based tool, iCAT Analytics, that supports systematic investigation of crowd-based processes in knowledge-production systems. To enable such investigation, the tool supports interactive exploration of pragmatic aspects of ontology engineering, such as how a given ontology evolved and the nature of the changes, discussions, and interactions that took place during its production. While iCAT Analytics was motivated by ICD-11, it could potentially be applied to any crowd-based ontology-engineering project. We give an introduction to the features of iCAT Analytics and present some insights specifically for ICD-11.