Collaborating Authors

 Yue, Songhui


A Data-to-Product Multimodal Conceptual Framework to Achieve Automated Software Evolution for Context-rich Intelligent Applications

arXiv.org Artificial Intelligence

With the advancement of Artificial Intelligence (AI) and Natural Language Processing (NLP) over the past decades, especially the rise of Large Language Models (LLMs) and multimodal learning, the software engineering field has welcomed AI techniques into every aspect of the software life cycle. Meanwhile, research on intelligent applications has remained a hotspot (Zhao et al., 2021) because of the increasing amount of multimodal data generated in various domains. This type of software is designed to adapt to constantly changing, context-rich scenarios (Zhao et al., 2021; Yue and Smith, 2021), and some examples are listed in part C of Figure 1. One primary characteristic of these applications is that a large portion of their system behavior is learned from continuous interaction with users and the environment, involving the detection and analysis of states and activities (Tzafestas, 2012; Yang and Newman, 2013; Cassavia et al., 2017), unlike banking or insurance applications, whose business logic is more mature and stable. The rapid evolution of hardware and software wheels brings more capabilities to intelligent applications while making the creation and maintenance of that software more intricate (Chu et al., 2021; Zheng et al., 2023). Both software engineering and intelligent applications are therefore eager for breakthroughs in higher-level automation (HLA), collaboratively resolving these challenges by benefiting from AI techniques.


Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature

arXiv.org Artificial Intelligence

The volume of published biomedical information has increased rapidly and continues to do so. Recent advances in Natural Language Processing (NLP) have generated considerable interest in automating the extraction, normalization, and representation of biomedical knowledge about entities such as genes and diseases. Our study analyzes germline abstracts to construct knowledge graphs from the immense body of work that has been done in this area for genes and diseases. This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases. For the extraction of genes and diseases, we employ BioBERT, a BERT model pre-trained on biomedical corpora. We propose an ontology-based and rule-based algorithm to standardize and disambiguate medical terms. For semantic relationships between articles, genes, and diseases, we implemented a part-whole relation approach to connect each entity with its data source and visualize them in a graph-based knowledge representation. Lastly, we discuss the knowledge graph's applications, limitations, and challenges to inspire future research on germline corpora. Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples. Graph-based visualizations are used to show the results.
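
As a rough illustration of the part-whole idea described in this abstract, the sketch below links each extracted entity to the article it came from. It is not the SimpleGermKG implementation: the article IDs, entity names, the `part_of` predicate, and the function names are all hypothetical, and the extraction step (BioBERT plus normalization) is assumed to have already produced the input dictionary.

```python
# Hypothetical output of an upstream extraction and normalization step:
# article ID -> entities found in that article's abstract.
# These IDs and names are illustrative only, not data from the paper.
extracted = {
    "article_1": {"genes": ["BRCA1"], "diseases": ["breast cancer"]},
    "article_2": {"genes": ["BRCA1", "TP53"], "diseases": ["ovarian cancer"]},
}


def build_triples(extracted):
    """Return sorted (entity, 'part_of', article) triples.

    Each entity is connected to its data source (the article), which is
    one simple way to encode a part-whole relation.
    """
    triples = set()
    for article, entities in extracted.items():
        for entity in entities["genes"] + entities["diseases"]:
            triples.add((entity, "part_of", article))
    return sorted(triples)


def articles_mentioning(entity, triples):
    """Return the sorted list of articles that a given entity is part of."""
    return sorted(obj for subj, pred, obj in triples
                  if subj == entity and pred == "part_of")


triples = build_triples(extracted)
brca1_sources = articles_mentioning("BRCA1", triples)
```

Because every triple keeps a pointer to its source article, a gene and a disease that share an article can later be related through that article without storing a direct gene-disease edge.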


CSM-H-R: A Context Modeling Framework in Supporting Reasoning Automation for Interoperable Intelligent Systems and Privacy Protection

arXiv.org Artificial Intelligence

Automation of High-Level Context (HLC) reasoning for intelligent systems at scale is imperative due to the unceasing accumulation of contextual data in the IoT era, the trend toward fusing data from multiple sources, and the intrinsic complexity and dynamism of the context-based decision-making process. To mitigate this issue, we propose an automatic context reasoning framework, CSM-H-R, which programmatically combines ontologies and states at runtime and in the model-storage phase to recognize meaningful HLC. The resulting data representation can be applied to different reasoning techniques. Case studies are developed based on an intelligent elevator system in a smart campus setting. An implementation of the framework, a CSM Engine, and experiments that translate HLC reasoning into vector and matrix computing pay particular attention to the dynamic aspects of context and show the potential of using advanced mathematical and probabilistic models to achieve the next level of automation in integrating intelligent systems; meanwhile, privacy protection is supported by anonymization through label embedding and by reducing information correlation. The code of this study is available at: https://github.com/songhui01/CSM-H-R.


Using Twitter Data to Determine Hurricane Category: An Experiment

arXiv.org Artificial Intelligence

Social media posts contain an abundance of information about public opinion on major events, especially natural disasters such as hurricanes. Posts related to an event are usually published by users who live near the place of the event at the time of the event. Special correlations between social media data and events can be obtained using data mining approaches. This paper presents research work to find the mappings between social media data and the severity level of a disaster. Specifically, we investigated the Twitter data posted during hurricanes Harvey and Irma and attempted to find the correlation between the Twitter data of a specific area and the hurricane level in that area. Our experimental results indicate a positive correlation between them. We also present a method to predict the hurricane category for a specific area using relevant Twitter data.
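
To make the notion of a positive correlation concrete, here is a minimal sketch that computes a Pearson correlation coefficient between per-area tweet counts and hurricane categories. The abstract does not say which Twitter features or which correlation measure the paper uses, so both the choice of Pearson's r and the numbers below are assumptions made purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values, one entry per area: number of hurricane-related tweets
# and the hurricane category observed in that area. Not data from the paper.
tweet_counts = [120, 340, 560, 910]
categories = [1, 2, 3, 4]

r = pearson(tweet_counts, categories)
```

A value of `r` close to 1 would indicate that areas with more hurricane-related tweets tend to have experienced a higher category; a value near 0 would indicate no linear relationship.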