 Moldova




A Diagnosis and Treatment of Liver Diseases: Integrating Batch Processing, Rule-Based Event Detection and Explainable Artificial Intelligence

arXiv.org Artificial Intelligence

Liver diseases pose a significant global health burden, affecting many individuals and carrying substantial economic and social consequences. The rising incidence of liver problems is considered a fatal threat in many countries, such as Egypt and Moldova. This study aims to develop a diagnosis and treatment model for liver disease using Basic Formal Ontology (BFO), Patient Clinical Data (PCD) ontology, and detection rules derived from a decision tree algorithm. The ontology was developed using the National Viral Hepatitis Control Program (NVHCP) guidelines, which made it more accurate and reliable. The Apache Jena framework uses batch processing to detect events based on these rules. Based on the event detected, queries can be processed directly using SPARQL. We convert these Decision Tree (DT) and medical guideline-based rules into Semantic Web Rule Language (SWRL) to operationalize the ontology. We use these SWRL rules in the ontology to predict different types of liver disease with the help of the Pellet and Drools inference engines in Protege Tools, drawing on a total of 615 records covering different liver diseases. After the rules are inferred, a result can be generated for the patient, and other patient-related details, along with different precautionary suggestions, can be obtained based on this result. Explainable Artificial Intelligence (XAI), with open API-based suggestions, can make the suggestions produced by these rules more accurate. When the patient has been prescribed a medical test, the model incorporates the test result using optical character recognition (OCR), and the same process applies when the patient has been prescribed a further medical suggestion according to the test report. These models combine to form a comprehensive Decision Support System (DSS) for the diagnosis of liver disease.
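As a rough illustration of the step in which decision-tree rules are converted into SWRL, the sketch below turns one root-to-leaf path into an SWRL-style rule string. It is not the authors' implementation: the class `Condition`, the function `path_to_swrl`, the property names `hasALT` and `hasAST`, the predicate `hasDiagnosis`, and the thresholds are all hypothetical placeholders, and the numbers carry no clinical meaning. Only the `swrlb:` comparison built-ins are taken from the SWRL built-in vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    """One split along a decision-tree path (all names are hypothetical)."""
    feature: str      # ontology data property, e.g. "hasALT" (placeholder)
    op: str           # "gt" for >, "le" for <=
    threshold: float  # placeholder value, NOT a clinical cut-off


# Map the split direction to an SWRL comparison built-in.
_BUILTINS = {"gt": "swrlb:greaterThan", "le": "swrlb:lessThanOrEqual"}


def path_to_swrl(conditions, diagnosis):
    """Render a root-to-leaf path as a human-readable SWRL-style rule string."""
    atoms = ["Patient(?p)"]
    for i, cond in enumerate(conditions):
        var = f"?v{i}"
        # Bind the patient's value for this property, then constrain it.
        atoms.append(f"{cond.feature}(?p, {var})")
        atoms.append(f"{_BUILTINS[cond.op]}({var}, {cond.threshold})")
    return " ^ ".join(atoms) + f" -> hasDiagnosis(?p, {diagnosis})"


if __name__ == "__main__":
    path = [Condition("hasALT", "gt", 1.5), Condition("hasAST", "le", 2.0)]
    print(path_to_swrl(path, "ExampleDisease"))
```

A rule string of this shape could then be loaded into an ontology editor and evaluated by a reasoner; how the paper actually encodes its rules is not specified in the abstract.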


Counterfactual Memorization in Neural Language Models

Chiyuan Zhang, Daphne Ippolito, Katherine Lee; Google Research, Carnegie Mellon University, Google DeepMind

Neural Information Processing Systems

Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real-world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other repeated data. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We estimate the influence of each memorized training example on the validation set and on generated texts, showing how this can provide direct evidence of the source of memorization at test time.
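One way to make the counterfactual notion concrete is a minimal sketch that assumes several models have been trained on different random subsets of the data and that each model's per-example score (for instance, accuracy on that document) has been recorded. The estimate for a document is then the mean score from models that saw it minus the mean score from models that did not. The function name `counterfactual_memorization`, the `runs` data layout, and the toy scores are assumptions made for illustration; the abstract does not specify the estimator.

```python
from statistics import mean


def counterfactual_memorization(example_id, runs):
    """Estimate how much a model's score on `example_id` depends on training on it.

    `runs` is a list of (training_ids, scores) pairs, one per trained model:
      - training_ids: set of example ids that model was trained on
      - scores: dict mapping example id -> that model's score on the example
    Returns mean(score | example in training set) - mean(score | example held out).
    """
    in_scores = [scores[example_id] for ids, scores in runs if example_id in ids]
    out_scores = [scores[example_id] for ids, scores in runs if example_id not in ids]
    if not in_scores or not out_scores:
        raise ValueError("need at least one run with and one run without the example")
    return mean(in_scores) - mean(out_scores)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    runs = [
        ({"a", "b"}, {"a": 0.9, "b": 0.8, "c": 0.2}),
        ({"b", "c"}, {"a": 0.3, "b": 0.8, "c": 0.7}),
        ({"a", "c"}, {"a": 0.7, "b": 0.75, "c": 0.9}),
    ]
    for ex in ("a", "b", "c"):
        print(ex, round(counterfactual_memorization(ex, runs), 3))
```

Under this reading, a frequently repeated or easily predicted text scores well whether or not it was in the training subset, so its estimate stays near zero, which is how the measure separates out "common" memorization.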



RoLargeSum: A Large Dialect-Aware Romanian News Dataset for Summary, Headline, and Keyword Generation

arXiv.org Artificial Intelligence

Using supervised automatic summarization methods requires sufficient corpora that include pairs of documents and their summaries. As with many tasks in natural language processing, most of the datasets available for summarization are in English, posing challenges for developing summarization models in other languages. Thus, in this work, we introduce RoLargeSum, a novel large-scale summarization dataset for the Romanian language, crawled from various publicly available news websites in Romania and the Republic of Moldova and thoroughly cleaned to ensure a high-quality standard. RoLargeSum contains more than 615K news articles, together with their summaries, as well as their headlines, keywords, dialect, and other metadata that we found on the targeted websites. We further evaluated the performance of several BART variants and open-source large language models on RoLargeSum for benchmarking purposes. We manually evaluated the results of the best-performing system to gain insight into the potential pitfalls of this dataset and directions for future development.


Unification of Balti and trans-border sister dialects in the essence of LLMs and AI Technology

arXiv.org Artificial Intelligence

The Balti language belongs to the Sino-Tibetan family, specifically its Tibeto-Burman branch. It is understood, with variations, across populations in India, China, Pakistan, Nepal, Tibet, Burma, and Bhutan, where local cultures have shaped it into various dialects. Given the diverse cultural, socio-political, religious, and geographical influences, it is vital to move toward unifying the dialects on the basis of common roots, lexica, and phonological perspectives. In an era of globalization and increasingly frequent developments in AI technology, understanding this diversity and the efforts at dialect unification is important for understanding commonalities and narrowing the gaps created by unavoidable circumstances. This article examines how artificial intelligence (AI), in the form of Large Language Models (LLMs), can assist in analyzing, documenting, and standardizing the endangered Balti language, based on the efforts made in different dialects so far.


Moldova's Russia-backed Transnistria region claims drone attacked military unit

FOX News

State security services in Moldova's Russia-backed breakaway region of Transnistria said Friday that a drone attacked a military unit close to the border with Ukraine, causing minor damage to a radar station but no casualties. The incident occurred in the region of Rabnita, about 4 miles from the Ukraine border, the region's state security ministry said, adding that a criminal investigation has been opened. They did not say who they thought was behind the alleged attack.


CP Regulatory Data Manager Romania and Moldova

#artificialintelligence

Find open roles in Artificial Intelligence (AI), Machine Learning (ML), Natural Language Processing (NLP), Computer Vision (CV), Data Engineering, Data Analytics, Big Data, and Data Science in general, filtered by job title or popular skill, toolset and products used.


Counterfactual Memorization in Neural Language Models

arXiv.org Artificial Intelligence

Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data. As models continue to scale up in parameters, training data, and compute, understanding memorization in language models is both important from a learning-theoretical point of view and practically crucial in real-world applications. An open question in previous studies of memorization in language models is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing "common" memorization such as familiar phrases, public knowledge, or templated texts. In this paper, we provide a principled perspective inspired by a taxonomy of human memory in psychology. From this perspective, we formulate a notion of counterfactual memorization, which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We further estimate the influence of each training example on the validation set and on generated texts, and show that this can provide direct evidence of the source of memorization at test time.