AITopics | Schlegel, Viktor

Collaborating Authors

Schlegel, Viktor

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead

Schlegel, Viktor, Bharath, Anil A, Zhao, Zilong, Yee, Kevin

arXiv.org Artificial IntelligenceMar-26-2025

Privacy-preserving synthetic data offers a promising solution to harness segregated data in high-stakes domains where information is compartmentalized for regulatory, privacy, or institutional reasons. This survey provides a comprehensive framework for understanding the landscape of privacy-preserving synthetic data, presenting the theoretical foundations of generative models and differential privacy followed by a review of state-of-the-art methods across tabular data, images, and text. Our synthesis of evaluation approaches highlights the fundamental trade-off between utility for down-stream tasks and privacy guarantees, while identifying critical research gaps: the lack of realistic benchmarks representing specialized domains and insufficient empirical evaluations required to contextualise formal guarantees. Through empirical analysis of four leading methods on five real-world datasets from specialized domains, we demonstrate significant performance degradation under realistic privacy constraints ($\epsilon \leq 4$), revealing a substantial gap between results reported on general domain benchmarks and performance on domain-specific data. %Our findings highlight key challenges including unaccounted privacy leakage, insufficient empirical verification of formal guarantees, and a critical deficit of realistic benchmarks. These challenges underscore the need for robust evaluation frameworks, standardized benchmarks for specialized domains, and improved techniques to address the unique requirements of privacy-sensitive fields such that this technology can deliver on its considerable potential.

knowledge management, large language model, machine learning, (24 more...)

arXiv.org Artificial Intelligence

2503.20846

Country: North America > United States (0.93)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology (0.92)
(4 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Knowledge Management (1.00)
Information Technology > Information Management (1.00)
(9 more...)

Add feedback

BRIDGE: Bootstrapping Text to Control Time-Series Generation via Multi-Agent Iterative Optimization and Diffusion Modelling

Li, Hao, Huang, Yu-Hao, Xu, Chang, Schlegel, Viktor, Jiang, Ren-He, Batista-Navarro, Riza, Nenadic, Goran, Bian, Jiang

arXiv.org Artificial IntelligenceMar-5-2025

For example, realistic Time-series Generation (TSG) is a prominent synthetic medical electrocardiogram (ECG) patterns research area with broad applications in simulations, can be used to train medical residents (Hong & Chun, 2023), data augmentation, and counterfactual while simulating regional electricity usage can be used to analysis. While existing methods have shown stress test the power grid (Westgaard et al., 2021). Although promise in unconditional single-domain TSG, some remarkable works (Huang & Deng, 2023; Bao et al., real-world applications demand for cross-domain 2024) have been done for TSG, showing promising results approaches capable of controlled generation tailored in generating realistic and coherent time series (TS), most to domain-specific constraints and instancelevel of them focus on the basic setting--unconditional single requirements. In this paper, we argue that domain generation. However, in real application scenarios, text can provide semantic insights, domain information there are specific constraints or requirements for the generated and instance-specific temporal patterns, TS to be met, such as specifying domain-specific characteristics, to guide and improve TSG. We introduce "Text-incorporating prior knowledge (Yuan & Qiao, Controlled TSG", a task focused on generating realistic 2024), or satisfying operational constraints (Coletta et al., time series by incorporating textual descriptions.

data mining, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.02445

Country:

Asia (0.67)
North America > United States > New York (0.14)

Genre: Research Report > New Finding (0.92)

Industry:

Transportation > Passenger (0.93)
Energy > Power Industry (0.65)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Pay Attention to Real World Perturbations! Natural Robustness Evaluation in Machine Reading Comprehension

Wu, Yulong, Schlegel, Viktor, Batista-Navarro, Riza

arXiv.org Artificial IntelligenceFeb-23-2025

As neural language models achieve human-comparable performance on Machine Reading Comprehension (MRC) and see widespread adoption, ensuring their robustness in real-world scenarios has become increasingly important. Current robustness evaluation research, though, primarily develops synthetic perturbation methods, leaving unclear how well they reflect real life scenarios. Considering this, we present a framework to automatically examine MRC models on naturally occurring textual perturbations, by replacing paragraph in MRC benchmarks with their counterparts based on available Wikipedia edit history. Such perturbation type is natural as its design does not stem from an arteficial generative process, inherently distinct from the previously investigated synthetic approaches. In a large-scale study encompassing SQUAD datasets and various model architectures we observe that natural perturbations result in performance degradation in pre-trained encoder language models. More worryingly, these state-of-the-art Flan-T5 and Large Language Models (LLMs) inherit these errors. Further experiments demonstrate that our findings generalise to natural perturbations found in other more challenging MRC benchmarks. In an effort to mitigate these errors, we show that it is possible to improve the robustness to natural perturbations by training on naturally or synthetically perturbed examples, though a noticeable gap still remains compared to performance on unperturbed data.

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.16523

Country:

Europe (1.00)
Asia > Middle East (0.92)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.87)

Industry:

Information Technology (1.00)
Government (1.00)
Banking & Finance (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning to Generate and Evaluate Fact-checking Explanations with Transformers

Feher, Darius, Khered, Abdullah, Zhang, Hao, Batista-Navarro, Riza, Schlegel, Viktor

arXiv.org Artificial IntelligenceOct-21-2024

In an era increasingly dominated by digital platforms, the spread of misinformation poses a significant challenge, highlighting the need for solutions capable of assessing information veracity. Our research contributes to the field of Explainable Artificial Antelligence (XAI) by developing transformer-based fact-checking models that contextualise and justify their decisions by generating human-accessible explanations. Importantly, we also develop models for automatic evaluation of explanations for fact-checking verdicts across different dimensions such as \texttt{(self)-contradiction}, \texttt{hallucination}, \texttt{convincingness} and \texttt{overall quality}. By introducing human-centred evaluation methods and developing specialised datasets, we emphasise the need for aligning Artificial Intelligence (AI)-generated explanations with human judgements. This approach not only advances theoretical knowledge in XAI but also holds practical implications by enhancing the transparency, reliability and users' trust in AI-driven fact-checking systems. Furthermore, the development of our metric learning models is a first step towards potentially increasing efficiency and reducing reliance on extensive manual assessment. Based on experimental results, our best performing generative model \textsc{ROUGE-1} score of 47.77, demonstrating superior performance in generating fact-checking explanations, particularly when provided with high-quality evidence. Additionally, the best performing metric learning model showed a moderately strong correlation with human judgements on objective dimensions such as \texttt{(self)-contradiction and \texttt{hallucination}, achieving a Matthews Correlation Coefficient (MCC) of around 0.7.}

explanation, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2410.15669

Country:

North America (1.00)
Europe > United Kingdom > England (0.46)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government (1.00)
Media > News (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Representation Learning of Structured Data for Medical Foundation Models

Dwivedi, Vijay Prakash, Schlegel, Viktor, Liu, Andy T., Nguyen, Thanh-Tung, Kashyap, Abhinav Ramesh, Wei, Jeng, Yin, Wei-Hsian, Winkler, Stefan, Tan, Robby T.

arXiv.org Artificial IntelligenceOct-17-2024

Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing medical codes due to the shortcomings of current tokenization methods. As a result, we introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data, which addresses these challenges by adapting subword tokenization techniques specifically for the structured medical codes. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records. Trained on over 1 billion tokens on the internal medical database, the proposed model achieves up to a 23% improvement in evaluation metrics, with around 2% gain attributed to our proposed tokenization. Additionally, when evaluated on the EHRSHOT public benchmark with a 1/1000 fraction of the pre-training data, the UniStruct model improves performance on over 42% of the downstream tasks. Our approach not only enhances the representation and generalization capabilities of patient-centric models but also bridges a critical gap in representation learning models' ability to handle complex structured medical data, alongside unstructured text.

large language model, machine learning, medical code, (17 more...)

arXiv.org Artificial Intelligence

2410.13351

Country: Asia (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.89)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Which Side Are You On? A Multi-task Dataset for End-to-End Argument Summarisation and Evaluation

Li, Hao, Wu, Yuping, Schlegel, Viktor, Batista-Navarro, Riza, Madusanka, Tharindu, Zahid, Iqra, Zeng, Jiayan, Wang, Xiaochi, He, Xinran, Li, Yizhi, Nenadic, Goran

arXiv.org Artificial IntelligenceJun-6-2024

With the recent advances of large language models (LLMs), it is no longer infeasible to build an automated debate system that helps people to synthesise persuasive arguments. Previous work attempted this task by integrating multiple components. In our work, we introduce an argument mining dataset that captures the end-to-end process of preparing an argumentative essay for a debate, which covers the tasks of claim and evidence identification (Task 1 ED), evidence convincingness ranking (Task 2 ECR), argumentative essay summarisation and human preference ranking (Task 3 ASR) and metric learning for automated evaluation of resulting essays, based on human feedback along argument quality dimensions (Task 4 SQE). Our dataset contains 14k examples of claims that are fully annotated with the various properties supporting the aforementioned tasks. We evaluate multiple generative baselines for each of these tasks, including representative LLMs. We find, that while they show promising results on individual tasks in our benchmark, their end-to-end performance on all four tasks in succession deteriorates significantly, both in automated measures as well as in human-centred evaluation. This challenge presented by our proposed dataset motivates future research on end-to-end argument mining and summarisation. The repository of this project is available at https://github.com/HarrywillDr/ArgSum-Datatset

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2406.03151

Country:

North America > United States (0.67)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering

Subramanian, Anand, Schlegel, Viktor, Kashyap, Abhinav Ramesh, Nguyen, Thanh-Tung, Dwivedi, Vijay Prakash, Winkler, Stefan

arXiv.org Artificial IntelligenceJun-5-2024

There is vivid research on adapting Large Language Models (LLMs) to perform a variety of tasks in high-stakes domains such as healthcare. Despite their popularity, there is a lack of understanding of the extent and contributing factors that allow LLMs to recall relevant knowledge and combine it with presented information in the clinical and biomedical domain: a fundamental pre-requisite for success on down-stream tasks. Addressing this gap, we use Multiple Choice and Abstractive Question Answering to conduct a large-scale empirical study on 22 datasets in three generalist and three specialist biomedical sub-domains. Our multifaceted analysis of the performance of 15 LLMs, further broken down by sub-domain, source of knowledge and model architecture, uncovers success factors such as instruction tuning that lead to improved recall and comprehension. We further show that while recently proposed domain-adapted models may lack adequate knowledge, directly fine-tuning on our collected medical knowledge datasets shows encouraging results, even generalising to unseen specialist sub-domains. We complement the quantitative results with a skill-oriented manual error analysis, which reveals a significant gap between the models' capabilities to simply recall necessary knowledge and to integrate it with the presented context. To foster research and collaboration in this field we share M-QALM, our resources, standardised methodology, and evaluation results, with the research community to facilitate further advancements in clinical knowledge representation learning within language models.

large language model, machine learning, option text, (17 more...)

arXiv.org Artificial Intelligence

2406.03699

Country:

Europe (1.00)
Asia (0.67)
North America > United States > Michigan (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education > Assessment & Standards > Student Performance (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

Automated Clinical Coding for Outpatient Departments

Schlegel, Viktor, Kashyap, Abhinav Ramesh, Nguyen, Thanh-Tung, Yang, Tsung-Han, Dwivedi, Vijay Prakash, Yin, Wei-Hsian, Wei, Jeng, Winkler, Stefan

arXiv.org Artificial IntelligenceDec-24-2023

Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they present unique and distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from more innovations in popular inpatient coding benchmarks. A deeper analysis of the factors contributing to the success -- amount and form of data and choice of document representation -- reveals the presence of easy-to-solve examples, the coding of which can be completely automated with a low error rate.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2312.13533

Country:

Asia (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.88)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records

Schlegel, Viktor, Li, Hao, Wu, Yuping, Subramanian, Anand, Nguyen, Thanh-Tung, Kashyap, Abhinav Ramesh, Beck, Daniel, Zeng, Xiaojun, Batista-Navarro, Riza Theresa, Winkler, Stefan, Nenadic, Goran

arXiv.org Artificial IntelligenceJul-4-2023

This paper describes PULSAR, our system submission at the ImageClef 2023 MediQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training, to produce a specialised language model which is trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence towards the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the best performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.

data augmentation, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2307.02006

Country:

Europe > Greece (0.14)
Asia > Singapore (0.14)
Oceania > Australia (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models

Li, Hao, Wu, Yuping, Schlegel, Viktor, Batista-Navarro, Riza, Nguyen, Thanh-Tung, Kashyap, Abhinav Ramesh, Zeng, Xiaojun, Beck, Daniel, Winkler, Stefan, Nenadic, Goran

arXiv.org Artificial IntelligenceJun-5-2023

Medical progress notes play a crucial role in documenting a patient's hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation. In this paper, we introduce our proposed approach to this task, which integrates two complementary components. One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list. Our approach was ranked second among all submissions to the shared task. The performance of our model on the development and test datasets shows that our approach is more robust on unknown data, with an improvement of up to 3.1 points over the same size of the larger model.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2306.02754

Country:

Asia > Singapore (0.14)
Oceania > Australia (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Health Care Providers & Services (0.54)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback