 synthesized


When Tom Eats Kimchi: Evaluating Cultural Bias of Multimodal Large Language Models in Cultural Mixture Contexts

arXiv.org Artificial Intelligence

In a highly globalized world, it is important for multi-modal large language models (MLLMs) to recognize and respond correctly to mixed-cultural inputs. For example, a model should correctly identify kimchi (Korean food) in an image both when an Asian woman is eating it and when an African man is eating it. However, current MLLMs show an over-reliance on the visual features of the person, leading to misclassification of the entities. To examine the robustness of MLLMs across different ethnicities, we introduce MixCuBe, a cross-cultural bias benchmark, and study elements from five countries and four ethnicities. Our findings reveal that MLLMs achieve both higher accuracy and lower sensitivity to such perturbation for high-resource cultures, but not for low-resource cultures. GPT-4o, the best-performing model overall, shows up to 58% difference in accuracy between the original and perturbed cultural settings in low-resource cultures. Our dataset is publicly available at: https://huggingface.co/datasets/kyawyethu/MixCuBe.


Development and Validation of the Provider Documentation Summarization Quality Instrument for Large Language Models

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) are integrated into electronic health record (EHR) workflows, validated instruments are essential to evaluate their performance before implementation. Existing instruments for provider documentation quality are often unsuitable for the complexities of LLM-generated text and lack validation on real-world data. The Provider Documentation Summarization Quality Instrument (PDSQI-9) was developed to evaluate LLM-generated clinical summaries. Multi-document summaries were generated from real-world EHR data across multiple specialties using several LLMs (GPT-4o, Mixtral 8x7b, and Llama 3-8b). Validation included Pearson correlation for substantive validity, factor analysis and Cronbach's alpha for structural validity, inter-rater reliability (ICC and Krippendorff's alpha) for generalizability, a semi-Delphi process for content validity, and comparisons of high- versus low-quality summaries for discriminant validity. Seven physician raters evaluated 779 summaries and answered 8,329 questions, achieving over 80% power for inter-rater reliability. The PDSQI-9 demonstrated strong internal consistency (Cronbach's alpha = 0.879; 95% CI: 0.867-0.891) and high inter-rater reliability (ICC = 0.867; 95% CI: 0.867-0.868), supporting structural validity and generalizability. Factor analysis identified a 4-factor model explaining 58% of the variance, representing organization, clarity, accuracy, and utility. Substantive validity was supported by correlations between note length and scores for Succinct (rho = -0.200, p = 0.029) and Organized (rho = -0.190, p = 0.037). Discriminant validity distinguished high- from low-quality summaries (p < 0.001). The PDSQI-9 demonstrates robust construct validity, supporting its use in clinical practice to evaluate LLM-generated summaries and facilitate safer integration of LLMs into healthcare workflows.
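For readers unfamiliar with the internal-consistency statistic reported above, the following is a minimal sketch of how Cronbach's alpha is commonly computed from a matrix of item scores. It is not the authors' analysis code; the function name `cronbach_alpha` and the toy score matrix are illustrative assumptions, and the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) is used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one row per rated summary; each row holds one
    numeric score per instrument item (hypothetical layout, for illustration).
    """
    n_rows = len(scores)
    k = len(scores[0])  # number of items
    if k < 2 or n_rows < 2:
        raise ValueError("need at least 2 items and 2 rated summaries")

    # Sample variance of each item (column) across all rows.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the per-row total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (made up): 4 summaries scored on 3 items.
toy = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
print(round(cronbach_alpha(toy), 3))
```

Values closer to 1 indicate that the items vary together; perfectly correlated items with equal variance give an alpha of exactly 1.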


AI-driven platform identifies and remediates biases in data - i4.0 today

#artificialintelligence

The Community Edition is one part of Synthesized's data platform. The complete platform uses AI to automate all stages of data provisioning, the process of making data available in an orderly and secure way. This level of automation enables organisations to generate synthesized datasets, allowing them to better test data for new products and tools, validate mathematical models, or train machine learning models. According to the company, Synthesized completely removes the heavy and costly burden of finding, collecting, and preparing data. Gartner estimates that data scientists and test engineers currently waste up to 80% of their valuable time on such repetitive tasks.


Equitable tech: AI-enabled platform to reduce bias in datasets released

#artificialintelligence

On Wednesday, London-based Synthesized launched a platform to help organizations identify and rectify biases in their data. Synthesized touts the platform as the "first publicly available solution to accurately detect and remove biases in data." A "freemium" Community Edition of the platform designed to mitigate bias in data is now available. "The reputational risk of all organisations is under threat due to biased data, and we've seen this will no longer be tolerated at any level. It's a burning priority now and must be dealt with as a matter of urgency, both from a legal and ethical standpoint," said Nicolai Baldin, CEO and founder of Synthesized, in a press release.


Tractable Monotone Temporal Planning

AAAI Conferences

This paper describes a polynomially-solvable sub-problem of temporal planning. Polynomiality follows from two assumptions. Firstly, by supposing that each sub-goal fluent can be established by at most one action, we can quickly determine which actions are necessary in any plan. Secondly, the monotonicity of sub-goal fluents allows us to express planning as an instance of STP≠ (Simple Temporal Problem, difference constraints). Our class includes temporally-expressive problems, which we illustrate with an example of chemical process planning.
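To make the Simple Temporal Problem mentioned above more concrete, here is a minimal sketch of the standard consistency check for plain difference constraints of the form t_j - t_i <= w: build the distance graph and look for a negative cycle with Floyd-Warshall. This is not the paper's algorithm; it covers only the ordinary STP part and does not handle the disequality (≠) constraints of STP≠. The function name `stp_consistent` and the constraint encoding are assumptions made for illustration.

```python
def stp_consistent(n, constraints):
    """Check consistency of a Simple Temporal Problem.

    n: number of time points, indexed 0..n-1.
    constraints: iterable of (i, j, w) meaning t_j - t_i <= w
                 (hypothetical encoding chosen for this sketch).
    Returns True if some assignment of times satisfies all constraints.
    """
    inf = float("inf")
    # Distance graph: an edge i -> j with weight w for each constraint.
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j, w in constraints:
        d[i][j] = min(d[i][j], w)

    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # A negative diagonal entry means a negative cycle, i.e. inconsistency.
    return all(d[i][i] >= 0 for i in range(n))


# t1 must occur between 3 and 5 time units after t0: consistent.
print(stp_consistent(2, [(0, 1, 5), (1, 0, -3)]))
# t1 at most 2 after t0 but at least 3 after t0: inconsistent.
print(stp_consistent(2, [(0, 1, 2), (1, 0, -3)]))
```

The cubic running time of this check is one reason reductions to STP-style problems are attractive for showing that a planning class is polynomially solvable.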