Disney advert banned for showing 'disturbing' severed body

BBC News

A menacing Disney advert featuring a severed body has been banned by the advertising regulator, which said it was likely to frighten and cause distress to children. The Advertising Standards Authority (ASA) found the entertainment giant had broken its rules with its advert for the film Predator: Badlands. Parents complained that the digital poster, which featured a large alien holding aloft the severed body of a smaller, human figure, was inappropriate and disturbing for young children. Disney said the severed body was actually that of a robot, and that the fact it had been cut in two further emphasised its non-human nature. The advert, seen on the roadside in Giffnock, Glasgow, was promoting the Disney sci-fi film ahead of its release in November.


A Supplementary Material: Learning Compositional Rules via Neural Program Synthesis A.1 Experimental and computational details

Neural Information Processing Systems

All models were implemented in PyTorch. For all experiments, we report standard error below. Primitive rules map a single word to a color. In a higher-order rule, the left-hand side can be one or two variables and a word, and the right-hand side can be any sequence of bracketed forms of those variables. Figure A.2 shows several example training grammars sampled from the meta-grammar.
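The two rule types described above can be sketched as follows. This is an illustrative stand-in only: the vocabulary ("dax", "wif") and the higher-order "twice" rule are assumptions, not the paper's actual meta-grammar.

```python
# Primitive rules: each maps a single word to a color token.
primitive_rules = {"dax": "RED", "wif": "GREEN"}  # illustrative vocabulary

def interpret_word(word):
    """Apply a primitive rule to one word, yielding its color sequence."""
    return [primitive_rules[word]]

# A higher-order rule: the left-hand side is a variable plus a word
# ("x1 twice"), the right-hand side a sequence of bracketed uses of that
# variable ("[x1] [x1]"), i.e. the interpretation of x1 is emitted twice.
def interpret_twice(x1):
    return x1 + x1

print(interpret_twice(interpret_word("dax")))  # ['RED', 'RED']
```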


Distributional Treatment Effect Estimation across Heterogeneous Sites via Optimal Transport

Bateni, Borna, Yuan, Yubai, Xu, Qi, Qu, Annie

arXiv.org Machine Learning

We propose a novel framework for synthesizing counterfactual treatment group data in a target site by integrating full treatment and control group data from a source site with control group data from the target. Departing from conventional average treatment effect estimation, our approach adopts a distributional causal inference perspective by modeling treatment and control as distinct probability measures on the source and target sites. We formalize the cross-site heterogeneity (effect modification) as a push-forward transformation that maps the joint feature-outcome distribution from the source to the target site. This transformation is learned by aligning the control group distributions between sites using an Optimal Transport-based procedure, and subsequently applied to the source treatment group to generate the synthetic target treatment distribution. Under general regularity conditions, we establish theoretical guarantees for the consistency and asymptotic convergence of the synthetic treatment group data to the true target distribution. Simulation studies across multiple data-generating scenarios and a real-world application to patient-derived xenograft data demonstrate that our framework robustly recovers the full distributional properties of treatment effects.
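The alignment step can be illustrated numerically on one-dimensional toy data: learn a coupling between the two control groups with entropic (Sinkhorn) optimal transport, take its barycentric projection as the cross-site map, and apply the resulting displacements to the source treatment group. The nearest-control extension used below is a crude stand-in for the paper's learned push-forward, and all variable names and parameters are illustrative assumptions.

```python
import numpy as np

def sinkhorn_plan(X, Y, reg=0.5, n_iter=500):
    """Entropic-OT coupling between empirical samples X (n, d) and Y (m, d)."""
    n, m = len(X), len(Y)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
    K = np.exp(-C / reg)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)     # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
src_ctrl = rng.normal(0.0, 1.0, (200, 1))   # source-site control outcomes
tgt_ctrl = rng.normal(2.0, 1.0, (200, 1))   # target-site control (site shift)
src_treat = rng.normal(0.5, 1.0, (200, 1))  # source-site treatment outcomes

# Align the control distributions; the barycentric projection of the coupling
# says where each source control point lands in the target site.
P = sinkhorn_plan(src_ctrl, tgt_ctrl)
mapped = (P @ tgt_ctrl) / P.sum(axis=1, keepdims=True)
shift = mapped - src_ctrl

# Extend the map to the treatment group via each point's nearest source
# control, producing a synthetic target-site treatment sample.
idx = np.abs(src_treat - src_ctrl.T).argmin(axis=1)
synthetic_tgt_treat = src_treat + shift[idx]
```

On this toy example the synthetic treatment sample inherits the between-site shift, which is the behavior the framework's push-forward transformation formalizes.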



KLAAD: Refining Attention Mechanisms to Reduce Societal Bias in Generative Language Models

Kim, Seorin, Lee, Dongyoung, Lee, Jaejin

arXiv.org Artificial Intelligence

Large language models (LLMs) often exhibit societal biases in their outputs, prompting ethical concerns regarding fairness and harm. In this work, we propose KLAAD (KL-Attention Alignment Debiasing), an attention-based debiasing framework that implicitly aligns attention distributions between stereotypical and anti-stereotypical sentence pairs without directly modifying model weights. KLAAD introduces a composite training objective combining Cross-Entropy, KL divergence, and Triplet losses, guiding the model to consistently attend across biased and unbiased contexts while preserving fluency and coherence. Experimental evaluation of KLAAD demonstrates improved bias mitigation on both the BBQ and BOLD benchmarks, with minimal impact on language modeling quality. The results indicate that attention-level alignment offers a principled solution for mitigating bias in generative language models.
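The composite objective can be sketched as below. This is a minimal NumPy illustration of the three-term structure (cross-entropy + KL attention alignment + triplet), not the paper's implementation: the loss weights, margin, and input shapes are all assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """Row-wise KL(p || q) between attention distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def composite_loss(logits, targets, attn_stereo, attn_anti,
                   anchor, positive, negative,
                   lam_kl=1.0, lam_tri=1.0, margin=1.0):
    """Cross-entropy for language modeling, KL divergence aligning the
    attention maps of a stereotypical/anti-stereotypical pair, and a
    triplet term on sentence embeddings (weights are illustrative)."""
    ce = -np.log(softmax(logits)[np.arange(len(targets)), targets] + 1e-12).mean()
    kl = kl_div(attn_stereo, attn_anti).mean()
    tri = max(0.0, np.linalg.norm(anchor - positive)
                   - np.linalg.norm(anchor - negative) + margin)
    return ce + lam_kl * kl + lam_tri * tri
```

When the two attention maps coincide, the KL term vanishes, so the objective reduces to ordinary language-modeling plus the triplet constraint — the alignment pressure only acts where biased and unbiased contexts are attended differently.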


GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

Chen, Zihong, Jiang, Wanli, Li, Jinzhe, Yuan, Zhonghang, Kong, Huanjun, Ouyang, Wanli, Dong, Nanqing

arXiv.org Artificial Intelligence

Fine-tuning for large language models (LLMs) typically requires substantial amounts of high-quality supervised data, which is both costly and labor-intensive to acquire. While synthetic data generation has emerged as a promising solution, existing approaches frequently suffer from factual inaccuracies, insufficient long-tail coverage, simplistic knowledge structures, and homogenized outputs. To address these challenges, we introduce GraphGen, a knowledge graph-guided framework designed for three key question-answering (QA) scenarios: atomic QA, aggregated QA, and multi-hop QA. It begins by constructing a fine-grained knowledge graph from the source text. It then identifies knowledge gaps in LLMs using the expected calibration error metric, prioritizing the generation of QA pairs that target high-value, long-tail knowledge. Furthermore, GraphGen incorporates multi-hop neighborhood sampling to capture complex relational information and employs style-controlled generation to diversify the resulting QA data. Experimental results on knowledge-intensive tasks under closed-book settings demonstrate that GraphGen outperforms conventional synthetic data methods, offering a more reliable and comprehensive solution to the data scarcity challenge in supervised fine-tuning. The code and data are publicly available at https://github.com/open-sciencelab/GraphGen.
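The gap-identification step relies on the standard expected calibration error. A minimal implementation is below; the equal-width binning scheme is the conventional choice and an assumption here, not a detail taken from the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence, then average the gap
    between per-bin accuracy and per-bin mean confidence, weighted by
    the fraction of predictions falling in each bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A high ECE on questions drawn from a region of the knowledge graph signals that the model's confidence is poorly matched to its accuracy there, which is what flags that region as a high-value target for QA generation.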


TAMIS: Tailored Membership Inference Attacks on Synthetic Data

Andrey, Paul, Bars, Batiste Le, Tommasi, Marc

arXiv.org Machine Learning

Membership Inference Attacks (MIA) make it possible to empirically assess the privacy of a machine learning algorithm. In this paper, we propose TAMIS, a novel MIA against differentially private synthetic data generation methods that rely on graphical models. This attack builds upon MAMA-MIA, a recently published state-of-the-art method, lowering its computational cost and requiring less attacker knowledge. Our attack is the product of a two-fold improvement. First, we recover the graphical model that generated a synthetic dataset using solely that dataset, rather than shadow-modeling over an auxiliary one. This proves both less costly and more effective. Second, we introduce a more mathematically grounded attack score that provides a natural threshold for binary predictions. In our experiments, TAMIS achieves performance better than or comparable to MAMA-MIA on replicas of the SNAKE challenge.
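The likelihood-scoring idea behind such attacks can be sketched as follows. This is emphatically not TAMIS's method: instead of recovering the actual graphical model, it fits a trivially factorized product-of-marginals model to the synthetic data and scores candidate records by log-likelihood, just to show the shape of the pipeline. All names and data are illustrative.

```python
import numpy as np

def fit_marginals(synthetic, n_values):
    """Per-column categorical marginals with Laplace smoothing -- a
    deliberately trivial, fully factorized stand-in for the graphical
    model recovered from the synthetic dataset."""
    counts = [np.bincount(col, minlength=k) + 1.0
              for col, k in zip(synthetic.T, n_values)]
    return [c / c.sum() for c in counts]

def membership_score(record, marginals):
    """Log-likelihood of a candidate record under the fitted model; higher
    scores suggest the record (or one close to it) shaped the generator."""
    return float(sum(np.log(m[v]) for m, v in zip(marginals, record)))

rng = np.random.default_rng(1)
synthetic = rng.integers(0, 3, size=(500, 4))      # stand-in synthetic data
marginals = fit_marginals(synthetic, n_values=[3, 3, 3, 3])
score = membership_score(synthetic[0], marginals)  # score one candidate
```

A fixed cut-off on such a score then yields binary membership predictions, which is the role of the natural threshold mentioned in the abstract.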


Data Augmentation for Deep Learning Regression Tasks by Machine Learning Models

Shmuel, Assaf, Glickman, Oren, Lazebnik, Teddy

arXiv.org Artificial Intelligence

Deep learning (DL) models have gained prominence in domains such as computer vision and natural language processing but remain underutilized for regression tasks involving tabular data, where traditional machine learning (ML) models often outperform them. In this study, we propose and evaluate various data augmentation (DA) techniques to improve the performance of DL models on tabular regression tasks. We compare the performance gains of neural networks under DA strategies ranging from a naive method of duplicating existing observations and adding noise to a more sophisticated strategy that preserves the underlying statistical relationships in the data. Our analysis demonstrates that the advanced DA method significantly improves DL model performance across multiple datasets and regression tasks, yielding an average performance increase of over 10% compared to baseline models without augmentation. The efficacy of these DA strategies was rigorously validated across 30 distinct datasets, with multiple iterations and evaluations using three automated deep learning (AutoDL) frameworks: AutoKeras, H2O, and AutoGluon. This study demonstrates that by leveraging advanced DA techniques, DL models can realize their full potential in regression tasks, contributing to broader adoption and enhanced performance in practical applications.
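The two ends of the spectrum described above might be sketched like this; `predict_fn` stands in for the auxiliary ML regressor that relabels perturbed points, and the noise scales and function names are assumptions rather than the paper's settings.

```python
import numpy as np

def naive_augment(X, y, noise_scale=0.05, n_copies=1, seed=0):
    """Naive DA: duplicate rows and jitter features, keeping labels unchanged
    -- the label no longer matches the perturbed input exactly."""
    rng = np.random.default_rng(seed)
    X_aug = [X] + [X + rng.normal(0.0, noise_scale * X.std(axis=0), X.shape)
                   for _ in range(n_copies)]
    y_aug = [y] * (n_copies + 1)
    return np.vstack(X_aug), np.concatenate(y_aug)

def model_based_augment(X, y, predict_fn, noise_scale=0.05, seed=0):
    """Relationship-preserving DA: jitter features, then relabel the new
    points with an auxiliary regressor so labels track the perturbed
    inputs and the statistical structure of the data is respected."""
    rng = np.random.default_rng(seed)
    X_new = X + rng.normal(0.0, noise_scale * X.std(axis=0), X.shape)
    return np.vstack([X, X_new]), np.concatenate([y, predict_fn(X_new)])
```

The contrast is the point: the naive variant injects label noise along with feature noise, while the model-based variant keeps the feature-label relationship consistent on the synthetic rows.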