Trend Filtered Mixture of Experts for Automated Gating of High-Frequency Flow Cytometry Data
Hyun, Sangwon, Coleman, Tim, Ribalet, Francois, Bien, Jacob
Ocean microbes are critical to both ocean ecosystems and the global climate. Flow cytometry, which measures cell optical properties in fluid samples, is routinely used in oceanographic research. Despite decades of accumulated data, identifying key microbial populations (a process known as "gating") remains a significant analytical challenge. To address this, we focus on gating multidimensional, high-frequency flow cytometry data collected continuously on board oceanographic research vessels, capturing time- and space-wise variations in the dynamic ocean. Our paper proposes a novel mixture-of-experts model in which both the gating function and the experts are given by trend filtering. The model leverages two key assumptions: (1) Each snapshot of flow cytometry data is a mixture of multivariate Gaussians and (2) the parameters of these Gaussians vary smoothly over time. Our method uses regularization and a constraint to ensure smoothness and that cluster means match biologically distinct microbe types. We demonstrate, using flow cytometry data from the North Pacific Ocean, that our proposed model accurately matches human-annotated gating and corrects significant errors.
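To make the smoothness assumption concrete, here is a minimal sketch of a standard trend filtering penalty applied to one cluster-mean coordinate tracked over time. The abstract does not state the penalty's exact form or order, so the function names, the `order` argument, and the weight `lam` are illustrative assumptions rather than the authors' formulation.

```python
def difference(seq, order):
    """Apply the discrete first-difference operator `order` times."""
    out = list(seq)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def trend_filter_penalty(means, order=1, lam=1.0):
    """Standard trend filtering penalty: lam * L1 norm of the
    (order + 1)-th discrete differences of a time-indexed sequence.

    order=0 penalizes jumps (piecewise-constant fits);
    order=1 penalizes changes in slope (piecewise-linear fits).
    `lam` is a hypothetical regularization weight.
    """
    return lam * sum(abs(d) for d in difference(means, order + 1))


# A linear drift in a cluster mean incurs no first-order penalty,
# while an abrupt level shift does.
linear_drift = [1.0, 2.0, 3.0, 4.0]
level_shift = [0.0, 0.0, 1.0, 1.0]
print(trend_filter_penalty(linear_drift, order=1))
print(trend_filter_penalty(level_shift, order=1))
```

Under this kind of penalty, a larger `lam` pushes the fitted mean trajectory toward fewer changes in slope, which is one way to encode the assumption that the Gaussian parameters vary smoothly over time.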
Proximal Inference on Population Intervention Indirect Effect
Bai, Yang, Cui, Yifan, Sun, Baoluo
Additionally, experiments have shown that depersonalization symptoms can arise as a reaction to alcohol consumption (Raimo et al., 1999), and they are increasingly recognized as a significant prognostic factor in the course of depression (Michal et al., 2024). Despite these findings, little research has explored the mediating role of depersonalization symptoms in the causal pathway from alcohol consumption to depression. In this paper, we propose a methodological framework to evaluate the indirect effect of alcohol consumption on depression, with depersonalization acting as a mediator. To ground our analysis, we use data from a cross-sectional survey conducted during the COVID-19 pandemic by Domínguez-Espinosa et al. (2023) as a running example. In observational studies, the population average causal effect (ACE) and the natural indirect effect (NIE) are the most commonly used measures of total and mediation effects, respectively, to compare the outcomes of different intervention policies. For instance, in our running example, these two measures compare the depression outcomes between individuals engaging in hazardous versus non-hazardous alcohol consumption. However, clinical practice imposes ethical constraints, as healthcare professionals would not prescribe harmful levels of alcohol consumption. As a result, hypothetical interventions involving dangerous exposure levels are unrealistic. To address this situation with potentially harmful exposure, Hubbard and Van der Laan (2008) propose the population intervention effect (PIE), which contrasts outcomes between the natural population and a hypothetical population where no one is exposed to the harmful exposure level.
Discrimination-free Insurance Pricing with Privatized Sensitive Attributes
Zhang, Tianhe, Liu, Suhan, Shi, Peng
Fairness has emerged as a critical consideration in the landscape of machine learning algorithms, particularly as AI continues to transform decision-making across societal domains. To ensure that these algorithms are free from bias and do not discriminate against individuals based on sensitive attributes such as gender and race, the field of algorithmic bias has introduced various fairness concepts, along with methodologies to achieve these notions in different contexts. Despite the rapid advancement, not all sectors have embraced these fairness principles to the same extent. One specific sector that merits attention in this regard is insurance. Within the realm of insurance pricing, fairness is defined through a distinct and specialized framework. Consequently, achieving fairness according to established notions does not automatically ensure fair pricing in insurance. In particular, regulators are increasingly emphasizing transparency in pricing algorithms and imposing constraints on insurance companies on the collection and utilization of sensitive consumer attributes. These factors present additional challenges in the implementation of fairness in pricing algorithms. To address these complexities and comply with regulatory demands, we propose an efficient method for constructing fair models that are tailored to the insurance domain, using only privatized sensitive attributes. Notably, our approach ensures statistical guarantees, does not require direct access to sensitive attributes, and adapts to varying transparency requirements, addressing regulatory demands while ensuring fairness in insurance pricing.
Bayesian Density-Density Regression with Application to Cell-Cell Communications
Nguyen, Khai, Ni, Yang, Mueller, Peter
We introduce a scalable framework for regressing multivariate distributions onto multivariate distributions, motivated by the application of inferring cell-cell communication from population-scale single-cell data. The observed data consist of pairs of multivariate distributions for ligands from one cell type and corresponding receptors from another. For each ordered pair $e=(l,r)$ of cell types $(l \neq r)$ and each sample $i = 1, \ldots, n$, we observe a pair of distributions $(F_{ei}, G_{ei})$ of gene expressions for ligands and receptors of cell types $l$ and $r$, respectively. The aim is to set up a regression of receptor distributions $G_{ei}$ given ligand distributions $F_{ei}$. A key challenge is that these distributions reside in distinct spaces of differing dimensions. We formulate the regression of multivariate densities on multivariate densities using a generalized Bayes framework with the sliced Wasserstein distance between fitted and observed distributions. Finally, we use inference under such regressions to define a directed graph for cell-cell communications.
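As a rough illustration of the discrepancy used in the loss, the sketch below gives a Monte Carlo estimate of the sliced Wasserstein distance between two empirical distributions of the same dimension (for example, fitted versus observed receptor expression samples). It is a generic textbook estimator, not the authors' implementation. The equal-sample-size restriction, the number of projections, and all function names are assumptions made for brevity.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo sliced p-Wasserstein distance between two point clouds.

    Assumes (for simplicity) that xs and ys contain the same number of
    points and live in the same space. Each random direction reduces the
    problem to 1-D, where optimal transport pairs the sorted projections.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = random_unit_vector(dim, rng)
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)


# Hypothetical toy samples: ys is xs shifted by the vector (3, 4).
xs = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]
ys = [[a + 3.0, b + 4.0] for a, b in xs]
print(sliced_wasserstein(xs, xs))  # identical clouds give 0.0
print(sliced_wasserstein(xs, ys))  # positive, at most the shift length 5
```

Because every projection is one-dimensional, the cost per direction is dominated by sorting, which is what makes this family of distances attractive when many pairs of distributions must be compared.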
Meta-Dependence in Conditional Independence Testing
Mazaheri, Bijan, Zhang, Jiaqi, Uhler, Caroline
Constraint-based causal discovery algorithms utilize many statistical tests for conditional independence to uncover networks of causal dependencies. These approaches to causal discovery rely on an assumed correspondence between the graphical properties of a causal structure and the conditional independence properties of observed variables, known as the causal Markov condition and faithfulness. Finite data yields an empirical distribution that is "close" to the actual distribution. Across these many possible empirical distributions, the correspondence to the graphical properties can break down for different conditional independencies, and multiple violations can occur at the same time. We study this "meta-dependence" between conditional independence properties using the following geometric intuition: each conditional independence property constrains the space of possible joint distributions to a manifold. The "meta-dependence" between conditional independences is informed by the position of these manifolds relative to the true probability distribution. We provide a simple-to-compute measure of this meta-dependence using information projections and consolidate our findings empirically using both synthetic and real-world data.
Predictive Multiplicity in Survival Models: A Method for Quantifying Model Uncertainty in Predictive Maintenance Applications
In many applications, especially those involving prediction, models may yield near-optimal performance yet significantly disagree on individual-level outcomes. This phenomenon, known as predictive multiplicity, has been formally defined in binary, probabilistic, and multi-target classification, and undermines the reliability of predictive systems. However, its implications remain unexplored in the context of survival analysis, which involves estimating the time until a failure or similar event while properly handling censored data. We frame predictive multiplicity as a critical concern in survival-based models and introduce formal measures -- ambiguity, discrepancy, and obscurity -- to quantify it. This is particularly relevant for downstream tasks such as maintenance scheduling, where precise individual risk estimates are essential. Understanding and reporting predictive multiplicity helps build trust in models deployed in high-stakes environments. We apply our methodology to benchmark datasets from predictive maintenance, extending the notion of multiplicity to survival models. Our findings show that ambiguity steadily increases, reaching up to 40-45% of observations; discrepancy is lower but exhibits a similar trend; and obscurity remains mild and concentrated in a few models. These results demonstrate that multiple accurate survival models may yield conflicting estimations of failure risk and degradation progression for the same equipment. This highlights the need to explicitly measure and communicate predictive multiplicity to ensure reliable decision-making in process health management.
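The abstract does not spell out the formal definitions, so the following is only a sketch of how ambiguity and discrepancy might be computed. It adapts the usual classification-style definitions to continuous risk predictions by treating two predictions as conflicting when they differ by at least a threshold `delta`. The threshold rule, the function name, and the toy numbers are all assumptions. Obscurity is omitted because the text gives no basis for its definition.

```python
def ambiguity_and_discrepancy(baseline, competitors, delta):
    """Sketch of two predictive multiplicity measures.

    baseline    : risk predictions of a reference model, one per observation
    competitors : list of prediction lists from other near-optimal models
    delta       : hypothetical threshold at which two predictions "conflict"

    ambiguity   = share of observations with a conflict under ANY competitor
    discrepancy = largest share of conflicts produced by a SINGLE competitor
    """
    n = len(baseline)
    flagged = [False] * n
    conflicts_per_model = []
    for preds in competitors:
        conflicts = 0
        for i, (b, c) in enumerate(zip(baseline, preds)):
            if abs(b - c) >= delta:
                conflicts += 1
                flagged[i] = True
        conflicts_per_model.append(conflicts)
    ambiguity = sum(flagged) / n
    discrepancy = max(conflicts_per_model, default=0) / n
    return ambiguity, discrepancy


# Toy example with made-up failure-risk predictions for four machines.
baseline = [0.1, 0.5, 0.9, 0.3]
competitors = [
    [0.1, 0.8, 0.9, 0.3],  # disagrees on the second machine
    [0.4, 0.5, 0.9, 0.3],  # disagrees on the first machine
]
print(ambiguity_and_discrepancy(baseline, competitors, delta=0.2))
```

In this toy case each competitor conflicts with the baseline on a different machine, so ambiguity (0.5) exceeds discrepancy (0.25), the same ordering the abstract reports for its benchmarks.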
NVIDIA says the US has put export restrictions on H20 AI chips
According to an SEC filing from NVIDIA, the US government now requires companies to obtain a license to export H20 integrated circuits and any other products that achieve the same performance benchmarks. The filing states that "the license requirement addresses the risk that the covered products may be used in, or diverted to, a supercomputer in China." Mainland China is not the only place targeted by this license; NVIDIA will also require permission to sell the H20 to the territories of Hong Kong and Macau as well as to nations with the D:5 designation as US Arms Embargo Countries. The H20 chips are currently the most advanced chips that can be sold to select international markets under present laws and they are powerful enough to be used for artificial intelligence applications. NVIDIA has wanted the ability to retain Chinese customers for these products and last week, it seemed like the company may have gotten a reprieve on new restrictions.
Nvidia's RTX 5060 GPUs get release date, $299 starting price
Nvidia has announced that its 5060 line of GeForce RTX GPUs will be available for purchase beginning on April 16. The line includes two models: the RTX 5060 Ti, which comes in 8GB and 16GB VRAM variants, and the RTX 5060, which will only come in an 8GB VRAM option. With the RTX 5060 and RTX 5060 Ti, Nvidia is making high-tech features more accessible for gamers who can't afford a $2,000 GPU. In particular, the new 5060 GPUs offer DLSS 4 support and full ray tracing capabilities for enhanced visuals and higher frame rates. "The RTX 5060 family offers gamers next-generation performance and AI-enhanced visuals starting at $299," said Vice President of GeForce Marketing Matt Wuebbling in an Nvidia press release.
Is OpenAI building a social network for ChatGPT's viral image generator?
OpenAI is reportedly working on a social media prototype for sharing images generated by ChatGPT. According to The Verge, who spoke with anonymous sources "familiar with the matter," OpenAI is working on a social media feed akin to X that would host ChatGPT-generated images created by users. This would reportedly serve two goals: boosting visibility of ChatGPT's now-viral image generator and serving as a source for real-time user data. The Verge also reports that OpenAI CEO Sam Altman has been privately asking for feedback on such a tool. It's unclear whether the social feed would be a standalone social networking app or integrated with ChatGPT, similar to the feed of user-generated images on Midjourney.
Deep sea craft filmed unprecedented footage of a colossal squid
Scientists previously captured rare footage of a giant squid. Now, they've filmed another huge squid species -- the colossal squid. The first specimens of the colossal squid (Mesonychoteuthis hamiltoni) were formally described by biologists a century ago, in 1925. These deep sea dwellers, which live exclusively in Antarctic waters, are rarely seen, so they're largely mysterious. But the Schmidt Ocean Institute, a well-traveled ocean exploration group, has used a high-tech robot to film the first-ever confirmed footage of a colossal squid in its natural and remote marine environs.