
Data Quality

Data Governance That Works


As tools like generative AI become increasingly mainstream, the quality and accessibility of enterprise data have become more important than ever before. Many organizations are rethinking their data governance programs as a result. But according to Kevin Lewis, a leading data analytics expert at AWS, 90% of organizations make the same mistake when they start a data governance program. "Most organizations have a generic goal to make data easier for people to find or to improve data quality generally," Lewis says. "But it's a big mistake to focus on data governance in isolation rather than starting your data governance journey by identifying business initiatives that will prove transformational for the company."

The Principles of Data-Centric AI

Communications of the ACM

The role of data and its quality in supporting AI systems is gaining prominence and giving rise to the concept of data-centric AI (DCAI), which breaks away from widespread model-centric approaches. The flurry of conversation around DCAI can be credited to a recent campaign by Andrew Ng, an AI pioneer, and his colleagues. However, DCAI is a culmination of concerns and efforts around improving data quality in AI projects. DCAI can be understood as an emerging term for a wealth of preceding practices and research work around data quality that complements structured frameworks such as human-centered data science.4,5 As such, the nature of 'data work' itself is not necessarily new.35

New strategies to manage clinical trial risk


It is essential for healthcare and pharmaceutical companies to be aware of both critical and non-critical risks when conducting quality clinical trials. However, managing both takes time and money -- resources that clinical teams are often strapped for. Additionally, the risks that organisations define at the start of the trial may change, meaning the data they need to collect will also change. In order to address these challenges, researchers must break down silos and create a centralised process for monitoring and managing risk. Many organisations are turning to risk-based quality management (RBQM) practices to make that happen.

The Julia Programming Language


Julia is designed from the ground up to be very good at numerical and scientific computing. This can be seen in the abundance of scientific tooling written in Julia, such as the state-of-the-art differential equations ecosystem (DifferentialEquations.jl), optimization tools (JuMP.jl), fast Fourier transforms (AbstractFFTs.jl), and much more. General-purpose simulation frameworks are available for Scientific Machine Learning, quantum computing, and other fields. Julia also offers a number of domain-specific ecosystems, such as in biology (BioJulia), operations research (JuMP Dev), image processing (JuliaImages), quantum physics (QuantumBFS), nonlinear dynamics (JuliaDynamics), quantitative economics (QuantEcon), astronomy (JuliaAstro) and ecology (EcoJulia).

Deep ADMM-Net for Compressive Sensing MRI

Neural Information Processing Systems

Compressive Sensing (CS) is an effective approach for fast Magnetic Resonance Imaging (MRI). It aims at reconstructing an MR image from a small amount of undersampled data in k-space, thereby accelerating data acquisition in MRI. To improve the current MRI system in reconstruction accuracy and computational speed, in this paper, we propose a novel deep architecture, dubbed ADMM-Net. ADMM-Net is defined over a data flow graph, which is derived from the iterative procedures in the Alternating Direction Method of Multipliers (ADMM) algorithm for optimizing a CS-based MRI model. In the training phase, all parameters of the net, e.g., image transforms and shrinkage functions, are discriminatively trained end-to-end using the L-BFGS algorithm. In the testing phase, it has computational overhead similar to ADMM but uses optimized parameters learned from the training data for the CS-based reconstruction task. Experiments on MRI image reconstruction under different sampling ratios in k-space demonstrate that it significantly improves on the baseline ADMM algorithm and achieves high reconstruction accuracy with fast computational speed.
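To make the ADMM iterations mentioned above concrete, here is a minimal sketch of plain ADMM on a toy problem: minimize 0.5*||x - b||^2 + lam*||z||_1 subject to x = z. This is a deliberate simplification and not the paper's model: the undersampled Fourier operator and the image transforms are replaced by the identity, and the names `soft_threshold` and `admm_l1_denoise` are hypothetical. As the abstract describes it, ADMM-Net would unroll iterations of this kind into a data flow graph and learn quantities such as the shrinkage function rather than fixing them by hand.

```python
def soft_threshold(v, t):
    """Fixed shrinkage function: move v toward zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def admm_l1_denoise(b, lam=1.0, rho=1.0, n_iter=100):
    """Plain ADMM for min 0.5*||x-b||^2 + lam*||z||_1 s.t. x = z (toy model)."""
    n = len(b)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable (multiplier)
    for _ in range(n_iter):
        # x-update: closed-form minimizer of the quadratic subproblem
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: shrinkage with threshold lam / rho
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # multiplier update: accumulate the constraint residual x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


denoised = admm_l1_denoise([3.0, -0.5, 1.2], lam=1.0)
```

For this toy problem the iterates converge to the soft-thresholded input, so `denoised` approaches `[2.0, 0.0, 0.2]`.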

Blind Phase Retrieval via Convex Programming

Neural Information Processing Systems

We consider the task of recovering two real or complex m-vectors from phaseless Fourier measurements of their circular convolution. Our method is a novel convex relaxation that is based on a lifted matrix recovery formulation that allows a nontrivial convex relaxation of the bilinear measurements from convolution.
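The measurement model described above can be sketched in a few lines, assuming the measurements are squared magnitudes of the discrete Fourier transform (the abstract does not say whether magnitudes are squared). The helper names `dft`, `circular_convolution`, and `phaseless_measurements` are hypothetical, and the naive DFT is used only to keep the example self-contained. By the convolution theorem, each measurement factors into a product of one term depending on the first vector and one depending on the second, which hints at why a lifted formulation is a natural fit. The sketch covers the forward model only, not the convex program.

```python
import cmath


def dft(v):
    """Naive O(m^2) discrete Fourier transform of a sequence."""
    m = len(v)
    return [
        sum(v[n] * cmath.exp(-2j * cmath.pi * k * n / m) for n in range(m))
        for k in range(m)
    ]


def circular_convolution(x, w):
    """Circular convolution of two equal-length vectors."""
    m = len(x)
    return [sum(x[j] * w[(n - j) % m] for j in range(m)) for n in range(m)]


def phaseless_measurements(x, w):
    """Squared magnitudes of the DFT of x circularly convolved with w."""
    return [abs(c) ** 2 for c in dft(circular_convolution(x, w))]


x = [1.0, 2.0, 0.0, -1.0]
w = [0.5, 0.0, 1.0, 0.0]
y = phaseless_measurements(x, w)
# Convolution theorem: each measurement equals |DFT(x)_k|^2 * |DFT(w)_k|^2
y_factored = [abs(a) ** 2 * abs(b) ** 2 for a, b in zip(dft(x), dft(w))]
```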

Thwarting Adversarial Examples: An L0-Robust Sparse Fourier Transform

Jack Murtagh

Neural Information Processing Systems

Our techniques generalize to a wide range of linear transformations that are used in data analysis such as the Discrete Cosine and Sine transforms, the Hadamard transform, and their high-dimensional analogs.

Testing for Families of Distributions via the Fourier Transform

Neural Information Processing Systems

We study the general problem of testing whether an unknown discrete distribution belongs to a specified family of distributions. More specifically, given a distribution family 𝒫 and sample access to an unknown discrete distribution P, we want to distinguish (with high probability) between the case that P ∈ 𝒫 and the case that P is ɛ-far, in total variation distance, from every distribution in 𝒫. This is the prototypical hypothesis testing problem that has received significant attention in statistics and, more recently, in computer science. The main contribution of this work is a simple and general testing technique that is applicable to all distribution families whose Fourier spectrum satisfies a certain approximate sparsity property. We apply our Fourier-based framework to obtain near sample-optimal and computationally efficient testers for the following fundamental distribution families: Sums of Independent Integer Random Variables (SIIRVs), Poisson Multinomial Distributions (PMDs), and Discrete Log-Concave Distributions. For the first two, ours are the first non-trivial testers in the literature, vastly generalizing previous work on testing Poisson Binomial Distributions. For the third, our tester improves on prior work in both sample and time complexity.
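The approximate Fourier sparsity mentioned above can be illustrated on the simplest SIIRV, a sum of independent Bernoulli variables (a Poisson Binomial Distribution). The sketch below is not the paper's tester. It computes the exact PMF, takes its discrete Fourier transform modulo the support size, and measures how much of the squared Fourier magnitude sits at low frequencies. The function names, the choice of modulus, and the energy-fraction summary are all assumptions made for illustration.

```python
import cmath


def siirv_pmf(probs):
    """PMF of a sum of independent Bernoulli(p) variables, by repeated convolution."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def fourier_coefficient(pmf, xi):
    """DFT of the PMF at frequency xi, taken modulo M = len(pmf)."""
    m = len(pmf)
    return sum(q * cmath.exp(-2j * cmath.pi * xi * k / m) for k, q in enumerate(pmf))


def low_frequency_energy_fraction(pmf, radius):
    """Share of squared Fourier magnitude at frequencies within `radius` of 0 (mod M)."""
    m = len(pmf)
    energy = [abs(fourier_coefficient(pmf, xi)) ** 2 for xi in range(m)]
    low = sum(e for xi, e in enumerate(energy) if min(xi, m - xi) <= radius)
    return low / sum(energy)


probs = [0.5] * 20  # twenty fair coin flips
pmf = siirv_pmf(probs)
fraction = low_frequency_energy_fraction(pmf, radius=3)
```

Because the summands are independent, each Fourier coefficient of the sum is the product of the coefficients of the individual variables, which is what drives the magnitude down quickly away from frequency zero. In this example, seven of the 21 frequencies carry almost all of the energy.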

Multivariate Convolutional Sparse Coding for Electromagnetic Brain Signals

Tom Dupré La Tour

Neural Information Processing Systems

Frequency-specific patterns of neural activity are traditionally interpreted as sustained rhythmic oscillations, and related to cognitive mechanisms such as attention, high-level visual processing or motor control. While alpha waves (8-12 Hz) are known to closely resemble short sinusoids, and thus are revealed by Fourier analysis or wavelet transforms, there is an evolving debate that electromagnetic neural signals are composed of more complex waveforms that cannot be analyzed by linear filters and traditional signal representations. In this paper, we propose to learn dedicated representations of such recordings using a multivariate convolutional sparse coding (CSC) algorithm. Applied to electroencephalography (EEG) or magnetoencephalography (MEG) data, this method is able to learn not only prototypical temporal waveforms, but also associated spatial patterns so their origin can be localized in the brain. Our algorithm is based on alternated minimization and a greedy coordinate descent solver that leads to state-of-the-art running time on long time series. To demonstrate the implications of this method, we apply it to MEG data and show that it is able to recover biological artifacts. More remarkably, our approach also reveals the presence of non-sinusoidal mu-shaped patterns, along with their topographic maps related to the somatosensory cortex.
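The signal model behind a multivariate CSC can be sketched as a forward synthesis: each atom pairs a temporal waveform with a spatial pattern, and a sparse activation signal marks when the waveform occurs. The sketch below assumes that every atom contributes the same waveform to all channels, scaled by its spatial pattern. This assumption is made here for illustration and is not stated in the abstract. The names `convolve_full` and `synthesize` are hypothetical, and the learning step (alternated minimization with greedy coordinate descent) is not shown.

```python
def convolve_full(z, v):
    """Full linear convolution of activation z with temporal atom v."""
    out = [0.0] * (len(z) + len(v) - 1)
    for t, a in enumerate(z):
        if a != 0.0:  # activations are sparse, so most terms are skipped
            for s, b in enumerate(v):
                out[t + s] += a * b
    return out


def synthesize(activations, spatial, temporal):
    """Multichannel signal X[p][t] = sum_k spatial[k][p] * (z_k * v_k)[t].

    Assumes all activations share one length and all atoms share one length.
    """
    n_channels = len(spatial[0])
    length = len(activations[0]) + len(temporal[0]) - 1
    signal = [[0.0] * length for _ in range(n_channels)]
    for z, u, v in zip(activations, spatial, temporal):
        source = convolve_full(z, v)
        for p in range(n_channels):
            for t in range(length):
                signal[p][t] += u[p] * source[t]
    return signal


# One atom: an asymmetric (non-sinusoidal) waveform seen on two channels
temporal = [[1.0, -2.0, 0.5]]
spatial = [[1.0, 0.5]]
activations = [[0.0, 2.0, 0.0, 0.0, 1.0]]  # two sparse events
X = synthesize(activations, spatial, temporal)
```

Learning would run in the opposite direction: given `X`, estimate the activations, temporal waveforms, and spatial patterns that reproduce it under a sparsity penalty.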

Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution

Yibo Yang

Neural Information Processing Systems

Convolutional neural networks (CNNs) have recently achieved great success in single-image super-resolution (SISR). However, these methods tend to produce over-smoothed outputs and miss some textural details. To solve these problems, we propose the Super-Resolution CliqueNet (SRCliqueNet) to reconstruct the high-resolution (HR) image with better textural details in the wavelet domain. The proposed SRCliqueNet first extracts a set of feature maps from the low-resolution (LR) image using the clique blocks group. Then we send the set of feature maps to the clique up-sampling module to reconstruct the HR image. The clique up-sampling module consists of four sub-nets which predict the high-resolution wavelet coefficients of four sub-bands. Since we consider the edge feature properties of the four sub-bands, the four sub-nets are connected to one another so that they can learn the coefficients of the four sub-bands jointly. Finally, we apply the inverse discrete wavelet transform (IDWT) to the output of the four sub-nets at the end of the clique up-sampling module to increase the resolution and reconstruct the HR image. Extensive quantitative and qualitative experiments on benchmark datasets show that our method achieves superior performance over the state-of-the-art methods.
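The final IDWT step can be sketched with a one-level 2D Haar wavelet, chosen here only because it is the simplest case; the abstract does not say which wavelet is used. Four sub-bands of size h x w are combined into one image of size 2h x 2w, which is how the inverse transform doubles the resolution. In SRCliqueNet the sub-bands would come from the four sub-nets, whereas this sketch obtains them from a forward transform of a known image so the inverse can be checked. The function names and the LH/HL naming convention are assumptions.

```python
def haar_dwt2(image):
    """One-level 2D Haar transform: returns (LL, LH, HL, HH), each half-size."""
    h, w = len(image) // 2, len(image[0]) // 2
    ll = [[0.0] * w for _ in range(h)]
    lh = [[0.0] * w for _ in range(h)]
    hl = [[0.0] * w for _ in range(h)]
    hh = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            a, b = image[2 * i][2 * j], image[2 * i][2 * j + 1]
            c, d = image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # local average
            lh[i][j] = (a - b + c - d) / 2  # left-right difference
            hl[i][j] = (a + b - c - d) / 2  # top-bottom difference
            hh[i][j] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: four h x w sub-bands become one 2h x 2w image."""
    h, w = len(ll), len(ll[0])
    image = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            image[2 * i][2 * j] = (s + x + y + z) / 2
            image[2 * i][2 * j + 1] = (s - x + y - z) / 2
            image[2 * i + 1][2 * j] = (s + x - y - z) / 2
            image[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return image


original = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 8.0, 7.0, 6.0],
    [5.0, 4.0, 3.0, 2.0],
]
sub_bands = haar_dwt2(original)          # four 2 x 2 sub-bands
reconstructed = haar_idwt2(*sub_bands)   # back to 4 x 4
```

Predicting the three detail sub-bands explicitly, rather than only a smooth image, is one way to read the abstract's claim about recovering textural details.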