
Collaborating Authors: Mukherjee, Sumit


Synthesizing Proton-Density Fat Fraction and $R_2^*$ from 2-point Dixon MRI with Generative Machine Learning

arXiv.org Artificial Intelligence

Magnetic Resonance Imaging (MRI) is the gold standard for non-invasively measuring fat and iron content in the body via measures known as Proton Density Fat Fraction (PDFF) and $R_2^*$, respectively. However, conventional PDFF and $R_2^*$ quantification methods operate on MR images voxel-wise and require at least three measurements to estimate three quantities: water, fat, and $R_2^*$. The two-point Dixon MRI protocol is widely used and fast because it acquires only two measurements, but two measurements cannot determine three quantities voxel-wise. Leveraging the fact that neighboring voxels have similar values, we propose a generative machine learning approach to learn PDFF and $R_2^*$ from Dixon MRI. Using paired Dixon-IDEAL liver data from UK Biobank and a Pix2Pix conditional GAN, we demonstrate the first large-scale $R_2^*$ imputation from two-point Dixon MRIs. The PDFF and $R_2^*$ maps synthesized by our approach show significantly greater correlation with ground truth than conventional voxel-wise baselines.
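Below is a minimal sketch of how such a Pix2Pix-style conditional GAN could be set up in PyTorch: a small encoder-decoder generator maps a 2-channel Dixon input to 2-channel (PDFF, $R_2^*$) output maps, and a PatchGAN discriminator judges (input, output) pairs. The layer sizes, the normalization of the maps to [-1, 1] implied by the Tanh output, and the L1 weight of 100 are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative Pix2Pix-style sketch: 2-channel Dixon input -> 2-channel
# (PDFF, R2*) output. Assumes target maps are rescaled to [-1, 1].
import torch
import torch.nn as nn

class Generator(nn.Module):  # small encoder-decoder stand-in for a U-Net
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 2, 4, 2, 1), nn.Tanh(),  # 2 output maps
        )
    def forward(self, x):
        return self.net(x)

class PatchDiscriminator(nn.Module):  # PatchGAN: classifies local patches
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 64, 4, 2, 1), nn.LeakyReLU(0.2),   # input = Dixon + maps
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 1, 4, 1, 1),                     # per-patch real/fake logits
        )
    def forward(self, dixon, maps):
        return self.net(torch.cat([dixon, maps], dim=1))

G, D = Generator(), PatchDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

dixon = torch.randn(4, 2, 64, 64)   # placeholder batch; real training would use
target = torch.randn(4, 2, 64, 64)  # paired Dixon inputs and IDEAL-derived maps

# Discriminator step: real (Dixon, map) pairs vs. generated pairs.
fake = G(dixon).detach()
pred_real, pred_fake = D(dixon, target), D(dixon, fake)
d_loss = bce(pred_real, torch.ones_like(pred_real)) + \
         bce(pred_fake, torch.zeros_like(pred_fake))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool D plus L1 reconstruction (standard Pix2Pix objective).
fake = G(dixon)
pred = D(dixon, fake)
g_loss = bce(pred, torch.ones_like(pred)) + 100.0 * l1(fake, target)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```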


Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data

arXiv.org Artificial Intelligence

Differentially private (DP) synthetic datasets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of using DP synthetic data in end-to-end machine learning pipelines matters for areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We systematically investigate the impact of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic dataset generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones in terms of model training utility for tabular data. Indeed, we show that models trained on data generated by marginal-based algorithms can exhibit utility similar to that of models trained on real data. Our analysis also reveals that the marginal-based synthetic data generator MWEM-PGM can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.
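A hedged sketch of the synthetic-only train/evaluate idea from point (i): both the training and the test set are drawn from a synthetic generator, so no real records are needed, and utility (accuracy) plus one fairness notion (demographic parity gap) are computed on the synthetic test set. Here `generate_synthetic` is a hypothetical placeholder standing in for a DP generator such as MWEM-PGM.

```python
# Train on synthetic data, evaluate utility and fairness on a second
# synthetic sample -- no real data assumed at any stage.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def generate_synthetic(n):
    # Hypothetical placeholder for a DP synthetic data generator
    # (e.g., a marginal-based method such as MWEM-PGM).
    sensitive = rng.integers(0, 2, n)                  # protected attribute
    x = rng.normal(size=(n, 5)) + 0.3 * sensitive[:, None]
    y = (x.sum(axis=1) + rng.normal(size=n) > 0).astype(int)
    return np.column_stack([x, sensitive]), y, sensitive

X_train, y_train, _ = generate_synthetic(2000)         # synthetic training set
X_test, y_test, s_test = generate_synthetic(500)       # synthetic test set

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

print("utility (accuracy):", accuracy_score(y_test, pred))

# Fairness: demographic parity gap between the two sensitive groups,
# one of several fairness definitions one could plug in here.
dp_gap = abs(pred[s_test == 0].mean() - pred[s_test == 1].mean())
print("demographic parity gap:", dp_gap)
```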


A Mean Field Approach to Empirical Bayes Estimation in High-dimensional Linear Regression

arXiv.org Machine Learning

We study empirical Bayes estimation in high-dimensional linear regression. To facilitate computationally efficient estimation of the underlying prior, we adopt a variational empirical Bayes approach, introduced originally in Carbonetto and Stephens (2012) and Kim et al. (2022). We establish asymptotic consistency of the nonparametric maximum likelihood estimator (NPMLE) and its (computable) naive mean field variational surrogate under mild assumptions on the design and the prior. Assuming, in addition, that the naive mean field approximation has a dominant optimizer, we develop a computationally efficient approximation to the oracle posterior distribution, and establish its accuracy under the 1-Wasserstein metric. This enables computationally feasible Bayesian inference; e.g., construction of posterior credible intervals with an average coverage guarantee, Bayes optimal estimation for the regression coefficients, estimation of the proportion of non-nulls, etc. Our analysis covers both deterministic and random designs, and accommodates correlations among the features. To the best of our knowledge, this provides the first rigorous nonparametric empirical Bayes method in a high-dimensional regression setting without sparsity.
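As a rough illustration of the empirical Bayes step, the sketch below fits an NPMLE prior by EM over a fixed grid of atoms in the much simpler Gaussian sequence model, then forms the posterior-mean estimate under the fitted prior. The paper's setting (high-dimensional regression with a mean-field variational surrogate) is substantially more involved; this toy example only conveys the flavor of nonparametric prior estimation.

```python
# Toy NPMLE: y_i = theta_i + N(0, 1), theta_i ~ G unknown.
# Fit G-hat as a discrete mixture on a fixed grid via EM, then use the
# fitted prior for empirical Bayes (posterior mean) estimation.
import numpy as np

rng = np.random.default_rng(1)
theta = rng.choice([-2.0, 0.0, 2.0], size=500)   # unknown prior: 3 atoms
y = theta + rng.normal(size=500)

grid = np.linspace(-4, 4, 81)                    # support points for G-hat
w = np.full(grid.size, 1.0 / grid.size)          # mixing weights, init uniform
lik = np.exp(-0.5 * (y[:, None] - grid[None, :]) ** 2)  # N(y_i; a_j, 1), up to a constant

for _ in range(200):                             # EM for the mixture weights
    post = lik * w                               # unnormalized responsibilities
    post /= post.sum(axis=1, keepdims=True)
    w = post.mean(axis=0)                        # M-step: average responsibilities

# Posterior mean of each theta_i under the fitted prior G-hat.
post = lik * w
post /= post.sum(axis=1, keepdims=True)
theta_hat = post @ grid
print("MSE (NPMLE posterior mean):", np.mean((theta_hat - theta) ** 2))
print("MSE (raw observations):   ", np.mean((y - theta) ** 2))
```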


Variational Inference in high-dimensional linear regression

arXiv.org Machine Learning

We study high-dimensional Bayesian linear regression with product priors. Using the nascent theory of non-linear large deviations (Chatterjee and Dembo, 2016), we derive sufficient conditions for the leading-order correctness of the naive mean-field approximation to the log-normalizing constant of the posterior distribution. Subsequently, assuming a true linear model for the observed data, we derive a limiting infinite-dimensional variational formula for the log-normalizing constant of the posterior. Furthermore, we establish that under an additional "separation" condition, the variational problem has a unique optimizer, and this optimizer governs the probabilistic properties of the posterior distribution. We provide intuitive sufficient conditions for the validity of this "separation" condition. Finally, we illustrate our results on concrete examples with specific design matrices.
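For concreteness, here is a minimal sketch of the naive mean-field approximation being computed by coordinate ascent (CAVI) for Bayesian linear regression with independent Gaussian priors, the simplest instance of a product prior. Gaussian priors give closed-form coordinate updates; the paper's results cover general product priors, so treat this only as an illustration of the mean-field object being analyzed.

```python
# Naive mean-field CAVI for y = X beta + noise, beta_j ~ N(0, tau^2) i.i.d.
# Each q_j is Gaussian; the coordinate updates below are the closed-form
# maximizers of the ELBO for this conjugate special case.
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2, tau2 = 200, 50, 1.0, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(scale=np.sqrt(tau2), size=p)
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

mu = np.zeros(p)                                  # mean-field means
s2 = np.ones(p)                                   # mean-field variances
col_norm2 = (X ** 2).sum(axis=0)

for _ in range(100):                              # coordinate ascent sweeps
    for j in range(p):
        resid = y - X @ mu + X[:, j] * mu[j]      # residual excluding coordinate j
        s2[j] = 1.0 / (col_norm2[j] / sigma2 + 1.0 / tau2)
        mu[j] = s2[j] * (X[:, j] @ resid) / sigma2

print("relative error ||mu - beta|| / ||beta|| =",
      np.linalg.norm(mu - beta) / np.linalg.norm(beta))
```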


Reducing bias and increasing utility by federated generative modeling of medical images using a centralized adversary

arXiv.org Machine Learning

We introduce FELICIA (FEderated LearnIng with a CentralIzed Adversary), a generative mechanism enabling collaborative learning. In particular, we show how a data owner with limited and biased data can benefit from other data owners while keeping data from all sources private. This is a common scenario in medical image analysis, where privacy legislation prevents data from being shared outside local premises. FELICIA works for a large family of Generative Adversarial Network (GAN) architectures, including vanilla and conditional GANs, as demonstrated in this work. We show that by using the FELICIA mechanism, a data owner with limited image samples can generate high-quality synthetic images with high utility, while no data owner has to provide access to its data. The sharing happens solely through a central discriminator whose access is limited to synthetic data. Here, utility is defined as classification performance on a real test set. We demonstrate these benefits on several realistic healthcare scenarios using benchmark image datasets (MNIST, CIFAR-10) as well as on medical images for the task of skin lesion classification. With multiple experiments, we show that even in the worst cases, combining FELICIA with real data achieves performance on par with real data alone, while in most cases it significantly improves utility.
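The sketch below is a schematic of this setup in PyTorch, not the authors' exact protocol: it assumes a simple arrangement in which the beneficiary keeps a local GAN on its private images, while a central discriminator, which sees only synthetic images, treats the helper owner's synthetic samples as its "real" class and the beneficiary's as "fake", giving the beneficiary's generator an extra feedback signal without any private data crossing the boundary.

```python
# Schematic FELICIA-style sketch; the real/fake roles given to the central
# discriminator are an assumption for illustration, not the paper's spec.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    # Tiny MLP stand-in for a real image generator/discriminator.
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))

z_dim, x_dim = 16, 64                           # latent size, flattened image size
G_a = mlp(z_dim, x_dim)                         # beneficiary's generator
G_b = mlp(z_dim, x_dim)                         # helper owner's generator
D_a = mlp(x_dim, 1)                             # a's local discriminator (sees a's real data)
D_c = mlp(x_dim, 1)                             # central discriminator: synthetic data only
bce = nn.BCEWithLogitsLoss()
opt_ga = torch.optim.Adam(G_a.parameters(), lr=2e-4)
opt_da = torch.optim.Adam(D_a.parameters(), lr=2e-4)
opt_dc = torch.optim.Adam(D_c.parameters(), lr=2e-4)

real_a = torch.randn(32, x_dim)                 # placeholder private batch of owner a
fake_a = G_a(torch.randn(32, z_dim))            # a's synthetic batch
fake_b = G_b(torch.randn(32, z_dim)).detach()   # b's synthetic batch (the only shared data)

# Local discriminator update: a's real images vs. a's synthetic images.
da_loss = bce(D_a(real_a), torch.ones(32, 1)) + bce(D_a(fake_a.detach()), torch.zeros(32, 1))
opt_da.zero_grad(); da_loss.backward(); opt_da.step()

# Central discriminator update: sees ONLY synthetic images. Assumed roles:
# helper's synthetics as "real", beneficiary's as "fake".
dc_loss = bce(D_c(fake_b), torch.ones(32, 1)) + bce(D_c(fake_a.detach()), torch.zeros(32, 1))
opt_dc.zero_grad(); dc_loss.backward(); opt_dc.step()

# Beneficiary generator update: local adversarial term plus central feedback,
# nudging G_a toward the shared (less biased) distribution.
g_loss = bce(D_a(fake_a), torch.ones(32, 1)) + bce(D_c(fake_a), torch.ones(32, 1))
opt_ga.zero_grad(); g_loss.backward(); opt_ga.step()
```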


Risks of Using Non-verified Open Data: A case study on using Machine Learning techniques for predicting Pregnancy Outcomes in India

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has evolved considerably in the last few years. While applications of AI are now common in fields like retail and marketing, applying AI to problems in developing countries is still an emerging topic. In particular, AI applications in resource-poor settings remain relatively nascent, even though there is huge scope for AI in such settings. For example, researchers have started exploring AI applications to reduce poverty and deliver a broad range of critical public services. However, despite many promising use cases, such projects face many dataset-related challenges: missing data, incorrectly collected data, and improperly labeled variables, among other factors. As a result, one can easily end up using data that is not representative of the problem one is trying to solve. In this case study, we explore the challenges of using such an open dataset from India to predict an important health outcome. We highlight how the use of AI without a proper understanding of reporting metrics can lead to erroneous conclusions.
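The kind of pre-modeling audit this case study motivates can be sketched as below: inspect missingness, compare the recorded outcome distribution against published reporting metrics, and flag implausible values before any model is trained. The dataframe and column names are hypothetical placeholders, not fields of the actual Indian open dataset.

```python
# Minimal data-audit sketch with hypothetical columns; a real audit would
# run the same checks against the full open dataset.
import pandas as pd

df = pd.DataFrame({
    "maternal_age": [24, 31, None, 7, 29],
    "anc_visits":   [4, None, None, 2, 3],
    "outcome":      ["live_birth", None, "live_birth", "stillbirth", "live_birth"],
})

# 1. Missingness: fraction of missing values per column.
print(df.isna().mean().sort_values(ascending=False))

# 2. Label audit: does the recorded outcome distribution match published
#    reporting metrics, or does it reflect selective/incorrect recording?
print(df["outcome"].value_counts(normalize=True, dropna=False))

# 3. Plausibility check: impossible values flag data-collection errors.
print("share of plausible maternal ages:",
      df["maternal_age"].between(10, 60).mean())
```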