lundberg
A Virtual Cell Is a 'Holy Grail' of Science. It's Getting Closer.
The human cell is a miserable thing to study. Tens of trillions of them exist in the body, forming an enormous and intricate network that governs every disease and metabolic process. Each cell in that circuit is itself the product of an equally dense and complex interplay among genes, proteins, and other bits of profoundly small biological machinery. Our understanding of this world is hazy and constantly in flux. As recently as a few years ago, scientists thought there were only a few hundred distinct cell types, but new technologies have revealed thousands (and that's just the start).
Targeted Data Generation: Finding and Fixing Model Weaknesses
He, Zexue, Ribeiro, Marco Tulio, Khani, Fereshte
Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users and underrepresented in both the existing and the new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups and generates new data for those subgroups using large language models (LLMs) with a human in the loop. TDG estimates the expected benefit and potential harm of data augmentation for each subgroup, and selects the ones most likely to improve within-group performance without hurting overall performance. In our experiments, TDG significantly improves the accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models, while also improving overall test accuracy.
Five ways deep learning has transformed image analysis
But in the human brain, that volume of tissue contains some 50,000 neural 'wires' connected by 134 million synapses. Jeff Lichtman wanted to trace them all. To generate the raw data, he used a protocol known as serial thin-section electron microscopy, imaging thousands of slivers of tissue over 11 months. But the data set was enormous, amounting to 1.4 petabytes -- the equivalent of about 2 million CD-ROMs -- far too much for researchers to handle on their own. "It is simply impossible for human beings to manually trace out all the wires," says Lichtman, a molecular and cell biologist at Harvard University in Cambridge, Massachusetts.
Comparing Baseline Shapley and Integrated Gradients for Local Explanation: Some Additional Insights
Feng, Tianshu, Zhou, Zhipu, Tarun, Joshi, Nair, Vijayan N.
There are many different methods in the literature for local explanation of machine learning results. However, the methods differ in their approaches and often do not provide the same explanations. In this paper, we consider two recent methods: Integrated Gradients (Sundararajan, Taly, & Yan, 2017) and Baseline Shapley (Sundararajan and Najmi, 2020). The original authors have already studied the axiomatic properties of the two methods and provided some comparisons. Our work provides some additional insights on their comparative behavior for tabular data. We discuss common situations where the two provide identical explanations and where they differ. We also use simulation studies to examine the differences when neural networks with ReLU activation functions are used to fit the models.
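To make the comparison concrete, here is a minimal sketch of both attribution methods on two toy functions. It is not the paper's code: the function names (`baseline_shapley`, `integrated_gradients`, `additive`, `relu_unit`), the zero baseline, and the use of finite differences in place of automatic differentiation are all assumptions chosen to keep the example dependency-free. For the additive function both methods return the same attributions; for the single ReLU unit at the input (1, 1), working the integrals by hand gives (0.5, 1.5) for Baseline Shapley and (2/3, 4/3) for Integrated Gradients, so the two disagree even though both sum to the function value.

```python
from itertools import permutations


def baseline_shapley(f, x, baseline):
    """Exact Baseline Shapley: average marginal contributions over all orderings.

    A feature that has "joined" takes its value from x; all others stay at the
    baseline. The cost is factorial in the number of features (toy use only).
    """
    n = len(x)
    attributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        point = list(baseline)
        previous = f(point)
        for i in order:
            point[i] = x[i]
            current = f(point)
            attributions[i] += current - previous
            previous = current
    return [a / len(orderings) for a in attributions]


def integrated_gradients(f, x, baseline, steps=1000, eps=1e-6):
    """Approximate Integrated Gradients along the straight path baseline -> x.

    Uses a midpoint Riemann sum; partial derivatives come from central finite
    differences (an assumption for this sketch, not how it is done in practice).
    """
    n = len(x)
    attributions = [0.0] * n
    for step in range(steps):
        alpha = (step + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = list(point)
            up[i] += eps
            down = list(point)
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2.0 * eps)
            attributions[i] += (x[i] - baseline[i]) * grad_i / steps
    return attributions


def additive(x):
    """Hypothetical additive model: no interactions between features."""
    return 2.0 * x[0] + 3.0 * x[1]


def relu_unit(x):
    """Hypothetical single ReLU unit: the kink creates an interaction."""
    return max(0.0, x[0] + 2.0 * x[1] - 1.0)
```

Calling `baseline_shapley(relu_unit, [1.0, 1.0], [0.0, 0.0])` and `integrated_gradients(relu_unit, [1.0, 1.0], [0.0, 0.0])` side by side shows the gap; swapping in `additive` shows the agreement.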
Brain Predictability toolbox: a Python library for neuroimaging based machine learning
Hahn, Sage, Yuan, Dekang, Thompson, Wesley, Owens, Max M, Allgaier, Nicholas, Garavan, Hugh
Summary: Brain Predictability toolbox (BPt) represents a unified framework of machine learning (ML) tools designed to work with both tabulated data (in particular brain, psychiatric, behavioral, and physiological variables) and neuroimaging-specific derived data (e.g., brain volumes and surfaces). This package is suitable for investigating a wide range of neuroimaging-based ML questions, in particular those queried from large human datasets. Availability and Implementation: BPt has been developed as an open-source Python 3.6+ package hosted at https://github.com/sahahn/BPt under the MIT License, with documentation provided at https://bpt.readthedocs.io/en/latest/, and continues to be actively developed. The project can be downloaded through the GitHub link provided. A web GUI based on the same code is currently under development and can be set up through Docker with instructions at https://github.com/sahahn/BPt_app. Contact: Please contact Sage Hahn at sahahn@uvm.edu.
Deep learning takes on tumours
As cancer cells spread in a culture dish, Guillaume Jacquemet is watching. The cell movements hold clues to how drugs or gene variants might affect the spread of tumours in the body, and he is tracking the nucleus of each cell in frame after frame of time-lapse microscopy films. But because he has generated about 500 films, each with 120 frames and 200–300 cells per frame, that analysis is challenging to say the least. "If I had to do the tracking manually, it would be impossible," says Jacquemet, a cell biologist at Åbo Akademi University in Turku, Finland. So he has trained a machine to spot the nuclei instead.
Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms
Bayati, Mohsen, Hamidi, Nima, Johari, Ramesh, Khosravi, Khashayar
We characterize Bayesian regret in a stochastic multi-armed bandit problem with a large but finite number of arms. In particular, we assume the number of arms $k$ is $T^{\alpha}$, where $T$ is the time-horizon and $\alpha$ is in $(0,1)$. We consider a Bayesian setting where the reward distribution of each arm is drawn independently from a common prior, and provide a complete analysis of expected regret with respect to this prior. Our results exhibit a sharp distinction around $\alpha = 1/2$. When $\alpha < 1/2$, the fundamental lower bound on regret is $\Omega(k)$; and it is achieved by a standard UCB algorithm. When $\alpha > 1/2$, the fundamental lower bound on regret is $\Omega(\sqrt{T})$, and it is achieved by an algorithm that first subsamples $\sqrt{T}$ arms uniformly at random, then runs UCB on just this subset. Interestingly, we also find that a sufficiently large number of arms allows the decision-maker to benefit from "free" exploration if she simply uses a greedy algorithm. In particular, this greedy algorithm exhibits a regret of $\tilde{O}(\max(k,T/\sqrt{k}))$, which translates to a sublinear (though not optimal) regret in the time horizon. We show empirically that this is because the greedy algorithm rapidly disposes of underperforming arms, a beneficial trait in the many-armed regime. Technically, our analysis of the greedy algorithm involves a novel application of the Lundberg inequality, an upper bound for the ruin probability of a random walk; this approach may be of independent interest.
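The three policies named in the abstract (UCB on all arms, UCB on a random subset of about $\sqrt{T}$ arms, and greedy) can be sketched in a few lines. This is an illustrative simulation under stated assumptions, not the authors' implementation: rewards are Bernoulli, the prior on arm means is uniform on (0, 1), the exploration bonus is the textbook UCB1 term $\sqrt{2 \ln t / n}$, greedy pulls each arm once before exploiting, and every function name is hypothetical. Regret is measured against the best of all $k$ arms, so the subsampled policy pays for any good arm it left out.

```python
import math
import random


def draw_arm_means(k, rng):
    """Draw k Bernoulli means from a uniform(0, 1) prior (assumed prior)."""
    return [rng.random() for _ in range(k)]


def _index(arm, t, pulls, totals, use_bonus):
    """Empirical mean, plus the UCB1 exploration bonus when use_bonus is True."""
    mean = totals[arm] / pulls[arm]
    if not use_bonus:
        return mean
    return mean + math.sqrt(2.0 * math.log(t) / pulls[arm])


def run_policy(means, horizon, rng, arms=None, use_bonus=True):
    """Play `horizon` rounds restricted to `arms`; return the pseudo-regret.

    use_bonus=True gives UCB1, use_bonus=False gives greedy. Either way each
    candidate arm is pulled once before indices are compared.
    """
    arms = list(range(len(means))) if arms is None else list(arms)
    best = max(means)  # regret is relative to the best of ALL arms
    pulls = {a: 0 for a in arms}
    totals = {a: 0.0 for a in arms}
    untried = list(arms)
    regret = 0.0
    for t in range(1, horizon + 1):
        if untried:
            arm = untried.pop()
        else:
            arm = max(arms, key=lambda a: _index(a, t, pulls, totals, use_bonus))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        regret += best - means[arm]
    return regret


def compare(horizon=2000, alpha=0.75, seed=0):
    """Run all three policies once on the same arm means with k = T**alpha."""
    rng = random.Random(seed)
    k = max(1, round(horizon ** alpha))
    means = draw_arm_means(k, rng)
    subset_size = min(k, math.ceil(math.sqrt(horizon)))
    subset = rng.sample(range(k), subset_size)
    return {
        "ucb_all": run_policy(means, horizon, rng),
        "ucb_subsampled": run_policy(means, horizon, rng, arms=subset),
        "greedy": run_policy(means, horizon, rng, use_bonus=False),
    }
```

`print(compare())` reports one regret figure per policy for a single random instance. A single run is noisy, so averaging over many seeds would be needed before reading anything into the ordering.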
PerceptiLabs' drag-and-drop interface makes ML modeling easier and faster
One of machine learning's promises is to help humans do things faster and more efficiently. Ironically, one of the roadblocks that keeps businesses and independent developers from capitalizing on ML's capabilities is that it can be time-consuming and difficult to build, train, and deploy models. PerceptiLabs, a two-person Swedish startup, developed a visual drag-and-drop interface to streamline and simplify the entire process. It's designed specifically to offload some of the labor a data scientist or developer would usually have to perform, thereby accelerating the process of development. But it also has pragmatic implications for any business or organization struggling with developing ML tools, because in addition to giving a dev team a speed boost, it allows non-technical people to better understand the process and collaborate.
AI and EVE Online Community Improve Cell and Protein Mapping in the Human Body
August 20th 2018 – Reykjavik, Iceland – Researchers from KTH Royal Institute of Technology and Massive Multiplayer Online Science (MMOS) worked with CCP Games, using its massively multiplayer online game set in space, EVE Online, to gain a more granular understanding of patterns of proteins arranged within the body's cells. Built on a map that shows hundreds of thousands of microscopic images of human cells, EVE Online players worked alongside an artificial intelligence to accomplish this goal. In a study to be published in the September issue of Nature Biotechnology, the researchers found that players, or "citizen scientists" as KTH and MMOS now call them, helped boost the artificial intelligence system used for predicting protein localization on a subcellular level. The combination of crowdsourcing and AI led to improved classification of subcellular protein patterns and the first-time identification of ten new members of the family of cellular structures known as "Rods & Rings," according to Emma Lundberg, a researcher from KTH who leads the Cell Atlas, part of the Human Protein Atlas, at the Science for Life joint research center. She is also the first scientist ever to be put into a video game as an agent NPC (non-playable character), directing the project in-game as Professor Lundberg.