mossel
A Hierarchical Language Model with Predictable Scaling Laws and Provable Benefits of Reasoning
Gaitonde, Jason, Koehler, Frederic, Mossel, Elchanan, Shin, Joonhyung, Sly, Allan
We introduce a family of synthetic languages with hierarchical structure -- generated by a broadcast process on trees -- for which the role of context length and reasoning in autoregressive generation can be analyzed precisely. At the heart of our analytic approach is an \emph{exact $k$-gram ansatz} in place of transformers with context length $k$, a substitution we then validate empirically. Using this ansatz we derive explicit asymptotic predictions for distributional statistics of the sequences produced by a trained model, instantiated in two settings. For the \emph{Ising broadcast process} (a soft-constrained language), we prove that the variance of the generated sum scales log-linearly in the context depth and its kurtosis converges to that of a Gaussian -- both deviating from the true language for any sublinear context. For the \emph{coloring broadcast process} (a hard-constrained language) in the freezing regime, bounded-context autoregression produces sequences that, with high probability, are inconsistent with \emph{any} valid coloring of the underlying tree. Together these results imply an $ฮฉ(n)$ lower bound on the context length required to faithfully sample length-$n$ sequences. In contrast, we prove that an autoregressive \emph{reasoning} model with only $ฮ(\log n)$ working memory can sample exactly from the true language -- an exponential improvement. We confirm both the lower-bound predictions and the reasoning-based upper bound empirically with transformers trained on the synthetic language; the trained models track our asymptotic predictions quantitatively across a wide range of context sizes.
Reconstruction on Trees and Low-Degree Polynomials Frederic Koehler
The study of Markov processes and broadcasting on trees has deep connections to a variety of areas including statistical physics, graphical models, phylogenetic reconstruction, Markov Chain Monte Carlo, and community detection in random graphs. Notably, the celebrated Belief Propagation (BP) algorithm achieves Bayes-optimal performance for the reconstruction problem of predicting the value of the Markov process at the root of the tree from its values at the leaves. Recently, the analysis of low-degree polynomials has emerged as a valuable tool for predicting computational-to-statistical gaps. In this work, we investigate the performance of low-degree polynomials for the reconstruction problem on trees.
Low Degree Hardness for Broadcasting on Trees
We study the low-degree hardness of broadcasting on trees.Broadcasting on trees has been extensively studied in statistical physics, in computational biology in relation to phylogenetic reconstruction and in statistics and computer science in the context of block model inference, and as a simple data model for algorithms that may require depth for inference. The inference of the root can be carried by celebrated Belief Propagation (BP) algorithm which achieves Bayes-optimal performance. Despite the fact that this algorithm runs in linear time (using real operations), recent works indicated that this algorithm in fact requires high level of complexity. Moitra, Mossel and Sandon constructed a chain for which estimating the root better than random (for a typical input) is NC1 complete. Kohler and Mossel constructed chains such that for trees with N leaves, recovering the root better than random requires a polynomial of degree N {\Omega(1)} .
Reconstruction on Trees and Low-Degree Polynomials
The study of Markov processes and broadcasting on trees has deep connections to a variety of areas including statistical physics, graphical models, phylogenetic reconstruction, Markov Chain Monte Carlo, and community detection in random graphs. Notably, the celebrated Belief Propagation (BP) algorithm achieves Bayes-optimal performance for the reconstruction problem of predicting the value of the Markov process at the root of the tree from its values at the leaves.Recently, the analysis of low-degree polynomials has emerged as a valuable tool for predicting computational-to-statistical gaps. In this work, we investigate the performance of low-degree polynomials for the reconstruction problem on trees. Perhaps surprisingly, we show that there are simple tree models with N leaves and bounded arity where (1) nontrivial reconstruction of the root value is possible with a simple polynomial time algorithm and with robustness to noise, but not with any polynomial of degree N {c} for c 0 a constant depending only on the arity, and (2) when the tree is unknown and given multiple samples with correlated root assignments, nontrivial reconstruction of the root value is possible with a simple Statistical Query algorithm but not with any polynomial of degree N c . These results clarify some of the limitations of low-degree polynomials vs. polynomial time algorithms for Bayesian estimation problems.
Ibex raises $38M for AI that diagnoses diseases
Ibex Medical Analytics, a Tel Aviv, Israel-based startup developing a product suite for clinical decision-making and pathology laboratory workflows, today announced that it raised $38 million in funding. The global big data analytics market for health care was valued at $16.87 billion in 2017 and is projected to reach $67.82 billion by 2025, according to a recent report from Allied Market Research. It's believed that health care organizations' implementation of big data analytics might reduce annual costs by over 25% in the coming years. Better diagnosis and disease predictions, assisted by AI and analytics, can lead to cost reduction by decreasing hospital readmission rates, among other factors. Ibex, whose competitors include Paige, PathAI, and ContextVision, among others, was founded by Joseph Mossel and Dr. Chaim Linhart. Mossel's background is in computer science, product development, and management, whereas Linhart is an AI and machine learning researcher.
AI algorithm for detecting prostate cancer shows more than 98% sensitivity, 97% specificity in study - MedCity News
An Israeli startup developing a digital pathology system based around artificial intelligence has published what it calls "outstanding outcomes" in a clinical validation study. Tel Aviv-based Ibex Medical Analysis said Tuesday that it had published data on Galen Prostate, its AI-based system for use by pathologists to detect and measure prostate cancer, in The Lancet Digital Health. The company called it the first and only AI-based system used by pathologists in routine clinical practice. The study took place at the University of Pittsburgh Medical Center, led by Drs. According to the data, sensitivity measured for prostate cancer was 98.46%, and specificity was 97.33%, while the operating characteristic curve was 0.991.
Israeli start-up Ibex helps detect cancer using AI
Tackling cancer is one of the most significant public health challenges of the 21st century, and accurately detecting the presence of tumors is an essential part of this challenge.Tel Aviv-based Ibex Medical Analytics supports medical professionals through its AI-based technology, which is designed to be used in routine clinical practice.Ibex's Galen Prostate solution helps identify suspected cancer on prostate core-needle biopsies. It just received the CE-IVD Mark, the company announced last Thursday, marking the first time one of its products has received the certification of conformity to EU health, safety and protection standards.Ibex provides an important contribution in facing two challenges, the shortage of pathologists and the occurrence of human errors, CEO Joseph Mossel told The Jerusalem Post."When a person shows potential symptoms of cancer, the next step is a biopsy, an analysis of a sample of the affected tissues," he said. "The examination of the sample has to be carried out by a pathologist using a microscope. Pathology is a complex medical discipline that requires years of training, and the world is facing a shortage of these specialists, even in countries such as the US and the UK. There is a growing gap between supply and demand."Galen
The non-tightness of the reconstruction threshold of a 4 states symmetric model with different in-block and out-block mutations
The tree reconstruction problem is to collect and analyze massive data at the $n$th level of the tree, to identify whether there is non-vanishing information of the root, as $n$ goes to infinity. Its connection to the clustering problem in the setting of the stochastic block model, which has wide applications in machine learning and data mining, has been well established. For the stochastic block model, an "information-theoretically-solvable-but-computationally-hard" region, or say "hybrid-hard phase", appears whenever the reconstruction bound is not tight of the corresponding reconstruction on the tree problem. Although it has been studied in numerous contexts, the existing literature with rigorous reconstruction thresholds established are very limited, and it becomes extremely challenging when the model under investigation has $4$ states (the stochastic block model with $4$ communities). In this paper, inspired by the newly proposed $q_1+q_2$ stochastic block model, we study a $4$ states symmetric model with different in-block and out-block transition probabilities, and rigorously give the conditions for the non-tightness of the reconstruction threshold.
Accuracy-Memory Tradeoffs and Phase Transitions in Belief Propagation
Jain, Vishesh, Koehler, Frederic, Liu, Jingbo, Mossel, Elchanan
The analysis of Belief Propagation and other algorithms for the {\em reconstruction problem} plays a key role in the analysis of community detection in inference on graphs, phylogenetic reconstruction in bioinformatics, and the cavity method in statistical physics. We prove a conjecture of Evans, Kenyon, Peres, and Schulman (2000) which states that any bounded memory message passing algorithm is statistically much weaker than Belief Propagation for the reconstruction problem. More formally, any recursive algorithm with bounded memory for the reconstruction problem on the trees with the binary symmetric channel has a phase transition strictly below the Belief Propagation threshold, also known as the Kesten-Stigum bound. The proof combines in novel fashion tools from recursive reconstruction, information theory, and optimal transport, and also establishes an asymptotic normality result for BP and other message-passing algorithms near the critical threshold.