 engine


Reid Hoffman Thinks Doctors Should Ask AI for a Second Opinion

WIRED

The LinkedIn cofounder now has an AI drug discovery startup--and thinks not asking chatbots for medical advice is "bordering on committing malpractice." Following a three-decade career at the helm of some of Silicon Valley's most powerful companies--cofounding LinkedIn and sitting on the boards of PayPal and OpenAI--Reid Hoffman recently turned his attention to health care. Hoffman's startup, Manas AI, is building an AI engine that aims to fast-track the traditionally slow process of drug discovery for various cancers. Inspired by a dinner with renowned cancer physician Siddhartha Mukherjee, the company's cofounder and CEO, its mission statement is to "shift drug discovery from a decade-long process to one that takes a few years." But Hoffman's enthusiasm for generative AI, in particular, stretches far beyond novel drug targets and small molecules.


Heterogeneity-Aware Personalized Federated Learning for Industrial Predictive Analytics

Hu, Yuhan, Fang, Xiaolei

arXiv.org Machine Learning

Federated prognostics enable clients (e.g., companies, factories, and production lines) to collaboratively develop a failure time prediction model while keeping each client's data local and confidential. However, traditional federated models often assume homogeneity in the degradation processes across clients, an assumption that may not hold in many industrial settings. To overcome this, this paper proposes a personalized federated prognostic model designed to accommodate clients with heterogeneous degradation processes, allowing them to build tailored prognostic models. The prognostic model iteratively facilitates the underlying pairwise collaborations between clients with similar degradation patterns, which enhances the performance of personalized federated learning. To estimate parameters jointly using decentralized datasets, we develop a federated parameter estimation algorithm based on proximal gradient descent. The proposed approach addresses the limitations of existing federated prognostic models by simultaneously achieving model personalization, preserving data privacy, and providing comprehensive failure time distributions. The superiority of the proposed model is validated through extensive simulation studies and a case study using the turbofan engine degradation dataset from the NASA repository.
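To make the idea of estimating parameters over decentralized data with proximal gradient descent concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a single shared linear model with an L1 penalty (a swapped-in, textbook choice), omits personalization and failure-time modelling entirely, and every function name is hypothetical. What it does show is the general pattern, in which each client shares only a gradient and the server applies the proximal step.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (assumed penalty, not the paper's)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def local_gradient(A, b, w):
    """Gradient of 0.5 * ||A w - b||^2, computed on one client's private data."""
    return A.T @ (A @ w - b)

def federated_proximal_gradient(clients, lam, steps=500):
    """clients: list of (A_k, b_k) pairs. Only gradients and one scalar leave a client."""
    dim = clients[0][0].shape[1]
    # Upper bound on the Lipschitz constant of the summed gradient;
    # each client contributes only the scalar ||A_k||_2^2.
    lipschitz = sum(np.linalg.norm(A, 2) ** 2 for A, _ in clients)
    w = np.zeros(dim)
    for _ in range(steps):
        # Server aggregates the gradients reported by the clients.
        grad = sum(local_gradient(A, b, w) for A, b in clients)
        # Gradient step on the smooth part, then the proximal step on the penalty.
        w = soft_threshold(w - grad / lipschitz, lam / lipschitz)
    return w
```

The raw matrices `A_k` and responses `b_k` never appear in the server loop, which is the property the abstract refers to as keeping each client's data local.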


fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R

Korkmaz, Selcuk, Goksuluk, Dincer, Karaismailoglu, Eda

arXiv.org Machine Learning

Preprocessing leakage arises when scaling, imputation, or other data-dependent transformations are estimated before resampling, inflating apparent performance while remaining hard to detect. We present fastml, an R package that provides a single-call interface for leakage-aware machine learning through guarded resampling, where preprocessing is re-estimated inside each resample and applied to the corresponding assessment data. The package supports grouped and time-ordered resampling, blocks high-risk configurations, audits recipes for external dependencies, and includes sandboxed execution and integrated model explanation. We evaluate fastml with a Monte Carlo simulation contrasting global and fold-local normalization, a usability comparison with tidymodels under matched specifications, and survival benchmarks across datasets of different sizes. The simulation demonstrates that global preprocessing substantially inflates apparent performance relative to guarded resampling. fastml matched held-out performance obtained with tidymodels while reducing workflow orchestration, and it supported consistent benchmarking of multiple survival model classes through a unified interface.
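The difference between global and fold-local normalization can be sketched in a few lines. fastml itself is an R package and its interface is not reproduced here; the Python below is only an illustration of the guarded pattern, with hypothetical helper names. The scaling statistics are re-estimated on the training rows of each fold and then applied to the held-out rows.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Split range(n) into k shuffled folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def guarded_folds(X, k=5, seed=0):
    """Yield (train_scaled, test_scaled, test_idx) with fold-local scaling."""
    folds = kfold_indices(len(X), k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Mean and standard deviation come from the training rows only.
        mu = X[train_idx].mean(axis=0)
        sd = X[train_idx].std(axis=0)
        sd = np.where(sd == 0, 1.0, sd)  # avoid division by zero
        # The held-out rows are transformed with the training statistics.
        yield (X[train_idx] - mu) / sd, (X[test_idx] - mu) / sd, test_idx

rng = np.random.default_rng(1)
X = rng.normal(loc=3.0, scale=2.0, size=(50, 4))
for train_s, test_s, test_idx in guarded_folds(X):
    pass  # fit a model on train_s, score it on test_s
```

The leaky alternative would compute `mu` and `sd` once on all of `X` before splitting, so that every held-out row has already influenced the transformation applied to it.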


Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package

Olaniran, Oyebayo Ridwan, Shamsutdinova, Diana, Markham, Sarah, Zimmer, Felix, Stahl, Daniel, Forbes, Gordon, Carr, Ewan

arXiv.org Machine Learning

Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings. Methods: We conducted a comprehensive simulation study with two aims. First, we compared three search engines implemented in pmsims: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach, across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets. Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios. Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
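As a rough illustration of what a deterministic bisection search for a sample size looks like, consider the sketch below. It is not the pmsims implementation: the learning curve is a made-up closed-form function standing in for a simulation, and the search assumes performance never decreases as the sample size grows.

```python
import math

def expected_performance(n, ceiling=0.80, rate=0.01):
    """Hypothetical learning curve: performance rises with n towards a ceiling."""
    return ceiling * (1.0 - math.exp(-rate * n))

def min_sample_size(perf, target, lo, hi):
    """Smallest integer n in [lo, hi] with perf(n) >= target.

    Assumes perf is non-decreasing in n.
    """
    if perf(hi) < target:
        raise ValueError("target not reached within the search range")
    while lo < hi:
        mid = (lo + hi) // 2
        if perf(mid) >= target:
            hi = mid       # mid is large enough; try smaller sizes
        else:
            lo = mid + 1   # mid is too small
    return lo

n_star = min_sample_size(expected_performance, target=0.75, lo=10, hi=5000)
```

In a real simulation study each call to `perf` would be a noisy Monte Carlo estimate, so a plain bisection can be misled by sampling error. That is one plausible reason an adaptive surrogate such as a Gaussian process gives more stable estimates, as the abstract reports.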


End-to-End Differentiable Physics for Learning and Control

Neural Information Processing Systems

We present a differentiable physics engine that can be integrated as a module in deep neural networks for end-to-end learning. As a result, structured physics knowledge can be embedded into larger systems, allowing them, for example, to match observations by performing precise simulations, while achieving high sample efficiency. Specifically, in this paper we demonstrate how to perform backpropagation analytically through a physical simulator defined via a linear complementarity problem. Unlike traditional finite difference methods, such gradients can be computed analytically, which allows for greater flexibility of the engine. Through experiments in diverse domains, we highlight the system's ability to learn physical parameters from data, efficiently match and simulate observed visual behavior, and readily enable control via gradient-based planning methods. Code for the engine and experiments is included with the paper.
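The contrast between analytic gradients and finite differences can be seen on a toy simulator. The sketch below is an assumption-laden stand-in, not the paper's engine: it uses a single point mass under a constant force with no contacts and no linear complementarity problem, and it writes the backward pass by hand. It then uses the analytic gradient to recover the mass from an observed final position.

```python
def simulate(mass, force=1.0, dt=0.01, steps=100):
    """Semi-implicit Euler for a point mass; returns the final position."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        v = v + dt * force / mass
        x = x + dt * v
    return x

def loss(mass, target):
    """Squared error between the simulated and observed final position."""
    return (simulate(mass) - target) ** 2

def loss_grad(mass, target, force=1.0, dt=0.01, steps=100):
    """Analytic d(loss)/d(mass) by backpropagating through every step."""
    x_final = simulate(mass, force, dt, steps)
    gx = 2.0 * (x_final - target)  # adjoint of the final position
    gv = 0.0                       # adjoint of the velocity
    gm = 0.0                       # accumulated gradient w.r.t. mass
    for _ in range(steps):
        gv += dt * gx                              # from x = x + dt * v
        gm += gv * dt * force * (-1.0 / mass**2)   # from v = v + dt * force / mass
        # gx and gv carry over unchanged to the previous time step
    return gm

# Recover a "true" mass of 2.0 from its simulated trajectory by gradient descent.
target = simulate(2.0)
mass = 1.0
for _ in range(300):
    mass -= 5.0 * loss_grad(mass, target)
```

A finite-difference estimate such as `(loss(m + eps, target) - loss(m - eps, target)) / (2 * eps)` needs two full simulations per parameter and a well-chosen `eps`, whereas the backward pass gives the gradient for the cost of roughly one extra sweep.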


Engaging look at friction shows how it keeps our world rubbing along

New Scientist

How much do you know about friction? Jennifer R. Vail's charming, if sometimes technical, biography of the force showcases its amazing and largely overlooked role in everything from climate change to dark matter, says Karmela Padavic-Callaghan. In 2009, World Aquatics banned a specific type of swimsuit from all international competitions in water sports, ruling that it gave athletes an unfair advantage. The development of this swimsuit included using NASA's testing facilities and sophisticated computer software. Some versions had ultrasonically welded seams instead of traditional stitches. Swimmers who wore the suit broke 23 of the 25 world records set at the Beijing Olympics in 2008.


Yahoo is adding generative AI to its search engine

Engadget

Yahoo Scout will be powered by Claude and is integrated across the company's products. Yahoo has a new AI-powered answer engine, dubbed Yahoo Scout. The new tool is available now in beta and is powered by Anthropic's Claude. The company says Scout synthesizes info from the web, as well as Yahoo's own data and content, when constructing responses to users' natural-language search queries. Yahoo says the interface will include interactive digital media, structured lists and tables, and visible source links aimed at making answers easier to verify.