AITopics | Wolf, Guy

Collaborating Authors

Wolf, Guy

Random Forest Autoencoders for Guided Representation Learning

Aumon, Adrien, Ni, Shuang, Lizotte, Myriam, Wolf, Guy, Moon, Kevin R., Rhodes, Jake S.

arXiv.org Artificial Intelligence, Mar-16-2025

Decades of research have produced robust methods for unsupervised data visualization, yet supervised visualization, where expert labels guide representations, remains underexplored, as most supervised approaches prioritize classification over visualization. Recently, RF-PHATE, a diffusion-based manifold learning method leveraging random forests and information geometry, marked significant progress in supervised visualization. However, its lack of an explicit mapping function limits scalability and prevents application to unseen data, posing challenges for large datasets and label-scarce scenarios. To overcome these limitations, we introduce Random Forest Autoencoders (RF-AE), a neural network-based framework for out-of-sample kernel extension that combines the flexibility of autoencoders with the supervised learning strengths of random forests and the geometry captured by RF-PHATE. RF-AE enables efficient out-of-sample supervised visualization and outperforms existing methods, including RF-PHATE's standard kernel extension, in both accuracy and interpretability. Additionally, RF-AE is robust to the choice of hyper-parameters and generalizes to any kernel-based dimensionality reduction method.

artificial intelligence, machine learning, random forest autoencoder, (15 more...)

2502.13257

Country:

North America > Canada > Quebec (0.14)
North America > United States > Utah (0.14)
North America > United States > New York (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.67)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Principal Curvatures Estimation with Applications to Single Cell Data

Zhang, Yanlei, Mezrag, Lydia, Sun, Xingzhi, Xu, Charles, Macdonald, Kincaid, Bhaskar, Dhananjay, Krishnaswamy, Smita, Wolf, Guy, Rieck, Bastian

arXiv.org Artificial Intelligence, Feb-5-2025

The rapidly growing field of single-cell transcriptomic sequencing (scRNAseq) presents challenges for data analysis due to its massive datasets. A common approach in manifold learning is to hypothesize that datasets lie on a lower-dimensional manifold. This makes it possible to study the geometry of point clouds by extracting meaningful descriptors such as curvature. In this work, we present Adaptive Local PCA (AdaL-PCA), a data-driven method for accurately estimating various notions of intrinsic curvature on data manifolds, in particular principal curvatures for surfaces. The model relies on local PCA to estimate the tangent spaces. The evaluation of AdaL-PCA on sampled surfaces shows state-of-the-art results. Combined with a PHATE embedding, the model applied to single-cell RNA sequencing data allows us to identify key variations in cellular differentiation.
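
As a rough illustration of the local PCA step mentioned in the abstract, the sketch below estimates the tangent direction of a 2D curve from the covariance of a point's nearest neighbors. It is plain local PCA with a fixed neighborhood size, not the adaptive AdaL-PCA procedure; the function name `local_tangent_2d`, the parameter `k`, and the sample data are hypothetical choices made for this example.

```python
import math


def local_tangent_2d(points, center, k):
    """Estimate the unit tangent at `center` from its k nearest neighbors.

    Assumption for illustration: a fixed neighborhood size k, whereas the
    paper describes an adaptive scheme that is not reproduced here.
    """
    k = min(k, len(points))
    neighbors = sorted(points, key=lambda p: math.dist(p, center))[:k]
    mx = sum(p[0] for p in neighbors) / k
    my = sum(p[1] for p in neighbors) / k
    # Entries of the 2x2 covariance matrix of the neighborhood.
    sxx = sum((p[0] - mx) ** 2 for p in neighbors) / k
    syy = sum((p[1] - my) ** 2 for p in neighbors) / k
    sxy = sum((p[0] - mx) * (p[1] - my) for p in neighbors) / k
    # Closed-form angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


# Points sampled on the unit circle near (1, 0), where the tangent is vertical.
circle = [(math.cos(i * 0.05), math.sin(i * 0.05)) for i in range(-10, 11)]
tangent = local_tangent_2d(circle, (1.0, 0.0), k=5)
```

The sign of the returned vector is arbitrary, since a tangent line has no preferred orientation. In higher dimensions the same idea would need a general eigendecomposition rather than the 2x2 closed form used here.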

artificial intelligence, curvature, machine learning, (14 more...)

2502.0375

Country:

North America > Canada > Quebec (0.14)
North America > United States > New York (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Non-Uniform Parameter-Wise Model Merging

Camacho, Albert Manuel Orozco, Horoi, Stefan, Wolf, Guy, Belilovsky, Eugene

arXiv.org Artificial Intelligence, Dec-19-2024

Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Traditional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.
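
To make the contrast between uniform averaging and parameter-wise merging concrete, here is a minimal sketch on flat parameter lists. The function names and the fixed coefficients in `alphas` are hypothetical; according to the abstract, the per-parameter contributions are learned with gradient-based optimization, which is not shown here.

```python
def merge_uniform(params_a, params_b):
    """Plain averaging: every parameter gets the same 0.5 / 0.5 weighting."""
    return [(a + b) / 2.0 for a, b in zip(params_a, params_b)]


def merge_parameter_wise(params_a, params_b, alphas):
    """Interpolate each parameter with its own coefficient in [0, 1].

    Assumption for illustration: alphas are given. In a learned setting they
    would be optimized against a training objective instead.
    """
    if not (len(params_a) == len(params_b) == len(alphas)):
        raise ValueError("all inputs must have the same length")
    return [
        alpha * a + (1.0 - alpha) * b
        for a, b, alpha in zip(params_a, params_b, alphas)
    ]


params_a = [1.0, 2.0, 3.0]
params_b = [3.0, 0.0, 3.0]
alphas = [1.0, 0.5, 0.0]  # keep model A, average, keep model B

uniform = merge_uniform(params_a, params_b)
merged = merge_parameter_wise(params_a, params_b, alphas)
```

Setting every coefficient to 0.5 recovers uniform averaging, so the parameter-wise form can be read as a strict generalization of it.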

artificial intelligence, machine learning, proceedings, (15 more...)

2412.15467

Country: North America > Canada > Quebec (0.29)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings

Franks, Billy Joe, Eliasof, Moshe, Cantürk, Semih, Wolf, Guy, Schönlieb, Carola-Bibiane, Fellenz, Sophie, Kloft, Marius

arXiv.org Artificial Intelligence, Dec-10-2024

Recent advances in integrating positional and structural encodings (PSEs) into graph neural networks (GNNs) have significantly enhanced their performance across various graph learning tasks. However, the general applicability of these encodings and their potential to serve as foundational representations for graphs remain uncertain. This paper investigates the fine-tuning efficiency, scalability with sample size, and generalization capability of learnable PSEs across diverse graph datasets. Specifically, we evaluate their potential as universal pre-trained models that can be easily adapted to new tasks with minimal fine-tuning and limited data. Furthermore, we assess the expressivity of the learned representations, particularly when used to augment downstream GNNs. We demonstrate through extensive benchmarking and empirical analysis that PSEs generally enhance downstream models. However, some datasets may require specific PSE-augmentations to achieve optimal performance. Nevertheless, our findings highlight their significant potential to become integral components of future graph foundation models. We provide new insights into the strengths and limitations of PSEs, contributing to the broader discourse on foundation models in graph learning.

artificial intelligence, gpse, machine learning, (14 more...)

2412.07407

Country: Europe (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Reaction-conditioned De Novo Enzyme Design with GENzyme

Hua, Chenqing, Lu, Jiarui, Liu, Yong, Zhang, Odin, Tang, Jian, Ying, Rex, Jin, Wengong, Wolf, Guy, Precup, Doina, Zheng, Shuangjia

arXiv.org Artificial Intelligence, Nov-9-2024

The introduction of models like RFDiffusionAA, AlphaFold3, AlphaProteo, and Chai1 has revolutionized protein structure modeling and interaction prediction, primarily from a binding perspective, focusing on creating ideal lock-and-key models. However, these methods can fall short for enzyme-substrate interactions, where perfect binding models are rare, and induced fit states are more common. To address this, we shift to a functional perspective for enzyme design, where the enzyme function is defined by the reaction it catalyzes. Here, we introduce GENzyme, a de novo enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. GENzyme is an end-to-end, three-staged model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes. The entire design process is driven by the catalytic reaction being targeted. This reaction-first approach allows for more accurate and biologically relevant enzyme design, potentially surpassing structure-based and binding-focused models in creating enzymes capable of catalyzing specific reactions. We provide GENzyme code at https://github.com/WillHua127/GENzyme.

artificial intelligence, machine learning, natural language, (15 more...)

2411.16694

Country: North America (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds

Sun, Xingzhi, Liao, Danqi, MacDonald, Kincaid, Zhang, Yanlei, Liu, Chen, Huguet, Guillaume, Wolf, Guy, Adelstein, Ian, Rudner, Tim G. J., Krishnaswamy, Smita

arXiv.org Machine Learning, Oct-18-2024

Rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold using geodesic-guided flows. GAGA shows competitive performance in simulated and real-world datasets, including a 30% improvement over the state-of-the-art methods in single-cell population-level trajectory inference.

artificial intelligence, machine learning, metric learning and generative modeling, (3 more...)

2410.12779

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

EnzymeFlow: Generating Reaction-specific Enzyme Catalytic Pockets through Flow Matching and Co-Evolutionary Dynamics

Hua, Chenqing, Liu, Yong, Zhang, Dinghuai, Zhang, Odin, Luan, Sitao, Yang, Kevin K., Wolf, Guy, Precup, Doina, Zheng, Shuangjia

arXiv.org Artificial Intelligence, Sep-30-2024

Enzyme design is a critical area in biotechnology, with applications ranging from drug development to synthetic biology. Traditional methods for enzyme function prediction or protein binding pocket design often fall short in capturing the dynamic and complex nature of enzyme-substrate interactions, particularly in catalytic processes. To address these challenges, we introduce EnzymeFlow, a generative model that employs flow matching with hierarchical pre-training and enzyme-reaction co-evolution to generate catalytic pockets for specific substrates and catalytic reactions. Additionally, we introduce a large-scale, curated, and validated dataset of enzyme-reaction pairs, specifically designed for the catalytic pocket generation task, comprising a total of 328,192 pairs. By incorporating evolutionary dynamics and reaction-specific adaptations, EnzymeFlow becomes a powerful model for designing enzyme pockets capable of catalyzing a wide range of biochemical reactions. Experiments on the new dataset demonstrate the model's effectiveness in designing high-quality, functional enzyme catalytic pockets, paving the way for advancements in enzyme engineering and synthetic biology. We provide EnzymeFlow code at https://github.com/WillHua127/EnzymeFlow with notebook demonstration at https://github.com/WillHua127/EnzymeFlow/blob/main/enzymeflow_demo.ipynb.

artificial intelligence, enzymeflow, machine learning, (18 more...)

2410.00327

Country:

Europe (0.14)
North America (0.14)

Genre: Research Report (0.81)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Heterophilic Graph Learning Handbook: Benchmarks, Models, Theoretical Analysis, Applications and Challenges

Luan, Sitao, Hua, Chenqing, Lu, Qincheng, Ma, Liheng, Wu, Lirong, Wang, Xinyu, Xu, Minkai, Chang, Xiao-Wen, Precup, Doina, Ying, Rex, Li, Stan Z., Tang, Jian, Wolf, Guy, Jegelka, Stefanie

arXiv.org Artificial Intelligence, Jul-12-2024

The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where the performance of GNNs compared to that of NNs is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. Researchers have begun to revisit and re-evaluate most existing graph models, including the graph transformer and its variants, in the heterophily scenario across various kinds of graphs, e.g., heterogeneous graphs, temporal graphs and hypergraphs. Moreover, numerous graph-related applications are found to be closely related to the heterophily problem. In the past few years, considerable effort has been devoted to studying and addressing the heterophily issue. In this survey, we provide a comprehensive review of the latest progress on heterophilic graph learning, including an extensive summary of benchmark datasets and evaluation of homophily metrics on synthetic graphs, meticulous classification of the most recent supervised and unsupervised learning methods, a thorough digest of the theoretical analysis on homophily/heterophily, and broad exploration of the heterophily-related applications. Notably, through detailed experiments, we are the first to categorize benchmark heterophilic datasets into three sub-categories: malignant, benign and ambiguous heterophily. Malignant and ambiguous datasets are identified as the truly challenging datasets for testing the effectiveness of new models on the heterophily challenge. Finally, we propose several challenges and future directions for heterophilic graph representation learning.
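
One common way to quantify the homophily described above is the edge homophily ratio: the fraction of edges whose endpoints share a label. The sketch below computes it for a toy graph. It is only one of several homophily metrics and is not claimed to be the one the survey favors; the function name and the example graph are hypothetical.

```python
def edge_homophily(edges, labels):
    """Return the fraction of edges that connect two nodes with equal labels.

    Values near 1 indicate a homophilic graph, values near 0 a heterophilic one.
    """
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# A 4-cycle with two classes: two edges stay within a class, two cross classes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
ratio = edge_homophily(edges, labels)
```

Relabeling the same cycle so that labels alternate around it would give a ratio of 0, the fully heterophilic case.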

artificial intelligence, heterophilic graph learning handbook, machine learning, (2 more...)

2407.09618

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.44)

Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

Horoi, Stefan, Camacho, Albert Manuel Orozco, Belilovsky, Eugene, Wolf, Guy

arXiv.org Machine Learning, Jul-7-2024

Combining the predictions of multiple trained models through ensembling is generally a good way to improve accuracy by leveraging the different learned features of the models; however, it comes with high computational and storage costs. Model fusion, the act of merging multiple models into one by combining their parameters, reduces these costs but doesn't work as well in practice. Indeed, neural network loss landscapes are high-dimensional and non-convex, and the minima found through learning are typically separated by high loss barriers. Numerous recent works have focused on finding permutations that match one network's features to the features of a second one, lowering the loss barrier on the linear path between them in parameter space. However, permutations are restrictive since they assume that a one-to-one mapping exists between the different models' neurons. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our alignment method leads to better performance than past methods when averaging models trained on the same or differing data splits. We also extend this analysis to the harder setting where more than two models are merged, and we find that CCA Merge works significantly better than past methods. Our code is publicly available at https://github.com/shoroi/align-n-merge

artificial intelligence, machine learning, neuron, (15 more...)

2407.05385

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Enhancing Supervised Visualization through Autoencoder and Random Forest Proximities for Out-of-Sample Extension

Ni, Shuang, Aumon, Adrien, Wolf, Guy, Moon, Kevin R., Rhodes, Jake S.

arXiv.org Machine Learning, Jun-6-2024

The value of supervised dimensionality reduction lies in its ability to uncover meaningful connections between data features and labels. Common dimensionality reduction methods embed a set of fixed, latent points, but are not capable of generalizing to an unseen test set. In this paper, we provide an out-of-sample extension method for the random forest-based supervised dimensionality reduction method, RF-PHATE, combining information learned from the random forest model with the function-learning capabilities of autoencoders. Through quantitative assessment of various autoencoder architectures, we identify that networks that reconstruct random forest proximities are more robust for the embedding extension problem. Furthermore, by leveraging proximity-based prototypes, we achieve a 40% reduction in training time without compromising extension quality. Our method does not require label information for out-of-sample points, thus serving as a semi-supervised method, and can achieve consistent quality using only 10% of the training data.
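
For readers unfamiliar with random forest proximities, the sketch below shows the classic definition: two samples are as similar as the fraction of trees in which they fall into the same leaf. The paper may rely on a refined proximity variant, so this is only an illustration of the general idea. The leaf indices in `leaf_ids` are made-up values; in practice they could come from a fitted forest, for example through scikit-learn's `RandomForestClassifier.apply`.

```python
def rf_proximity(leaves_i, leaves_j):
    """Fraction of trees in which two samples share a leaf (classic proximity)."""
    if len(leaves_i) != len(leaves_j) or not leaves_i:
        raise ValueError("leaf index lists must be non-empty and equally long")
    shared = sum(1 for a, b in zip(leaves_i, leaves_j) if a == b)
    return shared / len(leaves_i)


def proximity_matrix(leaf_ids):
    """Build the full pairwise proximity matrix from per-sample leaf indices."""
    n = len(leaf_ids)
    return [
        [rf_proximity(leaf_ids[i], leaf_ids[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical leaf assignments: one row per sample, one column per tree.
leaf_ids = [
    [0, 2, 1, 3],
    [0, 2, 1, 0],
    [1, 0, 1, 2],
]
prox = proximity_matrix(leaf_ids)
```

Each row of such a matrix is a supervised similarity profile for one sample, which is the kind of quantity the abstract describes autoencoders reconstructing.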

artificial intelligence, decision tree learning, machine learning, (18 more...)

2406.04421

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
