AITopics | Poczos, Barnabas

Collaborating Authors

Poczos, Barnabas

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Diffusion Models in $\textit{De Novo}$ Drug Design

Alakhdar, Amira, Poczos, Barnabas, Washburn, Newell

arXiv.org Artificial IntelligenceJun-7-2024

Diffusion models have emerged as powerful tools for molecular generation, particularly in the context of 3D molecular structures. Inspired by non-equilibrium statistical physics, these models can generate 3D molecular structures with specific properties or requirements crucial to drug discovery. Diffusion models were particularly successful at learning 3D molecular geometries' complex probability distributions and their corresponding chemical and physical properties through forward and reverse diffusion processes. This review focuses on the technical implementation of diffusion models tailored for 3D molecular generation. It compares the performance, evaluation methods, and implementation details of various diffusion models used for molecular generation tasks. We cover strategies for atom and bond representation, architectures of reverse diffusion denoising networks, and challenges associated with generating stable 3D molecular structures. This review also explores the applications of diffusion models in $\textit{de novo}$ drug design and related areas of computational chemistry, such as structure-based drug design, including target-specific molecular generation, molecular docking, and molecular dynamics of protein-ligand complexes. We also cover conditional generation on physical properties, conformation generation, and fragment-based drug design. By summarizing the state-of-the-art diffusion models for 3D molecular generation, this review sheds light on their role in advancing drug discovery as well as their current limitations.

artificial intelligence, machine learning, molecule, (18 more...)

arXiv.org Artificial Intelligence

2406.08511

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Controllable Text Generation in the Instruction-Tuning Era

Ashok, Dhananjay, Poczos, Barnabas

arXiv.org Artificial IntelligenceMay-2-2024

While most research on controllable text generation has focused on steering base Language Models, the emerging instruction-tuning and prompting paradigm offers an alternate approach to controllability. We compile and release ConGenBench, a testbed of 17 different controllable generation tasks, using a subset of it to benchmark the performance of 9 different baselines and methods on Instruction-tuned Language Models. To our surprise, we find that prompting-based approaches outperform controllable text generation methods on most datasets and tasks, highlighting a need for research on controllable text generation with Instruction-tuned Language Models in specific. Prompt-based approaches match human performance on most stylistic tasks while lagging on structural tasks, foregrounding a need to study more varied constraints and more challenging stylistic tasks. To facilitate such research, we provide an algorithm that uses only a task dataset and a Large Language Model with in-context capabilities to automatically generate a constraint dataset. This method eliminates the fields dependence on pre-curated constraint datasets, hence vastly expanding the range of constraints that can be studied in the future.

constraint, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2405.0149

Country:

Asia > Middle East > Iraq (0.14)
North America > United States > Oregon (0.14)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Government (1.00)
Health & Medicine (0.96)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Task-Based MoE for Multitask Multilingual Machine Translation

Pham, Hai, Kim, Young Jin, Mukherjee, Subhabrata, Woodruff, David P., Poczos, Barnabas, Awadalla, Hany Hassan

arXiv.org Artificial IntelligenceOct-24-2023

Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approaches over the dense and canonical MoE models on multi-task multilingual machine translations. With task-specific adapters, our models can additionally generalize to new tasks efficiently.

multitask multilingual machine translation, task-based moe

arXiv.org Artificial Intelligence

2308.15772

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Add feedback

Objective-Agnostic Enhancement of Molecule Properties via Multi-Stage VAE

Zhou, Chenghui, Poczos, Barnabas

arXiv.org Artificial IntelligenceSep-9-2023

Variational autoencoder (VAE) is a popular method for drug discovery and various architectures and pipelines have been proposed to improve its performance. However, VAE approaches are known to suffer from poor manifold recovery when the data lie on a low-dimensional manifold embedded in a higher dimensional ambient space [Dai and Wipf, 2019]. The consequences of it in drug discovery are somewhat under-explored. In this paper, we explore applying a multi-stage VAE approach, that can improve manifold recovery on a synthetic dataset, to the field of drug discovery. We experimentally evaluate our multi-stage VAE approach using the ChEMBL dataset and demonstrate its ability to improve the property statistics of generated molecules substantially from pre-existing methods without incorporating property predictors into the training pipeline. We further fine-tune our models on two curated and much smaller molecule datasets that target different proteins. Our experiments show an increase in the number of active molecules generated by the multi-stage VAE in comparison to their one-stage equivalent. For each of the two tasks, our baselines include methods that use learned property predictors to incorporate target metrics directly into the training objective and we discuss complications that arise with this methodology.

artificial intelligence, machine learning, molecule, (15 more...)

arXiv.org Artificial Intelligence

2308.13066

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Improving Molecule Properties Through 2-Stage VAE

Zhou, Chenghui, Poczos, Barnabas

arXiv.org Artificial IntelligenceDec-5-2022

Variational autoencoder (VAE) is a popular method for drug discovery and there had been a great deal of architectures and pipelines proposed to improve its performance. But the VAE model itself suffers from deficiencies such as poor manifold recovery when data lie on low-dimensional manifold embedded in higher dimensional ambient space and they manifest themselves in each applications differently. The consequences of it in drug discovery is somewhat under-explored. In this paper, we study how to improve the similarity of the data generated via VAE and the training dataset by improving manifold recovery via a 2-stage VAE where the second stage VAE is trained on the latent space of the first one. We experimentally evaluated our approach using the ChEMBL dataset as well as a polymer datasets. In both dataset, the 2-stage VAE method is able to improve the property statistics significantly from a pre-existing method.

artificial intelligence, machine learning, molecule, (13 more...)

arXiv.org Artificial Intelligence

2212.0275

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Modeling Task Effects on Meaning Representation in the Brain via Zero-Shot MEG Prediction

Toneva, Mariya, Stretcu, Otilia, Poczos, Barnabas, Wehbe, Leila, Mitchell, Tom M.

arXiv.org Artificial IntelligenceSep-17-2020

How meaning is represented in the brain is still one of the big open questions in neuroscience. Does a word (e.g., bird) always have the same representation, or does the task under which the word is processed alter its representation (answering "can you eat it?" versus "can it fly?")? The brain activity of subjects who read the same word while performing different semantic tasks has been shown to differ across tasks. However, it is still not understood how the task itself contributes to this difference. In the current work, we study Magnetoencephalography (MEG) brain recordings of participants tasked with answering questions about concrete nouns. We investigate the effect of the task (i.e. the question being asked) on the processing of the concrete noun by predicting the millisecond-resolution MEG recordings as a function of both the semantics of the noun and the task. Using this approach, we test several hypotheses about the task-stimulus interactions by comparing the zero-shot predictions made by these hypotheses for novel tasks and nouns not seen during training. We find that incorporating the task semantics significantly improves the prediction of MEG recordings, across participants. The improvement occurs 475-550ms after the participants first see the word, which corresponds to what is considered to be the ending time of semantic processing for a word. These results suggest that only the end of semantic processing of a word is task-dependent, and pose a challenge for future research to formulate new hypotheses for earlier task effects as a function of the task and stimuli.

neurology, representation, text processing, (23 more...)

arXiv.org Artificial Intelligence

2009.08424

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

Setlur, Amrith, Poczos, Barnabas, Black, Alan W

arXiv.org Machine LearningJul-25-2020

This paper extends recent work on nonlinear Independent Component Analysis (ICA) by introducing a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables. Observed high dimensional acoustic features like log Mel spectrograms can be considered as surface level manifestations of nonlinear transformations over individual multivariate sources of information like speaker characteristics, phonological content etc. Under assumptions of energy based models we use the theory of nonlinear ISA to propose an algorithm that learns unsupervised speech representations whose subspaces are independent and potentially highly correlated with the original non-stationary multivariate sources. We show how nonlinear ICA with auxiliary variables can be extended to a generic identifiable model for subspaces as well while also providing sufficient conditions for the identifiability of these high dimensional subspaces. Our proposed methodology is generic and can be integrated with standard unsupervised approaches to learn speech representations with subspaces that can theoretically capture independent higher order speech signals. We evaluate the gains of our algorithm when integrated with the Autoregressive Predictive Decoding (APC) model by showing empirical results on the speaker verification and phoneme recognition tasks.

neural network, speech recognition, subspace, (17 more...)

arXiv.org Machine Learning

2007.12948

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

Covariate Distribution Aware Meta-learning

Setlur, Amrith, Dingliwal, Saket, Poczos, Barnabas

arXiv.org Machine LearningJul-8-2020

Meta-learning has proven to be successful at few-shot learning across the regression, classification and reinforcement learning paradigms. Recent approaches have adopted Bayesian interpretations to improve gradient based meta-learners by quantifying the uncertainty of the post-adaptation estimates. Most of these works almost completely ignore the latent relationship between the covariate distribution (p(x)) of a task and the corresponding conditional distribution p(y|x). In this paper, we identify the need to explicitly model the meta-distribution over the task covariates in a hierarchical Bayesian framework. We begin by introducing a graphical model that explicitly leverages very few samples drawn from p(x) to better infer the posterior over the optimal parameters of the conditional distribution (p(y|x)) for each task. Based on this model we provide an inference strategy and a corresponding meta-algorithm that explicitly accounts for the meta-distribution over task covariates. Finally, we demonstrate the significant gains of our proposed algorithm on a synthetic regression dataset.

artificial intelligence, bayesian inference, covariate distribution, (13 more...)

arXiv.org Machine Learning

2007.02523

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels

Du, Simon S., Hou, Kangcheng, Salakhutdinov, Russ R., Poczos, Barnabas, Wang, Ruosong, Xu, Keyulu

Neural Information Processing SystemsMar-18-2020, 22:48:06 GMT

While graph kernels (GKs) are easy to train and enjoy provable theoretical guarantees, their practical performances are limited by their expressive power, as the kernel function often depends on hand-crafted combinatorial features of graphs. Compared to graph kernels, graph neural networks (GNNs) usually achieve better practical performance, as GNNs use multi-layer architectures and non-linear activation functions to extract high-order information of graphs as features. However, due to the large number of hyper-parameters and the non-convex nature of the training procedure, GNNs are harder to train. Theoretical guarantees of GNNs are also not well-understood. Furthermore, the expressive power of GNNs scales with the number of parameters, and thus it is hard to exploit the full power of GNNs when computing resources are limited.

artificial intelligence, kernel, neural network, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.99)

Add feedback

Neural Architecture Search with Bayesian Optimisation and Optimal Transport

Kandasamy, Kirthevasan, Neiswanger, Willie, Schneider, Jeff, Poczos, Barnabas, Xing, Eric P.

Neural Information Processing SystemsFeb-14-2020, 09:28:39 GMT

Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function f which is only accessible via point evaluations. It is typically used in settings where f is expensive to evaluate. A common use case for BO in machine learning is model selection, where it is not possible to analytically model the generalisation performance of a statistical model, and we resort to noisy and expensive training and validation procedures to choose the best model. Conventional BO methods have focused on Euclidean and categorical domains, which, in the context of model selection, only permits tuning scalar hyper-parameters of machine learning algorithms. However, with the surge of interest in deep learning, there is an increasing demand to tune neural network architectures.

artificial intelligence, deep learning, neural network, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback