
 progen


ProGen: Revisiting Probabilistic Spatial-Temporal Time Series Forecasting from a Continuous Generative Perspective Using Stochastic Differential Equations

arXiv.org Machine Learning

Accurate forecasting of spatiotemporal data remains challenging due to complex spatial dependencies and temporal dynamics. The inherent uncertainty and variability in such data often render deterministic models insufficient, prompting a shift towards probabilistic approaches, where diffusion-based generative models have emerged as effective solutions. In this paper, we present ProGen, a novel framework for probabilistic spatiotemporal time series forecasting that leverages Stochastic Differential Equations (SDEs) and diffusion-based generative modeling techniques in the continuous domain. By integrating a novel denoising score model, graph neural networks, and a tailored SDE, ProGen provides a robust solution that effectively captures spatiotemporal dependencies while managing uncertainty. Our extensive experiments on four benchmark traffic datasets demonstrate that ProGen outperforms state-of-the-art deterministic and probabilistic models. This work contributes a continuous, diffusion-based generative approach to spatiotemporal forecasting, paving the way for future research in probabilistic modeling and stochastic processes.
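To make "diffusion-based generative modeling with SDEs" more concrete, here is a minimal sketch of the forward perturbation kernel of the generic variance-preserving (VP) SDE, dx = -½β(t)x dt + √β(t) dW, with a linear β(t) schedule. This is an assumption for illustration only. The abstract says ProGen uses a tailored SDE whose form is not given here, and the function names (`vp_marginal`, `perturb`) and the β_min = 0.1, β_max = 20 defaults are hypothetical choices, not values from the paper.

```python
import math
import random

def vp_marginal(t: float, beta_min: float = 0.1, beta_max: float = 20.0) -> tuple[float, float]:
    """Return (mean_coef, std) of the generic VP-SDE marginal p(x_t | x_0).

    Assumed linear schedule: beta(t) = beta_min + t * (beta_max - beta_min), t in [0, 1].
    The marginal is Gaussian: x_t = mean_coef * x_0 + std * eps, with eps ~ N(0, 1).
    """
    # Closed-form integral of beta(s) ds from 0 to t for the linear schedule.
    integral = beta_min * t + 0.5 * t * t * (beta_max - beta_min)
    mean_coef = math.exp(-0.5 * integral)
    std = math.sqrt(1.0 - math.exp(-integral))
    return mean_coef, std

def perturb(x0: list[float], t: float, rng: random.Random) -> list[float]:
    """Sample a noised x_t from a clean series x0 (e.g. one sensor's readings)."""
    mean_coef, std = vp_marginal(t)
    return [mean_coef * v + std * rng.gauss(0.0, 1.0) for v in x0]

if __name__ == "__main__":
    rng = random.Random(0)
    series = [0.5, 0.7, 0.6, 0.9]  # toy values, not from any benchmark dataset
    for t in (0.0, 0.5, 1.0):
        print(t, vp_marginal(t), perturb(series, t, rng))
```

In this generic setup, a denoising score model would be trained on such perturbed samples to estimate the score of p(x_t), and forecasts would be drawn by integrating the reverse-time SDE. For spatiotemporal data, x_0 would presumably be a nodes-by-time window and the score network a graph neural network, but the exact design is the paper's, not this sketch's.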


GOLD: Generalized Knowledge Distillation via Out-of-Distribution-Guided Language Data Generation

arXiv.org Artificial Intelligence

Knowledge distillation from LLMs is essential for the efficient deployment of language models. Prior works have proposed using LLMs to generate data for training distilled models. We argue that LLM-generated data tends to be sampled mainly from the center of the original content distribution. This limitation prevents the distilled model from learning the true underlying data distribution and causes it to forget the tails of the distribution (samples with lower probability). To this end, we propose GOLD, a task-agnostic data generation and knowledge distillation framework that employs an iterative out-of-distribution-guided feedback mechanism for the LLM. As a result, the generated data improves the generalizability of distilled models. We also introduce an energy-based OOD evaluation approach to deal with noisy generated data. Our extensive experiments on 10 different classification and sequence-to-sequence tasks in NLP show that GOLD outperforms prior art and the LLM by an average of 5% and 14%, respectively. We also show that the proposed method is applicable to less explored and novel tasks. The code is available.
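The abstract does not define its energy-based OOD evaluation. A widely used formulation, assumed here purely for illustration, is the energy score computed from a classifier's logits f(x): E(x) = -T · log Σ_k exp(f_k(x) / T), where lower energy suggests an in-distribution sample and higher energy suggests an out-of-distribution one. The sketch scores generated samples and keeps those at or below a threshold; the threshold, the temperature, and the `filter_generated` helper are hypothetical, and how GOLD actually applies its energy criterion is not specified here.

```python
import math

def energy_score(logits: list[float], temperature: float = 1.0) -> float:
    """Energy E(x) = -T * log(sum_k exp(logit_k / T)); lower means more in-distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for a numerically stable log-sum-exp
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse

def filter_generated(samples, logits_fn, threshold: float, temperature: float = 1.0):
    """Hypothetical helper: keep generated samples whose energy is at or below `threshold`.

    `logits_fn` stands in for whatever model maps a sample to classifier logits.
    """
    return [s for s in samples if energy_score(logits_fn(s), temperature) <= threshold]

if __name__ == "__main__":
    fake_logits = {"confident": [8.0, 0.0, 0.0], "unsure": [0.1, 0.0, 0.1]}  # toy logits
    for name, z in fake_logits.items():
        print(name, round(energy_score(z), 3))
    print(filter_generated(list(fake_logits), fake_logits.get, threshold=-2.0))
```

Using log-sum-exp rather than a softmax probability keeps the score sensitive to the overall logit magnitude, which is the usual motivation for energy-based OOD scoring.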


AI Has Successfully Imitated Human Evolution--and Might Do It Even Better

#artificialintelligence

Artificial intelligence is a master of imitation. Every time scientists design an AI, whether to mimic human language or master a game like chess, it either matches or far exceeds the capabilities of its biological creators. Now, AI has proven that it can even master the art of biology itself. Researchers at the University of California-San Francisco, the University of California-Berkeley, and Salesforce Research, the science arm of the SF-based software company, developed an AI capable of imitating evolution itself. This doesn't mean the AI created some sort of evolutionarily superior superhuman (yet); instead, the AI designed protein sequences built from the 20 amino acids that make up proteins.


AI Has Successfully demonstrated Human Evolution - BLOCKGENI

#artificialintelligence

An AI that can mimic evolution itself was created by researchers at the University of California, San Francisco, the University of California, Berkeley, and Salesforce Research, the science division of the San Francisco-based software firm. This doesn't mean the AI produced some evolutionarily superior superhuman; rather, it constructed protein sequences from the 20 amino acids. Some of the sequences performed as well as those produced by millions of years of evolution, nature's own workmanship. Interestingly, the researchers did not build an AI from scratch but adapted a language model from a different domain. The study treated biological proteins as "sentences" written in a language of amino acids and made use of Salesforce's ProGen, which draws on natural language processing techniques.


This Week's Awesome Tech Stories From Around the Web (Through January 28)

#artificialintelligence

AI Has Designed Bacteria-Killing Proteins From Scratch--and They Work
Karmela Padavic-Callaghan, New Scientist
"The AI, called ProGen, works in a similar way to AIs that can generate text. ProGen learned how to generate new proteins by learning the grammar of how amino acids combine to form 280 million existing proteins. Instead of the researchers choosing a topic for the AI to write about, they could specify a group of similar proteins for it to focus on. In this case, they chose a group of proteins with antimicrobial activity."

BuzzFeed to Use ChatGPT Creator OpenAI to Help Create Quizzes and Other Content
Alexandra Bruell, The Wall Street Journal
"BuzzFeed Inc. said it would rely on ChatGPT creator OpenAI to enhance its quizzes and personalize some content for its audiences, becoming the latest digital publisher to embrace artificial intelligence. In a memo to staff sent Thursday morning, which was reviewed by The Wall Street Journal, Chief Executive Jonah Peretti said he intends for AI to play a larger role in the company's editorial and business operations this year."


AI has designed bacteria-killing proteins from scratch – and they work

New Scientist

An AI has designed antimicrobial proteins that were then tested experimentally and shown to work. The same approach could eventually be used to make new medicines. Proteins are made of chains of amino acids, and the sequence of those amino acids determines the protein's shape and function. Ali Madani at Salesforce Research in California and his colleagues used an AI to design millions of new proteins, then produced a small sample of them to test whether they worked.


ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback

arXiv.org Artificial Intelligence

Recently, dataset-generation-based zero-shot learning has shown promising results by training a task-specific model on a dataset synthesized from large pre-trained language models (PLMs). The final task-specific model often achieves comparable or even better performance than PLMs in the zero-shot setting, with orders of magnitude fewer parameters. However, synthetic datasets have their drawbacks: they have long suffered from quality issues such as low informativeness and redundancy. This explains why massive amounts of synthetic data do not bring the performance gains we would expect from human-labeled data. To improve the quality of dataset synthesis, we propose a progressive zero-shot dataset generation framework, ProGen, which leverages feedback from the task-specific model to guide the generation of new training data via in-context examples. Extensive experiments on five text classification datasets demonstrate the effectiveness of the proposed approach. We also show that ProGen achieves on-par or superior performance with a synthetic dataset only 1% the size of those used by baseline methods without in-context feedback.
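The progressive loop described in the abstract (synthesize data, train a small task model, feed the model's signal back to the generator as in-context examples) can be sketched as follows. Everything here is a hypothetical stand-in: `generate`, `train`, and `helpfulness` are placeholders for prompting a PLM, training the task-specific model, and whatever per-example feedback score is used; the abstract does not say how feedback is computed or how many examples are selected.

```python
from typing import Callable, List, Sequence

Example = str  # a synthetic (text, label) pair would work too; kept simple here

def progressive_generation(
    generate: Callable[[Sequence[Example], int], List[Example]],
    train: Callable[[List[Example]], object],
    helpfulness: Callable[[object, Example], float],
    rounds: int = 3,
    per_round: int = 100,
    k_in_context: int = 8,
) -> List[Example]:
    """Hypothetical sketch of progressive zero-shot dataset generation.

    Each round: prompt the generator with the current in-context examples,
    retrain the task model on all data so far, then pick the k examples the
    model rates most helpful to serve as in-context examples next round.
    """
    dataset: List[Example] = []
    in_context: List[Example] = []  # the first round is plain zero-shot prompting
    for _ in range(rounds):
        dataset.extend(generate(in_context, per_round))
        model = train(dataset)
        ranked = sorted(dataset, key=lambda ex: helpfulness(model, ex), reverse=True)
        in_context = ranked[:k_in_context]
    return dataset
```

Passing the three steps in as callables keeps the loop independent of any particular PLM or training library; plugging in real components is left to the reader.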


ProGen: Language Modeling for Protein Generation

arXiv.org Machine Learning

Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. We pose protein engineering as an unsupervised sequence generation problem in order to leverage the exponentially growing set of proteins that lack costly structural annotations. We train a 1.2B-parameter language model, ProGen, on ~280M protein sequences conditioned on taxonomic and keyword tags such as molecular function and cellular component. This provides ProGen with an unprecedented range of evolutionary sequence diversity and allows it to generate with fine-grained control, as demonstrated by metrics based on primary sequence similarity, secondary structure accuracy, and conformational energy.
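Conditioning a language model on tags is commonly done by prepending control tokens to each sequence, so that an ordinary next-token objective learns p(sequence | tags). The sketch below shows that data layout under this assumption. The tag strings, the `<sep>` and `<eos>` tokens, the one-token-per-residue tokenization, and the choice not to predict the tags themselves are all hypothetical, not ProGen's actual vocabulary or training recipe.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one letter each

def build_example(tags: List[str], sequence: str) -> List[str]:
    """Prepend hypothetical control tags to a protein sequence, one token per residue."""
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return tags + ["<sep>"] + list(sequence) + ["<eos>"]

def next_token_pairs(tokens: List[str], n_prefix: int) -> List[Tuple[List[str], str]]:
    """(context, target) pairs for a causal LM, starting after the first n_prefix tokens.

    With n_prefix = len(tags) + 1, the tags and <sep> only condition the model
    and are never prediction targets.
    """
    return [(tokens[:i], tokens[i]) for i in range(n_prefix, len(tokens))]

if __name__ == "__main__":
    tags = ["<tax:bacteria>", "<kw:antimicrobial>"]  # hypothetical tag names
    toks = build_example(tags, "MKV")
    for ctx, target in next_token_pairs(toks, n_prefix=len(tags) + 1):
        print(ctx[-1], "->", target)
```

At generation time, the same layout would let a user supply only the tags and `<sep>` and sample residues until `<eos>`, which is the kind of control the abstract describes.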