synthesis process
CCVS: Context-aware Controllable Video Synthesis
This presentation introduces a self-supervised learning approach to the synthesis of new video clips from old ones, with several new key elements for improved spatial resolution and realism: it conditions the synthesis process on contextual information for temporal continuity and ancillary information for fine control. The prediction model is doubly autoregressive, in the latent space of an autoencoder for forecasting, and in image space for updating contextual information, which is also used to enforce spatio-temporal consistency through a learnable optical flow module. Adversarial training of the autoencoder in the appearance and temporal domains is used to further improve the realism of its output. A quantizer inserted between the encoder and the transformer in charge of forecasting future frames in latent space (and its inverse inserted between the transformer and the decoder) adds even more flexibility by affording simple mechanisms for handling multimodal ancillary information for controlling the synthesis process (e.g., a few sample frames, an audio track, a trajectory in image space) and taking into account the intrinsically uncertain nature of the future by allowing multiple predictions. Experiments with an implementation of the proposed approach give very good qualitative and quantitative results on multiple tasks and standard benchmarks.
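The quantizer and its inverse mentioned in the abstract can be illustrated with a generic nearest-neighbor codebook lookup. This is only a minimal sketch of vector quantization in general, not the CCVS implementation: the codebook values, the array shapes, and the function names `quantize` and `dequantize` are assumptions made for illustration.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent and every code: shape (N, K).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(indices, codebook):
    """Inverse step: look the code vectors back up from their indices."""
    return codebook[indices]

# Hypothetical codebook: 8 codes of dimension 4 (values chosen only for the demo).
codebook = np.arange(32, dtype=float).reshape(8, 4)
# Stand-in encoder outputs lying close to codes 2, 5 and 5.
latents = codebook[[2, 5, 5]] + 0.1

tokens = quantize(latents, codebook)       # discrete tokens a transformer could model
recovered = dequantize(tokens, codebook)   # continuous vectors handed to a decoder
```

Because the tokens are discrete, a sequence model can assign probabilities to them and sample several different continuations, which is one plausible reading of how quantization allows multiple predictions.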
Alias-Free Generative Adversarial Networks
We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.
Self-reflecting Large Language Models: A Hegelian Dialectical Approach
Abdali, Sara, Goksen, Can, Amizadeh, Saeed, Koishida, Kazuhito
Investigating NLP through a philosophical lens has recently caught researchers' attention, as it connects computational methods with classical schools of philosophy. This paper introduces a philosophical approach inspired by the Hegelian Dialectic for LLMs' self-reflection, utilizing a self-dialectical approach to emulate internal critiques and then synthesize new ideas by resolving the contradicting points. Moreover, this paper investigates the effect of the LLM's generation temperature by establishing a dynamic annealing approach, which promotes creativity in the early stages and gradually refines it by focusing on the nuances, and by comparing it with a fixed-temperature strategy. Our proposed approach is examined to determine its ability to generate novel ideas from an initial proposition. Additionally, a Multi Agent Majority Voting (MAMV) strategy is leveraged to assess the validity and novelty of the generated ideas, which proves beneficial in the absence of domain experts. Our experiments show promise in generating new ideas and provide a stepping stone for future research.
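The two mechanisms named in the abstract, temperature annealing and majority voting, can be sketched as follows. The abstract does not specify the schedule or the voting labels, so the linear decay, the start and end temperatures, and the label strings are all assumptions made for illustration.

```python
from collections import Counter

def annealed_temperature(step, total_steps, t_start=1.0, t_end=0.2):
    """Linearly decay the sampling temperature from t_start to t_end.

    The linear shape and the default values are assumptions, not the paper's.
    """
    if total_steps <= 1:
        return t_end
    frac = step / (total_steps - 1)
    return t_start + frac * (t_end - t_start)

def majority_vote(votes):
    """Return the label chosen by the largest number of agents."""
    return Counter(votes).most_common(1)[0][0]

# Temperatures for five hypothetical rounds: high early, low late.
schedule = [round(annealed_temperature(s, 5), 2) for s in range(5)]

# Three hypothetical judge agents rate one generated idea.
verdict = majority_vote(["novel", "not novel", "novel"])
```

A fixed-temperature baseline corresponds to setting `t_start` equal to `t_end`.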
COOL: Efficient and Reliable Chain-Oriented Objective Logic with Neural Networks Feedback Control for Program Synthesis
Program synthesis methods, whether formal or neural-based, lack fine-grained control and flexible modularity, which limits their adaptation to complex software development. These limitations stem from rigid Domain-Specific Language (DSL) frameworks and incorrect neural network predictions. To address them, we propose the Chain of Logic (CoL), which organizes the synthesis process into an activity flow and provides heuristic control to guide the process. Furthermore, by integrating neural networks with libraries and introducing a Neural Network Feedback Control (NNFC) mechanism, our approach modularizes synthesis and mitigates the impact of neural network mispredictions. Experiments on relational and symbolic synthesis tasks show that CoL significantly enhances the efficiency and reliability of DSL program synthesis across multiple metrics. Specifically, CoL improves accuracy by 70% while reducing tree operations by 91% and time by 95%. Additionally, NNFC further boosts accuracy by 6%, with a 64% reduction in tree operations under challenging conditions such as insufficient training data, increased difficulty, and multidomain synthesis. These improvements confirm COOL as a highly efficient and reliable program synthesis framework.
An Artificial Intelligence (AI) workflow for catalyst design and optimization
Lai, Nung Siong, Tew, Yi Shen, Zhong, Xialin, Yin, Jun, Li, Jiali, Yan, Binhang, Wang, Xiaonan
In the pursuit of novel catalyst development to address pressing environmental concerns and energy demand, conventional design and optimization methods often fall short due to the complexity and vastness of the catalyst parameter space. The advent of Machine Learning (ML) has ushered in a new era in the field of catalyst optimization, offering potential solutions to the shortcomings of traditional techniques. However, existing methods fail to effectively harness the wealth of information contained within the burgeoning body of scientific literature on catalyst synthesis. To address this gap, this study proposes an innovative Artificial Intelligence (AI) workflow that integrates Large Language Models (LLMs), Bayesian optimization, and an active learning loop to expedite and enhance catalyst optimization. Our methodology combines advanced language understanding with robust optimization strategies, effectively translating knowledge extracted from diverse literature into actionable parameters for practical experimentation and optimization. In this article, we demonstrate the application of this AI workflow in the optimization of catalyst synthesis for ammonia production. The results underscore the workflow's ability to streamline the catalyst development process, offering a swift, resource-efficient, and high-precision alternative to conventional methods.
Keywords: Catalysts; Large Language Models; Active Learning; Bayesian Optimization; Ammonia Synthesis
1. Introduction
The development of novel catalysts to address increasing energy demand and consumption has become an urgent task in the realm of renewable energy. This surge is driven not only by escalating demands from applications in process optimization, yield improvement, and energy saving but also by a heightened awareness and concern for environmental issues, particularly the increase in carbon dioxide emissions.
Several optimization strategies are conventionally employed to identify the optimal set of condition parameters, thereby enhancing the performance of the catalyst. The 'One Factor At a Time' (OFAT) method is frequently employed as an alternative technique for chemical process optimization and comprehension. While these conventional optimization methods and their advancements have undeniably made significant contributions to the field, certain gaps persist that limit their full potential in optimizing catalyst synthesis. The predominant reliance on the empirical knowledge and intuition of seasoned chemists, while invaluable, is not systematically scalable and transferable. Techniques like OFAT and DoE, though statistically rigorous, are often unable to keep pace with the sheer complexity and vastness of the catalyst parameter space, leaving much of it unexplored and underutilized.
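To make the OFAT idea concrete, the following minimal sketch tunes one factor at a time while holding the others at their current best values. The objective function, the factor names (`temperature`, `pressure`), and the candidate values are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def ofat(objective, baseline, candidates):
    """Tune one factor at a time, holding the others at their current best."""
    best = dict(baseline)
    evaluations = 0
    for name, values in candidates.items():
        scores = {}
        for value in values:
            # Vary only this factor; every other factor stays fixed.
            scores[value] = objective({**best, name: value})
            evaluations += 1
        best[name] = max(scores, key=scores.get)
    return best, evaluations

def toy_yield(params):
    """Hypothetical yield surface that peaks at temperature=400, pressure=20."""
    return -(params["temperature"] - 400) ** 2 - (params["pressure"] - 20) ** 2

baseline = {"temperature": 300, "pressure": 10}
candidates = {"temperature": [300, 400, 500], "pressure": [10, 20, 30]}

best, evaluations = ofat(toy_yield, baseline, candidates)
full_grid_size = 3 * 3  # every combination of the candidate values above
```

Here OFAT needs 6 evaluations against 9 for the full grid, but it only explores points along one axis at a time. On this separable toy surface that is enough to find the peak; when factors interact, the unexplored combinations are where OFAT can miss the optimum.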
A knowledge-driven AutoML architecture
Cofaru, Corneliu, Loeckx, Johan
Automated machine learning (AutoML) has gathered a significant amount of attention in recent years as a way of automating some of the typical workflows in machine learning (ML) and data science more broadly. For a comprehensive and systematic view on the subject, there is an already growing number of survey works that cover the state of the art: Hutter et al. (2019); Yao et al. (2018); Elshawi et al. (2019); Zöller and Huber (2021); Truong et al. (2019); He et al. (2021); Hospedales et al. (2020); Vanschoren (2018); Karmaker ("Santu") et al. Currently, it is becoming apparent that the size of the potential problem space, required solution sophistication, transparency and legal constraints Roscher et al. (2020); Drozdal et al. (2020); Rudin et al. (2021); Veale and Borgesius (2021); Smuha et al. (2021) render AutoML a problem extremely difficult to define and solve either holistically or agnostically.
DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
Benita, Roi, Elad, Michael, Keshet, Joseph
Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a waveform (i.e., a vocoder). This work proposes a diffusion probabilistic end-to-end model for generating a raw speech waveform. The proposed model is autoregressive, generating overlapping frames sequentially, where each frame is conditioned on a portion of the previously generated one. Hence, our model can effectively synthesize an unlimited speech duration while preserving high-fidelity synthesis and temporal coherence. We implemented the proposed model for unconditional and conditional speech generation, where the latter can be driven by an input sequence of phonemes, amplitudes, and pitch values. Working on the waveform directly has some empirical advantages. Specifically, it allows the creation of local acoustic behaviors, like vocal fry, which make the overall waveform sound more natural. Furthermore, the proposed diffusion model is stochastic and not deterministic; therefore, each inference generates a slightly different waveform variation, enabling an abundance of valid realizations. Experiments show that the proposed model generates speech with superior quality compared with other state-of-the-art neural speech generation systems.
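The frame-by-frame generation scheme described in the abstract can be sketched as a simple loop. The function `generate_frame` below is a hypothetical stand-in that returns random samples, not a diffusion sampler, and the frame length and overlap size are arbitrary assumptions; only the control flow (each frame conditioned on the tail of the previous one) follows the description.

```python
import numpy as np

def generate_frame(context, frame_len, rng):
    """Hypothetical stand-in for the diffusion sampler: returns one frame whose
    first len(context) samples reproduce the context it was conditioned on."""
    frame = rng.normal(size=frame_len)
    frame[: len(context)] = context
    return frame

def generate_waveform(num_frames, frame_len=8, overlap=2, seed=0):
    """Generate overlapping frames sequentially and stitch them together."""
    rng = np.random.default_rng(seed)
    frame = generate_frame(np.empty(0), frame_len, rng)  # first frame: no context
    pieces = [frame]
    for _ in range(num_frames - 1):
        # Condition the next frame on the tail of the previous one.
        frame = generate_frame(frame[-overlap:], frame_len, rng)
        # Keep only the new samples; the overlap is already in the output.
        pieces.append(frame[overlap:])
    return np.concatenate(pieces)

waveform = generate_waveform(num_frames=4)
```

Because each step depends only on the previous frame, the loop can run for any number of frames, which matches the abstract's claim about unlimited duration.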
Siddhi Vinayak Pandey on LinkedIn: </> InsightML - 01: Material Synthesis Process
The first process involves synthesizing materials without machine learning. In this process, ideas such as designing experimental setups, determining the chemical composition, and listing the measurement conditions are initially planned. The material is then synthesized, and its characterization is performed in the next step. If the material does not meet the required expectations, the synthesis process is repeated by changing methods such as the chemical composition or varying the measurement conditions until the desired results are achieved. On the other hand, the second technique, which is the 'Machine Learning Assisted Synthesis Process', works differently.