Goto

Collaborating Authors

 Overview


Brief Review -- Chinchilla: Training Compute-Optimal Large Language Models

#artificialintelligence

On all subsets, Chinchilla outperforms Gopher. On this benchmark, Chinchilla significantly outperforms Gopher despite being much smaller, with an average accuracy of 67.6% (improving upon Gopher by 7.6%). Chinchilla outperforms Gopher by 7.6% on average, performing better on 51/57 individual tasks, the same on 2/57, and worse on only 4/57 tasks. On RACE-h and RACE-m, Chinchilla considerably improves performance over Gopher. On LAMBADA, Chinchilla outperforms both Gopher and MT-NLG 530B.


3D GANs and Latent Space: A comprehensive survey

arXiv.org Artificial Intelligence

Generative Adversarial Networks (GANs) have emerged as a significant player in generative modeling by mapping lower-dimensional random noise to higher-dimensional spaces. These networks have been used to generate high-resolution images and 3D objects. The efficient modeling of 3D objects and human faces is crucial in the development process of 3D graphical environments such as games or simulations. 3D GANs are a new type of generative model used for 3D reconstruction, point cloud reconstruction, and 3D semantic scene completion. The choice of distribution for noise is critical as it represents the latent space. Understanding a GAN's latent space is essential for fine-tuning the generated samples, as demonstrated by the morphing of semantically meaningful parts of images. In this work, we explore the latent space and 3D GANs, examine several GAN variants and training methods to gain insights into improving 3D GAN training, and suggest potential future directions for further research.


Deep Generative Modeling with Backward Stochastic Differential Equations

arXiv.org Artificial Intelligence

This paper proposes a novel deep generative model, called BSDE-Gen, which combines the flexibility of backward stochastic differential equations (BSDEs) with the power of deep neural networks for generating high-dimensional complex target data, particularly in the field of image generation. The incorporation of stochasticity and uncertainty in the generative modeling process makes BSDE-Gen an effective and natural approach for generating high-dimensional data. The paper provides a theoretical framework for BSDE-Gen, describes its model architecture, presents the maximum mean discrepancy (MMD) loss function used for training, and reports experimental results.


Build Your Own TALK-GPT Chatbot with Python & OpenAI

#artificialintelligence

In this tutorial, we will guide you through the process of building your very own chatbot using Python and OpenAI's TALK-GPT model. With the power of TALK-GPT, you can create a chatbot that is capable of carrying out complex conversations with users. First, we will provide an overview of TALK-GPT and its capabilities. Then, we will guide you through the steps of setting up your development environment and installing the necessary Python packages. After that, we will show you how to create a basic chatbot using TALK-GPT.


Pallet Detection from Synthetic Data Using Game Engines

arXiv.org Artificial Intelligence

This research sets out to assess the viability of using game engines to generate synthetic training data for machine learning in the context of pallet segmentation. Using synthetic data has been proven in prior research to be a viable means of training neural networks and saves hours of manual labour due to the reduced need for manual image annotation. Machine vision for pallet detection can benefit from synthetic data as the industry increases the development of autonomous warehousing technologies. As per our methodology, we developed a tool capable of automatically generating large amounts of annotated training data from 3D models at pixel-perfect accuracy and a much faster rate than manual approaches. Regarding image segmentation, a Mask R-CNN pipeline was used, which achieved an AP50 of 86% for individual pallets.


On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

arXiv.org Artificial Intelligence

The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. The use of large-scale models trained on vast amounts of data holds immense promise for practical applications, enhancing industrial productivity and facilitating social development. With the increasing demands on computational capacity, though numerous studies have explored the efficient training, a comprehensive summarization on acceleration techniques of training deep learning models is still much anticipated. In this survey, we present a detailed review for training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) data-centric: including dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of the data samples; (2) model-centric, including acceleration of basic modules, compression training, model initialization and model-centric curriculum learning techniques, which focus on accelerating the training via reducing the calculations on parameters; (3) optimization-centric, including the selection of learning rate, the employment of large batchsize, the designs of efficient objectives, and model average techniques, which pay attention to the training policy and improving the generality for the large-scale models; (4) budgeted training, including some distinctive acceleration methods on source-constrained situations; (5) system-centric, including some efficient open-source distributed libraries/systems which provide adequate hardware support for the implementation of acceleration algorithms. By presenting this comprehensive taxonomy, our survey presents a comprehensive review to understand the general mechanisms within each component and their joint interaction.


Complex QA and language models hybrid architectures, Survey

arXiv.org Artificial Intelligence

This paper reviews the state-of-the-art of language models architectures and strategies for "complex" question-answering (QA, CQA, CPS) with a focus on hybridization. Large Language Models (LLM) are good at leveraging public data on standard problems but once you want to tackle more specific complex questions or problems (e.g. How does the concept of personal freedom vary between different cultures ? What is the best mix of power generation methods to reduce climate change ?) you may need specific architecture, knowledge, skills, methods, sensitive data protection, explainability, human approval and versatile feedback... Recent projects like ChatGPT and GALACTICA have allowed non-specialists to grasp the great potential as well as the equally strong limitations of LLM in complex QA. In this paper, we start by reviewing required skills and evaluation techniques. We integrate findings from the robust community edited research papers BIG, BLOOM and HELM which open source, benchmark and analyze limits and challenges of LLM in terms of tasks complexity and strict evaluation on accuracy (e.g. fairness, robustness, toxicity, ...) as a baseline. We discuss some challenges associated with complex QA, including domain adaptation, decomposition and efficient multi-step QA, long form and non-factoid QA, safety and multi-sensitivity data protection, multimodal search, hallucinations, explainability and truthfulness, temporal reasoning. We analyze current solutions and promising research trends, using elements such as: hybrid LLM architectural patterns, training and prompting strategies, active human reinforcement learning supervised with AI, neuro-symbolic and structured knowledge grounding, program synthesis, iterated decomposition and others.


Towards Automated Urban Planning: When Generative and ChatGPT-like AI Meets Urban Planning

arXiv.org Artificial Intelligence

The two fields of urban planning and artificial intelligence (AI) arose and developed separately. However, there is now cross-pollination and increasing interest in both fields to benefit from the advances of the other. In the present paper, we introduce the importance of urban planning from the sustainability, living, economic, disaster, and environmental perspectives. We review the fundamental concepts of urban planning and relate these concepts to crucial open problems of machine learning, including adversarial learning, generative neural networks, deep encoder-decoder networks, conversational AI, and geospatial and temporal machine learning, thereby assaying how AI can contribute to modern urban planning. Thus, a central problem is automated land-use configuration, which is formulated as the generation of land uses and building configuration for a target area from surrounding geospatial, human mobility, social media, environment, and economic activities. Finally, we delineate some implications of AI for urban planning and propose key research areas at the intersection of both topics.


Towards Automated 3D Search Planning for Emergency Response Missions

arXiv.org Artificial Intelligence

The ability to efficiently plan and execute automated and precise search missions using unmanned aerial vehicles (UAVs) during emergency response situations is imperative. Precise navigation between obstacles and time-efficient searching of 3D structures and buildings are essential for locating survivors and people in need in emergency response missions. In this work we address this challenging problem by proposing a unified search planning framework that automates the process of UAV-based search planning in 3D environments. Specifically, we propose a novel search planning framework which enables automated planning and execution of collision-free search trajectories in 3D by taking into account low-level mission constrains (e.g., the UAV dynamical and sensing model), mission objectives (e.g., the mission execution time and the UAV energy efficiency) and user-defined mission specifications (e.g., the 3D structures to be searched and minimum detection probability constraints). The capabilities and performance of the proposed approach are demonstrated through extensive simulated 3D search scenarios.


A Novel Approach to Prediction of the 3-Dimensional Structures of Protein Backbones by Neural Networks

Neural Information Processing Systems

Three-dimensional (3D) structures of protein backbones have been pre(cid:173) dicted using neural networks. A feed forward neural network was trained on a class of functionally, but not structurally, homologous proteins, us(cid:173) ing backpropagation learning. The network generated tertiary structure information in the form of binary distance constraints for the Co atoms in the protein backbone. The binary distance between two Co atoms was o if the distance between them was less than a certain threshold distance, and 1 otherwise. The distance constraints predicted by the trained neu(cid:173) ral network were utilized to generate a folded conformation of the protein backbone, using a steepest descent minimization approach.