Generative AI


The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

arXiv.org Artificial Intelligence

Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, using test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability to process arbitrarily interleaved multimodal inputs, together with the genericity of its capabilities, makes GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on emerging application scenarios and future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and a better understanding of multimodal foundation models. Finally, we acknowledge that the model under our study is solely the product of OpenAI's innovative work, and OpenAI should be fully credited for its development. Please see the GPT-4V contributions paper for the authorship and credit attribution: https://cdn.openai.com/contributions/gpt-4v.pdf


MatChat: A Large Language Model and Application Service Platform for Materials Science

arXiv.org Artificial Intelligence

The prediction of chemical synthesis pathways plays a pivotal role in materials science research. Challenges such as the complexity of synthesis pathways and the lack of comprehensive datasets currently hinder our ability to predict these chemical processes accurately. However, recent advancements in generative artificial intelligence (GAI), including automated text generation and question-answering systems, coupled with fine-tuning techniques, have facilitated the deployment of large-scale AI models tailored to specific domains. In this study, we harness the LLaMA2-7B model and enhance it through a learning process that incorporates 13,878 pieces of structured materials knowledge data. This specialized AI model, named MatChat, focuses on predicting inorganic material synthesis pathways. MatChat exhibits remarkable proficiency in generating and reasoning with knowledge in materials science. Although MatChat requires further refinement to meet diverse material design needs, this research highlights its reasoning capabilities and innovative potential in the field of materials science. MatChat is now accessible online and open for use, with both the model and its application framework available as open source. This study establishes a robust foundation for collaborative innovation in the integration of generative AI in materials science.


State of the Art on Diffusion Models for Visual Computing

arXiv.org Artificial Intelligence

The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth, with relevant papers published across the computer graphics, computer vision, and AI communities and new works appearing daily on arXiv. This rapid growth makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, and to survey important aspects of these generative AI tools, including personalization, conditioning, and inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point to explore this exciting topic for researchers, artists, and practitioners alike.
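As a concrete anchor for the "basic mathematical concepts" mentioned above, the following minimal sketch implements the standard DDPM-style forward (noising) process, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 - ᾱ_t) I) with ᾱ_t = prod over s <= t of (1 - β_s). It is an illustration under assumed settings (a linear β schedule from 1e-4 to 0.02 over 1,000 steps, and data held in a plain Python list), not code from the report. Stable Diffusion runs diffusion in the latent space of an autoencoder rather than on raw pixels, and its exact schedule may differ.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in a single step."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


# Usage: noise a toy 4-value "image" halfway through a 1,000-step schedule.
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -0.5, 0.25, 0.0]
xt, eps = q_sample(x0, 500, alpha_bars, random.Random(0))

# Invert the same identity to recover x0 from x_t and the exact noise;
# a trained denoiser would supply a prediction of eps instead.
a = alpha_bars[500]
x0_hat = [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps)]
```

Because x_t can be sampled in closed form for any t, training never has to simulate the chain step by step; the inversion at the end is why a network that predicts the noise ε also yields an estimate of x_0.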


Adobe brings more generative AI features to Express

Engadget

Few tech companies have embraced generative AI as wholeheartedly as Adobe. At Adobe Max, its annual creativity conference, it unveiled a new version of the Firefly generative AI model. Not only that, the company announced more generative AI features for Adobe Express, just weeks after making Firefly more broadly available in the app. Adobe Express now includes features such as Generative Fill, which lets users add, remove or replace items, people and other parts of images using text prompts.


Generative AI deployment: Strategies for smooth scaling

MIT Technology Review

One-quarter of respondents expect generative AI's primary effect to be a reduction in their workforce. The figure was higher in industrial sectors like energy and utilities (43%), manufacturing (34%), and transport and logistics (31%). It was lowest in IT and telecommunications (7%). Overall, this is a modest figure compared to the more dystopian job replacement scenarios in circulation. Demand for skills is increasing in technical fields that focus on operationalizing AI models and in organizational and management positions tackling thorny topics including ethics and risk. AI is democratizing technical skills across the workforce in ways that could lead to new job opportunities and increased employee satisfaction.


VerifAI: Verified Generative AI

arXiv.org Artificial Intelligence

Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences, such as flawed decision-making, the spread of false information, privacy violations, legal liabilities, and more. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models. Such an approach can help ensure the correctness of generative AI, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.
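The abstract states a vision rather than a specific algorithm, so the following is only a hypothetical illustration of one ingredient it mentions: checking generated statements against a knowledge graph. It assumes claims have already been extracted from model output as (subject, relation, object) triples, and that each relation is functional (one correct object per subject), so a mismatched object counts as a contradiction. The names `Verdict` and `verify_claims`, the three-way verdict, and the example triples are invented for illustration.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNVERIFIABLE = "unverifiable"


def verify_claims(claims, knowledge_graph):
    """Label each (subject, relation, object) claim against trusted triples.

    Hypothetical helper. Assumes every relation is functional: if the graph
    knows an object for (subject, relation) and the claim names a different
    one, the claim is marked as contradicted.
    """
    # Index trusted objects by (subject, relation) for constant-time lookup.
    index = {}
    for subject, relation, obj in knowledge_graph:
        index.setdefault((subject, relation), set()).add(obj)

    verdicts = {}
    for claim in claims:
        subject, relation, obj = claim
        known = index.get((subject, relation))
        if known is None:
            verdicts[claim] = Verdict.UNVERIFIABLE  # no evidence either way
        elif obj in known:
            verdicts[claim] = Verdict.SUPPORTED
        else:
            verdicts[claim] = Verdict.CONTRADICTED
    return verdicts


# Usage with hypothetical triples.
kg = [("MatChat", "base_model", "LLaMA2-7B"), ("GPT-4V", "developer", "OpenAI")]
generated = [
    ("MatChat", "base_model", "LLaMA2-7B"),
    ("MatChat", "base_model", "LLaMA2-13B"),
    ("Firefly", "developer", "Adobe"),
]
results = verify_claims(generated, kg)
```

A real verification pipeline would also need claim extraction, entity resolution, data-quality assessment, and evidence from tables and text files, none of which this sketch attempts.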


Tertiary Lymphoid Structures Generation through Graph-based Diffusion

arXiv.org Artificial Intelligence

Graph-based representation approaches have proven successful in the analysis of biomedical data because they capture intricate dependencies between biological entities, such as the spatial organization of different cell types in a tumor tissue. However, to further enhance our understanding of the underlying biological mechanisms, it is important to accurately capture the actual distributions of such complex data. Graph-based deep generative models are specifically tailored to accomplish that. In this work, we leverage state-of-the-art graph-based diffusion models to generate biologically meaningful cell-graphs. In particular, we show that the adopted graph diffusion model is able to accurately learn the distribution of cells in terms of their tertiary lymphoid structure (TLS) content, a well-established biomarker for evaluating cancer progression in oncology research. Additionally, we illustrate the utility of the learned generative models for data augmentation in a TLS classification task. To the best of our knowledge, this is the first work that leverages the power of graph diffusion models in generating meaningful biological cell structures.
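The cell-graph representation this abstract builds on can be illustrated with a minimal sketch: cells become nodes labelled with their type, and edges join cells whose centroids lie within a fixed distance. The radius rule, the helper names `build_cell_graph` and `type_contact_counts`, and the coordinates and cell types in the example are assumptions for illustration; the paper's actual graph construction and its diffusion model are not shown here.

```python
import math
from itertools import combinations


def build_cell_graph(cells, radius):
    """Connect every pair of cells whose centroids lie within `radius`.

    `cells` is a list of (x, y, cell_type) tuples; the result maps each node
    index to the set of its neighbour indices (an undirected adjacency).
    """
    adjacency = {i: set() for i in range(len(cells))}
    for i, j in combinations(range(len(cells)), 2):
        (xi, yi, _), (xj, yj, _) = cells[i], cells[j]
        if math.hypot(xi - xj, yi - yj) <= radius:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def type_contact_counts(cells, adjacency):
    """Count edges per unordered pair of cell types, e.g. ('B cell', 'T cell')."""
    counts = {}
    for i, neighbours in adjacency.items():
        for j in neighbours:
            if i < j:  # visit each undirected edge once
                key = tuple(sorted((cells[i][2], cells[j][2])))
                counts[key] = counts.get(key, 0) + 1
    return counts


# Usage with hypothetical cell centroids (arbitrary units).
cells = [
    (0.0, 0.0, "B cell"),
    (1.0, 0.0, "T cell"),
    (0.0, 1.0, "B cell"),
    (10.0, 10.0, "tumor cell"),
]
graph = build_cell_graph(cells, radius=1.5)
contacts = type_contact_counts(cells, graph)
```

Summary statistics such as the type-contact counts are one simple way to compare real and generated cell-graphs; the all-pairs loop is quadratic, so large tissue samples would call for a spatial index.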


The Morning After: ChatGPT creator OpenAI might start making its own AI chips

Engadget

According to Reuters, OpenAI is exploring making its own artificial intelligence chips and has even looked into an acquisition. The moves follow CEO Sam Altman's earlier statements blaming GPU shortages for users' concerns about the speed and reliability of the company's API. Using its own chips could also reduce OpenAI's costs: based on analysis by Bernstein Research, each ChatGPT query costs the company around four cents. At the moment, NVIDIA controls the market for chips that power AI applications. The Microsoft supercomputer OpenAI used to develop its technology, for instance, uses 10,000 NVIDIA GPUs.


G7 to draw up AI code of conduct this autumn: Kishida

The Japan Times

Prime Minister Fumio Kishida unveiled a plan on Monday to hold a video conference with Group of Seven leaders this autumn to formulate international guidelines and a code of conduct for developers of artificial intelligence (AI) tools. Kishida presented the plan in a speech at a special session of the U.N.-sponsored Internet Governance Forum in Kyoto. The guidelines and code of conduct are part of the Hiroshima AI Process, an initiative for international best practices regarding generative AI, according to the Japanese leader. Kishida also said that the Japanese government's new economic package, planned to be drawn up late this month, will include aid for developing both computational resources, which process the huge volumes of data needed for AI development and use, and basic computational models, as well as for stepping up the introduction of AI in small businesses and the medical field. The Hiroshima AI Process, which was agreed on at the G7 summit held in Hiroshima in May, also calls for creating international guidelines by the end of the year that will cover generative AI users as well.


Subsurface Characterization using Ensemble-based Approaches with Deep Generative Models

arXiv.org Machine Learning

Estimating spatially distributed properties such as hydraulic conductivity (K) from available sparse measurements is a major challenge in subsurface characterization. Inverse modeling, however, is of limited use for ill-posed, high-dimensional applications because of its computational costs and poor prediction accuracy with sparse datasets. In this paper, we combine Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), a deep generative model that can accurately capture complex subsurface structure, and Ensemble Smoother with Multiple Data Assimilation (ES-MDA), an ensemble-based inversion method, for accurate and accelerated subsurface characterization. WGAN-GP is trained to generate high-dimensional K fields from a low-dimensional latent space, and ES-MDA then updates the latent variables by assimilating available measurements. Several subsurface examples are used to evaluate the accuracy and efficiency of the proposed method, and the main features of the unknown K fields are characterized accurately with reliable uncertainty quantification. Furthermore, the estimation performance is compared with a widely used variational (i.e., optimization-based) inversion approach, and the proposed approach outperforms the variational inversion method, especially for the channelized and fractured field examples. We explain this superior performance by visualizing the objective function in the latent space: because of the nonlinear and aggressive dimension reduction via generative modeling, the objective function surface becomes extremely complex, while the ensemble approximation can smooth out the multi-modal surface during the minimization. This suggests that, when combined with deep generative models, the ensemble-based approach works better than the variational approach, at the cost of additional forward model runs, unless convergence-ensuring modifications are implemented in the variational inversion.
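The ES-MDA latent-space update described above can be sketched as follows. Each ensemble member's latent vector z is updated as z ← z + C_zd (C_dd + α C_D)^-1 (d_obs + sqrt(α) e - g(z)), where e is standard normal noise scaled by the measurement error, repeated for a set of inflation factors α whose reciprocals sum to one. To keep the example self-contained, it assumes a single scalar measurement (so the matrix inverse reduces to a division and C_D = σ_d²) and replaces the trained WGAN-GP generator plus flow simulator with a hypothetical linear `forward_model`; both simplifications are ours, not the paper's.

```python
import math
import random


def forward_model(z):
    """Hypothetical stand-in for simulator(generator(z)): one predicted datum."""
    return z[0] + 2.0 * z[1]


def es_mda(ensemble, d_obs, sigma_d, alphas, rng):
    """ES-MDA for a single scalar measurement with noise std `sigma_d`.

    `alphas` are inflation factors whose reciprocals should sum to 1.
    Returns a new list of updated latent vectors.
    """
    n_e, dim = len(ensemble), len(ensemble[0])
    for alpha in alphas:
        preds = [forward_model(z) for z in ensemble]
        d_mean = sum(preds) / n_e
        z_mean = [sum(z[k] for z in ensemble) / n_e for k in range(dim)]
        # Sample cross-covariance C_zd (one entry per latent dim) and variance C_dd.
        c_zd = [
            sum((z[k] - z_mean[k]) * (p - d_mean) for z, p in zip(ensemble, preds))
            / (n_e - 1)
            for k in range(dim)
        ]
        c_dd = sum((p - d_mean) ** 2 for p in preds) / (n_e - 1)
        # Kalman-like gain C_zd / (C_dd + alpha * C_D) with C_D = sigma_d**2.
        gain = [c / (c_dd + alpha * sigma_d ** 2) for c in c_zd]
        updated = []
        for z, p in zip(ensemble, preds):
            # Perturb the observation with inflated noise sqrt(alpha) * sigma_d.
            d_pert = d_obs + math.sqrt(alpha) * sigma_d * rng.gauss(0.0, 1.0)
            updated.append([z[k] + gain[k] * (d_pert - p) for k in range(dim)])
        ensemble = updated
    return ensemble


# Usage: standard-normal prior on a 2-D latent space, one noisy measurement.
rng = random.Random(42)
prior = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
posterior = es_mda(prior, d_obs=3.0, sigma_d=0.1, alphas=[4.0] * 4, rng=rng)
prior_pred = [forward_model(z) for z in prior]
post_pred = [forward_model(z) for z in posterior]
post_mean = sum(post_pred) / len(post_pred)
```

Because the update only needs forward-model evaluations and sample covariances, it never differentiates through the generator, which is consistent with the abstract's point that the ensemble approximation copes with a rugged latent-space objective at the price of many forward runs.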