Generative AI
Octopus v4: Graph of language models
Language models have been effective in a wide range of applications, yet the most sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various models by Anthropic are expensive and consume substantial energy. In contrast, the open-source community has produced competitive models, like Llama3. Furthermore, niche-specific smaller language models, such as those tailored for legal, medical or financial tasks, have outperformed their proprietary counterparts. This paper introduces a novel approach that employs \textit{functional tokens} to integrate \textbf{multiple open-source models}, each optimized for particular tasks. Our newly developed Octopus v4 model leverages \textit{functional tokens} to intelligently direct user queries to the most appropriate vertical model and reformat the query to achieve the best performance. Octopus v4, an evolution of the Octopus v1, v2, and v3 models, excels in selection and parameter understanding and reformatting. Additionally, we explore the use of graph as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and \textit{functional tokens}. Use our open-sourced GitHub (\url{https://www.nexa4ai.com/}) to try Octopus v4 models (\url{https://huggingface.co/NexaAIDev/Octopus-v4}), and contrite to a larger graph of language models. By activating models less than 10B parameters, we achieved SOTA MMLU score of 74.8 among the same level models.
A University Framework for the Responsible use of Generative AI in Research
Smith, Shannon, Tate, Melissa, Freeman, Keri, Walsh, Anne, Ballsun-Stanton, Brian, Hooper, Mark, Lane, Murray
Generative Artificial Intelligence (generative AI) poses both opportunities and risks for the integrity of research. Universities must guide researchers in using generative AI responsibly, and in navigating a complex regulatory landscape subject to rapid change. By drawing on the experiences of two Australian universities, we propose a framework to help institutions promote and facilitate the responsible use of generative AI. We provide guidance to help distil the diverse regulatory environment into a principles-based position statement. Further, we explain how a position statement can then serve as a foundation for initiatives in training, communications, infrastructure, and process change. Despite the growing body of literature about AI's impact on academic integrity for undergraduate students, there has been comparatively little attention on the impacts of generative AI for research integrity, and the vital role of institutions in helping to address those challenges. This paper underscores the urgency for research institutions to take action in this area and suggests a practical and adaptable framework for so doing.
The Emerging AI Divide in the United States
Daepp, Madeleine I. G., Counts, Scott
The digital divide describes disparities in access to and usage of digital tooling between social and economic groups. Emerging generative artificial intelligence tools, which strongly affect productivity, could magnify the impact of these divides. However, the affordability, multi-modality, and multilingual capabilities of these tools could also make them more accessible to diverse users in comparison with previous forms of digital tooling. In this study, we characterize spatial differences in U.S. residents' knowledge of a new generative AI tool, ChatGPT, through an analysis of state- and county-level search query data. In the first six months after the tool's release, we observe the highest rates of users searching for ChatGPT in West Coast states and persistently low rates of search in Appalachian and Gulf states. Counties with the highest rates of search are relatively more urbanized and have proportionally more educated, more economically advantaged, and more Asian residents in comparison with other counties or with the U.S. average. In multilevel models adjusting for socioeconomic and demographic factors as well as industry makeup, education is the strongest positive predictor of rates of search for generative AI tooling. Although generative AI technologies may be novel, early differences in uptake appear to be following familiar paths of digital marginalization.
Leveraging Active Subspaces to Capture Epistemic Model Uncertainty in Deep Generative Models for Molecular Design
Abeer, A N M Nafiz, Jantre, Sanket, Urban, Nathan M, Yoon, Byung-Jun
Deep generative models have been accelerating the inverse design process in material and drug design. Unlike their counterpart property predictors in typical molecular design frameworks, generative molecular design models have seen fewer efforts on uncertainty quantification (UQ) due to computational challenges in Bayesian inference posed by their large number of parameters. In this work, we focus on the junction-tree variational autoencoder (JT-VAE), a popular model for generative molecular design, and address this issue by leveraging the low dimensional active subspace to capture the uncertainty in the model parameters. Specifically, we approximate the posterior distribution over the active subspace parameters to estimate the epistemic model uncertainty in an extremely high dimensional parameter space. The proposed UQ scheme does not require alteration of the model architecture, making it readily applicable to any pre-trained model. Our experiments demonstrate the efficacy of the AS-based UQ and its potential impact on molecular optimization by exploring the model diversity under epistemic uncertainty.
OpenAI will train its AI models on the Financial Times' journalism
The Financial Times has become the latest news organization to strike a deal with OpenAI. In a joint announcement on Monday, the Financial Times and OpenAI said that maker of ChatGPT will use the Financial Times' journalism to train its AI models and collaborate on developing new AI products and features for the publication's readers. "It is right, of course, that AI platforms pay publishers for the use of their material," said Financial Times CEO John Ridding in a statement and added that the Times is "committed to human journalism." Neither company disclosed the financial terms of the agreement. Earlier this year, The Information reported that OpenAI offers publishers between 1 million and 5 million a year to license their content to train its AI models.
OpenAI hit with another privacy complaint over ChatGPT's love of making stuff up
OpenAI has been hit with a privacy complaint in Austria by an advocacy group called NOYB, which stands for None Of Your Business. The complaint alleges that the company's ChatGPT bot repeatedly provided incorrect information about a real individual (who for privacy reasons is not named in the complaint), as reported by Reuters. This may breach EU privacy rules. The chatbot allegedly spat out incorrect birthdate information for the individual, instead of just saying it didn't know the answer to the query. Like politicians, AI chatbots like to confidently make stuff up and hope we don't notice.
OpenAI to use FT journalism to train artificial intelligence systems
The Financial Times has struck a deal with ChatGPT developer OpenAI that allows its content to be used in training artificial intelligence systems. The FT will receive an undisclosed payment as part of the deal, which is the latest to be agreed between OpenAI and news publishers. Under the arrangement, ChatGPT users will receive summaries and quotes from FT journalism, as well as links to articles, in responses to prompts, where appropriate. John Ridding, the chief executive of the FT Group, said it was "right" that AI companies paid publishers for their material. The New York Times is suing OpenAI and its largest investor, Microsoft, over use of its content to train large language models, the technology that underpins chatbots like ChatGPT.
ProMoAI: Process Modeling with Generative AI
Kourani, Humam, Berti, Alessandro, Schuster, Daniel, van der Aalst, Wil M. P.
ProMoAI is a novel tool that leverages Large Language Models (LLMs) to automatically generate process models from textual descriptions, incorporating advanced prompt engineering, error handling, and code generation techniques. Beyond automating the generation of complex process models, ProMoAI also supports process model optimization. Users can interact with the tool by providing feedback on the generated model, which is then used for refining the process model. ProMoAI utilizes the capabilities LLMs to offer a novel, AI-driven approach to process modeling, significantly reducing the barrier to entry for users without deep technical knowledge in process modeling.
Foundations of Multisensory Artificial Intelligence
Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, video, real-world sensors, wearable devices, and medical data holds great promise for impact in many scientific areas with practical benefits, such as in supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. By synthesizing a range of theoretical frameworks and application domains, this thesis aims to advance the machine learning foundations of multisensory AI. In the first part, we present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets, design principled approaches to learn these interactions, and analyze whether their model has succeeded in learning. In the second part, we study the design of practical multimodal foundation models that generalize over many modalities and tasks, which presents a step toward grounding large language models to real-world sensory modalities. We introduce MultiBench, a unified large-scale benchmark across a wide range of modalities, tasks, and research areas, followed by the cross-modal attention and multimodal transformer architectures that now underpin many of today's multimodal foundation models. Scaling these architectures on MultiBench enables the creation of general-purpose multisensory AI systems, and we discuss our collaborative efforts in applying these models for real-world impact in affective computing, mental health, cancer prognosis, and robotics. Finally, we conclude this thesis by discussing how future work can leverage these ideas toward more general, interactive, and safe multisensory AI.
SIDBench: A Python Framework for Reliably Assessing Synthetic Image Detection Methods
Schinas, Manos, Papadopoulos, Symeon
The generative AI technology offers an increasing variety of tools for generating entirely synthetic images that are increasingly indistinguishable from real ones. Unlike methods that alter portions of an image, the creation of completely synthetic images presents a unique challenge and several Synthetic Image Detection (SID) methods have recently appeared to tackle it. Yet, there is often a large gap between experimental results on benchmark datasets and the performance of methods in the wild. To better address the evaluation needs of SID and help close this gap, this paper introduces a benchmarking framework that integrates several state-of-the-art SID models. Our selection of integrated models was based on the utilization of varied input features, and different network architectures, aiming to encompass a broad spectrum of techniques. The framework leverages recent datasets with a diverse set of generative models, high level of photo-realism and resolution, reflecting the rapid improvements in image synthesis technology. Additionally, the framework enables the study of how image transformations, common in assets shared online, such as JPEG compression, affect detection performance. SIDBench is available on https://github.com/mever-team/sidbench and is designed in a modular manner to enable easy inclusion of new datasets and SID models.