 generative


Event Stream GPT: A Data Pre-processing and Modeling Library for Generative, Pre-trained Transformers over Continuous-time Sequences of Complex Events

Neural Information Processing Systems

Generative, pre-trained transformers (GPTs, a type of foundation model) have reshaped natural language processing (NLP) through their versatility in diverse downstream tasks. However, their potential extends far beyond NLP. This paper provides a software utility to help realize this potential, extending the applicability of GPTs to continuous-time sequences of complex events with internal dependencies, such as medical record datasets. Despite their potential, the adoption of foundation models in these domains has been hampered by the lack of suitable tools for model construction and evaluation. To bridge this gap, we introduce Event Stream GPT (ESGPT), an open-source library designed to streamline the end-to-end process for building GPTs for continuous-time event sequences. ESGPT allows users to (1) build flexible, foundation-model scale input datasets by specifying only a minimal configuration file, (2) leverage a Hugging Face compatible modeling API for GPTs over this modality that incorporates intra-event causal dependency structures and autoregressive generation capabilities, and (3) evaluate models via standardized processes that can assess few-shot and even zero-shot performance of pre-trained models on user-specified fine-tuning tasks.


No AI Without PI! Object-Centric Process Mining as the Enabler for Generative, Predictive, and Prescriptive Artificial Intelligence

arXiv.org Artificial Intelligence

The uptake of Artificial Intelligence (AI) impacts the way we work, interact, do business, and conduct research. However, organizations struggle to apply AI successfully in industrial settings where the focus is on end-to-end operational processes. Here, we consider generative, predictive, and prescriptive AI and elaborate on the challenges of diagnosing and improving such processes. We show that AI needs to be grounded using Object-Centric Process Mining (OCPM). Process-related data are structured and organization-specific and, unlike text, processes are often highly dynamic. OCPM is the missing link connecting data and processes and enables different forms of AI. We use the term Process Intelligence (PI) to refer to the amalgamation of process-centric data-driven techniques able to deal with a variety of object and event types, enabling AI in an organizational context. This paper explains why AI requires PI to improve operational processes and highlights opportunities for successfully combining OCPM and generative, predictive, and prescriptive AI.


Persuasion Should be Double-Blind: A Multi-Domain Dialogue Dataset With Faithfulness Based on Causal Theory of Mind

arXiv.org Artificial Intelligence

Persuasive dialogue plays a pivotal role in human communication, influencing various domains. Recent persuasive dialogue datasets often fail to align with real-world interpersonal interactions, leading to unfaithful representations. For instance, unrealistic scenarios may arise, such as when the persuadee explicitly instructs the persuader on which persuasion strategies to employ, with each of the persuadee's questions corresponding to a specific strategy for the persuader to follow. This issue can be attributed to a violation of the "Double Blind" condition, where critical information is fully shared between participants. In actual human interactions, however, key information such as the mental state of the persuadee and the persuasion strategies of the persuader is not directly accessible. The persuader must infer the persuadee's mental state using Theory of Mind capabilities and construct arguments that align with the persuadee's motivations. To address this gap, we introduce ToMMA, a novel multi-agent framework for dialogue generation that is guided by causal Theory of Mind. This framework ensures that information remains undisclosed between agents, preserving "double-blind" conditions, while causal ToM directs the persuader's reasoning, enhancing alignment with human-like persuasion dynamics. Consequently, we present CToMPersu, a multi-domain, multi-turn persuasive dialogue dataset that tackles both double-blind and logical coherence issues, demonstrating superior performance across multiple metrics and achieving better alignment with real human dialogues. Our dataset and prompts are available at https://github.com/DingyiZhang/ToMMA-CToMPersu .


Improving the Security of Smartwatch Payment with Deep Learning

arXiv.org Artificial Intelligence

Making contactless payments using a smartwatch is increasingly popular, but this payment medium lacks traditional biometric security measures such as facial or fingerprint recognition. In 2022, Sturgess et al. proposed WatchAuth, a system for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. While effective, the system requires the user to undergo a burdensome enrolment period to achieve acceptable error levels. In this dissertation, we explore whether applications of deep learning can reduce the number of gestures a user must provide to enrol into an authentication system for smartwatch payment. We firstly construct a deep-learned authentication system that outperforms the current state-of-the-art, including in a scenario where the target user has provided a limited number of gestures. We then develop a regularised autoencoder model for generating synthetic user-specific gestures. We show that using these gestures in training improves classification ability for an authentication system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system without negatively impacting its error rates.


Generative A.I. Can Add $4.4 Trillion in Value to Global Economy, Study Says

NYT > Economy

McKinsey's report is one of the few so far to quantify the long-term impact of generative A.I. on the economy. The report arrives as Silicon Valley has been gripped by a fervor over generative A.I. tools like ChatGPT and Google's Bard, with tech companies and venture capitalists investing billions of dollars in the technology. The tools -- some of which can also generate images and video, and carry on a conversation -- have started a debate over how they will affect jobs and the world economy. Some experts have predicted that the A.I. will displace people from their work, while others have said the tools can augment individual productivity. Last week, Goldman Sachs released a report warning that A.I. could lead to worker disruption and that some companies would benefit more from the technology than others.


Generative A.I. and the New Medical Generalist

#artificialintelligence

In the journal Nature today, my colleagues and I published an article on the future directions of generative A.I. (aka Large Language or Foundation models) for the practice of medicine. These new AI models have generated a multitude of new and exciting opportunities in healthcare that we didn't have before, along with many challenges and liabilities. I'll briefly explain how we got here and what's in store. Back in 2017, Google researchers published a paper ("Attention Is All You Need") describing a new model architecture, which they dubbed Transformer, that could give different levels of attention for multiple modes of input, and go faster, to ultimately replace recurrent and convolutional deep neural networks (RNN and CNN, respectively). Foreshadowing the future of generative AI, they concluded: "We plan to extend the Transformer to problems involving input and output modalities other than text and to investigate local, restricted attention mechanisms to efficiently handle large inputs and outputs such as images, audio and video."


AI tool: The Future of Filmmaking. Generative A.I No Code

#artificialintelligence

I've got some news for you, dear readers. It's time to prepare for the next big leap in filmmaking technology. And it comes in the form of an easy-to-use and free AI tool that will revolutionize the way we make films forever! This new tool promises to be an easy-to-use, free, and powerful platform that can automate your filmmaking process, thereby saving you valuable time and effort. We're talking about the latest in machine learning and artificial intelligence that's about to shake up the film industry as we know it. No more will we have to laboriously hand-craft every single visual effect or animation in our videos.


Generative A.I. doesn't much impress Noam Chomsky

#artificialintelligence

But just how smart are these large language models? On the last day of the conference, I interviewed legendary linguist Noam Chomsky, now 93 years old, and Gary Marcus, an emeritus professor of cognitive science at New York University who has spent much of the past decade highlighting the limits of deep learning. Both were distinctly unimpressed with today's cutting-edge A.I. Chomsky's big disappointment is that these large language models don't tell us anything at all about how the human brain works. Chomsky has devoted much of his life to advancing the theory that there is a universal grammar, or at least a set of structural concepts, that underpin all human languages, and that this grammar is somehow hard-wired into the brain. Chomsky thinks this explains why human infants can master language so easily--whereas today's computer systems need to be fed what Chomsky rightly calls "astronomical amounts of data" and even then still don't actually understand language at all.


A Coming-Out Party for Generative A.I., Silicon Valley's New Craze

#artificialintelligence

In Silicon Valley, crypto and the metaverse are out. That much became clear Monday night at the San Francisco Exploratorium, where Stability AI, the start-up behind the popular Stable Diffusion image-generating algorithm, gave a party that felt a lot like a return to prepandemic exuberance. The event -- which lured tech luminaries including the Google co-founder Sergey Brin, the AngelList founder Naval Ravikant and the venture capitalist Ron Conway out of their Zoom rooms -- was billed as a launch party for Stability AI and a celebration of the company's recent $101 million fund-raising round, which reportedly valued the company at $1 billion. But it doubled as a coming-out bash for the entire field of generative A.I. -- the wonky umbrella term for A.I. that doesn't just analyze existing data but creates new text, images, videos, code snippets and more. It's been a banner year, in particular, for generative A.I. apps that turn text prompts into images -- which, unlike NFTs or virtual reality metaverses, actually have the numbers to justify the hype they've received.