Generative AI
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Li, Peike, Chen, Boyu, Yao, Yao, Wang, Yikai, Wang, Allen, Wang, Alex
Music generation has attracted growing interest with the advancement of deep generative models. However, generating music conditioned on textual descriptions, known as text-to-music, remains challenging due to the complexity of musical structures and high sampling rate requirements. This paper introduces JEN-1, a universal high-fidelity model for text-to-music generation. JEN-1 is a diffusion model incorporating both autoregressive and non-autoregressive training. Through in-context learning, JEN-1 performs various generation tasks including text-guided music generation, music inpainting, and continuation. Evaluations demonstrate JEN-1's superior performance over state-of-the-art methods in text-music alignment and music quality while maintaining computational efficiency. Our demos are available at https://www.futureverse.com/research/jen/. "Music is the universal language of mankind." - Henry Wadsworth Longfellow. Music, as an artistic expression comprising harmony, melody and rhythm, holds great cultural significance and appeal to humans. Recent years have witnessed remarkable progress in music generation with the rise of deep generative models (Liu et al., 2023; Kreuk et al., 2022; Agostinelli et al., 2023).
Google says AI systems should be able to mine publishers' work unless companies opt out
Publishers should be able to opt out of having their works mined by generative artificial intelligence systems, according to Google, but the company has not said how such a system would work. The call for a fair use exception for AI systems is a view the company has expressed to the Australian government in the past, but the notion of an opt-out option for publishers is a new argument from Google. When asked how such a system would work, a spokesperson pointed to a recent blog post by Google where the company said it wanted a discussion around creating a community-developed web standard similar to robots.txt. Google's comments come as news companies such as News Corp have already reportedly been initiating conversations with AI companies about payment for scraping news articles. Toby Murray, associate professor at the University of Melbourne's computing and information systems school, said Google's proposal would put the onus on content creators to specify whether AI systems could absorb their content or not, but he indicated existing licensing schemes such as Creative Commons already allowed creators to mark how their works can be used.
OpenAI releases webcrawler GPTBot, how to block it
OpenAI CEO Sam Altman said language and cultural inclusivity is "very important" to his company's mission as it builds and trains powerful artificial intelligence systems. OpenAI has launched web crawler GPTBot to improve artificial intelligence models. "Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII) or have text that violates our policies," the company said in a post on its website. "Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety," OpenAI wrote. A web crawler is a type of bot.
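As a rough illustration of the blocking step the headline refers to, the sketch below assumes a site uses a standard robots.txt rule keyed on the GPTBot user agent named in OpenAI's post, and checks that rule with Python's standard-library parser. The robots.txt contents, the example.com URL, the "OtherBot" agent and the is_allowed helper are hypothetical, chosen only for the example. A rule like this takes effect only if the crawler chooses to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: disallow GPTBot everywhere.
# Crawlers with no matching rule are unrestricted by default.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", url))
    print("OtherBot allowed:", is_allowed(ROBOTS_TXT, "OtherBot", url))
```

To block only part of a site, the Disallow line can name a path prefix (for example, a members-only directory) instead of the root.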
The 'red team' race to make AI go rogue
There, top hackers from around the globe will rack up points for inducing AI models to err in various ways, with categories of challenges that include political misinformation, defamatory claims, and "algorithmic discrimination," or systemic bias. Leading AI firms such as Google, OpenAI, Anthropic and Stability have volunteered their latest chatbots and image generators to be put to the test. The competition's results will be sealed for several months afterward, organizers said, to give the companies time to address the flaws exposed in the contest before they are revealed to the world.
DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images
Zou, Xuechao, Li, Kai, Xing, Junliang, Zhang, Yu, Wang, Shiying, Jin, Lei, Tao, Pin
Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of those of the previous best methods. The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon acceptance of the paper.
Evaluating Data Attribution for Text-to-Image Models
Wang, Sheng-Yu, Efros, Alexei A., Zhu, Jun-Yan, Zhang, Richard
While large text-to-image models are able to synthesize "novel" images, these images are necessarily a reflection of the training data. The problem of data attribution in such models -- which of the images in the training set are most responsible for the appearance of a given generated image -- is a difficult yet important one. As an initial step toward this problem, we evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style. Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction. With our new dataset of such exemplar-influenced images, we are able to evaluate various data attribution algorithms and different possible feature spaces. Furthermore, by training on our dataset, we can tune standard models, such as DINO, CLIP, and ViT, toward the attribution problem. Even though the procedure is tuned towards small exemplar sets, we show generalization to larger sets. Finally, by taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
The AI Crackdown Is Coming
In April, lawyers for the airline Avianca noticed something strange. A passenger, Robert Mata, had sued the airline, alleging that a serving cart on a flight had struck and severely injured his left knee, but several cases cited in Mata's lawsuit didn't appear to exist. The judge couldn't verify them, either. It turned out that ChatGPT had made them all up, fabricating names and decisions. One of Mata's lawyers, Steven A. Schwartz, had used the chatbot as an assistant--his first time using the program for legal research--and, as Schwartz wrote in an affidavit, "was unaware of the possibility that its content could be false."
Zoom can use your private calls and messages to train its AI systems thanks to new terms and conditions that YOU agreed to
Private video calls, text messages and meetings on Zoom might be used to 'train' artificial intelligence models. The San Jose company's new terms and conditions - which came into force in March but were spotted this month - have sparked a wave of outrage online, with users threatening to cancel their accounts over the change. In one section of the new T&Cs, it says that customers consent to Zoom using data for purposes such as 'machine learning or artificial intelligence (including for the purposes of training and tuning of algorithms and models).' Artificial intelligence models are commonly trained with large amounts of publicly available data, often taken from the internet - but Zoom's move would use private customer data, raising privacy fears. The changes came in paragraph 10.4 of Zoom's Terms and Conditions. Zoom has responded with a blog post this week, claiming that the data is only used to train AI models to summarize meetings more accurately, and only with customer consent. In a blog post, Zoom's Chief Product Officer Smita Hashim wrote: 'To reiterate: we do not use audio, video, or chat content for training our models without customer consent.'
Microsoft's AI Red Team Has Already Made the Case for Itself
For most people, the idea of using artificial intelligence tools in daily life--or even just messing around with them--has only become mainstream in recent months, with new releases of generative AI tools from a slew of big tech companies and startups, like OpenAI's ChatGPT and Google's Bard. But behind the scenes, the technology has been proliferating for years, along with questions about how best to evaluate and secure these new AI systems. On Monday, Microsoft is revealing details about the team within the company that since 2018 has been tasked with figuring out how to attack AI platforms to reveal their weaknesses. In the five years since its formation, Microsoft's AI red team has grown from what was essentially an experiment into a full interdisciplinary team of machine learning experts, cybersecurity researchers, and even social engineers. The group works to communicate its findings within Microsoft and across the tech industry using the traditional parlance of digital security, so the ideas will be accessible rather than requiring specialized AI knowledge that many people and organizations don't yet have.
Criminals Have Created Their Own ChatGPT Clones
Just months after OpenAI's ChatGPT chatbot upended the startup economy, cybercriminals and hackers are claiming to have created their own versions of the text-generating technology. The systems could, theoretically at least, supercharge criminals' ability to write malware or phishing emails that trick people into handing over their login information. Since the start of July, criminals posting on dark-web forums and marketplaces have been touting two large language models (LLMs) they say they've produced. The systems, which are said to mimic the functionalities of ChatGPT and Google's Bard, generate text to answer the questions or prompts users enter. But unlike the LLMs made by legitimate companies, these chatbots are marketed for illegal activities.