Generative AI


The shine began to wear off AI in 2024 as advances slowed down

New Scientist

Did artificial intelligence begin to plateau in 2024, following a boom year in 2023? It depends on who you ask. Some say AI models that were released this year and can apparently reason more effectively than their predecessors show that the lofty goal of artificial general intelligence (AGI) is still on track, but not everyone is convinced. Certainly, tech firms have continued to talk up the hype. When it launched GPT-4 in 2023, OpenAI boasted that the model had "human-level" performance on professional tests…


Fox News AI Newsletter: AI app helps you turn anything into LEGO models

FOX News

BUILD LEGO CREATIONS: This innovative app is here to make custom Lego creation fun and accessible for everyone, whether you're a seasoned builder or just getting started. By using advanced artificial intelligence and mobile scanning technology, Brick My World opens up a world of creative possibilities. 'OUR HOLIDAY GIFT': OpenAI released its text-to-video artificial intelligence model, Sora, this week after the completion of its testing phase. GRANNY FIGHTS BACK: Daisy is an artificial intelligence-powered grandma developed by Virgin Media O2 to interact with scammers.


Google's new Project Astra could be generative AI's killer app

MIT Technology Review

MIT Technology Review got to try out Astra in a closed-door live demo last week. It was a stunning experience, but there's a gulf between polished promo and live demo. Astra uses Gemini 2.0's built-in agent framework to answer questions and carry out tasks via text, speech, image, and video, calling up existing Google apps like Search, Maps, and Lens when it needs to. "It's merging together some of the most powerful information retrieval systems of our time," says Bibo Xu, product manager for Astra. Gemini 2.0 and Astra are joined by Mariner, a new agent built on top of Gemini that can browse the web for you; Jules, a new Gemini-powered coding assistant; and Gemini for Games, an experimental assistant that you can chat to and ask for tips as you play video games.


Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft

WIRED

Harvard University announced Thursday it's releasing a high-quality dataset of nearly one million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard's newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta's Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to "level the playing field" by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly refined and curated content repositories that normally only established tech giants have the resources to assemble. "It's gone through rigorous review," he says. Leppert believes the new public-domain database could be used in conjunction with other licensed materials to build artificial intelligence models.


Generative AI Is My Research and Writing Partner. Should I Disclose It?

WIRED

"If I use an AI tool for research or to help me create something, should I cite it in my completed work as a source? How do you properly give attribution to AI tools when you use them?" The straightforward answer is that if you're using generative AI for research purposes, disclosure is probably not necessary. Yet, attribution is probably required if you use ChatGPT or another AI tool for composition. Anytime you're feeling ethically conflicted about disclosing your engagement with AI software, here are two guiding questions I think you should ask yourself: Did I utilize AI for research or composition?


Generative Semantic Communication: Architectures, Technologies, and Applications

arXiv.org Artificial Intelligence

This paper delves into the applications of generative artificial intelligence (GAI) in semantic communication (SemCom) and presents a thorough study. Three popular SemCom systems enabled by classical GAI models are first introduced, including variational autoencoders, generative adversarial networks, and diffusion models. For each system, the fundamental concept of the GAI model, the corresponding SemCom architecture, and the associated literature review of recent efforts are elucidated. Then, a novel generative SemCom system is proposed by incorporating a cutting-edge GAI technology: large language models (LLMs). This system features two LLM-based AI agents, one at the transmitter and one at the receiver, serving as "brains" to enable powerful information understanding and content regeneration capabilities, respectively. This innovative design allows the receiver to directly generate the desired content, instead of recovering the bit stream, based on the coded semantic information conveyed by the transmitter. Therefore, it shifts the communication mindset from "information recovery" to "information regeneration" and thus ushers in a new era of generative SemCom. A case study on point-to-point video retrieval is presented to demonstrate the superiority of the proposed generative SemCom system, showcasing a 99.98% reduction in communication overhead and a 53% improvement in retrieval accuracy compared to the traditional communication system. Furthermore, four typical application scenarios for generative SemCom are delineated, followed by a discussion of three open issues warranting future investigation. In a nutshell, this paper provides a holistic set of guidelines for applying GAI in SemCom, paving the way for the efficient implementation of generative SemCom in future wireless networks.


LatentSpeech: Latent Diffusion for Text-To-Speech Generation

arXiv.org Artificial Intelligence

Diffusion-based generative AI has gained significant attention for its superior performance over other generative techniques such as Generative Adversarial Networks and Variational Autoencoders. While it has achieved notable advancements in fields such as computer vision and natural language processing, its application in speech generation remains under-explored. Mainstream Text-to-Speech (TTS) systems primarily map outputs to Mel-Spectrograms (MelSpecs) in the spectral space, leading to high computational loads due to the sparsity of MelSpecs. To address these limitations, we propose LatentSpeech, a novel TTS generation approach utilizing latent diffusion models. By using latent embeddings as the intermediate representation, LatentSpeech reduces the target dimension to 5% of what is required for MelSpecs, simplifying the processing for the TTS encoder and vocoder and enabling efficient high-quality speech generation. This study marks the first integration of latent diffusion models in TTS, enhancing the accuracy and naturalness of generated speech. Experimental results on benchmark datasets demonstrate that LatentSpeech achieves a 25% improvement in Word Error Rate and a 24% improvement in Mel Cepstral Distortion compared to existing models, with further improvements rising to 49.5% and 26%, respectively, with additional training data. These findings highlight the potential of LatentSpeech to advance the state of the art in TTS technology.


GEE-OPs: An Operator Knowledge Base for Geospatial Code Generation on the Google Earth Engine Platform Powered by Large Language Models

arXiv.org Artificial Intelligence

As the scale and complexity of spatiotemporal data continue to grow rapidly, the use of geospatial modeling on the Google Earth Engine (GEE) platform presents dual challenges: improving the coding efficiency of domain experts and enhancing the coding capabilities of interdisciplinary users. To address these challenges and improve the performance of large language models (LLMs) in geospatial code generation tasks, we propose a framework for building a geospatial operator knowledge base tailored to the GEE JavaScript API. This framework consists of an operator syntax knowledge table, an operator relationship frequency table, an operator frequent pattern knowledge table, and an operator relationship chain knowledge table. By leveraging Abstract Syntax Tree (AST) techniques and frequent itemset mining, we systematically extract operator knowledge from 185,236 real GEE scripts and syntax documentation, forming a structured knowledge base. Experimental results demonstrate that the framework achieves over 90% accuracy, recall, and F1 score in operator knowledge extraction. When integrated with the Retrieval-Augmented Generation (RAG) strategy for LLM-based geospatial code generation tasks, the knowledge base improves performance by 20-30%. Ablation studies further quantify the necessity of each knowledge table in the knowledge base construction. This work provides robust support for the advancement and application of geospatial code modeling techniques, offering an innovative approach to constructing domain-specific knowledge bases that enhance the code generation capabilities of LLMs, and fostering the deeper integration of generative AI technologies within the field of geoinformatics.
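The frequent-pattern idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the paper's pipeline: it assumes the operators have already been extracted from each script (the AST step is skipped), it only counts co-occurring pairs rather than general itemsets, and the function name `frequent_operator_pairs`, the `min_support` threshold, and the sample operator lists are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_operator_pairs(scripts, min_support):
    """Return operator pairs that co-occur in at least `min_support` scripts."""
    pair_counts = Counter()
    for operators in scripts:
        # Deduplicate and sort so each pair is counted once per script,
        # always in the same order.
        unique_ops = sorted(set(operators))
        pair_counts.update(combinations(unique_ops, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical operator lists, as if already extracted from three scripts.
scripts = [
    ["ee.ImageCollection", "filterDate", "median"],
    ["ee.ImageCollection", "filterDate", "filterBounds"],
    ["ee.Image", "normalizedDifference"],
]

pairs = frequent_operator_pairs(scripts, min_support=2)
print(pairs)
```

With these sample inputs, only the pair of `ee.ImageCollection` and `filterDate` appears in two scripts, so it is the only one that passes the threshold. A knowledge base of such pairs could then be retrieved at generation time, which is roughly the role the abstract assigns to its frequency and pattern tables.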


From Noise to Nuance: Advances in Deep Generative Image Models

arXiv.org Artificial Intelligence

Deep learning-based image generation has undergone a paradigm shift since 2021, marked by fundamental architectural breakthroughs and computational innovations. Through reviewing architectural innovations and empirical results, this paper analyzes the transition from traditional generative methods to advanced architectures, with focus on compute-efficient diffusion models and vision transformer architectures. We examine how recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis, while addressing persistent challenges in efficiency and quality. Our analysis focuses on the evolution of latent space representations, cross-attention mechanisms, and parameter-efficient training methodologies that enable accelerated inference under resource constraints. While more efficient training methods enable faster inference, advanced control mechanisms like ControlNet and regional attention systems have simultaneously improved generation precision and content customization. We investigate how enhanced multi-modal understanding and zero-shot generation capabilities are reshaping practical applications across industries. Our analysis demonstrates that despite remarkable advances in generation quality and computational efficiency, critical challenges remain in developing resource-conscious architectures and interpretable generation systems for industrial applications. The paper concludes by mapping promising research directions, including neural architecture optimization and explainable generation frameworks.


Competition and Diversity in Generative AI

arXiv.org Artificial Intelligence

A growing body of literature on generative artificial intelligence reveals a surprisingly consistent stylized fact: when people use generative AI tools, the set of content they produce tends to be more homogeneous than content produced by more traditional means [4, 22, 49, 56, 67, 69, 84, 106, 108]. Across a wide range of domains including peer review [56], writing [67], digital art [108], and survey responses [106], access to generative AI tools (GAITs) leads to less diverse outcomes. Researchers refer to this phenomenon, where the use of similar or identical underlying AI tools leads to convergence in outcomes, as algorithmic monoculture [50] or homogenization [12]. Much of the empirical literature on the subject treats homogenization itself as the primary object of study, seeking to quantify and deeply understand it. Here, we begin our analysis further downstream. We ask: What are the consequences of monoculture in generation? When homogenization has negative consequences, how should we expect content producers to behave in response?