 Generative AI


On logic and generative AI

arXiv.org Artificial Intelligence

This article was originally written for the June 2024 issue of the Bulletin of the European Association for Theoretical Computer Science, in the framework of the "Logic in Computer Science" column administered by Yuri Gurevich. In the following pages, the article is reproduced as is. The ongoing AI revolution raises many foundational problems. For quite a while, I felt that the issue needs to be addressed in this column. Not being an AI expert, I was looking for volunteers. This didn't work, and so one day I took a deep breath and started to write an article myself. Andreas Blass, my long-time collaborator, was reluctant to join me, but eventually he agreed. A hundred years ago, logic was almost synonymous with foundational studies. I tried to rekindle that tradition in [5]. The goal of the following dialog is to provoke young logicians with a taste for foundations to notice the foundational problems raised by the ongoing AI revolution. I think the most beautiful thing about deep learning is that it actually works. Q: I just learned that Daniel Kahneman, Nobel laureate in economics and the author of "Thinking, fast and slow" [7], passed away on March 27, 2024. I heard a lot about this book but have never read it.


The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests

arXiv.org Artificial Intelligence

Generative AI agents are often expected to respond to complex user requests that have No One Right Answer (NORA), e.g., "design a vegetarian meal plan below 1800 calories". Such requests may entail a set of constraints that the agent should adhere to. To successfully develop agents for NORA scenarios, an accurate automatic evaluation framework is essential, specifically one capable of validating the satisfaction of constraints in the agent's response. Recently, large language models (LLMs) have been adopted as versatile evaluators for many NORA tasks, but their ability to evaluate constraint-satisfaction in generated text remains unclear. To study this, we develop and release a novel Arithmetic Constraint-Satisfaction (ACS) benchmarking dataset. The dataset consists of complex user requests with corresponding constraints, agent responses and human labels indicating each constraint's satisfaction level in the response. A unique property of this dataset is that validating many of its constraints requires reviewing the response as a whole (in contrast to many other benchmarks that require the validation of a single independent item). Moreover, it assesses LLMs in performing reasoning, in-context data extraction, arithmetic calculations, and counting. We then benchmark both open and proprietary LLMs on evaluating constraint-satisfaction, and show that most models still have significant headroom for improvement, and that errors primarily stem from reasoning issues. In addition, most models exhibit a skewed constraint-satisfaction prediction pattern, with higher accuracy where the ground-truth label is "satisfied". Lastly, few-shot prompting for our task proved to be rather challenging, since many of the studied models showed a degradation in performance when it was introduced.
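
As a rough illustration of what validating a constraint over a whole response involves, here is a minimal sketch built on the abstract's meal-plan example. It is not the ACS benchmark's method or data format: the `Meal` class, the `check_constraints` function, and all calorie figures are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Meal:
    # Hypothetical structure; the ACS dataset's real format is not described in the abstract.
    name: str
    calories: int
    vegetarian: bool


def check_constraints(meals, max_calories):
    """Check two constraints that need the whole plan, not a single item."""
    total = sum(m.calories for m in meals)  # arithmetic over every extracted item
    return {
        "total_calories": total,
        "below_calorie_limit": total < max_calories,
        "all_vegetarian": all(m.vegetarian for m in meals),
    }


# Made-up agent response, already parsed into items.
plan = [
    Meal("oatmeal with berries", 350, True),
    Meal("lentil soup", 600, True),
    Meal("vegetable stir-fry", 700, True),
]
result = check_constraints(plan, max_calories=1800)
print(result)
```

The sketch assumes the items have already been extracted from free text. For an LLM evaluator, that extraction step and the summation are both possible points of failure, which is what the benchmark is designed to probe.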


OpenAI staffers reportedly 'taken aback' by 'ominous' logo rebranding

Engadget

OpenAI could undergo massive changes next year, which include getting a brand new logo. According to Fortune, though, staff members were less than enthused when they got a sneak peek of its supposed new logo at a recent company-wide meeting. The company's hexagonal flower symbol, which has become pretty recognizable thanks to ChatGPT's popularity, is gone. Instead, it's replaced by a large black "O" or a simple ring or circle that staffers reportedly found to be devoid of creativity -- ominous, even. Based on how the publication's sources described it, the new logo sounds like the complete opposite of OpenAI's current one, which was designed to represent "precision, potential and optimism."


The use of GPT-4o and Other Large Language Models for the Improvement and Design of Self-Assessment Scales for Measurement of Interpersonal Communication Skills

arXiv.org Artificial Intelligence

OpenAI's ChatGPT (GPT-4 and GPT-4o) and other Large Language Models (LLMs) like Microsoft's Copilot, Google's Gemini 1.5 Pro, and Anthropic's Claude 3.5 Sonnet can be effectively used in various phases of scientific research. Their performance in diverse verbal tasks and reasoning is close to or above the average human level and rapidly increasing, providing those models with a capacity that resembles a relatively high level of theory of mind. The current ability of LLMs to process information about human psychology and communication creates an opportunity for their scientific use in the fields of personality psychology and interpersonal communication skills. This article illustrates the possible uses of GPT-4o and other advanced LLMs for typical tasks in designing self-assessment scales for interpersonal communication skills measurement like the selection and improvement of scale items and evaluation of content validity of scales. The potential for automated item generation and application is illustrated as well. The case study examples are accompanied by prompts for LLMs that can be useful for these purposes. Finally, a summary is provided of the potential benefits of using LLMs in the process of evaluation, design, and improvement of interpersonal communication skills self-assessment scales.


Massive AI energy demand is bringing Three Mile Island back from the dead

Popular Science

Power-hungry generative AI models are quickly making Big Tech's sizable energy requirements even more demanding and forcing companies to seek out energy from unlikely places. While Meta and Google are exploring modern geothermal tech and other newer experimental energy sources, Microsoft is stepping back in time. This week, the company signed a 20-year deal to source energy from the storied Three Mile Island nuclear facility in Pennsylvania, a site once known for the worst reactor accident in US history. If successful, the effort would breathe life back into the iconic symbol of US nuclear power and potentially provide Microsoft with around 800 megawatts of clean-burning energy to help satiate its growing energy appetite. "This agreement is a major milestone in Microsoft's efforts to help decarbonize the grid in support of our commitment to become carbon negative," Microsoft VP of Energy Bobby Hollis said in a statement.


Diffusion model approach tackles aspect ratio problem in generative AI images

AIHub

The picture on the left was generated by a standard method while the picture on the right was generated by ElasticDiffusion. The prompt for both images was, "Photo of an athlete cat explaining its latest scandal at a press conference to journalists." Generative artificial intelligence (AI) has notoriously struggled to create consistent images, often getting details like fingers and facial symmetry wrong. Moreover, these models can completely fail when prompted to generate images at different image sizes and resolutions. Rice University computer scientists' new method of generating images with pre-trained diffusion models (a class of generative AI models that "learn" by adding layer after layer of random noise to the images they are trained on and then generate new images by removing the added noise) could help correct such issues.


OpenAI unveils new ChatGPT that can reason through math and science

The Japan Times

The computer code they generate is often buggy and incomplete. From time to time, they even make stuff up. On Thursday, OpenAI unveiled a new version of ChatGPT that could alleviate these flaws. The company said the chatbot, underpinned by new artificial intelligence technology called OpenAI o1, could "reason" through tasks involving math, coding and science. "With previous models like ChatGPT, you ask them a question and they immediately start responding," said Jakub Pachocki, OpenAI's chief scientist.


SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending

arXiv.org Artificial Intelligence

There is increased interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce artificial environments, falling short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that utilizes generative AI techniques to blend users' physical surroundings into unified virtual spaces. This pipeline transforms user-provided 2D images into context-rich 3D environments through an iterative process consisting of depth estimation, mesh alignment, and diffusion-based space completion guided by geometric priors and adaptive text prompts. In a preliminary within-subjects study, where 20 participants performed a collaborative VR affinity diagramming task in pairs, we compared SpaceBlender with a generic virtual environment and a state-of-the-art scene generation framework, evaluating its ability to create virtual spaces suitable for collaboration. Participants appreciated the enhanced familiarity and context provided by SpaceBlender but also noted complexities in the generative environments that could detract from task focus. Drawing on participant feedback, we propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.


Generative AI Carries Non-Democratic Biases and Stereotypes: Representation of Women, Black Individuals, Age Groups, and People with Disability in AI-Generated Images across Occupations

arXiv.org Artificial Intelligence

AI governance and ethics in AI development have become critical concerns, prompting active discussions among tech companies, governments, and researchers about the potential risks AI poses to our democracies. This short essay aims to highlight one such risk: how generative AI includes or excludes equity-deserving groups in its outputs. The findings reveal that generative AI is not equitably inclusive regarding gender, race, age, and visible disability.

Mutual Impacts: Technology and Democracy

Technology is a human creation and, as such, inherently reflects our values, prejudices, and biases. Additionally, it plays a crucial role in shaping societal norms and social contracts.


Multi-Modality Conditioned Variational U-Net for Field-of-View Extension in Brain Diffusion MRI

arXiv.org Artificial Intelligence

An incomplete field-of-view (FOV) in diffusion magnetic resonance imaging (dMRI) can severely hinder the volumetric and bundle analyses of whole-brain white matter connectivity. Although existing works have investigated imputing the missing regions using deep generative models, it remains unclear how to specifically utilize additional information from paired multi-modality data and whether this can enhance the imputation quality and be useful for downstream tractography. To fill this gap, we propose a novel framework for imputing dMRI scans in the incomplete part of the FOV by integrating the learned diffusion features in the acquired part of the FOV to the complete brain anatomical structure. We hypothesize that by this design the proposed framework can enhance the imputation performance of the dMRI scans and therefore be useful for repairing whole-brain tractography in corrupted dMRI scans with incomplete FOV. We tested our framework on two cohorts from different sites with a total of 96 subjects and compared it with a baseline imputation method that treats the information from T1w and dMRI scans equally. The proposed framework achieved significant improvements in imputation performance, as demonstrated by angular correlation coefficient (p < 1E-5), and in downstream tractography accuracy, as demonstrated by Dice score (p < 0.01). Results suggest that the proposed framework improved imputation performance in dMRI scans by specifically utilizing additional information from paired multi-modality data, compared with the baseline method. The imputation achieved by the proposed framework enhances whole brain tractography, and therefore reduces the uncertainty when analyzing bundles associated with neurodegenerative.
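
For readers unfamiliar with the Dice score mentioned above, the following is a minimal sketch of its standard definition, 2|A ∩ B| / (|A| + |B|), applied to two sets of voxel indices. The abstract does not say how the authors computed it for tractography bundles, so the `dice_score` function and the toy voxel sets are illustrative assumptions only.

```python
def dice_score(region_a, region_b):
    """Standard Dice overlap between two collections of voxel indices."""
    a, b = set(region_a), set(region_b)
    if not a and not b:
        # Convention chosen for this sketch: two empty regions overlap perfectly.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy voxel indices (made up): half of each region overlaps the other.
reference_bundle = {1, 2, 3, 4}
imputed_bundle = {3, 4, 5, 6}
score = dice_score(reference_bundle, imputed_bundle)
print(score)
```

A score of 1.0 means the two regions coincide exactly and 0.0 means they share no voxels, so a higher Dice score for a repaired bundle indicates closer agreement with the reference.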