AITopics

2410.12341

Country:

North America > United States > District of Columbia > Washington (0.15)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.05)
Oceania > New Zealand > North Island > Wellington Region > Wellington (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Space Agency (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.71)

arXiv.org Artificial IntelligenceOct-16-2024

Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization

Wang, Xingqi, Yi, Xiaoyuan, Xie, Xing, Jia, Jia

Recent advancements in diffusion models trained on large-scale data have enabled the generation of indistinguishable human-level images, yet they often produce harmful content misaligned with human values, e.g., social bias, and offensive content. Despite extensive research on Large Language Models (LLMs), the challenge of Text-to-Image (T2I) model alignment remains largely unexplored. Addressing this problem, we propose LiVO (Lightweight Value Optimization), a novel lightweight method for aligning T2I models with human values. LiVO only optimizes a plug-and-play value encoder to integrate a specified value principle with the input prompt, allowing the control of generated images over both semantics and values. Specifically, we design a diffusion model-tailored preference optimization loss, which theoretically approximates the Bradley-Terry model used in LLM alignment but provides a more flexible trade-off between image quality and value conformity. To optimize the value encoder, we also develop a framework to automatically construct a text-image preference dataset of 86k (prompt, aligned image, violating image, value principle) samples. Without updating most model parameters and through adaptive value selection from the input prompt, LiVO significantly reduces harmful outputs and achieves faster convergence, surpassing several strong baselines and taking an initial step towards ethically aligned T2I models.

large language model, machine learning, natural language, (17 more...)

doi: 10.1145/3664647.3681652

2410.127

Country:

Oceania > Australia > Victoria > Melbourne (0.05)
Asia > China > Beijing > Beijing (0.04)
North America > United States (0.04)

Genre: Research Report > New Finding (0.45)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.67)

MIT Technology ReviewOct-15-2024, 17:00:00 GMT

OpenAI says ChatGPT treats us all the same (most of the time)

Bias in AI is a huge problem. Ethicists have long studied the impact of bias when companies use AI models to screen résumés or loan applications, for example--instances of what the OpenAI researchers call third-person fairness. But the rise of chatbots, which enable individuals to interact with models directly, brings a new spin to the problem. "We wanted to study how it shows up in ChatGPT in particular," Alex Beutel, a researcher at OpenAI, told MIT Technology Review in an exclusive preview of results published today. Instead of screening a résumé you've already written, you might ask ChatGPT to write one for you, says Beutel: "If it knows my name, how does that affect the response?"

large language model, machine learning, natural language, (15 more...)

MIT Technology Review

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)

PCWorldOct-15-2024, 14:26:22 GMT

Adobe brings generative AI video to Premiere Pro

Adobe is now adding its AI-based video generator, Firefly, to its video editing software Premiere Pro. The Firefly model can be used to extend a video clip or generate video from still images or text instructions. This was first brought to our attention by The Verge. The Generative Extend tool will initially be available in beta and can extend the length of a video clip by up to two seconds with an image resolution of 720p or 1080p and a refresh rate of 24 frames-per-second. This tool will also be applicable to ambient sounds and sound effects, but not to music or speech.

adobe bring generative ai video, premiere, video, (4 more...)

PCWorld

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)

Hayagreevan, Hari, Khamaru, Souvik

Security of and by Generative AI platforms

This whitepaper highlights the dual importance of securing generative AI (genAI) platforms and leveraging genAI for cybersecurity. As genAI technologies proliferate, their misuse poses significant risks, including data breaches, model tampering, and malicious content generation. Securing these platforms is critical to protect sensitive data, ensure model integrity, and prevent adversarial attacks. Simultaneously, genAI presents opportunities for enhancing security by automating threat detection, vulnerability analysis, and incident response. The whitepaper explores strategies for robust security frameworks around genAI systems, while also showcasing how genAI can empower organizations to anticipate, detect, and mitigate sophisticated cyber threats.

machine learning, natural language, platform, (18 more...)

2410.13899

Country: Europe > United Kingdom (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.39)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Facing Identity: The Formation and Performance of Identity via Face-Based Artificial Intelligence Technologies

Santo, Wells Lucas

How is identity constructed and performed in the digital via face-based artificial intelligence technologies? While questions of identity on the textual Internet have been thoroughly explored, the Internet has progressed to a multimedia form that not only centers the visual, but specifically the face. At the same time, a wealth of scholarship has and continues to center the topics of surveillance and control through facial recognition technologies (FRTs), which have extended the logics of the racist pseudoscience of physiognomy. Much less work has been devoted to understanding how such face-based artificial intelligence technologies have influenced the formation and performance of identity. This literature review considers how such technologies interact with faciality, which entails the construction of what a face may represent or signify, along axes of identity such as race, gender, and sexuality. In grappling with recent advances in AI such as image generation and deepfakes, I propose that we are now in an era of "post-facial" technologies that build off our existing culture of facility while eschewing the analog face, complicating our relationship with identity vis-á-vis the face. Drawing from previous frameworks of identity play in the digital, as well as trans practices that have historically played with or transgressed the boundaries of identity classification, we can develop concepts adequate for analyzing digital faciality and identity given the current landscape of post-facial artificial intelligence technologies that allow users to interface with the digital in an entirely novel manner. To ground this framework of transgression, I conclude by proposing an interview study with VTubers -- online streamers who perform using motion-captured avatars instead of their real-life faces -- to gain qualitative insight on the experience and perceptions of users of post-facial technologies and how these sociotechnical experiences interface with our relationships with identity and the digital anew.

artificial intelligence, machine learning, social media, (18 more...)

2410.12148

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > China (0.04)
Asia > Japan (0.04)
(11 more...)

Genre:

Research Report (0.50)
Personal > Interview (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
(4 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)

Generative AI's aggregated knowledge versus web-based curated knowledge

Selker, Ted, Wu, Yunzi

his paper explores what kinds of questions are best served by the way generative AI (GenAI) using Large Language Models(LLMs) that aggregate and package knowledge, and when traditional curated web-sourced search results serve users better. An experiment compared product searches using ChatGPT, Google search engine, or both helped us understand more about the compelling nature of generated responses. The experiment showed GenAI can speed up some explorations and decisions. We describe how search can deepen the testing of facts, logic, and context. We show where existing and emerging knowledge paradigms can help knowledge exploration in different ways. Experimenting with searches, our probes showed the value for curated web search provides for very specific, less popularly-known knowledge. GenAI excelled at bringing together knowledge for broad, relatively well-known topics. The value of curated and aggregated knowledge for different kinds of knowledge reflected in different user goals. We developed a taxonomy to distinguishing when users are best served by these two approaches.

large language model, machine learning, natural language, (21 more...)

2410.12091

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.87)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)
Health & Medicine > Epidemiology (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.62)

Nishal, Sachita, Lee, Eric, Diakopoulos, Nicholas

De-jargonizing Science for Journalists with GPT-4: A Pilot Study

This study offers an initial evaluation of a human-in-the-loop system leveraging GPT-4 (a large language model or LLM), and Retrieval-Augmented Generation (RAG) to identify and define jargon terms in scientific abstracts, based on readers' self-reported knowledge. The system achieves fairly high recall in identifying jargon and preserves relative differences in readers' jargon identification, suggesting personalization as a feasible use-case for LLMs to support sense-making of complex information. Surprisingly, using only abstracts for context to generate definitions yields slightly more accurate and higher quality definitions than using RAG-based context from the fulltext of an article. The findings highlight the potential of generative AI for assisting science reporters, and can inform future work on developing tools to simplify dense documents.

large language model, machine learning, natural language, (17 more...)

2410.12069

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Media > News (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

M2Diffuser: Diffusion-based Trajectory Optimization for Mobile Manipulation in 3D Scenes

Yan, Sixu, Zhang, Zeyu, Han, Muzhi, Wang, Zaijin, Xie, Qi, Li, Zhitian, Li, Zhehan, Liu, Hangxin, Wang, Xinggang, Zhu, Song-Chun

Recent advances in diffusion models have opened new avenues for research into embodied AI agents and robotics. Despite significant achievements in complex robotic locomotion and skills, mobile manipulation-a capability that requires the coordination of navigation and manipulation-remains a challenge for generative AI techniques. This is primarily due to the high-dimensional action space, extended motion trajectories, and interactions with the surrounding environment. In this paper, we introduce M2Diffuser, a diffusion-based, scene-conditioned generative model that directly generates coordinated and efficient whole-body motion trajectories for mobile manipulation based on robot-centric 3D scans. M2Diffuser first learns trajectory-level distributions from mobile manipulation trajectories provided by an expert planner. Crucially, it incorporates an optimization module that can flexibly accommodate physical constraints and task objectives, modeled as cost and energy functions, during the inference process. This enables the reduction of physical violations and execution errors at each denoising step in a fully differentiable manner. Through benchmarking on three types of mobile manipulation tasks across over 20 scenes, we demonstrate that M2Diffuser outperforms state-of-the-art neural planners and successfully transfers the generated trajectories to a real-world robot. Our evaluations underscore the potential of generative AI to enhance the generalization of traditional planning and learning-based robotic methods, while also highlighting the critical role of enforcing physical constraints for safe and robust execution.

artificial intelligence, machine learning, natural language, (17 more...)

2410.11402

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Beijing > Beijing (0.05)
Asia > China > Shanghai > Shanghai (0.04)
(5 more...)

Genre: Research Report (0.81)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.54)

Søltoft, Johan Irving, Kocksch, Laura, Munk, Anders Kristian

Synthetic Interlocutors. Experiments with Generative AI to Prolong Ethnographic Encounters

This paper introduces "Synthetic Interlocutors" for ethnographic research. Synthetic Interlocutors are chatbots ingested with ethnographic textual material (interviews and observations) by using Retrieval Augmented Generation (RAG). We integrated an open-source large language model with ethnographic data from three projects to explore two questions: Can RAG digest ethnographic material and act as ethnographic interlocutor? And, if so, can Synthetic Interlocutors prolong encounters with the field and extend our analysis? Through reflections on the process of building our Synthetic Interlocutors and an experimental collaborative workshop, we suggest that RAG can digest ethnographic materials, and it might lead to prolonged, yet uneasy ethnographic encounters that allowed us to partially recreate and re-visit fieldwork interactions while facilitating opportunities for novel analytic insights. Synthetic Interlocutors can produce collaborative, ambiguous and serendipitous moments.

large language model, machine learning, natural language, (21 more...)

2410.11395

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)
Oceania > Australia (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.50)