Generative AI


Cloudflare is taking a stand against AI website scrapers

Engadget

Cloudflare has released a new free tool that prevents AI companies' bots from scraping its clients' websites for content to train large language models. The cloud service provider is making this tool available to its entire customer base, including those on free plans. "This feature will automatically be updated over time as we see new fingerprints of offending bots we identify as widely scraping the web for model training," the company said. In a blog post announcing this update, Cloudflare's team also shared some data about how its clients are responding to the boom of bots that scrape content to train generative AI models. According to the company's internal data, 85.2 percent of customers have chosen to block even the AI bots that properly identify themselves from accessing their sites.


These Weird Pics of Donald Trump Have a Much Darker Backstory

Slate

Who is the president of the United States? The answer is Joe Biden, but one popular artificial intelligence app thinks otherwise. While A.I. models have been known to hallucinate (i.e., make stuff up), they typically don't mess up extremely simple things like naming the president. If you ask OpenAI's ChatGPT who the U.S. president is, it'll give you the correct answer, a short bio, and links to the White House website and Biden's Wikipedia article. The same is true for Anthropic's Claude and Meta AI, while Google's Gemini straight-up refuses to answer the question because it's related to politics.


Artists criticize Apple's lack of transparency around Apple Intelligence data

Engadget

Later this year, millions of Apple devices will begin running Apple Intelligence, Cupertino's take on generative AI that, among other things, lets people create images from text prompts. But some members of the creative community are unhappy about what they say is the company's lack of transparency around the raw information powering the AI model that makes this possible. "I wish Apple would have explained to the public in a more transparent way how they collected their training data," Jon Lam, a video games artist and a creators' rights activist based in Vancouver, told Engadget. "I think their announcement could not have come at a worse time." Creatives have historically been some of the most loyal customers of Apple, a company whose founder famously positioned it at the "intersection of technology and liberal arts."


HEMM: Holistic Evaluation of Multimodal Foundation Models

arXiv.org Artificial Intelligence

Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation of Multimodal Models (HEMM) to systematically evaluate the capabilities of multimodal foundation models across a set of 3 dimensions: basic skills, information flow, and real-world use cases. Basic multimodal skills are internal abilities required to solve problems, such as learning interactions across modalities, fine-grained alignment, multi-step reasoning, and the ability to handle external knowledge. Information flow studies how multimodal content changes during a task through querying, translation, editing, and fusion. Use cases span domain-specific challenges introduced in real-world multimedia, affective computing, natural sciences, healthcare, and human-computer interaction applications. Through comprehensive experiments across the 30 tasks in HEMM, we (1) identify key dataset dimensions (e.g., basic skills, information flows, and use cases) that pose challenges to today's models, and (2) distill performance trends regarding how different modeling dimensions (e.g., scale, pre-training data, multimodal alignment, pre-training, and instruction tuning objectives) influence performance. Our conclusions regarding challenging multimodal interactions, use cases, and tasks requiring reasoning and external knowledge, the benefits of data and model scale, and the impacts of instruction tuning yield actionable insights for future work in multimodal foundation models.


Collaborative Quest Completion with LLM-driven Non-Player Characters in Minecraft

arXiv.org Artificial Intelligence

The use of generative AI in video game development is on the rise, and as the conversational and other capabilities of large language models continue to improve, we expect LLM-driven non-player characters (NPCs) to become widely deployed. In this paper, we seek to understand how human players collaborate with LLM-driven NPCs to accomplish in-game goals. We design a minigame within Minecraft where a player works with two GPT4-driven NPCs to complete a quest. We perform a user study in which 28 Minecraft players play this minigame and share their feedback. On analyzing the game logs and recordings, we find that several patterns of collaborative behavior emerge from the NPCs and the human players. We also report on the current limitations of language-only models that do not have rich game-state or visual understanding. We believe that this preliminary study and analysis will inform future game developers on how to better exploit these rapidly improving generative AI models for collaborative roles in games.


Synthetic data: How could it be used for infectious disease research?

arXiv.org Artificial Intelligence

Over the last three to five years, it has become possible to generate machine learning synthetic data for healthcare-related uses. However, concerns have been raised about potential negative factors associated with the possibilities of artificial dataset generation. These include the potential misuse of generative artificial intelligence (AI) in fields such as cybercrime, the use of deepfakes and fake news to deceive or manipulate, and displacement of human jobs across various market sectors. Here, we consider both current and future positive advances and possibilities with synthetic datasets. Synthetic data offers significant benefits, particularly in data privacy, research, balancing datasets, and reducing bias in machine learning models. Generative AI is an artificial intelligence genre capable of creating text, images, video or other data using generative models. The recent explosion of interest in GenAI was heralded by the invention and rapid adoption of large language models (LLMs). These computational models are able to achieve general-purpose language generation and other natural language processing tasks and are based on transformer architectures, which made an evolutionary leap from previous neural network architectures. Fuelled by the advent of improved GenAI techniques and wide-scale usage, this is surely the time to consider how synthetic data can be used to advance infectious disease research. In this commentary we aim to create an overview of the current and future position of synthetic data in infectious disease research.


Exploring LGBTQ+ Bias in Generative AI Answers across Different Country and Religious Contexts

arXiv.org Artificial Intelligence

Previous discussions have highlighted the need for generative AI tools to become more culturally sensitive, yet often neglect the complexities of handling content about minorities, who are perceived differently across cultures and religions. Our study examined how two generative AI systems respond to homophobic statements with varying cultural and religious context information. Findings showed ChatGPT 3.5's replies exhibited cultural relativism, in contrast to Bard's, which stressed human rights and provided more support for LGBTQ+ issues. Both demonstrated significant change in responses based on contextual information provided in the prompts, suggesting that AI systems may adjust in their responses the degree and forms of support for LGBTQ+ people according to information they receive about the user's background. The study contributes to understanding the social and ethical implications of AI responses and argues that any work to make generative AI outputs more culturally diverse requires a grounding in fundamental human rights.


How Microsoft and Nvidia bet correctly to leapfrog Apple

BBC News

Speaking to the FT, Citi's Stuart Kaiser said that while AI remained a big theme in the world of stocks and shares, "just saying AI 15 times isn't going to cut it anymore". In addition, there is increased awareness of current generative AI products not exactly living up to their own hype. And early AI-enabled physical devices like the Rabbit R1 and Humane Pin have received bad reviews. "We're seeing the market around generative AI mature a little right now – early experiments set a lot of grand expectations, but when the rubber hit the road there were too many unexpected outcomes," says Chris Weston, chief digital and information officer of the tech service firm Jumar. "Businesses have a lot of value tied up in goodwill – the trust and comfort that their clients have in their services. Introducing ungovernable chatbots is a step too far for many right now." Tech analyst Paolo Pescatore agrees that the pressure is on for AI firms to deliver on their promises.


AI companies are finally being forced to cough up for training data

MIT Technology Review

AI companies have pillaged the internet for training data, and many websites and data set owners have started restricting the ability to scrape their websites. We've also seen a backlash against the AI sector's practice of indiscriminately scraping online data, in the form of users opting out of making their data available for training and lawsuits from artists, writers, and the New York Times, claiming that AI companies have taken their intellectual property without consent or compensation. My colleague James O'Donnell dissects the lawsuits in his story and points out that these lawsuits could determine the future of AI music. But this moment also sets an interesting precedent for all of generative AI development. Thanks to the scarcity of high-quality data and the immense pressure and demand to build even bigger and better models, we're in a rare moment where data owners actually have some leverage.


Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models

arXiv.org Artificial Intelligence

In this work, we present a comprehensive three-phase study to examine (1) the effectiveness of large multimodal models (LMMs) in recognizing cultural contexts; (2) the accuracy of their representations of diverse cultures; and (3) their ability to adapt content across cultural boundaries. We first introduce Dalle Street, a large-scale dataset generated by DALL-E 3 and validated by humans, containing 9,935 images of 67 countries and 10 concept classes. We reveal disparities in cultural understanding at the sub-region level with both open-weight (LLaVA) and closed-source (GPT-4V) models on Dalle Street and other existing benchmarks. Next, we assess models' deeper culture understanding by an artifact extraction task and identify over 18,000 artifacts associated with different countries. Finally, we propose a highly composable pipeline, CultureAdapt, to adapt images from culture to culture. Our findings reveal a nuanced picture of the cultural competence of LMMs, highlighting the need to develop culture-aware systems. Dataset and code are available at https://github.com/iamshnoo/crossroads