Goto

Collaborating Authors

 gptbot


The Race to Block OpenAI's Scraping Bots Is Slowing Down

WIRED

It's too soon to say how the spate of deals between AI companies and publishers will shake out. OpenAI has already scored one clear win, though: Its web crawlers aren't getting blocked by top news outlets at the rate they once were. The generative AI boom sparked a gold rush for data--and a subsequent data-protection rush (for most news websites, anyway) in which publishers sought to block AI crawlers and prevent their work from becoming training data without consent. When Apple debuted a new AI agent this summer, for example, a slew of top news outlets swiftly opted out of Apple's web scraping using the Robots Exclusion Protocol, or robots.txt, the file that allows webmasters to control bots. There are so many new AI bots on the scene that it can feel like playing whack-a-mole to keep up.


New York Times, CNN and Australia's ABC block OpenAI's GPTBot web crawler from accessing content

The Guardian > Technology

News outlets including the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked a tool from OpenAI, limiting the company's ability to continue accessing their content. OpenAI is behind one of the best known artificial intelligence chatbots, ChatGPT. Its web crawler – known as GPTBot – may scan webpages to help improve its AI models. The Verge was first to report the New York Times had blocked GPTBot on its website. The Guardian subsequently found that other major news websites, including CNN, Reuters, the Chicago Tribune, the ABC and Australian Community Media (ACM) brands such as the Canberra Times and the Newcastle Herald, appear to have also disallowed the web crawler.


New York Times, CNN and ABC block OpenAI's GPTBot web crawler from accessing content

The Guardian

News outlets including the New York Times, CNN, Reuters and the Australian Broadcasting Corporation (ABC) have blocked a tool from OpenAI, limiting the company's ability to continue accessing their content. OpenAI is behind one of the best known artificial intelligence chatbots, ChatGPT. Its web crawler – known as GPTBot – may scan webpages to help improve its AI models. The Verge was first to report the New York Times had blocked GPTBot on its website. The Guardian subsequently found that other major news websites, including CNN, Reuters, the Chicago Tribune, the ABC and Australian Community Media (ACM) brands such as the Canberra Times and the Newcastle Herald, appear to have also disallowed the web crawler.


OpenAI releases webcrawler GPTBot, how to block it

FOX News

CEO says OpenAI CEO Sam Altman said language and cultural inclusivity is "very important" to his company's mission as it builds and trains powerful artificial intelligence systems. OpenAI has launched web crawler GPTBot to improve artificial intelligence models. "Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII) or have text that violates our policies," the company said in a post on its website. "Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety," OpenAI wrote. A web crawler is a type of bot.