AITopics | Web

Consent in Crisis: The Rapid Decline of the AI Data Commons, Ariel Lee

Neural Information Processing Systems, Jun-1-2025, 13:17:03 GMT

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI.
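
As a rough illustration of the kind of per-crawler restriction the abstract describes, the sketch below checks which user agents a robots.txt file allows to fetch a page. It uses Python's standard urllib.robotparser; the robots.txt content, the example URL, and the helper name crawler_permissions are hypothetical and are not taken from the paper's audit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: one AI crawler is blocked
# entirely, another is blocked from a subdirectory, everyone else is allowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /private/

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt, user_agents, url):
    """Return {user_agent: bool} saying whether each agent may fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


permissions = crawler_permissions(
    EXAMPLE_ROBOTS_TXT,
    ["GPTBot", "CCBot", "Googlebot"],
    "https://example.com/articles/1",
)
for agent, allowed in permissions.items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

A check like this only reads robots.txt; it says nothing about a site's Terms of Service, which is the kind of mismatch the abstract points to.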

data mining, large language model, machine learning, (22 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Law (1.00)
Information Technology > Services (0.93)
(3 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Web (1.00)
(7 more...)

c2f71567cd53464161cab3336e8fc865-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Jun-1-2025, 12:47:21 GMT

Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte.

data mining, large language model, machine learning, (23 more...)

Neural Information Processing Systems

Country:

Asia > China (0.14)
North America > Canada (0.14)
Europe > Belgium (0.14)

Genre: Workflow (0.67)

Industry:

Information Technology > Software (0.93)
Law (0.92)
Information Technology > Services (0.68)

Technology:

Information Technology > Software (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(8 more...)

5 projects Perplexity's new Labs AI tool can whip up for you now - in minutes

ZDNet, May-30-2025, 13:40:45 GMT

Designing a detailed web app, dashboard, or even spreadsheet might take you hours to complete. What if someone or something could do the same work in just a few minutes? In a blog post published Thursday, Perplexity explained how Labs can create anything from reports to spreadsheets to dashboards to simple web apps. The new feature is accessible only to Pro subscribers, who pay $20 per month (though there are a couple of ways to score the plan for free). This new capability is available on Perplexity's website and in its iOS and Android apps. The company has also promised its imminent arrival in its Windows and Mac apps.

artificial intelligence, chatbot, natural language, (14 more...)

ZDNet

Technology:

Information Technology > Software (0.95)
Information Technology > Communications > Web (0.73)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.30)

Data-Dependent Bounds for Online Portfolio Selection Without Lipschitzness and Smoothness, Chung-En Tsai, Department of Computer Science and Information Engineering, National Taiwan University

Neural Information Processing Systems, May-25-2025, 11:19:04 GMT

This work introduces the first small-loss and gradual-variation regret bounds for online portfolio selection, marking the first instances of data-dependent bounds for online convex optimization with non-Lipschitz, non-smooth losses. The algorithms we propose exhibit sublinear regret rates in the worst cases and achieve logarithmic regrets when the data is "easy," with per-round time almost linear in the number of investment alternatives. The regret bounds are derived using novel smoothness characterizations of the logarithmic loss, a local norm-based analysis of following the regularized leader (FTRL) with self-concordant regularizers, which are not necessarily barriers, and an implicit variant of optimistic FTRL with the log-barrier.
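
For readers unfamiliar with the setting, the sketch below shows the online portfolio selection loop with the logarithmic loss, -log of the portfolio's one-round growth. It uses the classic exponentiated gradient update as a simple stand-in, not the FTRL-based algorithms proposed in the paper; the function name, the step size eta, and the toy price relatives are assumptions made for illustration.

```python
import math


def exponentiated_gradient(price_relatives, eta=0.05):
    """Run exponentiated gradient on a sequence of price-relative vectors.

    price_relatives[t][i] is asset i's price ratio (close / open) in round t.
    Returns the final portfolio weights and the cumulative logarithmic loss.
    """
    n = len(price_relatives[0])
    weights = [1.0 / n] * n  # start from the uniform portfolio
    total_loss = 0.0
    for x in price_relatives:
        # Wealth growth of the current portfolio in this round.
        growth = sum(w * xi for w, xi in zip(weights, x))
        # Logarithmic loss; its gradient x / growth is unbounded as growth -> 0,
        # which is why the loss is neither Lipschitz nor smooth on the simplex.
        total_loss += -math.log(growth)
        # Multiplicative update followed by renormalization onto the simplex.
        unnormalized = [
            w * math.exp(eta * xi / growth) for w, xi in zip(weights, x)
        ]
        z = sum(unnormalized)
        weights = [u / z for u in unnormalized]
    return weights, total_loss


# Toy data (hypothetical): asset 0 gains 10% every round, asset 1 loses 10%.
toy_rounds = [[1.1, 0.9]] * 50
final_weights, cumulative_loss = exponentiated_gradient(toy_rounds)
print(final_weights, cumulative_loss)
```

Regret in this setting is the gap between this cumulative loss and that of the best fixed portfolio in hindsight; the sketch computes only the learner's side of that comparison.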

artificial intelligence, lemma 4, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > Taiwan (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications > Web > Semantic Web (0.40)

617ff5271b2b41dfb217a3b0f1b3d1be-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, May-24-2025, 02:33:39 GMT

machine learning, natural language, question answering, (19 more...)

Neural Information Processing Systems

Country: Oceania > New Zealand (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

5950bf290a1570ea401bf98882128160-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, May-23-2025, 19:48:23 GMT

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > New York (0.14)
North America > United States > Illinois (0.14)
(2 more...)

Genre:

Research Report (0.67)
Workflow (0.46)

Industry:

Leisure & Entertainment (0.93)
Government (0.67)
Information Technology (0.67)
(2 more...)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
(3 more...)

This Google Chrome update could change the fundamentals of browsing - here's who gets to try it first

ZDNet, May-23-2025, 11:57:49 GMT

Google's Chrome browser for macOS and Windows is receiving an infusion of new Gemini-powered capabilities, including an AI browsing assistant contextually sensitized to a user's browsing activities. Google made the announcement this week at Google I/O 2025. Dubbed Gemini-in-Chrome, the feature will be available May 21 to Google AI Pro and Google AI Ultra subscribers in the US as well as Chrome Beta, Dev, and Canary users. The general idea behind Gemini-in-Chrome is to reorganize, aggregate, and then more sensibly redisplay the data found on one or more browser tabs while also embellishing the final output with additional but relevant Gemini-generated information. For example, during a pre-event press briefing attended by ZDNET, Google director of Chrome product management Charmaine D'Silva demonstrated how Gemini-in-Chrome could not only organize a head-to-head feature comparison chart of individual sleeping bags -- to which multiple Chrome tabs (one tab per sleeping bag) were pointing -- but could also respond to text prompts about each bag's suitability to the expected temperatures for an upcoming camping trip in Maine.

artificial intelligence, gemini-in-chrome, google, (14 more...)

ZDNet

Country: North America > United States > Maine (0.25)

Industry:

Information Technology > Software (0.41)
Information Technology > Services (0.37)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Web (0.53)

Google's 'AI Mode' search is ready to replace a list of links

PCWorld, May-1-2025, 18:10:48 GMT

Google said Thursday that it has begun migrating its "AI Mode" out of its experimental Labs effort and into the real world. Google said that a "small percentage of people" in the "coming weeks" will see what Google calls AI Mode, or entirely AI-generated responses to queries that users ask. It's Google's response to services like Anthropic, which "answer" queries using AI, which slurps up and regurgitates answers that others have already provided. Google began revamping its search algorithm in 2023, when it started aggregating AI-powered summaries of, say, the best laptops. AI has been used elsewhere by Google services like Chrome to sum up web pages, as well.

ai mode, artificial intelligence, google, (2 more...)

PCWorld

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Web (0.39)

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

Neural Information Processing Systems, Mar-27-2025, 14:33:21 GMT

Emerging ethical approaches have attempted to filter pretraining material, but such approaches have been ad hoc and failed to take context into account. We offer an approach to filtering grounded in law, which has directly addressed the tradeoffs in filtering material. First, we gather and make available the Pile of Law, a 256GB (and growing) dataset of open-source English-language legal and administrative data, covering court opinions, contracts, administrative rules, and legislative records. Pretraining on the Pile of Law may help with legal tasks that have the promise to improve access to justice. Second, we distill the legal norms that governments have developed to constrain the inclusion of toxic or private content into actionable lessons for researchers and discuss how our dataset reflects these norms. Third, we show how the Pile of Law offers researchers the opportunity to learn such filtering rules directly from the data, providing an exciting new research direction in model-based processing. Warning: this paper contains quotations that may be offensive or upsetting.

data mining, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.92)
North America > Canada (0.68)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Law > Litigation (1.00)
Law > Government & the Courts (1.00)
Law > Criminal Law (1.00)
(8 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(6 more...)

Consent in Crisis: The Rapid Decline of the AI Data Commons, Ariel Lee

Neural Information Processing Systems, Mar-27-2025, 06:58:04 GMT

General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI.

data mining, large language model, machine learning, (23 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (1.00)

Industry: