Towards Data Governance of Frontier AI Models

Hausenloy, Jason, McClements, Duncan, Thakur, Madhavendra

arXiv.org Artificial Intelligence

Data is essential to train and fine-tune today's frontier artificial intelligence (AI) models and to develop future ones. To date, academic, legal, and regulatory work has primarily addressed how data can directly harm consumers and creators, such as through privacy breaches, copyright infringements, and bias and discrimination. Our work, instead, focuses on the comparatively neglected question of how data can enable new governance capacities for frontier AI models. This approach for "frontier data governance" opens up new avenues for monitoring and mitigating risks from advanced AI models, particularly as they scale and acquire specific dangerous capabilities. Still, frontier data governance faces challenges that stem from the fundamental properties of data itself: data is non-rival, often non-excludable, easily replicable, and increasingly synthesizable. Despite these inherent difficulties, we propose a set of policy mechanisms targeting key actors along the data supply chain, including data producers, aggregators, model developers, and data vendors. We provide a brief overview of 15 governance mechanisms, of which we centrally introduce five, underexplored policy recommendations. These include developing canary tokens to detect unauthorized use for producers; (automated) data filtering to remove malicious content for pre-training and post-training datasets; mandatory dataset reporting requirements for developers and vendors; improved security for datasets and data generation algorithms; and know-your-customer requirements for vendors. By considering data not just as a source of potential harm, but as a critical governance lever, this work aims to equip policymakers with a new tool for the governance and regulation of frontier AI models.
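The first of the abstract's recommendations, canary tokens that let producers detect unauthorized use of their data, can be sketched as a toy workflow. Everything below (the function names, the tag format, the `producer42` identifier) is illustrative, assumed for this sketch rather than taken from the paper: the idea is simply that a high-entropy string embedded in a producer's documents should never appear in a model's output unless the model was trained on those documents.

```python
import hashlib
import secrets

def make_canary(owner_id: str) -> str:
    # A unique, high-entropy string that would never occur in natural text;
    # seeing it verbatim in a model's output is strong evidence the model
    # ingested this owner's data.
    nonce = secrets.token_hex(8)
    tag = hashlib.sha256(f"{owner_id}:{nonce}".encode()).hexdigest()[:16]
    return f"CANARY-{owner_id}-{tag}"

def embed_canary(document: str, canary: str) -> str:
    # Append the canary unobtrusively; a real scheme might hide it in
    # markup, metadata, or whitespace instead.
    return f"{document}\n<!-- {canary} -->"

def detects_unauthorized_use(model_output: str, canary: str) -> bool:
    # The producer's check: does the model reproduce the canary?
    return canary in model_output

canary = make_canary("producer42")
doc = embed_canary("Some proprietary article text.", canary)
```

In practice a producer would keep a registry of issued canaries and probe deployed models with prompts likely to elicit memorized training text.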


Data Distribution Valuation

Xu, Xinyi, Wang, Shuaiqi, Foo, Chuan-Sheng, Low, Bryan Kian Hsiang, Fanti, Giulia

arXiv.org Artificial Intelligence

Data valuation is a class of techniques for quantitatively assessing the value of data for applications like pricing in data marketplaces. Existing data valuation methods define a value for a discrete dataset. However, in many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled. For example, consider a buyer trying to evaluate whether to purchase data from different vendors. The buyer may observe (and compare) only a small preview sample from each vendor, to decide which vendor's data distribution is most useful to the buyer and purchase. The core question is how should we compare the values of data distributions from their samples? Under a Huber characterization of the data heterogeneity across vendors, we propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies for comparing data distributions from samples. We empirically demonstrate that our method is sample-efficient and effective in identifying valuable data distributions against several existing baselines, on multiple real-world datasets (e.g., network intrusion detection, credit card fraud detection) and downstream applications (classification, regression).
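The buyer's comparison described above can be sketched with a standard unbiased MMD estimator. This is a minimal illustration of the general MMD-from-samples idea, not the paper's method: the RBF kernel, the bandwidth `gamma=1.0`, and the synthetic Gaussian "vendor previews" are all assumptions made for the sketch.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def mmd2_unbiased(x, y, gamma=1.0):
    # Unbiased estimate of squared MMD between the distributions
    # underlying samples x and y (diagonal terms excluded).
    m, n = len(x), len(y)
    kxx = rbf_kernel(x, x, gamma)
    kyy = rbf_kernel(y, y, gamma)
    kxy = rbf_kernel(x, y, gamma)
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(200, 2))  # buyer's target distribution
vendor_a = rng.normal(0.0, 1.0, size=(100, 2))   # preview matching the reference
vendor_b = rng.normal(3.0, 1.0, size=(100, 2))   # preview from a shifted distribution
```

Here the buyer would prefer the vendor whose preview sample yields the smaller MMD estimate against a reference sample of the distribution they care about.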


The minimum viable data set

#artificialintelligence

In the context of AI, it is often claimed that enormous amounts of data are required before you can get started at all: very complex models have to be built, and a project's success is attended by many unpredictable risks. As a general rule, however, this is wrong. This article is about giving you a perspective on how to handle situations of data scarcity and the options worth considering in that context. Of course, there are complex projects that place extreme demands on the amount of data needed to achieve effective results, but this usually reflects poor planning or a deliberately high appetite for experimentation.


starmine AI On-Demand Datasets for Data Vendors, DaaS Providers and Institutions

#artificialintelligence

Starmine is a robust and highly scalable platform for constructing, trading, and exchanging advanced, algorithmically generated on-demand datasets for Machine Learning (ML) and Artificial Intelligence (AI) efforts. Datasets remain at the core of most advances in ML and AI. A dataset is typically made up of rows and columns, similar to an organized matrix or spreadsheet. Specifically, columns contain features along with continuously valued attributes or scores. These features and their scored attributes are stored as 'supercolumns' and can be automatically engineered, traded, or exchanged by machines without human intervention.


How to Better Classify Coachella With Machine Learning (Part 2)

#artificialintelligence

I often get asked how to start with Machine Learning. I consider myself a maker, and I truly believe experience is key: solving actual real-world problems is what unlocks the mysteries. Once you have your first success at building and understanding a solution that solved your problem, you can dig deeper and refine the building blocks. The problem we faced (see Part 1) was that we had a multitude of data vendors providing us with event information.