
A Approximate Sampling from k-DPP Marginals

Neural Information Processing Systems

In view of this, Barthelmé et al. (2019) propose an approximation to k-DPPs, valid for large-scale ground sets, which has better numerical properties. Let L(h): H → [0, 1] be a random variable. The first equality uses Proposition 4; the second uses Proposition 3. We decompose the game regret into the sum of the player regret and the sampler regret. If the decision set has diameter D and the gradients are bounded by G, then a player running the SGD algorithm suffers regret at most O(GD√T). For convex regression and classification models we use linear models.
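The SGD regret bound quoted above, in its standard form O(GD√T) for gradient norms bounded by G and a decision set of diameter D, can be sketched with projected online gradient descent (a generic illustration, not this paper's construction):

```python
import numpy as np

def online_gradient_descent(loss_grad, d, D, G, T):
    """Projected online gradient descent over the Euclidean ball of
    diameter D. With gradient norms bounded by G and step size
    eta = D / (G * sqrt(T)), the regret against any fixed comparator
    in the ball is at most G * D * sqrt(T)."""
    w = np.zeros(d)
    eta = D / (G * np.sqrt(T))
    iterates = []
    for t in range(T):
        iterates.append(w.copy())
        g = loss_grad(t, w)          # gradient of the round-t loss at w
        w = w - eta * g
        norm = np.linalg.norm(w)
        if norm > D / 2:             # project back onto the feasible ball
            w = w * (D / 2) / norm
    return iterates
```

For instance, with the 1-Lipschitz losses f_t(w) = |w − 1| the cumulative regret stays well under GD√T.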


MLFMF: Data Sets for Machine Learning for Mathematical Formalization

Neural Information Processing Systems

We introduce MLFMF, a collection of data sets for benchmarking recommendation systems used to support formalization of mathematics with proof assistants. These systems help humans identify which previous entries (theorems, constructions, datatypes, and postulates) are relevant in proving a new theorem or carrying out a new construction. Each data set is derived from a library of formalized mathematics written in proof assistants Agda or Lean. The collection includes the largest Lean 4 library Mathlib, and some of the largest Agda libraries: the standard library, the library of univalent mathematics Agda-unimath, and the TypeTopology library. Each data set represents the corresponding library in two ways: as a heterogeneous network, and as a list of s-expressions representing the syntax trees of all the entries in the library.
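The s-expression representation of library entries can be consumed with a few lines of parsing. A minimal sketch (not the MLFMF tooling itself, and assuming plain whitespace-separated atoms in balanced parentheses):

```python
def parse_sexpr(text):
    """Parse a single s-expression into nested Python lists of atom strings.
    A minimal sketch; real MLFMF entries may use richer atom syntax."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1      # skip the closing paren
        return tokens[pos], pos + 1   # atom

    tree, _ = read(0)
    return tree
```

The nested-list output maps directly onto the syntax trees the data sets describe.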


A Russian Jeopardy! Data Set for Question-Answering Systems

Mikhalkova, Elena

arXiv.org Artificial Intelligence

Question answering (QA) is one of the most common NLP tasks and relates to named entity recognition, fact extraction, semantic search, and other fields. In industry, it is much appreciated in chatbots and corporate information systems. It is also a challenging task that attracted the attention of a very general audience through the quiz show Jeopardy! In this article we describe a Jeopardy!-like Russian QA data set collected from the official Russian quiz database Chgk (che ge ka). The data set includes 379,284 quiz-like questions, 29,375 of which come from the Russian analogue of Jeopardy!, "Own Game". We analyze its linguistic features and the associated QA task, and discuss the prospects for a QA competition based on this data set.


The Influence of Faulty Labels in Data Sets on Human Pose Estimation

Schwarz, Arnold, Hernadi, Levente, Bießmann, Felix, Hildebrand, Kristian

arXiv.org Artificial Intelligence

In this study we provide empirical evidence demonstrating that the quality of training data impacts model performance in Human Pose Estimation (HPE). Inaccurate labels in widely used data sets, ranging from minor errors to severe mislabeling, can negatively influence learning and distort performance metrics. We perform an in-depth analysis of popular HPE data sets to show the extent and nature of label inaccuracies. Our findings suggest that accounting for the impact of faulty labels will facilitate the development of more robust and accurate HPE models for a variety of real-world applications. We show improved performance with cleansed data.
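How faulty labels distort performance metrics can be seen in a standard keypoint metric such as PCK (Percentage of Correct Keypoints), where the ground-truth label itself enters the score. A minimal sketch (a generic illustration, not the authors' evaluation code):

```python
import numpy as np

def pck(pred, gt, threshold):
    """Percentage of Correct Keypoints: fraction of predicted keypoints
    lying within `threshold` (in pixels) of the ground-truth label.
    If the labels `gt` are themselves faulty, identical predictions
    receive a different score."""
    dists = np.linalg.norm(pred - gt, axis=-1)
    return float(np.mean(dists <= threshold))
```

Shifting a single label past the threshold changes the reported accuracy even though the model's predictions are unchanged, which is exactly the distortion the study measures.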


The US wants to use facial recognition to identify migrant children as they age

MIT Technology Review

As Boyd explained at a conference in June, the key question for OBIM is, "If we pick up someone from Panama at the southern border at age four, say, and then pick them up at age six, are we going to recognize them?" Facial recognition technology (FRT) has traditionally not been applied to children, largely because training data sets of real children's faces are few and far between, and consist of either low-quality images drawn from the internet or small sample sizes with little diversity. Such limitations reflect the significant sensitivities regarding privacy and consent when it comes to minors. According to Syracuse University's Transactional Records Access Clearinghouse (TRAC), 339,234 children arrived at the US-Mexico border in 2022, the last year for which numbers are currently available. Of those children, 150,000 were unaccompanied--the highest annual number on record.


A rational model of causal induction with continuous causes

Neural Information Processing Systems

Rational models of causal induction have been successful in accounting for people's judgments about causal relationships. However, these models have focused on explaining inferences from discrete data of the kind that can be summarized in a 2 × 2 contingency table. This severely limits the scope of these models, since the world often provides non-binary data. We develop a new rational model of causal induction using continuous dimensions, which aims to narrow the gap between empirical and theoretical approaches to real-world causal induction. This model predicts human judgments from previous studies better than models of discrete causal inference, and outperforms several other plausible models of causal induction with continuous causes in accounting for people's inferences in a new experiment.
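The discrete models referred to here operate on a 2 × 2 contingency table of cause/effect co-occurrence counts. Two classic statistics computed from such a table are the contrast ΔP and Cheng's (1997) causal power; a minimal sketch:

```python
def delta_p(n_ec, n_c, n_enc, n_nc):
    """Contrast Delta-P = P(e|c) - P(e|~c) from a 2x2 contingency table.
    n_ec: effect-present count with cause present, n_c: cause-present total;
    n_enc: effect-present count with cause absent, n_nc: cause-absent total."""
    return n_ec / n_c - n_enc / n_nc

def causal_power(n_ec, n_c, n_enc, n_nc):
    """Cheng's (1997) causal power for a generative cause:
    Delta-P / (1 - P(e|~c))."""
    p_e_nc = n_enc / n_nc
    return delta_p(n_ec, n_c, n_enc, n_nc) / (1.0 - p_e_nc)
```

For example, with the effect occurring in 8 of 10 cause-present trials and 2 of 10 cause-absent trials, ΔP = 0.6 and causal power = 0.75.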


Review of "A simple example of Dirichlet process mixture inconsistency for the number of components"

Neural Information Processing Systems

The title of this paper is much like the paper itself: to-the-point, descriptive, and readable. "A simple example of Dirichlet process mixture inconsistency for the number of components" delivers on its promise by providing two easy-to-understand demonstrations of the severity of the problem of using Dirichlet process mixtures to estimate the number of components in a mixture model. The authors start by demonstrating that making such a component-cardinality estimate is widespread in the literature (and therefore a problem deserving of interest), briefly describe the Dirichlet process mixture (DPM) model (with particular emphasis on the popular normal-likelihood case), and then demonstrate with a simple single-component mixture example how poorly estimation of component cardinality can go (their convincing answer: very poorly). Not only was the paper enjoyable to read but, refreshingly, it didn't try to fit 20 pages of material into an 8-page limit. One potential criticism is that this result should, in some sense, already be well known in the community.
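The behavior at issue can be traced to the Chinese restaurant process underlying the DPM prior, whose number of occupied tables keeps growing (roughly α log n in expectation) rather than concentrating on a fixed value. A toy simulation, independent of the paper's own analysis:

```python
import random

def crp_num_clusters(n, alpha, rng):
    """Sample the number of occupied tables in a Chinese restaurant
    process with concentration alpha after n customers. Customer i
    starts a new table with probability alpha / (i + alpha), else
    joins an existing table with probability proportional to its size."""
    counts = []
    for i in range(n):
        if rng.random() < alpha / (i + alpha):
            counts.append(1)                 # open a new table
        else:
            r = rng.random() * i             # i customers already seated
            acc = 0
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1           # join table j
                    break
    return len(counts)
```

With alpha = 1 and n = 1000, the expected number of tables is the harmonic number H_1000 ≈ 7.5, illustrating why the posterior on component cardinality need not settle at the true value.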


Variable selection for Naïve Bayes classification

Blanquero, Rafael, Carrizosa, Emilio, Ramírez-Cobo, Pepa, Sillero-Denamiel, M. Remedios

arXiv.org Artificial Intelligence

The Naïve Bayes classifier has proven to be a tractable and efficient method for classification in multivariate analysis. However, features are usually correlated, a fact that violates the Naïve Bayes assumption of conditional independence and may deteriorate the method's performance. Moreover, datasets are often characterized by a large number of features, which may complicate the interpretation of the results as well as slow down the method's execution. In this paper we propose a sparse version of the Naïve Bayes classifier that is characterized by three properties. First, sparsity is achieved while taking into account the correlation structure of the covariates. Second, different performance measures can be used to guide the selection of features. Third, performance constraints on groups of higher interest can be included. Our proposal leads to a smart search with competitive running times, while retaining flexibility in the choice of performance measure for classification. Our findings show that, when compared against well-referenced feature-selection approaches, the proposed sparse Naïve Bayes obtains competitive results in terms of accuracy, sparsity, and running time on balanced datasets. On datasets with unbalanced classes (or classes of differing importance), it achieves a better compromise between the classification rates of the different classes.
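The two ingredients involved, correlation-aware feature selection followed by a Naïve Bayes fit, can be sketched as follows. This is a crude greedy filter stand-in, not the authors' sparse optimization formulation, and the 0.9 threshold is an assumption for illustration:

```python
import numpy as np

def select_decorrelated(X, threshold=0.9):
    """Greedy filter: keep a feature unless its absolute Pearson
    correlation with an already-kept feature exceeds `threshold`.
    A simple stand-in for correlation-aware sparsity."""
    corr = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return kept

def fit_gaussian_nb(X, y):
    """Per-class means, variances, and priors for Gaussian Naive Bayes."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(y))
    return params

def predict_gaussian_nb(params, x):
    """Pick the class maximizing log prior + sum of log Gaussian densities."""
    best, best_score = None, -np.inf
    for c, (mu, var, prior) in params.items():
        score = np.log(prior) - 0.5 * np.sum(
            np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        if score > best_score:
            best, best_score = c, score
    return best
```

Dropping a near-duplicate feature before fitting restores the conditional-independence assumption that the duplicate would violate.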


Symbolic Equation Solving via Reinforcement Learning

Dabelow, Lennart, Ueda, Masahito

arXiv.org Artificial Intelligence

Machine-learning methods are gradually being adopted in a great variety of social, economic, and scientific contexts, yet they are notorious for struggling with exact mathematics. A typical example is computer algebra, which includes tasks like simplifying mathematical terms, calculating formal derivatives, or finding exact solutions of algebraic equations. Traditional software packages for these purposes are commonly based on a huge database of rules for how a specific operation (e.g., differentiation) transforms a certain term (e.g., sine function) into another one (e.g., cosine function). Thus far, these rules have usually needed to be discovered and subsequently programmed by humans. Focusing on the paradigmatic example of solving linear equations in symbolic form, we demonstrate how the process of finding elementary transformation rules and step-by-step solutions can be automated using reinforcement learning with deep neural networks.
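The target behavior on the simplest case, applying elementary transformation rules step by step to a linear equation, can be sketched directly; the RL system in the paper discovers such rules for general symbolic terms rather than having them hard-coded as below:

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Solve a*x + b = c (a != 0) by two elementary transformation rules,
    recording each intermediate equation. Exact rational arithmetic via
    Fraction keeps the result symbolic-friendly."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    steps = [f"{a}*x + {b} = {c}"]
    c -= b                          # rule 1: subtract b from both sides
    steps.append(f"{a}*x = {c}")
    x = c / a                       # rule 2: divide both sides by a
    steps.append(f"x = {x}")
    return x, steps
```

For 2x + 3 = 7 this yields the step sequence "2*x + 3 = 7", "2*x = 4", "x = 2"; an RL agent would instead learn which rule to apply at each state of the term tree.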