
Croissant: A Metadata Format for ML-Ready Datasets

Neural Information Processing Systems

Data is a critical resource for machine learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that creates a shared representation across ML tools, frameworks, and platforms. Croissant makes datasets more discoverable, portable, and interoperable, thereby addressing significant challenges in ML data management. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, enabling easy loading into the most commonly-used ML frameworks, regardless of where the data is stored. Our initial evaluation by human raters shows that Croissant metadata is readable, understandable, complete, yet concise.
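To make the idea concrete, below is a minimal sketch of what Croissant-style dataset metadata looks like, built as a plain Python dict. The field names follow the general JSON-LD shape of the format (schema.org vocabulary, `distribution` of file objects, `recordSet` of typed fields), but they are abbreviated and approximate here; consult the actual specification for the authoritative schema.

```python
import json

# A simplified, illustrative sketch of Croissant-style JSON-LD metadata.
# Field names are approximate; see the real specification for details.
metadata = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "name": "toy_dataset",
    "description": "A tiny example dataset description.",
    "distribution": [
        {"@type": "FileObject", "name": "data.csv",
         "contentUrl": "https://example.org/data.csv",
         "encodingFormat": "text/csv"}
    ],
    "recordSet": [
        {"@type": "RecordSet", "name": "records",
         "field": [{"name": "label", "dataType": "Text"}]}
    ],
}

def check_required(meta, required=("@context", "@type", "name", "description")):
    """Return the list of required top-level keys missing from the metadata."""
    return [k for k in required if k not in meta]

print(json.dumps({"missing": check_required(metadata)}))
```

Because the metadata is self-describing JSON-LD, a tool can validate and interpret it without knowing where the underlying files live, which is what enables the cross-repository loading described above.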




Flexible metadata harvesting for ecology using large language models

Lu, Zehao, van der Plas, Thijs L, Rashidi, Parinaz, Kissling, W Daniel, Athanasiadis, Ioannis N

arXiv.org Artificial Intelligence

Large, open datasets can accelerate ecological research, particularly by enabling researchers to develop new insights by reusing datasets from multiple sources. However, to find the most suitable datasets to combine and integrate, researchers must navigate diverse ecological and environmental data provider platforms with varying metadata availability and standards. To overcome this obstacle, we have developed a large language model (LLM)-based metadata harvester that flexibly extracts metadata from any dataset's landing page, and converts these to a user-defined, unified format using existing metadata standards. We validate that our tool is able to extract both structured and unstructured metadata with equal accuracy, aided by our LLM post-processing protocol. Furthermore, we utilise LLMs to identify links between datasets, both by calculating embedding similarity and by unifying the formats of extracted metadata to enable rule-based processing. Our tool, which flexibly links the metadata of different datasets, can therefore be used for ontology creation or graph-based queries, for example, to find relevant ecological and environmental datasets in a virtual research environment.
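The embedding-similarity linking step described above can be sketched as follows. The paper uses LLM embeddings; in this self-contained sketch, a bag-of-words vector stands in for them purely to illustrate how cosine similarity ranks candidate dataset links. The metadata strings are invented examples.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an LLM embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

meta_a = "bird abundance survey netherlands 2020"
meta_b = "breeding bird survey abundance netherlands"
meta_c = "soil moisture sensor network alps"

# Related metadata scores higher, so a threshold on similarity
# can propose candidate links between datasets.
print(cosine(embed(meta_a), embed(meta_b)))  # high: shared vocabulary
print(cosine(embed(meta_a), embed(meta_c)))  # low: disjoint vocabulary
```

With real embeddings the same comparison captures semantic rather than lexical overlap, but the linking logic is unchanged.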



MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors

Macina, Jakub, Daheim, Nico, Hakimi, Ido, Kapur, Manu, Gurevych, Iryna, Sachan, Mrinmaya

arXiv.org Artificial Intelligence

Evaluating the pedagogical capabilities of AI-based tutoring models is critical for making guided progress in the field. Yet, we lack a reliable, easy-to-use, and simple-to-run evaluation that reflects the pedagogical abilities of models. To fill this gap, we present MathTutorBench, an open-source benchmark for holistic tutoring model evaluation. MathTutorBench contains a collection of datasets and metrics that broadly cover tutor abilities as defined by learning sciences research in dialog-based teaching. To score the pedagogical quality of open-ended teacher responses, we train a reward model and show it can discriminate expert from novice teacher responses with high accuracy. We evaluate a wide set of closed- and open-weight models on MathTutorBench and find that subject expertise, indicated by solving ability, does not immediately translate to good teaching. Rather, pedagogy and subject expertise appear to form a trade-off that is navigated by the degree of tutoring specialization of the model. Furthermore, tutoring appears to become more challenging in longer dialogs, where simpler questioning strategies begin to fail. We release the benchmark, code, and leaderboard openly to enable rapid benchmarking of future models.
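The reward-model evaluation described above reduces to a pairwise ranking check: on (expert, novice) response pairs, how often does the model score the expert response higher? The sketch below shows that computation with a trivial stand-in scorer (counting question marks as a crude proxy for guiding questions); it is not the trained reward model from the paper, and the example pairs are invented.

```python
def score(response):
    """Toy scorer: more guiding questions -> higher pedagogical score.
    A stand-in for the paper's trained reward model."""
    return response.count("?")

# Invented (expert, novice) teacher-response pairs.
pairs = [
    ("What do you think the next step is?", "The answer is 12."),
    ("Can you explain how you got that?", "Wrong. It is 7."),
]

def pairwise_accuracy(pairs, scorer):
    """Fraction of pairs where the expert response outscores the novice one."""
    wins = sum(scorer(expert) > scorer(novice) for expert, novice in pairs)
    return wins / len(pairs)

print(pairwise_accuracy(pairs, score))  # 1.0 on this toy data
```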


Croissant: A Metadata Format for ML-Ready Datasets

Akhtar, Mubashara, Benjelloun, Omar, Conforti, Costanza, Gijsbers, Pieter, Giner-Miguelez, Joan, Jain, Nitisha, Kuchnik, Michael, Lhoest, Quentin, Marcenac, Pierre, Maskey, Manil, Mattson, Peter, Oala, Luis, Ruyssen, Pierre, Shinde, Rajat, Simperl, Elena, Thomas, Goeffry, Tykhonov, Slava, Vanschoren, Joaquin, van der Velde, Jos, Vogler, Steffen, Wu, Carole-Jean

arXiv.org Artificial Intelligence

Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.


Arithmetic Reasoning with LLM: Prolog Generation & Permutation

Yang, Xiaocheng, Chen, Bingsen, Tam, Yik-Cheung

arXiv.org Artificial Intelligence

Instructing large language models (LLMs) to solve elementary school math problems has shown great success using Chain of Thought (CoT). However, the CoT approach relies on an LLM to generate a sequence of arithmetic calculations, which can be prone to cascaded calculation errors. We hypothesize that an LLM should focus on extracting predicates and generating symbolic formulas from the math problem description, so that the underlying calculation can be done via an external code interpreter. We investigate using LLMs to generate Prolog programs to solve mathematical questions. Experimental results show that our Prolog-based arithmetic problem solving outperforms CoT generation on the GSM8K benchmark across three distinct LLMs. In addition, because Prolog is insensitive to the ordering of predicates and symbolic formulas, we propose permuting the ground-truth predicates for more robust LLM training via data augmentation.
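The key property being exploited, that declaratively stated formulas can be evaluated in any order by an external solver, can be sketched without Prolog itself. In this hypothetical Python sketch, LLM-extracted formulas are (variable, expression) pairs, a tiny fixed-point loop plays the role of the external interpreter, and shuffling the formulas (the permutation-augmentation idea) does not change the answer.

```python
import random

# Hypothetical LLM-extracted symbolic formulas for a word problem:
# "Each of 4 baskets holds 3 apples; there are also 5 oranges. Total fruit?"
formulas = [
    ("total", "apples + oranges"),
    ("apples", "3 * baskets"),
    ("baskets", "4"),
    ("oranges", "5"),
]

def solve(formulas, target):
    """Evaluate whichever formulas are ready, in any order, until done."""
    env, pending = {}, list(formulas)
    while pending:
        progressed = False
        for var, expr in list(pending):
            try:
                env[var] = eval(expr, {}, env)  # toy external interpreter
                pending.remove((var, expr))
                progressed = True
            except NameError:
                continue  # a dependency has not been computed yet
        if not progressed:
            raise ValueError("unresolvable formulas")
    return env[target]

random.shuffle(formulas)          # order-insensitive, as in Prolog
print(solve(formulas, "total"))   # 17
```

In the paper the evaluation is done by a real Prolog engine, but the division of labor is the same: the LLM emits declarative structure, and a deterministic solver does the arithmetic.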


Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials

Kumar, Aviral, Singh, Anikait, Ebert, Frederik, Nakamoto, Mitsuhiko, Yang, Yanlai, Finn, Chelsea, Levine, Sergey

arXiv.org Artificial Intelligence

Progress in deep learning highlights the tremendous potential of diverse robotic datasets for attaining effective generalization, making it enticing to leverage broad datasets for robust generalization in robotic learning as well. However, in practice, we often want to learn a new skill in a new environment that is unlikely to be contained in the prior data. Therefore we ask: how can we leverage existing diverse offline datasets, in combination with small amounts of task-specific data, to solve new tasks while still enjoying the generalization benefits of training on large amounts of data? In this paper, we demonstrate that end-to-end offline RL can be an effective approach for doing this, without the need for any representation learning or vision-based pre-training. We present pre-training for robots (PTR), a framework based on offline RL that attempts to learn new tasks effectively by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations. PTR utilizes an existing offline RL method, conservative Q-learning (CQL), but extends it with several crucial design decisions that enable PTR to actually work and outperform a variety of prior methods. To our knowledge, PTR is the first RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging an existing dataset of diverse multi-task robot data collected in a variety of toy kitchens. We also demonstrate that PTR can enable effective autonomous fine-tuning and improvement in a handful of trials, without needing any demonstrations. An accompanying overview video can be found in the supplementary material and at this URL: https://sites.google.com/view/ptr-final/
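Since PTR builds on conservative Q-learning, it helps to see the term CQL adds on top of the standard TD loss: a soft maximum over all actions' Q-values is pushed down while the Q-value of the action actually taken in the dataset is pushed up, discouraging overestimation on out-of-distribution actions. The sketch below computes that conservative term for toy tabular Q-values; it is an illustration of the regularizer, not the full PTR training loop.

```python
import math

def cql_penalty(q_values, dataset_action):
    """CQL's conservative term for one state:
    logsumexp over all actions minus the Q-value of the logged action."""
    soft_max = math.log(sum(math.exp(q) for q in q_values))
    return soft_max - q_values[dataset_action]

q = [1.0, 2.0, 0.5]        # toy Q(s, a) for three actions in one state

# The penalty is smallest when the dataset action already has the
# highest Q-value, and grows when unseen actions are overvalued.
print(cql_penalty(q, 1))   # logged action is the argmax: small penalty
print(cql_penalty(q, 0))   # logged action is undervalued: larger penalty
```

Minimizing this term alongside the TD loss is what keeps the learned Q-function conservative on actions the offline data never tried.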


Artificial intelligence showdown: Google Lens vs Bixby Vision vs Huawei HiVision

#artificialintelligence

We saw this goofy-looking sheep plushie and decided to test whether the software would be able to recognize what animal it is despite the weird proportions. The results were hit and miss. [Image: Google Lens left, Bixby Vision middle, HiVision right.] In its typical style, Google's result looks like the software is bored with your constant questions and just spits out "sheep", which is indeed correct. However, we can't blame the other two apps for suggesting "toy", since the plushie is more a toy than a real sheep, obviously. Still, Bixby Vision had a hard time realizing there was only one object it needed to recognize, and suggested similar images of pies and other whipped-cream-decorated pastries.