AITopics | correct option

Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming

Neural Information Processing Systems, Feb-15-2026, 16:18:19 GMT

Generative models have demonstrated human-level proficiency in various benchmarks across domains like programming, natural sciences, and general knowledge.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.67)

Industry:

Law (0.92)
Information Technology (0.92)
Government (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
(2 more...)

6d5e00006b65fcc55c3c1798da821663-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-15-2026, 16:18:16 GMT

lama ct, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Appendix A

Neural Information Processing Systems, Feb-15-2026, 14:25:21 GMT

Q: For what purpose was the dataset created?
Q: Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g.,
Q: Who funded the creation of the dataset?
Q: What do the instances that comprise the dataset represent (e.g., documents, photos, people,
Q: How many instances are there in total (of each type, if appropriate)?
As shown in Table 1, the dataset statistics are as follows: Grounding Task: 111,770 samples for training, 21,616 samples for testing. For grounding, we use only one annotation per image.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: Asia > China (0.05)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

32b80425554e081204e5988ab1c97e9a-Paper-Conference.pdf

Neural Information Processing Systems, Feb-10-2026, 21:36:50 GMT

expert system, follow-up question, information, (16 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Monaco (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (1.00)
Health & Medicine > Diagnostic Medicine (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
(2 more...)

Metric-Fair Prompting: Treating Similar Samples Similarly

Wang, Jing, Shen, Jie, Niu, Xing, Zhang, Tong, Weiss, Jeremy

arXiv.org Artificial Intelligence, Dec-9-2025

We introduce Metric-Fair Prompting, a fairness-aware prompting framework that guides large language models (LLMs) to make decisions under metric-fairness constraints. In the application of multiple-choice medical question answering, each (question, option) pair is treated as a binary instance with label +1 (correct) or -1 (incorrect). To promote individual fairness (treating similar instances similarly), we compute question similarity using NLP embeddings and solve items in joint pairs of similar questions rather than in isolation. The prompt enforces a global decision protocol: extract decisive clinical features, map each (question, option) pair to a score f(x) that acts as confidence, and impose a Lipschitz-style constraint so that similar inputs receive similar scores and, hence, consistent outputs. Evaluated on the MedQA (US) benchmark, Metric-Fair Prompting is shown to improve performance over standard single-item prompting, demonstrating that fairness-guided, confidence-oriented reasoning can enhance LLM accuracy on high-stakes clinical multiple-choice questions.
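The pairing and Lipschitz-style check described in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the greedy nearest-neighbour pairing rule, the use of cosine distance, the toy two-dimensional embeddings, the scores, and the constant `lipschitz_constant` are all assumptions chosen for illustration. The check flags a pair whenever the score gap |f(x) - f(x')| exceeds the constant times the distance between the two inputs.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def greedy_pairs(embeddings):
    """Pair each unpaired item with its nearest unpaired neighbour.

    Hypothetical pairing rule; the abstract only says that similar
    questions are solved jointly, not how the pairs are formed.
    """
    unpaired = list(range(len(embeddings)))
    pairs = []
    while len(unpaired) >= 2:
        i = unpaired.pop(0)
        j = min(unpaired,
                key=lambda k: cosine_distance(embeddings[i], embeddings[k]))
        unpaired.remove(j)
        pairs.append((i, j))
    return pairs

def lipschitz_violations(pairs, embeddings, scores, lipschitz_constant):
    """Return the pairs whose score gap exceeds lipschitz_constant * distance."""
    violations = []
    for i, j in pairs:
        distance = cosine_distance(embeddings[i], embeddings[j])
        if abs(scores[i] - scores[j]) > lipschitz_constant * distance:
            violations.append((i, j))
    return violations

# Toy data (assumed): items 0 and 2 are similar, as are items 1 and 3.
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
scores = [0.9, 0.2, 0.89, 0.8]  # stand-ins for confidence scores f(x)

pairs = greedy_pairs(embeddings)
violations = lipschitz_violations(pairs, embeddings, scores,
                                  lipschitz_constant=5.0)
print(pairs)       # [(0, 2), (1, 3)]
print(violations)  # [(1, 3)]
```

In this toy run, items 1 and 3 are nearly identical in embedding space but receive very different scores, so the pair is flagged as inconsistent; items 0 and 2 pass the check.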

justification, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.07608

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Appendix A

Neural Information Processing Systems, Oct-11-2025, 00:26:23 GMT

Q: For what purpose was the dataset created?
Q: Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g.,
Q: Who funded the creation of the dataset?
Q: What do the instances that comprise the dataset represent (e.g., documents, photos, people,
Q: How many instances are there in total (of each type, if appropriate)?
As shown in Table 1, the dataset statistics are as follows: Grounding Task: 111,770 samples for training, 21,616 samples for testing. For grounding, we use only one annotation per image.

dataset, instruction, phrase respond, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.05)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming

Neural Information Processing Systems, Oct-10-2025, 05:24:15 GMT

Generative models have demonstrated human-level proficiency in various benchmarks across domains like programming, natural sciences, and general knowledge.

avatar, dataset, grid, (16 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.67)

Industry:

Law (0.92)
Information Technology (0.92)
Government (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
(2 more...)

Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming

Neural Information Processing Systems, Oct-10-2025, 05:24:11 GMT

Generative models have demonstrated human-level proficiency in various benchmarks across domains like programming, natural sciences, and general knowledge.

benchmark, lama ct, student, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology: