AITopics | Education

Collaborating Authors

Education

Assessing the Capability of LLMs in Solving POSCOMP Questions

Viegas, Cayo, Gheyi, Rohit, Ribeiro, Márcio

arXiv.org Artificial Intelligence, Nov-19-2025

Recent advancements in Large Language Models (LLMs) have significantly expanded the capabilities of artificial intelligence in natural language processing tasks. Despite this progress, their performance in specialized domains such as computer science remains relatively unexplored. Understanding the proficiency of LLMs in these domains is critical for evaluating their practical utility and guiding future developments. The POSCOMP, a prestigious Brazilian examination used for graduate admissions in computer science and organized by the Brazilian Computer Society (SBC), provides a challenging benchmark. This study investigates whether LLMs can match or surpass human performance on the POSCOMP exam. Four LLMs (ChatGPT-4, Gemini 1.0 Advanced, Claude 3 Sonnet, and Le Chat Mistral Large) were initially evaluated on the 2022 and 2023 POSCOMP exams. The assessments measured the models' proficiency in handling complex questions typical of the exam. LLM performance was notably better on text-based questions than on image interpretation tasks. In the 2022 exam, ChatGPT-4 led with 57 correct answers out of 69 questions, followed by Gemini 1.0 Advanced (49), Le Chat Mistral (48), and Claude 3 Sonnet (44). Similar trends were observed in the 2023 exam. ChatGPT-4 achieved the highest performance, surpassing all students who took the POSCOMP 2023 exam. LLMs, particularly ChatGPT-4, show promise in text-based tasks on the POSCOMP exam, although image interpretation remains a challenge. Given the rapid evolution of LLMs, we expanded our analysis to include more recent models (o1, Gemini 2.5 Pro, Claude 3.7 Sonnet, and o3-mini-high), evaluated on the 2022-2024 POSCOMP exams. These newer models demonstrate further improvements and consistently surpass both the average and the top-performing human participants across all three years.
The POSCOMP [1] is a prestigious assessment, organized by the Brazilian Computer Society (SBC), that tests the knowledge of prospective computer science graduate students. It serves as an entry criterion for many graduate programs across Brazil. Using this exam as a benchmark for evaluating Large Language Models (LLMs) allows a direct comparison between AI capabilities and human standards, offering valuable insight into the strengths and limitations of current AI models. Recent advancements in LLMs [2], [3] have significantly expanded the capabilities of Artificial Intelligence (AI), particularly in natural language processing tasks.
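The 2022 scores quoted in the abstract can be turned into accuracy rates with simple arithmetic. This is a minimal sketch that assumes every one of the 69 questions carries equal weight and uses only the correct-answer counts reported above; the paper itself may report other metrics or scoring rules.

```python
# Correct answers reported for the POSCOMP 2022 exam (69 questions, equal weight assumed).
TOTAL_QUESTIONS = 69
correct_2022 = {
    "ChatGPT-4": 57,
    "Gemini 1.0 Advanced": 49,
    "Le Chat Mistral": 48,
    "Claude 3 Sonnet": 44,
}


def accuracy(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Fraction of questions answered correctly."""
    return correct / total


# Rank models from best to worst and print each accuracy as a percentage.
for model, score in sorted(correct_2022.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {score}/{TOTAL_QUESTIONS} = {accuracy(score):.1%}")
```

Run as-is, this lists ChatGPT-4 at 82.6% and Claude 3 Sonnet at 63.8%, so the spread between the best and worst of the four models is roughly 19 percentage points.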

exam, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.5753/jbcs.2025.4493

2505.20338

Country:

Asia > Japan (0.28)
South America > Brazil (0.25)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Assessment & Standards (0.46)
Education > Educational Setting > Higher Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Slimmed Asymmetrical Contrastive Learning and Cross Distillation for Lightweight Model Training

Jian Meng, Li Yang

Neural Information Processing Systems, Nov-18-2025, 22:34:05 GMT

Contrastive learning (CL) has been widely investigated with various learning mechanisms and is highly effective at learning data representations in a self-supervised manner from unlabeled data. A common practice in this line of work is to employ large encoders to achieve performance comparable to the supervised learning counterpart. Despite the success of label-free training, current contrastive learning algorithms fail to achieve good performance with lightweight (compact) models, e.g., MobileNet, while the requirement for heavy encoders impedes energy-efficient computation, especially in resource-constrained AI applications. Motivated by this, we propose a new self-supervised CL scheme, named SACL-XD, consisting of two technical components, Slimmed Asymmetrical Contrastive Learning (SACL) and Cross-Distillation (XD), which collectively enable efficient CL with compact models.
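For readers new to the contrastive objective this line of work builds on, here is a simplified, one-directional InfoNCE-style loss in plain Python. It is a generic illustration, not SACL or XD: the excerpt does not give the paper's loss, and the function names and the temperature value of 0.1 are assumptions chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Average InfoNCE loss (generic sketch, not the SACL-XD objective).

    Anchor i should be most similar to positives[i]; every other positive
    in the batch acts as a negative.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log softmax probability of the matching pair.
        total += log_denom - logits[i]
    return total / n
```

With two views whose embeddings line up, for example anchors `[[1.0, 0.0], [0.0, 1.0]]` and positives `[[0.9, 0.1], [0.1, 0.9]]`, the loss is much lower than when the positives are swapped, which is the signal the encoder is trained to produce.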

artificial intelligence, encoder, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > North Carolina (0.04)
Asia > South Korea (0.04)

Genre: Research Report (0.68)

Industry: Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning

Neural Information Processing Systems, Nov-18-2025, 17:11:21 GMT

Unsupervised pre-training methods utilizing large and diverse datasets have achieved tremendous success across a range of domains.

machine learning, reinforcement learning, world model, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Leisure & Entertainment > Games (0.46)
Education > Educational Technology > Educational Software > Computer Based Training (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Loyola New Orleans student court overturns previous decision barring Turning Point chapter

FOX News, Nov-18-2025, 15:47:24 GMT

A student court at Loyola University New Orleans vacated the Student Government Association's decision to deny Turning Point USA official campus status.

artificial intelligence, lifestyle real estate tech science, social media, (7 more...)

FOX News

Country: North America > United States > Louisiana > Orleans Parish > New Orleans (0.63)

Industry:

Leisure & Entertainment > Sports (1.00)
Government (1.00)
Health & Medicine > Consumer Health (0.75)
(3 more...)

Technology:

Information Technology > Artificial Intelligence (0.99)
Information Technology > Communications > Social Media (0.78)

models (LMs). Given a fixed budget of tokens, we study how to best select data

Neural Information Processing Systems, Nov-18-2025, 08:57:15 GMT

A natural way to unlock particular capabilities is to improve this training data.
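One simple way to frame "select data under a fixed token budget" is as a knapsack problem solved greedily by value per token. The sketch below assumes each document already has a usefulness score from some hypothetical scorer; `Doc`, `score`, and `select_under_budget` are illustrative names, and this greedy heuristic is not the method of the paper, which the excerpt does not describe.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    tokens: int   # must be positive
    score: float  # hypothetical usefulness score from an unspecified scorer


def select_under_budget(docs, budget):
    """Greedy knapsack heuristic: take docs in order of score per token while they fit."""
    chosen, used = [], 0
    for doc in sorted(docs, key=lambda d: d.score / d.tokens, reverse=True):
        if used + doc.tokens <= budget:
            chosen.append(doc)
            used += doc.tokens
    return chosen, used
```

For example, with documents `a` (500 tokens, score 5.0), `b` (200, 3.0), `c` (400, 2.0), and `d` (100, 0.5) and a 700-token budget, the sketch picks `b` then `a` and uses the full budget. Greedy selection is fast but not guaranteed optimal; an exact solver or a learned selection policy could do better.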

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.92)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.93)
(3 more...)

RD-Suite: A Benchmark for Ranking Distillation

Neural Information Processing Systems, Nov-18-2025, 08:23:43 GMT

Moreover, inconsistent benchmarking across a wide range of tasks and datasets makes it difficult to assess or invigorate advances in this field.
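To make "ranking distillation" concrete, the sketch below shows one common listwise formulation: the student is trained to match the teacher's softmax distribution over the documents retrieved for a single query (a ListNet-style cross-entropy). This is a generic illustration under that assumption, not a claim about which methods RD-Suite includes; the function names are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over one query's document scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def listwise_distill_loss(teacher_scores, student_scores, temperature=1.0):
    """Cross-entropy between teacher and student softmax distributions (ListNet-style sketch)."""
    p_teacher = softmax(teacher_scores, temperature)
    p_student = softmax(student_scores, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
```

A student whose scores preserve the teacher's ordering (teacher `[3.0, 1.0, 0.0]`, student `[2.5, 1.2, 0.1]`) gets a lower loss than one that reverses it, and the loss is smallest when the student reproduces the teacher's distribution exactly.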

distillation, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre: Overview (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

What AI doesn't know: we could be creating a global 'knowledge collapse'

Deepak Varuvel Dennison

The Guardian, Nov-18-2025, 05:00:25 GMT

As GenAI becomes the primary way to find information, local and traditional wisdom is being lost. And we are only beginning to realise what we're missing. This article was originally published as 'Holes in the web' on Aeon.co. A few years back, my dad was diagnosed with a tumour on his tongue, which meant we had some choices to weigh up. My family has an interesting dynamic when it comes to medical decisions. While my older sister is a trained doctor in western allopathic medicine, my parents are big believers in traditional remedies. Having grown up in a small town in India, I am accustomed to rituals. My dad had a ritual, too. Every time we visited his home village in southern Tamil Nadu, he'd get a bottle of thick, pungent, herb-infused oil from a vaithiyar, a traditional doctor practising Siddha medicine. It was his way of maintaining his connection with the kind of medicine he had always known and trusted.

large language model, machine learning, natural language, (21 more...)

The Guardian

Country:

Asia > India > Tamil Nadu (0.24)
Asia > India > Karnataka > Bengaluru (0.05)
Asia > India > NCT > Delhi (0.04)
(7 more...)

Industry:

Leisure & Entertainment > Sports (0.68)
Education (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Social Media (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)

Goal-Conditioned On-Policy Reinforcement Learning

Xudong Gong

Neural Information Processing Systems, Nov-18-2025, 01:32:37 GMT

This limitation prevents Hindsight Experience Replay (HER) from densifying the reward.
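HER densifies a sparse goal-conditioned reward by relabeling past transitions with goals that were actually achieved later in the same episode, so failed episodes still yield successful examples. The excerpt does not say which limitation blocks this; note that relabeled transitions describe goals the policy was not pursuing when it acted, which is why HER is usually paired with off-policy learners. The sketch below shows the standard "future" relabeling strategy on a toy episode; the tuple layout, the 0/-1 reward convention, and `k` are assumptions for illustration.

```python
import random


def sparse_reward(achieved, goal):
    # 0 on success, -1 otherwise (a common sparse goal-conditioned convention).
    return 0.0 if achieved == goal else -1.0


def her_relabel(episode, k=2, rng=None):
    """HER "future" relabeling (sketch).

    episode: list of (state, action, achieved_goal_after_step, original_goal).
    Returns (state, action, goal, reward) tuples: each original transition
    plus up to k copies relabeled with goals achieved from this step onward.
    """
    rng = rng or random.Random(0)
    out = []
    for t, (s, a, achieved, goal) in enumerate(episode):
        out.append((s, a, goal, sparse_reward(achieved, goal)))
        future = [step[2] for step in episode[t:]]
        for new_goal in rng.sample(future, min(k, len(future))):
            out.append((s, a, new_goal, sparse_reward(achieved, new_goal)))
    return out
```

On a four-step episode that moves from position 0 to 4 but never reaches goal 10, every original transition has reward -1, while the relabeled buffer contains transitions with reward 0 that a learner can bootstrap from.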

demonstration, gcpo, learning, (14 more...)

Neural Information Processing Systems

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Hunan Province (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

FOX News Media CEO Suzanne Scott participates in fireside chat with University of South Carolina students

FOX News, Nov-18-2025, 00:23:59 GMT

FOX News Media CEO Suzanne Scott participated in a fireside chat on Thursday with University of South Carolina students, discussing media operations and the business of journalism.

artificial intelligence, social media, university, (13 more...)

FOX News

Country: North America > United States > South Carolina (0.63)

Industry:

Media > News (1.00)
Education > Educational Setting > Higher Education (0.87)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (0.71)

Sample Complexity of Agnostic Multiclass Classification: Natarajan Dimension Strikes Back

Cohen, Alon, Erez, Liad, Hanneke, Steve, Koren, Tomer, Mansour, Yishay, Moran, Shay, Zhang, Qian

arXiv.org Machine Learning, Nov-18-2025

The fundamental theorem of statistical learning states that binary PAC learning is governed by a single parameter -- the Vapnik-Chervonenkis (VC) dimension -- which determines both learnability and sample complexity. Extending this to multiclass classification has long been challenging, since Natarajan's work in the late 80s proposing the Natarajan dimension (Nat) as a natural analogue of VC. Daniely and Shalev-Shwartz (2014) introduced the DS dimension, later shown by Brukhim et al. (2022) to characterize multiclass learnability. Brukhim et al. also showed that Nat and DS can diverge arbitrarily, suggesting that multiclass learning is governed by DS rather than Nat. We show that agnostic multiclass PAC sample complexity is in fact governed by two distinct dimensions. Specifically, we prove nearly tight agnostic sample complexity bounds that, up to log factors, take the form $\frac{\mathrm{DS}^{1.5}}{\varepsilon} + \frac{\mathrm{Nat}}{\varepsilon^2}$ where $\varepsilon$ is the excess risk. This bound is tight up to a $\sqrt{\mathrm{DS}}$ factor in the first term, nearly matching known $\mathrm{Nat}/\varepsilon^2$ and $\mathrm{DS}/\varepsilon$ lower bounds. The first term reflects the DS-controlled regime, while the second shows that the Natarajan dimension still dictates asymptotic behavior for small $\varepsilon$. Thus, unlike binary or online classification -- where a single dimension (VC or Littlestone) controls both phenomena -- multiclass learning inherently involves two structural parameters. Our technical approach departs from traditional agnostic learning methods based on uniform convergence or reductions to realizable cases. A key ingredient is a novel online procedure based on a self-adaptive multiplicative-weights algorithm performing a label-space reduction, which may be of independent interest.
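The two regimes in the bound can be located by setting the terms equal: $\mathrm{DS}^{1.5}/\varepsilon = \mathrm{Nat}/\varepsilon^2$ gives the crossover $\varepsilon^* = \mathrm{Nat}/\mathrm{DS}^{1.5}$, below which the Natarajan term dominates. The sketch below evaluates only the leading terms, dropping the constants and log factors the abstract omits, so the numbers indicate scaling rather than actual sample counts; the function names are illustrative.

```python
def agnostic_bound(ds, nat, eps):
    """Leading terms DS^1.5/eps + Nat/eps^2 (constants and log factors dropped)."""
    return ds ** 1.5 / eps + nat / eps ** 2


def crossover_eps(ds, nat):
    """Excess risk where DS^1.5/eps == Nat/eps^2, i.e. eps* = Nat / DS^1.5."""
    return nat / ds ** 1.5


def dominant_term(ds, nat, eps):
    """Which term of the bound is larger at this excess risk."""
    return "DS" if ds ** 1.5 / eps >= nat / eps ** 2 else "Nat"
```

For example, with DS = 100 and Nat = 10 the crossover is at $\varepsilon^* = 0.01$: at $\varepsilon = 0.1$ the DS term (about 10,000) outweighs the Nat term (about 1,000), while at $\varepsilon = 0.001$ the Nat term (about $10^7$) dominates the DS term (about $10^6$).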

artificial intelligence, dimension, machine learning, (17 more...)

arXiv.org Machine Learning

2511.12659

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > Online (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
