
ReST-MCTS: LLM Self-Training via Process Reward Guided Tree Search (Dan Zhang)

Neural Information Processing Systems

Recent methodologies in LLM self-training mostly rely on the LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning).
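As a rough sketch of the answer-filtering self-training loop this abstract critiques (helper names such as extract_final_answer and model.generate are hypothetical, not from the paper):

```python
# Minimal sketch of answer-filtered self-training (the baseline the abstract
# critiques): keep any sampled solution whose final answer matches the
# reference, regardless of whether the intermediate reasoning is sound.
# `model.generate` and `extract_final_answer` are hypothetical helpers.

def build_self_training_set(model, problems, samples_per_problem=8):
    dataset = []
    for problem in problems:
        for _ in range(samples_per_problem):
            solution = model.generate(problem["question"])      # full reasoning trace
            answer = extract_final_answer(solution)             # parse the final answer
            if answer == problem["reference_answer"]:           # outcome-only filter
                dataset.append({"prompt": problem["question"],
                                "completion": solution})
    return dataset  # may still contain flawed intermediate steps
```

The filter checks only the final answer, which is exactly why such a set can contain incorrect plans or intermediate reasoning.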


RustEvo^2: An Evolving Benchmark for API Evolution in LLM-based Rust Code Generation

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have become pivotal tools for automating code generation in software development. However, these models face significant challenges in producing version-aware code for rapidly evolving languages like Rust, where frequent Application Programming Interfaces (API) changes across versions lead to compatibility issues and correctness errors. Existing benchmarks lack systematic evaluation of how models navigate API transitions, relying on labor-intensive manual curation and offering limited version-specific insights. To address this gap, we present RustEvo, a novel framework for constructing dynamic benchmarks that evaluate the ability of LLMs to adapt to evolving Rust APIs. RustEvo automates dataset creation by synthesizing 588 API changes (380 from Rust standard libraries, 208 from 15 third-party crates) into programming tasks mirroring real-world challenges. These tasks cover four API evolution categories: Stabilizations, Signature Changes, Behavioral Changes, and Deprecations, reflecting their actual distribution in the Rust ecosystem. Experiments on state-of-the-art (SOTA) LLMs reveal significant performance variations: models achieve a 65.8% average success rate on stabilized APIs but only 38.0% on behavioral changes, highlighting difficulties in detecting semantic shifts without signature alterations. Knowledge cutoff dates strongly influence performance, with models scoring 56.1% on before-cutoff APIs versus 32.5% on after-cutoff tasks. Retrieval-Augmented Generation (RAG) mitigates this gap, improving success rates by 13.5% on average for APIs released after model training. Our findings underscore the necessity of our evolution-aware benchmarks to advance the adaptability of LLMs in fast-paced software ecosystems. The framework and the benchmarks are publicly released at https://github.com/SYSUSELab/RustEvo.
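A hedged sketch of how a RustEvo-style task record and per-category success rates might be organized; the field names are illustrative assumptions, not the released schema:

```python
# Illustrative task record and per-category pass-rate computation for an
# API-evolution benchmark; field names are assumptions, not RustEvo's schema.
from collections import defaultdict

CATEGORIES = ("Stabilization", "Signature Change", "Behavioral Change", "Deprecation")

task = {
    "crate": "std",                      # or one of the 15 third-party crates
    "api": "std::io::read_to_string",
    "change_category": "Stabilization",  # one of CATEGORIES
    "introduced_in": "1.65.0",           # Rust version in which the change landed
    "prompt": "...",                     # programming task mirroring the change
    "tests": ["..."],                    # tests compiled against the new version
}

def success_rate_by_category(results):
    """results: iterable of (change_category, passed: bool) pairs."""
    totals, passed = defaultdict(int), defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        passed[category] += int(ok)
    return {c: passed[c] / totals[c] for c in totals}
```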


Supplementary material - ABCFair: an Adaptable Benchmark approach for Comparing Fairness Methods

Neural Information Processing Systems

The SchoolPerformance dataset was created by Lenders and Calders [4]. We used the sex and the education of the student's parents as the sensitive attributes for this dataset. We removed all features that are other expressions of the labels. Note that this is the only folktables dataset on which we report results in the main paper. Sex, age, and race are used as sensitive features for this dataset.
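A minimal sketch of the preprocessing step described, with hypothetical column names (the supplementary material does not provide this code):

```python
# Hypothetical preprocessing sketch: drop features that are re-expressions of
# the label and separate out the sensitive attributes. Column names are
# assumptions for illustration only.
import pandas as pd

def prepare_school_performance(df: pd.DataFrame):
    label_proxies = ["grade_period1", "grade_period2"]   # assumed re-expressions of the final grade
    label = df["final_grade"]
    sensitive = df[["sex", "parent_education"]]          # sensitive attributes for this dataset
    features = df.drop(columns=label_proxies + ["final_grade"])
    return features, sensitive, label
```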


FORTALESA: Fault-Tolerant Reconfigurable Systolic Array for DNN Inference

arXiv.org Artificial Intelligence

The emergence of Deep Neural Networks (DNNs) in mission- and safety-critical applications brings their reliability to the front. High performance demands of DNNs require the use of specialized hardware accelerators. Systolic array architecture is widely used in DNN accelerators due to its parallelism and regular structure. This work presents a run-time reconfigurable systolic array architecture with three execution modes and four implementation options. All four implementations are evaluated in terms of resource utilization, throughput, and fault tolerance improvement. The proposed architecture is used for reliability enhancement of DNN inference on systolic array through heterogeneous mapping of different network layers to different execution modes. The approach is supported by a novel reliability assessment method based on fault propagation analysis. It is used for the exploration of the appropriate execution mode-layer mapping for DNN inference. The proposed architecture efficiently protects registers and MAC units of systolic array PEs from transient and permanent faults. The reconfigurability feature enables a speedup of up to 3×, depending on layer vulnerability. Furthermore, it requires 6× less resources compared to static redundancy and 2.5× less resources compared to the previously proposed solution for transient faults.
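A hedged sketch of the layer-to-execution-mode mapping idea: more vulnerable layers get more redundancy and less speedup. Mode names and vulnerability thresholds are assumptions for illustration, not the paper's exact configuration:

```python
# Sketch of vulnerability-driven mode selection for a reconfigurable systolic
# array. Mode names, speedups, and thresholds are illustrative assumptions.

MODES = {
    "high_throughput": {"redundancy": 1, "speedup": 3.0},  # no replication, fastest
    "dual_modular":    {"redundancy": 2, "speedup": 1.5},  # detects faults
    "triple_modular":  {"redundancy": 3, "speedup": 1.0},  # masks faults
}

def map_layers_to_modes(layer_vulnerabilities, low=0.2, high=0.6):
    """layer_vulnerabilities: dict layer_name -> vulnerability score in [0, 1],
    e.g., obtained from a fault-propagation analysis."""
    mapping = {}
    for layer, v in layer_vulnerabilities.items():
        if v < low:
            mapping[layer] = "high_throughput"
        elif v < high:
            mapping[layer] = "dual_modular"
        else:
            mapping[layer] = "triple_modular"
    return mapping
```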


Mamba base PKD for efficient knowledge compression

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) have remarkably succeeded in various image processing tasks. However, their large size and computational complexity present significant challenges for deploying them in resource-constrained environments. This paper presents an approach for integrating the Mamba architecture within a Progressive Knowledge Distillation (PKD) process to address the challenge of reducing model complexity while maintaining accuracy in image classification tasks. The proposed framework distills a large teacher model into progressively smaller student models designed using Mamba blocks. Each student model is trained using Selective State-Space Models (S-SSM) within the Mamba blocks, focusing on important input aspects while reducing computational complexity. Preliminary experiments use MNIST and CIFAR-10 to demonstrate the effectiveness of this approach. For MNIST, the teacher model achieves 98% accuracy. A group of seven student models together retained 63% of the teacher's FLOPs while approximating the teacher's performance at 98% accuracy, and the weakest student used only 1% of the teacher's FLOPs while maintaining 72% accuracy. Similarly, for CIFAR-10, the students achieved 1% less accuracy than the teacher, with the smallest student retaining 5% of the teacher's FLOPs to reach 50% accuracy. These results confirm the flexibility and scalability of the Mamba architecture, which can be integrated into PKD to produce student models that act as weak learners. The framework provides a solution for deploying complex neural networks in real-time applications at reduced computational cost.
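A generic progressive knowledge-distillation loop, given as a sketch of the scheme described rather than the paper's exact recipe; the student models (e.g., built from Mamba blocks) are passed in as ordinary PyTorch modules:

```python
# Generic progressive KD sketch: each successively smaller student is distilled
# from the previously trained model, compressing knowledge in stages.
import torch
import torch.nn.functional as F

def distill(teacher, student, loader, epochs=5, T=4.0, alpha=0.7, lr=1e-3, device="cpu"):
    teacher.eval().to(device)
    student.train().to(device)
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_logits = teacher(x)
            s_logits = student(x)
            # soft-target KL term plus hard-label cross-entropy
            kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                          F.softmax(t_logits / T, dim=1),
                          reduction="batchmean") * T * T
            ce = F.cross_entropy(s_logits, y)
            loss = alpha * kd + (1 - alpha) * ce
            opt.zero_grad(); loss.backward(); opt.step()
    return student

def progressive_distillation(teacher, students, loader):
    """`students` is ordered from largest to smallest (e.g., Mamba-block models);
    each one becomes the teacher for the next."""
    current_teacher = teacher
    for student in students:
        current_teacher = distill(current_teacher, student, loader)
    return students
```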


On the Usefulness of the Fit-on-the-Test View on Evaluating Calibration of Classifiers

arXiv.org Artificial Intelligence

Every uncalibrated classifier has a corresponding true calibration map that calibrates its confidence. Deviations of this idealistic map from the identity map reveal miscalibration. Such calibration errors can be reduced with many post-hoc calibration methods which fit some family of calibration maps on a validation dataset. In contrast, evaluation of calibration with the expected calibration error (ECE) on the test set does not explicitly involve fitting. However, as we demonstrate, ECE can still be viewed as if fitting a family of functions on the test data. This motivates the fit-on-the-test view on evaluation: first, approximate a calibration map on the test data, and second, quantify its distance from the identity. Exploiting this view allows us to unlock missed opportunities: (1) use the plethora of post-hoc calibration methods for evaluating calibration; (2) tune the number of bins in ECE with cross-validation. Furthermore, we introduce: (3) benchmarking on pseudo-real data where the true calibration map can be estimated very precisely; and (4) novel calibration and evaluation methods using new calibration map families PL and PL3.
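A small sketch of the fit-on-the-test reading of ECE: fit a histogram-binning calibration map on the test set, then measure its distance from the identity map (plain NumPy, not the authors' code):

```python
# Equal-width binning ECE, written to make the fit-on-the-test view explicit:
# the per-bin accuracy is the fitted calibration map, the per-bin mean
# confidence is the identity map, and ECE is their weighted L1 distance.
import numpy as np

def ece_as_fit_on_test(confidences, correct, n_bins=15):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1]), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            fitted = correct[mask].mean()        # fitted calibration map in this bin
            identity = confidences[mask].mean()  # identity map (mean confidence) in this bin
            ece += mask.mean() * abs(fitted - identity)
    return ece
```

Under this view, swapping the binning map for another calibration-map family, or choosing n_bins by cross-validation, are the natural extensions the abstract points to.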


Quantum Machine Learning in Precision Medicine and Drug Discovery -- A Game Changer for Tailored Treatments?

arXiv.org Artificial Intelligence

The digitization of healthcare presents numerous challenges, including the complexity of biological systems, vast data generation, and the need for personalized treatment plans. Traditional computational methods often fall short, leading to delayed and sometimes ineffective diagnoses and treatments. Quantum Computing (QC) and Quantum Machine Learning (QML) offer transformative advancements with the potential to revolutionize medicine. This paper summarizes areas where QC promises unprecedented computational power, enabling faster, more accurate diagnostics, personalized treatments, and enhanced drug discovery processes. However, integrating quantum technologies into precision medicine also presents challenges, including errors in algorithms and high costs. We show that mathematically-based techniques for specifying, developing, and verifying software (formal methods) can enhance the reliability and correctness of QC. By providing a rigorous mathematical framework, formal methods help to specify, develop, and verify systems with high precision. In genomic data analysis, formal specification languages can (1) precisely define the behavior and properties of quantum algorithms designed to identify genetic markers associated with diseases. Model checking tools can (2) systematically explore all possible states of the algorithm to ensure it behaves correctly under all conditions, while theorem proving techniques (3) provide mathematical proof that the algorithm meets its specified properties, ensuring accuracy and reliability. Additionally, formal optimization techniques can (4) enhance the efficiency and performance of quantum algorithms by reducing resource usage, such as the number of qubits and gate operations. Therefore, we posit that formal methods can significantly contribute to enabling QC to realize its full potential as a game changer in precision medicine.


Haunted House: A text-based game for comparing the flexibility of mental models in humans and LLMs

arXiv.org Artificial Intelligence

The advent of transformer-based large language models (LLMs) has reignited the philosophical debate of human significance - a question that has persisted for millennia. Aristotle thought the function of humans was to live according to the rational principle, which was something that distinguished us from other animals (Aristotle, 2014). Back then, this might have seemed like a reasonable conclusion, as humans use complex language and abstract thinking to a degree that other animals simply do not. However, recent advancements in artificial intelligence (AI) are shining light on the possibility that in the future we might be living in a world in which our creation is more intelligent than us - or perhaps that this world is already here. In many benchmarks comparing humans and AI, LLMs have shown a trend of rapid increase in performance. In SimpleBench, which measures common sense reasoning and social intelligence, GPT-4o scored only 17.8% and o1-preview 41.7% (Philip & Hemang, 2024).


Cognitive Paradigms for Evaluating VLMs on Visual Reasoning Task

arXiv.org Artificial Intelligence

Advancing machine visual reasoning requires a deeper understanding of how Vision-Language Models (VLMs) process and interpret complex visual patterns. This work introduces a novel, cognitively-inspired evaluation framework to systematically analyze VLM reasoning on natural image-based Bongard Problems. We propose three structured paradigms -- Direct Visual Rule Learning, Deductive Rule Learning, and Componential Analysis -- designed to progressively enforce step-wise reasoning and disentangle the interplay between perception and reasoning. Our evaluation shows that advanced, closed-source VLMs (GPT-4o and Gemini 2.0) achieve near-superhuman performance, particularly when provided with high-quality image descriptions, while open-source models exhibit a significant performance bottleneck due to deficiencies in perception. An ablation study further confirms that perception, rather than reasoning, is the primary limiting factor, as open-source models apply extracted rules effectively when given accurate descriptions. These findings underscore the critical role of robust multimodal perception in enhancing generalizable visual reasoning and highlight the importance of structured, step-wise reasoning paradigms for advancing machine intelligence.
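A hedged sketch of the describe-then-reason idea behind the Componential Analysis paradigm, separating perception from rule induction; vlm_describe and llm_reason are hypothetical model wrappers, not the paper's API:

```python
# Two-stage sketch for a natural-image Bongard problem: first obtain textual
# descriptions of the images (perception), then induce the separating rule and
# classify the query from text alone (reasoning).

def componential_analysis(vlm_describe, llm_reason, left_images, right_images, query_image):
    left_desc = [vlm_describe(img) for img in left_images]
    right_desc = [vlm_describe(img) for img in right_images]
    query_desc = vlm_describe(query_image)
    prompt = (
        "Left-side image descriptions:\n- " + "\n- ".join(left_desc) + "\n"
        "Right-side image descriptions:\n- " + "\n- ".join(right_desc) + "\n"
        "State the rule separating the two sides, then assign the query image.\n"
        f"Query: {query_desc}\nAnswer with 'left' or 'right' and the rule."
    )
    return llm_reason(prompt)
```

Comparing this pipeline against direct image-only prompting is one way to probe whether perception or reasoning is the bottleneck, which is the ablation the abstract describes.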


Cross-platform Learning-based Fault Tolerant Surfacing Controller for Underwater Robots

arXiv.org Artificial Intelligence

In this paper, we propose a novel cross-platform fault-tolerant surfacing controller for underwater robots, based on reinforcement learning (RL). Unlike conventional approaches, which require explicit identification of malfunctioning actuators, our method allows the robot to surface using only the remaining operational actuators without needing to pinpoint the failures. The proposed controller learns a robust policy capable of handling diverse failure scenarios across different actuator configurations. Moreover, we introduce a transfer learning mechanism that shares a part of the control policy across various underwater robots with different actuators, thus improving learning efficiency and generalization across platforms. To validate our approach, we conduct simulations on three different types of underwater robots: a hovering-type AUV, a torpedo-shaped AUV, and a turtle-shaped robot (U-CAT). Additionally, real-world experiments are performed, successfully transferring the learned policy from simulation to a physical U-CAT in a controlled environment. Our RL-based controller demonstrates superior performance in terms of stability and success rate compared to a baseline controller, achieving an 85.7 percent success rate in real-world tests compared to 57.1 percent with the baseline. This research provides a scalable and efficient solution for fault-tolerant control for diverse underwater platforms, with potential applications in real-world aquatic missions.
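An illustrative sketch (not the authors' code) of the training idea: randomly disable actuators each episode so the learned policy surfaces the robot without any explicit fault identification. The Gymnasium-style environment and its actuator_mask reset option are placeholders:

```python
# Sketch of failure-randomized rollouts for a fault-tolerant surfacing policy.
# `UnderwaterRobotEnv` and the `actuator_mask` reset option are assumed
# placeholders; the policy is any callable mapping observations to actions.
import numpy as np

def sample_failure_mask(n_actuators, max_failures=2, rng=np.random.default_rng()):
    n_failed = rng.integers(0, max_failures + 1)
    mask = np.ones(n_actuators)
    mask[rng.choice(n_actuators, size=n_failed, replace=False)] = 0.0
    return mask  # 0 = failed actuator, 1 = operational

def rollout(env, policy, n_actuators):
    obs, _ = env.reset(options={"actuator_mask": sample_failure_mask(n_actuators)})
    done, total_reward = False, 0.0
    while not done:
        action = policy(obs)                      # e.g., output of the RL policy network
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward
```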