
factorial


TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code

Sternfeld, Alexander, Kucharavy, Andrei, Dolamic, Ljiljana

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable proficiency in code generation tasks across various programming languages. However, their outputs often contain subtle but critical vulnerabilities, posing significant risks when deployed in security-sensitive or mission-critical systems. This paper introduces TypePilot, an agentic AI framework designed to enhance the security and robustness of LLM-generated code by leveraging strongly typed and verifiable languages, using Scala as a representative example. We evaluate the effectiveness of our approach in two settings: formal verification with the Stainless framework and general-purpose secure code generation. Our experiments with leading open-source LLMs reveal that direct code generation often fails to enforce safety constraints, as does naive prompting for more secure code, whereas our type-focused agentic pipeline substantially mitigates input validation and injection vulnerabilities. The results demonstrate the potential of structured, type-guided LLM workflows to advance the state of the art in the trustworthiness of automated code generation in high-assurance domains.
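The type-guided idea of enforcing input validation through types can be illustrated with a "smart constructor": a type whose values can only exist once validation has passed, so downstream functions that demand that type cannot receive raw, unchecked input. The sketch below is written in Python rather than Scala and is not TypePilot's pipeline; the `SafeUserName` type, its whitelist pattern, and `home_directory` are hypothetical. Python does not enforce these annotations statically, so this only approximates the compile-time guarantee a Scala type would give.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical whitelist: 1 to 32 ASCII letters, digits, or underscores.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]{1,32}")


@dataclass(frozen=True)
class SafeUserName:
    """A user name that can only be constructed after passing validation."""

    value: str

    def __post_init__(self) -> None:
        # Reject anything outside the whitelist before the object exists.
        if not _SAFE_NAME.fullmatch(self.value):
            raise ValueError(f"invalid user name: {self.value!r}")

    @classmethod
    def parse(cls, raw: str) -> Optional[SafeUserName]:
        """Option-style constructor, mirroring Scala's Option: None on bad input."""
        try:
            return cls(raw)
        except ValueError:
            return None


def home_directory(name: SafeUserName) -> str:
    # Accepts only the validated type, so path-traversal strings such as
    # "../etc" cannot reach this point through well-typed callers.
    return f"/home/{name.value}"
```

For example, `SafeUserName.parse("../etc/passwd")` returns `None`, while `home_directory(SafeUserName("bob"))` yields `"/home/bob"`.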


Enhancing Mathematical Reasoning in LLMs with Background Operators

Chen, Jiajun, Tam, Yik-Cheung

arXiv.org Artificial Intelligence

We propose utilizing background operators for mathematical reasoning in large language models (LLMs). To achieve this, we define a set of fundamental mathematical predicates as the basic building blocks. For each mathematical problem, we develop a Prolog solution that includes problem-specific predicates and intermediate predicates derived from these background operators, ensuring that each solution adheres to the defined operator set. We introduce the MATH-Prolog corpus, which is derived from the counting and probability categories of the MATH corpus. For efficient data augmentation, we apply K-fold cross-validated self-training. This method incrementally generates new Prolog solutions for each fold, incorporating those verified as correct into the training set throughout the model training process. Our experimental results demonstrate that 5-fold cross-validated self-training effectively identifies new, accurate Prolog solutions, achieving an accuracy of 84.6% on the cross-validated set and 84.8% on the test set when fine-tuning the Meta-Llama-3.1-8B-Instruct model. This approach successfully uncovers new solutions with fully computable inference steps for previously unseen problems. Additionally, incorporating the background mathematical predicates into the prompt enhances solution coverage.
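To make the idea of composing a solution from a fixed operator set concrete, here is a minimal sketch in Python standing in for Prolog. The operator names (`permutations_count`, `combinations_count`, `probability`) and the committee problem are hypothetical illustrations, not predicates from the MATH-Prolog corpus; the point is only that the problem-specific function calls nothing but the shared background operators, so every step stays exactly computable.

```python
from fractions import Fraction
from math import factorial


# Hypothetical background operators (Python stand-ins for Prolog predicates).
def permutations_count(n: int, k: int) -> int:
    """Ordered selections of k items from n: n! / (n - k)!."""
    return factorial(n) // factorial(n - k)


def combinations_count(n: int, k: int) -> int:
    """Unordered selections of k items from n: n! / (k! (n - k)!)."""
    return permutations_count(n, k) // factorial(k)


def probability(favorable: int, total: int) -> Fraction:
    """Exact probability as a rational number."""
    return Fraction(favorable, total)


# Problem-specific solution built only from the operators above:
# "A committee of 3 is drawn from 5 people, one of whom is Ana.
#  What is the probability that Ana is on the committee?"
def solve_committee() -> Fraction:
    total = combinations_count(5, 3)     # all possible committees
    with_ana = combinations_count(4, 2)  # fix Ana, choose 2 of the other 4
    return probability(with_ana, total)
```

Here `solve_committee()` evaluates to `Fraction(3, 5)`, since 6 of the 10 possible committees include Ana.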


LokiLM: Technical Report

Kiefel, Justin, Shah, Shrey

arXiv.org Artificial Intelligence

In this work, we introduce LokiLM, a 1.4B-parameter large language model trained on 500B tokens. Our model performs strongly on natural language reasoning tasks and achieves state-of-the-art performance among models with 1.5B parameters or fewer. LokiLM is trained using multi-teacher knowledge distillation and high-quality training data to achieve benchmark results competitive with larger models trained on significantly more tokens. We support these findings by introducing steps to avoid benchmark contamination and overfitting throughout our development process. Despite its promising performance, LokiLM hallucinates at a concerning rate and scores poorly on the TruthfulQA benchmark, so we do not release the model publicly.
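The abstract does not specify how the teachers are combined, so the following is only one common formulation of multi-teacher distillation, assumed for illustration: average the teachers' temperature-softened distributions and minimize the KL divergence from that average to the student's softened distribution, scaled by T² as in standard knowledge distillation. It operates on a single token position; the function names and the default temperature of 2.0 are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(
    student_logits: Sequence[float],
    teacher_logits_list: Sequence[Sequence[float]],
    temperature: float = 2.0,
) -> float:
    """KL(avg teacher || student) at temperature T, scaled by T^2.

    Assumption: teachers are weighted equally; real systems may weight them.
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n_teachers = len(teacher_probs)
    # Equal-weight average of the teachers' soft targets, per vocabulary entry.
    target = [
        sum(p[i] for p in teacher_probs) / n_teachers
        for i in range(len(student_logits))
    ]
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)
    return kl * temperature ** 2
```

The loss is zero when the student's softened distribution already matches the teachers' average and positive otherwise; in practice it is usually mixed with the ordinary next-token cross-entropy loss.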


Improving Socratic Question Generation using Data Augmentation and Preference Optimization

Kumar, Nischal Ashok, Lan, Andrew

arXiv.org Artificial Intelligence

The Socratic method is a way of guiding students toward solving a problem independently without directly revealing the solution. Although this method has been shown to significantly improve student learning outcomes, it remains a complex, labor-intensive task for instructors. Large language models (LLMs) can be used to augment human effort by automatically generating Socratic questions for students. However, existing methods that prompt these LLMs sometimes produce invalid outputs, e.g., questions that directly reveal the solution to the problem or that are irrelevant or premature. To alleviate this problem, inspired by reinforcement learning with AI feedback (RLAIF), we first propose a data augmentation method to enrich existing Socratic questioning datasets with questions that are invalid in specific ways. Next, we propose a method to optimize open-source LLMs such as Llama 2 to prefer ground-truth questions over generated invalid ones, using direct preference optimization (DPO). Our experiments on a Socratic questions dataset for student code debugging show that a DPO-optimized 7B Llama 2 model can effectively avoid generating invalid questions and, as a result, outperforms existing state-of-the-art prompting methods.
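For reference, the standard DPO objective for one preference pair is −log σ(β[(log π(y_w) − log π_ref(y_w)) − (log π(y_l) − log π_ref(y_l))]), where y_w is the preferred response (here, a ground-truth Socratic question) and y_l the rejected one (a generated invalid question). The sketch below computes it from summed sequence log-probabilities; the function name and β = 0.1 are assumptions for illustration, not the paper's settings.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed token log-probability of a full response under
    the trainable policy or the frozen reference model.
    """
    # How much more (or less) likely the policy makes each response
    # compared with the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow in exp.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy and reference agree on both responses the loss is log 2; it falls below log 2 as the policy shifts probability toward the ground-truth question and away from the invalid one.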


Can Language Models Employ the Socratic Method? Experiments with Code Debugging

Al-Hossami, Erfan, Bunescu, Razvan, Smith, Justin, Teehan, Ryan

arXiv.org Artificial Intelligence

When employing the Socratic method of teaching, instructors guide students toward solving a problem on their own rather than providing the solution directly. While this strategy can substantially improve learning outcomes, it is usually time-consuming and cognitively demanding. Automated Socratic conversational agents can augment human instruction and provide the necessary scale; however, their development is hampered by the lack of suitable data for training and evaluation. In this paper, we introduce a manually created dataset of multi-turn Socratic advice aimed at helping a novice programmer fix buggy solutions to simple computational problems. The dataset is then used to benchmark the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and chain-of-thought prompting of the much larger GPT-4. The code and datasets are freely available for research at https://github.com/taisazero/socratic-debugging-benchmark


Morton-Style Factorial Coding of Color in Primary Visual Cortex

Neural Information Processing Systems

We introduce the notion of Morton-style factorial coding and illustrate how it may help us understand information integration and perceptual coding in the brain. We show that by focusing on average responses one may miss the existence of factorial coding mechanisms that only become apparent when analyzing spike count histograms. We show evidence suggesting that the classical/non-classical receptive field organization in the cortex effectively enforces the development of Morton-style factorial codes. This may provide some cues to help understand perceptual coding in the brain and to develop new unsupervised learning algorithms. While methods like ICA (Bell & Sejnowski, 1997) develop independent codes, in Morton-style coding the goal is to make two or more external aspects of the world become independent when conditioning on internal representations.
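The defining property in the last sentence, two external aspects A and B becoming independent once we condition on the internal representation R, is the identity P(a, b | r) = P(a | r) P(b | r) for every r. The sketch below checks that identity on a discrete joint distribution; it is only the probability statement, not the paper's spike-count analysis, and the function name and toy distributions are hypothetical (A and B might be two color attributes, R a discretized neural response).

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable, Hashable], float]


def conditionally_independent(joint: Joint, tol: float = 1e-9) -> bool:
    """Return True if P(a, b | r) == P(a | r) * P(b | r) for every internal state r.

    `joint` maps (a, b, r) -> P(a, b, r): a and b are two external aspects of
    the stimulus, r is the internal representation.
    """
    p_r: Dict[Hashable, float] = defaultdict(float)
    p_ar: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    p_br: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)
    # Marginalize the joint table into P(r), P(a, r), and P(b, r).
    for (a, b, r), p in joint.items():
        p_r[r] += p
        p_ar[(a, r)] += p
        p_br[(b, r)] += p
    a_values = {a for a, _, _ in joint}
    b_values = {b for _, b, _ in joint}
    for r, pr in p_r.items():
        if pr == 0:
            continue  # conditioning on an impossible state is undefined
        for a, b in product(a_values, b_values):
            lhs = joint.get((a, b, r), 0.0) / pr
            rhs = (p_ar[(a, r)] / pr) * (p_br[(b, r)] / pr)
            if abs(lhs - rhs) > tol:
                return False
    return True
```

A joint distribution in which A and B always take the same value for a given R fails the check, while one built as P(r) P(a | r) P(b | r) passes it.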


Towards Hexapod Gait Adaptation using Enumerative Encoding of Gaits: Gradient-Free Heuristics

Parque, Victor

arXiv.org Artificial Intelligence

The quest for the efficient adaptation of multilegged robotic systems to changing conditions is expected to yield new insights into robotic control and locomotion. In this paper, we study the performance frontiers of the enumerative (factorial) encoding of hexapod gaits for fast recovery from leg failures. Our computational studies using five nature-inspired gradient-free optimization heuristics show that it is possible to find feasible recovery gait strategies that achieve minimal deviation from desired locomotion directives with only a few evaluations (trials). For instance, it is possible to generate viable recovery gait strategies reaching 2.5 cm. Our results show the potential to enable efficient adaptation to new conditions and to further explore canonical representations for adaptation in robotic locomotion problems.
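One standard way to enumerate orderings "factorially" is the factorial number system (Lehmer code), which maps each integer in [0, n!) to a unique permutation of n items and back. The sketch below assumes, purely for illustration, that a gait is represented as the order in which the six legs are lifted, giving 6! = 720 candidate gaits that a gradient-free heuristic could search by integer index; the paper's actual encoding may differ, and both function names are hypothetical.

```python
from math import factorial
from typing import List


def index_to_permutation(index: int, n: int) -> List[int]:
    """Decode an integer in [0, n!) into a permutation of range(n)."""
    if not 0 <= index < factorial(n):
        raise ValueError("index out of range")
    remaining = list(range(n))
    perm: List[int] = []
    for i in range(n - 1, -1, -1):
        # Each factorial-base digit selects one of the still-unused items.
        digit, index = divmod(index, factorial(i))
        perm.append(remaining.pop(digit))
    return perm


def permutation_to_index(perm: List[int]) -> int:
    """Inverse of index_to_permutation for a permutation of range(len(perm))."""
    remaining = sorted(perm)
    n = len(perm)
    index = 0
    for i, item in enumerate(perm):
        pos = remaining.index(item)
        index += pos * factorial(n - 1 - i)
        remaining.pop(pos)
    return index
```

Under this assumption, index 0 decodes to the leg order `[0, 1, 2, 3, 4, 5]` and index 719 to `[5, 4, 3, 2, 1, 0]`. A compact integer encoding like this lets an optimizer treat gait search as a search over a single bounded integer.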


Multinomial Naïve Bayes for Document Classification and Natural Language Processing (NLP)

#artificialintelligence

Naïve Bayes comprises several methods, widely used as an alternative to distance-based K-Means clustering and decision-tree forests, that treat probability as the "likelihood" that data belongs to a specific class. Gaussian and multinomial variants of naïve Bayes exist. The multinomial model can classify data that is not naturally represented by continuous numeric values, such as text described by word counts. Its main advantage is its significantly reduced computational complexity. It can perform classification using small training sets and does not need to be continuously re-trained.
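The following is a minimal from-scratch sketch of a multinomial naïve Bayes text classifier, showing why it is cheap: training is a single counting pass over the documents, and prediction adds up log-probabilities. The class name, the additive (Laplace) smoothing with `alpha = 1.0`, and the toy spam/ham data are assumptions for illustration rather than details from the text above.

```python
import math
from collections import Counter
from typing import Dict, List


class MultinomialNaiveBayes:
    """Bag-of-words multinomial naive Bayes with additive smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing so unseen (word, class) pairs keep p > 0

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        class_counts = Counter(labels)
        # Log prior: fraction of training documents in each class.
        self.log_prior: Dict[str, float] = {
            c: math.log(class_counts[c] / len(labels)) for c in self.classes
        }
        # Word counts per class: the only statistics the model needs.
        word_counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            word_counts[c].update(doc)
        v = len(self.vocab)
        self.log_likelihood: Dict[str, Dict[str, float]] = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / (total + self.alpha * v))
                for w in self.vocab
            }
        return self

    def predict(self, doc: List[str]) -> str:
        def score(c: str) -> float:
            # Words never seen during training are ignored.
            return self.log_prior[c] + sum(
                self.log_likelihood[c][w] for w in doc if w in self.vocab
            )

        return max(self.classes, key=score)
```

Because the model is just per-class word counts, new labeled documents can be folded in by updating those counts and recomputing the log-probabilities, which is consistent with the point that it need not be continuously re-trained from scratch.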