Goto

Collaborating Authors

 type system


Hey Pentti, We Did It Again!: Differentiable vector-symbolic types that prove polynomial termination

Tomkins-Flanagan, Eilene, Hanley, Connor, Kelly, Mary A.

arXiv.org Artificial Intelligence

We present a typed computer language, Doug, in which all typed programs may be proved to halt in polynomial time, encoded in a vector-symbolic architecture (VSA). Doug is just an encoding of the light linear functional programming language (LLFPL) described by (Schimanski2009, ch. 7). The types of Doug are encoded using a slot-value encoding scheme based on holographic declarative memory (HDM; Kelly, 2020). The terms of Doug are encoded using a variant of the Lisp VSA defined by (Flanagan, 2024). Doug allows for some points on the embedding space of a neural network to be interpreted as types, where the types of nearby points are similar both in structure and content. Types in Doug are therefore learnable by a neural network. Following (Chollet, 2019), (Card, 1983), and (Newell, 1981), we view skill as the application of a procedure, or program of action, that causes a goal to be satisfied. Skill acquisition may therefore be expressed as program synthesis. Using Doug, we hope to describe a form of learning of skilled behaviour that follows a human-like pace of skill acquisition (i.e., substantially faster than brute force; Heathcote, 2000), exceeding the efficiency of all currently existing approaches (Kaplan, 2020; Jones, 2021; Chollet, 2024). Our approach brings us one step closer to modeling human mental representations, as they must actually exist in the brain, and those representations' acquisition, as they are actually learned.


Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis

Huang, Zhechong, Zhang, Zhao, Ji, Ruyi, Xia, Tingxuan, Zhu, Qihao, Cao, Qinxiang, Sun, Zeyu, Xiong, Yingfei

arXiv.org Artificial Intelligence

Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.


incorporate all other feedback in the reviews into the paper's final version

Neural Information Processing Systems

Thank you for your constructive feedback. We address the reviewers' main points below; however, we will also The motivation is unclear (R3, R4), in particular, as differentiable programming is not new (R4). The problem of searching in spaces of program architectures is also well-studied in classical program synthesis. Our paper's new observation, as noted by Prior efforts in machine learning research have used these datasets -- see "Generating Multi-Agent We will add further clarifying details about the DSL in the final version.


Oruga: An Avatar of Representational Systems Theory

Raggi, Daniel, Stapleton, Gem, Jamnik, Mateja, Stockdill, Aaron, Garcia, Grecia Garcia, Cheng, Peter C-H.

arXiv.org Artificial Intelligence

Humans use representations flexibly. We draw diagrams, change representations and exploit creative analogies across different domains. We want to harness this kind of power and endow machines with it to make them more compatible with human use. Previously we developed Representational Systems Theory (RST) to study the structure and transformations of representations. In this paper we present Oruga (caterpillar in Spanish; a symbol of transformation), an implementation of various aspects of RST. Oruga consists of a core of data structures corresponding to concepts in RST, a language for communicating with the core, and an engine for producing transformations using a method we call structure transfer. In this paper we present an overview of the core and language of Oruga, with a brief example of the kind of transformation that structure transfer can execute.


Inferring Pluggable Types with Machine Learning

Siddiqui, Kazi Amanul Islam, Kellogg, Martin

arXiv.org Artificial Intelligence

Pluggable type systems allow programmers to extend the type system of a programming language to enforce semantic properties defined by the programmer. Pluggable type systems are difficult to deploy in legacy codebases because they require programmers to write type annotations manually. This paper investigates how to use machine learning to infer type qualifiers automatically. We propose a novel representation, NaP-AST, that encodes minimal dataflow hints for the effective inference of type qualifiers. We evaluate several model architectures for inferring type qualifiers, including Graph Transformer Network, Graph Convolutional Network and Large Language Model. We further validated these models by applying them to 12 open-source programs from a prior evaluation of the NullAway pluggable typechecker, lowering warnings in all but one unannotated project. We discovered that GTN shows the best performance, with a recall of .89 and precision of 0.6. Furthermore, we conduct a study to estimate the number of Java classes needed for good performance of the trained model. For our feasibility study, performance improved around 16k classes, and deteriorated due to overfitting around 22k classes.


FuzzyLogic.jl: a Flexible Library for Efficient and Productive Fuzzy Inference

Ferranti, Luca, Boutellier, Jani

arXiv.org Artificial Intelligence

This paper introduces \textsc{FuzzyLogic.jl}, a Julia library to perform fuzzy inference. The library is fully open-source and released under a permissive license. The core design principles of the library are: user-friendliness, flexibility, efficiency and interoperability. Particularly, our library is easy to use, allows to specify fuzzy systems in an expressive yet concise domain specific language, has several visualization tools, supports popular inference systems like Mamdani, Sugeno and Type-2 systems, can be easily expanded with custom user settings or algorithms and can perform fuzzy inference efficiently. It also allows reading fuzzy models from other formats such as Matlab .fis, FCL or FML. In this paper, we describe the library main features and benchmark it with a few examples, showing it achieves significant speedup compared to the Matlab fuzzy toolbox.


A Practical Entity Linking System for Tables in Scientific Literature

Mulwad, Varish, Finin, Tim, Kumar, Vijay S., Williams, Jenny Weisenberg, Dixit, Sharad, Joshi, Anupam

arXiv.org Artificial Intelligence

Entity linking is an important step towards constructing knowledge graphs that facilitate advanced question answering over scientific documents, including the retrieval of relevant information included in tables within these documents. This paper introduces a general-purpose system for linking entities to items in the Wikidata knowledge base. It describes how we adapt this system for linking domain-specific entities, especially for those entities embedded within tables drawn from COVID-19-related scientific literature. We describe the setup of an efficient offline instance of the system that enables our entity-linking approach to be more feasible in practice. As part of a broader approach to infer the semantic meaning of scientific tables, we leverage the structural and semantic characteristics of the tables to improve overall entity linking performance.


Extreme Classification for Answer Type Prediction in Question Answering

Setty, Vinay

arXiv.org Artificial Intelligence

Semantic answer type prediction (SMART) is known to be a useful step towards effective question answering (QA) systems. The SMART task involves predicting the top-$k$ knowledge graph (KG) types for a given natural language question. This is challenging due to the large number of types in KGs. In this paper, we propose use of extreme multi-label classification using Transformer models (XBERT) by clustering KG types using structural and semantic features based on question text. We specifically improve the clustering stage of the XBERT pipeline using textual and structural features derived from KGs. We show that these features can improve end-to-end performance for the SMART task, and yield state-of-the-art results.


Safe-DS: A Domain Specific Language to Make Data Science Safe

Reimann, Lars, Kniesel-Wünsche, Günter

arXiv.org Artificial Intelligence

Due to the long runtime of Data Science (DS) pipelines, even small programming mistakes can be very costly, if they are not detected statically. However, even basic static type checking of DS pipelines is difficult because most are written in Python. Static typing is available in Python only via external linters. These require static type annotations for parameters or results of functions, which many DS libraries do not provide. In this paper, we show how the wealth of Python DS libraries can be used in a statically safe way via Safe-DS, a domain specific language (DSL) for DS. Safe-DS catches conventional type errors plus errors related to range restrictions, data manipulation, and call order of functions, going well beyond the abilities of current Python linters. Python libraries are integrated into Safe-DS via a stub language for specifying the interface of its declarations, and an API-Editor that is able to extract type information from the code and documentation of Python libraries, and automatically generate suitable stubs. Moreover, Safe-DS complements textual DS pipelines with a graphical representation that eases safe development by preventing syntax errors. The seamless synchronization of textual and graphic view lets developers always choose the one best suited for their skills and current task. We think that Safe-DS can make DS development easier, faster, and more reliable, significantly reducing development costs.


Fast and Correct Gradient-Based Optimisation for Probabilistic Programming via Smoothing

Khajwal, Basim, Ong, C. -H. Luke, Wagner, Dominik

arXiv.org Artificial Intelligence

We study the foundations of variational inference, which frames posterior inference as an optimisation problem, for probabilistic programming. The dominant approach for optimisation in practice is stochastic gradient descent. In particular, a variant using the so-called reparameterisation gradient estimator exhibits fast convergence in a traditional statistics setting. Unfortunately, discontinuities, which are readily expressible in programming languages, can compromise the correctness of this approach. We consider a simple (higher-order, probabilistic) programming language with conditionals, and we endow our language with both a measurable and a smoothed (approximate) value semantics. We present type systems which establish technical pre-conditions. Thus we can prove stochastic gradient descent with the reparameterisation gradient estimator to be correct when applied to the smoothed problem. Besides, we can solve the original problem up to any error tolerance by choosing an accuracy coefficient suitably. Empirically we demonstrate that our approach has a similar convergence as a key competitor, but is simpler, faster, and attains orders of magnitude reduction in work-normalised variance.