haskell
Representation of Molecules via Algebraic Data Types : Advancing Beyond SMILES & SELFIES
Goldstein, Oliver, March, Samuel
We introduce a novel molecular representation through Algebraic Data Types (ADTs) - composite data structures formed through the combination of simpler types that obey algebraic laws. By explicitly considering how the datatype of a representation constrains the operations which may be performed, we ensure meaningful inference can be performed over generative models (programs with sample and score operations). This stands in contrast to string-based representations where string-type operations may only indirectly correspond to chemical and physical molecular properties, and at worst produce nonsensical output. The ADT presented implements the Dietz representation for molecular constitution via multigraphs and bonding systems, and uses atomic coordinate data to represent 3D information and stereochemical features. This creates a general digital molecular representation which surpasses the limitations of the string-based representations and the 2D-graph based models on which they are based. In addition, we present novel support for quantum information through representation of shells, subshells, and orbitals, greatly expanding the representational scope beyond current approaches, for instance in Molecular Orbital theory. The framework's capabilities are demonstrated through key applications: Bayesian probabilistic programming is demonstrated through integration with LazyPPL, a lazy probabilistic programming library; molecules are made instances of a group under rotation, necessary for geometric learning techniques which exploit the invariance of molecular properties under different representations; and the framework's flexibility is demonstrated through an extension to model chemical reactions. After critiquing previous representations, we provide an open-source solution in Haskell - a type-safe, purely functional programming language.
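To make the idea of a typed molecular representation concrete, here is a minimal sketch in Python rather than the authors' Haskell. All names (`Element`, `Atom`, `BondingSystem`, `Molecule`, `rotate_z`, `distance`) are hypothetical and are not taken from the paper's code. It assumes a simplified reading of the Dietz representation, in which a bonding system is a number of electrons shared over a set of atom pairs, and it uses approximate coordinates for water.

```python
import math
from dataclasses import dataclass, replace
from enum import Enum
from typing import FrozenSet, Tuple


class Element(Enum):
    """A tiny, illustrative subset of the periodic table."""
    H = 1
    C = 6
    N = 7
    O = 8


@dataclass(frozen=True)
class Atom:
    atom_id: int
    element: Element
    position: Tuple[float, float, float]  # 3D coordinates, in angstroms


@dataclass(frozen=True)
class BondingSystem:
    """Electrons shared over a set of atom pairs (assumed simplification)."""
    electrons: int
    edges: FrozenSet[FrozenSet[int]]


@dataclass(frozen=True)
class Molecule:
    atoms: Tuple[Atom, ...]
    bonding: Tuple[BondingSystem, ...]


def rotate_z(mol: Molecule, theta: float) -> Molecule:
    """Rotate every atom about the z-axis; the bonding data is left untouched."""
    c, s = math.cos(theta), math.sin(theta)

    def turn(atom: Atom) -> Atom:
        x, y, z = atom.position
        return replace(atom, position=(c * x - s * y, s * x + c * y, z))

    return replace(mol, atoms=tuple(turn(a) for a in mol.atoms))


def distance(mol: Molecule, i: int, j: int) -> float:
    """Euclidean distance between the atoms with ids i and j."""
    pos = {a.atom_id: a.position for a in mol.atoms}
    return math.dist(pos[i], pos[j])


# Water with approximate, illustrative coordinates.
water = Molecule(
    atoms=(
        Atom(0, Element.O, (0.0, 0.0, 0.0)),
        Atom(1, Element.H, (0.96, 0.0, 0.0)),
        Atom(2, Element.H, (-0.24, 0.93, 0.0)),
    ),
    bonding=(
        BondingSystem(2, frozenset({frozenset({0, 1})})),
        BondingSystem(2, frozenset({frozenset({0, 2})})),
    ),
)
```

Because the types only admit operations such as `rotate_z`, a rotation cannot corrupt the constitution of the molecule: the bonding systems and interatomic distances of `rotate_z(water, theta)` match those of `water`, whereas an arbitrary edit to a SMILES string carries no such guarantee.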
A Behavior Tree-inspired programming language for autonomous agents
We propose a design for a functional programming language for autonomous agents, built off the ideas and motivations of Behavior Trees (BTs). BTs are a popular model for designing agents' behavior in robotics and AI. However, as their use has grown dramatically, the simple model of BTs has become limiting. There is a growing push to increase the functionality of BTs, with the end goal of BTs evolving into a programming language in their own right, centred around the defining BT properties of modularity and reactiveness. In this paper, we examine how the BT model must be extended in order to grow into such a language. We identify some fundamental problems which must be solved: implementing 'reactive' selection, 'monitoring' safety-critical conditions, and passing data between actions. We provide a variety of small examples which demonstrate that these problems are complex, and that current BT approaches do not handle them in a manner consistent with modularity. We instead provide a simple set of modular programming primitives for handling these use cases, and show how they can be combined to build complex programs. We present a full specification for our BT-inspired language, and give an implementation in the functional programming language Haskell. Finally, we demonstrate our language by translating a large and complex BT into a simple, unambiguous program.
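For readers unfamiliar with the underlying model, the following is a minimal sketch of classic Behavior Tree ticking in Python. It is not the language proposed in the paper, and every name here (`Status`, `sequence`, `fallback`, `condition`, `action`, `agent`) is a hypothetical helper chosen for illustration. It assumes the standard semantics: a sequence stops at the first child that does not succeed, and a fallback stops at the first child that does not fail.

```python
from enum import Enum
from typing import Callable, Dict


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    RUNNING = "running"


# A node is any function that is "ticked" with the agent's state.
Node = Callable[[Dict], Status]


def sequence(*children: Node) -> Node:
    """Tick children in order; stop at the first one that does not succeed."""
    def tick(state: Dict) -> Status:
        for child in children:
            result = child(state)
            if result is not Status.SUCCESS:
                return result
        return Status.SUCCESS
    return tick


def fallback(*children: Node) -> Node:
    """Tick children in order; stop at the first one that does not fail."""
    def tick(state: Dict) -> Status:
        for child in children:
            result = child(state)
            if result is not Status.FAILURE:
                return result
        return Status.FAILURE
    return tick


def condition(predicate: Callable[[Dict], bool]) -> Node:
    """Leaf that succeeds or fails immediately, depending on the state."""
    return lambda state: Status.SUCCESS if predicate(state) else Status.FAILURE


def action(name: str, result: Status = Status.RUNNING) -> Node:
    """Leaf that records its name in the log and reports a fixed status."""
    def tick(state: Dict) -> Status:
        state["log"].append(name)
        return result
    return tick


# Recharge when the battery is low, otherwise patrol.
agent = fallback(
    sequence(condition(lambda s: s["battery"] < 20), action("recharge")),
    action("patrol"),
)

state = {"battery": 80, "log": []}
agent(state)           # battery is fine, so the fallback moves on to "patrol"
state["battery"] = 10
agent(state)           # re-ticking from the root now selects "recharge"
```

Re-evaluating the whole tree on every tick is what gives BTs their reactiveness: the higher-priority branch takes over as soon as its condition holds, without the patrol action having to know about it.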
Investigating the Performance of Language Models for Completing Code in Functional Programming Languages: a Haskell Case Study
van Dam, Tim, van der Heijden, Frank, de Bekker, Philippe, Nieuwschepen, Berend, Otten, Marc, Izadi, Maliheh
Language model-based code completion models have quickly grown in use, helping thousands of developers write code in many different programming languages. However, research on code completion models typically focuses on imperative languages such as Python and JavaScript, which results in a lack of representation for functional programming languages. Consequently, these models often perform poorly on functional languages such as Haskell. To investigate whether this can be alleviated, we evaluate the performance of two language models for code, CodeGPT and UniXcoder, on the functional programming language Haskell. We fine-tune and evaluate the models on Haskell functions sourced from a publicly accessible Haskell dataset on HuggingFace. Additionally, we manually evaluate the models using our novel translated HumanEval dataset. Our automatic evaluation shows that knowledge of imperative programming languages in the pre-training of LLMs may not transfer well to functional languages, but that code completion on functional languages is feasible. Consequently, this shows the need for more high-quality Haskell datasets. A manual evaluation on HumanEval-Haskell indicates CodeGPT frequently generates empty predictions and extra comments, while UniXcoder more often produces incomplete or incorrect predictions. Finally, we release HumanEval-Haskell, along with the fine-tuned models and all code required to reproduce our experiments on GitHub (https://github.com/AISE-TUDelft/HaskellCCEval).
An efficient, provably exact, practical algorithm for the 0-1 loss linear classification problem
He, Xi, Rahman, Waheed Ul, Little, Max A.
There has been an increasing trend to leverage machine learning (ML) for high-stakes prediction applications that deeply impact human lives. Many of these ML models are "black boxes" with highly complex, inscrutable functional forms. In high-stakes applications such as healthcare and criminal justice, black box ML predictions have incorrectly denied parole [Wexler, 2017], misclassified highly polluted air as safe to breathe [McGough, 2018], and suggested poor allocation of valuable, limited resources in medicine and energy reliability [Varshney and Alemzadeh, 2017]. In such high-stakes applications of ML, we always want the best possible prediction, and we want to know how the model makes these predictions so that we can be confident the predictions are meaningful [Rudin, 2022]. In short, the ideal model is simple enough to be easily understood (interpretable) and optimally accurate (exact), so that our interpretations of the results can be faithful to what the model actually computes. Another compelling reason to prefer simple models is that low-complexity models usually generalize better statistically, in the sense that a classifier fit to some training dataset will work well on another dataset, drawn from the same distribution, to which we do not have access (it works well out-of-sample). The VC dimension is a key measure of the complexity of a classification model.
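The objective named in the title, the 0-1 loss of a linear classifier, is simply the number of training points that fall on the wrong side of a hyperplane. The sketch below only evaluates that objective for a given hyperplane; it is not the exact algorithm the paper proposes. The function name `zero_one_loss`, the labels in {-1, +1}, and the convention that a score of exactly zero is classified as +1 are all assumptions made for illustration.

```python
from typing import Sequence


def zero_one_loss(
    w: Sequence[float],
    b: float,
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
) -> int:
    """Count the points misclassified by the rule sign(w . x + b).

    Labels are assumed to be -1 or +1; a score of exactly 0 is
    treated as +1 (an arbitrary tie-breaking convention).
    """
    errors = 0
    for x, y in zip(xs, ys):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        predicted = 1 if score >= 0 else -1
        if predicted != y:
            errors += 1
    return errors


# Toy data: four points on the diagonal, negatives first.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
labels = [-1, -1, 1, 1]

perfect = zero_one_loss((1.0, 1.0), -3.0, points, labels)  # separates the classes
shifted = zero_one_loss((1.0, 1.0), 0.0, points, labels)   # predicts +1 everywhere
```

The loss is piecewise constant in `w` and `b`, so it offers no gradient to follow; that is why minimizing it exactly is a combinatorial problem rather than a routine optimization.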
Measuring The Impact Of Programming Language Distribution
Orlanski, Gabriel, Xiao, Kefan, Garcia, Xavier, Hui, Jeffrey, Howland, Joshua, Malmaud, Jonathan, Austin, Jacob, Singh, Rishabh, Catasta, Michele
Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3) from the Python Programming Puzzles (Schuster et al. 2021) benchmark that involves translating expert-level Python functions to any language. With both BabelCode and the TP3 benchmark, we investigate if balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced corpus results in, on average, 12.34% higher $pass@k$ across all tasks and languages compared to the baseline. We find that this strategy achieves 66.48% better $pass@k$ on low-resource languages at the cost of only a 12.94% decrease to high-resource languages. In our three translation tasks, this strategy yields, on average, 30.77% better low-resource $pass@k$ while having 19.58% worse high-resource $pass@k$.
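The abstract reports results in $pass@k$ without defining it. As a reminder of what the metric measures, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass all tests. Whether the authors used exactly this estimator is an assumption, and the function name `pass_at_k` is chosen here for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct.

    n: total samples generated for the problem
    c: number of those samples that pass all tests
    k: sample budget being evaluated (k <= n)
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any k drawn include a pass.
        return 1.0
    # One minus the chance that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# One correct sample out of four, with a budget of two draws:
# 1 - C(3, 2) / C(4, 2) = 1 - 3/6
estimate = pass_at_k(n=4, c=1, k=2)
```

The per-problem estimates are then averaged over the benchmark, which is the quantity the relative improvements above would be computed from under this assumption.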
Ownership of AI-Generated Code Hotly Disputed G.R. Jenkin & Associates
What programming language for artificial intelligence is the best? (2022) - Dataconomy
What programming language for artificial intelligence is suitable for you? It is a crucial question for your company's future. Every major tech business, and even startups, are working on artificial intelligence (AI), which has emerged as one of the hottest topics and largest fields of study. It's a tremendously broad topic that covers anything from simple calculators and self-driving cars to intelligent robots that could fundamentally alter the course of human history. The core of AI is creating machines that are as intelligent as or more intelligent than humans. Businesses are continuously seeking better AI solutions. IDC projects that the market for artificial intelligence will reach $500 billion by 2024, with a five-year CAGR of 17.5% and total revenue of $554.3 billion.
Models of Generics and Metaprogramming: Go, Rust, Swift, D and More - Tristan Hume
In some domains of programming it's common to want to write a data structure or algorithm that can work with elements of many different types, such as a generic list or a sorting algorithm that only needs a comparison function. Different programming languages have come up with all sorts of solutions to this problem: from just pointing people to existing general features that can be useful for the purpose (e.g. C, Go) to generics systems so powerful they become Turing-complete (e.g. [...]). In this post I'm going to take you on a tour of the generics systems in many different languages and how they are implemented. I'll start from how languages without a special generics system like C solve the problem and then I'll show how gradually adding extensions in different directions leads to the systems found in other languages. One reason I think generics are an interesting case is that they're a simple case of the general problem of metaprogramming: writing programs that can generate classes of other programs. As evidence I'll describe how three different fully general metaprogramming methods can be seen as extensions from different directions in the space of generics systems: dynamic languages like Python, procedural macro systems like Template Haskell, and staged compilation like Zig and Terra.
5 Best Programming Languages for AI
You might ask yourself questions such as what is the fastest path to a career in AI, or what is the best programming language for AI? The answer to these questions will depend on your knowledge and experience, the type of AI project you are interested in, and current industry trends. There is currently no programming language dedicated solely to AI, but the field is supported by many popular programming languages. However, to increase your chances of quickly launching a career in AI, you need to learn programming languages that are supported by several machine learning (ML) and deep learning libraries. Among AI programming languages, Python is leading the way with its unparalleled community support and pre-built libraries that help accelerate AI development.