EcoSearch: A Constant-Delay Best-First Search Algorithm for Program Synthesis
Matricon, Théo, Fijalkow, Nathanaël, Lagarde, Guillaume
Many approaches to program synthesis perform a combinatorial search within a large space of programs to find one that satisfies a given specification. To tame the search-space blowup, previous works introduced probabilistic and neural approaches that guide this combinatorial search by inducing heuristic cost functions. Best-first search algorithms are guaranteed to explore programs in the exact order induced by the cost function, significantly reducing the portion of the program space that must be explored. We present a new best-first search algorithm called EcoSearch, the first constant-delay algorithm for pre-generation cost functions: the amount of compute required between outputting two programs is constant and, in particular, does not increase over time. This key property yields important speedups: we observe that EcoSearch outperforms its predecessors on two classic domains.
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
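The cost-ordered enumeration that EcoSearch improves on can be sketched with a generic priority-queue loop. This is a minimal illustration of best-first enumeration under a cost function, not the EcoSearch algorithm itself; `start`, `successors`, and `cost` are hypothetical stand-ins for a real program space.

```python
import heapq

def best_first_enumerate(start, successors, cost, limit):
    """Enumerate programs in non-decreasing cost order.

    A generic best-first loop: pop the cheapest candidate, output it,
    and push its successors. Note the frontier grows over time, which
    is why naive variants do not have constant delay between outputs.
    """
    frontier = [(cost(start), start)]
    seen = {start}
    out = []
    while frontier and len(out) < limit:
        c, prog = heapq.heappop(frontier)
        out.append((c, prog))
        for nxt in successors(prog):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (cost(nxt), nxt))
    return out
```

On a toy space where programs are strings over {a, b} and cost is length, the loop yields programs in strictly non-decreasing cost order.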
A Natural Language Processing Approach to Support Biomedical Data Harmonization: Leveraging Large Language Models
Li, Zexu, Prabhu, Suraj P., Popp, Zachary T., Jain, Shubhi S., Balakundi, Vijetha, Ang, Ting Fang Alvin, Au, Rhoda, Chen, Jinying
Biomedical research requires large, diverse samples to produce unbiased results. Automated methods for matching variables across datasets can accelerate this process. Research in this area has been limited, focusing primarily on lexical matching and ontology-based semantic matching. We aimed to develop new methods, leveraging large language models (LLMs) and ensemble learning, to automate variable matching. Methods: We used data from two GERAS cohort studies (Europe and Japan) to develop variable-matching methods. We first manually created a dataset by matching 352 EU variables with 1322 candidate JP variables, where matched variable pairs were positive instances and unmatched pairs were negative instances. Using this dataset, we developed and evaluated two types of natural language processing (NLP) methods that matched variables based on variable labels and definitions from data dictionaries: (1) LLM-based matching and (2) fuzzy matching. We then developed an ensemble-learning method using a Random Forest (RF) model to integrate the individual NLP methods. RF was trained and evaluated on 50 trials. Each trial had a random 4:1 split into training and test sets, with the model's hyperparameters optimized through cross-validation on the training set. For each EU variable, the 1322 candidate JP variables were ranked by NLP-derived similarity scores or by RF's probability scores, denoting their likelihood of matching the EU variable. Ranking performance was measured by top-n hit ratio (HR-n) and mean reciprocal rank (MRR). Results: E5 performed best among the individual methods, achieving 0.90 HR-30 and 0.70 MRR. RF outperformed E5 on all metrics over 50 trials (P < 0.001), achieving an average HR-30 of 0.98 and MRR of 0.73. LLM-derived features contributed most to RF's performance. One major cause of errors in automatic variable matching was ambiguous variable definitions within data dictionaries.
- Asia > Japan (0.25)
- North America > United States > Massachusetts > Suffolk County > Boston (0.05)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.70)
- Health & Medicine > Epidemiology (0.68)
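The HR-n and MRR metrics used above are straightforward to compute once each query variable has a ranked candidate list. The sketch below uses stdlib fuzzy string similarity as a stand-in for the paper's LLM- or RF-derived scores; the variable labels are invented examples.

```python
from difflib import SequenceMatcher

def rank_candidates(query, candidates):
    # Score each candidate label by fuzzy similarity (a stand-in for
    # learned similarity scores) and return labels best-first.
    scored = [(SequenceMatcher(None, query, c).ratio(), c) for c in candidates]
    return [c for _, c in sorted(scored, reverse=True)]

def hit_ratio_at_n(rankings, gold, n):
    # Fraction of queries whose true match appears in the top n.
    hits = sum(1 for q, ranked in rankings.items() if gold[q] in ranked[:n])
    return hits / len(rankings)

def mean_reciprocal_rank(rankings, gold):
    # Average of 1/rank of the true match (contributes 0 if absent).
    total = 0.0
    for q, ranked in rankings.items():
        if gold[q] in ranked:
            total += 1.0 / (ranked.index(gold[q]) + 1)
    return total / len(rankings)
```

With a single query whose true match is ranked first, both HR-1 and MRR evaluate to 1.0.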
Planning with OWL-DL Ontologies (Extended Version)
John, Tobias, Koopmann, Patrick
We introduce ontology-mediated planning, in which planning problems are combined with an ontology. Our formalism differs from existing ones in that we focus on a strong separation of the formalisms for describing planning problems and ontologies, which are only loosely coupled by an interface. Moreover, we present a black-box algorithm that supports the full expressive power of OWL DL. This goes beyond what existing approaches combining automated planning with ontologies can do, which support only restricted description logics such as DL-Lite and Horn description logics. Our main algorithm relies on rewritings of the ontology-mediated planning specifications into PDDL, so that existing planning systems can be used to solve them. The algorithm relies on justifications, which allow for a generic approach that is independent of the expressivity of the ontology language. However, dedicated optimizations for computing justifications need to be implemented to enable an efficient rewriting procedure. We evaluated our implementation on benchmark sets from several domains. The evaluation shows that our procedure works in practice and that tailoring the reasoning procedure has a significant impact on performance.
- Europe > Austria > Vienna (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- (7 more...)
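The black-box interplay described in the abstract, rewrite to PDDL, call an off-the-shelf planner, and use justifications to refine the rewriting, can be sketched as a loop. All four callables below are hypothetical stand-ins, not the paper's API or data structures.

```python
def ontology_mediated_plan(rewrite_to_pddl, run_planner, check_ontology,
                           max_rounds=10):
    """Black-box loop sketch: rewrite the planning problem to PDDL,
    solve it with an existing planner, and validate the plan against
    the ontology. On failure, a justification (a minimal set of axioms
    explaining the inconsistency) constrains the next rewriting.
    """
    constraints = []
    for _ in range(max_rounds):
        pddl = rewrite_to_pddl(constraints)
        plan = run_planner(pddl)
        if plan is None:
            return None              # planner proved the problem unsolvable
        justification = check_ontology(plan)
        if justification is None:
            return plan              # plan is consistent with the ontology
        constraints.append(justification)
    return None
```

With stub callables, the loop rejects a first ontology-inconsistent plan and accepts a repaired one on the second round.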
Towards Ontology-Mediated Planning with OWL DL Ontologies (Extended Version)
John, Tobias, Koopmann, Patrick
While classical planning languages make the closed-domain and closed-world assumptions, there have been various approaches to extend them with DL reasoning, which is then interpreted under the usual open-world semantics. Current approaches to planning with DL ontologies integrate the DL directly into the planning language, and practical approaches have been developed based on first-order rewritings or rewritings into datalog. We present here a new approach in which the planning specification and the ontology are kept separate and are linked together through an interface. This allows planning experts to work in a familiar formalism, while existing ontologies can be easily integrated and extended by ontology experts. Our approach for solving these ontology-mediated planning problems is optimized for cases with comparatively small domains, and supports the whole OWL DL fragment. The idea is to rewrite the ontology-mediated planning problem into a classical planning problem to be processed by existing planning tools. Unlike other approaches, our rewriting is data-dependent. A first experimental evaluation shows the potential and limitations of this approach.
- Europe > Norway > Eastern Norway > Oslo (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (4 more...)
A Template-based Method for Constrained Neural Machine Translation
Wang, Shuo, Li, Peng, Tan, Zhixing, Tu, Zhaopeng, Sun, Maosong, Liu, Yang
Machine translation systems are expected to cope with various types of constraints in many practical scenarios. While neural machine translation (NMT) has achieved strong performance in unconstrained cases, it is non-trivial to impose pre-specified constraints on the translation process of NMT models. Although many approaches have been proposed to address this issue, most existing methods cannot satisfy the following three desiderata at the same time: (1) high translation quality, (2) high match accuracy, and (3) low latency. In this work, we propose a template-based method that yields results with high translation quality and match accuracy, while its inference speed is comparable with that of unconstrained NMT models. Our basic idea is to rearrange the generation of constrained and unconstrained tokens through a template. Our method requires no changes to the model architecture or the decoding algorithm. Experimental results show that the proposed template-based approach can outperform several representative baselines in both lexically and structurally constrained translation tasks.
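The rearrangement idea, constrained spans copied verbatim while the model generates only the unconstrained slots, can be illustrated with a toy template filler. The slot markers `<s>` and `<c0>`, `<c1>`, ... are invented notation, and `fill_slot` stands in for a real NMT decoder.

```python
import re

def apply_template(template, fill_slot, constraints):
    """Fill a translation template: <cK> spans are copied verbatim from
    the K-th pre-specified constraint, while each <s> slot is produced
    by the (here hypothetical) model via `fill_slot`. A toy sketch of
    the rearrangement idea, not the paper's model.
    """
    out = []
    for piece in re.findall(r"<c\d+>|<s>|[^<]+", template):
        if piece == "<s>":
            out.append(fill_slot(len(out)))          # model-generated span
        elif piece.startswith("<c"):
            out.append(constraints[int(piece[2:-1])])  # copied constraint
        else:
            out.append(piece)                        # literal template text
    return "".join(out)
```

Because constrained tokens are copied rather than generated, match accuracy on them is exact by construction.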
OLLIE: Derivation-based Tensor Program Optimizer
Zheng, Liyan, Wang, Haojie, Zhai, Jidong, Hu, Muyan, Ma, Zixuan, Wang, Tuowei, Tang, Shizhi, Xie, Lei, Huang, Kezhao, Jia, Zhihao
Boosting the runtime performance of deep neural networks (DNNs) is critical due to their wide adoption in real-world tasks. Existing approaches to optimizing the tensor algebra expression of a DNN only consider expressions representable by a fixed set of predefined operators, missing possible optimization opportunities between general expressions. We propose OLLIE, the first derivation-based tensor program optimizer. OLLIE optimizes tensor programs by leveraging transformations between general tensor algebra expressions, enabling a significantly larger expression search space that includes those supported by prior work as special cases. OLLIE uses a hybrid derivation-based optimizer that effectively combines explorative and guided derivations to quickly discover highly optimized expressions. Evaluation on seven DNNs shows that OLLIE outperforms existing optimizers by up to 2.73$\times$ (1.46$\times$ on average) on an A100 GPU and by up to 2.68$\times$ (1.51$\times$ on average) on a V100 GPU.
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.89)
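The derivation-based search OLLIE describes can be caricatured as best-first rewriting: repeatedly apply algebraic rules to an expression and keep the cheapest equivalent form found. The rule and cost function below are toy stand-ins, not OLLIE's tensor-algebra rewrites or runtime model.

```python
from heapq import heappush, heappop

def derive_optimize(expr, rules, cost, budget=1000):
    """Best-first derivation search over expression rewrites.

    `rules` is a list of generators mapping an expression to derived
    (equivalent) expressions; the cheapest expression seen is kept.
    """
    tick = 0  # tie-breaker so the heap never compares expressions
    frontier = [(cost(expr), tick, expr)]
    seen = {expr}
    best_cost, best = cost(expr), expr
    while frontier and budget > 0:
        budget -= 1
        c, _, e = heappop(frontier)
        if c < best_cost:
            best_cost, best = c, e
        for rule in rules:
            for derived in rule(e):
                if derived not in seen:
                    seen.add(derived)
                    tick += 1
                    heappush(frontier, (cost(derived), tick, derived))
    return best
```

On a toy algebra where expressions are nested tuples, a single `x + 0 -> x` rule and a tree-size cost suffice to simplify a redundant expression.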
Scaling Neural Program Synthesis with Distribution-based Search
Fijalkow, Nathanaël, Lagarde, Guillaume, Matricon, Théo, Ellis, Kevin, Ohlmann, Pierre, Potta, Akarsh
We consider the problem of automatically constructing computer programs from input-output examples. We investigate how to augment probabilistic and neural program synthesis methods with new search algorithms, proposing a framework called distribution-based search. Within this framework, we introduce two new search algorithms: Heap Search, an enumerative method, and SQRT Sampling, a probabilistic method. We prove certain optimality guarantees for both methods, show how they integrate with probabilistic and neural techniques, and demonstrate how they can operate at scale across parallel compute environments. Collectively these findings offer theoretical and applied studies of search algorithms for program synthesis that integrate with recent developments in machine-learned program synthesizers.
- Europe > France > Île-de-France > Paris > Paris (0.14)
- North America > United States > Florida > Hillsborough County > University (0.04)
- Europe > United Kingdom (0.04)
- (2 more...)
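The SQRT Sampling idea, sampling programs with probability proportional to the square root of their weight under the learned distribution, is easy to state concretely. The sketch below works on an explicit finite distribution; real uses factor this transformation through a grammar, and the example probabilities are invented.

```python
import math
import random

def sqrt_distribution(probs):
    """Transform a program distribution for SQRT-style sampling:
    weight each program by the square root of its probability, then
    renormalize. `probs` maps programs to probabilities.
    """
    weights = {p: math.sqrt(q) for p, q in probs.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def sample(dist, rng=random):
    # Draw one program by inverse-CDF sampling over the dict.
    r = rng.random()
    acc = 0.0
    for p, w in dist.items():
        acc += w
        if r < acc:
            return p
    return p  # guard against floating-point rounding at the tail
```

Square-rooting flattens the distribution: a 9:1 split becomes 3:1, so low-probability programs are sampled far more often than under the raw distribution.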
Probabilistic Grammatical Evolution
Mégane, Jessica, Lourenço, Nuno, Machado, Penousal
Grammatical Evolution (GE) is one of the most popular Genetic Programming (GP) variants, and it has been used with success in several problem domains. Since the original proposal, many enhancements have been proposed for GE in order to address some of its main issues and improve its performance. In this paper we propose Probabilistic Grammatical Evolution (PGE), which introduces a new genotypic representation and a new mapping mechanism for GE. Specifically, we resort to a Probabilistic Context-Free Grammar (PCFG) whose probabilities are adapted during the evolutionary process, taking into account the productions chosen to construct the fittest individual. The genotype is a list of real values, where each value represents the likelihood of selecting a derivation rule. We evaluate the performance of PGE on two regression problems and compare it with GE and Structured Grammatical Evolution (SGE). The results show that PGE performs better than GE, with statistically significant differences, and achieves performance similar to SGE.
- Europe > Portugal > Coimbra > Coimbra (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
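The PGE genotype-to-program mapping, each real value selects a derivation rule through the PCFG's cumulative probabilities, can be sketched directly. The toy grammar and the depth-bounding fallback below are assumptions for illustration, not the paper's benchmark setup.

```python
from itertools import cycle

def pge_map(genotype, grammar, start="expr", max_depth=8):
    """Map a PGE-style genotype (reals in [0, 1)) to a program string.

    Each codon picks the production of the current nonterminal whose
    cumulative-probability interval contains it. `grammar` maps a
    nonterminal to (probability, production) pairs.
    """
    codons = cycle(genotype)  # wrap around, as is usual in GE
    def expand(symbol, depth):
        if symbol not in grammar:
            return symbol  # terminal symbol
        if depth >= max_depth:
            # bound recursion: pick a production made only of terminals
            production = next(p for _, p in grammar[symbol]
                              if all(s not in grammar for s in p))
        else:
            r = next(codons)
            acc = 0.0
            for prob, production in grammar[symbol]:
                acc += prob
                if r < acc:
                    break
        return "".join(expand(s, depth + 1) for s in production)
    return expand(start, 0)
```

Because rule choice is read off probability intervals, adapting the PCFG's probabilities (as PGE does across generations) reshapes how the same genotype values map to programs.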
Towards the Evolution of Multi-Layered Neural Networks: A Dynamic Structured Grammatical Evolution Approach
Assunção, Filipe, Lourenço, Nuno, Machado, Penousal, Ribeiro, Bernardete
Current grammar-based NeuroEvolution approaches have several shortcomings. On the one hand, they do not allow the generation of Artificial Neural Networks (ANNs) composed of more than one hidden layer. On the other, there is no way to evolve networks with more than one output neuron. To properly evolve ANNs with more than one hidden layer and multiple output nodes, it is necessary to know the number of neurons available in previous layers. In this paper we introduce Dynamic Structured Grammatical Evolution (DSGE): a new genotypic representation that overcomes the aforementioned limitations. By enabling the creation of dynamic rules that specify the connection possibilities of each neuron, the methodology enables the evolution of multi-layered ANNs with more than one output neuron. Results on different classification problems show that DSGE evolves effective single- and multi-layered ANNs with a varying number of output neurons.
- Europe > Germany > Berlin (0.05)
- North America > United States > Wisconsin (0.04)
- Europe > Portugal > Coimbra > Coimbra (0.04)
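The dependency DSGE exploits, that a layer's connection choices hinge on how many neurons the previous layer exposes, can be illustrated with a minimal genotype decoder. The encoding, weight initialization, and function names here are illustrative assumptions, not the paper's grammar.

```python
import random

def decode_network(genotype, n_inputs, n_outputs, rng=None):
    """Decode a list of hidden-layer widths into a multi-layered
    network topology with multiple outputs. Each layer's weight rows
    are sized by the neuron count of the *previous* layer, mirroring
    the dynamic-rule idea; weights are random stand-ins.
    """
    rng = rng or random.Random(0)
    sizes = [n_inputs] + [max(1, g) for g in genotype] + [n_outputs]
    layers = []
    for prev, cur in zip(sizes, sizes[1:]):
        # the connection possibilities for this layer are generated
        # dynamically from `prev`, the preceding layer's neuron count
        layers.append([[rng.uniform(-1, 1) for _ in range(prev)]
                       for _ in range(cur)])
    return layers
```

A genotype of `[3, 2]` with 4 inputs and 2 outputs yields three weight matrices, including a multi-neuron output layer.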
Small Is Beautiful: Computing Minimal Equivalent EL Concepts
Nikitina, Nadeschda (University of Oxford) | Koopmann, Patrick (University of Dresden)
Related work spans several reasoning services (Rudolph 2012; Lutz, Seylan, and Wolter 2012): ontology learning (Konev, Ozaki, and Wolter 2016; Lehmann and Hitzler 2010), rewriting ontologies into less expressive logics (Carral et al. 2014; Lutz, Piro, and Wolter 2011), abduction (Du, Wang, and Shen 2015; Klarman, Endriss, and Schlobach 2011), and knowledge revision (Grau, Kharlamov, and Zheleznyakov 2012; Qi, Liu, and Bell 2006).
Logics allow equivalent facts to be expressed in many different ways. The fact that ontologies are developed by a number of different people and grow over time can lead to concepts that are more complex than necessary. For example, below is a simplified definition of the medical concept Clotting from the Galen ontology (Rector et al. 1994).