This library fills an important gap in the growing Python-based machine learning ecosystem, where users can only use predefined kernels and cannot customize or extend them for their own applications, which are diverse and demand both great flexibility and better-performing kernels. The library defines the KernelMatrix class, which is central to all kernel methods and machines. Because the KernelMatrix class is a key bridge between input data and the various kernel learning algorithms, it is designed to be highly usable and extensible to different applications and data types. Besides applying basic kernels to a given sample (to produce a KernelMatrix), the library provides various kernel operations, such as normalization, centering, product, alignment evaluation, linear combination, and ranking (by various performance metrics) of kernel matrices. In addition, we provide several convenience classes, such as KernelSet and KernelBucket, for easy management of a large collection of kernels.
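To make the kernel operations named above concrete, the following is a minimal sketch of centering, normalization, and alignment on a small Gram matrix. It uses only the Python standard library and plain nested lists; all function names here are hypothetical and chosen for illustration, and none of this is the library's KernelMatrix API.

```python
import math

def linear_kernel_matrix(samples):
    """Gram matrix K[i][j] = <x_i, x_j> for a list of equal-length vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in samples] for x in samples]

def center(K):
    """Center K in feature space: K[i][j] - mean_i - mean_j + overall mean."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    total_mean = sum(row_means) / n
    # K is assumed symmetric, so column means equal row means.
    return [[K[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]

def normalize(K):
    """Cosine normalization: K[i][j] / sqrt(K[i][i] * K[j][j]).

    Assumes every diagonal entry is strictly positive.
    """
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]

def frobenius_inner(A, B):
    """Frobenius inner product: sum of elementwise products."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K1, K2):
    """Kernel alignment <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return frobenius_inner(K1, K2) / math.sqrt(
        frobenius_inner(K1, K1) * frobenius_inner(K2, K2))

# Toy sample of three 2-D points (illustrative data only).
samples = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
K = linear_kernel_matrix(samples)
K_centered = center(K)
K_normalized = normalize(K)
```

After centering, every row of the matrix sums to zero; after normalization, every diagonal entry equals one; and the alignment of any nonzero kernel matrix with itself is one. These invariants are convenient sanity checks when composing such operations.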
The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and is built on top of scikit-learn. It is simple to use and can be naturally combined with scikit-learn's modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available at: https://github.com/ysig/GraKeL.
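As an illustration of what a graph kernel computes, the sketch below implements one of the simplest ones, a vertex histogram kernel, which compares two graphs by the counts of their node labels. The graph representation (a dict with "edges" and "labels") and the function names are assumptions made for this example; this is not GraKeL's interface.

```python
from collections import Counter

def vertex_histogram_kernel(graph_a, graph_b):
    """Dot product of the node-label count vectors of two graphs.

    This kernel looks only at node labels and ignores the edges entirely.
    """
    counts_a = Counter(graph_a["labels"].values())
    counts_b = Counter(graph_b["labels"].values())
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[label] * counts_b[label] for label in shared)

def kernel_matrix(graphs):
    """Pairwise kernel values for a list of graphs."""
    return [[vertex_histogram_kernel(g, h) for h in graphs] for g in graphs]

# Hypothetical toy graphs: node id -> label, plus an edge list.
g1 = {"edges": [(0, 1), (1, 2)], "labels": {0: "C", 1: "C", 2: "O"}}
g2 = {"edges": [(0, 1), (1, 2)], "labels": {0: "C", 1: "O", 2: "O"}}
g3 = {"edges": [(0, 1)], "labels": {0: "N", 1: "N"}}

K = kernel_matrix([g1, g2, g3])
```

A matrix of this kind can be handed to any learner that accepts a precomputed kernel, for example scikit-learn's SVC with kernel="precomputed". Kernels that account for edges or larger substructures follow the same pattern with a richer feature map.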
Tick is a statistical learning library for Python 3, with a particular emphasis on time-dependent models, such as point processes, and tools for generalized linear models and survival analysis. The core of the library is an optimization module providing model computational classes, solvers and proximal operators for regularization. tick relies on a C++ implementation and state-of-the-art optimization algorithms to provide very fast computations in a single-node multi-core setting. Source code and documentation can be downloaded from https://github.com/X-DataInitiative/tick.
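The role of a proximal operator in regularized optimization can be sketched with the L1 penalty, whose proximal operator is elementwise soft-thresholding, combined with a plain proximal gradient loop on a least-squares loss. This is a generic textbook sketch in pure Python, not tick's API; the function names and the toy data are assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def residuals(X, y, w):
    """Prediction errors X w - y, one per sample."""
    return [sum(xij * wj for xij, wj in zip(row, w)) - yi for row, yi in zip(X, y)]

def lasso_objective(X, y, w, lam):
    """(1 / 2n) * ||X w - y||^2 + lam * ||w||_1."""
    n = len(y)
    r = residuals(X, y, w)
    return sum(ri * ri for ri in r) / (2 * n) + lam * sum(abs(wj) for wj in w)

def proximal_gradient(X, y, lam, n_iter=200):
    """Alternate a gradient step on the smooth loss with the L1 prox."""
    n, d = len(X), len(X[0])
    # ||X||_F^2 / n upper-bounds the Lipschitz constant of the gradient,
    # so its inverse is a safe (conservative) fixed step size.
    step = n / sum(x * x for row in X for x in row)
    w = [0.0] * d
    for _ in range(n_iter):
        r = residuals(X, y, w)
        grad = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]
        w = soft_threshold([wj - step * gj for wj, gj in zip(w, grad)], step * lam)
    return w

# Illustrative data only.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 0.1, 2.1]
w = proximal_gradient(X, y, lam=0.5)
```

Separating the smooth loss (handled by its gradient) from the penalty (handled by its proximal operator) is what lets a solver be reused across different regularizers: swapping the penalty only requires swapping the prox function.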
Since they better capture complex traits in the sequences, string kernels often achieve better prediction performance. RNA interference is an important biological mechanism with many therapeutic applications, where strings can be used to represent target messenger RNAs and initiating short RNAs, and string kernels can be applied for learning and prediction. However, existing string kernels were not developed specifically for RNA applications. Moreover, most existing string kernels are n-gram based and suffer from high dimensionality and an inability to preserve subsequence orderings. We propose a randomized string kernel for use with support vector regression, with the aim of better predicting silencing efficacy scores for the candidate sequences and eventually improving the efficiency of biological experiments. We show the positive definiteness of this kernel and give an analysis of randomization error rates. Empirical results on biological data demonstrate that the proposed kernel performed better than existing string kernels and achieved significant improvements over kernels computed from numerical descriptors extracted according to structural and thermodynamic rules. In addition, it is computationally more efficient.
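The limitation of n-gram based kernels mentioned above can be seen in a minimal sketch of the standard n-gram (spectrum) kernel, which is the kind of existing baseline the abstract refers to and not the proposed randomized kernel. The function names and the example sequences are assumptions chosen for illustration.

```python
from collections import Counter

def ngram_counts(seq, n):
    """Count every contiguous substring of length n in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def spectrum_kernel(s, t, n=3):
    """Dot product of the n-gram count vectors of two strings."""
    counts_s, counts_t = ngram_counts(s, n), ngram_counts(t, n)
    shared = counts_s.keys() & counts_t.keys()
    return sum(counts_s[g] * counts_t[g] for g in shared)

# Two different RNA-like strings that contain exactly the same 2-grams
# (AU, UA, AC, CA), only in a different order.
s, t = "AUACA", "ACAUA"
k_st = spectrum_kernel(s, t, n=2)
k_ss = spectrum_kernel(s, s, n=2)
```

Here k_st equals k_ss, so the kernel cannot tell the two sequences apart even though their subsequences occur in a different order. The implicit feature space also has one dimension per possible n-gram, which is 4^n for the four-letter RNA alphabet and grows quickly with n.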