Country
Artificial Intelligence? Start Investing Now, Says Foundation Group
When it comes to artificial intelligence, the most important thing is to start investing now, says W.K. Kellogg Foundation's investments director, Neal Graziano. And that's exactly what his organization is doing, along with the Robert Wood Johnson Foundation, institutional investor network Trusted Insight, and European family offices. Together, the groups have directed a $27 million investment, announced last week, into a German venture capital firm that seeds machine learning companies. Founded in 2016, the Berlin-based Merantix is a lean operation. But it has already built three European companies, including Vara, which is Germany's first approved AI software for cancer screening and is touted to have a better accuracy rate than radiology.
The age of AI: supercharging Europe's tech transformation
Irish company SoapBox Labs says it has developed the world's most accurate and safe voice recognition technology for children. Founded in 2013, the company uses artificial intelligence tailored to help kids play and learn more immersively using their own voices. "While people are probably familiar with speech recognition for adults, in voice assistance or in smart speakers, our technology has been built specifically for children, and that's modelling their voices and speech behaviours using state of the art AI technology,"explains SoapBox Labs CEO, Patricia Scanlon. The company's technology has attracted the tech giants. The firm is already working with Microsoft to boost child literacy and tackle problems surrounding child data privacy.
Time-aware Large Kernel Convolutions
Lioutas, Vasileios, Guo, Yuhong
To date, most state-of-the-art sequence modelling architectures use attention to build generative models for language based tasks. Some of these models use all the available sequence tokens to generate an attention distribution which results in time complexity of $O(n^2)$. Alternatively, they utilize depthwise convolutions with softmax normalized kernels of size $k$ acting as a limited-window self-attention, resulting in time complexity of $O(k{\cdot}n)$. In this paper, we introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel instead of using the fixed-sized kernel matrix. This method yields a time complexity of $O(n)$, effectively making the sequence encoding process linear to the number of tokens. We evaluate the proposed method on large-scale standard machine translation and language modelling datasets and show that TaLK Convolutions constitute an efficient improvement over other attention/convolution based approaches.
Improving S&P stock prediction with time series stock similarity
Stock market prediction with forecasting algorithms is a popular topic these days where most of the forecasting algorithms train only on data collected on a particular stock. In this paper, we enriched the stock data with related stocks just as a professional trader would have done to improve the stock prediction models. We tested five different similarities functions and found co-integration similarity to have the best improvement on the prediction model. We evaluate the models on seven S&P stocks from various industries over five years period. The prediction model we trained on similar stocks had significantly better results with 0.55 mean accuracy, and 19.782 profit compare to the state of the art model with an accuracy of 0.52 and profit of 6.6.
Segmented Graph-Bert for Graph Instance Modeling
In graph instance representation learning, both the diverse graph instance sizes and the graph node orderless property have been the major obstacles that render existing representation learning models fail to work. In this paper, we will examine the effectiveness of GRAPH-BERT on graph instance representation learning, which was designed for node representation learning tasks originally. To adapt GRAPH-BERT to the new problem settings, we re-design it with a segmented architecture instead, which is also named as SEG-BERT (Segmented GRAPH-BERT) for reference simplicity in this paper. SEG-BERT involves no node-order-variant inputs or functional components anymore, and it can handle the graph node orderless property naturally. What's more, SEG-BERT has a segmented architecture and introduces three different strategies to unify the graph instance sizes, i.e., full-input, padding/pruning and segment shifting, respectively. SEG-BERT is pre-trainable in an unsupervised manner, which can be further transferred to new tasks directly or with necessary fine-tuning. We have tested the effectiveness of SEG-BERT with experiments on seven graph instance benchmark datasets, and SEG-BERT can out-perform the comparison methods on six out of them with significant performance advantages.
On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions
Arjevani, Yossi, Daniely, Amit, Jegelka, Stefanie, Lin, Hongzhou
Recent advances in randomized incremental methods for minimizing $L$-smooth $\mu$-strongly convex finite sums have culminated in tight complexity of $\tilde{O}((n+\sqrt{n L/\mu})\log(1/\epsilon))$ and $O(n+\sqrt{nL/\epsilon})$, where $\mu>0$ and $\mu=0$, respectively, and $n$ denotes the number of individual functions. Unlike incremental methods, stochastic methods for finite sums do not rely on an explicit knowledge of which individual function is being addressed at each iteration, and as such, must perform at least $\Omega(n^2)$ iterations to obtain $O(1/n^2)$-optimal solutions. In this work, we exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model, showing that this lower bound is indeed tight. Following a similar approach, we propose a novel adaptation of SVRG which is both \emph{compatible with stochastic oracles}, and achieves complexity bounds of $\tilde{O}((n^2+n\sqrt{L/\mu})\log(1/\epsilon))$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively. Our bounds hold w.h.p. and match in part existing lower bounds of $\tilde{\Omega}(n^2+\sqrt{nL/\mu}\log(1/\epsilon))$ and $\tilde{\Omega}(n^2+\sqrt{nL/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively.
Local Nonparametric Meta-Learning
A central goal of meta-learning is to find a learning rule that enables fast adaptation across a set of tasks, by learning the appropriate inductive bias for that set. Most meta-learning algorithms try to find a \textit{global} learning rule that encodes this inductive bias. However, a global learning rule represented by a fixed-size representation is prone to meta-underfitting or -overfitting since the right representational power for a task set is difficult to choose a priori. Even when chosen correctly, we show that global, fixed-size representations often fail when confronted with certain types of out-of-distribution tasks, even when the same inductive bias is appropriate. To address these problems, we propose a novel nonparametric meta-learning algorithm that utilizes a meta-trained local learning rule, building on recent ideas in attention-based and functional gradient-based meta-learning. In several meta-regression problems, we show improved meta-generalization results using our local, nonparametric approach and achieve state-of-the-art results in the robotics benchmark, Omnipush.
Privacy-Preserving Image Classification in the Local Setting
Image data has been greatly produced by individuals and commercial vendors in the daily life, and it has been used across various domains, like advertising, medical and traffic analysis. Recently, image data also appears to be greatly important in social utility, like emergency response. However, the privacy concern becomes the biggest obstacle that prevents further exploration of image data, due to that the image could reveal sensitive information, like the personal identity and locations. The recent developed Local Differential Privacy (LDP) brings us a promising solution, which allows the data owners to randomly perturb their input to provide the plausible deniability of the data before releasing. In this paper, we consider a two-party image classification problem, in which data owners hold the image and the untrustworthy data user would like to fit a machine learning model with these images as input. To protect the image privacy, we propose to locally perturb the image representation before revealing to the data user. Subsequently, we analyze how the perturbation satisfies {\epsilon}-LDP and affect the data utility regarding count-based and distance-based machine learning algorithm, and propose a supervised image feature extractor, DCAConv, which produces an image representation with scalable domain size. Our experiments show that DCAConv could maintain a high data utility while preserving the privacy regarding multiple image benchmark datasets.
Composing Molecules with Multiple Property Constraints
Jin, Wengong, Barzilay, Regina, Jaakkola, Tommi
Drug discovery aims to find novel compounds with specified chemical property profiles. In terms of generative modeling, the goal is to learn to sample molecules in the intersection of multiple property constraints. This task becomes increasingly challenging when there are many property constraints. We propose to offset this complexity by composing molecules from a vocabulary of substructures that we call molecular rationales. These rationales are identified from molecules as substructures that are likely responsible for each property of interest. We then learn to expand rationales into a full molecule using graph generative models. Our final generative model composes molecules as mixtures of multiple rationale completions, and this mixture is fine-tuned to preserve the properties of interest. We evaluate our model on various drug design tasks and demonstrate significant improvements over state-of-the-art baselines in terms of accuracy, diversity, and novelty of generated compounds.
Multi-task Reinforcement Learning with a Planning Quasi-Metric
Micheli, Vincent, Sinnathamby, Karthigan, Fleuret, François
We introduce a new reinforcement learning approach combining a planning quasi-metric (PQM) that estimates the number of actions required to go from a state to another, with task-specific planners that compute a target state to reach a given goal. The main advantage of this decomposition is to allow the sharing across tasks of a task-agnostic model of the quasi-metric that captures the environment's dynamics and can be learned in a dense and unsupervised manner. We demonstrate the usefulness of this approach on the standard bit-flip problem and in the MuJoCo robotic arm simulator.