
Comparison of Deep learning models on time series forecasting: a case study of Dissolved Oxygen Prediction

arXiv.org Machine Learning

Deep learning has recently achieved impressive prediction performance in sequence learning. Dissolved oxygen prediction, a kind of time-series forecasting, is well suited to this technique. Although many researchers have developed hybrid or variant models based on deep learning, there is currently no comprehensive and sound comparison of deep learning models in this field. Moreover, most previous studies focused on one-step forecasting using small data sets. Given convenient access to high-frequency data, this paper compares multi-step deep learning forecasts using walk-forward validation. Specifically, we test a Convolutional Neural Network (CNN), a Temporal Convolutional Network (TCN), Long Short-Term Memory (LSTM), a Gated Recurrent Unit (GRU), and a Bidirectional Recurrent Neural Network (BiRNN) on real-time data recorded automatically at a fixed observation point on the Yangtze River from 2012 to 2016. By comparing the average accumulated root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination at each time step, we find that for multi-step time series forecasting, the average performance at each time step does not decrease linearly. GRU outperforms the other models by a significant margin.
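Walk-forward validation with per-step error metrics can be sketched as follows. This is a minimal illustration, assuming a univariate series and a forecaster that maps the visible history to a list of `horizon` future values. The persistence forecaster `persistence_forecaster` is a hypothetical stand-in for the CNN, TCN, LSTM, GRU, or BiRNN models, and the paper's exact aggregation of the "average accumulated" metrics may differ from the plain per-step averages computed here.

```python
import math
from typing import Callable, Dict, List, Sequence

# A forecaster maps the observed history to `horizon` future values.
Forecaster = Callable[[Sequence[float], int], List[float]]


def persistence_forecaster(history: Sequence[float], horizon: int) -> List[float]:
    """Hypothetical baseline: repeat the last observed value for every step."""
    return [history[-1]] * horizon


def walk_forward_metrics(series: Sequence[float], forecaster: Forecaster,
                         initial_train: int, horizon: int) -> List[Dict[str, float]]:
    """Move the forecast origin forward one step at a time and score each horizon step."""
    errors: List[List[float]] = [[] for _ in range(horizon)]
    actuals: List[List[float]] = [[] for _ in range(horizon)]
    for origin in range(initial_train, len(series) - horizon + 1):
        # Only data before the origin is visible to the forecaster (no look-ahead).
        forecast = forecaster(series[:origin], horizon)
        for h in range(horizon):
            actual = series[origin + h]
            errors[h].append(actual - forecast[h])
            actuals[h].append(actual)

    results = []
    for h in range(horizon):
        e, y = errors[h], actuals[h]
        n = len(e)
        ss_res = sum(v * v for v in e)
        mean_y = sum(y) / n
        ss_tot = sum((v - mean_y) ** 2 for v in y)
        results.append({
            "step": h + 1,
            "rmse": math.sqrt(ss_res / n),
            "mae": sum(abs(v) for v in e) / n,
            # Coefficient of determination; undefined for a constant target.
            "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        })
    return results


# On a linearly increasing series, the persistence error at step s is exactly s.
res = walk_forward_metrics([float(i) for i in range(50)], persistence_forecaster,
                           initial_train=20, horizon=3)
print([r["rmse"] for r in res])  # [1.0, 2.0, 3.0]
```

Swapping in a trained model only requires another callable with the same signature; the validation loop itself stays unchanged.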


Influence-aware Memory for Deep Reinforcement Learning

arXiv.org Machine Learning

Making the right decisions when some of the state variables are hidden involves reasoning about all the possible states of the environment. An agent receiving only partial observations needs to infer the true values of these hidden variables from its history of experiences. Recent deep reinforcement learning methods use recurrent models to keep track of past information. However, these models are sometimes expensive to train and have convergence difficulties, especially with high-dimensional input spaces. Taking inspiration from influence-based abstraction, we show that effective policies can be learned in the presence of uncertainty by memorizing only a small subset of the input variables. We also incorporate a mechanism in our network that learns to automatically choose the important pieces of information that need to be remembered. The results indicate that, by forcing the agent's internal memory to focus on the selected regions while treating the rest of the observable variables as Markovian, we can outperform ordinary recurrent architectures in situations where the information the agent needs to retain is a small fraction of the entire observation input. The method also reduces training time and obtains better scores than methods that stack multiple observations to remove partial observability in domains where long-term memory is required.
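The central idea, memorizing only a subset of the input variables while treating the rest as Markovian, can be illustrated with a forward-pass sketch. This is not the authors' architecture. The class `InfluenceAwareMemorySketch`, the vanilla tanh RNN cell, the random untrained weights, and the hand-picked `memory_indices` are all assumptions. In particular, the paper's learned mechanism for choosing what to remember is replaced here by a fixed index list.

```python
import numpy as np


class InfluenceAwareMemorySketch:
    """Hypothetical forward pass: recurrent memory over a fixed subset of inputs,
    plus a memoryless (Markovian) branch over the full observation."""

    def __init__(self, obs_dim: int, memory_indices, hidden_dim: int,
                 ff_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.memory_indices = np.asarray(memory_indices, dtype=int)
        n_mem = len(self.memory_indices)
        # Random weights stand in for trained parameters (assumption).
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, n_mem))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        self.W_ff = rng.normal(scale=0.1, size=(ff_dim, obs_dim))
        self.h = np.zeros(hidden_dim)

    def reset(self) -> None:
        """Clear the memory at the start of an episode."""
        self.h = np.zeros_like(self.h)

    def step(self, obs) -> np.ndarray:
        obs = np.asarray(obs, dtype=float)
        # Only the selected variables enter the recurrent memory.
        self.h = np.tanh(self.W_in @ obs[self.memory_indices] + self.W_h @ self.h)
        # The full observation is processed without memory (treated as Markovian).
        ff = np.maximum(0.0, self.W_ff @ obs)
        # A policy or value head would consume this concatenated feature vector.
        return np.concatenate([self.h, ff])
```

Because the recurrent state depends only on `memory_indices`, changing any other observation variable leaves the memory untouched, which is what keeps the recurrent part small when the observation is high-dimensional.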


Imputing missing values with unsupervised random trees

arXiv.org Machine Learning

When designing statistical models from tabular data for supervised learning tasks such as regression or classification, it often happens that some of the observations available for fitting such models have missing values in one or more variables, usually because of poor data collection practices, loss of information, participants dropping out of a survey, or similar causes. Many methods, such as [2] or [4], overcome this issue by using heuristics to handle missing information. Decision tree methods in particular, because their splits consider one variable at a time, are well suited to handling missing data implicitly without a priori imputation ([16]). Other methods, such as generalized linear models or support vector machines, cannot handle missing values in the same way; when they are applied to a dataset with missing entries, those entries must be either dropped or imputed. Typical strategies for imputing the missing entries include: replacing them with the column mean or median; determining the most similar observations (nearest neighbors) according to the non-missing variables and taking a simple or weighted average of the missing variable(s) from them ([11]); producing a latent representation of the data by a low-rank matrix factorization that minimizes errors on the non-missing entries, from which the missing entries are then reconstructed ([10]); and iterative imputation, which starts with some basic imputation for all values and then cycles through each variable, fitting a model that predicts its missing values from the non-missing observations, replacing the earlier imputation with the model prediction, and repeating until convergence ([3], [18]).
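The iterative strategy described last can be sketched in a few lines of NumPy. This is a minimal illustration, not the method of [3] or [18]: it assumes numeric columns, each with at least one observed entry, initializes with column means (the first strategy in the list), and uses ordinary least squares with an intercept as the per-variable model. The name `iterative_impute` is hypothetical.

```python
import numpy as np


def iterative_impute(X, max_iter: int = 10, tol: float = 1e-6) -> np.ndarray:
    """Fill NaNs by cycling least-squares regressions over the columns.

    Assumes every column has at least one observed (non-NaN) entry.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Basic initial imputation: column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        previous = filled.copy()
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            # Predict column j from all other (currently imputed) columns.
            design = np.column_stack([np.ones(len(filled)), np.delete(filled, j, axis=1)])
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            # Replace the earlier imputation with the model prediction.
            filled[rows, j] = design[rows] @ coef
        if np.max(np.abs(filled - previous)) < tol:
            break  # imputations stopped changing
    return filled


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, float("nan")]]
print(iterative_impute(X)[3, 1])  # close to 8.0, since column 1 is twice column 0
```

Observed entries are never overwritten; only the cells flagged as missing are re-estimated on each pass, and the least-squares model could be swapped for any regressor.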


LMLFM: Longitudinal Multi-Level Factorization Machine

arXiv.org Machine Learning

We consider the problem of learning predictive models from longitudinal data, consisting of irregularly repeated, sparse observations from a set of individuals over time. Such data often exhibit longitudinal correlation (LC) (correlations among observations for each individual over time), cluster correlation (CC) (correlations among individuals that have similar characteristics), or both. These correlations are often accounted for using mixed effects models that include fixed effects and random effects, where the fixed effects capture the regression parameters shared by all individuals, whereas the random effects capture parameters that vary across individuals. However, current state-of-the-art methods are unable to select the most predictive fixed and random effects from a large number of variables while accounting for complex correlation structure in the data and non-linear interactions among the variables. We propose the Longitudinal Multi-Level Factorization Machine (LMLFM), to the best of our knowledge the first model to address these challenges in learning predictive models from longitudinal data. We establish the convergence properties of LMLFM and analyze its computational complexity. We present results of experiments with both simulated and real-world longitudinal data which show that LMLFM outperforms the state-of-the-art methods in terms of predictive accuracy, variable selection ability, and scalability to data with a large number of variables. The code and supplemental material are available at https://github.com/junjieliang672/LMLFM.
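LMLFM builds on factorization machines. As background only, a plain second-order factorization machine scores a feature vector with a global bias, linear weights, and pairwise interactions whose weights are inner products of k-dimensional factor vectors; a standard algebraic identity evaluates the pairwise term in O(kn) time instead of O(kn^2). The sketch below does not include LMLFM's multi-level fixed and random effects or its variable selection, and the function names are hypothetical.

```python
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Identity: sum_{i<j} a_i a_j = ((sum_i a_i)^2 - sum_i a_i^2) / 2,
        # applied per factor f with a_i = V[i][f] * x[i].
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V) -> float:
    """Same model with the explicit double loop over feature pairs, for checking."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


x, w0, w = [1.0, 2.0, 3.0], 0.5, [1.0, -1.0, 2.0]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))  # 14.5 14.5
```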


Persistent Homology as Stopping-Criterion for Natural Neighbor Interpolation

arXiv.org Machine Learning

In this study, the method of natural neighbours is used to interpolate data drawn from a topological space with higher homology groups on its filtration. Boundary points pose a particular difficulty for this algorithm. Its core is based on the Voronoi diagram, which induces a natural dual map to the Delaunay triangulation. This fact is exploited by computing the persistent homology after each iteration to capture the changing topology of the data. The bottleneck and Wasserstein distances between the original data and the interpolation serve as measures of quality. If the norm of the two distances exceeds a heuristically determined threshold, the algorithm terminates. The theoretical basis for this procedure is given, and the validity of the approach is justified with numerical experiments.
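The termination rule can be sketched as a generic loop. This is an illustration under stated assumptions, not the paper's procedure: `refine` (one interpolation step) and `diagram_distances` (which would compute persistence diagrams of both point sets and return their bottleneck and Wasserstein distances, for example with a topological data analysis library) are hypothetical callables supplied by the caller. The "norm of the two distances" is assumed to be the Euclidean norm, and returning the last iterate that stayed under the threshold is a design choice. The toy callables in the usage lines only exercise the control flow.

```python
import math
from typing import Callable, Tuple, TypeVar

PointSet = TypeVar("PointSet")


def interpolate_with_topological_stop(
    original: PointSet,
    refine: Callable[[PointSet], PointSet],
    diagram_distances: Callable[[PointSet, PointSet], Tuple[float, float]],
    threshold: float,
    max_iter: int = 100,
) -> Tuple[PointSet, int]:
    """Refine until the (bottleneck, Wasserstein) distance vector exceeds `threshold`.

    Returns the last accepted point set and the number of accepted iterations.
    """
    current = original
    for iteration in range(max_iter):
        candidate = refine(current)  # hypothetical: one round of interpolation
        bottleneck, wasserstein = diagram_distances(original, candidate)
        # Assumed norm: Euclidean norm of the two distances.
        if math.hypot(bottleneck, wasserstein) > threshold:
            return current, iteration
        current = candidate
    return current, max_iter


# Toy stand-ins: each step adds one point; the "distance" is the number added.
def toy_refine(points):
    return points + [points[-1] + 0.5]


def toy_distances(original, candidate):
    return float(len(candidate) - len(original)), 0.0


result, accepted = interpolate_with_topological_stop([0.0, 1.0], toy_refine,
                                                     toy_distances, threshold=2.5)
print(len(result), accepted)  # 4 2
```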


Deep convolutional neural networks for multi-scale time-series classification and application to disruption prediction in fusion devices

arXiv.org Machine Learning

R.M. Churchill, Theory Department, Princeton Plasma Physics Laboratory, 100 Stellarator Road, Princeton, NJ 08540, USA (rchurchi@pppl.gov), and the DIII-D team, General Atomics, P.O. Box 85608, San Diego, California 92186, USA

Abstract

The multi-scale, multi-physics nature of fusion plasmas makes predicting plasma events challenging. Recent advances in deep convolutional neural network (CNN) architectures utilizing dilated convolutions enable accurate predictions on sequences with long-range, multi-scale characteristics, such as the time series generated by diagnostic instruments observing fusion plasmas. Here we apply this neural network architecture to the popular problem of disruption prediction in fusion tokamaks, using raw data from a single diagnostic, the Electron Cyclotron Emission imaging (ECEi) diagnostic on the DIII-D tokamak. ECEi measures a fundamental plasma quantity (electron temperature) with high temporal resolution over the entire plasma discharge, making it sensitive to a number of potential pre-disruption markers with different temporal and spatial scales. Promising initial disruption prediction results are obtained by training a deep CNN with a large receptive field (~30k), achieving an F1-score of 91% on individual time slices using only the ECEi data.

1 Introduction

Plasma phenomena contain a wide range of temporal and spatial scales, often exhibiting multi-scale characteristics (see Figure 1).
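Why dilated convolutions reach a receptive field of tens of thousands of samples can be checked with a short calculation: in a stack of dilated 1-D convolutions with kernel size k, a layer with dilation d widens the receptive field by (k - 1) * d samples. The sketch below assumes a single stack with one convolution per layer and dilations 1, 2, 4, and so on; the actual architecture used for the ECEi data (number of stacks, kernel size, convolutions per block) may differ, and the function names are hypothetical.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in samples) of stacked dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def layers_to_reach(kernel_size: int, target: int) -> int:
    """Smallest number of layers with dilations 1, 2, 4, ... covering `target` samples."""
    if kernel_size < 2:
        raise ValueError("kernel_size must be at least 2 for the field to grow")
    layers = 0
    while receptive_field(kernel_size, [2 ** i for i in range(layers)]) < target:
        layers += 1
    return layers


print(receptive_field(2, [1, 2, 4, 8]))  # 16
print(layers_to_reach(2, 30_000))  # 15, since 2**15 = 32768 >= 30000
```

With doubling dilations the receptive field grows exponentially in depth, so roughly 15 layers of kernel size 2 already cover about 30k samples, whereas undilated convolutions would need about 30k layers.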


Traffic4cast-Traffic Map Movie Forecasting -- Team MIE-Lab

arXiv.org Machine Learning

The recorded traffic was aggregated into 100 m × 100 m bins and made available as three-channel images. In these images, the first channel depicts the traffic volume in each cell, the second the average speed of vehicles, and the third the majority direction of the vehicles (as one of four cardinal directions). The data spanned a whole year in 5-minute intervals; certain days were withheld from the training data, to be predicted and uploaded to the traffic4cast servers, which then assessed the quality of the prediction. The prediction itself consisted of "three images into the future" (spanning a 15-minute interval), based on the previous hour (12 images). Given this problem formalization, our efforts mostly focused on applying well-known image processing algorithms, though we also explored various simple baselines, neural networks that take spatiotemporal context into account, and more complex network architectures that should be able to exploit the fact that the data originate from probes moving on a known graph. Ultimately, we did not manage to outperform the "simple" application of a widely used image processing algorithm, which might be a hint that either much more research on networks targeted specifically at this problem or a different formulation of the problem altogether is required.
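The sample construction described above (12 input frames predicting the next 3) can be sketched as a sliding window over the frame sequence. This assumes the frames of one contiguous period are stored as a NumPy array of shape (time, height, width, 3) with at least 15 frames; the function names and the repeat-last-frame baseline are illustrative assumptions, not the team's code or the competition's data loader.

```python
import numpy as np

FRAMES_PER_DAY = 24 * 60 // 5  # 288 five-minute frames per day


def make_windows(frames: np.ndarray, n_in: int = 12, n_out: int = 3, stride: int = 1):
    """Split a (time, height, width, channels) array into input/target windows.

    Each sample uses `n_in` consecutive frames (one hour at 5-minute resolution)
    to predict the next `n_out` frames (15 minutes).
    """
    starts = range(0, frames.shape[0] - n_in - n_out + 1, stride)
    X = np.stack([frames[s:s + n_in] for s in starts])
    Y = np.stack([frames[s + n_in:s + n_in + n_out] for s in starts])
    return X, Y


def repeat_last_frame(X: np.ndarray, n_out: int = 3) -> np.ndarray:
    """A simple baseline: predict that the last observed frame persists."""
    return np.repeat(X[:, -1:], n_out, axis=1)


frames = np.arange(20 * 4 * 4 * 3, dtype=float).reshape(20, 4, 4, 3)  # toy data
X, Y = make_windows(frames)
print(X.shape, Y.shape)  # (6, 12, 4, 4, 3) (6, 3, 4, 4, 3)
```

A baseline such as `repeat_last_frame` produces predictions with the same shape as `Y`, so any candidate model can be scored against it with the same error function.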


Teaching Perception

arXiv.org Artificial Intelligence

Jonathan H. Connell

Abstract

The visual world is very rich and generally too complex to perceive in its entirety. Yet only certain features are typically required to adequately perform some task in a given situation. Rather than hard-wiring decisions about when and what to sense, this paper describes a robotic system whose behavioral policy can be set by the verbal instructions it receives. These capabilities are demonstrated in an associated video [1] showing the fully implemented system guiding the perception of a physical robot in a simple scenario. The structure and functioning of the underlying natural-language-based symbolic reasoning system are also discussed.

I. INTRODUCTION

Sensing is not without costs. For any given object there are many things that can be known about it. What constitutes a reasonable amount of information to obtain? For instance, to identify an object in a scene a robot could run a DNN recognizer. But, depending on the resources available, this may take a noticeable amount of time. And, while some recognizers have N-ary outputs, others are designed as one-versus-all. In the latter case, to classify an object a robot might have to run N separate nets.


Learning Feature Interactions with Lorentzian Factorization Machine

arXiv.org Artificial Intelligence

Learning representations for feature interactions to model user behaviors is critical for recommendation systems and click-through rate (CTR) prediction. Recent advances in this area are driven by deep learning methods, which can learn sophisticated feature interactions and achieve state-of-the-art results in an end-to-end manner. These approaches require a large number of training parameters integrated with the low-level representations, and are thus memory- and computationally inefficient. In this paper, we propose a new model named "LorentzFM" that learns feature interactions embedded in a hyperbolic space in which Lorentz distances may violate the triangle inequality. The learned representation thereby benefits from the peculiar geometric properties of hyperbolic triangles, resulting in a significant reduction in the number of parameters (20% to 80%) because none of the top deep learning layers are required. With such a lightweight architecture, LorentzFM achieves comparable and even materially better results than deep learning methods such as DeepFM, xDeepFM and Deep & Cross in both recommendation and CTR prediction tasks.
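The claim that Lorentz distances can violate the triangle inequality can be checked numerically with the squared Lorentzian distance on the hyperboloid {x : <x, x>_L = -beta, x_0 > 0}, where <x, y>_L = -x_0 y_0 + sum_i x_i y_i. This is one common choice of Lorentz distance; whether LorentzFM uses exactly this form, and how it turns distances into interaction scores, is not reproduced here. The function names and the lifting of Euclidean vectors onto the hyperboloid are illustrative assumptions.

```python
import math
from typing import List, Sequence


def lift_to_hyperboloid(v: Sequence[float], beta: float = 1.0) -> List[float]:
    """Map a Euclidean vector v to x = (sqrt(beta + |v|^2), v), so <x, x>_L = -beta."""
    return [math.sqrt(beta + sum(c * c for c in v))] + list(v)


def lorentz_inner(x: Sequence[float], y: Sequence[float]) -> float:
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def squared_lorentz_distance(x, y, beta: float = 1.0) -> float:
    """<x - y, x - y>_L, which equals -2*beta - 2*<x, y>_L on the hyperboloid."""
    return -2.0 * beta - 2.0 * lorentz_inner(x, y)


a, b, c = (lift_to_hyperboloid([t]) for t in (0.0, 1.0, 2.0))
d_ab = squared_lorentz_distance(a, b)  # about 0.83
d_bc = squared_lorentz_distance(b, c)  # about 0.32
d_ac = squared_lorentz_distance(a, c)  # about 2.47
print(d_ac > d_ab + d_bc)  # True: the triangle inequality is violated
```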


Facility Location Problem with Capacity Constraints: Algorithmic and Mechanism Design Perspectives

arXiv.org Artificial Intelligence

We consider, from the algorithmic and mechanism design perspectives, the facility location problem in the one-dimensional setting where each facility can serve a limited number of agents. From the algorithmic perspective, we prove that the corresponding optimization problem, where the goal is to locate facilities to minimize either the total cost to all agents or the maximum cost of any agent, is NP-hard. However, we show that the problem is fixed-parameter tractable, and the optimal solution can be computed in polynomial time whenever the number of facilities is bounded, or when all facilities have identical capacities. We then consider the problem from a mechanism design perspective, where the agents are strategic and need not reveal their true locations. We show that several natural mechanisms studied in the uncapacitated setting either lose strategyproofness or lose a bound on the solution quality for the total or maximum cost objective. We then propose new mechanisms that are strategyproof and achieve approximation guarantees that almost match the lower bounds.
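For the total-cost objective with identical capacities, a polynomial-time dynamic program follows from a standard exchange argument: on a line, an optimal assignment can be taken to serve contiguous blocks of the sorted agents (swapping two crossed assignments never increases total distance and keeps each facility's load), and each facility can sit at a median of its block. The sketch below follows that reasoning, assuming facilities may be placed anywhere on the line, including at the same point. It is a textbook-style illustration with a hypothetical name, not necessarily the algorithm the paper proposes, and it does not cover the maximum-cost objective, unequal capacities, or the mechanism design results.

```python
from typing import Sequence


def min_total_cost_identical_capacities(agents: Sequence[float], k: int,
                                        capacity: int) -> float:
    """Minimum total distance when k facilities, each serving at most `capacity`
    agents, are placed on a line (total-cost objective)."""
    xs = sorted(agents)
    n = len(xs)
    if n > k * capacity:
        raise ValueError("not enough total capacity to serve every agent")

    def group_cost(a: int, b: int) -> float:
        # Cost of one facility at a median of the sorted block xs[a:b].
        median = xs[a + (b - a) // 2]
        return sum(abs(x - median) for x in xs[a:b])

    INF = float("inf")
    # best[j][i]: cheapest way to serve the first i sorted agents with j facilities.
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    for j in range(k + 1):
        best[j][0] = 0.0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            best[j][i] = best[j - 1][i]  # facility j may serve nobody
            for size in range(1, min(capacity, i) + 1):
                candidate = best[j - 1][i - size] + group_cost(i - size, i)
                best[j][i] = min(best[j][i], candidate)
    return best[k][n]


agents = [12, 0, 11, 1, 10, 2]
print(min_total_cost_identical_capacities(agents, k=2, capacity=3))  # 4.0
```

With capacity 3 the two facilities sit at 1 and 11; tightening the capacity to 2 (with three facilities) forces a block that straddles the gap, which is exactly the kind of effect the capacity constraints introduce.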