data set 2
Statistical Study of Sensor Data and Investigation of ML-based Calibration Algorithms for Inexpensive Sensor Modules: Experiments from Cape Point
Barrett, Travis, Mishra, Amit Kumar
In this paper we present the statistical analysis of data from inexpensive sensors. We also present the performance of machine learning algorithms when used for automatic calibration such sensors. In this we have used low-cost Non-Dispersive Infrared CO$_2$ sensor placed at a co-located site at Cape Point, South Africa (maintained by Weather South Africa). The collected low-cost sensor data and site truth data are investigated and compared. We compare and investigate the performance of Random Forest Regression, Support Vector Regression, 1D Convolutional Neural Network and 1D-CNN Long Short-Term Memory Network models as a method for automatic calibration and the statistical properties of these model predictions. In addition, we also investigate the drift in performance of these algorithms with time.
Encoding Temporal Statistical-space Priors via Augmented Representation
Choi, Insu, Koh, Woosung, Kang, Gimin, Jang, Yuntae, Kim, Woo Chang
Modeling time series data remains a pervasive issue as the temporal dimension is inherent to numerous domains. Despite significant strides in time series forecasting, high noise-to-signal ratio, non-normality, non-stationarity, and lack of data continue challenging practitioners. In response, we leverage a simple representation augmentation technique to overcome these challenges. Our augmented representation acts as a statistical-space prior encoded at each time step. In response, we name our method Statistical-space Augmented Representation (SSAR). The underlying high-dimensional data-generating process inspires our representation augmentation. We rigorously examine the empirical generalization performance on two data sets with two downstream temporal learning algorithms. Our approach significantly beats all five up-to-date baselines. Moreover, the highly modular nature of our approach can easily be applied to various settings. Lastly, fully-fledged theoretical perspectives are available throughout the writing for a clear and rigorous understanding.
Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series
Koh, Woosung, Choi, Insu, Jang, Yuntae, Kang, Gimin, Kim, Woo Chang
Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on leveraging these ideas on control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum learning via data augmentation, while imitation learning is implemented via policy distillation from an oracle. Our findings reveal that curriculum learning should be considered a novel direction in improving control-task performance over complex time-series. Our ample random-seed out-sample empirics and ablation studies are highly encouraging for curriculum learning for time-series control. These findings are especially encouraging as we tune all overlapping hyperparameters on the baseline -- giving an advantage to the baseline. On the other hand, we find that imitation learning should be used with caution.
Machine Learning Algorithms to Predict Chess960 Result and Develop Opening Themes
Deo, Shreyan, Dwivedi, Nishchal
This work focuses on the analysis of Chess 960, also known as Fischer Random Chess, a variant of traditional chess where the starting positions of the pieces are randomized. The study aims to predict the game outcome using machine learning techniques and develop an opening theme for each starting position. The first part of the analysis utilizes machine learning models to predict the game result based on certain moves in each position. The methodology involves segregating raw data from .pgn files into usable formats and creating datasets comprising approximately 500 games for each starting position. Three machine learning algorithms -- KNN Clustering, Random Forest, and Gradient Boosted Trees -- have been used to predict the game outcome. To establish an opening theme, the board is divided into five regions: center, white kingside, white queenside, black kingside, and black queenside. The data from games played by top engines in all 960 positions is used to track the movement of pieces in the opening. By analysing the change in the number of pieces in each region at specific moves, the report predicts the region towards which the game is developing. These models provide valuable insights into predicting game outcomes and understanding the opening theme in Chess 960.