Goto

Collaborating Authors

 Larocque, Denis


Improving the generalizability and robustness of large-scale traffic signal control

arXiv.org Artificial Intelligence

A number of deep reinforcement-learning (RL) approaches propose to control traffic signals. In this work, we study the robustness of such methods along two axes. First, sensor failures and GPS occlusions create missing-data challenges and we show that recent methods remain brittle in the face of these missing data. Second, we provide a more systematic study of the generalization ability of RL methods to new networks with different traffic regimes. Again, we identify the limitations of recent approaches. We then propose using a combination of distributional and vanilla reinforcement learning through a policy ensemble. Building upon the state-of-the-art previous model which uses a decentralized approach for large-scale traffic signal control with graph convolutional networks (GCNs), we first learn models using a distributional reinforcement learning (DisRL) approach. In particular, we use implicit quantile networks (IQN) to model the state-action return distribution with quantile regression. For traffic signal control problems, an ensemble of standard RL and DisRL yields superior performance across different scenarios, including different levels of missing sensor data and traffic flow patterns. Furthermore, the learning scheme of the resulting model can improve zero-shot transferability to different road network structures, including both synthetic networks and real-world networks (e.g., Luxembourg, Manhattan). We conduct extensive experiments to compare our approach to multi-agent reinforcement learning and traditional transportation approaches. Results show that the proposed method improves robustness and generalizability in the face of missing data, varying road networks, and traffic flows.


Covariance regression with random forests

arXiv.org Machine Learning

Capturing the conditional covariances or correlations among the elements of a multivariate response vector based on covariates is important to various fields including neuroscience, epidemiology and biomedicine. We propose a new method called Covariance Regression with Random Forests (CovRegRF) to estimate the covariance matrix of a multivariate response given a set of covariates, using a random forest framework. Random forest trees are built with a splitting rule specially designed to maximize the difference between the sample covariance matrix estimates of the child nodes. We also propose a significance test for the partial effect of a subset of covariates. We evaluate the performance of the proposed method and significance test through a simulation study which shows that the proposed method provides accurate covariance matrix estimates and that the Type-1 error is well controlled. An application of the proposed method to thyroid disease data is also presented. CovRegRF is implemented in a freely available R package on CRAN.


RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests

arXiv.org Machine Learning

Like many predictive models, random forests provide a point prediction for a new observation. Besides the point prediction, it is important to quantify the uncertainty in the prediction. Prediction intervals provide information about the reliability of the point predictions. We have developed a comprehensive R package, RFpredInterval, that integrates 16 methods to build prediction intervals with random forests and boosted forests. The methods implemented in the package are a new method to build prediction intervals with boosted forests (PIBF) and 15 different variants to produce prediction intervals with random forests proposed by Roy and Larocque (2020). We perform an extensive simulation study and apply real data analyses to compare the performance of the proposed method to ten existing methods to build prediction intervals with random forests. The results show that the proposed method is very competitive and, globally, it outperforms the competing methods.


Conditional canonical correlation estimation based on covariates with random forests

arXiv.org Machine Learning

Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender, or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data.


Ensemble Methods for Survival Data with Time-Varying Covariates

arXiv.org Machine Learning

Survival data with time-varying covariates are common in practice. However, the traditional survival forests - conditional inference forest, relative risk forest and random survival forest - have accommodated only time-invariant covariates. Similarly, the recently proposed transformation forest, which incorporates the split statistics suitable for non-proportional hazard settings, has employed only time-invariant covariates. We generalize the conditional inference and relative risk forests to allow time-varying covariates. We compare their performance with that of the Cox model and transformation forest, adapted to accommodate time-varying covariates, through a comprehensive simulation study in which the Kaplan-Meier estimate serves as a benchmark. In general, the performance of the two proposed forests substantially improves over the Kaplan-Meier estimate when the estimation conditions become more favorable. Taking into an account all other factors, under the PH setting, the best method is always one of the two proposed forests, while under the non-PH setting, it is the adapted transformation forest. The K-fold cross-validation can be an effective tool to choose between the methods in practice. Finally, the performance of the proposed forest methods for time-invariant covariate data is broadly similar to that found for time-varying covariate data. We also propose a general framework for estimation of a survival function in the presence of time-varying covariates, which can be applied to any method that uses the counting process (pseudo-subject) approach to handling time-varying covariates. This novel estimate of a single survival function takes multiple survival estimation outputs corresponding to each pseudo-subject, and combines them in a theoretically-justified way to form a proper monotone-decreasing survival function estimate.