Materials
Seabed-Mining Robots Will Dig for Gold in Hydrothermal Vents
For decades, futurists have predicted that commercial miners would one day tap the unimaginable mineral wealth of the world's ocean floor. Soon, that subsea gold rush could finally begin: The world's first deep-sea mining robots are poised to rip into rich deposits of copper, gold, and silver 1,600 meters down at the bottom of the Bismarck Sea, near Papua New Guinea. The massive machines, which are to be tested sometime in 2016, are part of a high-stakes gamble for the Toronto-based mining company Nautilus Minerals. Nautilus's machines have been ready to go since 2012, when a dispute between the firm and the Papua New Guinean government stalled the project. What broke the impasse was the company's offer, in 2014, to provide Papua New Guinea with certain intellectual property from the mining project.
Chevron: Gorgon LNG, Mission Accomplished
During Chevron Corporation's (NYSE:CVX) Security Analyst meeting on March 8, several big pieces of news came out. A day before the meeting, Chevron issued a press release stating that its 54 billion USD Gorgon LNG facility in Australia had just started producing LNG (liquefied natural gas) and condensate. Originally estimated to be operational by the end of 2014 for under 30 billion USD, the project was delayed as costs skyrocketed. As the operator with a 47.3% stake, Chevron lost a lot of credibility due to the massive cost of its mishaps, as did its partners ExxonMobil (NYSE:XOM) and Royal Dutch Shell (NYSE:RDS.A) (NYSE:RDS.B), who each own 25% of the venture. The first cargo of LNG is expected to be shipped out very soon, potentially marking the beginning of a strong source of growth after all the headaches it took to get here.
Machine olfaction using time scattering of sensor multiresolution graphs
Gugel, Leonid, Shkolnisky, Yoel, Dekel, Shai
In this paper we construct a learning architecture for high dimensional time series sampled by sensor arrangements. Using a redundant wavelet decomposition on a graph constructed over the sensor locations, our algorithm is able to construct discriminative features that exploit the mutual information between the sensors. The algorithm then applies scattering networks to the time series graphs to create the feature space. We demonstrate our method on a machine olfaction problem, where one needs to classify the gas type and the location where it originates from data sampled by an array of sensors. Our experimental results clearly demonstrate that our method outperforms classical machine learning techniques used in previous studies.
Optimizing Energy Costs in a Zinc and Lead Mine
Kinsella, Alan (Boliden Tara Mines Ltd.) | Smeaton, Alan F. (Insight Centre for Data Analytics) | Hurley, Barry (Insight Centre for Data Analytics) | O'Sullivan, Barry (Insight Centre for Data Analytics) | Simonis, Helmut
Boliden Tara Mines Ltd. consumed 184.7 GWh of electricity in 2014, equating to over 1% of the national demand of Ireland or approximately 35,000 homes. Ireland's industrial electricity prices, at an average of 13 c/kWh in 2014, are amongst the most expensive in Europe. Cost effective electricity procurement is ever more pressing for businesses to remain competitive. In parallel, the proliferation of intelligent devices has led to the industrial Internet of Things paradigm becoming mainstream. As more and more devices become equipped with network connectivity, smart metering is fast becoming a means of giving energy users access to a rich array of consumption data. These modern sensor networks have facilitated the development of applications to process, analyse, and react to continuous data streams in real-time. Subsequently, future procurement and consumption decisions can be informed by a highly detailed evaluation of energy usage. With these considerations in mind, this paper uses variable energy prices from Ireland's Single Electricity Market, along with smart meter sensor data, to simulate the scheduling of an industrial-sized underground pump station in Tara Mines. The objective is to reduce the overall energy costs whilst still functioning within the system's operational constraints. An evaluation using real-world electricity prices and detailed sensor data for 2014 demonstrates significant savings of up to 10.72% over the year compared to the existing control systems.
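To make the scheduling idea concrete, here is a minimal sketch of price-aware pump scheduling as a small dynamic program. It is not the method used in the paper: the single pump, the integer sump levels, the constant inflow, and all names (schedule_pump, inflow, pump_rate, capacity) are hypothetical simplifications chosen for illustration.

```python
from functools import lru_cache


def schedule_pump(prices, inflow, pump_rate, capacity, start_level, power_kw=1.0):
    """Return (min_cost, plan) for an hourly on/off pump schedule.

    Hypothetical toy model: the sump gains `inflow` units of water per hour,
    a running pump removes `pump_rate` units per hour, and the level must
    never exceed `capacity`. `prices[t]` is the energy price in hour t.
    Returns None if no feasible schedule exists.
    """
    horizon = len(prices)

    @lru_cache(maxsize=None)
    def best(t, level):
        if t == horizon:
            return 0.0, ()
        options = []
        for on in (False, True):
            new_level = level + inflow - (pump_rate if on else 0)
            new_level = max(new_level, 0)  # the pump cannot drain below empty
            if new_level > capacity:
                continue  # the sump would overflow: infeasible choice
            tail = best(t + 1, new_level)
            if tail is None:
                continue  # no feasible continuation from this state
            cost = (prices[t] * power_kw if on else 0.0) + tail[0]
            options.append((cost, (on,) + tail[1]))
        return min(options) if options else None

    return best(0, start_level)


if __name__ == "__main__":
    cost, plan = schedule_pump([10, 2, 8, 1], inflow=1, pump_rate=2,
                               capacity=2, start_level=1)
    print(cost, plan)
```

In this toy instance the cheapest feasible plan pumps only in the two low-price hours. A realistic model would add multiple pumps, minimum run times, and forecast uncertainty in both prices and inflow.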
A Universal Catalyst for First-Order Optimization
Lin, Hongzhou, Mairal, Julien, Harchaoui, Zaid
We introduce a generic scheme for accelerating first-order optimization methods in the sense of Nesterov, which builds upon a new analysis of the accelerated proximal point algorithm. Our approach consists of minimizing a convex objective by approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. This strategy applies to a large class of algorithms, including gradient descent, block coordinate descent, SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. In addition to theoretical speed-up, we also show that acceleration is useful in practice, especially for ill-conditioned problems where we measure significant improvements.
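The outer/inner structure described above can be sketched as follows. This is a hedged illustration, not the authors' reference implementation: it assumes a strongly convex objective with known constants mu and L, uses plain gradient descent with a fixed number of inner steps in place of the paper's stopping criterion, and uses a Nesterov-style extrapolation recursion that is assumed here rather than quoted from the abstract. The function name catalyst and all parameters are hypothetical.

```python
import math


def catalyst(grad, x0, mu, L, kappa, outer_iters=100, inner_iters=20):
    """Accelerated inexact proximal point loop around gradient descent.

    grad: function returning the gradient of f as a list of floats.
    Each outer step approximately minimizes f(z) + kappa/2 * ||z - y||^2.
    """
    q = mu / (mu + kappa)
    alpha = math.sqrt(q)          # assumed initial value for strong convexity
    x = list(x0)
    y = list(x0)
    step = 1.0 / (L + kappa)      # the auxiliary problem is (L + kappa)-smooth
    for _ in range(outer_iters):
        # Inner solver: gradient descent on the auxiliary problem,
        # warm-started at the previous iterate.
        z = list(x)
        for _ in range(inner_iters):
            g = grad(z)
            z = [zi - step * (gi + kappa * (zi - yi))
                 for zi, gi, yi in zip(z, g, y)]
        x_prev, x = x, z
        # Solve alpha_next^2 = (1 - alpha_next) * alpha^2 + q * alpha_next.
        a2 = alpha * alpha
        alpha_next = 0.5 * (q - a2 + math.sqrt((a2 - q) ** 2 + 4.0 * a2))
        beta = alpha * (1.0 - alpha) / (a2 + alpha_next)
        # Extrapolation step defining the next prox center.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        alpha = alpha_next
    return x


if __name__ == "__main__":
    # Ill-conditioned quadratic f(x) = 0.5 * sum(a_i x_i^2) - sum(b_i x_i).
    a, b = [1.0, 100.0], [1.0, 1.0]
    grad = lambda x: [ai * xi - bi for ai, xi, bi in zip(a, x, b)]
    print(catalyst(grad, [0.0, 0.0], mu=1.0, L=100.0, kappa=10.0))
```

The choice of kappa trades off how well-conditioned each auxiliary problem is against how many outer iterations are needed; the value used in the example is arbitrary.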
Distinguishing cause from effect using observational data: methods and benchmarks
Mooij, Joris M., Peters, Jonas, Janzing, Dominik, Zscheischler, Jakob, Schölkopf, Bernhard
The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs that consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economy, etc.) and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and in addition on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work we prove the consistency of that method.
A Comprehensive Approach to Mode Clustering
Chen, Yen-Chi, Genovese, Christopher R., Wasserman, Larry
Mode clustering is a nonparametric method for clustering that defines clusters using the basins of attraction of a density estimator's modes. We provide several enhancements to mode clustering: (i) a soft variant of cluster assignment, (ii) a measure of connectivity between clusters, (iii) a technique for choosing the bandwidth, (iv) a method for denoising small clusters, and (v) an approach to visualizing the clusters. Combining all these enhancements gives us a complete procedure for clustering in multivariate problems. We also compare mode clustering to other clustering methods in several examples.
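The basic idea of assigning each point to the basin of attraction of a density mode can be illustrated with a plain mean-shift iteration. The sketch below covers only that baseline, in one dimension with a Gaussian kernel, and none of the enhancements (i) to (v); the function names, the fixed bandwidth, and the merge tolerance are hypothetical choices for illustration.

```python
import math


def mean_shift_modes(points, bandwidth, iters=200, tol=1e-9):
    """Move each 1-D point uphill on a Gaussian kernel density estimate."""
    def shift(x):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        return sum(w * p for w, p in zip(weights, points)) / sum(weights)

    modes = []
    for p in points:
        x = p
        for _ in range(iters):
            nxt = shift(x)
            converged = abs(nxt - x) < tol
            x = nxt
            if converged:
                break
        modes.append(x)
    return modes


def mode_clusters(points, bandwidth, merge_tol=1e-3):
    """Label points by the mode they converge to; return (labels, centers)."""
    centers, labels = [], []
    for m in mean_shift_modes(points, bandwidth):
        for j, c in enumerate(centers):
            if abs(m - c) < merge_tol:
                labels.append(j)
                break
        else:
            # No existing center is close enough: this is a new mode.
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    print(mode_clusters(data, bandwidth=0.5))
```

With two well-separated groups, every point climbs to the mode of its own group, so the labels split the data into two clusters. The result depends strongly on the bandwidth, which is why bandwidth selection appears among the enhancements listed above.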
Iteratively reweighted adaptive lasso for conditional heteroscedastic time series with applications to AR-ARCH type processes
Shrinkage algorithms are of great importance in almost every area of statistics due to the increasing impact of big data. Time series analysis in particular benefits from efficient and rapid estimation techniques such as the lasso. However, current lasso-type estimators for autoregressive time series models still focus on models with homoscedastic residuals. Therefore, an iteratively reweighted adaptive lasso algorithm for the estimation of time series models under conditional heteroscedasticity is presented in a high-dimensional setting. The asymptotic behaviour of the resulting estimator is analysed. It is found that the proposed estimation procedure performs substantially better than its homoscedastic counterpart. A special case of the algorithm is suitable for efficiently computing estimates of multivariate AR-ARCH type models. Extensions to the model such as periodic AR-ARCH, threshold AR-ARCH or ARMA-GARCH are discussed. Finally, different simulation results and applications to electricity market data and returns of metal prices are shown.
Variational Inference for Gaussian Process Modulated Poisson Processes
Lloyd, Chris, Gunter, Tom, Osborne, Michael A., Roberts, Stephen J.
We present the first fully variational Bayesian inference scheme for continuous Gaussian-process-modulated Poisson processes. Such point processes are used in a variety of domains, including neuroscience, geo-statistics and astronomy, but their use is hindered by the computational cost of existing inference schemes. Our scheme requires no discretisation of the domain, scales linearly in the number of observed events, and is many orders of magnitude faster than previous sampling-based approaches. The resulting algorithm is shown to outperform standard methods on synthetic examples, coal mining disaster data and in the prediction of malaria incidence in Kenya.
The Mondrian Process for Machine Learning
This report is concerned with the Mondrian process and its applications in machine learning. The Mondrian process is a guillotine-partition-valued stochastic process that possesses an elegant self-consistency property. The first part of the report uses simple concepts from applied probability to define the Mondrian process and explore its properties. The Mondrian process has been used as the main building block of a clever online random forest classification algorithm that turns out to be equivalent to its batch counterpart. We outline a slight adaptation of this algorithm to regression, as the remainder of the report uses regression as a case study of how Mondrian processes can be utilized in machine learning. In particular, the Mondrian process will be used to construct a fast approximation to the computationally expensive kernel ridge regression problem with a Laplace kernel. The complexity of random guillotine partitions generated by a Mondrian process and hence the complexity of the resulting regression models is controlled by a lifetime hyperparameter. It turns out that these models can be efficiently trained and evaluated for all lifetimes in a given range at once, without needing to retrain them from scratch for each lifetime value. This leads to an efficient procedure for determining the right model complexity for a dataset at hand. The limitation of having a single lifetime hyperparameter will motivate the final Mondrian grid model, in which each input dimension is endowed with its own lifetime parameter. In this model we preserve the property that its hyperparameters can be tweaked without needing to retrain the modified model from scratch.
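As an illustration of how the lifetime hyperparameter controls partition complexity, the following is a minimal sketch of sampling a Mondrian partition of an axis-aligned box, using the standard generative description of the process (exponential waiting time with rate equal to the sum of side lengths, cut dimension chosen proportionally to side length, uniform cut position). The function name and data layout are hypothetical, and this is not the report's code.

```python
import random


def sample_mondrian(box, lifetime, rng, birth=0.0):
    """Sample a Mondrian partition and return its leaf boxes.

    box: list of (low, high) intervals, one per dimension.
    lifetime: cutting stops once the accumulated time exceeds this value.
    """
    lengths = [hi - lo for lo, hi in box]
    # Waiting time until the next cut: exponential with rate = sum of sides.
    cut_time = birth + rng.expovariate(sum(lengths))
    if cut_time > lifetime:
        return [box]  # no further cut within the lifetime: this box is a leaf
    # Choose the cut dimension proportionally to side length, then cut uniformly.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    lo, hi = box[dim]
    cut = rng.uniform(lo, hi)
    left, right = list(box), list(box)
    left[dim] = (lo, cut)
    right[dim] = (cut, hi)
    return (sample_mondrian(left, lifetime, rng, cut_time)
            + sample_mondrian(right, lifetime, rng, cut_time))


if __name__ == "__main__":
    rng = random.Random(0)
    leaves = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], lifetime=3.0, rng=rng)
    print(len(leaves), "leaf cells")
```

Larger lifetimes allow more cuts and therefore finer partitions. Because each cut carries a time stamp, a partition sampled for a long lifetime can be pruned back to any shorter lifetime, which is consistent with the report's remark about evaluating a range of lifetimes without retraining.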