Oceania
History as a giant data set: how analysing the past could help save the future
In its first issue of 2010, the scientific journal Nature looked forward to a dazzling decade of progress. By 2020, experimental devices connected to the internet would deduce our search queries by directly monitoring our brain signals. Crops would exist that doubled their biomass in three hours. Humanity would be well on the way to ending its dependency on fossil fuels. But a letter later published in the same journal warned that all these advances could be derailed by mounting political instability, which was due to peak in the US and western Europe around 2020. Human societies go through predictable periods of growth, the letter explained, during which the population increases and prosperity rises. Then come equally predictable periods of decline. In recent decades, the letter went on, a number of worrying social indicators – such as wealth inequality and public debt – had started to climb in western nations, indicating that these societies were approaching a period of upheaval. The letter-writer would go on to predict that the turmoil in the US in 2020 would be less severe than the American civil war, but worse than the violence of the late 1960s and early 70s, when the murder rate spiked, civil rights and anti-Vietnam war protests intensified and domestic terrorists carried out thousands of bombings across the country. The author of this stark warning was not a historian, but a biologist.
Woodside Energy signs AI and quantum computing deal with IBM ZDNet
Woodside Energy announced on Tuesday it has signed a multi-year collaboration deal with IBM to leverage artificial intelligence (AI) and quantum computing to help it reduce operation costs and develop a "plant of the future" that can run itself. Speaking at IBM's Cloud Innovation Exchange in Sydney, Woodside Energy CEO Peter Coleman said he believes AI could help the company significantly reduce current plant maintenance costs -- an exercise that the business spends AU$1 billion on annually. "Because of the products we produce, our plants are covered in cladding and everything is insulated, so it's a huge cost for us to chase corrosion. Of course, AI will help in that. We really think AI will reduce that cost by 30%," he said.
92c/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster
Aberdeen, Douglas, Baxter, Jonathan, Edwards, Robert
Artificial neural networks with millions of adjustable parameters and a similar number of training examples are a potential solution for difficult, large-scale pattern recognition problems in areas such as speech and face recognition, classification of large volumes of web data, and finance. The bottleneck is that neural network training involves iterative gradient descent and is extremely computationally intensive. In this paper we present a technique for distributed training of Ultra Large Scale Neural Networks (ULSNN) on Bunyip, a Linux-based cluster of 196 Pentium III processors. To illustrate ULSNN training we describe an experiment in which a neural network with 1.73 million adjustable parameters was trained to recognize machine-printed Japanese characters from a database containing 9 million training patterns. The training runs with an average performance of 163.3 GFlops/s (single precision). With a machine cost of $150,913, this yields a price/performance ratio of 92.4c/MFlops/s (single precision). For comparison purposes, training using double precision and the ATLAS DGEMM produces a sustained performance of 70 GFlops/s or $2.16/MFlops/s (double precision).
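The price/performance figures follow from dividing the machine cost by the sustained throughput. A minimal sketch of that arithmetic, using only the numbers quoted in the abstract (the function and variable names are illustrative, not from the paper):

```python
def dollars_per_mflops(machine_cost_usd: float, gflops_per_s: float) -> float:
    """Return the machine cost in US dollars per sustained MFlops/s."""
    mflops_per_s = gflops_per_s * 1000.0  # 1 GFlops/s = 1000 MFlops/s
    return machine_cost_usd / mflops_per_s


MACHINE_COST_USD = 150_913.0        # machine cost quoted in the abstract
SINGLE_PRECISION_GFLOPS = 163.3     # quoted average single-precision performance
DOUBLE_PRECISION_USD_PER_MFLOPS = 2.16  # quoted double-precision price/performance

# Single precision: cost divided by throughput, expressed in cents.
single_cents = dollars_per_mflops(MACHINE_COST_USD, SINGLE_PRECISION_GFLOPS) * 100.0

# Double precision: invert the quoted ratio to recover the implied throughput.
implied_double_gflops = MACHINE_COST_USD / DOUBLE_PRECISION_USD_PER_MFLOPS / 1000.0

print(f"{single_cents:.1f} cents per MFlops/s (single precision)")
print(f"{implied_double_gflops:.1f} GFlops/s implied (double precision)")
```

The single-precision result rounds to the 92.4c in the title, and the double-precision ratio of $2.16 per MFlops/s corresponds to a sustained throughput of roughly 70 GFlops/s.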
Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data
Lago, Jesus, De Brabandere, Karel, De Ridder, Fjo, De Schutter, Bart
Due to the increasing integration of solar power into the electrical grid, forecasting short-term solar irradiance has become key for many applications. In this context, as solar generators are geographically dispersed and ground measurements are not always easy to obtain, it is very important to have general models that can predict solar irradiance without the need for local data. In this paper, a model that can perform short-term forecasting of solar irradiance at any location without the need for ground measurements is proposed. To do so, the model considers satellite-based measurements and weather-based forecasts, and employs a deep neural network structure that is able to generalize across locations; in particular, the network is trained using only a small subset of sites where ground data is available, and the model is able to generalize to a much larger number of locations where ground data does not exist. As a case study, 25 locations in The Netherlands are considered and the proposed model is compared against four local models that are individually trained for each location using ground measurements. Despite the general nature of the model, it is shown that the proposed model is equal to or better than the local models: when comparing the average performance across all the locations and prediction horizons, the proposed model obtains a 31.31%
Introduction
With the increasing integration of renewable sources into the electrical grid, accurate forecasting of renewable source generation has become one of the most important challenges across several applications. Among them, balancing the electrical grid via activation of reserves is arguably one of the most critical to ensure a stable system. In particular, due to their intermittent and unpredictable nature, the more renewables are integrated, the more complex the grid management becomes [1, 2].
This is the postprint of the article: Short-term forecasting of solar irradiance without local telemetry: a generalized model using satellite data, Solar Energy 173 (2018), 566-577. Corresponding author email address: j.lagogarcia@tudelft.nl (Jesus Lago).
In particular, in addition to the activation of reserves to manage grid stability, short-term forecasts of solar irradiance are paramount for operational planning, switching sources, programming backup, short-term power trading, peak load matching, scheduling of power systems, congestion management, and cost reduction [2-4].
Solar irradiance forecasting
The forecasting of solar irradiance can typically be divided into methods for global horizontal irradiance (GHI) and methods for direct normal irradiance (DNI) [5], with the latter being a component of the GHI (together with the diffuse solar irradiance). As this work forecasts GHI, see [5] for a complete review of methods for DNI.
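The relationship between GHI, DNI and the diffuse component can be written as the standard closure equation. The symbols are not used in the text above and are introduced here only for illustration: DHI is the diffuse horizontal irradiance and \theta_z is the solar zenith angle.

```latex
\mathrm{GHI} = \mathrm{DNI}\,\cos\theta_z + \mathrm{DHI}
```

The cosine term projects the direct beam onto the horizontal plane, which is why DNI contributes only part of its magnitude to GHI when the sun is low in the sky.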
A Capsule Network-based Model for Learning Node Embeddings
Nguyen, Dai Quoc, Nguyen, Tu Dinh, Nguyen, Dat Quoc, Phung, Dinh
In this paper, we focus on learning low-dimensional embeddings of entity nodes from graph-structured data, where we can use the learned node embeddings for a downstream task of node classification. Existing node embedding models are often limited in their ability to exploit graph information to infer plausible embeddings of unseen nodes. To address this issue, we propose Caps2NE--a new unsupervised embedding model using a network of two capsule layers. Given a target node and its context nodes, Caps2NE applies a routing process to aggregate features of the context nodes at the first capsule layer, then feeds these features into the second capsule layer to produce an embedding vector. This embedding vector is then used to infer a plausible embedding for the target node. Experimental results for the node classification task on six well-known benchmark datasets show that our Caps2NE obtains state-of-the-art performance.
Learning from the Past: Continual Meta-Learning via Bayesian Graph Modeling
Luo, Yadan, Huang, Zi, Zhang, Zheng, Wang, Ziwei, Baktashmotlagh, Mahsa, Yang, Yang
Meta-learning for few-shot learning allows a machine to leverage previously acquired knowledge as a prior, thus improving the performance on novel tasks with only small amounts of data. However, most mainstream models suffer from catastrophic forgetting and insufficient robustness, thereby failing to fully retain or exploit long-term knowledge while being prone to severe error accumulation. In this paper, we propose a novel Continual Meta-Learning approach with Bayesian Graph Neural Networks (CML-BGNN) that mathematically formulates meta-learning as continual learning of a sequence of tasks. With each task formed as a graph, the intra- and inter-task correlations can be well preserved via message-passing and history transition. To remedy topological uncertainty from graph initialization, we utilize the Bayes by Backprop strategy, which approximates the posterior distribution of task-specific parameters with amortized inference networks that are seamlessly integrated into the end-to-end edge learning. Extensive experiments conducted on the miniImageNet and tieredImageNet datasets demonstrate the effectiveness and efficiency of the proposed method, improving the performance by 42.8% compared with state-of-the-art on the miniImageNet 5-way 1-shot classification task.
Selective Brain Damage: Measuring the Disparate Impact of Model Pruning
Hooker, Sara, Courville, Aaron, Dauphin, Yann, Frome, Andrea
Neural network pruning techniques have demonstrated that it is possible to remove the majority of weights in a network with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by pruning. We find that certain examples, which we term pruning identified exemplars (PIEs), and classes are systematically more impacted by the introduction of sparsity. Removing PIE images from the test set greatly improves top-1 accuracy for both pruned and non-pruned models. These hard-to-generalize-to images tend to be mislabelled, of lower image quality, depict multiple objects or require fine-grained classification. These findings shed light on previously unknown trade-offs, and suggest that a high degree of caution should be exercised before pruning is used in sensitive domains.
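To illustrate how an aggregate metric can conceal class-level differences, here is a minimal sketch that compares overall and per-class accuracy for two sets of predictions. The labels and predictions are made-up toy data, not results from the paper, the function names are illustrative, and this is not the paper's procedure for identifying PIEs.

```python
from collections import defaultdict


def accuracy(labels, preds):
    """Fraction of predictions that match their labels."""
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)


def per_class_accuracy(labels, preds):
    """Accuracy computed separately for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for l, p in zip(labels, preds):
        total[l] += 1
        correct[l] += int(l == p)
    return {c: correct[c] / total[c] for c in total}


# Toy data (hypothetical): 8 "cat" examples followed by 2 "dog" examples.
labels = ["cat"] * 8 + ["dog"] * 2
dense_preds = ["cat"] * 7 + ["dog"] + ["dog"] * 2   # one cat misclassified
pruned_preds = ["cat"] * 8 + ["dog", "cat"]         # one dog misclassified

dense_overall = accuracy(labels, dense_preds)
pruned_overall = accuracy(labels, pruned_preds)
dense_by_class = per_class_accuracy(labels, dense_preds)
pruned_by_class = per_class_accuracy(labels, pruned_preds)

print("overall:", dense_overall, pruned_overall)
print("per class (dense): ", dense_by_class)
print("per class (pruned):", pruned_by_class)
```

In this toy case both sets of predictions reach the same overall accuracy of 0.9, yet accuracy on the minority "dog" class falls from 1.0 to 0.5, which is the kind of disparity a single top-line number hides.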
Creating Auxiliary Representations from Charge Definitions for Criminal Charge Prediction
Kang, Liangyi, Liu, Jie, Liu, Lingqiao, Shi, Qinfeng, Ye, Dan
Charge prediction, determining charges for criminal cases by analyzing the textual fact descriptions, is a promising technology in legal assistant systems. In practice, the fact descriptions can exhibit significant intra-class variation due to factors like nonnormative use of language, which makes the prediction task very challenging, especially for charge classes with too few samples to cover the expression variation. In this work, we explore using the charge definitions from criminal law to alleviate this issue. The key idea is that the expressions in a fact description should have corresponding formal terms in charge definitions, and those terms are shared across classes and could account for the diversity in the fact descriptions. Thus, we propose to create auxiliary fact representations from charge definitions to augment the fact description representations. The auxiliary representations are generated through the interaction of the fact description with the relevant charge definitions and the terms in those definitions, using an integrated sentence- and word-level attention scheme. Experimental results on two datasets show that our model achieves significant improvement over the baselines, especially for classes with few samples.
Introduction
The task of charge prediction is to determine appropriate charges, such as theft, seizing or robbery, for criminal cases by analyzing the textual fact descriptions.
Improving Robustness of Task Oriented Dialog Systems
Einolghozati, Arash, Gupta, Sonal, Mohit, Mrinal, Shah, Rushin
Task oriented language understanding in dialog systems is often modeled using intents (task of a query) and slots (parameters for that task). Intent detection and slot tagging are, in turn, modeled using sentence classification and word tagging techniques respectively. Similar to adversarial attack problems with computer vision models discussed in existing literature, these intent-slot tagging models are often over-sensitive to small variations in input -- predicting different and often incorrect labels when small changes are made to a query, thus reducing their accuracy and reliability. However, evaluating a model's robustness to these changes is harder for language since words are discrete and an automated change (e.g. adding 'noise') to a query sometimes changes the meaning and thus labels of a query. In this paper, we first describe how to create an adversarial test set to measure the robustness of these models. Furthermore, we introduce and adapt adversarial training methods as well as data augmentation using back-translation to mitigate these issues. Our experiments show that both techniques improve the robustness of the system substantially and can be combined to yield the best results.
Machine Intelligence at the Edge with Learning Centric Power Allocation
Wang, Shuai, Wu, Yik-Chung, Xia, Minghua, Wang, Rui, Poor, H. Vincent
While machine-type communication (MTC) devices generate considerable amounts of data, they often cannot process the data due to limited energy and computation power. To empower MTC with intelligence, edge machine learning has been proposed. However, power allocation in this paradigm requires maximizing the learning performance instead of the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient. To this end, this paper proposes learning centric power allocation (LCPA), which provides a new perspective on radio resource allocation in learning driven scenarios. By employing an empirical classification error model that is supported by learning theory, the LCPA is formulated as a nonconvex nonsmooth optimization problem, and is solved using the majorization minimization (MM) framework. To provide deeper insight into LCPA, an asymptotic analysis shows that the transmit powers are inversely proportional to the channel gain, and scale exponentially with the learning parameters. This is in contrast to traditional power allocations where the quality of wireless channels is the only consideration. Last but not least, to enable LCPA in large-scale settings, two optimization algorithms, termed mirror-prox LCPA and accelerated LCPA, are further proposed. Extensive numerical results demonstrate that the proposed LCPA algorithms outperform traditional power allocation algorithms, and the large-scale algorithms reduce the computation time by orders of magnitude compared with MM-based LCPA while still achieving competitive learning performance.
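For context, the classical water-filling baseline named in the abstract can be sketched as follows. This is the textbook throughput-maximizing allocation, not the paper's LCPA; the function name, the unit-noise normalization, and the example gains are assumptions made purely for illustration.

```python
def water_filling(gains, total_power):
    """Textbook water-filling (assumes unit noise power on every channel).

    Maximizes sum(log(1 + g_i * p_i)) subject to sum(p_i) == total_power
    and p_i >= 0. The optimum has the form p_i = max(0, level - 1 / g_i).
    """
    if not gains or total_power <= 0 or any(g <= 0 for g in gains):
        raise ValueError("need at least one gain; gains and power must be positive")
    inverse_gains = sorted(1.0 / g for g in gains)  # strongest channel first
    # Try serving the k strongest channels, dropping the weakest one each
    # time the water level would not rise above its inverse gain.
    for k in range(len(inverse_gains), 0, -1):
        level = (total_power + sum(inverse_gains[:k])) / k
        if level > inverse_gains[k - 1]:
            break
    return [max(0.0, level - 1.0 / g) for g in gains]


# Hypothetical channel gains: strong, medium and weak.
gains = [2.0, 1.0, 0.1]
powers = water_filling(gains, total_power=1.0)
print(powers)
```

With these example gains the strongest channel receives 0.75 of the power budget, the medium channel 0.25, and the weakest channel nothing. Favoring strong channels in this way is the opposite of the asymptotic LCPA behavior described above, where transmit power is inversely proportional to channel gain.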