Materials
Association Rule Hiding Based on Evolutionary Multi-Objective Optimization by Removing Items
Cheng, Peng (Harbin Institute of Technology) | Pan, Jeng-Shyang (Harbin Institute of Technology)
Today, people benefit from utilizing data mining technologies, such as association rule mining methods, to find valuable knowledge residing in large amounts of data. However, they also face the risk of exposing sensitive or confidential information when data is shared among different organizations. Thus, a question arises: how can we prevent sensitive knowledge from being discovered while ensuring that ordinary, non-sensitive knowledge can still be mined to the greatest extent possible? In this paper, we address the problem of privacy preservation in association rule mining from the perspective of multi-objective optimization. A new hiding method based on evolutionary multi-objective optimization (EMO) is proposed, and the side effects generated by the hiding process are formulated as optimization goals. EMO is used to find candidate transactions to modify so that side effects are minimized. Comparative experiments with exact methods on real datasets demonstrate that the proposed method can hide sensitive rules with fewer side effects.
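To make "side effects" concrete, the sketch below evaluates what an EMO search would be scoring for one candidate modification: whether a sensitive rule is hidden after an item is removed from chosen transactions, and which non-sensitive rules are lost as a result. It is a minimal sketch under assumptions: the helper names, the toy database and the thresholds are hypothetical, and the victim transaction is picked by hand rather than by an evolutionary optimizer.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """conf(lhs -> rhs) = supp(lhs | rhs) / supp(lhs); 0.0 if lhs never occurs."""
    s_lhs = support(transactions, lhs)
    if s_lhs == 0:
        return 0.0
    return support(transactions, set(lhs) | set(rhs)) / s_lhs


def is_minable(transactions, lhs, rhs, min_supp, min_conf):
    """A rule is mined only if it meets both the support and confidence thresholds."""
    return (support(transactions, set(lhs) | set(rhs)) >= min_supp
            and confidence(transactions, lhs, rhs) >= min_conf)


def remove_item(transactions, victims, item):
    """Copy of the database with `item` deleted from the transactions indexed by `victims`."""
    return [t - {item} if i in victims else set(t) for i, t in enumerate(transactions)]


def lost_rules(before, after, rules, min_supp, min_conf):
    """Side effect: non-sensitive rules that were minable before sanitization but not after."""
    return [r for r in rules
            if is_minable(before, *r, min_supp, min_conf)
            and not is_minable(after, *r, min_supp, min_conf)]


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
sanitized = remove_item(db, victims={0}, item="b")      # victim chosen by hand here
print(is_minable(sanitized, {"a"}, {"b"}, 0.5, 0.6))    # sensitive rule a -> b
print(lost_rules(db, sanitized, [({"b"}, {"c"}), ({"c"}, {"a"})], 0.5, 0.6))
```

Here hiding `a -> b` also destroys the non-sensitive rule `b -> c`, while `c -> a` survives; an EMO approach would search over victim sets to trade such losses off against the hiding goal.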
Computing Narratives of Cognitive User Experience for Building Design Analysis: KR for Industry Scale Computer-Aided Architecture Design
Bhatt, Mehul (University of Bremen) | Schultz, Carl (University of Bremen) | Thosar, Madhura (University of Bremen)
We present a cognitive design assistance system equipped with analytical capabilities aimed at anticipating architectural building design performance with respect to people-centred functional design goals. The paper focuses on the system's capability to generate "narratives of visuo-locomotive user experience" from digital computer-aided architecture design (CAAD) models. The system is based on an underlying declarative narrative representation and computation framework pertaining to conceptual, geometric, and qualitative spatial knowledge. The semantics of the declarative narrative model, i.e., the overall representation and computation model, is founded on: (a) conceptual knowledge formalised in an OWL ontology; (b) a general spatial representation and reasoning engine implemented in constraint logic programming; and (c) a declaratively encoded (narrative) construction process (based on search over graph structures) implemented in answer-set programming. We emphasise and demonstrate complete system implementation, scalability, and robust performance and integration with industry-scale architecture tools (e.g., Revit, ArchiCAD) and standards (BIM, IFC).
Monotone Temporal Planning: Tractability, Extensions and Applications
Cooper, M., Maris, F., Régnier, P.
This paper describes a polynomially-solvable class of temporal planning problems. Polynomiality follows from two assumptions. Firstly, by supposing that each sub-goal fluent can be established by at most one action, we can quickly determine which actions are necessary in any plan. Secondly, the monotonicity of sub-goal fluents allows us to express planning as an instance of STP≠ (Simple Temporal Problem with difference constraints). This class includes temporally-expressive problems requiring the concurrent execution of actions, with potential applications in the chemical, pharmaceutical and construction industries. We also show that any (temporal) planning problem has a monotone relaxation which can lead to the polynomial-time detection of its unsolvability in certain cases. Indeed, we show that our relaxation is orthogonal to relaxations based on the ignore-deletes approach used in classical planning, since it preserves deletes and can also exploit temporal information.
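As a minimal illustration of the kind of solver such a reduction targets (not the authors' algorithm), the sketch below checks a plain Simple Temporal Problem for consistency. Each constraint $x_j - x_i \le w$ becomes an edge $i \to j$ of weight $w$, and the problem is consistent exactly when this distance graph has no negative cycle, which Bellman-Ford detects. The additional ≠ constraints of STP≠ are not handled, and the three time points in the example are hypothetical.

```python
def stp_consistent(num_vars, constraints):
    """Check a Simple Temporal Problem for consistency.

    constraints: list of (i, j, w) meaning x_j - x_i <= w.
    Returns (True, schedule) or (False, None). Runs Bellman-Ford on the distance
    graph (edge i -> j of weight w) from a virtual source that reaches every node
    with weight 0; a negative cycle means the constraints are inconsistent.
    """
    dist = [0.0] * num_vars              # result of relaxing the virtual source's edges
    for _ in range(num_vars):
        changed = False
        for i, j, w in constraints:
            if dist[i] + w < dist[j]:
                dist[j] = dist[i] + w
                changed = True
        if not changed:
            return True, dist            # shortest distances form a feasible schedule
    for i, j, w in constraints:
        if dist[i] + w < dist[j]:
            return False, None           # still relaxing: negative cycle
    return True, dist


# x0 = plan start, x1 = action start, x2 = action end (duration exactly 5)
base = [(0, 1, 10), (1, 0, 0), (1, 2, 5), (2, 1, -5)]
ok, schedule = stp_consistent(3, base)
print(ok, schedule)
print(stp_consistent(3, base + [(0, 2, 3)])[0])  # deadline x2 - x0 <= 3 is infeasible
```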
Analysis and Implementation of Evolutionary Algorithms for the Optimization of Simulations in Civil Engineering (draft)
Gutiérrez, José Alberto García, Díaz, Alejandro Mateo Hernández
This paper studies the applicability of evolutionary algorithms, particularly the evolution strategies family, to estimating a degradation parameter in the shear design of reinforced concrete members. This problem is computationally demanding and highly relevant to structural engineering, and this is the first time it has been solved using genetic algorithms. This is a draft; the authors welcome corrections, comments, and suggestions.
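The basic mechanism of an evolution strategy for fitting a single parameter can be sketched as a (1+1)-ES with the 1/5 success rule. This is an illustrative assumption, not the authors' setup: the exponential "degradation model", the synthetic data, the step-size factors and the seed are all hypothetical stand-ins for the shear design simulation.

```python
import math
import random


def degradation_model(k, xs):
    """Hypothetical stand-in model: the response decays as exp(-k x)."""
    return [math.exp(-k * x) for x in xs]


def sse(k, xs, observed):
    """Squared-error misfit; negative rates are rejected as physically meaningless."""
    if k < 0:
        return math.inf
    return sum((m - o) ** 2 for m, o in zip(degradation_model(k, xs), observed))


def one_plus_one_es(objective, x0, sigma0=0.5, iters=1000, seed=0):
    """(1+1)-evolution strategy with the 1/5 success rule for a scalar parameter."""
    rng = random.Random(seed)
    x, fx, sigma = x0, objective(x0), sigma0
    for _ in range(iters):
        y = x + sigma * rng.gauss(0.0, 1.0)     # Gaussian mutation of the parent
        fy = objective(y)
        if fy < fx:                             # offspring replaces parent only if better
            x, fx = y, fy
            sigma *= 1.5                        # success: widen the search
        else:
            sigma *= 1.5 ** -0.25               # failure: shrink; balances at a 1/5 success rate
    return x


xs = [0.5 * i for i in range(11)]
observed = degradation_model(0.7, xs)           # synthetic, noise-free data with k = 0.7
k_hat = one_plus_one_es(lambda k: sse(k, xs, observed), x0=2.0)
print(k_hat)
```

The 1/5 rule adapts the mutation strength without gradients, which is one reason evolution strategies suit expensive black-box simulations.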
Planning for Mining Operations with Time and Resource Constraints
Lipovetzky, Nir (The University of Melbourne) | Burt, Christina N. (The University of Melbourne) | Pearce, Adrian R. (The University of Melbourne) | Stuckey, Peter J. (The University of Melbourne)
We study a daily mine planning problem where, given a set of blocks we wish to mine, our task is to generate a mining sequence for the excavators such that blending resource constraints are met at various stages of the sequence. Such time-oriented resource constraints are not traditionally handled well by automated planners. On the other hand, the remaining problem involves finding node-disjoint sequences with state-dependent travel times on the arcs, which are highly challenging for a Mixed-Integer Program (MIP). In this paper, we address the problem of finding feasible sequences using a combined MIP and planning based decomposition approach. The MIP takes care of the resource constraints, and the planner solves the remaining sequence problem. We extend the notion of finding feasible sequences to finding good feasible sequences, by devising a heuristic objective function in the MIP, which improves the resulting search space for the planner. We empirically analyse the scalability of our approach on a benchmark data set, before demonstrating its effectiveness on a real world case study provided by our industry partner. These results demonstrate that by using a heuristic MIP, it is possible to obtain better makespan results with a suboptimal planner than by using an optimal planner with an uninformed MIP.
Transductive Learning for Multi-Task Copula Processes
Schneider, Markus, Ramos, Fabio
We tackle the problem of multi-task learning with copula processes. Multivariable prediction in spatial and spatio-temporal processes such as natural resource estimation and pollution monitoring has typically been addressed using techniques based on Gaussian processes and co-Kriging. While the Gaussian prior assumption is convenient from analytical and computational perspectives, nature is dominated by non-Gaussian likelihoods. Copula processes are an elegant and flexible solution for handling various non-Gaussian likelihoods by capturing the dependence structure of random variables with cumulative distribution functions rather than their marginals. We show how multi-task learning for copula processes can be used to improve multivariable prediction for problems where the simple Gaussianity prior assumption does not hold. Then, we present a transductive approximation for multi-task learning and derive analytical expressions for the copula process model. The approach is evaluated and compared to other techniques on one artificial dataset and two publicly available datasets for natural resource estimation and concrete slump prediction.
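The core trick behind a Gaussian copula can be sketched as a marginal transform: each observation is mapped through its own CDF and then through the inverse standard-normal CDF, $z = \Phi^{-1}(F(y))$, so that a Gaussian process can model dependence on the $z$ scale. This minimal sketch assumes a known exponential marginal; the multi-task and transductive parts of the paper are not shown, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def exponential_cdf(y, rate=1.0):
    """CDF of a (hypothetical) exponential marginal."""
    return 1.0 - math.exp(-rate * y) if y > 0 else 0.0


def exponential_inv_cdf(u, rate=1.0):
    """Quantile function of the same exponential marginal."""
    return -math.log(1.0 - u) / rate


def to_gaussian_scores(ys, cdf):
    """Copula transform z = Phi^{-1}(F(y)); requires 0 < F(y) < 1."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(cdf(y)) for y in ys]


def from_gaussian_scores(zs, inv_cdf):
    """Map latent Gaussian values back to the data scale: y = F^{-1}(Phi(z))."""
    std_normal = NormalDist()
    return [inv_cdf(std_normal.cdf(z)) for z in zs]


ys = [0.1, math.log(2.0), 1.5, 3.0]
zs = to_gaussian_scores(ys, exponential_cdf)    # log(2) is the median, so its score is ~0
back = from_gaussian_scores(zs, exponential_inv_cdf)
print(zs)
print(back)
```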
Scalable Recommendation with Poisson Factorization
Gopalan, Prem, Hofman, Jake M., Blei, David M.
We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factorization implicitly models each user's limited attention to consume items. Moreover, because of the mathematical form of the Poisson likelihood, the model needs only to explicitly consider the observed entries in the matrix, leading to both scalable computation and good predictive performance. We develop a variational inference algorithm for approximate posterior inference that scales up to massive data sets. This is an efficient algorithm that iterates over the observed entries and adjusts an approximate posterior over the user/item representations. We apply our method to large real-world user data containing users rating movies, users listening to songs, and users reading scientific papers. In all these settings, Bayesian Poisson factorization outperforms state-of-the-art matrix factorization methods.
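The claim that only observed entries need explicit treatment follows from the Poisson log-likelihood: with rate $\lambda_{ui} = \theta_u^\top \beta_i$, a zero entry contributes only $-\lambda_{ui}$, and $\sum_{u,i} \theta_u^\top \beta_i = (\sum_u \theta_u)^\top (\sum_i \beta_i)$. The sketch below checks this identity numerically. It is an illustration under assumptions: the factor values are made up, the Gamma priors are omitted, and it is not the paper's variational inference algorithm.

```python
import math


def rate(theta_u, beta_i):
    """Poisson rate lambda_ui = theta_u . beta_i (non-negative factors)."""
    return sum(a * b for a, b in zip(theta_u, beta_i))


def dense_loglik(Y, theta, beta):
    """log p(Y | theta, beta), summing over every user/item cell."""
    total = 0.0
    for u, row in enumerate(Y):
        for i, y in enumerate(row):
            lam = rate(theta[u], beta[i])
            total += y * math.log(lam) - lam - math.lgamma(y + 1)
    return total


def sparse_loglik(nonzeros, theta, beta):
    """Same quantity touching only nonzero cells: the -lambda terms of all cells
    collapse to -(sum_u theta_u) . (sum_i beta_i)."""
    theta_sum = [sum(col) for col in zip(*theta)]
    beta_sum = [sum(col) for col in zip(*beta)]
    total = -rate(theta_sum, beta_sum)
    for (u, i), y in nonzeros.items():
        lam = rate(theta[u], beta[i])
        total += y * math.log(lam) - math.lgamma(y + 1)
    return total


theta = [[0.5, 1.0], [1.2, 0.1], [0.3, 0.3]]              # 3 users x 2 factors
beta = [[1.0, 0.2], [0.4, 0.9], [0.1, 0.1], [2.0, 0.5]]   # 4 items x 2 factors
Y = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 0, 3]]
nonzeros = {(u, i): y for u, row in enumerate(Y) for i, y in enumerate(row) if y > 0}
ll_dense = dense_loglik(Y, theta, beta)
ll_sparse = sparse_loglik(nonzeros, theta, beta)
print(ll_dense, ll_sparse)
```

The sparse version costs time proportional to the number of nonzero entries plus the number of users and items, rather than to the full matrix size.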
Bayesian Source Separation Applied to Identifying Complex Organic Molecules in Space
Knuth, Kevin H., Tse, Man Kit, Choinsky, Joshua, Maunu, Haley A., Carbon, Duane F.
Emission from a class of benzene-based molecules known as Polycyclic Aromatic Hydrocarbons (PAHs) dominates the infrared spectrum of star-forming regions. The observed emission appears to arise from the combined emission of numerous PAH species, each with its unique spectrum. Linear superposition of the PAH spectra identifies this problem as a source separation problem. It is, however, a particularly formidable source separation problem, given that the number of different PAH sources potentially runs into the hundreds, even thousands, and there is only one measured spectral signal for a given astrophysical site. Fortunately, the source spectra of the PAHs are known, but the signal is also contaminated by other spectral sources. We describe our ongoing work in developing Bayesian source separation techniques relying on nested sampling in conjunction with an ON/OFF mechanism enabling simultaneous estimation of the probability that a particular PAH species is present and its contribution to the spectrum.
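The ON/OFF idea can be illustrated on a toy problem: the observed spectrum is modelled as $\sum_k s_k a_k f_k$ with known library spectra $f_k$ and switches $s_k \in \{0,1\}$, and the posterior probability that species $k$ is present is $P(s_k = 1 \mid \text{data})$. In this minimal sketch, exhaustive enumeration of all $2^K$ switch patterns stands in for nested sampling, and the amplitudes, noise level, uniform prior and spectra are all assumptions; contamination by other sources is ignored.

```python
import itertools
import math


def model_spectrum(switches, amplitudes, library):
    """Linear superposition: sum_k s_k * a_k * f_k, with ON/OFF switches s_k in {0, 1}."""
    n = len(library[0])
    return [sum(s * a * spec[j] for s, a, spec in zip(switches, amplitudes, library))
            for j in range(n)]


def presence_probabilities(data, amplitudes, library, sigma):
    """Posterior P(s_k = 1 | data) by enumerating all 2^K switch patterns.

    Assumes a uniform prior over patterns, known amplitudes and i.i.d. Gaussian
    noise with known sigma; enumeration replaces nested sampling for tiny K.
    """
    log_w = {}
    for s in itertools.product((0, 1), repeat=len(library)):
        model = model_spectrum(s, amplitudes, library)
        log_w[s] = -sum((d - m) ** 2 for d, m in zip(data, model)) / (2 * sigma ** 2)
    top = max(log_w.values())                   # subtract the max for numerical stability
    w = {s: math.exp(lw - top) for s, lw in log_w.items()}
    z = sum(w.values())
    return [sum(wi for s, wi in w.items() if s[k]) / z for k in range(len(library))]


# Toy "library" of known source spectra on 4 wavelength bins (made-up numbers).
library = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
observed = [1.0, 0.0, 1.0, 1.0]                 # species 0 and 2 are present
probs = presence_probabilities(observed, [1.0, 1.0, 1.0], library, sigma=0.1)
print(probs)
```

Enumeration is only feasible for a handful of species; with hundreds or thousands of candidates, a sampler such as nested sampling is needed instead.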
Bayesian Inference for NMR Spectroscopy with Applications to Chemical Quantification
Wilson, Andrew Gordon, Wu, Yuting, Holland, Daniel J., Nowozin, Sebastian, Mantle, Mick D., Gladden, Lynn F., Blake, Andrew
Nuclear magnetic resonance (NMR) spectroscopy exploits the magnetic properties of atomic nuclei to discover the structure, reaction state and chemical environment of molecules. We propose a probabilistic generative model and inference procedures for NMR spectroscopy. Specifically, we use a weighted sum of trigonometric functions undergoing exponential decay to model free induction decay (FID) signals. We discuss the challenges in estimating the components of this general model (amplitudes, phase shifts, frequencies, decay rates, and noise variances) and offer practical solutions. We compare with conventional Fourier transform spectroscopy for estimating the relative concentrations of chemicals in a mixture, using synthetic and experimentally acquired FID signals. We find the proposed model is particularly robust to low signal-to-noise ratios (SNR) and overlapping peaks in the Fourier transform of the FID, enabling accurate predictions (e.g., 1% sensitivity at low SNR) which are not possible with conventional spectroscopy (5% sensitivity).
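A noise-free version of the FID model described above is $y(t) = \sum_k A_k e^{-\lambda_k t} \cos(2\pi f_k t + \phi_k)$. Once the frequencies, phases and decay rates are fixed, the model is linear in the amplitudes $A_k$, so they can be recovered by least squares. The sketch below shows only that linear sub-step under assumptions (made-up parameters, no noise, nonlinear parameters treated as known); the paper's probabilistic inference over all components is not reproduced.

```python
import math


def fid(t, amps, freqs, phases, decays):
    """Noise-free FID: sum_k A_k * exp(-lambda_k t) * cos(2 pi f_k t + phi_k)."""
    return sum(a * math.exp(-d * t) * math.cos(2 * math.pi * f * t + p)
               for a, f, p, d in zip(amps, freqs, phases, decays))


def solve(M, v):
    """Gaussian elimination with partial pivoting for a small dense system M x = v."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= factor * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def fit_amplitudes(times, signal, freqs, phases, decays):
    """Least-squares amplitudes (normal equations) with the nonlinear parameters held fixed."""
    basis = [[math.exp(-d * t) * math.cos(2 * math.pi * f * t + p) for t in times]
             for f, p, d in zip(freqs, phases, decays)]
    K = len(basis)
    G = [[sum(x * y for x, y in zip(basis[i], basis[j])) for j in range(K)] for i in range(K)]
    b = [sum(x * y for x, y in zip(basis[i], signal)) for i in range(K)]
    return solve(G, b)


times = [j * 0.001 for j in range(1000)]        # 1 s sampled at 1 kHz (hypothetical)
freqs, phases, decays = [50.0, 120.0], [0.0, 0.3], [5.0, 8.0]
signal = [fid(t, [2.0, 1.0], freqs, phases, decays) for t in times]
amps_hat = fit_amplitudes(times, signal, freqs, phases, decays)
print(amps_hat)                                 # ratio of amplitudes ~ relative abundance
```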
Designed Measurements for Vector Count Data
Wang, Liming, Carlson, David E., Rodrigues, Miguel, Wilcox, David, Calderbank, Robert, Carin, Lawrence
We consider the design of linear projection measurements for a vector Poisson signal model. The projections are performed on the vector Poisson rate, $X\in\mathbb{R}_+^n$, and the observed data are a vector of counts, $Y\in\mathbb{Z}_+^m$. The projection matrix is designed by maximizing the mutual information between $Y$ and $X$, $I(Y;X)$. When there is a latent class label $C\in\{1,\dots,L\}$ associated with $X$, we consider the mutual information between $Y$ and $C$, $I(Y;C)$. New analytic expressions for the gradients of $I(Y;X)$ and $I(Y;C)$ with respect to the measurement matrix are presented. Connections are made to the more widely studied Gaussian measurement model. Example results are presented for compressive topic modeling of a document corpus (word counting) and hyperspectral compressive sensing for chemical classification (photon counting).
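The design criterion $I(Y;C)$ can be evaluated by brute force on a tiny example, which shows why one projection is preferred over another. This sketch assumes that each class has a single fixed rate vector $x_c$, so that $Y \mid C=c \sim \text{Poisson}(\Phi x_c)$ componentwise, and truncates counts at a hypothetical `y_max`; the matrix name `phi`, the class signals and the priors are illustrative, and the paper's analytic gradient expressions are not reproduced.

```python
import itertools
import math


def poisson_pmf(y, rate):
    """P(Y = y) for Y ~ Poisson(rate), including the degenerate rate = 0 case."""
    if rate == 0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(rate) - rate - math.lgamma(y + 1))


def mutual_information_y_c(phi, class_signals, priors, y_max=60):
    """I(Y;C) in nats for Y | C=c ~ Poisson(phi x_c), counts truncated at y_max."""
    rates = [[sum(p * x for p, x in zip(row, xc)) for row in phi] for xc in class_signals]
    info = 0.0
    for y in itertools.product(range(y_max + 1), repeat=len(phi)):
        lik = [math.prod(poisson_pmf(yi, r) for yi, r in zip(y, rc)) for rc in rates]
        p_y = sum(pc * l for pc, l in zip(priors, lik))
        for pc, l in zip(priors, lik):
            if l > 0:
                info += pc * l * math.log(l / p_y)
    return info


signals = [[10.0, 0.0], [0.0, 10.0]]            # hypothetical rate vectors for C = 1, 2
priors = [0.5, 0.5]
info_a = mutual_information_y_c([[1.0, 0.0]], signals, priors)  # separates the classes
info_b = mutual_information_y_c([[1.0, 1.0]], signals, priors)  # both classes give rate 10
print(info_a, info_b)
```

The first projection yields close to $\log 2$ nats (one full bit) about the class, while the second yields none, so a design procedure maximizing $I(Y;C)$ would favour the first.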