This article presents a bibliometric analysis and systematic review of AI applied to wastewater treatment, covering treatment technology, economics, management, and reuse. The prediction accuracy of AI techniques for pollutant removal ranged from 0.64 to 1.00, and applying AI could reduce operational costs by up to 30%. Combined AI methods provided higher accuracy and lower error than individual models. Wastewater treatment remains an essential step in reducing pollutants and improving water environment quality.
As multi-view data grows in the real world, multi-view clustering has become a prominent technique in data mining, pattern recognition, and machine learning. How to effectively exploit the relationships between different views using the characteristics of multi-view data has become a crucial challenge. To address this, a hidden space sharing multi-view fuzzy clustering (HSS-MVFC) method is proposed in the present study. The method is based on the classical fuzzy c-means clustering model and obtains associated information between different views by introducing a shared hidden space. In particular, the shared hidden space and the fuzzy partition are learned alternately and contribute to each other. Meanwhile, the proposed method uses a maximum entropy strategy to control the weights of different views while learning the shared hidden space. Experimental results show that the proposed multi-view clustering method outperforms many related clustering methods.
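The two ingredients named above, a fuzzy c-means partition and maximum-entropy view weighting, can be sketched in isolation. The functions below are illustrative (not the authors' HSS-MVFC implementation, which also learns a shared hidden space): a standard fuzzy c-means membership update, and a closed-form entropy-regularized view weight that down-weights views with high clustering cost. The parameter `lam` (the entropy temperature) is an assumption for the sketch.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard fuzzy c-means membership update:
    u_ic = 1 / sum_j (d_ic / d_ij)^(2/(m-1)), rows sum to 1."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def entropy_view_weights(view_costs, lam=1.0):
    """Maximum-entropy view weighting: minimizing
    sum_v w_v * J_v + lam * sum_v w_v * log(w_v), subject to sum_v w_v = 1,
    gives the softmax-style closed form below."""
    costs = np.asarray(view_costs, dtype=float)
    logits = -costs / lam
    logits -= logits.max()                    # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Views with lower clustering cost (better fit) receive higher weight.
w = entropy_view_weights([2.0, 1.0, 4.0], lam=1.0)
```

A larger `lam` pushes the weights toward uniform (maximum entropy), while a small `lam` concentrates weight on the single best-fitting view.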
Field canal improvement projects (FCIPs) are among the ambitious projects constructed to save fresh water. To finance such projects, conceptual cost models are needed to accurately predict preliminary costs at the early stages of a project. The first step is to develop a conceptual cost model that identifies the key cost drivers affecting the project. Input variable selection therefore remains an important part of model development, as poor variable selection can decrease model precision. This study identified the most important cost drivers of FCIPs using both a qualitative and a quantitative approach. Subsequently, a parametric cost model was developed based on machine learning methods, including regression, artificial neural networks, fuzzy modeling, and case-based reasoning.
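As a minimal sketch of the parametric (regression-based) branch of such a cost model: the drivers and cost figures below are synthetic placeholders, not the paper's actual FCIP variables, and are used only to show the fit-then-score workflow.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Hypothetical cost drivers (placeholders, not the study's variables):
# canal length (km) and served area, with synthetic construction costs.
length_km = rng.uniform(1, 10, size=200)
area = rng.uniform(50, 500, size=200)
cost = 120.0 * length_km + 0.8 * area + rng.normal(0.0, 20.0, size=200)

X = np.column_stack([length_km, area])
model = LinearRegression().fit(X, cost)   # parametric conceptual cost model
r2 = model.score(X, cost)                 # goodness of fit on the drivers
```

In practice the candidate drivers would first be screened (the qualitative/quantitative selection step described above) before fitting, and nonlinear learners such as ANNs would be benchmarked against this linear baseline.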
Data-driven soft sensor design has recently gained immense popularity, owing to advances in sensory devices and a growing interest in data mining. While partial least squares (PLS) is traditionally used in the process literature for designing soft sensors, the statistical literature has focused on sparse learners, such as the Lasso and the relevance vector machine (RVM), to address the high-dimensional data problem. In the current study, the predictive performances of three regression techniques, PLS, Lasso, and RVM, were assessed and compared under various offline and online soft sensing scenarios applied to datasets from five real industrial plants and a simulated process. In offline learning, the predictions of RVM and Lasso were found to be superior to those of PLS when a large number of time-lagged predictors were used. Online prediction results gave a slightly more complicated picture. The minimum prediction error achieved by PLS under a moving window (MW) or just-in-time learning scheme could be reduced by roughly 5–10% using Lasso or RVM. However, when a small MW size was used, or when the optimum number of PLS components was as low as one, the prediction performance of PLS surpassed that of RVM, which was found to yield occasional unstable predictions. PLS and Lasso models constructed via online parameter tuning generally did not yield better predictions than those constructed via offline tuning. We present evidence suggesting that retaining a large portion of the available process measurement data in the predictor matrix, rather than preselecting variables, is more advantageous for sparse learners in increasing prediction accuracy. Consequently, Lasso is recommended as a better substitute for PLS in soft sensors, while the performance of RVM should be validated before online application.
Often these processes result in high-dimensional data sets with complex relationships within the data and stochastic behavior. Furthermore, anomalies by definition carry a high self-information measure and therefore convey useful information about the underlying data generation process. A number of similar definitions of what constitutes an anomaly exist; in this paper the following definition is adopted: 1. Anomalies differ from the norm with respect to their attributes.
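The self-information argument above can be made concrete: under an empirical distribution, a value with probability p carries -log2(p) bits, so rare attribute values, the candidate anomalies under the adopted definition, score highest. The function name and the toy data below are illustrative, not from the paper.

```python
import numpy as np
from collections import Counter

def self_information_scores(values):
    """Score each observation by its self-information -log2 p(value) under the
    empirical distribution; rare values (candidate anomalies) score highest."""
    counts = Counter(values)
    n = len(values)
    return [-np.log2(counts[v] / n) for v in values]

# "b" occurs once in ten observations, so it is the most informative point.
data = ["a"] * 9 + ["b"]
scores = self_information_scores(data)
```

Here the common value "a" scores -log2(0.9) ≈ 0.15 bits, while the rare "b" scores log2(10) ≈ 3.32 bits, matching the intuition that anomalies are the high-information points of the data generation process.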