Islam, Mirajul, Ani, Jannatul Ferdous, Rahman, Abdur, Zaman, Zakia
Hilsa is the national fish of Bangladesh. Bangladesh is earning a lot of foreign currency by exporting this fish. Unfortunately, in recent days, some unscrupulous businessmen are selling fake Hilsa fishes to gain profit. The Sardines and Sardinella are the most sold in the market as Hilsa. The government agency of Bangladesh, namely Bangladesh Food Safety Authority said that these fake Hilsa fish contain high levels of cadmium and lead which are detrimental for humans. In this research, we have proposed a method that can readily identify original Hilsa fish and fake Hilsa fish. Based on the research available on online literature, we are the first to do research on identifying original Hilsa fish. We have collected more than 16,000 images of original and counterfeit Hilsa fish. To classify these images, we have used several deep learning-based models. Then, the performance has been compared between them. Among those models, DenseNet201 achieved the highest accuracy of 97.02%.
Moghaddam, Mohammad A., Ferre, Ty P. A., Chen, Xingyuan, Chen, Kewei, Ehsani, Mohammad Reza
We examine the ability of machine learning (ML) and deep learning (DL) algorithms to infer surface/ground exchange flux based on subsurface temperature observations. The observations and fluxes are produced from a high-resolution numerical model representing conditions in the Columbia River near the Department of Energy Hanford site located in southeastern Washington State. Random measurement error, of varying magnitude, is added to the synthetic temperature observations. The results indicate that both ML and DL methods can be used to infer the surface/ground exchange flux. DL methods, especially convolutional neural networks, outperform the ML methods when used to interpret noisy temperature data with a smoothing filter applied. However, the ML methods also performed well and they are can better identify a reduced number of important observations, which could be useful for measurement network optimization. Surprisingly, the ML and DL methods better inferred upward flux than downward flux. This is in direct contrast to previous findings using numerical models to infer flux from temperature observations and it may suggest that combined use of ML or DL inference with numerical inference could improve flux estimation beneath river systems.
Ghahramani, Mohammadhossein, Zhou, Mengchu, Molter, Anna, Pilla, Francesco
The Internet of Things (IoT) is a paradigm characterized by a network of embedded sensors and services. These sensors are incorporated to collect various information, track physical conditions, e.g., waste bins' status, and exchange data with different centralized platforms. The need for such sensors is increasing; however, proliferation of technologies comes with various challenges. For example, how can IoT and its associated data be used to enhance waste management? In smart cities, an efficient waste management system is crucial. Artificial Intelligence (AI) and IoT-enabled approaches can empower cities to manage the waste collection. This work proposes an intelligent approach to route recommendation in an IoT-enabled waste management system given spatial constraints. It performs a thorough analysis based on AI-based methods and compares their corresponding results. Our solution is based on a multiple-level decision-making process in which bins' status and coordinates are taken into account to address the routing problem. Such AI-based models can help engineers design a sustainable infrastructure system.
The multi-task learning (MTL) paradigm can be traced back to an early paper of Caruana (1997) in which it was argued that data from multiple tasks can be used with the aim to obtain a better performance over learning each task independently. A solution of MTL with conflicting objectives requires modelling the trade-off among them which is generally beyond what a straight linear combination can achieve. A theoretically principled and computationally effective strategy is finding solutions which are not dominated by others as it is addressed in the Pareto analysis. Multi-objective optimization problems arising in the multi-task learning context have specific features and require adhoc methods. The analysis of these features and the proposal of a new computational approach represent the focus of this work. Multi-objective evolutionary algorithms (MOEAs) can easily include the concept of dominance and therefore the Pareto analysis. The major drawback of MOEAs is a low sample efficiency with respect to function evaluations. The key reason for this drawback is that most of the evolutionary approaches do not use models for approximating the objective function. Bayesian Optimization takes a radically different approach based on a surrogate model, such as a Gaussian Process. In this thesis the solutions in the Input Space are represented as probability distributions encapsulating the knowledge contained in the function evaluations. In this space of probability distributions, endowed with the metric given by the Wasserstein distance, a new algorithm MOEA/WST can be designed in which the model is not directly on the objective function but in an intermediate Information Space where the objects from the input space are mapped into histograms. Computational results show that the sample efficiency and the quality of the Pareto set provided by MOEA/WST are significantly better than in the standard MOEA.
Data science interviews can be cumbersome, and rejections are merely the beginning. While an academic degree, relevant training, skills, and course work are essential to break into data science, it does not guarantee a job or job satisfaction. When it comes to interviews, there are hundreds of reasons for a company to reject a candidate. Of course, it makes more sense for a company to reject a good candidate than to hire a bad one. But, a talented data science professional stands above all, making sure to stay ahead of the curve.
Stork, Jörg, Wenzel, Philip, Landwein, Severin, Algorri, Maria-Elena, Zaefferer, Martin, Kusch, Wolfgang, Staubach, Martin, Bartz-Beielstein, Thomas, Köhn, Hartmut, Dejager, Hermann, Wolf, Christian
We have built a novel system for the surveillance of drinking water reservoirs using underwater sensor networks. We implement an innovative AI-based approach to detect, classify and localize underwater events. In this paper, we describe the technology and cognitive AI architecture of the system based on one of the sensor networks, the hydrophone network. We discuss the challenges of installing and using the hydrophone network in a water reservoir where traffic, visitors, and variable water conditions create a complex, varying environment. Our AI solution uses an autoencoder for unsupervised learning of latent encodings for classification and anomaly detection, and time delay estimates for sound localization. Finally, we present the results of experiments carried out in a laboratory pool and the water reservoir and discuss the system's potential.
Grbčić, Luka, Družeta, Siniša, Mauša, Goran, Lipić, Tomislav, Lušić, Darija Vukić, Alvir, Marta, Lučin, Ivana, Sikirica, Ante, Davidović, Davor, Travaš, Vanja, Kalafatović, Daniela, Pikelj, Kristina, Fajković, Hana, Holjević, Toni, Kranjčević, Lado
Coastal water quality management is a public health concern, as poor coastal water quality can harbor pathogens that are dangerous to human health. Tourism-oriented countries need to actively monitor the condition of coastal water at tourist popular sites during the summer season. In this study, routine monitoring data of $Escherichia\ Coli$ and enterococci across 15 public beaches in the city of Rijeka, Croatia, were used to build machine learning models for predicting their levels based on environmental parameters as well as to investigate their relationships with environmental stressors. Gradient Boosting (Catboost, Xgboost), Random Forests, Support Vector Regression and Artificial Neural Networks were trained with measurements from all sampling sites and used to predict $E.\ Coli$ and enterococci values based on environmental features. The evaluation of stability and generalizability with 10-fold cross validation analysis of the machine learning models, showed that the Catboost algorithm performed best with R$^2$ values of 0.71 and 0.68 for predicting $E.\ Coli$ and enterococci, respectively, compared to other evaluated ML algorithms including Xgboost, Random Forests, Support Vector Regression and Artificial Neural Networks. We also use the SHapley Additive exPlanations technique to identify and interpret which features have the most predictive power. The results show that site salinity measured is the most important feature for forecasting both $E.\ Coli$ and enterococci levels. Finally, the spatial and temporal accuracy of both ML models were examined at sites with the lowest coastal water quality. The spatial $E. Coli$ and enterococci models achieved strong R$^2$ values of 0.85 and 0.83, while the temporal models achieved R$^2$ values of 0.74 and 0.67. The temporal model also achieved moderate R$^2$ values of 0.44 and 0.46 at a site with high coastal water quality.
Around 163 million people in India have no access to clean water. Further, water wastage in India is appallingly high at about 125 million litres a day. The numbers are worrying and show no sign of abating. Meanwhile, developed nations are leveraging artificial intelligence to move closer towards sustainability goals. And India is catching up in a small way.
Park, Namyong, Kim, MinHyeok, Hoai, Nguyen Xuan, I., R., McKay, null, Kim, Dong-Kyun
Modeling real-world phenomena is a focus of many science and engineering efforts, such as ecological modeling and financial forecasting, to name a few. Building an accurate model for complex and dynamic systems improves understanding of underlying processes and leads to resource efficiency. Towards this goal, knowledge-driven modeling builds a model based on human expertise, yet is often suboptimal. At the opposite extreme, data-driven modeling learns a model directly from data, requiring extensive data and potentially generating overfitting. We focus on an intermediate approach, model revision, in which prior knowledge and data are combined to achieve the best of both worlds. In this paper, we propose a genetic model revision framework based on tree-adjoining grammar (TAG) guided genetic programming (GP), using the TAG formalism and GP operators in an effective mechanism to incorporate prior knowledge and make data-driven revisions in a way that complies with prior knowledge. Our framework is designed to address the high computational cost of evolutionary modeling of complex systems. Via a case study on the challenging problem of river water quality modeling, we show that the framework efficiently learns an interpretable model, with higher modeling accuracy than existing methods.
Yao, Yuling, Pirš, Gregor, Vehtari, Aki, Gelman, Andrew
Stacking is a widely used model averaging technique that yields asymptotically optimal prediction among all linear averages. We show that stacking is most effective when the model predictive performance is heterogeneous in inputs, so that we can further improve the stacked mixture with a hierarchical model. With the input-varying yet partially-pooled model weights, hierarchical stacking improves average and conditional predictions. Our Bayesian formulation includes constant-weight (complete-pooling) stacking as a special case. We generalize to incorporate discrete and continuous inputs, other structured priors, and time-series and longitudinal data. We demonstrate on several applied problems.