nan value
Early wind turbine alarm prediction based on machine learning: AlarmForecasting
Shah, Syed Shazaib, Tan, Daoliang
Alarm data is pivotal in curbing fault behavior in wind turbines (WTs) and forms the backbone of advanced predictive monitoring systems. Traditionally, research has been confined to using alarm data solely as a diagnostic tool, merely indicative of unhealthy status. This study, however, aims to take a transformative leap towards preempting alarms: preventing them from triggering altogether and consequently averting impending failures. Our proposed Alarm Forecasting and Classification (AFC) framework consists of two successive modules: first, a regression module based on long short-term memory (LSTM) for time-series alarm forecasting, and then a classification module that tags the forecasted alarms. In this way, the entire alarm taxonomy can be forecasted reliably, rather than only a few specific alarms. Fourteen Senvion MM82 turbines with an operational period of 5 years are used as a case study; the results show forecast accuracies of 82%, 52%, and 41% for 10, 20, and 30 min forecast horizons, respectively. The results support the feasibility of anticipating and averting alarms, which is significant for curbing alarm frequency and enhancing operational efficiency through proactive intervention.
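The abstract does not say how training samples for the 10, 20, and 30 min horizons are built. One common setup, shown here only as a sketch, is direct multi-horizon windowing: each window of past values is paired with the value a fixed number of steps after it. The function name `make_windows` is hypothetical, the toy series is not turbine data, and the 10-minute sampling interval (so that 10, 20, and 30 min correspond to 1, 2, and 3 steps ahead) is an assumption rather than a detail from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], window: int, horizon: int
) -> Tuple[List[List[float]], List[float]]:
    """Build (input window, target) pairs for direct multi-horizon forecasting.

    Each sample uses `window` consecutive past values as input and the value
    `horizon` steps after the end of that window as the target.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for start in range(len(series) - window - horizon + 1):
        end = start + window  # index just past the input window
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Assumption: one sample every 10 minutes, so 10/20/30 min -> 1/2/3 steps ahead.
HORIZONS_MIN = {10: 1, 20: 2, 30: 3}

# Toy series standing in for one alarm-related signal (not real data).
signal = [float(t) for t in range(10)]
datasets = {m: make_windows(signal, window=3, horizon=h) for m, h in HORIZONS_MIN.items()}
```

In a two-module design like AFC, a regressor such as an LSTM would be trained on pairs of this kind for each horizon, and its forecasts would then be handed to the classification module for alarm tagging. Longer horizons leave fewer samples and a larger gap between input and target, which is consistent with accuracy falling as the horizon grows.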
NaN-Propagation: A Novel Method for Sparsity Detection in Black-Box Computational Functions
When numerically evaluating a function's gradient, sparsity detection can enable substantial computational speedups through Jacobian coloring and compression. However, sparsity detection techniques for black-box functions are limited, and existing finite-difference-based methods suffer from false negatives due to coincidental zero gradients. These false negatives can silently corrupt gradient calculations, leading to difficult-to-diagnose errors. We introduce NaN-propagation, which exploits the universal contamination property of IEEE 754 Not-a-Number values to trace input-output dependencies through floating-point numerical computations. By systematically contaminating inputs with NaN and observing which outputs become NaN, the method reconstructs conservative sparsity patterns that eliminate a major source of false negatives. We demonstrate this approach on an aerospace wing weight model, achieving a 1.52x speedup while uncovering dozens of dependencies missed by conventional methods; this is a significant practical improvement because gradient computation is often the bottleneck in optimization workflows. The technique leverages IEEE 754 compliance to work across programming languages and math libraries without requiring modifications to existing black-box codes. Furthermore, advanced strategies such as NaN payload encoding via direct bit manipulation enable faster-than-linear time complexity, yielding speed improvements over existing black-box sparsity detection methods. Practical algorithms are also proposed to mitigate challenges from branching code execution common in engineering applications.
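The core idea, contaminating one input at a time with NaN and checking which outputs turn NaN, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `nan_sparsity`, `fd_sparsity`, and `black_box` are hypothetical names, the black-box function is a branch-free toy, and the NaN payload encoding mentioned in the abstract is not shown (this version needs one evaluation per input). A forward-difference detector is included for comparison, evaluated at a point where one partial derivative happens to be zero.

```python
import math
from typing import Callable, List, Sequence

Func = Callable[[List[float]], List[float]]


def nan_sparsity(f: Func, x0: Sequence[float]) -> List[List[bool]]:
    """pattern[j][i] is True if output j becomes NaN when input i is set to NaN."""
    base = list(x0)
    n_out = len(f(base))
    pattern = [[False] * len(base) for _ in range(n_out)]
    for i in range(len(base)):
        x = list(base)
        x[i] = math.nan  # contaminate one input at a time
        for j, yj in enumerate(f(x)):
            if math.isnan(yj):
                pattern[j][i] = True
    return pattern


def fd_sparsity(f: Func, x0: Sequence[float], h: float = 1e-6) -> List[List[bool]]:
    """Forward-difference baseline: an entry is True only if the output changed."""
    base = list(x0)
    y0 = f(base)
    pattern = [[False] * len(base) for _ in range(len(y0))]
    for i in range(len(base)):
        x = list(base)
        x[i] += h
        y = f(x)
        for j in range(len(y0)):
            pattern[j][i] = y[j] != y0[j]
    return pattern


def black_box(x: List[float]) -> List[float]:
    # Hypothetical stand-in for a black-box model: 3 inputs, 2 outputs.
    return [x[0] * x[1], math.sin(x[2]) + 2.0 * x[0]]


point = [1.0, 0.0, 0.5]  # x[1] == 0 makes d(out0)/d(x0) coincidentally zero
nan_pattern = nan_sparsity(black_box, point)
fd_pattern = fd_sparsity(black_box, point)
```

Here `nan_pattern` is `[[True, True, False], [True, False, True]]`, while `fd_pattern` is `[[False, True, False], [True, False, True]]`: the finite-difference check misses that the first output depends on `x[0]`, because `x[1] == 0` makes that derivative vanish, whereas `nan * 0.0` is still NaN under IEEE 754. Code that branches on a comparison (for example `if x[0] > 0:`) can still hide dependencies, since every ordered comparison involving NaN evaluates to false.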
PCCC: The Pairwise-Confidence-Constraints-Clustering Algorithm
Baumann, Philipp, Hochbaum, Dorit S.
We consider a semi-supervised $k$-clustering problem where information is available on whether pairs of objects are in the same or in different clusters. This information is available either with certainty or with a limited level of confidence. We introduce the PCCC algorithm, which iteratively assigns objects to clusters while accounting for the information provided on the pairs of objects. Our algorithm can include relationships as hard constraints that are guaranteed to be satisfied or as soft constraints that can be violated subject to a penalty. This flexibility distinguishes our algorithm from the state of the art, in which pairwise constraints are either all considered hard or all considered soft. Unlike existing algorithms, our algorithm scales to large instances with up to 60,000 objects, 100 clusters, and millions of cannot-link constraints (which are the most challenging constraints to incorporate). We compare the PCCC algorithm with state-of-the-art approaches in an extensive computational study. Even though the PCCC algorithm is more broadly applicable than the state-of-the-art approaches, it outperforms them on instances with only hard constraints or only soft constraints, both in running time and in various measures of solution quality. The source code of the PCCC algorithm is publicly available on GitHub.
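The abstract does not describe how PCCC performs each assignment, so the following is not the PCCC algorithm. It only illustrates the hard/soft distinction the abstract draws, using a simple greedy single-pass assignment: a violated hard constraint makes a cluster infeasible, while a violated soft constraint adds its penalty to the assignment cost. The constraint tuple format, the `greedy_assign` name, and the fixed cluster centers are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
# (i, j, kind, weight): kind is "must" or "cannot"; weight None means a hard constraint.
Constraint = Tuple[int, int, str, Optional[float]]


def greedy_assign(
    points: Sequence[Point], centers: Sequence[Point], constraints: Sequence[Constraint]
) -> List[Optional[int]]:
    """Assign points in order to the cheapest feasible center (None if none is feasible)."""
    labels: List[Optional[int]] = [None] * len(points)
    for i, (px, py) in enumerate(points):
        best_k: Optional[int] = None
        best_cost = math.inf
        for k, (cx, cy) in enumerate(centers):
            cost = (px - cx) ** 2 + (py - cy) ** 2  # squared Euclidean distance
            feasible = True
            for a, b, kind, weight in constraints:
                if i not in (a, b):
                    continue
                other = b if a == i else a
                if labels[other] is None:
                    continue  # partner not assigned yet, so nothing to check
                if kind == "must":
                    violated = labels[other] != k
                else:
                    violated = labels[other] == k
                if violated:
                    if weight is None:  # hard constraint: cluster k is forbidden
                        feasible = False
                        break
                    cost += weight  # soft constraint: pay the penalty instead
            if feasible and cost < best_cost:
                best_k, best_cost = k, cost
        labels[i] = best_k
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
constraints: List[Constraint] = [
    (0, 1, "cannot", None),  # hard cannot-link
    (2, 3, "cannot", 0.5),   # soft cannot-link with penalty 0.5
]
labels = greedy_assign(points, centers, constraints)
```

The result is `[0, 1, 1, 1]`: the hard cannot-link pushes point 1 away from its nearest center, whereas the soft cannot-link between points 2 and 3 is violated because its 0.5 penalty is far cheaper than moving point 3 to the distant cluster. A one-pass greedy rule like this is order-dependent and can leave a point unassigned, which is one reason a practical algorithm needs a more careful assignment step.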
Seven Killer Memory Optimization Techniques Every Pandas User Should Know
Once we load a DataFrame into the Python environment, we typically perform a wide range of modifications on it, don't we? These include adding new columns, renaming headers, deleting columns, altering row values, replacing NaN values, and many more. Standard assignment creates a new copy of the DataFrame after the transformation, leaving the original DataFrame untouched. As a result, two distinct Pandas DataFrames (the original and the transformed one, for example df and df_copy) coexist in the environment, doubling memory utilization. In contrast, in-place operations are intended to modify the original DataFrame itself without creating a new Pandas DataFrame object.
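A small sketch of the two styles, using `fillna` (one of the pandas methods that accepts `inplace=True`) on a made-up DataFrame. The column names are hypothetical. One caveat: depending on the operation, pandas may still copy data internally even with `inplace=True`, so it is worth measuring memory (for example with `DataFrame.memory_usage(deep=True)`) before relying on in-place calls for savings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [1.0, np.nan, 3.0], "goals": [2, 0, 5]})

# Standard assignment: fillna returns a new DataFrame; df itself is unchanged.
df_copy = df.fillna(0.0)
original_still_has_nan = bool(df["score"].isna().any())

# In-place operation: the same fill is applied to df, and the method returns None.
returned = df.fillna(0.0, inplace=True)
```

After the first call both `df` and `df_copy` exist side by side; after the second, `df` holds the filled values and `returned` is `None`, so writing `df = df.fillna(0.0, inplace=True)` would accidentally replace the DataFrame with `None`.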
30 Very Useful Pandas Functions for Everyday Data Analysis Tasks
Python's Pandas library is one of the most widely used libraries in Python, because it is the data manipulation library that nearly every data analysis or machine learning task relies on. Even if you are working on data visualization or machine learning, some data manipulation is still required. In this article, I will list the Pandas functions that are necessary for everyday use and arguably enough to perform regular data manipulation tasks. For this article, I will use a public dataset from Kaggle called the FIFA dataset. The user license is mentioned here. Let's start with the functions. The first ones to mention are read_csv and read_excel.
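A minimal `read_csv` example. To keep it self-contained, the CSV text is held in memory with `io.StringIO`; the rows are made up and are not values from the FIFA dataset. For a real file you would pass a path instead, e.g. `pd.read_csv("fifa.csv")` (hypothetical filename). `read_excel` works the same way for spreadsheets but needs an Excel engine such as openpyxl installed for `.xlsx` files.

```python
import io

import pandas as pd

# Made-up rows standing in for a CSV file on disk (not real FIFA data).
csv_text = """name,age,overall
Player A,24,81
Player B,31,77
Player C,19,70
"""

# read_csv accepts a path, URL, or file-like object; StringIO keeps this self-contained.
players = pd.read_csv(io.StringIO(csv_text))

print(players.shape)
print(players.dtypes)
```

`read_csv` infers column types from the text, so `age` and `overall` come back as integer columns and `name` as an object (string) column.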
Fuel consumption prediction -- on cAInvas
Predict the quantity of fuel consumed during drives. The mileage of a vehicle is defined as the average distance traveled on a specified amount of fuel. But distance is not the only factor that affects fuel consumption. Here, besides distance, we take into account multiple factors such as speed, inside and outside temperatures, AC usage, and weather conditions like rain or sun to predict the consumption of different types of fuel during drives. Predicting fuel consumption from distance and other factors, or vice versa (predicting distance from fuel), can be useful both for planning trips and for making real-time predictions while driving.
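The model used on cAInvas is not described here, so this sketch uses ordinary least squares as a simple stand-in to show how several factors can be combined into one fuel prediction. Everything in it is hypothetical: the feature names, the coefficients, and the data, which are synthetic and generated from known coefficients so that the fit can be checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic drives (NOT the cAInvas data): hypothetical features.
distance_km = rng.uniform(5, 50, n)
speed_kmh = rng.uniform(30, 120, n)
temp_outside_c = rng.uniform(0, 30, n)
ac_on = rng.integers(0, 2, n).astype(float)
rain = rng.integers(0, 2, n).astype(float)

X = np.column_stack([distance_km, speed_kmh, temp_outside_c, ac_on, rain])
true_coef = np.array([0.06, 0.01, -0.005, 0.3, 0.2])  # hypothetical litres per unit
true_intercept = 0.5
fuel_l = X @ true_coef + true_intercept + rng.normal(0, 0.1, n)  # add small noise

# Ordinary least squares with a leading column of ones for the intercept.
X1 = np.column_stack([np.ones(n), X])
params, *_ = np.linalg.lstsq(X1, fuel_l, rcond=None)
intercept, coef = params[0], params[1:]

# Predict fuel for a new drive: 20 km at 60 km/h, 15 deg C outside, AC on, no rain.
new_drive = np.array([1.0, 20.0, 60.0, 15.0, 1.0, 0.0])
predicted = float(new_drive @ params)
```

The fitted `coef` lands close to `true_coef`, and `predicted` is close to 2.525 litres. Predicting distance from fuel (the reverse direction mentioned above) would use the same approach with distance as the target and fuel among the inputs.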
Data Cleaning of an Investment Project with Python
In this article, we will discuss data cleaning and some insights from the analysis of fund investment companies. The investment company wants to know the top nine countries for investment from the data. The data is available here. We observe that the two datasets have the columns permalink and company permalink, respectively. The organization name in the first row of both data frames is the same, but one is in uppercase and the other in lowercase. Now, we will check the number of unique companies in both datasets.
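Because the same organization can appear with different capitalization, counting unique companies before normalizing case overstates the count. A minimal sketch with made-up rows; the DataFrame names `companies` and `rounds`, the exact spelling `company_permalink`, and the values are hypothetical.

```python
import pandas as pd

companies = pd.DataFrame({"permalink": ["/Organization/Alpha", "/Organization/Beta"]})
rounds = pd.DataFrame(
    {"company_permalink": ["/ORGANIZATION/ALPHA", "/organization/alpha", "/organization/beta"]}
)

# Without normalization, case variants of the same company count separately.
raw_unique = rounds["company_permalink"].nunique()

# Lowercase both key columns so they can be compared and later merged.
companies["permalink"] = companies["permalink"].str.lower()
rounds["company_permalink"] = rounds["company_permalink"].str.lower()

n_companies = companies["permalink"].nunique()
n_rounds = rounds["company_permalink"].nunique()

# Companies that appear in rounds but not in the companies table.
missing = set(rounds["company_permalink"]) - set(companies["permalink"])
```

Here `raw_unique` is 3, but after lowercasing both tables contain 2 unique companies and `missing` is empty, so every company in the funding rounds can be matched to the companies table.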
Machine Learning with Low Code
Many people say that building machine learning models is hard and requires complex code. In fact, we can develop them with simple code. In practice, though, the most important part of building a machine learning model is understanding the problem and finding a solution; without understanding the problem, we cannot find a solution. Coding is just a bridge to the solution.