I have been reproducing (or trying to reproduce) time series classification results for 23 years. In general, less than half of the papers can be reproduced, and since the advent of deep learning the fraction has gotten worse. When a result cannot be reproduced, I see two honest ways for later authors to handle it:

A) "We made a good-faith effort to reproduce the results in [x] but were unable to do so, so we omit it from comparison."

B) "We made a good-faith effort to reproduce the results in [x] but were unable to do so. We place our results using this method in Table Y with an asterisk, to denote that this is our best understanding of the algorithm's performance, which may not reflect the accuracy claimed in [x]."
Hello everyone, I recently noticed that TFRecords aren't as popular as they should be. The efficiency and advantages they provide, including easy integration into TPU pipelines for TensorFlow, led me to create a repository of command-line scripts that convert data from all the popular domains, like audio, text, and images (with video support coming soon), to the TFRecord format. The scripts also support automatic SQLite and CSV datatype parsing and buffering, offer an option for multiprocessing, and are designed to minimize memory footprint. If you are interested, please check out the repository here, give it a star if it is helpful, and let me know if you have any feedback or suggestions.
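For anyone unfamiliar with the format, here is a minimal sketch (not taken from the repository; file name and feature names are made up) of writing records with `tf.train.Example` and reading them back through `tf.data`:

```python
# Hypothetical minimal example: serialize a few (text, label) pairs into a
# TFRecord file, then read them back as a tf.data pipeline.
import tensorflow as tf

def serialize_example(text, label):
    # Each record is a tf.train.Example: a dict of named, typed features.
    feature = {
        "text": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[text.encode()])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Write the records to disk.
with tf.io.TFRecordWriter("sample.tfrecord") as writer:
    for text, label in [("good movie", 1), ("bad movie", 0)]:
        writer.write(serialize_example(text, label))

# Reading back: TFRecordDataset plugs straight into tf.data input pipelines
# (including TPU ones), which is the main draw of the format.
spec = {
    "text": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}
ds = tf.data.TFRecordDataset("sample.tfrecord").map(
    lambda rec: tf.io.parse_single_example(rec, spec))
for ex in ds:
    print(ex["text"].numpy().decode(), int(ex["label"]))
```

The per-domain scripts in the repository automate exactly this kind of boilerplate at scale.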
It's still quite useful, but it does not do any kind of feature selection, nor does it consider feature importance when making predictions. To make a prediction, it calculates the distance in feature space between the new observation and every observation it was trained on, finds the k closest, and then takes some aggregate of their target variable (usually the mean for regression and the mode for classification). If you add several completely random columns to your data, kNN will weight them in the distance calculation exactly as much as the meaningful columns. This is in contrast to smarter algorithms, such as linear models, which can learn to ignore features with no predictive value. Note that the converse also holds: if your model gets worse when you add new features, that doesn't mean the features contain no value; for kNN, they may simply be diluting the distance metric.
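The procedure above, and the random-column failure mode, can be sketched in a few lines of plain Python (names are illustrative, not from any library):

```python
# Minimal kNN classifier: every feature contributes equally to the distance,
# including a purely random one.
import math
import random
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest neighbours."""
    # Euclidean distance weights every column identically: a noise column
    # shifts distances just as much as a meaningful one.
    dists = sorted(
        (math.dist(x, x_new), label) for x, label in zip(X_train, y_train)
    )
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

# Toy data: the single feature perfectly separates the two classes.
X = [[0.0], [0.1], [1.0], [1.1]]
y = ["a", "a", "b", "b"]
print(knn_predict(X, y, [0.05], k=3))       # -> a

# Append one pure-noise column on a larger scale than the real feature.
random.seed(0)
X_noisy = [x + [random.uniform(0, 10)] for x in X]
query = [0.05, random.uniform(0, 10)]
# The noise now dominates the distance, and the same query is misclassified.
print(knn_predict(X_noisy, y, query, k=3))  # -> b
```

This is also why feature scaling matters so much for kNN: a column's influence on the prediction is determined entirely by its numeric range, not by how informative it is.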