The data set would be astronomy sub-images that are either bad (edge-of-chip artifacts, bright-star saturation and spikes, internal reflections, chip flaws) or good (populated with fuzzy-dot stars, galaxies, asteroids, and the like). A typical image is 512x512, but the size varies a lot. Because the bad features tend to be large, I'd probably bin the images down to, say, 64x64 for compactness and speed. It has to run fast on tens of thousands of images. I'm tempted to adopt PlaidML as my back end (if I understand its role correctly), because it can compile the problem for many architectures: CUDA, CPU-only, OpenCL.
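The binning step described above can be sketched in plain NumPy as a block mean; this is a minimal sketch, assuming 2-D images whose sides are (after cropping any remainder) integer multiples of the target size — the 512x512 to 64x64 case is an 8x block mean:

```python
import numpy as np

def bin_image(img: np.ndarray, out_size: int = 64) -> np.ndarray:
    """Block-mean a 2-D image down to out_size x out_size."""
    h, w = img.shape
    fy, fx = h // out_size, w // out_size
    # Crop any remainder so the reshape below is exact.
    img = img[: fy * out_size, : fx * out_size]
    # reshape splits each axis into (blocks, samples-per-block);
    # averaging over the per-block axes gives one value per block.
    return img.reshape(out_size, fy, out_size, fx).mean(axis=(1, 3))

# Example: a synthetic 512x512 "image" binned to 64x64.
binned = bin_image(np.random.rand(512, 512))
print(binned.shape)  # (64, 64)
```

Because the reshape is a view and the mean is vectorized, this runs in microseconds per image, which matters when the classifier has to chew through tens of thousands of them.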
Recently, I came across a Reddit thread on the different roles in data science and machine learning: data scientist, decision scientist, product data scientist, data engineer, machine learning engineer, machine learning tooling engineer, AI architect, etc. I find these definitions more prescriptive than I prefer. Instead, I have a simple (and pragmatic) definition: an end-to-end data scientist can identify and solve problems with data to deliver value. It's difficult to be effective when the data science process (problem framing, data engineering, ML, deployment/maintenance) is split across different people: it leads to coordination overhead, diffusion of responsibility, and the lack of a big-picture view. IMHO, data scientists can be more effective by being end-to-end. Here, I'll discuss the benefits and counter-arguments, how to become end-to-end, and the experiences of Stitch Fix and Netflix.
Though the community continues to develop new algorithms, state-of-the-art results have plateaued over the last couple of years. Since RL algorithms that learn from scratch on tremendous amounts of online data are infeasible in the real world, much research has moved to areas such as meta-RL, offline RL, integrating RL with domain knowledge, and integrating RL with planning.

How do you unit test end-to-end ML pipelines?, by u/farmingvillein

As a bit of a tl;dr: once you've got the bare-minimum data-replay testing in place ("yeah, it is probably working, because the results are pretty close to what they were before"), I'd encourage you to focus your energy on thinking of testing as outlier detection. Outliers, in real-world ML systems, tend to be harbingers of systematic errors, upstream data problems, and logic (pre-/post-processing) problems.

How do you transition from a no-name international college to FAIR/Brain?, by u/r-sync

Coming from a no-name Indian engineering college with meh grades, you do have to get a bit creative, be very persistent, and build credibility for yourself. The examples above are one way to do so, but you can also articulate your thoughts as really good blog posts and arXiv papers, or show great software engineering skills in open source.
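The "testing as outlier detection" idea above can be made concrete as a post-run sanity check on a pipeline's outputs. A minimal sketch, with every name here hypothetical (`check_outputs`, the reference statistics, and the thresholds are all assumptions, not any particular library's API):

```python
import numpy as np

def check_outputs(scores, ref_mean, ref_std, z_thresh=4.0, max_frac=0.01):
    """Flag a pipeline run whose outputs drift from a trusted reference run.

    Returns a list of human-readable alerts; an empty list means "looks sane".
    """
    scores = np.asarray(scores, dtype=float)
    alerts = []
    # Systematic drift: the batch mean wandered away from the reference,
    # beyond what sampling noise in a batch this size would explain.
    if abs(scores.mean() - ref_mean) > 3 * ref_std / np.sqrt(len(scores)):
        alerts.append("batch mean drifted from reference")
    # Point outliers: individual scores far outside the reference spread,
    # often a symptom of upstream data or pre-/post-processing bugs.
    z = np.abs(scores - ref_mean) / ref_std
    frac = float((z > z_thresh).mean())
    if frac > max_frac:
        alerts.append(f"{frac:.1%} of outputs beyond {z_thresh} sigma")
    return alerts

# A well-behaved batch vs. one with a corrupted tail.
rng = np.random.default_rng(0)
good = rng.normal(0.5, 0.1, 1000)
bad = np.concatenate([good, np.full(50, 5.0)])
print(check_outputs(good, ref_mean=0.5, ref_std=0.1))
print(check_outputs(bad, ref_mean=0.5, ref_std=0.1))
```

The point is the shape of the check, not the thresholds: a cheap statistical comparison against a known-good run catches the systematic upstream problems that exact-match replay tests tend to miss.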
I have been on a 30-day challenge to improve my knowledge of Artificial Intelligence (AI) and to understand how it works and how it impacts our lives. In this fifth section, we tackle "AI in Application": where AI is prevalent, what data is already being collected, and how we have not only integrated it into our everyday lives but, in some cases, already love it and depend on it. Where AI shows up is not surprising; what's humbling is how deeply it has penetrated our lives and how much we depend on it. Recently, a friend of mine named her baby Sirius. For those that love the Harry Potter books, the immediate connection is to Sirius Black, so of course, being a Harry Potter fan, I instantly loved it.
With data science and artificial intelligence evolving daily, the volume of information they generate can be challenging to keep pace with. That's why data science news websites and blogs offer newsletters that continually surface relevant, significant information for readers. As an excellent form of curated content, newsletters can be extremely informative and insightful for data science professionals, students, and business leaders alike. These weekly newsletters cover industry trends, the latest news, different methodologies, and new technologies, making them an exciting learning resource for many. Further, amid such a vast amount of information, it is critical to steer clear of clickbait and fake news, and well-curated newsletters can be the perfect antidote.