22 easy-to-fix worst mistakes for data scientists

@machinelearnbot

- Being a perfectionist. In the business world, perfection is always associated with negative ROI: 20% of the time spent on a project yields 80% of the value, and the remaining 80% of the time yields the remaining 20% (the 80/20 rule, a case of the law of diminishing returns).
- Not spending enough time documenting your analyses, spreadsheets and code. Documentation should take about 25% of your time and be done on the fly, not after completing a project. Without proper documentation, nobody, not even you, will know six months later how to replicate, understand, or extract value from what you've done.
- Not spending enough time prioritizing and updating to-do lists. To fix this, talk to your stakeholders (but don't overwhelm them with a bunch of tiny requests) and spend 30 minutes per day on prioritizing, using calendars and project management tools.
- Not respecting or knowing the lifecycle of data science projects.
- Not creating reusable procedures or code, and instead spending too much time on tons of one-time analyses.
- Using old techniques on big data, for instance time-consuming clustering algorithms when an automated tagging or indexation algorithm would work millions of times faster to create taxonomies.
- Too much or too little computer science. Learn how to deploy distributed algorithms and how to optimize algorithms for speed and memory usage, but talk to your IT team before going too deep into this.
- Creating local, internal data marts for your analyses: your sys-admin will hate you.
- Big companies like highly specialized experts (something dangerous for your career), while startups like jacks-of-all-trades who are masters of some of them.
- Your project has no measurable yield. Talk to stakeholders to learn what the success metrics are; don't guess at them.
- Identify the right people to talk to in your organization, and outside of it.
- Avoid silos, including data silos.


Other mistakes on the same list:

- Focusing on tools rather than business problems.
- Planning communication last.
- Doing data analysis without a question or a plan.
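
The list above contrasts time-consuming clustering with automated tagging or indexation for building taxonomies. As a rough illustration only, and not the author's own algorithm, the sketch below tags documents by keyword lookup and collects them into an inverted index. The keyword table `KEYWORD_TO_TAG`, the sample documents, and the function names are all hypothetical. Each document is scanned once, so the cost grows linearly with the amount of text, whereas clustering methods that compare every pair of documents grow quadratically with the number of documents.

```python
from collections import defaultdict

# Hypothetical keyword-to-category rules (an assumption for this sketch);
# in practice they might come from a curated or mined keyword list.
KEYWORD_TO_TAG = {
    "regression": "statistics",
    "variance": "statistics",
    "hadoop": "big data",
    "mapreduce": "big data",
    "neural": "machine learning",
    "clustering": "machine learning",
}


def tag_document(text):
    """Return the sorted tags whose keywords appear in the text."""
    tokens = text.lower().split()
    return sorted({KEYWORD_TO_TAG[t] for t in tokens if t in KEYWORD_TO_TAG})


def build_taxonomy(docs):
    """Map each tag to the ids of the documents carrying it (an inverted index)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for tag in tag_document(text):
            index[tag].append(doc_id)
    return dict(index)


# Made-up sample documents, for illustration only.
docs = {
    "d1": "Linear regression and variance estimation",
    "d2": "Running MapReduce jobs on Hadoop",
    "d3": "Neural networks versus clustering on Hadoop",
}
taxonomy = build_taxonomy(docs)
print(taxonomy)
```

A document can carry several tags (here `d3` lands under both "big data" and "machine learning"), which a hard clustering would not allow. The trade-off is that the quality of the taxonomy depends entirely on the keyword table.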


Machine Learning Refined: Foundations, Algorithms, and Applications

#artificialintelligence

Providing a unique approach to machine learning, this text contains fresh and intuitive, yet rigorous, descriptions of all fundamental concepts necessary to conduct research, build products, tinker, and play. By prioritizing geometric intuition, algorithmic thinking, and practical real-world applications in disciplines including computer vision, natural language processing, economics, neuroscience, recommender systems, physics, and biology, this text provides readers with both a lucid understanding of foundational material and the practical tools needed to solve real-world problems. With in-depth Python and MATLAB/Octave-based computational exercises and a complete treatment of cutting-edge numerical optimization techniques, this is an essential resource for students and an ideal reference for researchers and practitioners working in machine learning, computer science, electrical engineering, signal processing, and numerical optimization.

