Machine Learning: Filtering Email for Spam or Ham - Code School Blog


You may have seen our previous posts on machine learning -- specifically, how to let your code learn from text and how to work with stop words, stemming, and spam. Today, we're going to build our machine learning-based spam filter using the tools we walked through in those posts: a tokenizer, a stemmer, and a naive Bayes classifier. We are going to work with the bluebird promise library here, so if you are not used to promises, please take a look at the bluebird API reference. Before we begin, it's important to have good training data. You can download some here -- we are interested in two.
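To make the three pieces named above concrete, here is a minimal sketch of how a tokenizer, a stemmer, and a naive Bayes classifier can fit together. It is not the post's code: the post works with the bluebird promise library, while this sketch uses plain synchronous Python. The stop-word list, the crude suffix-stripping `stem` function, the `NaiveBayes` class, and the four training messages are all assumptions made up for illustration; a real filter would use a proper stemmer and far more training data.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the post's list).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "is", "you", "for", "at"}


def stem(word):
    """Very crude suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every token seen during training

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training messages, two per class.
clf = NaiveBayes()
clf.train("Win free money now, claim your prize", "spam")
clf.train("Cheap pills, free offer, click now", "spam")
clf.train("Meeting moved to Monday at noon", "ham")
clf.train("Can you review my pull request today", "ham")

print(clf.classify("Claim your free prize"))
```

Working in log space avoids multiplying many small probabilities together, and the add-one smoothing keeps a word that was never seen in a class from driving that class's score to zero.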

Weekly Digest, December 19


Data Science for IoT vs Classic Data Science: 10 Differences
Enterprise AI insights from the AI Europe event in London
Is it time to consider data in motion in your big data projects?

IBM hits new AI milestone with new industry record for speech recognition - Computer Business Review


The company created a technology that recognises spoken words ever closer to human parity. IBM reached a new AI milestone in speech recognition, achieving an industry record of 5.5% word error rate using the Switchboard linguistic corpus. The company broke the industry record by extending its deep learning technologies and incorporating an acoustic model that learns from positive examples while taking advantage of negative ones. The model gets smarter and performs better when similar speech patterns are repeated. IBM achieved another major AI milestone in conversational speech recognition last year with a computer system that reached a word error rate of 6.9%.

Sophos Adds Advanced Machine Learning to Its Next-Generation Endpoint Protection Portfolio with Acquisition of Invincea


Sophos (LSE: SOPH), a global leader in network and endpoint security, today announced it has entered into an agreement to acquire Invincea, a visionary provider of next-generation malware protection. Invincea's endpoint security portfolio is designed to detect and prevent unknown malware and sophisticated attacks via its patented deep learning neural-network algorithms. It has been consistently ranked among the best-performing machine learning, signature-less next-generation endpoint technologies in third-party testing and rated highly for both high detection and low false-positive rates. Headquartered in Fairfax, Va., Invincea was founded by chief executive officer Anup Ghosh to address the rapidly growing zero-day security threat from nation states, cyber criminals and rogue actors. Invincea's flagship product, X by Invincea, uses deep learning neural networks and behavioral monitoring to detect previously unseen malware and stop attacks before damage occurs.

Entry Point Data


In this short tutorial I want to provide a short overview of some of my favorite Python tools for common procedures as entry points for general pattern classification and machine learning tasks, and various other data analyses. In this section I want to recommend a way of installing the required Python packages if you have not done so yet. Otherwise you can skip this part. Although they can be installed step by step "manually", I highly recommend taking a look at the Anaconda Python distribution for scientific computing. Anaconda is distributed by Continuum Analytics, but it is completely free and includes more than 195 packages for science and data analysis as of today.
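Before installing anything, it can help to check which packages are already available. The sketch below uses the standard library's `importlib.util.find_spec` to report which names cannot be imported. The excerpt does not list the required packages, so the names in `REQUIRED` are assumptions chosen as typical scientific Python tools; replace them with whatever the tutorial actually needs.

```python
import importlib.util

# Hypothetical list of packages to check; the excerpt does not name them.
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "sklearn"]


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```

If anything is reported missing, installing a distribution such as Anaconda, or adding the individual packages with your package manager, should resolve it.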