Transforming Big Data into Meaningful Insights - insideBIGDATA


In this special guest feature, Marc Alacqua, CEO and founding partner of Signafire, discusses a useful approach to data – known as data fusion – which is essentially alchemy-squared, turning not just one but multiple raw materials into something greater than the sum of their parts. It goes beyond older methods of big data analysis, like data integration, in which large data sets are simply thrown together in one environment. Marc is a decorated combat veteran of the U.S. Army Special Operations Forces. For his service during Operation Iraqi Freedom, he was cited for "exceptionally conspicuous gallantry" and awarded two Bronze Star Medals and the Army Commendation Medal for Valor. A 20-year veteran and Lieutenant Colonel, Marc has extensive command experience in both combat and peacetime, having commanded airborne and light infantry as well as special operations units.

Developing an NLP-based PR platform for the Canadian Elections


Elections are a vital part of democracy, allowing people to vote for the candidate they think can best lead the country. A candidate's campaign aims to demonstrate to the public why that candidate is the best choice. However, in this age of constant media coverage and digital communications, the candidate is scrutinized at every step. A single misquote or negative news story about a candidate can be the difference between winning and losing the election. It becomes crucial to have a public relations manager who can guide and direct the candidate's campaign by prioritizing specific campaign activities. One critical aspect of the PR manager's work is to understand the public perception of their candidate and improve public sentiment about the candidate.

Detecting Cyberattack Entities from Audit Data via Multi-View Anomaly Detection with Feedback

AAAI Conferences

In this paper, we consider the problem of detecting unknown cyberattacks from audit data of system-level events. A key challenge is that different cyberattacks will have different suspicion indicators, which are not known beforehand. To address this, we consider a multi-view anomaly detection framework, where multiple expert-designed "views" of the data are created for capturing features that may serve as potential indicators. Anomaly detectors are then applied to each view and the results are combined to yield an overall suspiciousness ranking of system entities. Unfortunately, there is often a mismatch between what anomaly detection algorithms find and what is actually malicious, which can result in many false positives. This problem is made even worse in the multi-view setting, where only a small subset of the views may be relevant to detecting a particular cyberattack. To help reduce the false positive rate, a key contribution of this paper is to incorporate feedback from security analysts about whether proposed suspicious entities are of interest or likely benign. This feedback is incorporated into subsequent anomaly detection in order to improve the suspiciousness ranking toward entities that are truly of interest to the analyst. For this purpose, we propose an easy-to-implement variant of the perceptron learning algorithm, which is shown to be quite effective on benchmark datasets. We evaluate our overall approach on real attack data from a DARPA red team exercise, which includes multiple attacks on multiple operating systems. The results show that the incorporation of feedback can significantly reduce the time required to identify malicious system entities.
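
The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the weighted-sum combination, the update rule, the learning rate, and every name below (`combined_score`, `rank_entities`, `feedback_update`, the `proc_*` entities and their scores) are assumptions made only for illustration. It assumes each entity already has one anomaly score per view, ranks entities by a weighted sum, and nudges the view weights perceptron-style after an analyst labels an entity.

```python
from typing import Dict, List, Sequence


def combined_score(weights: Sequence[float], view_scores: Sequence[float]) -> float:
    """Weighted sum of the per-view anomaly scores for one entity."""
    return sum(w * s for w, s in zip(weights, view_scores))


def rank_entities(weights: Sequence[float],
                  scores_by_entity: Dict[str, List[float]]) -> List[str]:
    """Return entity names ordered from most to least suspicious."""
    return sorted(
        scores_by_entity,
        key=lambda name: combined_score(weights, scores_by_entity[name]),
        reverse=True,
    )


def feedback_update(weights: Sequence[float], view_scores: Sequence[float],
                    is_malicious: bool, lr: float = 0.5) -> List[float]:
    """Perceptron-style step (an assumed rule, not the paper's variant).

    Views that scored the entity highly gain weight if the analyst marks it
    malicious and lose weight if the analyst marks it benign.
    """
    sign = 1.0 if is_malicious else -1.0
    return [w + lr * sign * s for w, s in zip(weights, view_scores)]


# Hypothetical scores from two views (for example, file activity and network activity).
scores = {
    "proc_a": [0.9, 0.2],
    "proc_b": [0.2, 0.8],
    "proc_c": [0.1, 0.7],
}
weights = [1.0, 1.0]

before = rank_entities(weights, scores)

# The analyst reviews the top-ranked entity and reports that it is benign.
weights = feedback_update(weights, scores[before[0]], is_malicious=False)
after = rank_entities(weights, scores)

print(before)
print(after)
```

In this toy run the first view drove the false positive, so its weight drops after the feedback and the entities flagged by the second view move ahead in the ranking.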

SureID - Vice President of Data Science/Machine Learning (Portland Metro Area)


Job Requirements
• Master's degree or equivalent work experience in machine learning
• Strong hands on experience solving complex problems using unsupervised and supervised machine learning algorithms
• Proficiency in feature selection and feature engineering
• Strong experience with big data tools and techniques, like Hadoop and Spark
• Broad knowledge of machine learning algorithms, with ability to select and apply appropriate algorithms to specific problem domains
• Ability to collaborate with domain experts to efficiently and effectively identify and extract previously unfamiliar domain knowledge

Preferred
• Knowledge in Natural Language Processing, especially named entity recognition
• Experience in problems associated with people-centric data, like name parsing, name comparison, address parsing etc.
• Experience with frameworks and techniques in deep learning and deep neural networks
• Experience with computer vision, particularly facial recognition and comparison

About SureID
SureID, Inc. integrates leading edge products and services into solutions that combine identity enrollment, authentication, background screening, and access management to make facilities, assets, and people safer and more secure. Using SureID's patented programs, highly secure facilities – such as military installations, government buildings, manufacturing and distribution sites, ports, and commercial builds – can increase security and streamline access for authorized personnel. SureID has a proven track record for successfully servicing government, military and commercial clients. The RAPIDGate Program already serves thousands of companies and hundreds of thousands of RAPIDGate badge-holders who enjoy streamlined access into Department of Defense and Homeland Security facilities. SureID is a privately-held company founded in November 2001 and headquartered in Hillsboro, OR.

Data mining, text mining, natural language processing, and computational linguistics: some definitions


Every once in a while an innocuous technical term suddenly enters public discourse with a bizarrely negative connotation. I first noticed the phenomenon some years ago, when I saw a Republican politician accusing Hillary Clinton of "parsing." From the disgust with which he said it, he clearly seemed to feel that parsing was morally equivalent to puppy-drowning. It seemed quite odd to me, since I'd only ever heard the word "parse" used to refer to the computer analysis of sentence structures. The most recent word to suddenly find itself stigmatized by Republicans (yes, it does somehow always seem to be Republican politicians who are involved in this particular kind of linguistic bullshittery) is "encryption."