Microsoft Ignite, September 26-30, 2016, Atlanta, GA
This talk presents unsupervised analysis techniques that can be applied to collections of unstructured text documents to discover hidden topical trends, correlations, or anomalies in the data. The techniques apply to a wide range of document types, including news stories, technical blogs, customer feedback forms, congressional records, and legal documents, among many others. The talk will briefly introduce the techniques needed to pre-process text data, discover salient multi-word phrases, and learn latent topic models that describe the topical content of a text collection. The primary focus will be on analytic techniques that can be applied to the output of a latent topic model to extract trending topics over time, uncover topical correlations with other document features or metadata, and discover anomalies in a text corpus. To illustrate these techniques, examples using news wire and congressional record data will show how important events in news wire data, anomalous congressional actions, and interesting correlations can be discovered automatically with the presented unsupervised techniques.
Jul-9-2016, 01:07:22 GMT