Microsoft has released a set of 100,000 questions and answers that artificial intelligence (AI) researchers can use to create systems that can read and answer questions as precisely as a human. "The dataset is called MS MARCO, which stands for Microsoft MAchine Reading COmprehension, and can be used to teach artificial intelligence systems to recognize questions and formulate answers and, eventually, to create systems that can come up with their own answers based on unique questions they have not seen before," said Microsoft in a blog post. By providing realistic questions and answers, the researchers said they can train systems to better deal with the nuances and complexities of questions regular people actually ask, including those queries that have no clear answer or multiple possible answers. "Our dataset is designed not only using real-world data but also removing such constraints so that the new-generation deep learning models can understand the data first before they answer questions," added Li Deng, Partner Research Manager of Microsoft's Deep Learning Technology Centre. The MS MARCO dataset is available for free to any researcher who wants to download it and use it for non-commercial applications, Microsoft said.
As a bona fide information technology professional, you have recently been inundated with the idea that artificial intelligence and machine learning are going to change the way your enterprise does business. Time and time again you have been told that, coupled with big data and IoT, AI is going to transform the enterprise network and how it is managed, and that an IT pro's life will never be the same.
Until April, Microsoft boasted of having the largest collection of faces that anyone could use to train facial-recognition algorithms. Since then, the once publicly available dataset has disappeared. As the Financial Times reports, Microsoft quietly deleted the dataset after the paper called attention to privacy and ethical issues, including use of the dataset by military researchers and Chinese surveillance firms. Microsoft did not immediately respond to a request for comment from Fortune. But it told the Financial Times: "The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed."
TDSP provides recommendations for managing shared analytics and storage infrastructure, both in the cloud and on-premises, including cloud file systems for storing datasets, databases, Big Data clusters (Hadoop, Spark), and machine learning services. This shared infrastructure is where raw and processed datasets are stored, which enables reproducible analysis. Sharing it also avoids duplication, which could lead to inconsistencies and additional infrastructure costs. Scripts are provided to provision the shared resources, track them, and allow each team member to connect to those resources securely. Our data science team uses the Microsoft Data Science Virtual Machine as our cloud development environment.