NEW YORK: Microsoft has released a set of 100,000 questions and answers that artificial intelligence (AI) researchers can use to create systems that can read and answer questions as precisely as a human. "The dataset is called MS MARCO, which stands for Microsoft MAchine Reading COmprehension, and can be used to teach artificial intelligence systems to recognise questions and formulate answers and, eventually, to create systems that can come up with their own answers based on unique questions they have not seen before," said Microsoft in a blog post. By providing realistic questions and answers, the researchers said they can train systems to better deal with the nuances and complexities of questions regular people actually ask, including those queries that have no clear answer or multiple possible answers. "Our dataset is designed not only using real-world data but also removing such constraints so that the new-generation deep learning models can understand the data first before they answer questions," added Li Deng, Partner Research Manager of Microsoft's Deep Learning Technology Centre. The MS MARCO dataset is available for free to any researcher who wants to download it and use it for non-commercial applications, Microsoft said.
Over the past few years, the Microsoft Research Outreach team has worked extensively with the external research community to enable adoption of cloud-based research infrastructure. Through this process, we experienced the ubiquity of Jim Gray's fourth paradigm of discovery based on data-intensive science – that is, almost all research projects have a data component to them. This data deluge also demonstrated a clear need for curated and meaningful datasets in the research community, not only in computer science but also in interdisciplinary and domain sciences. Today we are excited to launch Microsoft Research Open Data – a new data repository in the cloud dedicated to facilitating collaboration across the global research community. In a single, convenient, cloud-hosted location, Microsoft Research Open Data offers datasets that represent many years of data curation and research efforts by Microsoft and that were used in published research studies.
As a bona fide information technology professional, you have recently been inundated with the idea that artificial intelligence and machine learning are going to change the way your enterprise does business. Time and time again, you have been told that AI, coupled with big data and IoT, is going to transform the enterprise network and how it is managed, and that an IT pro's life will never be the same.
Developing robust and resilient machine learning models requires diversity in the teams working on the models as well as in the datasets used to train the models, says Diana Kelley of Microsoft. "If you don't understand the datasets that you are using properly, it's a potential to automate bias," she says. Kelley is the cybersecurity field chief technology officer for Microsoft and a cybersecurity architect, executive adviser and author. She leverages her more than 25 years of cyber risk and security experience to provide advice and guidance to CSOs, CIOs and CISOs at some of the world's largest companies. Previously, she was the global executive security adviser at IBM.