open data

Microsoft's President on Privacy, Artificial Intelligence, and Human Rights


Before a rapt, standing-room-only audience of more than 300 students, faculty, and other members of the Law School community, Microsoft President and Chief Legal Officer Brad Smith '84 returned to campus on October 1 to discuss his new book, Tools and Weapons: The Promise and the Peril of the Digital Age (cowritten with Carol Ann Browne). The event with Gillian Lester, Dean and the Lucy G. Moses Professor of Law of Columbia Law School, and Professor Tim Wu, a leading authority on antitrust law who advocates for breaking up Big Tech companies, was the season's first installment of the Dean's Distinguished Speaker Series. "Brad may be the only tech executive who would willingly share the stage with Professor Wu, given Tim's strong and well-articulated position on the perils associated with the bigness of today's technology companies," said Dean Lester in her introduction. The conversation touched on a number of pressing concerns, including cybersecurity, government regulation, ethics, and human rights. Smith's book addresses the untold ramifications of digital technology's ubiquity in our personal lives, our societies, and our economies.

Introducing HANS, the new AI support tool for Estonian lawmakers -- e-Estonia


Speech recognition is definitely one of the areas where artificial intelligence is showing its power and effectiveness. And what is the last thing that journalists, secretaries, and assistants wish to take care of? Transcription. But whether for interviews or parliamentary reports, new AI-based applications are emerging as useful support tools that let the machine do the boring part of the job and allow people to focus on more demanding and intellectually challenging tasks. In the next year, the Estonian Parliament (Riigikogu) is set to introduce HANS, an AI system that will be a valuable ally in the work of lawmakers and employees of the Riigikogu. By deploying speech recognition, it will increase the efficiency and accuracy of session transcripts.

Equifax and FICO on Applying Machine Learning to Open Data - InformationWeek


Teams that work with open data may feel like they face an explosion of information these days, but there are resources being brought to bear to process such data and stem the tide. Last week's FICO World conference in New York revealed some of the varied ways the credit niche of the financial world tries to apply big data analytics and so-called decision technology. The conference was largely a showcase for data analytics company FICO, but some presentations spoke to a broader context -- using machine learning and other resources to process vast amounts of data. Peter Maynard, senior vice president of data and analytics for strategic client and partner engagement at Equifax, spoke about a partnership between his consumer credit reporting agency and FICO. He was joined by Tom Johnson, senior director with FICO, to discuss their joint effort combining data in a platform for decision making.

Dangerous streets of Bratislava! Animated maps using open data in R


At work recently, I wanted to make an interesting animated visualization suitable for a start-up pitch (presentation), and I got my first experience with spatial data. I enjoyed working with this type of data and wanted to get better at it, so I decided to try to visualize something interesting with Bratislava (Slovakia) open data and OpenStreetMap. I ended up with animated maps of violations on Bratislava streets over a period of two and a half years. Since this post analyzes spatial time series, it still fits the blog's domain, which is time series data mining. You can read more about time series forecasting, representations and clustering in my previous blog posts here. The ultimate goal is to show where and when the most dangerous places are in the capital of Slovakia, Bratislava.

Artificial Intelligence Hackathon


Technology is a powerful platform that can help us identify and address issues of inequality and accessibility within our local and global communities. Our ability to make a difference depends on our individual experiences and backgrounds. In choosing this challenge, you are working to create a solution that assists a community you care about. This challenge gives you the freedom to tackle the social good issue most important to you in whatever way you wish. Solutions can be built with the technology of your choice and can leverage one or more Azure services, with a focus on artificial intelligence techniques.

Big Blue opens up hub for machine learning datasets • DEVCLASS


IBM has launched a repository of training datasets that data scientists can pick and mix to train their deep learning and machine learning models. The IBM Data Asset eXchange (DAX) is designed to complement the Model Asset eXchange it launched earlier this year, which offers researchers and developers models to deploy or train with their own data. In a blog post announcing the data exchange, a quartet of IBM luminaries wrote, "Developers adopting ML models need open data that they can use confidently under clearly defined open data licenses." The data sets in question will be covered by the Linux Foundation's Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration – "where possible". DAX will also provide "unique access to various IBM and IBM Research datasets."

Open source and open data


There's currently an ongoing debate about the value of data and whether internet companies should do more to share their data with others. At Google we've long believed that open data and open source are good not only for us and our industry, but also benefit the world at large. Our commitment to open source and open data has led us to share datasets, services and software with everyone. For example, Google released the Open Images dataset of 36.5 million images containing nearly 20,000 categories of human-labeled objects. With this data, computer vision researchers can train image recognition systems.

Learning Real Estate Automated Valuation Models from Heterogeneous Data Sources Machine Learning

Real estate appraisal is a complex and important task that can be made more precise and faster with the help of automated valuation tools. Usually the value of a property is determined by taking into account both structural and geographical characteristics. However, while geographical information is easily found, obtaining significant structural information requires the intervention of a real estate expert, a professional appraiser. In this paper we propose a Web data acquisition methodology and a Machine Learning model that can be used to automatically evaluate real estate properties. This method uses data from previous appraisal documents, from the advertised prices of similar properties found via Web crawling, and from open data describing the characteristics of the corresponding geographical area. We describe a case study, applicable to the whole Italian territory and initially trained on a data set of individual homes located in the city of Turin, and analyze its predictive performance and practical applicability.

CMS releases open data for Machine Learning


The CMS collaboration at CERN is happy to announce the release of its fourth batch of open data to the public. With this release, which brings the volume of its open data to more than 2 PB (or two million GB), CMS has now provided open access to 100% of its research data recorded in proton–proton collisions in 2010, in line with the collaboration's data-release policy. The release also includes several new data and simulation samples. The new release builds upon and expands the scope of the successful use of CMS open data in research and in education. In this release, CMS open data address the ever-growing application of machine learning (ML) to challenges in high-energy physics.

Waymo releases a self-driving open data set for free use by the research community – TechCrunch


Waymo is opening up its significant stores of autonomous driving data with a new Open Data Set it's making available for the purposes of research. The data set isn't for commercial use, but its definition of "research" is fairly broad, and includes researchers at other companies as well as academics. The data set is "one of the largest, richest and most diverse self-driving data sets ever released for research," according to Waymo principal scientist and head of Research, Drago Anguelov, who was at both Zoox and Google prior to joining Waymo last year. Anguelov said in a briefing that the reason he initiated the push to make this data available is that Waymo and several other companies working in the field are "currently hampered by the lack of suitable data sets." "We decided to contribute our part to make, ultimately, researchers in academia ask the right questions -- and for that, they need the right data," Anguelov said.