open data


Equifax and FICO on Applying Machine Learning to Open Data - InformationWeek

#artificialintelligence

Teams that work with open data may feel like they face an explosion of information these days, but there are resources being brought to bear to process such data and stem the tide. Last week's FICO World conference in New York revealed some of the varied ways the credit niche of the financial world tries to apply big data analytics and so-called decision technology. The conference was largely a showcase for data analytics company FICO, but some presentations spoke to a broader context -- using machine learning and other resources to process vast amounts of data. Peter Maynard, senior vice president of data and analytics for strategic client and partner engagement at Equifax, spoke about a partnership between his consumer credit reporting agency and FICO. He was joined by Tom Johnson, senior director with FICO, to discuss their joint effort combining data in a platform for decision making.


Dangerous streets of Bratislava! Animated maps using open data in R

#artificialintelligence

Recently at work, I wanted to make an interesting, presentation-ready animated visualization for a start-up pitch, and I got my first experience with spatial data in the process. I enjoyed working with this type of data and wanted to get better at it, so I decided to try to visualize something interesting with Bratislava (Slovakia) open data and OpenStreetMap. I ended up with animated maps of violations on Bratislava's streets over two and a half years. Since spatial time series are analyzed in this post, it still sticks to the blog's domain of time series data mining. You can read more about time series forecasting, representations and clustering in my previous blog posts here. The ultimate goal is to show where and when the most dangerous places in the capital of Slovakia, Bratislava, are.
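The original post builds the animation in R; as a minimal sketch of the same idea in Python (assuming a hypothetical violations.csv with lon, lat and date columns extracted from the open data), one could render monthly snapshots with matplotlib:

```python
# A minimal sketch of animating spatial point events over time.
# Assumes a hypothetical CSV "violations.csv" with columns lon, lat, date.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

df = pd.read_csv("violations.csv", parse_dates=["date"])
df["month"] = df["date"].dt.to_period("M")
months = sorted(df["month"].unique())

fig, ax = plt.subplots(figsize=(6, 6))

def draw(i):
    # Redraw the scatter of violation locations for one month.
    ax.clear()
    snap = df[df["month"] == months[i]]
    ax.scatter(snap["lon"], snap["lat"], s=8, alpha=0.5, color="crimson")
    ax.set_title(f"Violations in Bratislava, {months[i]}")
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")

anim = FuncAnimation(fig, draw, frames=len(months), interval=500)
anim.save("violations.gif", writer="pillow")
```

A real version would draw the street network from OpenStreetMap underneath the points; the sketch keeps only the time-animated scatter to show the mechanics.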


Artificial Intelligence Hackathon

#artificialintelligence

Technology is a powerful platform that can help us identify and address issues of inequality and accessibility within our local and global communities. Our ability to make a difference depends on our individual experiences and backgrounds. In choosing this challenge, you are working to create a solution that assists a community you care about. This challenge gives you the freedom to tackle the social good issue most important to you in whatever way you wish. Solutions can be built with the technology of your choice and can leverage one or multiple Azure services, with a focus on artificial intelligence techniques.


Big Blue opens up hub for machine learning datasets • DEVCLASS

#artificialintelligence

IBM has launched a repository of training datasets that data scientists can pick and mix to train their deep learning and machine learning models. The IBM Data Asset eXchange (DAX) is designed to complement the Model Asset eXchange it launched earlier this year, which offers researchers and developers models to deploy or to train with their own data. In a blog post announcing the data exchange, a quartet of IBM luminaries wrote: "Developers adopting ML models need open data that they can use confidently under clearly defined open data licenses." The datasets in question will be covered by the Linux Foundation's Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration – "where possible". DAX will also provide "unique access to various IBM and IBM Research datasets."


Open source and open data

#artificialintelligence

There's an ongoing debate about the value of data and whether internet companies should do more to share their data with others. At Google we've long believed that open data and open source are good not only for us and our industry but also for the world at large. Our commitment to open source and open data has led us to share datasets, services and software with everyone. For example, Google released the Open Images dataset of 36.5 million images containing nearly 20,000 categories of human-labeled objects. With this data, computer vision researchers can train image recognition systems.
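As a toy illustration of what such labeled open data enables (this is not Google's pipeline; the folder layout, dataset path and class count below are assumptions), one could fine-tune a pretrained classifier on a labeled image subset with PyTorch:

```python
# A minimal sketch of fine-tuning a pretrained image classifier on a
# labeled open dataset. The folder layout (one subfolder per class)
# and the dataset path are placeholder assumptions.
import torch
from torch import nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
data = datasets.ImageFolder("open_images_subset/", transform=tfm)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(data.classes))  # new head

# Only the new classification head is optimized in this sketch.
opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```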


Learning Real Estate Automated Valuation Models from Heterogeneous Data Sources

arXiv.org Machine Learning

Real estate appraisal is a complex and important task that can be made more precise and faster with the help of automated valuation tools. Usually, the value of a property is determined by taking into account both structural and geographical characteristics. However, while geographical information is easily found, obtaining significant structural information requires the intervention of a real estate expert, a professional appraiser. In this paper we propose a Web data acquisition methodology and a Machine Learning model that can be used to automatically evaluate real estate properties. The method uses data from previous appraisal documents, from the advertised prices of similar properties found via Web crawling, and from open data describing the characteristics of the corresponding geographical area. We describe a case study applicable to the whole Italian territory, with a model initially trained on a data set of individual homes located in the city of Turin, and analyze prediction accuracy and practical applicability.
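The abstract doesn't specify the paper's exact model, so here is a hedged sketch of the general approach (all file and column names are hypothetical): merge structural features from appraisal documents, crawled comparable prices and open area-level data, then fit a regression model:

```python
# A minimal sketch of an automated valuation model (AVM) that merges
# structural features, crawled comparable prices, and open geographic
# data into one regression. All file and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

appraisals = pd.read_csv("appraisals.csv")         # structural features + value
comparables = pd.read_csv("crawled_listings.csv")  # advertised prices by area
geo = pd.read_csv("open_geo_data.csv")             # area-level open data

# Attach area-level context (median asking price, open data) to each home.
med = (comparables.groupby("area_id")["asking_price_sqm"]
       .median().rename("median_asking_sqm").reset_index())
df = appraisals.merge(med, on="area_id").merge(geo, on="area_id")

X = df[["floor_area_sqm", "rooms", "floor", "year_built",
        "median_asking_sqm", "population_density", "services_score"]]
y = df["appraised_value"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print(f"MAPE: {mean_absolute_percentage_error(y_te, model.predict(X_te)):.2%}")
```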


CMS releases open data for Machine Learning

#artificialintelligence

The CMS collaboration at CERN is happy to announce the release of its fourth batch of open data to the public. With this release, which brings the volume of its open data to more than 2 PB (or two million GB), CMS has now provided open access to 100% of its research data recorded in proton–proton collisions in 2010, in line with the collaboration's data-release policy. The release also includes several new data and simulation samples. The new release builds upon and expands the scope of the successful use of CMS open data in research and in education. In this release, CMS open data address the ever-growing application of machine learning (ML) to challenges in high-energy physics.
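CMS open data are typically distributed as ROOT files; as a rough sketch (the file name and branch names below are placeholders, not from an actual CMS release), one could pull flat per-event features into NumPy for ML using the uproot library:

```python
# A minimal sketch of reading a ROOT file from an open-data release into
# NumPy arrays for ML. The file name and branch names are placeholders,
# and each branch is assumed to hold one flat value per event.
import uproot
import numpy as np

events = uproot.open("cms_open_data_sample.root")["Events"]

# Pull a few per-event quantities into flat arrays.
arrays = events.arrays(["muon_pt", "muon_eta", "missing_et"], library="np")
X = np.column_stack([arrays["muon_pt"],
                     arrays["muon_eta"],
                     arrays["missing_et"]])
print(X.shape)  # (n_events, 3) feature matrix ready for a classifier
```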


Waymo releases a self-driving open data set for free use by the research community – TechCrunch

#artificialintelligence

Waymo is opening up its significant stores of autonomous driving data with a new Open Data Set it's making available for the purposes of research. The data set isn't for commercial use, but its definition of "research" is fairly broad, and includes researchers at other companies as well as academics. The data set is "one of the largest, richest and most diverse self-driving data sets ever released for research," according to Waymo principal scientist and head of Research, Drago Anguelov, who was at both Zoox and Google prior to joining Waymo last year. Anguelov said in a briefing that the reason he initiated the push to make this data available is that Waymo and several other companies working in the field are "currently hampered by the lack of suitable data sets." "We decided to contribute our part to make, ultimately, researchers in academia ask the right questions -- and for that, they need the right data," Anguelov said.


Artificial intelligence and open data

#artificialintelligence

The policies promoted by the European Union assume an intimate connection between artificial intelligence and open data. In this regard, as we have highlighted, open data is essential for the proper functioning of artificial intelligence, since algorithms must be fed with data whose quality and availability are essential for their continuous improvement, as well as for auditing their correct operation. Artificial intelligence increases the sophistication of data processing, since it requires greater precision, currency and quality, and the data must be obtained from very diverse sources to improve the quality of the algorithms' results. A further difficulty is that processing is carried out in an automated way and must offer precise answers immediately in the face of changing circumstances. Therefore, a dynamic perspective is needed, one that justifies the need for data to be offered not only in open and machine-readable formats but also with the highest levels of precision and disaggregation.


Speculative Data Futures: Karima

#artificialintelligence

I am Karima, which means the generous. It always reminds me of my home, Syria, a generous country destroyed by war. I was born and raised in a refugee camp populated by 5000 Syrians. It is not easy to be born Syrian in a refugee camp, especially if you are a woman. Hunger for a loaf of bread in the refugee camp is connected to a hunger for bodies.