Silicon Valley siphons our data like oil. But the deepest drilling has just begun

#artificialintelligence

Customers in the UK will soon find out. Recent reports suggest that three of the country's largest supermarket chains are rolling out surge pricing in select stores. This means that prices will rise and fall over the course of the day in response to demand. Buying lunch at lunchtime will be like ordering an Uber at rush hour. This may sound pretty drastic, but far more radical changes are on the horizon.


The Incredible Ways Heineken Uses Big Data, The Internet of Things And Artificial Intelligence (AI)

#artificialintelligence

Every industry can benefit from big data, IoT and AI, and that includes brewers. Dutch brewer Heineken has been a worldwide brewing leader for the last 150 years, but today, as the No. 1 brewer in Europe and No. 2 in the world, it is ramping up its results thanks to big data and AI. As the company sets out to compete more effectively in the formidable U.S. beer market, it plans to leverage the vast amounts of data it collects. It currently sells more than 8.5 million barrels of its various beer brands in the U.S., but it hopes to increase those numbers with data-driven improvements and AI augmentation of its operations, marketing, advertising and customer experience.


Prioritization of Domain-Specific Web Information Extraction

AAAI Conferences

It is often desirable to extract structured information from raw web pages for better information browsing, query answering, and pattern mining. However, many such Information Extraction (IE) technologies are costly, and applying them at web scale is impractical. In this paper, we propose a novel prioritization approach in which candidate pages from the corpus are ordered according to their expected contribution to the extraction results, and those with higher estimated potential are extracted earlier. Systems employing this approach can stop the extraction process at any time when resources become scarce (i.e., not all pages in the corpus can be processed), without worrying about wasting extraction effort on unimportant pages. More specifically, we define a novel notion to measure the value of extraction results and design various mechanisms for estimating a candidate page’s contribution to this value. We further design and build the Extraction Prioritization (EP) system with efficient scoring and scheduling algorithms, and experimentally demonstrate that EP significantly outperforms the naive approach and is more flexible than the classifier approach.
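
The general idea of ordering pages by estimated contribution and stopping when the budget runs out can be illustrated with a small sketch. This is not the EP system's scoring or scheduling algorithm, which the abstract does not specify. The function name `prioritized_extraction`, the keyword-count estimator, and the dollar-token extractor are all hypothetical stand-ins chosen only for illustration.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def prioritized_extraction(
    pages: Iterable[str],
    estimate_value: Callable[[str], float],
    extract: Callable[[str], List[str]],
    budget: int,
) -> Tuple[List[str], List[str]]:
    """Run `extract` on pages in descending order of estimated value.

    Stops once `budget` pages have been processed, so the pages left
    unprocessed are the ones with the lowest estimates.
    """
    # heapq is a min-heap, so scores are negated; the index breaks ties
    # in favor of the original corpus order.
    heap = [(-estimate_value(page), i, page) for i, page in enumerate(pages)]
    heapq.heapify(heap)

    processed: List[str] = []
    results: List[str] = []
    while heap and len(processed) < budget:
        _, _, page = heapq.heappop(heap)
        processed.append(page)
        results.extend(extract(page))
    return processed, results


# Hypothetical estimator: pages mentioning "price" more often are assumed
# to be more valuable for a price-extraction task.
def keyword_score(page: str) -> float:
    return float(page.count("price"))


# Hypothetical extractor: collect whitespace-separated tokens starting with "$".
def dollar_tokens(page: str) -> List[str]:
    return [token for token in page.split() if token.startswith("$")]
```

With a budget of two pages, a call such as `prioritized_extraction(corpus, keyword_score, dollar_tokens, budget=2)` processes only the two pages with the highest keyword scores and leaves the rest untouched, which mirrors the "stop at any time" property described above.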



Top Data Sources for Journalists in 2018 (350 Sources)

@machinelearnbot

There are many different types of sites that provide a wealth of free, freemium and paid data that can help audience developers and journalists with their reporting and storytelling efforts. The team at State of Digital Publishing would like to acknowledge these, as derived from manual searches and recognition from our existing audience. Kaggle is a site that allows users to discover machine learning while writing and sharing cloud-based code. Relying primarily on the enthusiasm of its sizable community, the site hosts dataset competitions for cash prizes, and as a result it has amassed a large amount of data. Whether you're looking for historical data from the New York Stock Exchange, an overview of candy production trends in the US, or cutting-edge code, this site is chock-full of information. It's impossible to be on the Internet for long without running into a Wikipedia article.