Global Big Data Conference

Training Data: Why Scale Is Critical for Your AI Future

#artificialintelligence

Data is the fuel that drives AI. But there's a big difference in the quality of fuel you can put into your AI engine. If your enterprise can create the biggest stockpile of the highest-quality training data, it will likely win the AI race, but getting there is no easy task. For all the advanced skills that data scientists possess, there's no escaping the fact that they often spend up to 80% of their time cleaning and prepping data. Without good, clean data to feed into machine learning algorithms, the data scientist can't be sure that the model will predict anything worthwhile.


Reporter's Notebook: 6 Key Takeaways from Strata Hadoop World

#artificialintelligence

The big data ecosystem was on full display at last week's Strata Hadoop World conference in San Jose. At the ripe old age of 10, Hadoop is still the driving force, but newer frameworks like Spark and Kafka are gaining steam. Here are some of the top trends your Datanami editor pulled from the show based on observations and discussions with attendees and vendors. Let's start with the biggest news from Strata, which was the rise of Kafka and real-time streaming. As Kafka creator Jay Kreps tweeted, it seemed "like every other presentation at Strata this year was on streaming data."


How To Build a Data Science Team Now

#artificialintelligence

Business execs who are leading their companies down the data science track may be dismayed by the difficulty and expense of hiring data scientists, the so-called "unicorns" who command quarter-million-dollar salaries. But fear not: While companies can benefit from having a full-fledged data scientist on staff, it is by no means a requirement for actually doing data science. The team approach to data science started soon after Harvard Business Review named data scientist the "sexiest job of the 21st century" back in 2012, spurring a run on data scientists, applied mathematicians, and other quantitative types that still hasn't let up. Thanks to the continued rapid evolution of technology – not to mention workplace workarounds put into place due to the aforementioned unicorn shortage – the team approach has grown in popularity. One business leader with real-world experience putting together data science teams (with and without actual data scientists) is Amy O'Connor, who built Nokia's first data lake and is currently Cloudera's Chief Data and Information Officer.


Which Programming Language Is Best for Big Data?

#artificialintelligence

Nothing is quite so personal for programmers as what language they use. Why a data scientist, engineer, or application developer picks one over another has as much to do with personal preference and their employer's IT culture as it does with the qualities and characteristics of the language itself. But when it comes to big data, there are some definite patterns that emerge. The most important factor in choosing a programming language for a big data project is the goal at hand. If the organization is manipulating data, building analytics, and testing out machine learning models, it will probably choose a language that's best suited for that task.