The excitement around any new technology comes with a side order of fears about how that system or service will affect people. In key areas like cloud computing and social media, it often feels as if regulators are playing catch-up with the tech firms that create these innovations and the businesses that exploit them. Yet it is in artificial intelligence (AI) that these fears are perhaps greatest. Rather than simply being a technology that people themselves use, AI, some experts believe, could replace human decision-making at work and at home. So how can businesses reduce those fears and create AI systems that exploit big data ethically?
Hadoop is the operating system for big data in the enterprise, so when Cloudera and Hortonworks, the two leading Hadoop distributions and vendors, merged, that was big news in and of itself. Last week's DataWorks Summit Europe was the first big public event for the combined company since the merger, and it was not short of interesting news on both the technology and the business fronts. Cloudera is the name the new company will go by, and there's a new-ish logo and branding to go with it, too.
AI has a data quality problem. In a survey of 179 data scientists, over half identified data quality issues as the biggest bottleneck in successful AI projects. Big data is so often improperly formatted, lacking metadata, or "dirty" -- incomplete, incorrect, or inconsistent -- that data scientists typically spend 80 percent of their time cleaning and preparing data to make it usable, leaving just 20 percent of their time for actual analysis. Organizations developing and using AI must therefore devote substantial resources to ensuring they have sufficient amounts of high-quality data so that their AI tools produce trustworthy results. As policymakers pursue national strategies to increase their competitiveness in AI, they should recognize that any country that wants to lead in AI must also lead in data quality.
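The kinds of problems described above -- inconsistent labels, missing values, impossible entries -- can be sketched with a small cleaning routine. The records and the cleaning rules below are invented purely for illustration, not drawn from the survey:

```python
from statistics import median

# Hypothetical "dirty" records: inconsistent labels, missing and
# impossible values -- the kinds of problems the surveyed data
# scientists cite.
raw = [
    {"region": "North",  "revenue": 1200.0},
    {"region": "north ", "revenue": None},    # label needs normalizing, value missing
    {"region": "NORTH",  "revenue": 950.0},
    {"region": "South",  "revenue": -40.0},   # impossible (incorrect) value
    {"region": None,     "revenue": 800.0},   # missing key field
]

def clean(records):
    # Normalize inconsistent category labels; drop rows missing the key field.
    rows = [dict(r, region=r["region"].strip().title())
            for r in records if r["region"] is not None]
    # Treat negative revenue as an entry error and drop the row.
    rows = [r for r in rows if r["revenue"] is None or r["revenue"] >= 0]
    # Impute remaining missing revenue with the column median.
    med = median(r["revenue"] for r in rows if r["revenue"] is not None)
    for r in rows:
        if r["revenue"] is None:
            r["revenue"] = med
    return rows

cleaned = clean(raw)
```

Even this toy version shows why cleaning dominates the workload: every rule encodes a judgment call (is a negative value an error? is the median a sensible imputation?) that no model can make on its own.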
In this new world of artificial intelligence and data management, it's easy to get confused by some of the terms most commonly used in the IT world. For example, data science and machine learning have a lot to do with each other, so it's not surprising that many people with only a passing knowledge of these disciplines have trouble figuring out how they differ from one another. First of all, data science is really a broad, overarching category of technology that encompasses many different types of projects and creations.
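One way to see the relationship: a data science project spans collecting, summarizing, and interpreting data, while machine learning is the narrower model-fitting step inside that workflow. The numbers and the least-squares fit below are invented purely to illustrate this distinction:

```python
# Toy dataset (invented): x could be years of experience, y salary in $k.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [30.0, 35.0, 42.0, 48.0, 55.0]

# Data-science side: exploratory summary statistics.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Machine-learning side: fit y = a*x + b by ordinary least squares,
# one narrow step within the broader workflow.
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var = sum((x - mean_x) ** 2 for x in xs)
a = cov / var
b = mean_y - a * mean_x

def predict(x):
    return a * x + b
```

The exploration, the choice of model, and the interpretation of `predict` are all data science; only the fitting itself is machine learning.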
These days, pundits galore are proselytizing about the Future of Work. Depending on whom you ask, the robots may or may not be taking over, leaving us mere humans pondering how work fits into our lives and whether we'll eventually be rendered obsolete. Just look at the stark contrast in tone between these two headlines: the Wall Street Journal's "White-Collar Robots Are Coming For Jobs" versus Wired's "Chill: Robots Won't Take All Our Jobs." Who should we *really* believe?! The truth is, there isn't one easy answer.
Nvidia has been more than a hardware company for a long time. As its GPUs are broadly used to run machine learning workloads, machine learning has become a key priority for the company. At its GTC event this week, Nvidia made a number of related announcements, aiming to build on its machine learning strength and extend into data science and analytics. Nvidia wants to "couple software and hardware to deliver the advances in computing power needed to transform data into insights and intelligence." CEO Jensen Huang emphasized the collaboration among chip architecture, systems, algorithms, and applications.
The MIT Institute for Data, Systems, and Society (IDSS) convened professional data scientists, academic researchers, and students from a variety of disciplines for the third annual daylong Women in Data Science (WiDS) conference in Cambridge. WiDS Cambridge is one of many global satellite events of the WiDS conference at Stanford University, where attendees join a global community of data science researchers and practitioners. The conference is open to anyone interested in data science, but strives especially to create opportunities for women in the field to showcase their work and network with each other. "I think WiDS is a great opportunity to bring together women at all professional levels -- students, postdocs, faculty, and professionals in industry -- who are working in data science, building community, and learning from a wide variety of perspectives," said Stefanie Jegelka, an IDSS affiliate faculty member with the Department of Electrical Engineering and Computer Science (EECS). Jegelka is an MIT WiDS planning committee member who also gave a talk exploring the properties of neural networks, focusing on ResNet architecture and neural networks for graphs.
Oceanographers studying the physics of the global ocean have long found themselves facing a conundrum: Fluid dynamical balances can vary greatly from point to point, rendering it difficult to make global generalizations. Factors like the wind, local topography, and meteorological exchanges make it difficult to compare one area to another. To add to the complexity, one would have to analyze billions of data points for numerous parameters -- temperature, salinity, velocity, how things change with depth, whether there is a trend present -- to pinpoint what physics are most dominant in a given region. "You would have to look at an overwhelming number of different global maps and mentally match them up to figure out what matters most where," says Maike Sonnewald, a postdoc working in the MIT Department of Earth, Atmospheric and Planetary Sciences (EAPS) and a member of the EAPS Program in Atmospheres, Oceans and Climate (PAOC). Sonnewald, who has a background in physical oceanography and data science, uses computers to reveal connections and patterns in the ocean that would otherwise be beyond human capability.
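The kind of pattern-finding Sonnewald describes -- letting a computer group grid points by their dynamics rather than mentally matching up global maps -- can be sketched with a toy clustering example. The data, the two features, and the minimal k-means below are illustrative assumptions, not her actual method or pipeline:

```python
# Each invented grid point has two made-up dynamical features (say,
# wind-stress influence vs. topographic influence); real work involves
# many more parameters over billions of points.
points = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.85),   # regime A
          (0.90, 0.10), (0.80, 0.20), (0.85, 0.15)]   # regime B

def kmeans(pts, k=2, iters=10):
    """Minimal k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its cluster. Initialization uses
    the first k points for reproducibility (real k-means would use
    random or k-means++ seeding)."""
    centroids = pts[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [tuple(sum(vals) / len(c) for vals in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans(points)
```

The two recovered centroids summarize two "regimes" without anyone inspecting the points by hand, which is the basic appeal of unsupervised methods for this kind of problem.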
How do you keep online trolls in check? Dr. Srijan Kumar, a post-doctoral research fellow in computer science at Stanford University, is developing an AI that predicts online conflict. His research uses data science and machine learning to promote healthy online interactions and curb deception, misbehavior, and disinformation. His work is currently deployed inside Indian e-commerce platform Flipkart, which uses it to spot fake reviewers. We spoke to Dr. Kumar ahead of a lecture on healthy online interactions at USC.
In summer 2013, I interviewed for a lead role in the data science and analytics team at tech-for-good company JustGiving. During the interview, I said I planned to deliver batch machine learning, graph analytics and streaming analytics systems, both in-house and in the cloud. A few years later, my former boss Mike Bugembe and I were both presenting at international conferences, winning awards and becoming authors! Here is my story, and what I learnt on the journey -- plus my recommendations for you. I've always been interested in artificial intelligence (AI), machine learning (ML) and natural language processing (NLP).