riselab
Opaque raises $9.5M seed to secure sensitive data in the cloud – TechCrunch
Opaque, a new startup born out of Berkeley's RISELab, announced a $9.5 million seed round today to build a solution for accessing and working with sensitive data in the cloud securely, even with multiple organizations involved. Intel Capital led today's investment with participation by Race Capital, The House Fund and FactoryHQ. The company helps customers work with secure data in the cloud while making sure the data they are working on is not being exposed to cloud providers, other research participants or anyone else, says company president Raluca Ada Popa. "What we do is we use this very exciting hardware mechanism called Enclave, which [operates] deep down in the processor -- it's a physical black box -- and only gets decrypted there." Company co-founder Ion Stoica, who was a co-founder at Databricks, says the startup's solution helps resolve two conflicting trends. On one hand, businesses increasingly want to make use of data, but at the same time are seeing a growing trend toward privacy. Opaque is designed to resolve this by giving customers access to their data in a safe and fully encrypted way. The company describes the solution as "a novel combination of two key technologies layered on top of state-of-the-art cloud security--secure hardware enclaves and cryptographic fortification." This enables customers to work with data -- for example to build machine learning models -- without exposing the data to others, while still generating meaningful results. Popa says this could be helpful for hospitals working together on cancer research that want to find better treatment options without exposing a given hospital's patient data to other hospitals, or for banks looking for money laundering without exposing customer data to other banks, as a couple of examples.
Investors were likely attracted to the pedigree of Popa, a computer security and applied crypto professor at UC Berkeley, and Stoica, who is also a Berkeley professor and co-founded Databricks. Both helped found RISELab at Berkeley, where they developed the solution and spun it out as a company. Mark Rostick, vice president and senior managing director at lead investor Intel Capital, says his firm has been working with the founders since the startup's earliest days, recognizing the potential of this solution to help companies solve complex problems even when multiple organizations are involved in sharing sensitive data. "Enterprises struggle to find value in data across silos due to confidentiality and other concerns.
Why supervised learning is more common than reinforcement learning
Supervised learning is more commonly used than reinforcement learning, in part because it's faster and cheaper. Given labeled data sets, a supervised learning model can learn to map inputs to outputs to create image recognition or machine translation models. A reinforcement learning algorithm, on the other hand, must observe, and that can take time, said UC Berkeley professor Ion Stoica. Stoica works on robotics and reinforcement learning at UC Berkeley's RISELab, and if you're a developer working today, then you've likely used or come across some of his work, which has built part of the modern infrastructure for machine learning. He spoke today as part of Transform, an annual AI event held by VentureBeat that this year takes place online.
Will AI revolutionise the future of healthcare?
Few industries are as data-intensive as medicine. Medical data comes in many forms: images, audio, video, unstructured text and structured information. All this data suffers from the traditional problems experienced by other industries: missing information, corrupt values, suspicious outliers, lack of labelling, typographic errors and more. As medical databases multiply, cleaning and labelling information is becoming ever more critical. While we are some way from solving this challenge, we are seeing important progress with the likes of Holoclean and Snorkel.
Why Every Python Developer Will Love Ray
There are many reasons why Python has emerged as the number one language for data science. It's easy to get started and relatively forgiving for beginners, yet it's also powerful and extensible enough for experts to take on complex tasks. But there's one aspect of Python that has bedeviled developers in the big data age: Getting Python to scale past a single node. Solving that dilemma is the number one goal of Project Ray. The name "Ray" will ring a bell if you've been following the goings-on at RISELab, the advanced computing laboratory formed at UC Berkeley.
Understanding deep neural networks
Michael Mahoney will speak on "Principled tools for analyzing weight matrices of production-scale deep neural networks" at the Artificial Intelligence conference in London, 14-17 October 2019. Subscribe to the O'Reilly Data Show Podcast to explore the opportunities and techniques driving big data, data science, and AI. Find us on Stitcher, TuneIn, iTunes, SoundCloud, RSS. In this episode of the Data Show, I speak with Michael Mahoney, a member of RISELab, the International Computer Science Institute, and the Department of Statistics at UC Berkeley. A physicist by training, Mahoney has been at the forefront of many important problems in large-scale data analysis.
$10 million for Berkeley RISELab's AI research
The National Science Foundation today announced that UC Berkeley's RISELab has been awarded an Expeditions in Computing award, providing $10 million in funding over five years to enable game-changing advances in real-time decision making technologies. The award was one of three announced today for research teams pursuing large-scale, far-reaching and potentially transformative research in computer and information science and engineering. RISELab's award will be used to develop technology for an era in which artificial intelligence systems will make decisions that will play an increasingly central role in people's lives in areas such as healthcare, transportation and business. For example, the researchers say that these systems will revolutionize healthcare through early identification of patients at risk, cell-level diagnosis and treatment using nanoprobes, and robotic surgery. These systems could also reduce traffic congestion and help eliminate fatalities by powering autonomous vehicles and unmanned drones, or make businesses safer by detecting and defending in real-time against financial fraud and internet attacks.
Ray's New Library Targets High Speed Reinforcement Learning
Data scientists looking to push the ball forward in the field of reinforcement learning may want to check out RLlib, a new library released as open source last month by researchers affiliated with RISELab. According to researchers, the goal of RLlib is to enable users to break down the various components that go into a reinforcement learning system, thereby making them more scalable, easier to integrate, and easier to reuse. Reinforcement learning is a type of machine learning that's gaining popularity as a way to quickly train programs to perform tasks optimally in a world awash in less-than-optimal training data. Instead of training a model with pristine data, which is ideal in supervised learning, the reinforcement learning model learns from the data environment as it naturally exists, and uses a simple feedback mechanism (the reinforcement signal) to nudge the model towards the ideal solution. The practical advantage of the reinforcement approach is that it seeks to achieve a balance between being able to interpret uncharted data (which is where unsupervised learning algorithms flourish) and exploiting existing knowledge (where supervised learning typically excels).
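As a rough illustration of the feedback loop and the explore/exploit balance described above, here is a minimal sketch of an epsilon-greedy learner on a multi-armed bandit. It uses only the Python standard library and is not RLlib code; the function name `run_epsilon_greedy`, its parameters, and the reward probabilities are all assumptions made up for this example.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Hypothetical helper: learn arm values from 0/1 reward feedback alone.

    true_means holds the (hidden) probability that each arm pays a reward of 1.
    Returns the learned value estimates and how often each arm was chosen.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to learn about uncharted options.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns the reinforcement signal (reward of 0 or 1).
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Nudge the estimate toward the observed reward (incremental mean).
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimated values:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

Setting `epsilon` to 0 in this sketch removes exploration entirely, so the learner never leaves the first arm it tries, which is one way to see why the balance matters.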
The Next Data Revolution: Intelligent Real-Time Decisions
Over the past decade, big data analysis and applications have revolutionized practices in business and science. They have enabled new businesses (e.g., Facebook, Netflix), disrupted existing industries (e.g., Airbnb, Uber), and accelerated scientific discovery (genomics, astronomy, biology). Today, we are seeing glimpses of the next revolution in data and computation, driven by three trends. First, there is a rapidly growing segment of the economy (e.g., Apple, Facebook, GE) that collects vast amounts of consumer and industrial information and uses this information to provide new services. This trend is spreading widely via the increasing ubiquity of networked sensors in devices like cell phones, thermostats and cars.
After Spark: Ray project tackles real-time machine learning
RISELab, the successor to the U.C. Berkeley group that created Apache Spark, is hatching a project that could replace Spark--or at least displace it for key applications. Ray is a distributed framework designed for low-latency real-time processing, such as machine learning. Created by two doctoral students at RISELab, Philipp Moritz and Robert Nishihara, it works with Python to run jobs either on a single machine or distributed across a cluster, using C for components that need speed. The main aim for Ray, according to an article at Datanami, is to create a framework that can provide better speeds than Spark. Spark was intended to be faster than what it replaced (mainly, MapReduce), but it still suffers from design decisions that make it difficult to write applications with "complex task dependencies" because of its internal synchronization mechanisms.
Meet Ray, the Real-Time Machine-Learning Replacement for Spark
Researchers at UC Berkeley's RISELab have developed a new distributed framework designed to enable Python-based machine learning and deep learning workloads to execute in real time with MPI-like power and granularity. Called Ray, the framework is ostensibly a replacement for Spark, which is seen as too slow for some real-world AI applications, and should be ready for production use in less than a year. Ray is one of the first technologies to emerge from RISELab, the research group at Berkeley that followed the highly successful AMPLab, which generated a host of compelling distributed technologies that have impacted the fields of high-performance and enterprise computing alike, including Spark, Mesos, Tachyon, and others. One of the advisors for the old AMPLab and the current RISELab, Computer Science Professor Michael Jordan, discussed the core principles and drivers behind Ray during the recent Strata Hadoop World conference in San Jose, California. "Spark was developed because my students were complaining about Hadoop," Jordan said during a keynote address on March 16.