

TensorFlow 2.0 on Azure: Fine-tuning BERT for question tagging

#artificialintelligence

In this blog, we aim to highlight some of the ways that Azure can streamline the building, training, and deployment of your TensorFlow model. In addition to reading this blog, check out the demo discussed in more detail below, showing how you can use TensorFlow 2.0 in Azure to fine-tune a BERT (Bidirectional Encoder Representations from Transformers) model for automatically tagging questions. TensorFlow 1.x is a powerful framework that enables practitioners to build and run deep learning models at massive scale. TensorFlow 2.0 builds on the capabilities of TensorFlow 1.x by integrating more tightly with Keras (a library for building neural networks), enabling eager mode by default, and implementing a streamlined API surface. We've integrated TensorFlow 2.0 with the Azure Machine Learning service to make bringing your TensorFlow workloads into Azure as seamless as possible.
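
As a minimal sketch (my own illustration, not code from the post) of the two TensorFlow 2.0 changes the excerpt highlights: operations execute eagerly by default, and Keras ships as the built-in high-level API.

```python
import tensorflow as tf

# Eager mode is the default in TF 2.0: ops run immediately, no Session needed.
x = tf.constant([[1.0, 2.0]])
print(tf.matmul(x, tf.transpose(x)))  # evaluates right away

# Keras is integrated as tf.keras, the recommended high-level model API.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

In the Azure ML workflow the demo walks through, a training script along these lines is what gets submitted to a managed compute target.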


Create a machine learning model pipeline to choose the best model for your problem

#artificialintelligence

It was inevitable that artificial intelligence, which facilitates so many aspects of our lives, would eventually be applied to its own development process. Building better models requires complex, time-intensive, and costly procedures that demand expertise at every stage, from cleansing the data and feature engineering to designing the architectures and optimizing parameters. To make this process efficient in terms of time and effort, you need to automate these workloads. With the aim of creating AI for AI, IBM introduced a service on Watson Studio called AutoAI. AutoAI can run in public clouds and in private clouds, including IBM Cloud Pak for Data.
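
To make the workload AutoAI automates concrete, here is a hedged sketch (my own illustration in scikit-learn, not IBM's API) of manual model selection: the pipeline, candidate models, and hyperparameter grid below are exactly the pieces such a service generates and searches automatically.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# A pipeline chains preprocessing and modeling into one estimator.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Search over both candidate models and their hyperparameters,
# picking the winner by cross-validated score.
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier()], "clf__n_estimators": [50, 100]},
]
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```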


AI World 2019 Reporters' Notebook: European Commission Warning On Data Privacy; News from Exelon, PARC, CVS, and More - AI Trends

#artificialintelligence

The AI World Conference & Expo packed a few days with news emanating from the Expo floor, the plenary sessions, a hackathon, and the conference tracks. There's more good stuff than a writer can possibly fit into post-event coverage. Our Reporters' Notebook comprises some of the bits and pieces we collected over the three days in Boston. In an address to attendees of AI World 2019 in Boston, Paul F. Nemitz, Principal Advisor in the Directorate-General for Justice and Consumers of the European Commission, issued a warning about privacy. In a talk entitled "Democracy, Ethics and the Rule of Law in the Age of AI," Nemitz presented the European view of privacy, calling the GDPR (General Data Protection Regulation, in effect since May 2018) "the most sophisticated system for protecting personal data."


JavierAntoran/Bayesian-Neural-Networks

#artificialintelligence

The project is written in Python 2.7 and PyTorch 1.0.1. If CUDA is available, it will be used automatically; the models can also run on CPU, as they are not excessively big. We carried out homoscedastic and heteroscedastic regression experiments on toy datasets generated with a Gaussian process ground truth, as well as on real data (six UCI datasets). The heteroscedastic notebooks contain both toy and UCI dataset experiments for a given (ModelName).
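
The automatic CUDA fallback the README describes is the standard PyTorch device pattern; a minimal sketch (with a placeholder model, not one of the repo's Bayesian models):

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU;
# the repo's models are small enough to train on CPU as well.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(10, 1).to(device)  # placeholder, not a repo model
x = torch.randn(32, 10, device=device)
out = model(x)
```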


Machine Learning on Autonomous Database: A Practical Example

#artificialintelligence

The dataset used for building a network intrusion detection classifier is the classic KDD dataset, which you can download here. Its first version was released for the 1999 KDD Cup, and it contains 125,973 records in the training set. It was built for the DARPA Intrusion Detection Evaluation Program by MIT Lincoln Laboratory. The dataset is already split into training and test sets. The training dataset contains 22 sub-classes for attacks, plus one "normal" class for allowed traffic. The list of attacks and their associations with the four categories reported above is held in this file.
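
As a hedged illustration of the starting point (the file name and label-column position are assumptions about the downloaded CSV layout, not details from the article), loading and inspecting the dataset with pandas might look like this:

```python
import pandas as pd

# File name and label column index are assumptions about the CSV layout.
train = pd.read_csv("KDDTrain.csv", header=None)

labels = train.iloc[:, -1]          # attack sub-class or "normal"
print(len(train))                   # expected: 125,973 training records
print(labels.value_counts())        # 22 attack sub-classes plus "normal"
```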


Databricks raises $400 million at a $6.2 billion valuation

#artificialintelligence

Prescient are the entrepreneurs who predicted that data would become the new oil: Ali Ghodsi, Andy Konwinski, Ion Stoica, Matei Zaharia, Patrick Wendell, Reynold Xin, and Scott Shenker. They're the cofounders of Databricks, a San Francisco-based company that provides a suite of enterprise-focused, scalable data science and data engineering tools. Since 2013, the year Databricks opened for business, it's had no trouble attracting customers. But this week, the company's uninterrupted march toward market domination kicked into high gear. Databricks this morning announced that it's closed a $400 million series F fundraising round led by Andreessen Horowitz with participation from Microsoft, Alkeon Capital Management, BlackRock, Coatue Management, Dragoneer Investment Group, Geodesic, Green Bay Ventures, New Enterprise Associates, T. Rowe Price, and Tiger Global Management.


Google integrates Cloud AutoML with Kaggle ZDNet

#artificialintelligence

Google on Monday announced it's integrating Cloud AutoML into Kaggle, its platform for data scientists. Cloud AutoML, which Google unveiled in 2018, automates the creation of machine learning models. The software makes it possible to build custom machine learning models without any specialized machine learning knowledge. Integrating AutoML into the Kaggle platform advances Google's mission to "empower our community of data scientists by providing them with the skills and tools they need to lead in their field," the company wrote in a blog post. Kaggle, which was acquired by Google in March 2017, specializes in Jupyter notebooks used by data scientists.


Coding habits for data scientists

#artificialintelligence

While this may be fine for notebooks targeted at teaching people about the machine learning process, in real projects it's a recipe for an unmaintainable mess. The lack of good coding habits makes code hard to understand, and consequently, modifying it becomes painful and error-prone. This makes it increasingly difficult for data scientists and developers to evolve their ML solutions. In this article, we'll share techniques for identifying bad habits that add to complexity in code, as well as habits that can help us partition that complexity.
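
One of the simplest such techniques is extracting inline notebook steps into small, named functions; a sketch of my own (not taken from the article) showing the idea:

```python
import pandas as pd

# Notebook style: anonymous, inlined steps that are hard to test or reuse.
#   df["age"] = df["age"].fillna(df["age"].median())
#   df = df[df["age"] > 0]

# Partitioned complexity: the same steps behind a named, testable function.
def clean_age(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing ages with the median and drop non-positive values."""
    df = df.copy()
    df["age"] = df["age"].fillna(df["age"].median())
    return df[df["age"] > 0]
```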


Frameworks for Machine Learning Model Management - inovex-Blog

#artificialintelligence

In my previous blog post, "How to Manage Machine Learning Models," I explained the difficulties within the process of developing a good machine learning model and motivated using a tool to support data scientists with this challenge. First, there will be one paragraph per framework that describes the project and shows some code examples. At the end of the article you will find a framework comparison and recommendations on when to use which framework. As with my previous post, the sklearn dataset on Boston-Housing prices will be used as the basis. You can find a notebook to play with in this GitHub repo. The notebook also includes instructions on how to install the frameworks, as well as some other functions we will use within the code examples below; these won't be discussed further, to keep the focus on the framework-specific parts and omit boilerplate code. DVC stands for "Data (Science) Version Control" and aims to do for data science what Git already does for software development: make development processes traceable and reproducible.
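
For reference, a Boston-Housing baseline like the one the post builds on might look as follows. This is a sketch of my own, not the post's notebook, and it assumes an older scikit-learn release, since load_boston has been removed from recent versions.

```python
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the Boston-Housing data (available in older scikit-learn releases).
X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(mean_squared_error(y_test, model.predict(X_test)))
```

Every rerun of an experiment like this, with different parameters or data, is exactly what the frameworks compared below are meant to version and make reproducible.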


Can we do better than Convolutional Neural Networks?

#artificialintelligence

The British Machine Vision Conference (BMVC), which finished about two weeks ago in Cardiff, UK, is one of the top conferences in computer vision and pattern recognition, with a competitive acceptance rate of 28%. Compared to others, it's a small event, so you have plenty of time to walk around the posters and talk to presenters one-on-one, which I found really nice. I presented a poster on Image Classification with Hierarchical Multigraph Networks, work I did mainly during my internship at SRI International under the supervision of Xiao Lin, Mohamed Amer (homepage), and my PhD advisor Graham Taylor. In the paper, we basically try to answer the question "Can we do better than Convolutional Neural Networks?" Here I discuss this question and support my arguments with results.