
Natural Language Processing in TensorFlow

#artificialintelligence

If you are a software developer who wants to build scalable AI-powered algorithms, you need to understand how to use the tools to build them. This Specialization will teach you best practices for using TensorFlow, a popular open-source framework for machine learning. In Course 3 of the deeplearning.ai TensorFlow Specialization, you will build natural language processing systems using TensorFlow. You will learn to process text, including tokenizing and representing sentences as vectors, so that they can be input to a neural network.
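As a small taste of that preprocessing, here is a minimal sketch (assuming TensorFlow 2.x and the Keras preprocessing utilities) of tokenizing sentences and padding them into equal-length integer sequences ready for a neural network:

    # Minimal sketch: turn raw sentences into padded integer sequences.
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    sentences = [
        "I love my dog",
        "I love my cat",
        "Do you think my dog is amazing?",
    ]

    tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")  # cap vocab, handle unseen words
    tokenizer.fit_on_texts(sentences)                        # build the word -> index map

    sequences = tokenizer.texts_to_sequences(sentences)      # sentences -> lists of integers
    padded = pad_sequences(sequences, padding="post")        # pad to a rectangular matrix

    print(tokenizer.word_index)  # e.g. {'<OOV>': 1, 'my': 2, ...}
    print(padded)                # one fixed-length row per sentence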


A Beginner's Guide to AutoML - Solita Data

#artificialintelligence

Automated Machine Learning (AutoML) is a concept that gives non-Machine-Learning experts the means to utilise existing data and create models. In addition, AutoML gives Machine Learning (ML) professionals ways to develop and use effective models without spending time on tasks such as data cleaning and preprocessing, feature engineering, model selection, and hyperparameter tuning. Before we move any further, it is important to note that AutoML is not a system developed by a single entity. Several organisations have developed their own AutoML packages. These packages cover a broad area and target people at different skill levels.
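As an illustration, here is a minimal sketch using one such package, the open-source TPOT library (any comparable AutoML package would do); the dataset and settings are just for demonstration:

    # Sketch: let an AutoML package (TPOT) handle the model selection and
    # hyperparameter tuning that would otherwise be done by hand.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    automl = TPOTClassifier(generations=5, population_size=20, random_state=42)
    automl.fit(X_train, y_train)             # searches pipelines + hyperparameters

    print(automl.score(X_test, y_test))      # accuracy of the best pipeline found
    automl.export("best_pipeline.py")        # export the winner as scikit-learn code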


3 + 1 ways of running R on Amazon SageMaker

#artificialintelligence

The R programming language is one of the most commonly used languages in the scientific space: one of the most popular languages for machine learning (probably second only to Python) and arguably the most popular amongst mathematicians and statisticians. It is easy to get started with, free to use, and supported by many scientific and visualisation libraries. While R can help you analyse your data, the more data you have, the more compute power you require; and the more impactful your analysis is, the more repeatability and reproducibility are required. Analysts and Data Scientists need to find ways to fulfil such requirements. In this post we briefly describe the main ways of running your R workloads on the cloud, making use of Amazon SageMaker, the end-to-end Machine Learning cloud offering of AWS.
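One of those ways, sketched below with the SageMaker Python SDK, is the bring-your-own-container pattern: package the R script in a Docker image and submit it as a SageMaker training job. This is only a sketch, not the post's exact recipe; the image URI, IAM role, and S3 paths are hypothetical placeholders.

    # Sketch: run an R workload as a SageMaker training job using a custom
    # Docker image containing R and the script. All identifiers below are
    # hypothetical placeholders.
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-r-image:latest",
        role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/r-job-output",
    )

    # SageMaker provisions the instance, runs the container (which invokes
    # the R script), copies results to S3, and shuts the instance down.
    estimator.fit({"train": "s3://my-bucket/r-job-input"})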


Stream Output When Parsing Big Xml With Elixir

#artificialintelligence

There are two big players in Elixir's XML parsing ecosystem: Saxy and SweetXml. I want to read a huge XML file that has some elements repeated many times, and produce some kind of "iterator" from it that yields those elements one at a time. Saxy is incredibly fast and performant, but it's based on the concept that, as you read the XML file, you "fill" some state object (with whatever you want, and as much as you want, but, nevertheless, you fill it). In this scenario, I could "fill" the state with the list of items. That, of course, takes a lot less memory than holding the entire XML structure in memory. But it still establishes a relationship between the size of the XML file and the size of the stored in-memory list, which I don't like, because it means that with a big enough file I can consume more memory than I'm allowed to. SweetXml provides a function called stream_tags, and when you see what it does, it seems to hit the spot, because it promises just what I need: parse an XML file and, as it finds certain tags, stream the SweetXml representation of them, without building any in-memory structure representing the XML.
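For readers who don't know Elixir, the same streaming idea looks like this in Python (an analogous sketch, not the libraries discussed here): xml.etree.ElementTree.iterparse yields elements as their closing tags are read, and clearing each one keeps memory roughly constant regardless of file size.

    # Analogous streaming sketch in Python, not the Elixir libraries above:
    # iterparse emits each element as its closing tag is read.
    import xml.etree.ElementTree as ET

    def iter_items(path, tag="item"):
        """Yield each <item> element of a huge XML file, one at a time."""
        for _event, elem in ET.iterparse(path, events=("end",)):
            if elem.tag == tag:
                yield elem      # hand the parsed element to the caller...
                elem.clear()    # ...then drop its contents to bound memory

    # Usage: iterate lazily; the whole tree is never held in memory.
    # for item in iter_items("huge.xml"):
    #     print(item.findtext("name"))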


The Berkeley Crossword Solver

#artificialintelligence

We recently built the Berkeley Crossword Solver (BCS), the first computer program to beat every human competitor in the world's top crossword tournament. Crosswords are challenging for humans and computers alike. Many clues are vague or underspecified and can't be answered until crossing constraints are taken into account. While some clues are similar to factoid question answering, others require relational reasoning or understanding difficult wordplay. The BCS uses a two-step process to solve crossword puzzles.


Introduction to Artificial Intelligence for Beginners - Analytics Vidhya

#artificialintelligence

We have come such a long way in the field of Machine Learning / Deep Learning that we are now very much interested in AI (Artificial Intelligence); in this article we are going to introduce you to AI. The short and precise answer to what Artificial Intelligence is depends on the person you are explaining it to. A normal person with little understanding of this technology will relate it to "robots": they will say that AI is a Terminator-like object that can react and think on its own. If you ask the same question to an AI expert, they will say that "it is a set of patterns and algorithms that can generate solutions to everything without being explicitly instructed to do that work".


Monkeying with Dall-E

#artificialintelligence

Can there be a movie or a comic book with AI-generated characters, sets & plots? It is getting closer to possible, so let's get a preview. Automatically generating content with these artificially intelligent systems is the trend. One subset of this is image creation from text input. Can we use it to create picture stories?


No Joke: Google's AI Is Smart Enough to Understand Your Humor

#artificialintelligence

Google's natural language AI is smart enough to define jokes. The ability to understand the nuances of human language will lead to better and more natural interactions with machines. Google wants to educate people about the benefits of these kinds of AI smarts through upcoming devices like its Pixel 7. Amid a flurry of new hardware including the Pixel 7, the Pixel Buds Pro and a new Pixel Tablet, Google dropped one development at its I/O developer conference that went largely unnoticed: Its AI can now understand jokes. Jokes, sarcasm and humor require understanding the subtleties of language and human behavior. When a comedian says something sarcastic or controversial, usually the audience can discern the tone and know it's more of an exaggeration, something that's learned from years of human interaction.


Learning New Things and Avoiding Obstacles

Communications of the ACM

ACM A.M. Turing Award recipient Jack Dongarra never intended to work with computers. Initially, the Distinguished Professor at the University of Tennessee and founder of the Innovative Computing Laboratory (ICL) thought he would be a high school science teacher. A chance internship at the Argonne National Laboratory kindled a lifelong interest in numerical methods and software--and, in particular, in linear algebra, which powered the development of Dongarra's groundbreaking techniques for optimizing operations on increasingly complex computer architectures. Your career in computing began serendipitously, with a semester-long internship at Argonne National Laboratory. As an undergraduate, I worked on EISPACK, a software package designed to solve eigenvalue problems.


Resolution of the Burrows-Wheeler Transform Conjecture

Communications of the ACM

The Burrows-Wheeler Transform (BWT) is an invertible text transformation that permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the main component of popular lossless compression programs (such as bzip2) as well as recent powerful compressed indexes (such as the r-index [7]), central in modern bioinformatics. The compressibility of BWT is quantified by the number r of equal-letter runs in the output. Despite the practical significance of BWT, no nontrivial upper bound on r is known. By contrast, the sizes of nearly all other known compression methods have been shown to be either always within a polylog n factor (where n is the length of the text) of z, the size of the Lempel–Ziv (LZ77) parsing of the text, or much larger in the worst case (by an n^ε factor for ε > 0). In this paper, we show that r = O(z log² n) holds for every text. This result has numerous implications for text indexing and data compression; in particular: (1) it proves that many results related to BWT automatically apply to methods based on LZ77, for example, it is possible to obtain functionality of the suffix tree in O(z polylog n) space; (2) it shows that many text processing tasks can be solved in the optimal time assuming the text is compressible using LZ77 by a sufficiently large polylog n factor; and (3) it implies the first nontrivial relation between the number of runs in the BWT of the text and of its reverse. In addition, we provide an O(z polylog n)-time algorithm converting the LZ77 parsing into the run-length compressed BWT. To achieve this, we develop several new data structures and techniques of independent interest. In particular, we define compressed string synchronizing sets (generalizing the recently introduced powerful technique of string synchronizing sets [11]) and show how to efficiently construct them. Next, we propose a new variant of wavelet trees for sequences of long strings, establish a nontrivial bound on their size, and describe efficient construction algorithms. Finally, we develop new indexes that can be constructed directly from the LZ77 parsing and efficiently support pattern matching queries on text substrings. Lossless data compression aims to exploit redundancy in the input data to represent it in a small space.
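To ground the definitions of the BWT and of r: below is an illustration-only Python sketch (a naive quadratic construction via sorted rotations, nothing like the paper's algorithms) that computes the transform with the usual end-of-text sentinel and counts its equal-letter runs.

    # Illustration-only: naive BWT via sorted cyclic rotations, plus the
    # run count r. Real tools build the BWT from the suffix array instead.
    def bwt(text, sentinel="$"):
        """Burrows-Wheeler Transform of text (sentinel marks the end)."""
        s = text + sentinel
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return "".join(rot[-1] for rot in rotations)   # last column

    def run_count(s):
        """r = number of maximal equal-letter runs."""
        return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])

    out = bwt("banana")
    print(out)             # annb$aa: repetition clusters equal letters
    print(run_count(out))  # r = 5 (runs: a, nn, b, $, aa)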