
Data.Table by Example – Part 3


For this final post, I will cover some advanced topics and discuss how to use data tables within user-generated functions. Once again, let's use the Chicago crime data. Let's start by subsetting the data. The following code takes the first 50,000 rows of the dat dataset, selects four columns, creates three new columns pertaining to the date, and then removes the original date column.
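The subsetting step described above can be sketched with data.table as follows. This is a minimal illustration, not the post's own code: the Chicago crime data is not loaded here, so a small synthetic `dat` stands in for it, and the column names (`PrimaryType`, `Arrest`, `Beat`), the date format and the three derived columns (`year`, `month`, `hour`) are assumptions.

```r
library(data.table)

# Hypothetical stand-in for the Chicago crime data; the real column names
# and date format may differ.
set.seed(1)
n <- 60000L
start <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")
dat <- data.table(
  ID          = seq_len(n),
  Date        = format(start + sample.int(365L * 86400L, n, replace = TRUE) - 1L,
                       "%Y-%m-%d %H:%M:%S"),
  PrimaryType = sample(c("THEFT", "BATTERY", "NARCOTICS"), n, replace = TRUE),
  Arrest      = sample(c(TRUE, FALSE), n, replace = TRUE),
  Beat        = sample(100:2500, n, replace = TRUE)
)

# First 50,000 rows, keeping four columns (ID is dropped)
sub <- dat[1:50000, .(Date, PrimaryType, Arrest, Beat)]

# Parse the date string, then add three date-derived columns by reference
sub[, Date := as.POSIXct(Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")]
sub[, `:=`(year = year(Date), month = month(Date), hour = hour(Date))]

# Remove the original date column
sub[, Date := NULL]
str(sub)
```

Because `:=` modifies `sub` in place, the new columns are added without copying the whole table, which matters once the full dataset is used instead of this toy sample.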

SQLite as a document database


SQLite has had JSON support for a while. Recently, however, it added a killer feature: generated columns. These make it possible to insert JSON straight into SQLite and have it extract fields and index them, i.e. you can treat SQLite as a document database. This has been possible with PostgreSQL, and it is obviously what something like Elastic provides, but having it available in an embedded database is very nice for lightweight work. It's that simple: the d column is extracted from the provided JSON.
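A minimal sketch of this pattern from R, using DBI with the RSQLite driver. It assumes the bundled SQLite is 3.31.0 or newer (the release that introduced generated columns) and includes the JSON functions; the table name `docs`, the index name and the JSON keys `d` and `n` are made up for illustration.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# `body` holds the raw JSON document; `d` is a generated column that SQLite
# computes from it with json_extract(). VIRTUAL means it is computed on read
# rather than stored in the table.
dbExecute(con, "
  CREATE TABLE docs (
    body TEXT,
    d    TEXT GENERATED ALWAYS AS (json_extract(body, '$.d')) VIRTUAL
  )")

# Generated columns can be indexed like ordinary columns
dbExecute(con, "CREATE INDEX docs_d ON docs (d)")

# Insert only the JSON; `d` is filled in automatically
docs <- c('{"d": "2020-01-01", "n": 1}', '{"d": "2020-02-01", "n": 2}')
for (doc in docs) {
  dbExecute(con, "INSERT INTO docs (body) VALUES (?)", params = list(doc))
}

# Query through the generated column; other fields can still be pulled on the fly
res <- dbGetQuery(con, "
  SELECT d, json_extract(body, '$.n') AS n
  FROM docs
  WHERE d = '2020-02-01'")
print(res)

dbDisconnect(con)
```

A STORED generated column would trade disk space for not recomputing `json_extract()` on every read; with an index on `d`, lookups by that field do not need to parse every document.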

5 R data.table fread tips


I'm Sharon Machlis at IDG Communications, here with Episode 50 of Do More With R: 5 things you may not know about data.table's fread(). It has several helpful options, some of which you might not know about. Or maybe you knew about them once but have since stopped using them. I'll start with a file of daily Covid-19 data from every county in the U.S.: a 12-megabyte file from the New York Times on GitHub. If you'd like to follow along, download the file from the URL on screen.

1: Is your file large?

A Method of Grouping and Summarizing Data of Big Text Files in R Language


It is common to use R to group and summarize data from files. Sometimes, though, we need to process comparatively big files whose source data is large but whose computed result is small. Such files cannot be loaded into memory all at once, so the solution is to import and compute in batches and then merge the partial results. The following example illustrates how R can group and summarize data from big text files.
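The batch-import, compute and merge approach can be sketched in base R as follows. This is a generic illustration rather than the article's own example: the file, its columns (`state`, `amount`), the helper name `summarize_in_chunks` and the chunk size are all hypothetical. Each batch is reduced to one row per group, and because those partial results are small, they can be combined and re-aggregated in memory.

```r
# Hypothetical input file: `state` is the group key, `amount` the value
set.seed(42)
full <- data.frame(
  state  = sample(c("CA", "NY", "TX"), 10000, replace = TRUE),
  amount = round(runif(10000, 1, 100), 2)
)
path <- tempfile(fileext = ".csv")
write.csv(full, path, row.names = FALSE)

summarize_in_chunks <- function(path, chunk_size = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line once; every later batch is parsed without a header
  header <- scan(text = readLines(con, n = 1L), what = "", sep = ",", quiet = TRUE)
  parts <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    chunk <- read.csv(text = lines, header = FALSE, col.names = header,
                      colClasses = c("character", "numeric"))
    chunk$n <- 1L
    # Partial result for this batch: one row per group with its sum and count
    parts[[length(parts) + 1L]] <-
      aggregate(cbind(amount, n) ~ state, data = chunk, FUN = sum)
  }
  # Merge step: combine the small partial results and aggregate them again
  merged <- aggregate(cbind(amount, n) ~ state,
                      data = do.call(rbind, parts), FUN = sum)
  merged$mean <- merged$amount / merged$n
  merged
}

result <- summarize_in_chunks(path, chunk_size = 1000L)
print(result)
```

This works because sums and counts can be merged across batches (and a mean derived from them at the end); an aggregate such as the median cannot be combined this way and would need a different strategy.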

RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers

When translating natural language questions into SQL queries to answer questions from a database, contemporary semantic parsing models struggle to generalize to unseen database schemas. The generalization challenge lies in (a) encoding the database relations in an accessible way for the semantic parser, and (b) modeling alignment between database columns and their mentions in a given query. We present a unified framework, based on the relation-aware self-attention mechanism, to address schema encoding, schema linking, and feature representation within a text-to-SQL encoder. On the challenging Spider dataset this framework boosts the exact match accuracy to 53.7%, compared to 47.4% for the state-of-the-art model unaugmented with BERT embeddings. In addition, we observe qualitative improvements in the model's understanding of schema linking and alignment.
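For reference, relation-aware self-attention (introduced by Shaw et al., 2018, for relative position representations) extends ordinary self-attention with learned relation embeddings r_ij^K and r_ij^V for each pair of input elements x_i and x_j; in a text-to-SQL encoder those elements are question words, columns and tables. A single-head form, where d_z is the head dimension and W^Q, W^K, W^V are the usual projections, is:

```latex
e_{ij} = \frac{x_i W^Q \left( x_j W^K + r_{ij}^K \right)^{\top}}{\sqrt{d_z}}, \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k} \exp(e_{ik})}, \qquad
z_i = \sum_{j} \alpha_{ij} \left( x_j W^V + r_{ij}^V \right)
```

Setting every r_ij to zero recovers standard scaled dot-product self-attention, so schema relations (such as a column belonging to a table, or a question word matching a column name) influence the encoder only through these added terms.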