 transactional database


Beyond Object Identification: A Giant-Leap into Pattern Discovery in Imagery Data

#artificialintelligence

A critical question that arises after identifying the objects (or class labels) in an imagery database is: "How are the various objects discovered in an imagery database correlated with one another?" This article tries to answer this question by providing a generic framework that helps readers discover hidden correlations between objects in the imagery database. (A portion of this article is drawn from our work published in IEEE BIGDATA 2021 [1].) The framework to discover the correlation between the objects in an imagery database is shown in Figure 1. Demonstration: In this demo, we first pass the image data into a trained model (e.g., resnet50) and extract objects and their scores.
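As a rough illustration of the step after object extraction, the sketch below turns per-image detections into a transactional database and counts how often pairs of labels co-occur. It is not the authors' framework: the detections, the `min_score` threshold, and the helper names `to_transactions` and `frequent_pairs` are all hypothetical, and the model inference step itself is assumed to have already produced (label, score) pairs.

```python
from collections import Counter
from itertools import combinations


def to_transactions(detections, min_score=0.5):
    """Build one transaction (a set of labels) per image.

    Labels whose score falls below min_score are dropped.
    """
    return [
        {label for label, score in image if score >= min_score}
        for image in detections
    ]


def frequent_pairs(transactions, min_support=2):
    """Return label pairs that occur together in at least min_support images."""
    counts = Counter()
    for items in transactions:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical detections: for each image, a list of (label, score) pairs.
detections = [
    [("car", 0.92), ("person", 0.81), ("dog", 0.30)],
    [("car", 0.88), ("person", 0.77)],
    [("dog", 0.95), ("person", 0.66)],
]

transactions = to_transactions(detections)
pairs = frequent_pairs(transactions)
print(pairs)
```

Pair counting is the simplest form of co-occurrence analysis; a full frequent-pattern miner would extend the same idea to itemsets of any size.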


It's About Time for InfluxData

#artificialintelligence

These are heady times for InfluxDB, the world's most popular time-series database. Time-series databases have been the fastest-growing category of databases over the past two years, per DB-Engines.com. But when Paul Dix and his partner founded it a decade ago, the company behind the time-series database and the product itself looked much different. In fact, InfluxDB went through several transformations to get to where it is today, mirroring the evolution of the time-series database category. And more change appears on the horizon. Dix and Todd Persen co-founded Errplane, the predecessor to InfluxData, back in June 2012 with the idea of building a SaaS metrics and monitoring platform, à la Datadog or New Relic.


Characterizing Transactional Databases for Frequent Itemset Mining

arXiv.org Artificial Intelligence

This paper presents a study of the characteristics of transactional databases used in frequent itemset mining. Such characterizations have typically been used to benchmark and understand the data mining algorithms working on these databases. The aim of our study is to give a picture of how diverse and representative these benchmarking databases are, both in general and in the context of particular empirical studies found in the literature. Our proposed list of metrics contains many of the existing metrics found in the literature, as well as new ones. Our study shows that our list of metrics is able to capture much of the datasets' inner complexity and thus provides a good basis for the characterization of transactional datasets. Finally, we provide a set of representative datasets based on our characterization that may safely be used as a benchmark.
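To make the idea of characterizing a transactional database concrete, here is a minimal sketch that computes a few widely used basic descriptors: number of transactions, number of distinct items, average transaction length, and density. These are assumptions chosen for illustration and are not the paper's proposed list of metrics; the function name `describe` and the toy data are hypothetical.

```python
def describe(transactions):
    """Compute basic descriptors of a transactional database.

    transactions: a list of sets, one set of items per transaction.
    Density is the fraction of filled cells in the transaction-by-item matrix.
    """
    n = len(transactions)
    items = set().union(*transactions) if transactions else set()
    total = sum(len(t) for t in transactions)
    avg_len = total / n if n else 0.0
    density = total / (n * len(items)) if n and items else 0.0
    return {
        "transactions": n,
        "distinct_items": len(items),
        "avg_length": avg_len,
        "density": density,
    }


# Hypothetical toy database with four transactions over items a-d.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a"}, {"b", "d"}]
stats = describe(toy)
print(stats)
```

Dense databases (density near 1) and sparse ones (density near 0) tend to stress mining algorithms differently, which is one reason such descriptors are reported alongside benchmark results.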


Synthetic Dataset Generation with Itemset-Based Generative Models

arXiv.org Artificial Intelligence

Limited availability of real data hinders the development and growth of knowledge in all kinds of scientific and industrial endeavours. The field of synthetic data generation tries to overcome this problem by developing data generators that produce datasets without any privacy or publishing restrictions. In this paper we propose data generators that take an original real dataset as input, and produce "fake copies" of it that preserve much of the structure of the original dataset without revealing actual information from it. Synthetic data should capture characteristics from the original data and should also represent them in a general way. Therefore, another important advantage of using synthetic data is that it may allow researchers to discover new information and insights that are not present in real datasets by fine-tuning the parameters of the data generation process.


Labeled data brings machine learning applications to life

#artificialintelligence

Over the years, the types and quantity of analytics data have evolved, along with the repositories in which the data is stored. Data warehouses were bumped from center stage by big data systems better suited to storing new forms of data, such as social media posts and machine logs. However, the refined data found in data warehouses may have a big role to play in machine learning and AI initiatives. In this Q&A, Svetlana Sicular, a research VP at Gartner, says that simply coupling AI and data together doesn't make for the magic wand that some people think it will. Sicular provides an overview of the past and present of data stores and discusses how transactional and labeled data can provide useful information for AI and machine learning applications.


Walking With AI: How to Spot, Store and Clean the Data You Need

#artificialintelligence

Last August, data science leader Monica Rogati unveiled a new way for entrepreneurs to think about artificial intelligence. Modeled after psychologist Abraham Maslow's five-tier hierarchy of psychological needs, her AI hierarchy of needs has become a conference favorite for illustrating how to incorporate AI into a business. Despite entrepreneurs' excitement around AI, Rogati's hierarchy makes an uncomfortable point. Few companies are ready to adopt AI. Most are struggling to fulfill fundamental needs, such as reliable data flow and storage.