"The data centre is an asset that needs to be protected." – Michael Kagan, CTO of NVIDIA. On the first day of the NVIDIA GPU Technology Conference, Jensen Huang, founder of NVIDIA, revealed the company's three-year DPU roadmap, featuring the new NVIDIA BlueField-2 family of DPUs and the NVIDIA DOCA software development kit for building applications on DPU-accelerated data centre infrastructure services. In a recent talk, Michael Kagan, CTO of NVIDIA, explained the next generation of fully integrated data centres and how supercomputers and edge AI help augment such initiatives. Kagan stated that the state-of-the-art technologies from NVIDIA and Mellanox created a great opportunity to build a new class of computer: the fully integrated cloud data centre, designed to handle the workloads of the 21st century. Historically, the server was the unit of computing. But Moore's law has slowed, and CPU performance could no longer keep up with workload demands. According to Kagan, with the revolution of cloud AI and edge computing, the entire data centre, rather than a single server, has become the new unit of computing, designed to handle parallel workloads.
Many have emphasized the need for data for artificial intelligence (AI) and machine learning (ML) algorithms, and metaphors from "data is the new oil" to "data is the new sun" further underscore the dire need for better data. However, one aspect of data that is often not explicitly mentioned in these discussions is the role of master data and how it fundamentally impacts the quality of the data driving ML algorithms. In the spirit of paying tribute to management guru Peter Drucker, who is credited with the saying "culture eats strategy for breakfast," this article explores the role of master data and why enterprises so often overlook it. According to The DAMA Guide to the Data Management Body of Knowledge, master data represents "data about the business entities that provide context for business transactions." Simply put, for any enterprise, it is the customers whom they sell to, the brands they market, the products they sell, the consumers who use their products, the materials used to make the products, the plants that manufacture their products, the suppliers that supply the materials, the employees who build the products directly or indirectly, and the list goes on. Why is there a lack of awareness in enterprises about master data?
The value of scientific digital-image libraries seldom lies in the pixels of the images themselves. For large collections of images, such as those resulting from astronomy sky surveys, the typical useful product is an online database cataloging entries of interest. We focus on the automation of the cataloging effort of a major sky survey and on the availability of digital libraries in general. The SKICAT system automates the reduction and analysis of three terabytes' worth of images, expected to contain on the order of two billion sky objects. For the primary scientific analysis of these data, it is necessary to detect, measure, and classify every sky object.
As companies increasingly rely on data to power decision making and drive innovation, it's important that this data is timely, accurate, and reliable. In this article we introduce "Key Assets," a new approach taken by some of the best data teams to surface your most important data tables and reports for quick and reliable insights. Have you ever been three-quarters of the way through a data warehouse migration only to discover that you don't know which data assets are right and which ones are wrong? Is your analytics team lost in a sea of spreadsheets, with no life vests in sight? If you answered yes to either of these questions, you're not alone.
Awad, George, Butt, Asad A., Curtis, Keith, Lee, Yooyoung, Fiscus, Jonathan, Godil, Afzal, Delgado, Andrew, Zhang, Jesse, Godard, Eliot, Diduch, Lukas, Smeaton, Alan F., Graham, Yvette, Kraaij, Wessel, Quenot, Georges
The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2019 represented a continuation of four tasks from TRECVID 2018. In total, 27 teams from various research organizations worldwide completed one or more of the following four tasks:
1. Ad-hoc Video Search (AVS)
2. Instance Search (INS)
3. Activities in Extended Video (ActEV)
4. Video to Text Description (VTT)
This paper is an introduction to the evaluation framework, tasks, data, and measures used in the workshop.
Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad database technologies, each with its own restrictive syntax. We introduce an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives. Our system can orchestrate web services across hundreds of machines and takes full advantage of cluster, thread, and asynchronous parallelism. Using this framework, we provide large-scale clients for intelligent services such as speech, vision, search, anomaly detection, and text analysis. This allows users to integrate ready-to-use intelligence into any datastore with an Apache Spark connector. To eliminate the majority of overhead from network communication, we also introduce a low-latency containerized version of our architecture. Finally, we demonstrate that the services we investigate are competitive on a variety of benchmarks, and present two applications of this framework: intelligent search engines and real-time auto race analytics systems.
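The framework described above is built on Apache Spark itself; as a rough, framework-agnostic sketch of the core pattern it exploits (fanning web-service calls out over partitioned data with thread-level parallelism, in the spirit of Spark's `mapPartitions`), one might write something like the following. The `mock_sentiment_service` function and the partitioning scheme are illustrative assumptions, not the authors' actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def mock_sentiment_service(text: str) -> dict:
    # Stand-in for a remote intelligent service (e.g. text analysis).
    # A real deployment would issue an HTTP request here.
    words = text.split()
    score = sum(1 for w in words if w in {"good", "great"}) \
          - sum(1 for w in words if w in {"bad", "awful"})
    return {"text": text, "score": score}

def score_partition(rows):
    # One worker processes a whole partition, amortising per-connection
    # setup cost, analogous to Spark's mapPartitions.
    return [mock_sentiment_service(r) for r in rows]

def orchestrate(rows, num_partitions=4):
    # Split rows into partitions, then score partitions concurrently,
    # mimicking cluster + thread parallelism on a single machine.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(score_partition, partitions)
    return [row for part in results for row in part]
```

In the real system, each partition's worker would hold open connections to the containerized service, so network setup overhead is paid once per partition rather than once per row.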
Alteryx, a data management vendor founded in 1997 and based in Irvine, Calif., unveiled its latest platform update in a blog post on Sept. 1, and all of the features included in the release are now generally available to customers. Alteryx previously offered data modeling capabilities with its Assisted Modeling Tool, but new in Alteryx 2020.3 is Automatic Mode within the tool. With a single click of a mouse, users can create a machine learning pipeline that automatically determines the best algorithms, data features and data transformations to create a data model. By adding Automatic Mode, Alteryx is targeting users without a background in data science in addition to data experts already enabled by the Assisted Modeling Tool, according to Dave Menninger, research director of data and analytics research at Ventana Research. "They've adopted a position that you will have data science experts and people who are dabbling in data science, and they've done a good job creating a single platform that those two audiences can share," he said.
Verta, an AI/ModelOps company whose founder created the open source ModelDB catalog for versioning models, has launched with a $10 million Series A led by Intel Capital. The Verta system tackles what is becoming an increasingly familiar problem: not only enabling ML models to be operationalized, but also tracking their performance and drift over time. Verta is hardly the only tool in the market to do so, but the founder claims that it tracks additional parameters not always caught by model lifecycle management systems. While Verta shares some capabilities with the data science platforms that have grown fairly abundant, its focus is more on the operational challenges of deploying models and keeping them on track. As noted, it starts with model versioning. ModelDB was created by Verta founder Manasi Vartak, a software engineering veteran of Facebook, Google, Microsoft, and Twitter, as part of her doctoral work at MIT. It versions four aspects of models, encompassing code, data sources, hyperparameters, and the compute environment on which the model was designed to run.
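ModelDB ships its own client API; purely as a hypothetical illustration of what versioning those four aspects might capture, a minimal record could hash all four together so that a change to any one of them yields a new version. The function name and fields below are assumptions for illustration, not ModelDB's interface.

```python
import hashlib
import json

def version_model(code_commit: str, data_sources: list,
                  hyperparameters: dict, environment: dict) -> dict:
    # Hypothetical record of the four aspects a model version covers:
    # code, data sources, hyperparameters, and compute environment.
    record = {
        "code_commit": code_commit,
        "data_sources": sorted(data_sources),
        "hyperparameters": hyperparameters,
        "environment": environment,
    }
    # A content hash over all four aspects means changing any one of
    # them (even a single hyperparameter) produces a new version id.
    payload = json.dumps(record, sort_keys=True).encode()
    record["version_id"] = hashlib.sha256(payload).hexdigest()[:12]
    return record

v1 = version_model("abc123", ["s3://bucket/train.csv"],
                   {"lr": 0.01, "epochs": 10},
                   {"python": "3.11", "sklearn": "1.4"})
v2 = version_model("abc123", ["s3://bucket/train.csv"],
                   {"lr": 0.02, "epochs": 10},
                   {"python": "3.11", "sklearn": "1.4"})
```

Here `v1` and `v2` differ only in learning rate, yet receive distinct version ids, which is what makes drift and regressions traceable back to a specific change.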
According to the World Health Organization (WHO) [World Health Organization, 2013], the United Nations directing and coordinating health authority, public health surveillance is: the continuous, systematic collection, analysis and interpretation of health-related data needed for the planning, implementation, and evaluation of public health practice. Public health surveillance practice has evolved over time. Although it was limited to pen and paper at the beginning of the 20th century, it is now facilitated by huge advances in informatics. Information technology enhancements have changed the traditional approaches to capturing, storing, sharing and analysing data, and have resulted in efficient and reliable health surveillance techniques [Lombardo and Buckeridge, 2007]. The main objective and challenge of a health surveillance system is the earliest possible detection of a disease outbreak within a society, for the purpose of protecting community health. In the past, before the widespread deployment of computers, health surveillance was based on reports received from medical care centres and laboratories.