Goto

Collaborating Authors

Results


Oracle Database 21c spotlights in-memory processing and ML, adds new low-code APEX cloud service

ZDNet

Among the messages that Oracle is putting out for its flagship database, adding new access paths for developers has become just as important as adding new data types. This month, Oracle is launching the next version of Oracle Database, version 21c. In a session hosted by Andrew Mendelsohn, executive vice president of database server technologies, the company is also announcing a new cloud-based APEX Service designed to carve a new access path for low-code developers who traditionally thought that writing apps for Oracle was complex and expensive. To induce new developers, Oracle is throwing in a free tier to this new cloud service. As Oracle now numbers its releases according to calendar year, 21c is the next release, which was announced as generally available last month.



Leveraging Node Attributes for Incomplete Relational Data

arXiv.org Machine Learning

Relational data are usually highly incomplete in practice, which inspires us to leverage side information to improve the performance of community detection and link prediction. This paper presents a Bayesian probabilistic approach that incorporates various kinds of node attributes encoded in binary form in relational models with Poisson likelihood. Our method works flexibly with both directed and undirected relational networks. The inference can be done by efficient Gibbs sampling which leverages sparsity of both networks and node attributes. Extensive experiments show that our models achieve the state-of-the-art link prediction results, especially with highly incomplete relational data.


Optimization tips and tricks on Azure SQL Server for Machine Learning Services

#artificialintelligence

By using memory-optimized tables, resume features are stored in main memory and disk IO could be significantly reduced. If the database engine server detects more than 8 physical cores per NUMA node or socket, it will automatically create soft-NUMA nodes that ideally contain 8 cores. We then further created 4 SQL resource pools and 4 external resource pools [7] to specify the CPU affinity of using the same set of CPUs in each node. We can create resource governance for R services on SQL Server [8] by routing those scoring batches into different workload groups (Figure.


Apache Spark: A Unified Engine for Big Data Processing

@machinelearnbot

Analyses performed using Spark of brain activity in a larval zebrafish: embedding dynamics of whole-brain activity into lower-dimensional trajectories. The growth of data volumes in industry and research poses tremendous opportunities, as well as tremendous computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed new systems to scale out computations to multiple nodes. As a result, there has been an explosion of new cluster programming models targeting diverse computing workloads.1,4,7,10 At first, these models were relatively specialized, with new models developed for new workloads; for example, MapReduce4 supported batch processing, but Google also developed Dremel13 for interactive SQL queries and Pregel11 for iterative graph algorithms. In the open source Apache Hadoop stack, systems like Storm1 and Impala9 are also specialized. Even in the relational database world, the trend has been to move away from "one-size-fits-all" systems.18 Unfortunately, most big data applications need to combine many different processing types. The very nature of "big data" is that it is diverse and messy; a typical pipeline will need MapReduce-like code for data loading, SQL-like queries, and iterative machine learning. Specialized engines can thus create both complexity and inefficiency; users must stitch together disparate systems, and some applications simply cannot be expressed efficiently in any engine. In 2009, our group at the University of California, Berkeley, started the Apache Spark project to design a unified engine for distributed data processing. Spark has a programming model similar to MapReduce but extends it with a data-sharing abstraction called "Resilient Distributed Datasets," or RDDs.25 Using this simple extension, Spark can capture a wide range of processing workloads that previously needed separate engines, including SQL, streaming, machine learning, and graph processing2,26,6 (see Figure 1).


Apache Spark

Communications of the ACM

Analyses performed using Spark of brain activity in a larval zebrafish: embedding dynamics of whole-brain activity into lower-dimensional trajectories. The growth of data volumes in industry and research poses tremendous opportunities, as well as tremendous computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed new systems to scale out computations to multiple nodes. As a result, there has been an explosion of new cluster programming models targeting diverse computing workloads.1,4,7,10 At first, these models were relatively specialized, with new models developed for new workloads; for example, MapReduce4 supported batch processing, but Google also developed Dremel13 for interactive SQL queries and Pregel11 for iterative graph algorithms. In the open source Apache Hadoop stack, systems like Storm1 and Impala9 are also specialized. Even in the relational database world, the trend has been to move away from "one-size-fits-all" systems.18 Unfortunately, most big data applications need to combine many different processing types. The very nature of "big data" is that it is diverse and messy; a typical pipeline will need MapReduce-like code for data loading, SQL-like queries, and iterative machine learning. Specialized engines can thus create both complexity and inefficiency; users must stitch together disparate systems, and some applications simply cannot be expressed efficiently in any engine. In 2009, our group at the University of California, Berkeley, started the Apache Spark project to design a unified engine for distributed data processing. Spark has a programming model similar to MapReduce but extends it with a data-sharing abstraction called "Resilient Distributed Datasets," or RDDs.25 Using this simple extension, Spark can capture a wide range of processing workloads that previously needed separate engines, including SQL, streaming, machine learning, and graph processing2,26,6 (see Figure 1).