 comparison operator




Python Data Structures Tutorial

#artificialintelligence

The tutorial also explains sequence and string functions, slicing, concatenating, iterating, and sorting, with code examples. This course combines conceptual lectures that explain how a data structure works with code lectures that walk through how to implement it in Python. All the code lectures are based on Python 3 code in a Jupyter notebook. Data structures covered in this course include the native Python data structures String, List, Tuple, Set, and Dictionary, as well as Stacks, Queues, Heaps, Linked Lists, Binary Search Trees, and Graphs. The list data type has some more methods.
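As a rough illustration of the operations named in the excerpt above (slicing, concatenating, iterating, sorting, and list methods), here is a minimal sketch. The variable names and values are made up for this example and are not taken from the course.

```python
# Illustrative values only; none of these names come from the course itself.
letters = ["d", "a", "c", "b"]
word = "structure"

first_two = letters[:2]        # slicing a list keeps the first two items
prefix = word[:6]              # slicing works the same way on strings
combined = letters + ["e"]     # concatenation builds a new list
ordered = sorted(combined)     # sorted() returns a new, sorted list

# Iterating with enumerate() yields (index, item) pairs.
indexed = [(i, ch) for i, ch in enumerate(ordered)]

# A few of the list type's own methods, used here as a simple stack.
stack = []
stack.append(1)
stack.append(2)
top = stack.pop()              # removes and returns the last item
```

Note that `sorted()` leaves `combined` untouched, whereas the list method `sort()` would reorder the list in place.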


SQL: A Full Fledged Guide from Basics to Advance Level

#artificialintelligence

This article was published as a part of the Data Science Blogathon. According to the Bureau of Labor Statistics, employment of computer and information research scientists and data scientists is projected to grow by at least 19 per cent by 2026. Data is collected and processed in every company regardless of the domain. Data scientists dive into the data to find valuable insights that benefit the company. Most companies store and manage their data with a Relational Database Management System (RDBMS).


Aggregate Semantics for Propositional Answer Set Programs

arXiv.org Artificial Intelligence

Answer Set Programming (ASP) emerged in the late 1990s as a paradigm for Knowledge Representation and Reasoning. The attractiveness of ASP builds on an expressive high-level modeling language along with the availability of powerful off-the-shelf solving systems. While the utility of incorporating aggregate expressions in the modeling language was realized almost simultaneously with the inception of the first ASP solving systems, a general semantics of aggregates and its efficient implementation have been long-standing challenges. Aggregates have been proposed and widely used in database systems, and also in the deductive database language Datalog, which is one of the main precursors of ASP. The use of aggregates was, however, still restricted in Datalog (by either disallowing recursion or only allowing monotone aggregates), while several ways to integrate unrestricted aggregates evolved in the context of ASP. In this survey, we pick up at this point of development by presenting and comparing the main aggregate semantics that have been proposed for propositional ASP programs. We highlight crucial properties such as computational complexity and expressive power, and outline the capabilities and limitations of different approaches by illustrative examples.


Efficient Computation of Probabilistic Dominance in Robust Multi-Objective Optimization

arXiv.org Machine Learning

Real-world problems typically require the simultaneous optimization of several, often conflicting objectives. Many of these multi-objective optimization problems are characterized by wide ranges of uncertainties in their decision variables or objective functions, which further increases the complexity of optimization. To cope with such uncertainties, robust optimization is widely studied aiming to distinguish candidate solutions with uncertain objectives specified by confidence intervals, probability distributions or sampled data. However, existing techniques mostly either fail to consider the actual distributions or assume uncertainty as instances of uniform or Gaussian distributions. This paper introduces an empirical approach that enables an efficient comparison of candidate solutions with uncertain objectives that can follow arbitrary distributions. Given two candidate solutions under comparison, this operator calculates the probability that one solution dominates the other in terms of each uncertain objective. It can substitute for the standard comparison operator of existing optimization techniques such as evolutionary algorithms to enable discovering robust solutions to problems with multiple uncertain objectives. This paper also proposes to incorporate various uncertainties in well-known multi-objective problems to provide a benchmark for evaluating uncertainty-aware optimization techniques. The proposed comparison operator and benchmark suite are integrated into an existing optimization tool that features a selection of multi-objective optimization problems and algorithms. Experiments show that in comparison with existing techniques, the proposed approach achieves higher optimization quality at lower overheads. The authors are with the Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen 91058, Germany.
A: a candidate solution
B: a candidate solution
B: beta distribution
C: comparison operator
c: positive constant
E: expected value
e: approximation error
f: objective function
N: Gaussian distribution
N: number of samples or quantile cuts
n: number of decision variables
m: number of objective functions
S: sequence of samples
s: sample from an uncertain objective's distribution
U: uniform distribution
u: uncertainty added to an optimization problem
Var: variance
X: random variable
x: decision variable
γ: comparison threshold
δ: tolerance (bound on an error)
σ: standard deviation
ω: interval width in a histogram

1 Introduction

Real-world problems typically demand solutions that are optimized with respect to multiple criteria called objectives. In these so-called multi-objective optimization problems, the objectives often conflict with each other such that no single solution can be found to be optimal in all objectives. Instead, one usually searches for a set of non-dominated solutions known as Pareto front or Pareto set that provide decent tradeoffs among objectives.
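The abstract above describes an operator that estimates, per objective, the probability that one candidate solution is better than another. The sketch below is one simple way to do that from sampled data. It is not the paper's algorithm: the pairwise-counting estimator, the function names `prob_less` and `probably_dominates`, the use of the threshold `gamma` in the decision rule, and the assumption that all objectives are minimized are choices made only for this illustration.

```python
from itertools import product


def prob_less(samples_a, samples_b):
    """Estimate P(a < b) by comparing every pair of samples (ties count half)."""
    wins = 0.0
    for a, b in product(samples_a, samples_b):
        if a < b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / (len(samples_a) * len(samples_b))


def probably_dominates(sol_a, sol_b, gamma=0.5):
    """Hypothetical decision rule, assuming every objective is minimized.

    sol_a and sol_b each hold one list of samples per objective.
    A is treated as dominating B if it is probably no worse in every
    objective (P >= gamma) and probably better in at least one (P > gamma).
    """
    probs = [prob_less(sa, sb) for sa, sb in zip(sol_a, sol_b)]
    return all(p >= gamma for p in probs) and any(p > gamma for p in probs)


# Two objectives, two samples each (made-up numbers).
solution_a = [[1, 2], [1, 3]]
solution_b = [[3, 4], [2, 2]]
a_beats_b = probably_dominates(solution_a, solution_b)
b_beats_a = probably_dominates(solution_b, solution_a)
```

Comparing all sample pairs costs time proportional to the product of the two sample counts, so this only suits small sample sets.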


Cost Based Optimizer in Apache Spark 2.2 - The Databricks Blog

@machinelearnbot

This is a joint engineering effort between Databricks' Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei's engineering team (Ron Hu and Zhenhua Wang). Apache Spark 2.2 recently shipped with a state-of-the-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, average/max length, etc.) to improve the quality of query execution plans. Leveraging these statistics helps Spark make better decisions when picking the optimal query plan. Examples of these optimizations include selecting the correct build side in a hash-join, choosing the right join type (broadcast hash-join vs. shuffled hash-join) or adjusting a multi-way join order, among others. In this blog, we'll take a deep dive into Spark's Cost Based Optimizer (CBO), discuss how Spark collects and stores these statistics and optimizes queries, and show its performance impact on TPC-DS benchmark queries. At its core, Spark's Catalyst optimizer is a general library for representing query plans as trees and sequentially applying a number of optimization rules to manipulate them.
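To make the idea of statistics-driven planning concrete, here is a toy sketch of two of the decisions mentioned above: picking the build side of a hash-join and choosing between a broadcast and a shuffled hash-join. It does not use or reproduce Spark's code. The `TableStats` class, the `plan_join` function, the size estimate, and the cutoff value are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table statistics, not Spark's internal representation."""
    name: str
    row_count: int
    avg_row_bytes: int

    @property
    def size_bytes(self) -> int:
        # Crude size estimate: rows times average row width.
        return self.row_count * self.avg_row_bytes


# Illustrative cutoff (an assumption): broadcast only small build sides.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def plan_join(left, right, threshold=BROADCAST_THRESHOLD_BYTES):
    """Use the smaller input as the build side, then pick the join type."""
    if left.size_bytes <= right.size_bytes:
        build, probe = left, right
    else:
        build, probe = right, left
    if build.size_bytes <= threshold:
        join_type = "broadcast hash-join"
    else:
        join_type = "shuffled hash-join"
    return {"build": build.name, "probe": probe.name, "type": join_type}


# Made-up statistics for a large fact table and a small dimension table.
sales = TableStats("sales", row_count=50_000_000, avg_row_bytes=100)
dates = TableStats("dates", row_count=70_000, avg_row_bytes=50)
plan = plan_join(sales, dates)
```

With accurate statistics the small table is chosen as the build side; with missing or stale statistics the same rule could pick the wrong side, which is the kind of mistake a cost-based optimizer aims to avoid.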


Description Logics and Fuzzy Probability

AAAI Conferences

Uncertainty and vagueness are pervasive phenomena in real-life knowledge. They are supported in extended description logics that adapt classical description logics to deal with numerical probabilities or fuzzy truth degrees. While the two concepts are distinguished for good reasons, they combine in the notion of "probably", which is ultimately a fuzzy qualification of probabilities. Here, we develop existing propositional logics of fuzzy probability into a full-blown description logic, and we show decidability of several variants of this logic under Łukasiewicz semantics. We obtain these results in a novel generic framework of fuzzy coalgebraic logic; this enables us to extend our results to logics that combine crisp ingredients, including standard crisp roles and crisp numerical probabilities, with fuzzy roles and fuzzy probabilities.
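For readers unfamiliar with the Łukasiewicz semantics mentioned above, the sketch below shows only its standard propositional connectives on truth degrees in [0, 1]. It does not model the paper's description logic or its fuzzy probability operators, and the function names are chosen for this example.

```python
def luk_and(a, b):
    """Lukasiewicz strong conjunction (t-norm): max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def luk_or(a, b):
    """Lukasiewicz strong disjunction (t-conorm): min(1, a + b)."""
    return min(1.0, a + b)


def luk_implies(a, b):
    """Lukasiewicz implication: min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def luk_not(a):
    """Lukasiewicz negation: 1 - a."""
    return 1.0 - a


# Truth degrees chosen as exact binary fractions to avoid rounding noise.
tall, fast = 0.75, 0.5
both = luk_and(tall, fast)
either = luk_or(tall, fast)
rule = luk_implies(tall, fast)
```

On the crisp values 0 and 1 these functions agree with classical conjunction, disjunction, implication, and negation.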