Detecting outliers with PyOD


As the name suggests, outliers are data points that differ significantly from the rest of your observations. In other words, they lie far from the general trend of your data. In statistics and machine learning, detecting outliers is a pivotal step, since they can degrade the performance of your model. For example, imagine you want to predict your company's return based on the number of units sold. Nice, but what if, among your data, there was an outlier?
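To make the effect concrete, here is a minimal sketch of how a single outlier can distort a model that predicts return from units sold. It uses a plain least-squares line rather than PyOD, and the figures are invented purely for illustration; the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up data: every unit sold brings in exactly 10 in return.
units = [10, 20, 30, 40, 50]
returns = [100, 200, 300, 400, 500]
clean_slope, clean_intercept = fit_line(units, returns)

# Same data plus one outlier: 60 units sold, but a return of only 50.
noisy_slope, noisy_intercept = fit_line(units + [60], returns + [50])

print(f"slope without outlier: {clean_slope:.2f}")
print(f"slope with outlier:    {noisy_slope:.2f}")
```

With these numbers the fitted slope falls from 10 to about 2.1, so one bad observation is enough to change the prediction for every other point.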

Evaluating and testing unintended memorization in neural networks


Defining memorization rigorously requires thought. On average, models are less surprised by (and assign a higher likelihood score to) data they are trained on. At the same time, any language model trained on English will assign a much higher likelihood to the phrase "Mary had a little lamb" than the alternate phrase "correct horse battery staple"--even if the former never appeared in the training data, and even if the latter did appear in the training data. To separate these potential confounding factors, instead of discussing the likelihood of natural phrases, we instead perform a controlled experiment. Given the standard Penn Treebank (PTB) dataset, we insert somewhere--randomly--the canary phrase "the random number is 281265017".
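The insertion step of this controlled experiment can be sketched as follows. This is only an illustration under stated assumptions: the three-sentence corpus stands in for the Penn Treebank, the helper `insert_canary` is a hypothetical name, and the training and likelihood-scoring steps are not shown.

```python
import random

CANARY = "the random number is 281265017"

def insert_canary(sentences, canary, rng):
    """Return a copy of `sentences` with `canary` inserted at a random index,
    together with the index that was chosen."""
    position = rng.randint(0, len(sentences))  # inclusive bounds: end is allowed
    augmented = sentences[:position] + [canary] + sentences[position:]
    return augmented, position

# Stand-in corpus (assumption); the experiment described above uses PTB.
corpus = [
    "the cat sat on the mat",
    "stocks fell sharply on monday",
    "the company reported a loss",
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
augmented, position = insert_canary(corpus, CANARY, rng)
print(position, augmented)
```

A model trained on `augmented` can then be compared against the same canary format filled with other random numbers, which controls for how natural the phrase is.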

AutoDL challenges


To ramp up difficulty, we are running a series of milestone challenges, culminating with the NeurIPS challenge. By everything, we mean image, video, text, time series, and tabular data. All challenges are with code submission and have prizes and opportunities to publish and/or present at conferences. The first challenge, AutoCV, has already ended! Its results will be presented at the IJCNN 2019 conference.

Why AIoT Is Emerging As The Future Of Industry 4.0


Collect - Telemetry data from a large number of devices and sensors is collected at a central location. Store - The telemetry data is stored in scalable storage systems such as data lakes. Process - Big Data platforms are used to process and analyze the telemetry datasets.

Greening AI: New AI2 Initiative Promotes Model Efficiency


Everything comes with a price, and artificial intelligence is no exception. The last decade has witnessed AI breakthroughs in object recognition, game playing, machine translation and many other areas. But these massive improvements required massive amounts of compute. For example, the 2017 deep learning model AlphaZero consumed 300,000 times more computational power during training than 2012's revolutionary AlexNet. Meanwhile, with global concerns growing regarding climate change and other environmental threats, recent mainstream media stories have pointed fingers at the massive carbon footprint left by the training of today's resource-hungry machine learning models.

Deep Learning Places New Demands on Data Center Architectures


Machine and deep learning applications bring new workflows and challenges to enterprise data center architectures. One of the key challenges revolves around data and the storage solutions needed to store, manage, and deliver it in line with AI's demands. Today's intelligent applications require infrastructure that is very different from that of traditional analytics workloads, and an organization's data architecture decisions will have a big impact on the success of its AI projects. These are among the key takeaways from a new white paper by the research firm Moor Insights & Strategy. "While discussions of machine learning and deep learning naturally gravitate towards compute, it's clear that these solutions force new ways of thinking about data," the firm notes in its "Enterprise Machine & Deep Learning with Intelligent Storage" paper.

A Sufficient Statistic for Influence in Structured Multiagent Environments Artificial Intelligence

Making decisions in complex environments is a key challenge in artificial intelligence (AI). Situations involving multiple decision makers are particularly complex, leading to computational intractability of principled solution methods. A body of work in AI [4, 3, 41, 45, 47, 2] has tried to mitigate this problem by reducing interaction to its core: how does the policy of one agent influence another agent? If we can find more compact representations of such influence, this can help us deal with the complexity, for instance by searching the space of influences rather than that of policies [45]. However, so far these notions of influence have been restricted in their applicability to special cases of interaction. In this paper we formalize influence-based abstraction (IBA), which facilitates the elimination of latent state factors without any loss in value, for a very general class of problems described as factored partially observable stochastic games (fPOSGs) [33]. This generalizes existing descriptions of influence, and thus can serve as the foundation for improvements in scalability and other insights in decision making in complex settings.

Today Me, Tomorrow Thee: Efficient Resource Allocation in Competitive Settings using Karma Games Artificial Intelligence

We present a new type of coordination mechanism among multiple agents for the allocation of a finite resource, such as the allocation of time slots for passing an intersection. We consider the setting where we associate one counter with each agent, which we call its karma value, and where there is an established mechanism to decide resource allocation based on agents exchanging karma. The idea is that agents might be inclined to pass on using resources today, in exchange for karma, which will make it easier for them to claim the resource in the future. To understand whether such a system might work robustly, we only design the protocol and not the agents' policies. We take a game-theoretic perspective and compute policies corresponding to Nash equilibria for the game. We find, surprisingly, that the Nash equilibria for a society of self-interested agents are very close in social welfare to a centralized cooperative solution. These results suggest that many resource allocation problems can have a simple, elegant, and robust solution, assuming the availability of a karma accounting mechanism.

Orometric Methods in Bounded Metric Data Artificial Intelligence

A large amount of data accommodated in knowledge graphs (KG) is actually metric. For example, the Wikidata KG contains a plenitude of metric facts about geographic entities like cities, chemical compounds or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we demonstrate how to use them for metric data sets in general, notably metric sets of items enclosed in knowledge graphs. Based on this we present a method for identifying outstanding items using the transferred valuation functions 'isolation' and 'prominence'. Building on this we envision an item recommendation process. To demonstrate the relevance of the novel valuations for such processes we use item sets from the Wikidata knowledge graph. We then evaluate the usefulness of 'isolation' and 'prominence' empirically in a supervised machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France.
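As a rough illustration of 'isolation', the sketch below uses the usual topographic reading of the term: the distance from an item to the nearest item with a strictly larger value. This is an assumption about the general idea only; the paper's transferred definition for bounded metric spaces may differ in detail (for example in how the highest item is treated), and 'prominence' is not shown. The item positions and values are invented.

```python
import math

def isolation(items, index, dist):
    """Distance from items[index] to the nearest item with a strictly larger
    value. Returns math.inf when no higher item exists (an assumption made
    here for simplicity)."""
    position, value = items[index]
    higher = [dist(position, p) for p, v in items if v > value]
    return min(higher, default=math.inf)

# Hypothetical items: ((x, y) position, value), e.g. places with a population.
items = [
    ((0.0, 0.0), 100),
    ((3.0, 4.0), 50),
    ((6.0, 8.0), 80),
    ((6.0, 9.0), 10),
]

# math.dist is the Euclidean metric; any other metric function would work too.
scores = [isolation(items, i, math.dist) for i in range(len(items))]
print(scores)
```

Here the third item scores higher than the second even though both are below the maximum, because its nearest higher neighbor is farther away; that is the sense in which isolation picks out locally outstanding items.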

Designing Artificial Intelligence and Security into High-Performance SSDs


As the size of SSDs grows, the need to do more processing inside of drives is also growing. Compute in storage is being used to deal with latency and power issues associated with moving large amounts of data, and to extend drive life while increasing reliability. In the past, data was moved from a drive to a compute device for processing. In enterprise systems, the data had to be transferred across multiple interfaces and protocols. Not only does this take time and increase latency, but it also burns power.