AITopics | frequent item

Collaborating Authors

frequent item

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements

Ankush Mandal, He Jiang, Anshumali Shrivastava, Vivek Sarkar

Neural Information Processing SystemsNov-20-2025, 20:03:44 GMT

As a result, several solutions have been proposed to solve this problem approximately.

algorithm, artificial intelligence, data mining, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence (0.93)
Information Technology > Data Science > Data Mining (0.46)

Add feedback

Learning-Augmented Frequency Estimation in Sliding Windows

Shahout, Rana, Sabek, Ibrahim, Mitzenmacher, Michael

arXiv.org Artificial IntelligenceSep-17-2024

We show how to utilize machine learning approaches to improve sliding window algorithms for approximate frequency estimation problems, under the ``algorithms with predictions'' framework. In this dynamic environment, previous learning-augmented algorithms are less effective, since properties in sliding window resolution can differ significantly from the properties of the entire stream. Our focus is on the benefits of predicting and filtering out items with large next arrival times -- that is, there is a large gap until their next appearance -- from the stream, which we show improves the memory-accuracy tradeoffs significantly. We provide theorems that provide insight into how and by how much our technique can improve the sliding window algorithm, as well as experimental results using real-world data sets. Our work demonstrates that predictors can be useful in the challenging sliding window setting.

algorithm, prediction, predictor, (15 more...)

arXiv.org Artificial Intelligence

2409.11516

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Illinois > Cook County > Chicago (0.06)
North America > United States > California > Santa Clara County > San Jose (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Information Technology (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams

Shahout, Rana, Mitzenmacher, Michael

arXiv.org Artificial IntelligenceJun-23-2024

Identifying heavy hitters and estimating the frequencies of flows are fundamental tasks in various network domains. Existing approaches to this challenge can broadly be categorized into two groups, hashing-based and competing-counter-based. The Count-Min sketch is a standard example of a hashing-based algorithm, and the Space Saving algorithm is an example of a competing-counter algorithm. Recent works have explored the use of machine learning to enhance algorithms for frequency estimation problems, under the algorithms with prediction framework. However, these works have focused solely on the hashing-based approach, which may not be best for identifying heavy hitters. In this paper, we present the first learned competing-counter-based algorithm, called LSS, for identifying heavy hitters, top k, and flow frequency estimation that utilizes the well-known Space Saving algorithm. We provide theoretical insights into how and to what extent our approach can improve upon Space Saving, backed by experimental results on both synthetic and real-world datasets. Our evaluation demonstrates that LSS can enhance the accuracy and efficiency of Space Saving in identifying heavy hitters, top k, and estimating flow frequencies.

algorithm, heavy hitter, hitter, (15 more...)

arXiv.org Artificial Intelligence

2406.1627

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Virginia > Alexandria County > Alexandria (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Information Technology (1.00)

Technology:

Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Revisiting Regex Generation for Modeling Industrial Applications by Incorporating Byte Pair Encoder

Wang, Desheng, Liu, Jiawei, Qi, Xiang, Sun, Baolin, Zhang, Peng

arXiv.org Artificial IntelligenceJun-24-2020

Regular expression is important for many natural language processing tasks especially when used to deal with unstructured and semi-structured data. This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem. Different from the methods which generate regular expressions from character level, we first utilize byte pair encoder (BPE) to extract some frequent items, which are then used to construct regular expressions. The fitness function of our genetic algorithm contains multi objectives and is solved based on evolutionary procedure including crossover and mutation operation. In the fitness function, we take the length of generated regular expression, the maximum matching characters and samples for positive training samples, and the minimum matching characters and samples for negative training samples into consideration. In addition, to accelerate the training process, we do exponential decay on the population size of the genetic algorithm. Our method together with a strong baseline is tested on 13 kinds of challenging datasets. The results demonstrate the effectiveness of our method, which outperforms the baseline on 10 kinds of data and achieves nearly 50 percent improvement on average. By doing exponential decay, the training speed is approximately 100 times faster than the methods without using exponential decay. In summary, our method possesses both effectiveness and efficiency, and can be implemented for the industry application.

evolutionary algorithm, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2005.02558

Country:

Asia > China > Zhejiang Province > Hangzhou (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements

Mandal, Ankush, Jiang, He, Shrivastava, Anshumali, Sarkar, Vivek

Neural Information Processing SystemsDec-31-2018

Identifying the top-K frequent items is one of the most common and important operations in large data processing systems. As a result, several solutions have been proposed to solve this problem approximately. In this paper, we identify that in modern distributed settings with both multi-node as well as multi-core parallelism, existing algorithms, although theoretically sound, are suboptimal from the performance perspective. In particular, for identifying top-K frequent items, Count-Min Sketch (CMS) has fantastic update time but lack the important property of reducibility which is needed for exploiting available massive data parallelism. On the other end, popular Frequent algorithm (FA) leads to reducible summaries but the update costs are significant. In this paper, we present Topkapi, a fast and parallel algorithm for finding top-K frequent items, which gives the best of both worlds, i.e., it is reducible as well as efficient update time similar to CMS. Topkapi possesses strong theoretical guarantees and leads to significant performance gains due to increased parallelism, relative to past work.

algorithm, cms, topkapi, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence (0.93)
Information Technology > Data Science > Data Mining (0.46)
Information Technology > Architecture > Distributed Systems (0.34)

Add feedback

Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements

Mandal, Ankush, Jiang, He, Shrivastava, Anshumali, Sarkar, Vivek

Neural Information Processing SystemsDec-31-2018

algorithm, artificial intelligence, data mining, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence (0.93)
Information Technology > Data Science > Data Mining (0.46)
Information Technology > Architecture > Distributed Systems (0.34)

Add feedback

Abstract Representations and Frequent Pattern Discovery

Ozkural, Eray

arXiv.org Artificial IntelligenceFeb-9-2012

We discuss the frequent pattern mining problem in a general setting. From an analysis of abstract representations, summarization and frequent pattern mining, we arrive at a generalization of the problem. Then, we show how the problem can be cast into the powerful language of algorithmic information theory. This allows us to formulate a simple algorithm to mine for all frequent patterns.

data mining, machine learning, pattern recognition, (18 more...)

arXiv.org Artificial Intelligence

1202.2167

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

Add feedback