hitter
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Texas > Harris County > Houston (0.04)
Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams
Shahout, Rana, Mitzenmacher, Michael
Identifying heavy hitters and estimating the frequencies of flows are fundamental tasks in various network domains. Existing approaches to this challenge can broadly be categorized into two groups, hashing-based and competing-counter-based. The Count-Min sketch is a standard example of a hashing-based algorithm, and the Space Saving algorithm is an example of a competing-counter algorithm. Recent works have explored the use of machine learning to enhance algorithms for frequency estimation problems, under the algorithms with prediction framework. However, these works have focused solely on the hashing-based approach, which may not be best for identifying heavy hitters. In this paper, we present the first learned competing-counter-based algorithm, called LSS, for identifying heavy hitters, top k, and flow frequency estimation that utilizes the well-known Space Saving algorithm. We provide theoretical insights into how and to what extent our approach can improve upon Space Saving, backed by experimental results on both synthetic and real-world datasets. Our evaluation demonstrates that LSS can enhance the accuracy and efficiency of Space Saving in identifying heavy hitters, top k, and estimating flow frequencies.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
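For readers unfamiliar with the competing-counter approach named in the abstract above, here is a minimal sketch of the classic Space Saving algorithm, not the learned LSS variant, whose details the abstract does not give. The class name `SpaceSaving`, its method names, and the simple dictionary-based layout are illustrative assumptions; practical implementations usually use a structure that finds the minimum counter in constant time.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Classic Space Saving with a fixed number of counters (illustrative)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # item -> estimated count
        self.errors: Dict[Hashable, int] = {}  # item -> max overestimate

    def update(self, item: Hashable) -> None:
        if item in self.counts:
            # Monitored item: just increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free counter available: start monitoring the item.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest counter (O(capacity) scan here)
            # and let the newcomer inherit that count plus one.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def estimate(self, item: Hashable) -> int:
        # Unmonitored items are reported as 0 in this sketch.
        return self.counts.get(item, 0)

    def top(self, k: int) -> List[Tuple[Hashable, int]]:
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    ss = SpaceSaving(capacity=3)
    for flow in ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]:
        ss.update(flow)
    print(ss.top(2))
```

Counters for monitored items never underestimate the true frequency, and the counters always sum to the stream length; the `errors` map bounds how much each estimate may overshoot.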
Graph Encoding and Neural Network Approaches for Volleyball Analytics: From Game Outcome to Individual Play Predictions
Tracy, Rhys, Xia, Haotian, Rasla, Alex, Wang, Yuan-Fang, Singh, Ambuj
This research aims to improve the accuracy of complex volleyball predictions and provide more meaningful insights to coaches and players. We introduce a specialized graph encoding technique to add contact-by-contact volleyball context to an already available volleyball dataset without any additional data gathering. We demonstrate the potential benefits of using graph neural networks (GNNs) on this enriched dataset for three different volleyball prediction tasks: rally outcome prediction, set location prediction, and hit type prediction. We compare the performance of our graph-based models to baseline models and analyze the results to better understand the underlying relationships in a volleyball rally. Our results show that the use of GNNs with our graph encoding yields a much more advanced analysis of the data, which noticeably improves prediction results overall. We also show that these baseline tasks can be significantly improved with simple adjustments, such as removing blocked hits. Lastly, we demonstrate the importance of choosing a model architecture that will better extract the important information for a certain task. Overall, our study showcases the potential strengths and weaknesses of using graph encodings in sports data analytics and hopefully will inspire future improvements in machine learning strategies across sports and applications by using graph-based encodings.
- Leisure & Entertainment > Games (0.68)
- Leisure & Entertainment > Sports > Soccer (0.47)
- Leisure & Entertainment > Sports > Baseball (0.46)
Differentially Private Heavy Hitter Detection using Federated Analytics
Chadha, Karan, Chen, Junye, Duchi, John, Feldman, Vitaly, Hashemi, Hanieh, Javidbakht, Omid, McMillan, Audra, Talwar, Kunal
In this work, we study practical heuristics to improve the performance of prefix-tree-based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves the performance of the algorithm while satisfying computational, communication and privacy constraints. We explore the impact of different data-selection schemes as well as the impact of introducing deny lists during multiple runs of the algorithm. We test these improvements using extensive experimentation on the Reddit dataset (Caldas et al., 2018) on the task of learning the most frequent words.
- North America > United States (0.04)
- Europe > Italy > Veneto > Venice (0.04)
- Europe > Croatia > Zagreb County > Zagreb (0.04)
A New Perspective for Shuttlecock Hitting Event Detection
This article introduces a novel approach to shuttlecock hitting event detection. Instead of depending on generic methods, we capture the hitting action of players by reasoning over a sequence of images. To learn the features of hitting events in a video clip, we specifically utilize a deep learning model known as SwingNet. This model is designed to capture the relevant characteristics and patterns associated with the act of hitting in badminton. By training SwingNet on the provided video clips, we aim to enable the model to accurately recognize and identify the instances of hitting events based on their distinctive features. Furthermore, we apply a specific video processing technique to extract prior features from the video, which significantly reduces the learning difficulty for the model. The proposed method not only provides an intuitive and user-friendly approach but also presents a fresh perspective on the task of detecting badminton hitting events. The source code will be available at https://github.com/TW-yuhsi/A-New-Perspective-for-Shuttlecock-Hitting-Event-Detection.
VREN: Volleyball Rally Dataset with Expression Notation Language
Xia, Haotian, Tracy, Rhys, Zhao, Yun, Fraisse, Erwan, Wang, Yuan-Fang, Petzold, Linda
This research is intended to accomplish two goals. The first goal is to curate a large and information-rich dataset that contains crucial and succinct summaries of the players' actions and positions and the back-and-forth travel patterns of the volleyball in professional and NCAA Div-I indoor volleyball games. While several prior studies have aimed to create similar datasets for other sports (e.g., badminton and soccer), such a dataset for indoor volleyball has not yet been realized. The second goal is to introduce a volleyball descriptive language to fully describe the rally processes in the games and apply the language to our dataset. Based on the curated dataset and our descriptive sports language, we introduce three tasks for automated volleyball action and tactic analysis using our dataset: (1) Volleyball Rally Prediction, aimed at predicting the outcome of a rally and helping players and coaches improve decision-making in practice, (2) Setting Type and Hitting Type Prediction, to help coaches and players prepare more effectively for the game, and (3) Volleyball Tactics and Attacking Zone Statistics, to provide advanced volleyball statistics and help coaches understand the game and opponent's tactics better. We conducted case studies to show how experimental results can provide insights to the volleyball analysis community. Furthermore, experimental evaluation based on real-world data establishes a baseline for future studies and applications of our dataset and language. This study bridges the gap between the indoor volleyball field and computer science.
- North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
- South America > Venezuela (0.04)
- North America > United States > Hawaii (0.04)
- Leisure & Entertainment > Sports > Volleyball (0.95)
- Leisure & Entertainment > Sports > Soccer (0.67)
Memories of My First Baseball Game
I can still recall every detail of April 7, 2062, Monsanto Opening Day for the Mets presented by DraftKings at the Polo Ralph Lauren grounds. It was magical to walk into the stadium and study each team's nine designated hitters before Dad and I made our in-game bets. We picked the wrong winner, but I scored some credits by correctly predicting that my father would whiff on his wager that the AmazonMetaAlphabet Mariners' lead-off hitter, Machine Gun Kelly III, would draw a walk. As it turned out, Mr. Kelly missed his at-bat waiting for a delayed drone delivery of curly fries. That's a quaint inefficiency you don't see anymore, but even then such things had already started to vanish from the game.
Human Fallibility and the Case for Robot Baseball Umpires
I, for one, will welcome our robot umpire overlords, at least when it comes to calling balls and strikes. The automated strike zone is coming, probably within the next three seasons, and I am here for it. If you've spent any time on Twitter during baseball season, especially the postseason the last few years, you've probably stumbled on fans arguing for #RobotUmpsNow against those who argue for "the human element," two sides of the ongoing debate over whether baseball should move to automated calling of balls and strikes. It came up yet again in the 2019 World Series, when umpire Lance Barksdale missed two obvious calls in Game 5, one of which he openly blamed on Washington catcher Yan Gomes, which led Nationals manager Davey Martinez to yell at Barksdale to "wake up," and another so egregious that the victim, Victor Robles, jumped in anger and tossed his batting gloves after Barksdale called him out on a pitch that never even saw the strike zone. Both calls were bad, and in both cases there was at least the appearance that Barksdale was punishing the Nationals--punishing Gomes for assuming the strike call before it happened, then punishing the whole team later for questioning him in the first place.
- Information Technology > Artificial Intelligence > Robots (0.62)
- Information Technology > Communications > Social Media (0.56)
Heavy Hitters via Cluster-Preserving Clustering
We develop a new algorithm for the turnstile heavy hitters problem in general turnstile streams, the EXPANDERSKETCH, which finds the approximate top-k items in a universe of size n using the same asymptotic O(k log n) words of memory and O(log n) update time as the COUNTMIN and COUNTSKETCH, but requiring only O(k poly(log n)) time to answer queries instead of the O(n log n) time of the other two. The notion of "approximation" is the same $\ell_2$ sense as the COUNTSKETCH, which given known lower bounds is the strongest guarantee one can achieve in sublinear memory. Our main innovation is an efficient reduction from the heavy hitters problem to a clustering problem in which each heavy hitter is encoded as some form of noisy spectral cluster in a graph, and the goal is to identify every cluster. Since every heavy hitter must be found, correctness requires that every cluster be found. We thus need a "cluster-preserving clustering" algorithm that partitions the graph into pieces while finding every cluster. To do this we first apply standard spectral graph partitioning, and then we use some novel local search techniques to modify the cuts obtained so as to make sure that the original clusters are sufficiently preserved. Our clustering algorithm may be of broader interest beyond heavy hitters and streaming algorithms. Finding "frequent" or "top-k" items in a dataset is a common task in data mining. In the data streaming literature, this problem is typically referred to as the heavy hitters problem, which is as follows: a frequency vector $x \in \mathbb{R}^n$ is initialized to the zero vector, and we process a stream of updates update(i, Δ) for $\Delta \in \mathbb{R}$, with each such update causing the change $x_i \leftarrow x_i + \Delta$. The goal is to identify coordinates in x with large weight (in absolute value) while using limited memory.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
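To make the update model in the abstract above concrete, the following is a rough sketch of a Count-Min sketch processing update(i, Δ) operations with positive and negative Δ. It illustrates one of the baselines the abstract names, not the EXPANDERSKETCH itself. The class name, the hash family, and the parameters `width`, `depth`, and `seed` are assumptions made for illustration, and the minimum-based point query is only an upper bound when every true frequency stays non-negative (the strict turnstile setting); integer item identifiers are assumed.

```python
import random
from typing import List, Tuple


class CountMinSketch:
    """Count-Min sketch over integer item ids, supporting signed updates."""

    PRIME = (1 << 61) - 1  # large prime modulus for the hash family

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width = width
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]
        # One (a, b) pair per row defines the hash i -> ((a*i + b) mod p) mod width.
        self.hashes: List[Tuple[int, int]] = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]

    def _bucket(self, row: int, i: int) -> int:
        a, b = self.hashes[row]
        return ((a * i + b) % self.PRIME) % self.width

    def update(self, i: int, delta: int) -> None:
        # Applies x_i <- x_i + delta to one bucket in every row.
        for row in range(len(self.rows)):
            self.rows[row][self._bucket(row, i)] += delta

    def query(self, i: int) -> int:
        # Minimum over rows; never below the true x_i when all x_j >= 0.
        return min(
            self.rows[row][self._bucket(row, i)] for row in range(len(self.rows))
        )


if __name__ == "__main__":
    cms = CountMinSketch(width=64, depth=4)
    cms.update(7, 5)
    cms.update(7, -2)  # a deletion, allowed in the turnstile model
    cms.update(3, 4)
    print(cms.query(7), cms.query(3))
```

Finding the heavy coordinates with this structure alone requires querying every candidate item, which is the query cost the abstract contrasts against.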
Bayesball: Bayesian analysis of batting average – Towards Data Science
One topic in data science and statistics that I find interesting, but have difficulty understanding, is Bayesian analysis. During my General Assembly Data Science Immersive boot camp, I had a chance to explore Bayesian statistics, but I really think I need some review and reinforcement. This is my personal endeavour to better understand Bayesian thinking and how it can be applied to real-life cases. For this post, I am mainly inspired by a YouTube series by Rasmus Bååth, "Introduction to Bayesian data analysis". He is really good at giving you an intuitive understanding of Bayesian analysis, not by bombarding you with complicated formulas, but by walking you through the thought process behind Bayesian statistics. The topic I chose for this post is baseball.