bbe
ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets
Amiriparian, Shahin, Packań, Filip, Gerczuk, Maurice, Schuller, Björn W.
Foundation models have shown great promise in speech emotion recognition (SER) by leveraging their pre-trained representations to capture emotion patterns in speech signals. To further enhance SER performance across various languages and domains, we propose a novel twofold approach. First, we gather EmoSet++, a comprehensive multi-lingual, multi-cultural speech emotion corpus with 37 datasets, 150,907 samples, and a total duration of 119.5 hours. Second, we introduce ExHuBERT, an enhanced version of HuBERT achieved by backbone extension and fine-tuning on EmoSet++. We duplicate each encoder layer and its weights, then freeze the first duplicate, integrating an extra zero-initialized linear layer and skip connections to preserve functionality and ensure its adaptability for subsequent fine-tuning. Our evaluation on unseen datasets shows the efficacy of ExHuBERT, setting a new benchmark for various SER tasks. Model and details on EmoSet++: https://huggingface.co/amiriparian/ExHuBERT.
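The block-extension idea in this abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `TinyLayer` is a hypothetical stand-in for an encoder layer, and the exact wiring (frozen copy, then trainable copy, then a zero-initialized linear map added back through a skip connection) is one plausible reading of the abstract, not the authors' implementation. The sketch only shows why zero initialization leaves the extended stack's output identical to the original before fine-tuning.

```python
import copy
import math
import random


class TinyLayer:
    """Hypothetical stand-in for one encoder layer: y = tanh(W x)."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
        self.trainable = True

    def __call__(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.weight]


class ExtendedBlock:
    """Frozen duplicate followed by a trainable duplicate behind a zero-initialized gate."""

    def __init__(self, layer):
        dim = len(layer.weight)
        self.frozen = copy.deepcopy(layer)   # first duplicate: weights kept fixed
        self.frozen.trainable = False
        self.copy = copy.deepcopy(layer)     # second duplicate: left trainable
        # Zero-initialized linear map (assumed placement: on the trainable branch).
        self.gate = [[0.0] * dim for _ in range(dim)]

    def __call__(self, x):
        h = self.frozen(x)
        branch = self.copy(h)
        gated = [sum(g * b for g, b in zip(row, branch)) for row in self.gate]
        # Skip connection: with an all-zero gate the block returns h unchanged.
        return [hi + gi for hi, gi in zip(h, gated)]


def extend(layers):
    """Wrap every layer of a stack in an ExtendedBlock."""
    return [ExtendedBlock(layer) for layer in layers]


def run(stack, x):
    """Apply the layers of a stack in order."""
    for layer in stack:
        x = layer(x)
    return x


rng = random.Random(0)
original = [TinyLayer(4, rng) for _ in range(3)]
extended = extend(original)
x = [0.1, -0.2, 0.3, 0.4]
print(run(original, x))
print(run(extended, x))
```

Under these assumptions the two printed vectors match, because the gate contributes nothing until fine-tuning moves its weights away from zero.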
XGBoost Learning of Dynamic Wager Placement for In-Play Betting on an Agent-Based Model of a Sports Betting Exchange
We present first results from the use of XGBoost, a highly effective machine learning (ML) method, within the Bristol Betting Exchange (BBE), an open-source agent-based model (ABM) designed to simulate a contemporary sports-betting exchange with in-play betting during track-racing events such as horse races. We use the BBE ABM and its array of minimally-simple bettor-agents as a synthetic data generator which feeds into our XGBoost ML system, with the intention that XGBoost discovers profitable dynamic betting strategies by learning from the more profitable bets made by the BBE bettor-agents. After this XGBoost training, which results in one or more decision trees, a bettor-agent with a betting strategy determined by the XGBoost-learned decision tree(s) is added to the BBE ABM and made to bet on a sequence of races under various conditions and betting-market scenarios, with profitability serving as the primary metric of comparison and evaluation. Our initial findings presented here show that XGBoost trained in this way can indeed learn profitable betting strategies, and can generalise to learn strategies that outperform each of the set of strategies used for creation of the training data. To foster further research and enhancements, the complete version of our extended BBE, including the XGBoost integration, has been made freely available as an open-source release on GitHub.
Mixture Proportion Estimation and PU Learning: A Modern Approach
Garg, Saurabh, Wu, Yifan, Smola, Alex, Balakrishnan, Sivaraman, Lipton, Zachary C.
Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE) (for MPE); and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples. Our final algorithm, (TED)$^n$, alternates between the two procedures, significantly improving both our mixture proportion estimator and classifier.
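The mixture-proportion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the function name `best_bin_estimate`, the parameters `delta` and `gamma`, and the particular slack term are hypothetical choices. The idea shown is to take classifier scores on held-out positive and unlabeled examples, look for a score threshold above which the unlabeled data is (nearly) purely positive, and read off the ratio of the two tail fractions as the estimate of the positive proportion.

```python
import math


def best_bin_estimate(pos_scores, unl_scores, delta=0.1, gamma=0.01):
    """Estimate the fraction of positives among the unlabeled scores.

    For each candidate threshold c, compare the fraction of unlabeled scores
    at or above c with the fraction of positive scores at or above c. The
    threshold minimizing ratio + slack / q_p is kept (slack form is assumed).
    """
    n_p, n_u = len(pos_scores), len(unl_scores)
    slack = (math.sqrt(math.log(4 / delta) / (2 * n_u))
             + math.sqrt(math.log(4 / delta) / (2 * n_p)))
    best = None
    for c in sorted(set(pos_scores) | set(unl_scores)):
        q_p = sum(s >= c for s in pos_scores) / n_p
        q_u = sum(s >= c for s in unl_scores) / n_u
        if q_p == 0:
            continue  # no positives left in this top bin
        bound = q_u / q_p + (1 + gamma) * slack / q_p
        if best is None or bound < best[0]:
            best = (bound, c, q_u / q_p)
    _, threshold, alpha = best
    return min(alpha, 1.0), threshold


# Synthetic scores (illustrative only): positives score in [0.5, 1.0],
# negatives in [0.0, 0.6], and 30% of the unlabeled set is positive.
pos = [0.5 + 0.5 * i / 999 for i in range(1000)]
unl = ([0.5 + 0.5 * i / 299 for i in range(300)]
       + [0.6 * i / 699 for i in range(700)])
alpha_hat, c_hat = best_bin_estimate(pos, unl)
print(alpha_hat, c_hat)
```

In this synthetic setup the top bin above a score of roughly 0.6 contains no negatives, which is the "cleanly separated subset of positives" condition the abstract mentions; the estimate should therefore land close to the true proportion of 0.3.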
BBE: Simulating the Microstructural Dynamics of an In-Play Betting Exchange via Agent-Based Modelling
I describe the rationale for, and design of, an agent-based simulation model of a contemporary online sports-betting exchange: such exchanges, closely related to the exchange mechanisms at the heart of major financial markets, have revolutionized the gambling industry in the past 20 years, but gathering sufficiently large quantities of rich and temporally high-resolution data from real exchanges - i.e., the sort of data that is needed in large quantities for Deep Learning - is often very expensive, and sometimes simply impossible; this creates a need for a plausibly realistic synthetic data generator, which is what this simulation now provides. The simulator, named the "Bristol Betting Exchange" (BBE), is intended as a common platform, a data-source and experimental test-bed, for researchers studying the application of AI and machine learning (ML) techniques to issues arising in betting exchanges; and, as far as I have been able to determine, BBE is the first of its kind: a free open-source agent-based simulation model consisting not only of a sports-betting exchange, but also a minimal simulation model of racetrack sporting events (e.g., horse-races or car-races) about which bets may be made, and a population of simulated bettors who each form their own private evaluation of odds and place bets on the exchange before and - crucially - during the race itself (i.e., so-called "in-play" betting) and whose betting opinions change second-by-second as each race event unfolds. BBE is offered as a proof-of-concept system that enables the generation of large high-resolution data-sets for automated discovery or improvement of profitable strategies for betting on sporting events via the application of AI/ML and advanced data analytics techniques. This paper offers an extensive survey of relevant literature and explains the motivation and design of BBE, and presents brief illustrative results.