remix
Uirapuru: Timely Video Analytics for High-Resolution Steerable Cameras on Edge Devices
Apostolo, Guilherme H., Bauszat, Pablo, Nigade, Vinod, Bal, Henri E., Wang, Lin
Real-time video analytics on high-resolution cameras has become a popular technology for various intelligent services like traffic control and crowd monitoring. While extensive work has been done on improving analytics accuracy with timing guarantees, virtually all of them target static viewpoint cameras. In this paper, we present Uirapuru, a novel framework for real-time, edge-based video analytics on high-resolution steerable cameras. The actuation performed by those cameras brings significant dynamism to the scene, presenting a critical challenge to existing popular approaches such as frame tiling. To address this problem, Uirapuru incorporates a comprehensive understanding of camera actuation into the system design paired with fast adaptive tiling at a per-frame level. We evaluate Uirapuru on a high-resolution video dataset, augmented by pan-tilt-zoom (PTZ) movements typical for steerable cameras and on real-world videos collected from an actual PTZ camera. Our experimental results show that Uirapuru provides up to 1.45x improvement in accuracy while respecting specified latency budgets or reaches up to 4.53x inference speedup with on-par accuracy compared to state-of-the-art static camera approaches.
- Asia > China > Hong Kong (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Continual Memorization of Factoids in Large Language Models
Chen, Howard, Geng, Jiayi, Bhaskar, Adithya, Friedman, Dan, Chen, Danqi
Large language models can absorb a massive amount of knowledge through pretraining, but pretraining is inefficient for acquiring long-tailed or specialized facts. Therefore, fine-tuning on specialized or new knowledge that reflects changes in the world has become popular, though it risks disrupting the model's original capabilities. We study this fragility in the context of continual memorization, where the model is trained on a small set of long-tail factoids (factual associations) and must retain these factoids after multiple stages of subsequent training on other datasets. Through extensive experiments, we show that LLMs suffer from forgetting across a wide range of subsequent tasks, and simple replay techniques do not fully prevent forgetting, especially when the factoid datasets are trained in the later stages. We posit that there are two ways to alleviate forgetting: 1) protect the memorization process as the model learns the factoids, or 2) reduce interference from training in later stages. With this insight, we develop an effective mitigation strategy: REMIX (Random and Generic Data Mixing). REMIX prevents forgetting by mixing generic data sampled from pretraining corpora or even randomly generated word sequences during each stage, despite being unrelated to the memorized factoids in the first stage. REMIX can recover performance from severe forgetting, often outperforming replay-based methods that have access to the factoids from the first stage. We then analyze how REMIX alters the learning process and find that successful forgetting prevention is associated with a pattern: the model stores factoids in earlier layers than usual and diversifies the set of layers that store these factoids. The efficacy of REMIX invites further investigation into the underlying dynamics of memorization and forgetting, opening exciting possibilities for future research.
- North America > United States > Kansas (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
ReMix: Training Generalized Person Re-identification on a Mixture of Data
Mamedov, Timur, Konushin, Anton, Konushin, Vadim
Modern person re-identification (Re-ID) methods have a weak generalization ability and experience a major accuracy drop when capturing environments change. This is because existing multi-camera Re-ID datasets are limited in size and diversity, since such data is difficult to obtain. At the same time, enormous volumes of unlabeled single-camera records are available. Such data can be easily collected, and therefore, it is more diverse. Currently, single-camera data is used only for self-supervised pre-training of Re-ID methods. However, the diversity of single-camera data is suppressed by fine-tuning on limited multi-camera data after pre-training. In this paper, we propose ReMix, a generalized Re-ID method jointly trained on a mixture of limited labeled multi-camera and large unlabeled single-camera data. Effective training of our method is achieved through a novel data sampling strategy and new loss functions that are adapted for joint use with both types of data. Experiments show that ReMix has a high generalization ability and outperforms state-of-the-art methods in generalizable person Re-ID. To the best of our knowledge, this is the first work that explores joint training on a mixture of multi-camera and single-camera data in person Re-ID.
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
The first Cadenza challenges: using machine learning competitions to improve music for listeners with a hearing loss
Dabike, Gerardo Roa, Akeroyd, Michael A., Bannister, Scott, Barker, Jon P., Cox, Trevor J., Fazenda, Bruno, Firth, Jennifer, Graetzer, Simone, Greasley, Alinka, Vos, Rebecca R., Whitmer, William M.
It is well established that listening to music is an issue for those with hearing loss, and hearing aids are not a universal solution. How can machine learning be used to address this? This paper details the first application of the open challenge methodology to use machine learning to improve audio quality of music for those with hearing loss. The first challenge was a stand-alone competition (CAD1) and had 9 entrants. The second was an 2024 ICASSP grand challenge (ICASSP24) and attracted 17 entrants. The challenge tasks concerned demixing and remixing pop/rock music to allow a personalised rebalancing of the instruments in the mix, along with amplification to correct for raised hearing thresholds. The software baselines provided for entrants to build upon used two state-of-the-art demix algorithms: Hybrid Demucs and Open-Unmix. Evaluation of systems was done using the objective metric HAAQI, the Hearing-Aid Audio Quality Index. No entrants improved on the best baseline in CAD1 because there was insufficient room for improvement. Consequently, for ICASSP24 the scenario was made more difficult by using loudspeaker reproduction and specified gains to be applied before remixing. This also made the scenario more useful for listening through hearing aids. 9 entrants scored better than the the best ICASSP24 baseline. Most entrants used a refined version of Hybrid Demucs and NAL-R amplification. The highest scoring system combined the outputs of several demixing algorithms in an ensemble approach. These challenges are now open benchmarks for future research with the software and data being freely available.
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- North America > United States > Rhode Island (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (8 more...)
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.68)
No, Drake's Cover of 'Hey There Delilah' Isn't AI
As if he didn't have enough to deal with amid his beef with Kendrick Lamar (or perhaps to distract from it), Drake showed up on a remix of parody rapper Snowd4y's cover of Plain White T's "Hey There Delilah," called "Wah Gwan Delilah," that has everyone … perplexed? Let's walk through this together, it's a mess. It had what appeared to be Drake joining the comedian in a series of quips about women and name-checks of Toronto landmarks like the Yonge-Dundas Square mall. As the track spread, it made its way to the Plain White T's themselves, who posted a video on X and TikTok with the caption "too stunned to speak." Frontman Tom Higgenson also says "it's crazy that everybody thinks that it's real," seemingly referencing early rumors that Drake's lyrics were generated using artificial intelligence.
- Media > Music (0.74)
- Leisure & Entertainment (0.74)
Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge
Shao, Keren, Chen, Ke, Dubnov, Shlomo
In this challenge, we disentangle the deep filters from the original DeepfilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs) based remixing pipeline. The motivation behind the use of the deep filter component lies at its potential in better handling temporal fine structures. We demonstrate an incremental improvement in both the Signal-to-Distortion Ratio (SDR) and the Hearing Aid Audio Quality Index (HAAQI) metrics when comparing the performance of hdemucs against different versions of our model.
From SMOTE to Mixup for Deep Imbalanced Classification
Cheng, Wei-Chao, Mai, Tan-Ha, Lin, Hsuan-Tien
Given imbalanced data, it is hard to train a good classifier using deep learning because of the poor generalization of minority classes. Traditionally, the well-known synthetic minority oversampling technique (SMOTE) for data augmentation, a data mining approach for imbalanced learning, has been used to improve this generalization. However, it is unclear whether SMOTE also benefits deep learning. In this work, we study why the original SMOTE is insufficient for deep learning, and enhance SMOTE using soft labels. Connecting the resulting soft SMOTE with Mixup, a modern data augmentation technique, leads to a unified framework that puts traditional and modern data augmentation techniques under the same umbrella. A careful study within this framework shows that Mixup improves generalization by implicitly achieving uneven margins between majority and minority classes. We then propose a novel margin-aware Mixup technique that more explicitly achieves uneven margins. Extensive experimental results demonstrate that our proposed technique yields state-of-the-art performance on deep imbalanced classification while achieving superior performance on extremely imbalanced data. The code is open-sourced in our developed package https://github.com/ntucllab/imbalanced-DL to foster future research in this direction.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Taiwan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France (0.04)
Mo\^usai: Text-to-Music Generation with Long-Context Latent Diffusion
Schneider, Flavio, Kamal, Ojasv, Jin, Zhijing, Schölkopf, Bernhard
Recent years have seen the rapid development of large generative models for text; however, much less research has explored the connection between text and another "language" of communication -- music. Music, much like text, can convey emotions, stories, and ideas, and has its own unique structure and syntax. In our work, we bridge text and music via a text-to-music generation model that is highly efficient, expressive, and can handle long-term structure. Specifically, we develop Mo\^usai, a cascading two-stage latent diffusion model that can generate multiple minutes of high-quality stereo music at 48kHz from textual descriptions. Moreover, our model features high efficiency, which enables real-time inference on a single consumer GPU with a reasonable speed. Through experiments and property analyses, we show our model's competence over a variety of criteria compared with existing music generation models. Lastly, to promote the open-source culture, we provide a collection of open-source libraries with the hope of facilitating future work in the field. We open-source the following: Codes: https://github.com/archinetai/audio-diffusion-pytorch; music samples for this paper: http://bit.ly/44ozWDH; all music samples for all models: https://bit.ly/audio-diffusion.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Austria (0.04)
- (13 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Semi-supervised Relation Extraction via Data Augmentation and Consistency-training
Due to the semantic complexity of the Relation extraction (RE) task, obtaining high-quality human labelled data is an expensive and noisy process. To improve the sample efficiency of the models, semi-supervised learning (SSL) methods aim to leverage unlabelled data in addition to learning from limited labelled data points. Recently, strong data augmentation combined with consistency-based semi-supervised learning methods have advanced the state of the art in several SSL tasks. However, adapting these methods to the RE task has been challenging due to the difficulty of data augmentation for RE. In this work, we leverage the recent advances in controlled text generation to perform high quality data augmentation for the RE task. We further introduce small but significant changes to model architecture that allows for generation of more training data by interpolating different data points in their latent space. These data augmentations along with consistency training result in very competitive results for semi-supervised relation extraction on four benchmark datasets.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (6 more...)
Towards Understanding How Data Augmentation Works with Imbalanced Data
Dablain, Damien A., Chawla, Nitesh V.
Data augmentation forms the cornerstone of many modern machine learning training pipelines; yet, the mechanisms by which it works are not clearly understood. Much of the research on data augmentation (DA) has focused on improving existing techniques, examining its regularization effects in the context of neural network over-fitting, or investigating its impact on features. Here, we undertake a holistic examination of the effect of DA on three different classifiers, convolutional neural networks, support vector machines, and logistic regression models, which are commonly used in supervised classification of imbalanced data. We support our examination with testing on three image and five tabular datasets. Our research indicates that DA, when applied to imbalanced data, produces substantial changes in model weights, support vectors and feature selection; even though it may only yield relatively modest changes to global metrics, such as balanced accuracy or F1 measure. We hypothesize that DA works by facilitating variances in data, so that machine learning models can associate changes in the data with labels. By diversifying the range of feature amplitudes that a model must recognize to predict a label, DA improves a model's capacity to generalize when learning with imbalanced data.
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)