Overview
Machine Learning Techniques in Automatic Music Transcription: A Systematic Survey
Jamshidi, Fatemeh, Pike, Gary, Das, Amit, Chapman, Richard
ABSTRACT In the domain of Music Information Retrieval (MIR), Automatic Music Transcription (AMT) emerges as a central challenge, aiming to convert audio signals into symbolic representations. This review critically evaluates both fully automatic and semi-automatic AMT systems, emphasizing the importance of minimal user intervention and examining various methodologies proposed to date. By addressing the limitations of prior techniques and suggesting avenues for improvement, our objective is to steer future research towards fully automated AMT systems capable of accurately and efficiently translating intricate audio signals into precise symbolic representations. This review not only synthesizes the latest advancements but also lays out a road-map for overcoming existing challenges in AMT, providing valuable insights for researchers aiming to narrow [...].
Loudness estimation and quantization, instrument recognition, extraction of rhythmic information, time quantization, extraction of velocity and dynamic [...]. Figure 1 (represented in [7]) illustrates the data representations in an AMT system. An AMT system takes an audio waveform as input, computes a time-frequency representation of the audio, outputs a representation of pitches over time in a spectrogram, and generates a typeset music [...]. Previous studies have tackled AMT using two main approaches: Nonnegative Matrix Factorization (NMF) [8] and Neural Networks (NNs) [9] [2].
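As a rough illustration of the NMF approach mentioned above, the sketch below factorizes a toy magnitude "spectrogram" V into note templates W and per-frame activations H using the standard multiplicative updates for the squared Frobenius error. It is not taken from the survey: the toy data, the function names (`nmf`, `frobenius_error`), and the iteration count are assumptions chosen for illustration, and real AMT systems work on far larger spectrograms.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius distance between V and the product W @ H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF: V (bins x frames) ~ W (bins x rank) @ H (rank x frames)."""
    rng = random.Random(seed)
    n_bins, n_frames = len(V), len(V[0])
    # Strictly positive random initialization keeps every factor non-negative.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_bins)]
    H = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(iters):
        # Update activations: H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # Update templates: W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

# Toy data (hypothetical): 4 frequency bins x 6 frames built from two "note" templates.
templates = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.5]]
activations = [[1.0, 1.0, 0.0, 0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]]
V = matmul(templates, activations)

W, H = nmf(V, rank=2)
print("reconstruction error:", frobenius_error(V, W, H))
```

In a transcription setting, the columns of W would play the role of spectral templates for individual pitches and the rows of H would indicate when each template is active; thresholding H is one simple way to obtain note events.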
Few-shot Knowledge Graph Relational Reasoning via Subgraph Adaptation
Liu, Haochen, Wang, Song, Chen, Chen, Li, Jundong
Few-shot Knowledge Graph (KG) Relational Reasoning aims to predict unseen triplets (i.e., query triplets) for rare relations in KGs, given only several triplets of these relations as references (i.e., support triplets). This task has gained significant traction due to the widespread use of knowledge graphs in various natural language processing applications. Previous approaches have utilized meta-training methods and manually constructed meta-relation sets to tackle this task. Recent efforts have focused on edge-mask-based methods, which exploit the structure of the contextualized graphs of target triplets (i.e., a subgraph containing relevant triplets in the KG). However, existing edge-mask-based methods are limited in that they extract insufficient information from the KG and are highly influenced by spurious information in the KG. To overcome these challenges, we propose SAFER (Subgraph Adaptation for Few-shot Relational Reasoning), a novel approach that effectively adapts the information in contextualized graphs to various subgraphs generated from support and query triplets to perform the prediction. Specifically, SAFER enables the extraction of more comprehensive information from support triplets while minimizing the impact of spurious information when predicting query triplets. Experimental results on three prevalent datasets demonstrate the superiority of our proposed framework SAFER.
ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems
Jia, Pengyue, Wang, Yejing, Du, Zhaocheng, Zhao, Xiangyu, Wang, Yichao, Chen, Bo, Wang, Wanyu, Guo, Huifeng, Tang, Ruiming
Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing the accuracy and optimizing storage efficiencies to align with the deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core challenges. Firstly, variant experimental setups across research papers often yield unfair comparisons, obscuring practical insights. Secondly, the existing literature's lack of detailed analysis on selection attributes, based on large-scale datasets and a thorough comparison among selection techniques and DRS backbones, restricts the generalizability of findings and impedes deployment on DRS. Lastly, research often focuses on comparing the peak performance achievable by feature selection methods, an approach that is typically computationally infeasible for identifying the optimal hyperparameters and overlooks evaluating the robustness and stability of these methods. To bridge these gaps, this paper presents ERASE, a comprehensive bEnchmaRk for feAture SElection for DRS. ERASE comprises a thorough evaluation of eleven feature selection methods, covering both traditional and deep learning approaches, across four public datasets, private industrial datasets, and a real-world commercial platform, achieving significant enhancement. Our code is available online for ease of reproduction.
Learning with 3D rotations, a hitchhiker's guide to SO(3)
Geist, A. René, Frey, Jonas, Zobro, Mikel, Levina, Anna, Martius, Georg
Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.
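One well-known property of this kind is that unit quaternions cover SO(3) twice: q and -q describe the same rotation, so a network regressing quaternions faces two equally valid targets for every rotation. The sketch below illustrates this with the standard quaternion-to-matrix formula. It is an illustration chosen here, not an example taken from the paper, and the function name `quat_to_matrix` and the (w, x, y, z) ordering are assumptions.

```python
import math

def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix (list of rows)."""
    w, x, y, z = q
    # Normalize so that slightly non-unit inputs still give a valid rotation.
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

# A 90-degree rotation about the z-axis, and its negated quaternion.
q = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
neg_q = tuple(-c for c in q)

R_pos = quat_to_matrix(q)
R_neg = quat_to_matrix(neg_q)

# Every matrix entry is quadratic in q, so the sign flip cancels out.
max_diff = max(abs(a - b) for ra, rb in zip(R_pos, R_neg) for a, b in zip(ra, rb))
print("largest entry difference between R(q) and R(-q):", max_diff)
```

Because the two quaternions are far apart in parameter space yet identical as rotations, a naive squared-error loss on quaternion components can penalize a correct prediction, which is one reason the choice of representation matters when rotations are the model's output.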
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
Mahaut, Matéo, Aina, Laura, Czarnowska, Paula, Hardalov, Momchil, Müller, Thomas, Màrquez, Lluís
Large Language Models (LLMs) tend to be unreliable in the factuality of their answers. To address this problem, NLP researchers have proposed a range of techniques to estimate LLMs' confidence over facts. However, due to the lack of a systematic comparison, it is not clear how the different methods compare to one another. To fill this gap, we present a survey and empirical comparison of estimators of factual confidence. We define an experimental framework allowing for fair comparison, covering both fact-verification and question answering. Our experiments across a series of LLMs indicate that trained hidden-state probes provide the most reliable confidence estimates, albeit at the expense of requiring access to weights and training data. We also conduct a deeper assessment of factual confidence by measuring the consistency of model behavior under meaning-preserving variations in the input. We find that the confidence of LLMs is often unstable across semantically equivalent inputs, suggesting that there is much room for improvement of the stability of models' parametric knowledge. Our code is available at (https://github.com/amazon-science/factual-confidence-of-llms).
Bridging the Gap in Drug Safety Data Analysis: Large Language Models for SQL Query Generation
Painter, Jeffery L., Chalamalasetti, Venkateswara Rao, Kassekert, Raymond, Bate, Andrew
Pharmacovigilance (PV) is essential for drug safety, primarily focusing on adverse event monitoring. Traditionally, accessing safety data required database expertise, limiting broader use. This paper introduces a novel application of Large Language Models (LLMs) to democratize database access for non-technical users. Utilizing OpenAI's GPT-4, we developed a chatbot that generates structured query language (SQL) queries from natural language, bridging the gap between domain knowledge and technical requirements. The proposed application aims for more inclusive and efficient data access, enhancing decision making in drug safety. By providing LLMs with plain language summaries of expert knowledge, our approach significantly improves query accuracy over methods relying solely on database schemas. The application of LLMs in this context not only optimizes PV data analysis, ensuring timely and precise drug safety reporting -- a crucial component in adverse drug reaction monitoring -- but also promotes safer pharmacological practices and informed decision making across various data intensive fields.
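The contrast between schema-only prompting and prompting with an added plain-language summary can be sketched as simple prompt assembly. Everything in the block below is hypothetical and for illustration only: the function `build_prompt`, the table and column names, and the wording of the domain notes are assumptions, not the authors' prompts or schema, and no LLM API is called.

```python
def build_prompt(question, schema, expert_summary=None):
    """Assemble a text-to-SQL prompt; the expert summary is optional extra context."""
    parts = [
        "Translate the question into a single SQL query.",
        "Database schema:\n" + schema,
    ]
    if expert_summary:
        # Plain-language domain knowledge that the schema alone does not convey.
        parts.append("Domain notes:\n" + expert_summary)
    parts.append("Question: " + question)
    parts.append("SQL:")
    return "\n\n".join(parts)

# Hypothetical schema and notes, invented for this sketch.
schema = (
    "CREATE TABLE adverse_events ("
    "case_id INTEGER, drug_name TEXT, event_term TEXT, report_date DATE);"
)
expert_summary = (
    "Each row is one reported event; a single case_id can appear on several rows. "
    "Count distinct case_id values when asked for the number of cases."
)
question = "How many cases reported headache for drug X in 2023?"

baseline_prompt = build_prompt(question, schema)
enriched_prompt = build_prompt(question, schema, expert_summary)
print(enriched_prompt)
```

The enriched prompt differs from the baseline only in the "Domain notes" section, which is where guidance such as counting distinct cases rather than rows would be supplied to the model.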
DRACO: Decentralized Asynchronous Federated Learning over Continuous Row-Stochastic Network Matrices
Jeong, Eunjeong, Kountouris, Marios
Recent advancements in machine learning, networked intelligent systems, and wireless connectivity have paved the way for innovative applications and use cases across many sectors, including the Internet of Things (IoT), consumer robotics, autonomous transportation, and edge computing. These systems increasingly rely on decentralized learning architectures to process data where it is generated, minimizing latency and bandwidth usage while enhancing privacy. However, these benefits come with significant challenges, particularly in terms of ensuring efficient and reliable communication and processing within inherently unstable and diverse network environments. Addressing these challenges requires novel approaches that adapt to the unique demands of decentralized architectures, fostering robust and expandable solutions for real-time data processing and learning. In this work, we consider the problem of communication efficiency in federated learning (FL) [1], and in particular in serverless (fully decentralized) learning settings that operate without a central coordinating server [2-6]. Asynchronous learning, which lets each participant conduct local training and data transmission at its own pace, is a standard and relevant design choice in decentralized network schemes [7-12]. Asynchronous and decentralized learning each offer advantages when used separately, namely adaptability to limited resources and reduced communication overhead. Yet unfortunately, when these two paradigms are combined, their integration poses a greater challenge in achieving a unanimous global consensus, as required for instance in the development of sophisticated navigation algorithms [13]. Decentralized optimization studies in the literature often involve high "synchronization costs" due to the complexity of ensuring consensus.
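To make the notion of consensus over a row-stochastic network matrix concrete, the following sketch repeatedly mixes node values with a fixed matrix whose rows sum to one. It is a generic, synchronous toy illustration and not the DRACO algorithm: the matrix `W`, the initial values, and the iteration count are assumptions chosen here.

```python
def mix(W, x):
    """One mixing step: each node replaces its value by a weighted average of neighbors."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

# Row-stochastic (each row sums to 1) but not column-stochastic.
W = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
x0 = [1.0, 2.0, 9.0]  # hypothetical initial local values (e.g. one model parameter per node)

x = list(x0)
for _ in range(200):
    x = mix(W, x)

spread = max(x) - min(x)
plain_mean = sum(x0) / len(x0)
print("values after mixing:", x)
print("spread across nodes:", spread)
print("plain mean of initial values:", plain_mean)
```

All nodes end up agreeing, but on a weighted average of the initial values (weights 0.25, 0.5, 0.25 for this matrix, the left eigenvector of W for eigenvalue 1) rather than on the plain mean. This gap between "reaching consensus" and "reaching the average" is one reason row-stochastic schemes need extra care.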
Media Forensics and Deepfake Systematic Survey
CH, Nadeem Jabbar, Saghir, Aqib, Meer, Ayaz Ahmad, Sahi, Salman Ahmad, Hassan, Bilal, Yasir, Siddiqui Muhammad
Deepfake is a generative deep learning algorithm that creates or changes facial features in a very realistic way, making it hard to differentiate real features from fake ones. It can be used to make movies look better as well as to spread false information by imitating famous people. In this paper, many different ways to make a Deepfake are explained, analyzed, and categorized. Using Deepfake datasets, models are trained and tested for reliability through experiments. Deepfakes are a type of facial manipulation that allow people to change their entire faces, identities, attributes, and expressions. The trends in the available Deepfake datasets are also discussed, with a focus on how they have changed. Using deep learning, a general Deepfake detection model is made. Moreover, the problems in making and detecting Deepfakes are also mentioned. As a result of this survey, it is expected that the development of new Deepfake-based imaging tools will speed up in the future. This survey gives an in-depth review of methods for manipulating face images and of various techniques to spot altered face images. Four types of facial manipulation are specifically discussed: attribute manipulation, expression swap, entire face synthesis, and identity swap. For every manipulation category, we provide information on manipulation techniques, significant benchmarks for the technical evaluation of counterfeit detection techniques, available public databases, and a summary of the outcomes of all such analyses. From all of the topics in the survey, we focus on the most recent development of Deepfake, showing its advances and the obstacles in detecting fake images.
Deep Learning-Based 3D Instance and Semantic Segmentation: A Review
Yasir, Siddiqui Muhammad, Ahn, Hyunsik
The process of segmenting point cloud data into several homogeneous regions, with points in the same region sharing the same attributes, is known as 3D segmentation. Segmentation is challenging with point cloud data due to substantial redundancy, fluctuating sample density, and a lack of apparent organization. The research area has a wide range of robotics applications, including intelligent vehicles, autonomous mapping, and navigation. A number of researchers have introduced various methodologies and algorithms. Deep learning has been successfully applied to a spectrum of 2D vision domains as a prevailing A.I. method. However, due to the specific problems of processing point clouds with deep neural networks, deep learning on point clouds is still in its initial stages. This study examines many strategies that have been proposed for 3D instance and semantic segmentation and gives a complete assessment of current developments in deep learning-based 3D segmentation. The benefits, drawbacks, and design mechanisms of these approaches are studied and addressed. This study compares the performance of various segmentation algorithms on several publicly accessible datasets, and covers the most commonly used pipelines, their advantages and limitations, insightful findings, and promising future research directions.
Recent advances in text embedding: A Comprehensive Review of Top-Performing Methods on the MTEB Benchmark
Text embedding methods have become increasingly popular in both industrial and academic fields due to their critical role in a variety of natural language processing tasks. The significance of universal text embeddings has been further highlighted with the rise of Large Language Model (LLM) applications such as Retrieval-Augmented Systems (RAGs). While previous models have attempted to be general-purpose, they often struggle to generalize across tasks and domains. However, recent advancements in training data quantity, quality, and diversity, in synthetic data generation from LLMs, and in the use of LLMs as backbones have led to great improvements in the pursuit of universal text embeddings. In this paper, we provide an overview of the recent advances in universal text embedding models, with a focus on the top-performing text embeddings on the Massive Text Embedding Benchmark (MTEB). Through detailed comparison and analysis, we highlight the key contributions and limitations in this area, and propose potentially inspiring future research directions.