time axis
… about real-world experiments and deep density models, and then answer detailed comments and questions.
We thank the reviewers for their very helpful comments and suggestions. The 100% recall but very poor precision (i.e., it always predicts a shift) is expected. Marginal-KS performs very badly because the attack model is very strong, i.e., it mimics the marginal distribution of the data; marginal KS will therefore naturally fail, highlighting the limitation of prior work under this adversarial attack. For bootstrapping, does the model need to be fit multiple times? For a Gaussian, this is fairly simple. For the detection stage, the FDR was controlled below 0.05 in all settings. See also Tables 6 and 7 in the appendix.
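As context for the Marginal-KS discussion above, here is a minimal sketch of a per-feature two-sample KS shift detector with Benjamini-Hochberg FDR control at 0.05. All names are illustrative and this is not the paper's exact procedure; the toy usage shows why an attack that matches the marginals defeats such a test.

```python
import numpy as np
from scipy.stats import ks_2samp

def marginal_ks_shift_test(X_ref, X_test, alpha=0.05):
    """Two-sample KS test per feature with Benjamini-Hochberg FDR control.

    Returns True if any feature is flagged as shifted after correction.
    X_ref, X_test: arrays of shape (n_samples, n_features).
    """
    pvals = np.array([ks_2samp(X_ref[:, j], X_test[:, j]).pvalue
                      for j in range(X_ref.shape[1])])
    m = len(pvals)
    # Benjamini-Hochberg: reject if any sorted p_(k) <= (k/m) * alpha.
    thresholds = alpha * np.arange(1, m + 1) / m
    return bool((np.sort(pvals) <= thresholds).any())

# Toy usage: an attack that preserves marginals but breaks the dependence
# structure is invisible to marginal KS.
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(500, 2))
X_adv = np.column_stack([X_ref[:, 0], rng.permutation(X_ref[:, 1])])
print(marginal_ks_shift_test(X_ref, X_adv))  # False: the shift goes undetected
```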
ROSAnnotator: A Web Application for ROSBag Data Analysis in Human-Robot Interaction
Zhang, Yan, Li, Haoqi, Tabatabaei, Ramtin, Johal, Wafa
Human-robot interaction (HRI) is an interdisciplinary field that utilises both quantitative and qualitative methods. While ROSBags, a file format within the Robot Operating System (ROS), offer an efficient means of collecting temporally synchronized multimodal data in empirical studies with real robots, there is a lack of tools specifically designed to integrate qualitative coding and analysis functions with ROSBags. To address this gap, we developed ROSAnnotator, a web-based application that incorporates a multimodal Large Language Model (LLM) to support both manual and automated annotation of ROSBag data. ROSAnnotator currently facilitates video, audio, and transcription annotations and provides an open interface for custom ROS messages and tools. By using ROSAnnotator, researchers can streamline the qualitative analysis process, create a more cohesive analysis pipeline, and quickly access statistical summaries of annotations, thereby enhancing the overall efficiency of HRI data analysis. https://github.com/CHRI-Lab/ROSAnnotator
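As a pointer to the kind of input ROSAnnotator handles, here is a minimal sketch that iterates over a recorded bag with the standard ROS 1 rosbag Python API. This is not ROSAnnotator code; the bag path and topic names are placeholders.

```python
import rosbag  # ROS 1 Python API; requires a ROS environment

# Iterate over timestamped messages on selected topics in a recorded bag.
# 'session.bag' and the topic names are placeholders.
with rosbag.Bag('session.bag') as bag:
    for topic, msg, t in bag.read_messages(topics=['/audio', '/camera/image_raw']):
        print(topic, t.to_sec())  # temporally synchronized multimodal streams
```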
TripCast: Pre-training of Masked 2D Transformers for Trip Time Series Forecasting
Liao, Yuhua, Wang, Zetian, Wei, Peng, Nie, Qiangqiang, Zhang, Zhenhua
Deep learning and pre-trained models have shown great success in time series forecasting. However, in the tourism industry, time series data often exhibit a leading-time property, giving them a 2D structure. This introduces unique challenges for forecasting in this sector. In this study, we propose a novel modelling paradigm, TripCast, which treats trip time series as 2D data and learns representations through masking and reconstruction processes. Pre-trained on large-scale real-world data, TripCast notably outperforms other state-of-the-art baselines in in-domain forecasting scenarios and demonstrates strong scalability and transferability in out-domain forecasting scenarios.
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
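A minimal sketch of the masking-and-reconstruction pre-training idea on a 2D trip time series, assuming grids of event dates by leading times; the tiny MLP below merely stands in for TripCast's 2D Transformer, and all shapes are illustrative.

```python
import torch

# 32 grids of 64 event dates x 28 leading times (values are synthetic).
x = torch.randn(32, 64, 28)
mask = torch.rand_like(x) < 0.3        # randomly mask 30% of cells
x_in = x.masked_fill(mask, 0.0)        # masked input fed to the encoder

model = torch.nn.Sequential(           # placeholder for the real 2D Transformer
    torch.nn.Linear(28, 128), torch.nn.ReLU(), torch.nn.Linear(128, 28))
x_hat = model(x_in)

# Reconstruction loss is computed on the masked positions only.
loss = ((x_hat - x)[mask] ** 2).mean()
loss.backward()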
Improving Real-Time Music Accompaniment Separation with MMDenseNet
Wang, Chun-Hsiang, Wang, Chung-Che, Wang, Jun-You, Jang, Jyh-Shing Roger, Chu, Yen-Hsun
Music source separation aims to separate polyphonic music into different types of sources. Most existing methods focus on enhancing the quality of separated results by using larger model structures, rendering them unsuitable for deployment on edge devices. Moreover, these methods may produce low-quality output when the input duration is short, making them impractical for real-time applications. Therefore, the goal of this paper is to enhance a lightweight model, MMDenseNet, to strike a balance between separation quality and latency for real-time applications. Several directions of improvement are explored or proposed, including the complex ideal ratio mask, self-attention, a band-merge-split method, and feature look-back. Source-to-distortion ratio, real-time factor, and optimal latency are employed to evaluate performance. To align with our application requirements, the evaluation focuses on the separation performance of the accompaniment part. Experimental results demonstrate that our improvements achieve a low real-time factor and optimal latency while maintaining acceptable separation quality.
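For the complex ideal ratio mask mentioned above, here is a minimal numpy sketch of its standard definition and the key property that applying it to the mixture recovers the target. In practice a network predicts (a compressed form of) the mask rather than computing it from ground truth; the shapes are illustrative.

```python
import numpy as np

def complex_ideal_ratio_mask(S, X, eps=1e-8):
    """Complex ideal ratio mask: M = S / X in the complex STFT domain.

    S: target (e.g., accompaniment) STFT; X: mixture STFT, same shape.
    """
    return S * np.conj(X) / (np.abs(X) ** 2 + eps)

# Toy check on random complex spectrograms: the mask reconstructs the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(513, 100)) + 1j * rng.normal(size=(513, 100))
S = 0.5 * X + 0.1 * (rng.normal(size=X.shape) + 1j * rng.normal(size=X.shape))
M = complex_ideal_ratio_mask(S, X)
print(np.allclose(M * X, S, atol=1e-4))  # True
```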
Spectral Mapping of Singing Voices: U-Net-Assisted Vocal Segmentation
Separating vocal elements from musical tracks is a longstanding challenge in audio signal processing. This study tackles the separation of vocal components from musical spectrograms. We employ the Short-Time Fourier Transform (STFT) to convert audio waveforms into detailed frequency-time spectrograms, utilizing the benchmark MUSDB18 dataset for music separation. We then implement a U-Net neural network to segment the spectrogram image, aiming to delineate and extract singing-voice components accurately. We achieved noteworthy results in audio source separation using our U-Net-based models. The combination of frequency-axis normalization with Min/Max scaling and the Mean Absolute Error (MAE) loss function achieved the highest Source-to-Distortion Ratio (SDR) of 7.1 dB, indicating a high level of accuracy in preserving the quality of the original signal during separation. This setup also recorded impressive Source-to-Interference Ratio (SIR) and Source-to-Artifact Ratio (SAR) scores of 25.2 dB and 7.2 dB, respectively. These values significantly outperformed other configurations, particularly those using quantile-based normalization or a Mean Squared Error (MSE) loss function. Our source code, model weights, and demo material can be found at the project's GitHub repository: https://github.com/mbrotos/SoundSeg
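A sketch of the preprocessing described above: an STFT magnitude spectrogram with per-bin Min/Max scaling, which is one plausible reading of "frequency-axis normalization". Parameter values and the test signal are illustrative, not the paper's configuration.

```python
import numpy as np
import librosa

sr = 22050
y = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s test tone
spec = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# Min/Max scaling per frequency bin, so each bin's values lie in [0, 1].
lo = spec.min(axis=1, keepdims=True)
hi = spec.max(axis=1, keepdims=True)
spec_norm = (spec - lo) / (hi - lo + 1e-8)

pred = spec_norm * 0.9   # stands in for the U-Net's predicted vocal spectrogram
mae = np.mean(np.abs(pred - spec_norm))                   # MAE training loss
```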
Timeline-based Process Discovery
Kaur, Harleen, Mendling, Jan, Rubensson, Christoffer, Kampik, Timotheus
A key concern of automatic process discovery is to provide insights into performance aspects of business processes. Waiting times are of particular importance in this context. For that reason, it is surprising that current techniques for automatic process discovery generate directly-follows graphs and comparable process models, but often miss the opportunity to explicitly represent the time axis. In this paper, we present an approach for automatically constructing process models that explicitly align with a time axis. We exemplify our approach for directly-follows graphs. Our evaluation using two BPIC datasets and a proprietary dataset highlights the benefits of this representation in comparison to standard layout techniques.
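A minimal sketch of the core idea, assuming an event log with case, activity, and timestamp columns: each activity node is placed at its mean elapsed time from case start, so horizontal distances in the resulting directly-follows graph reflect waiting times. The log below is a toy example, not one of the evaluated datasets.

```python
import pandas as pd

log = pd.DataFrame({
    "case": [1, 1, 1, 2, 2, 2],
    "activity": ["register", "review", "pay"] * 2,
    "timestamp": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:30", "2024-01-02 09:00",
        "2024-01-03 08:00", "2024-01-03 12:00", "2024-01-05 08:00"]),
}).sort_values(["case", "timestamp"])

# Elapsed hours from the start of each case.
start = log.groupby("case")["timestamp"].transform("min")
log["elapsed_h"] = (log["timestamp"] - start).dt.total_seconds() / 3600

# Time-axis position of each node: mean elapsed time of the activity.
positions = log.groupby("activity")["elapsed_h"].mean()

# Directly-follows edges within each case.
nxt = log.groupby("case")["activity"].shift(-1)
edges = {(a, b) for a, b in zip(log["activity"], nxt) if pd.notna(b)}
print(positions.sort_values(), edges)
```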
V2CE: Video to Continuous Events Simulator
Zhang, Zhongyang, Cui, Shuyang, Chai, Kaidong, Yu, Haowen, Dasgupta, Subhasis, Mahbub, Upal, Rahman, Tauhidur
Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in terms of dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of ample labeled datasets. Prior efforts to convert APS data into events often grapple with issues such as a considerable domain shift from real events, the absence of quantified validation, and layering problems within the time axis. In this paper, we present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of DVS. A series of carefully designed losses helps enhance the quality of generated event voxels significantly. We also propose a novel local dynamic-aware timestamp inference strategy to accurately recover event timestamps from event voxels in a continuous fashion and eliminate the temporal layering problem. Results from rigorous validation through quantified metrics at all stages of the pipeline establish our method as the current state-of-the-art (SOTA).
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
- North America > United States > California > San Diego County > San Diego (0.05)
- North America > United States > California > Santa Clara County > San Jose (0.04)
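To illustrate the timestamp-recovery task above, here is a naive baseline that spreads each voxel bin's events uniformly in time. V2CE's local dynamic-aware strategy is more sophisticated than this sketch; the function and its parameters are illustrative.

```python
import numpy as np

def naive_voxel_to_events(voxel, t0=0.0, dt=1e-3, rng=None):
    """Recover per-event timestamps from an event voxel grid.

    voxel: int array (T_bins, H, W) of event counts per time bin and pixel.
    Events are jittered uniformly inside their bin; this avoids hard
    layering at bin boundaries but ignores local dynamics.
    Returns an (N, 3) array of (t, y, x) events.
    """
    rng = rng or np.random.default_rng(0)
    events = []
    for b, y, x in zip(*np.nonzero(voxel)):
        n = voxel[b, y, x]
        ts = t0 + (b + rng.random(n)) * dt   # uniform within the bin
        events.extend((t, y, x) for t in np.sort(ts))
    return np.array(events)

voxel = np.random.default_rng(1).poisson(0.2, size=(10, 4, 4))
print(naive_voxel_to_events(voxel).shape)
```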
Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
Toyama, Keisuke, Akama, Taketo, Ikemiya, Yukara, Takida, Yuhta, Liao, Wei-Hsiang, Mitsufuji, Yuki
Taking long-term spectral and temporal dependencies into account is essential for automatic piano transcription. This is especially helpful when determining the precise onset and offset for each note in polyphonic piano content. In this case, we may rely on the capability of the self-attention mechanism in Transformers to capture these long-term dependencies in the frequency and time axes. In this work, we propose hFT-Transformer, an automatic music transcription method that uses a two-level hierarchical frequency-time Transformer architecture. The first hierarchy includes a convolutional block in the time axis, a Transformer encoder in the frequency axis, and a Transformer decoder that converts the dimension in the frequency axis. The output is then fed into the second hierarchy, which consists of another Transformer encoder in the time axis. We evaluated our method on the widely used MAPS and MAESTRO v3.0.0 datasets, and it achieved state-of-the-art F1-scores across all metrics: Frame, Note, Note with Offset, and Note with Offset and Velocity estimation.
- Media > Music (0.94)
- Leisure & Entertainment (0.94)
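A compact PyTorch skeleton of the two-level hierarchy described above. All dimensions are illustrative; positional encodings, the time-axis convolutional block, and the output heads are stubbed or omitted.

```python
import torch
import torch.nn as nn

B, T, F, D = 2, 128, 256, 64           # batch, time frames, freq bins, model dim
x = torch.randn(B, T, F, D)            # stands in for time-axis conv-block output

# First hierarchy: Transformer encoder along the frequency axis.
enc_f = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
h = enc_f(x.reshape(B * T, F, D))      # attend across frequency per frame

# Decoder that converts the frequency dimension (e.g., bins -> 88 pitches).
pitch_queries = torch.randn(B * T, 88, D)
dec = nn.TransformerDecoderLayer(d_model=D, nhead=4, batch_first=True)
h = dec(pitch_queries, h)              # (B*T, 88, D)

# Second hierarchy: Transformer encoder along the time axis.
enc_t = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
out = enc_t(h.reshape(B, T, 88, D).permute(0, 2, 1, 3).reshape(B * 88, T, D))
out = out.reshape(B, 88, T, D)         # per-pitch, per-frame representations
```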
fMRI Multiple Missing Values Imputation Regularized by a Recurrent Denoiser
Functional Magnetic Resonance Imaging (fMRI) is a neuroimaging technique of pivotal importance due to its scientific and clinical applications. As with any widely used imaging modality, there is a need to ensure its data quality, and missing values are highly frequent due to the presence of artifacts or sub-optimal imaging resolutions. Our work focuses on missing-value imputation for multivariate signal data. To this end, a new imputation method is proposed, consisting of two major steps: spatially dependent signal imputation and time-dependent regularization of the imputed signal. A novel layer, to be used in deep learning architectures, is proposed in this work, bringing back the concept of chained equations for multiple imputation. Finally, a recurrent layer is applied to tune the signal so that it captures its true patterns. Together, both operations yield improved robustness against state-of-the-art alternatives.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.66)
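A minimal sketch of the two-step shape of the approach: chained-equations imputation across regions, followed by time-dependent smoothing. scikit-learn's IterativeImputer stands in for the proposed chained-equations layer, and a moving average stands in for the learned recurrent denoiser; the data is synthetic.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Synthetic time x regions signal matrix with 10% missing values.
X = np.random.default_rng(0).normal(size=(200, 30))
X[np.random.default_rng(1).random(X.shape) < 0.1] = np.nan

# Step 1: spatially dependent imputation via chained equations.
X_imp = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)

# Step 2: time-dependent regularization (placeholder for the recurrent denoiser).
kernel = np.ones(5) / 5
X_smooth = np.apply_along_axis(
    lambda s: np.convolve(s, kernel, mode="same"), 0, X_imp)
```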
Mel-spectrogram augmentation for sequence to sequence voice conversion
Hwang, Yeongtae, Cho, Hyemin, Yang, Hongsun, Oh, Insoo, Lee, Seong-Whan
When training a sequence-to-sequence voice conversion model, we must handle the issue of insufficient data, i.e., the limited number of speech tuples containing the same utterance. This study experimentally investigated the effects of Mel-spectrogram augmentation on the sequence-to-sequence voice conversion model. For Mel-spectrogram augmentation, we adopted the policies proposed in SpecAugment. In addition, we propose new policies for more data variation. To find the optimal hyperparameters of augmentation policies for voice conversion, we experimented with a new metric, the deformation per deteriorating ratio. We observed their effects through experiments with various training-set sizes and combinations of augmentation policies. In the experimental results, the time-axis warping based policies showed better performance than the other policies.
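A minimal numpy sketch of SpecAugment-style policies on a mel-spectrogram, including a simplified global time-axis stretch (SpecAugment's actual warp is a local sparse-image warp). All policy parameters are illustrative, not the paper's tuned values.

```python
import numpy as np

def time_warp(mel, rng, max_ratio=0.1):
    """Naive time-axis warp: resample frames with a random stretch factor."""
    T = mel.shape[1]
    ratio = 1.0 + rng.uniform(-max_ratio, max_ratio)
    src = np.linspace(0, T - 1, num=int(T * ratio))
    return np.stack([np.interp(src, np.arange(T), row) for row in mel])

def mask(mel, rng, max_t=20, max_f=8):
    """Time and frequency masking on a (freq_bins x frames) mel-spectrogram."""
    mel = mel.copy()
    t0 = rng.integers(0, mel.shape[1] - max_t)
    mel[:, t0:t0 + rng.integers(1, max_t)] = mel.mean()   # time mask
    f0 = rng.integers(0, mel.shape[0] - max_f)
    mel[f0:f0 + rng.integers(1, max_f), :] = mel.mean()   # frequency mask
    return mel

rng = np.random.default_rng(0)
mel = rng.random((80, 200))                                # 80 mel bins, 200 frames
aug = mask(time_warp(mel, rng), rng)
```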