AITopics | segment boundary

Adaptive Segmentation of EEG for Machine Learning Applications

Zhou, Johnson, West, Joseph, Ehinger, Krista A., Ren, Zhenming, John, Sam E., Grayden, David B.

arXiv.org Artificial Intelligence, Aug-29-2025

Objective. Electroencephalography (EEG) data is derived by sampling continuous neurological time series signals. In order to prepare EEG signals for machine learning, the signal must be divided into manageable segments. The current naive approach uses arbitrary fixed time slices, which may have limited biological relevance because brain states are not confined to fixed intervals. We investigate whether adaptive segmentation methods are beneficial for machine learning EEG analysis. Approach. We introduce a novel adaptive segmentation method, CTXSEG, that creates variable-length segments based on statistical differences in the EEG data and propose ways to use them with modern machine learning approaches that typically require fixed-length input. We assess CTXSEG using controllable synthetic data generated by our novel signal generator CTXGEN. While our CTXSEG method has general utility, we validate it on a real-world use case by applying it to an EEG seizure detection problem. We compare the performance of CTXSEG with fixed-length segmentation in the preprocessing step of a typical EEG machine learning pipeline for seizure detection. Main results. We found that using CTXSEG to prepare EEG data improves seizure detection performance compared to fixed-length approaches when evaluated using a standardized framework, without modifying the machine learning method, and requires fewer segments. Significance. This work demonstrates that adaptive segmentation with CTXSEG can be readily applied to modern machine learning approaches, with potential to improve performance. It is a promising alternative to fixed-length segmentation for signal preprocessing and should be considered as part of the standard preprocessing repertoire in EEG machine learning applications.
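
The contrast between fixed-length and adaptive segmentation described in this abstract can be illustrated with a minimal sketch. This is not the CTXSEG algorithm, whose details the abstract does not give; the boundary rule below (a jump in windowed variance) and the names `fixed_segments`, `adaptive_segments`, `win`, and `ratio` are assumptions chosen only to show variable-length segments that follow the signal's statistics.

```python
import numpy as np

def fixed_segments(n_samples, seg_len):
    """Boundaries for fixed-length slicing: one cut every seg_len samples."""
    return list(range(0, n_samples, seg_len)) + [n_samples]

def adaptive_segments(signal, win=50, ratio=4.0):
    """Hypothetical rule (not CTXSEG): cut where the variance of the next
    window differs from the previous window by more than a factor `ratio`."""
    bounds = [0]
    for start in range(win, len(signal) - win + 1, win):
        prev_var = np.var(signal[start - win:start]) + 1e-12
        next_var = np.var(signal[start:start + win]) + 1e-12
        if max(prev_var / next_var, next_var / prev_var) > ratio:
            bounds.append(start)
    bounds.append(len(signal))
    return bounds

rng = np.random.default_rng(0)
# Synthetic signal: a low-variance state, a high-variance state, then low again.
signal = np.concatenate([
    rng.normal(0, 1, 300),
    rng.normal(0, 5, 200),
    rng.normal(0, 1, 500),
])
fixed = fixed_segments(len(signal), 100)
adaptive = adaptive_segments(signal)
print(len(fixed) - 1, "fixed segments;", len(adaptive) - 1, "adaptive segments")
```

In this toy setting the adaptive rule needs far fewer segments than the fixed slicing, because it cuts only where the statistics change.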

artificial intelligence, machine learning, segmentation, (18 more...)

2508.20336

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Estimating Musical Surprisal from Audio in Autoregressive Diffusion Model Noise Spaces

Bjare, Mathias Rose, Lattner, Stefan, Widmer, Gerhard

arXiv.org Artificial Intelligence, Aug-8-2025

Recently, the information content (IC) of predictions from a Generative Infinite-Vocabulary Transformer (GIVT) has been used to model musical expectancy and surprisal in audio. We investigate the effectiveness of such modelling using IC calculated with autoregressive diffusion models (ADMs). We empirically show that IC estimates of models based on two different diffusion ordinary differential equations (ODEs) describe diverse data better, in terms of negative log-likelihood, than a GIVT. We evaluate diffusion model IC's effectiveness in capturing surprisal aspects by examining two tasks: (1) capturing monophonic pitch surprisal, and (2) detecting segment boundaries in multi-track audio. In both tasks, the diffusion models match or exceed the performance of a GIVT. We hypothesize that the surprisal estimated at different diffusion process noise levels corresponds to the surprisal of music and audio features present at different audio granularities. Testing our hypothesis, we find that, for appropriate noise levels, the studied musical surprisal tasks' results improve. Code is provided on github.com/SonyCSLParis/audioic.
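
Information content is the negative log-likelihood of an observation under the model's prediction. The sketch below shows the quantity for a one-dimensional Gaussian prediction; the Gaussian form and the name `gaussian_ic` are assumptions for illustration and stand in for the GIVT and diffusion-model likelihoods studied in the paper.

```python
import math

def gaussian_ic(x, mean, std):
    """Information content -log p(x), in nats, under a Gaussian prediction."""
    log_p = (-0.5 * math.log(2 * math.pi * std ** 2)
             - (x - mean) ** 2 / (2 * std ** 2))
    return -log_p

# An observation close to the predicted mean carries little surprisal;
# one far from it carries much more.
expected = gaussian_ic(0.1, mean=0.0, std=1.0)
surprising = gaussian_ic(3.0, mean=0.0, std=1.0)
print(expected < surprising)
```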

artificial intelligence, deep learning, machine learning, (19 more...)

2508.05306

Country: Europe > United Kingdom > England (0.46)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Enhancing Retrieval Augmented Generation with Hierarchical Text Segmentation Chunking

Nguyen, Hai Toan, Nguyen, Tien Dat, Nguyen, Viet Ha

arXiv.org Artificial Intelligence, Jul-15-2025

Retrieval-Augmented Generation (RAG) systems commonly use chunking strategies for retrieval, which enhance large language models (LLMs) by enabling them to access external knowledge, ensuring that the retrieved information is up-to-date and domain-specific. However, traditional methods often fail to create chunks that capture sufficient semantic meaning, as they do not account for the underlying textual structure. This paper proposes a novel framework that enhances RAG by integrating hierarchical text segmentation and clustering to generate more meaningful and semantically coherent chunks. During inference, the framework retrieves information by leveraging both segment-level and cluster-level vector representations, thereby increasing the likelihood of retrieving more precise and contextually relevant information. Evaluations on the NarrativeQA, QuALITY, and QASPER datasets indicate that the proposed method achieved improved results compared to traditional chunking techniques.
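
Retrieval that uses both segment-level and cluster-level vectors might look roughly like the sketch below. The abstract does not say how the two levels are combined, so the linear blend, the weight `alpha`, the mean-of-members cluster vector, and the function names are all assumptions; the toy two-dimensional vectors stand in for real embeddings.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query, seg_vecs, cluster_ids, alpha=0.5, k=2):
    """Rank segments by a blend of segment-level and cluster-level similarity.
    The linear blend and alpha are illustrative assumptions."""
    cluster_ids = np.asarray(cluster_ids)
    # Cluster vector: mean of the segment vectors assigned to that cluster.
    centroids = {c: seg_vecs[cluster_ids == c].mean(axis=0)
                 for c in np.unique(cluster_ids)}
    scores = [alpha * cosine(query, v)
              + (1 - alpha) * cosine(query, centroids[c])
              for v, c in zip(seg_vecs, cluster_ids)]
    order = np.argsort(scores)[::-1][:k]
    return [int(i) for i in order]

seg_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
cluster_ids = [0, 0, 1, 1]
top = retrieve(np.array([1.0, 0.05]), seg_vecs, cluster_ids)
print(top)
```

Blending in the cluster score lets a segment benefit from the relevance of its neighbours in the same cluster, which is one plausible reading of why cluster-level vectors help retrieval.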

computational linguistic, large language model, machine learning, (17 more...)

2507.09935

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bronchovascular Tree-Guided Weakly Supervised Learning Method for Pulmonary Segment Segmentation

Zhao, Ruijie, Tan, Zuopeng, Xue, Xiao, Zhao, Longfei, Li, Bing, Liao, Zicheng, Ming, Ying, Wang, Jiaru, Xiao, Ran, Piao, Sirong, Zhao, Rui, Xu, Qiqi, Song, Wei

arXiv.org Artificial Intelligence, May-21-2025

Pulmonary segment segmentation is crucial for cancer localization and surgical planning. However, the pixel-wise annotation of pulmonary segments is laborious, as the boundaries between segments are indistinguishable in medical images. To this end, we propose a weakly supervised learning (WSL) method, termed Anatomy-Hierarchy Supervised Learning (AHSL), which consults the precise clinical anatomical definition of pulmonary segments to perform pulmonary segment segmentation. Since pulmonary segments reside within the lobes and are determined by the bronchovascular tree, i.e., artery, airway and vein, the design of the loss function is founded on two principles. First, segment-level labels are utilized to directly supervise the output of the pulmonary segments, ensuring that they accurately encompass the appropriate bronchovascular tree. Second, lobe-level supervision indirectly oversees the pulmonary segments, ensuring their inclusion within the corresponding lobe. In addition, we introduce a two-stage segmentation strategy that incorporates bronchovascular prior information. Furthermore, a consistency loss is proposed to enhance the smoothness of segment boundaries, along with an evaluation metric designed to measure the smoothness of pulmonary segment boundaries. Visual inspection and evaluation metrics from experiments conducted on a private dataset demonstrate the effectiveness of our method.

artificial intelligence, inductive learning, machine learning, (17 more...)

2505.13911

Country: Asia > China (0.29)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

OCR Error Post-Correction with LLMs in Historical Documents: No Free Lunches

Kanerva, Jenna, Ledins, Cassandra, Käpyaho, Siiri, Ginter, Filip

arXiv.org Artificial Intelligence, Feb-3-2025

Optical Character Recognition (OCR) systems often introduce errors when transcribing historical documents, leaving room for post-correction to improve text quality. This study evaluates the use of open-weight LLMs for OCR error correction in historical English and Finnish datasets. We explore various strategies, including parameter optimization, quantization, segment length effects, and text continuation methods. Our results demonstrate that while modern LLMs show promise in reducing character error rates (CER) in English, a practically useful performance for Finnish was not reached. Our findings highlight the potential and limitations of LLMs in scaling OCR post-correction for large historical corpora.
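
Character error rate is commonly computed as the character-level edit distance between the system output and a reference transcription, divided by the reference length. The sketch below follows that common definition; the abstract does not state the exact normalization used in the study, and the example strings are invented.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def cer(hypothesis, reference):
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(hypothesis, reference) / len(reference)

reference = "the quick brown fox"
ocr_output = "tbe qu1ck brown f0x"   # three substituted characters
print(round(cer(ocr_output, reference), 3))
```

Post-correction is useful when the CER of the corrected text against the reference is lower than the CER of the raw OCR output against the same reference.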

large language model, machine learning, natural language, (20 more...)

2502.01205

Country:

Europe > Switzerland (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Multi-Modal Video Topic Segmentation with Dual-Contrastive Domain Adaptation

Xing, Linzi, Tran, Quan, Caba, Fabian, Dernoncourt, Franck, Yoon, Seunghyun, Wang, Zhaowen, Bui, Trung, Carenini, Giuseppe

arXiv.org Artificial Intelligence, Nov-30-2023

Video topic segmentation unveils the coarse-grained semantic structure underlying videos and is essential for other video understanding tasks. Given the recent surge in multi-modal content, relying solely on a single modality is arguably insufficient. On the other hand, prior solutions for similar tasks like video scene/shot segmentation cater to short videos with clear visual shifts but falter for long videos with subtle changes, such as livestreams. In this paper, we introduce a multi-modal video topic segmenter that utilizes both video transcripts and frames, bolstered by a cross-modal attention mechanism. Furthermore, we propose a dual-contrastive learning framework adhering to the unsupervised domain adaptation paradigm, enhancing our model's adaptability to longer, more semantically complex videos. Experiments on short and long video corpora demonstrate that our proposed solution significantly surpasses baseline methods in terms of both accuracy and transferability, in both intra- and cross-domain settings.

proceedings, segmentation, video, (15 more...)

2312.00220

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.82)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Structural Segmentation and Labeling of Tabla Solo Performances

R, Gowriprasad, Aravind, R, Murthy, Hema A

arXiv.org Artificial Intelligence, Nov-16-2022

Tabla is a North Indian percussion instrument used as an accompaniment and an exclusive instrument for solo performances. Tabla solo is intricate and elaborate, exhibiting rhythmic evolution through a sequence of homogeneous sections marked by shared rhythmic characteristics. Each section has a specific structure and name associated with it. Tabla learning and performance in the Indian subcontinent is based on stylistic schools called gharana-s. Several compositions by various composers from different gharana-s are played in each section. This paper addresses the task of segmenting the tabla solo concert into musically meaningful sections. We then assign suitable section labels and recognize gharana-s from the sections. We present a diverse collection of over 38 hours of solo tabla recordings for the task. We motivate the problem and present different challenges and facets of the tasks. Inspired by the distinct musical properties of tabla solo, we compute several rhythmic and timbral features for the segmentation task. This work explores the approach of automatically locating the significant changes in the rhythmic structure by analyzing local self-similarity in an unsupervised manner. We also explore supervised random forest and a convolutional neural network trained on hand-crafted features. Both supervised and unsupervised approaches are also tested on a set of held-out recordings. Segmentation of an audio piece into its structural components and labeling is crucial to many music information retrieval applications like repetitive structure finding, audio summarization, and fast music navigation. This work helps us obtain a comprehensive musical description of the tabla solo concert.
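
Locating changes by analyzing local self-similarity can be sketched with a checkerboard-kernel novelty curve computed along the diagonal of a self-similarity matrix. The abstract does not specify the exact procedure, so the kernel, the cosine similarity, the name `novelty_curve`, and the toy three-dimensional features below are assumptions for illustration.

```python
import numpy as np

def novelty_curve(features, half=4):
    """Checkerboard-kernel novelty along the diagonal of a
    cosine self-similarity matrix (one row of `features` per frame)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    ssm = unit @ unit.T  # cosine self-similarity between all frame pairs
    sign = np.sign(np.arange(-half, half) + 0.5)
    kernel = np.outer(sign, sign)  # +1 on within-section blocks, -1 across
    n = len(features)
    novelty = np.zeros(n)
    for t in range(half, n - half):
        window = ssm[t - half:t + half, t - half:t + half]
        novelty[t] = np.sum(window * kernel)
    return novelty

# Toy features: two homogeneous sections with a change at frame 20.
rng = np.random.default_rng(1)
section_a = np.tile([1.0, 0.0, 0.2], (20, 1))
section_b = np.tile([0.0, 1.0, 0.2], (20, 1))
features = np.vstack([section_a, section_b]) + rng.normal(0, 0.01, (40, 3))
boundary = int(np.argmax(novelty_curve(features)))
print(boundary)
```

The novelty value is high where frames before and after a point are each internally similar but differ from one another, which is the signature of a section change.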

artificial intelligence, concert, machine learning, (17 more...)

2211.08790

Country:

Asia > India > Uttar Pradesh > Lucknow (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)
North America > United States (0.04)
(6 more...)

Genre: Research Report (0.40)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Monotonic segmental attention for automatic speech recognition

Zeyer, Albert, Schmitt, Robin, Zhou, Wei, Schlüter, Ralf, Ney, Hermann

arXiv.org Artificial Intelligence, Oct-26-2022

We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming. We directly compare global-attention and different segmental-attention modeling variants. We develop and compare two separate time-synchronous decoders, one specifically taking the segmental nature into account, yielding further improvements. Using time-synchronous decoding for segmental models is novel and a step towards streaming applications. Our experiments show the importance of a length model to predict the segment boundaries. The final best segmental-attention model using segmental decoding performs better than global-attention, in contrast to other monotonic attention approaches in the literature. Further, we observe that the segmental model generalizes much better to long sequences of up to several minutes.
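
Restricting decoder attention to a segment can be sketched as ordinary dot-product attention computed over a slice of the encoder frames. The segment bounds are passed in by hand here, whereas the abstract describes a length model that predicts them; the function names, dimensions, and random data are assumptions for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def segmental_attention(query, keys, values, start, end):
    """Attend only to encoder frames in [start, end) rather than all frames."""
    scores = keys[start:end] @ query / np.sqrt(len(query))
    weights = softmax(scores)
    return weights @ values[start:end], weights

rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 8))     # 100 encoder frames, dimension 8
values = rng.normal(size=(100, 8))
query = rng.normal(size=8)           # one decoder step
context, weights = segmental_attention(query, keys, values, start=30, end=40)
print(weights.shape, context.shape)
```

Each decoder step then costs time proportional to the segment length instead of the full sequence length, which is what avoids the quadratic runtime of global attention.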

machine learning, natural language, recognition, (18 more...)

2210.14742

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Europe > Germany > Berlin (0.04)
Asia > Singapore (0.04)
(9 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Structured Summarization: Unified Text Segmentation and Segment Labeling as a Generation Task

Inan, Hakan, Rungta, Rashi, Mehdad, Yashar

arXiv.org Artificial Intelligence, Sep-27-2022

Text segmentation aims to divide text into contiguous, semantically coherent segments, while segment labeling deals with producing labels for each segment. Past work has shown success in tackling segmentation and labeling for documents and conversations. This has been possible with a combination of task-specific pipelines, supervised and unsupervised learning objectives. In this work, we propose a single encoder-decoder neural network that can handle long documents and conversations, trained simultaneously for both segmentation and segment labeling using only standard supervision. We successfully show a way to solve the combined task as a pure generation task, which we refer to as structured summarization. We apply the same technique to both document and conversational data, and we show state of the art performance across datasets for both segmentation and labeling, under both high- and low-resource settings. Our results establish a strong case for considering text segmentation and segment labeling as a whole, and moving towards general-purpose techniques that don't depend on domain expertise or task-specific components.

machine learning, natural language, segmentation, (19 more...)

2209.13759

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

ESPRESSO: Entropy and ShaPe awaRe timE-Series SegmentatiOn for processing heterogeneous sensor data

Deldari, Shohreh, Smith, Daniel V., Sadri, Amin, Salim, Flora D.

arXiv.org Machine Learning, Jul-24-2020

Extracting informative and meaningful temporal segments from high-dimensional wearable sensor data, smart devices, or IoT data is a vital preprocessing step in applications such as Human Activity Recognition (HAR), trajectory prediction, gesture recognition, and lifelogging. In this paper, we propose ESPRESSO (Entropy and ShaPe awaRe timE-Series SegmentatiOn), a hybrid segmentation model for multi-dimensional time-series that is formulated to exploit the entropy and temporal shape properties of time-series. ESPRESSO differs from existing methods that focus upon particular statistical or temporal properties of time-series exclusively. As part of model development, a novel temporal representation of time-series, $WCAC$, was introduced along with a greedy search approach that estimates segments based upon the entropy metric. ESPRESSO was shown to offer superior performance to four state-of-the-art methods across seven public datasets of wearable and wear-free sensing. In addition, we undertake a deeper investigation of these datasets to understand how ESPRESSO and its constituent methods perform with respect to different dataset characteristics. Finally, we provide two interesting case-studies to show how applying ESPRESSO can assist in inferring daily activity routines and the emotional state of humans.

artificial intelligence, data mining, machine learning, (14 more...)

doi: 10.1145/3411832

2008.03230

Country:

Asia (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)
