AITopics | splitter

Collaborating Authors

splitter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets

Galimzyanov, Timur, Kolomyttseva, Olga, Bogomolov, Egor

arXiv.org Artificial IntelligenceOct-24-2025

We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena -- code completion and bug localization -- we systematically compare retrieval configurations across various context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL, sparse BM25 with word-level splitting is the most effective and practical, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL, proprietary dense encoders (Voyager-3 family) consistently beat sparse retrievers, however requiring 100x larger latency. (3) Optimal chunk size scales with available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-based splitting is needlessly slow, and BM25 + word splitting offers the best quality-latency trade-off. Thus, we provide evidence-based recommendations for implementing effective code-oriented RAG systems based on task requirements, model constraints, and computational efficiency.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.20609

Country:

Asia (0.46)
North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

TISDiSS: A Training-Time and Inference-Time Scalable Framework for Discriminative Source Separation

Feng, Yongsheng, Xu, Yuetonghui, Luo, Jiehui, Liu, Hongjia, Li, Xiaobing, Yu, Feng, Li, Wei

arXiv.org Artificial IntelligenceOct-15-2025

Source separation is a fundamental task in speech, music, and audio processing, and it also provides cleaner and larger data for training generative models. However, improving separation performance in practice often depends on increasingly large networks, inflating training and deployment costs. Motivated by recent advances in inference-time scaling for generative modeling, we propose Training-Time and Inference-Time Scalable Discriminative Source Separation (TISDiSS), a unified framework that integrates early-split multi-loss supervision, shared-parameter design, and dynamic inference repetitions. TISDiSS enables flexible speed-performance trade-offs by adjusting inference depth without retraining additional models. We further provide systematic analyses of architectural and training choices and show that training with more inference repetitions improves shallow-inference performance, benefiting low-latency applications. Experiments on standard speech separation benchmarks demonstrate state-of-the-art performance with a reduced parameter count, establishing TISDiSS as a scalable and practical framework for adaptive source separation. Code is available at https://github.com/WingSingFung/TISDiSS.

artificial intelligence, machine learning, separation, (16 more...)

arXiv.org Artificial Intelligence

2509.15666

Country: Asia > China (0.29)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

WildFX: A DAW-Powered Pipeline for In-the-Wild Audio FX Graph Modeling

Yang, Qihui, Berg-Kirkpatrick, Taylor, McAuley, Julian, Novack, Zachary

arXiv.org Artificial IntelligenceJul-21-2025

Despite rapid progress in end-to-end AI music generation, AI-driven modeling of professional Digital Signal Processing (DSP) workflows remains challenging. In particular, while there is growing interest in neural black-box modeling of audio effect graphs (e.g. reverb, compression, equalization), AI-based approaches struggle to replicate the nuanced signal flow and parameter interactions used in professional workflows. Existing differentiable plugin approaches often diverge from real-world tools, exhibiting inferior performance relative to simplified neural controllers under equivalent computational constraints. We introduce WildFX, a pipeline containerized with Docker for generating multi-track audio mixing datasets with rich effect graphs, powered by a professional Digital Audio Workstation (DAW) backend. WildFX supports seamless integration of cross-platform commercial plugins or any plugins in the wild, in VST/VST3/LV2/CLAP formats, enabling structural complexity (e.g., sidechains, crossovers) and achieving efficient parallelized processing. A minimalist metadata interface simplifies project/plugin configuration. Experiments demonstrate the pipeline's validity through blind estimation of mixing graphs, plugin/gain parameters, and its ability to bridge AI research with practical DSP demands. The code is available on: https://github.com/IsaacYQH/WildFX.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.10534

Genre:

Research Report (0.64)
Workflow (0.55)

Industry:

Media > Music (0.67)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Software (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Real-Time Cell Sorting with Scalable In Situ FPGA-Accelerated Deep Learning

Islam, Khayrul, Forelli, Ryan F., Han, Jianzhong, Bhadane, Deven, Huang, Jian, Agar, Joshua C., Tran, Nhan, Ogrenci, Seda, Liu, Yaling

arXiv.org Artificial IntelligenceMar-16-2025

Precise cell classification is essential in biomedical diagnostics and therapeutic monitoring, particularly for identifying diverse cell types involved in various diseases. Traditional cell classification methods such as flow cytometry depend on molecular labeling which is often costly, time-intensive, and can alter cell integrity. To overcome these limitations, we present a label-free machine learning framework for cell classification, designed for real-time sorting applications using bright-field microscopy images. This approach leverages a teacher-student model architecture enhanced by knowledge distillation, achieving high efficiency and scalability across different cell types. Demonstrated through a use case of classifying lymphocyte subsets, our framework accurately classifies T4, T8, and B cell types with a dataset of 80,000 preprocessed images, accessible via an open-source Python package for easy adaptation. Our teacher model attained 98\% accuracy in differentiating T4 cells from B cells and 93\% accuracy in zero-shot classification between T8 and B cells. Remarkably, our student model operates with only 0.02\% of the teacher model's parameters, enabling field-programmable gate array (FPGA) deployment. Our FPGA-accelerated student model achieves an ultra-low inference latency of just 14.5~$\mu$s and a complete cell detection-to-sorting trigger time of 24.7~$\mu$s, delivering 12x and 40x improvements over the previous state-of-the-art real-time cell analysis algorithm in inference and total latency, respectively, while preserving accuracy comparable to the teacher model. This framework provides a scalable, cost-effective solution for lymphocyte classification, as well as a new SOTA real-time cell sorting implementation for rapid identification of subsets using in situ deep learning on off-the-shelf computing hardware.

artificial intelligence, classification, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.12622

Country:

North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New Jersey > Camden County > Camden (0.04)
(4 more...)

Genre: Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.88)
Health & Medicine > Therapeutic Area > Hematology (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Diversity Enhances an LLM's Performance in RAG and Long-context Task

Wang, Zhchao, Bi, Bin, Luo, Yanqi, Asur, Sitaram, Cheng, Claire Na

arXiv.org Artificial IntelligenceFeb-13-2025

The rapid advancements in large language models (LLMs) have highlighted the challenge of context window limitations, primarily due to the quadratic time complexity of the self-attention mechanism ($O(N^2)$, where $N$ denotes the context window length). This constraint impacts tasks such as retrieval-augmented generation (RAG) in question answering (Q\&A) and long context summarization. A common approach involves selecting content with the highest similarity to the query; however, this often leads to redundancy and the exclusion of diverse yet relevant information. Building on principles from Maximal Marginal Relevance (MMR) and Farthest Point Sampling (FPS), we integrate diversity into the content selection process. Our findings reveal that incorporating diversity substantially increases the recall of selecting relevant sentences or chunks before LLM-based Q\&A and summarization. These results highlight the importance of maintaining diversity in future LLM applications to further improve summarization and Q\&A outcomes.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2502.09017

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(6 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

Tethered Variable Inertial Attitude Control Mechanisms through a Modular Jumping Limbed Robot

Tanaka, Yusuke, Zhu, Alvin, Hong, Dennis

arXiv.org Artificial IntelligenceJan-17-2025

This paper presents the concept of a tethered variable inertial attitude control mechanism for a modular jumping-limbed robot designed for planetary exploration in low-gravity environments. The system, named SPLITTER, comprises two sub-10 kg quadrupedal robots connected by a tether, capable of executing successive jumping gaits and stabilizing in-flight using inertial morphing technology. Through model predictive control (MPC), attitude control was demonstrated by adjusting the limbs and tether length to modulate the system's principal moments of inertia. Our results indicate that this control strategy allows the robot to stabilize during flight phases without needing traditional flywheel-based systems or relying on aerodynamics, making the approach mass-efficient and ideal for small-scale planetary robots' successive jumps. The paper outlines the dynamics, MPC formulation for inertial morphing, actuator requirements, and simulation results, illustrating the potential of agile exploration for small-scale rovers in low-gravity environments like the Moon or asteroids.

artificial intelligence, inertial, robot, (15 more...)

arXiv.org Artificial Intelligence

2501.10156

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Energy > Oil & Gas (0.80)
Transportation > Air (0.48)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.47)

Add feedback

Universal on-chip polarization handling with deep photonic networks

Vit, Aycan Deniz, Rzayev, Ujal, Danis, Bahrem Serhat, Amiri, Ali Najjar, Gorgulu, Kazim, Magden, Emir Salih

arXiv.org Artificial IntelligenceDec-2-2024

We propose a novel design paradigm for arbitrarily capable deep photonic networks of cascaded Mach-Zehnder Interferometers (MZIs) for on-chip universal polarization handling. Using a device architecture made of cascaded Mach-Zehnder interferometers, we modify and train the phase difference between interferometer arms for both polarizations through wide operation bandwidths. Three proof-of-concept polarization handling devices are illustrated using a software-defined, physics-informed neural framework, to achieve user-specified target device responses as functions of polarization and wavelength. These devices include a polarization splitter, a polarization-independent power splitter, and an arbitrary polarization-dependent splitter to illustrate the capabilities of the design framework. The performance for all three devices is optimized using transfer matrix calculations; and their final responses are verified through 3D-FDTD simulations. All devices demonstrate state-of-the-art performance metrics with over 20 dB extinction, and flat-top transmission bands through bandwidths of 120 nm. In addition to the functional diversity enabled, the optimization for each device is completed in under a minute, highlighting the computational efficiency of the design paradigm presented. These results demonstrate the versatility of the deep photonic network design ecosystem in polarization management, unveiling promising prospects for advanced on-chip applications in optical communications, sensing, and computing.

deep photonic network, photonic network, polarization, (14 more...)

arXiv.org Artificial Intelligence

2411.16698

Country:

North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Make Compound Sentences Simple to Analyze: Learning to Split Sentences for Aspect-based Sentiment Analysis

Seo, Yongsik, Song, Sungwon, Heo, Ryang, Kim, Jieyong, Lee, Dongha

arXiv.org Artificial IntelligenceOct-3-2024

In the domain of Aspect-Based Sentiment Analysis (ABSA), generative methods have shown promising results and achieved substantial advancements. However, despite these advancements, the tasks of extracting sentiment quadruplets, which capture the nuanced sentiment expressions within a sentence, remain significant challenges. In particular, compound sentences can potentially contain multiple quadruplets, making the extraction task increasingly difficult as sentence complexity grows. To address this issue, we are focusing on simplifying sentence structures to facilitate the easier recognition of these elements and crafting a model that integrates seamlessly with various ABSA tasks. In this paper, we propose Aspect Term Oriented Sentence Splitter (ATOSS), which simplifies compound sentence into simpler and clearer forms, thereby clarifying their structure and intent. As a plug-and-play module, this approach retains the parameters of the ABSA model while making it easier to identify essential intent within input sentences. Extensive experimental results show that utilizing ATOSS outperforms existing methods in both ASQP and ACOS tasks, which are the primary tasks for extracting sentiment quadruplets.

absa model, quadruplet, split sentence, (16 more...)

arXiv.org Artificial Intelligence

2410.02297

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Consumer Products & Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.82)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.76)

Add feedback

Efficient Network Embedding by Approximate Equitable Partitions

Squillace, Giuseppe, Tribastone, Mirco, Tschaikowski, Max, Vandin, Andrea

arXiv.org Artificial IntelligenceSep-16-2024

Structural network embedding is a crucial step in enabling effective downstream tasks for complex systems that aims to project a network into a lower-dimensional space while preserving similarities among nodes. We introduce a simple and efficient embedding technique based on approximate variants of equitable partitions. The approximation consists in introducing a user-tunable tolerance parameter relaxing the otherwise strict condition for exact equitable partitions that can be hardly found in real-world networks. We exploit a relationship between equitable partitions and equivalence relations for Markov chains and ordinary differential equations to develop a partition refinement algorithm for computing an approximate equitable partition in polynomial time. We compare our method against state-of-the-art embedding techniques on benchmark networks. We report comparable -- when not superior -- performance for visualization, classification, and regression tasks at a cost between one and three orders of magnitude smaller using a prototype implementation, enabling the embedding of large-scale networks which could not be efficiently handled by most of the competing techniques.

equitable partition, node, partition, (16 more...)

arXiv.org Artificial Intelligence

2409.1016

Country:

North America > United States (0.28)
South America > Brazil (0.05)
Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report (1.00)

Industry: Media (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods

Narimissa, Esmaeil, Raithel, David

arXiv.org Artificial IntelligenceSep-12-2024

The performance of Retrieval-Augmented Generation (RAG) systems in information retrieval is significantly influenced by the characteristics of the documents being processed. In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies. A comparative evaluation of multiple documentsplitting methods reveals that the Recursive Character Splitter outperforms the Token-based Splitter in preserving contextual integrity. A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs, simulating realistic retrieval scenarios to enhance testing efficiency and metric reliability. The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance. This approach establishes a refined standard for evaluating the precision of RAG systems, with future research focusing on optimizing chunk and overlap sizes to improve retrieval accuracy and efficiency. 2

document type, rag system, retrieval, (15 more...)

arXiv.org Artificial Intelligence

2409.08479

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)

Add feedback