burstiness
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Workflow is All You Need: Escaping the "Statistical Smoothing Trap" via High-Entropy Information Foraging and Adversarial Pacing
Central to long-form text generation in vertical domains is the "impossible trinity" confronting current large language models (LLMs): the simultaneous achievement of low hallucination, deep logical coherence, and personalized expression. This study establishes that this bottleneck arises from existing generative paradigms succumbing to the Statistical Smoothing Trap, a failure mode that neglects the high-entropy information acquisition and structured cognitive processes integral to expert-level writing. To address this limitation, we propose the DeepNews Framework, an agentic workflow that explicitly models the implicit cognitive processes of seasoned financial journalists. The framework integrates three core modules: first, a dual-granularity retrieval mechanism grounded in information foraging theory, which enforces a 10:1 saturated information input ratio to mitigate hallucinatory outputs; second, schema-guided strategic planning, a process leveraging domain expert knowledge bases (narrative schemas) and Atomic Blocks to forge a robust logical skeleton; third, adversarial constraint prompting, a technique deploying tactics including Rhythm Break and Logic Fog to disrupt the probabilistic smoothness inherent in model-generated text. Experiments delineate a salient Knowledge Cliff in deep financial reporting: content truthfulness collapses when retrieved context falls below 15,000 characters, while a high-redundancy input exceeding 30,000 characters stabilizes the Hallucination-Free Rate (HFR) above 85%. In an ecological validity blind test conducted with a top-tier Chinese technology media outlet, the DeepNews system, built on a previous-generation model (DeepSeek-V3-0324), achieved a 25% submission acceptance rate, significantly outperforming the 0% acceptance rate of zero-shot generation by a state-of-the-art (SOTA) model (GPT-5).
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- Banking & Finance > Trading (1.00)
- Media > News (0.88)
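The DeepNews abstract above is built around two concrete thresholds: a "knowledge cliff" near 15,000 retrieved characters and a saturation regime above 30,000 characters where the Hallucination-Free Rate stabilizes. As a rough illustration of how such a saturation gate could sit inside an agentic retrieval loop, here is a minimal Python sketch; the function names, query-widening step, and loop structure are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the saturated-retrieval gate suggested by the
# DeepNews abstract: keep foraging for context until the retrieved corpus
# clears the reported "knowledge cliff" (~15,000 chars) and reaches the
# high-redundancy regime (>30,000 chars). Thresholds are from the abstract;
# the helpers and loop are illustrative assumptions, not the paper's code.

KNOWLEDGE_CLIFF_CHARS = 15_000    # below this, truthfulness reportedly collapses
SATURATION_TARGET_CHARS = 30_000  # above this, HFR reportedly stabilizes >85%

def forage_context(queries, search_fn, max_rounds=10):
    """Accumulate retrieved snippets until the saturation target is reached.

    `search_fn(query)` is assumed to return a list of text snippets.
    """
    corpus = []
    for _ in range(max_rounds):
        for q in queries:
            corpus.extend(search_fn(q))
        if sum(len(doc) for doc in corpus) >= SATURATION_TARGET_CHARS:
            return corpus  # saturated: safe to hand off to drafting
        # under-saturated: widen the queries and forage again
        queries = [q + " background" for q in queries]
    if sum(len(doc) for doc in corpus) < KNOWLEDGE_CLIFF_CHARS:
        raise RuntimeError("retrieved context is below the knowledge cliff; "
                           "drafting would risk hallucination")
    return corpus
```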
Can You Detect the Difference?
The rapid advancement of large language models (LLMs) has raised concerns about reliably detecting AI-generated text. Stylometric metrics work well on autoregressive (AR) outputs, but their effectiveness on diffusion-based models is unknown. We present the first systematic comparison of diffusion-generated text (LLaDA) and AR-generated text (LLaMA) using 2,000 samples. Perplexity, burstiness, lexical diversity, readability, and BLEU/ROUGE scores show that LLaDA closely mimics human text in perplexity and burstiness, yielding high false-negative rates for AR-oriented detectors. LLaMA shows much lower perplexity but reduced lexical fidelity. Relying on any single metric fails to separate diffusion outputs from human writing. We highlight the need for diffusion-aware detectors and outline directions such as hybrid models, diffusion-specific stylometric signatures, and robust watermarking.
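The detection metrics this abstract relies on (perplexity, burstiness, lexical diversity, readability, BLEU/ROUGE) are standard stylometric quantities. Below is a minimal Python sketch of two of them, using common textbook definitions (the Goh-Barabasi burstiness coefficient over sentence lengths and the type-token ratio); the paper's exact formulas may differ.

```python
# Minimal stylometric sketch: burstiness of sentence lengths and a simple
# lexical-diversity score. Textbook definitions, not necessarily the exact
# formulas used in the paper.
import re
import statistics

def sentence_lengths(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def burstiness(text):
    """(sigma - mu) / (sigma + mu) over sentence lengths: ~0 for Poisson-like
    variation, positive for bursty text, negative for very regular text."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    mu = statistics.mean(lengths)
    sigma = statistics.pstdev(lengths)
    return (sigma - mu) / (sigma + mu) if (sigma + mu) else 0.0

def type_token_ratio(text):
    """Distinct tokens divided by total tokens (lexical diversity)."""
    tokens = re.findall(r"\w+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

sample = "Short one. Then a much longer, meandering sentence follows it. Short again."
print(burstiness(sample), type_token_ratio(sample))
```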
Data Distributional Properties As Inductive Bias for Systematic Generalization
del Río, Felipe, Raymond-Sáez, Alain, Florea, Daniel, Icarte, Rodrigo Toro, Hurtado, Julio, Calderón, Cristián Buc, Soto, Álvaro
Deep neural networks (DNNs) struggle at systematic generalization (SG). Several studies have evaluated the possibility of promoting SG through novel architectures, loss functions, or training methodologies. Few studies, however, have focused on the role of training data properties in promoting SG. In this work, we investigate the impact of certain data distributional properties as inductive biases for the SG ability of a multi-modal language model. To this end, we study three different properties. First, data diversity, instantiated as an increase in the possible values a latent property in the training distribution may take. Second, burstiness, where we probabilistically restrict the number of possible values of latent factors on particular inputs during training. Third, latent intervention, where a particular latent factor is altered randomly during training. We find that all three factors significantly enhance SG, with diversity contributing an 89% absolute increase in accuracy for the most affected property. Through a series of experiments, we test various hypotheses to understand why these properties promote SG. Finally, we find that Normalized Mutual Information (NMI) between latent attributes in the training distribution is strongly predictive of out-of-distribution generalization. One mechanism by which lower NMI induces SG lies in the geometry of representations: lower NMI induces more parallelism in the model's neural representations (i.e., input features coded in parallel neural vectors), a property related to the capacity of reasoning by analogy.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > New York (0.04)
- (6 more...)
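The last finding above hinges on Normalized Mutual Information (NMI) between latent attributes of the training distribution. A minimal sketch of that measurement on synthetic categorical factors is given below, using scikit-learn's normalized_mutual_info_score; the factor names and correlation strength are illustrative assumptions.

```python
# Sketch: NMI between two categorical latent factors of a training set.
# Synthetic data for illustration; the paper computes NMI over the latent
# attributes of its multi-modal training distribution.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
n = 10_000

# Independent latent factors (low NMI): e.g., object shape vs. color
shape = rng.integers(0, 5, size=n)
color_indep = rng.integers(0, 5, size=n)

# Correlated latent factors (high NMI): color mostly determined by shape
color_corr = np.where(rng.random(n) < 0.9, shape, rng.integers(0, 5, size=n))

print("NMI (independent):", normalized_mutual_info_score(shape, color_indep))
print("NMI (correlated): ", normalized_mutual_info_score(shape, color_corr))
# The paper's finding: lower NMI between latent attributes in the training
# data is strongly predictive of better systematic generalization.
```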
What Matters for In-Context Learning: A Balancing Act of Look-up and In-Weight Learning
Bratulić, Jelena, Mittal, Sudhanshu, Rupprecht, Christian, Brox, Thomas
Large Language Models (LLMs) have demonstrated impressive performance in various tasks, including In-Context Learning (ICL), where the model performs new tasks by conditioning solely on the examples provided in the context, without updating the model's weights. While prior research has explored the roles of pretraining data and model architecture, the key mechanism behind ICL remains unclear. In this work, we systematically uncover properties present in LLMs that support the emergence of ICL. To disambiguate these factors, we conduct a study with a controlled dataset and data sequences using a deep autoregressive model. We show that conceptual repetitions in the data sequences are crucial for ICL, more so than previously indicated training data properties like burstiness or long-tail distribution. Conceptual repetitions could refer to $n$-gram repetitions in textual data or exact image copies in image sequence data. Such repetitions also offer other previously overlooked benefits such as reduced transiency in ICL performance. Furthermore, we show that the emergence of ICL depends on balancing the in-weight learning objective with the in-context solving ability during training.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Germany > Baden-Württemberg > Freiburg (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
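The abstract above distinguishes "conceptual repetitions" (e.g., repeated n-grams) from burstiness of token frequencies. A minimal sketch of a within-sequence n-gram repetition rate, which separates the two in a toy example, is shown below; the metric definition is an illustrative assumption, not the paper's exact measure.

```python
# Sketch: within-sequence n-gram repetition, the kind of "conceptual
# repetition" the abstract argues matters more for ICL than burstiness or
# long-tail class distributions. Purely illustrative.
from collections import Counter

def ngram_repetition_rate(tokens, n=3):
    """Fraction of n-grams in a sequence that occur more than once."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)

repetitive = "a b c a b c d e a b c".split()
bursty_no_repeat = "a a a b c d e f g h i".split()
print(ngram_repetition_rate(repetitive))       # high: the trigram "a b c" recurs
print(ngram_repetition_rate(bursty_no_repeat)) # zero: bursty token 'a' but no repeated trigram
```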
Strategic Client Selection to Address Non-IIDness in HAPS-enabled FL Networks
Farajzadeh, Amin, Yadav, Animesh, Yanikomeroglu, Halim
The deployment of federated learning (FL) within vertical heterogeneous networks, such as those enabled by high-altitude platform station (HAPS), offers the opportunity to engage a wide array of clients, each endowed with distinct communication and computational capabilities. This diversity not only enhances the training accuracy of FL models but also hastens their convergence. Yet, applying FL in these expansive networks presents notable challenges, particularly the significant non-IIDness in client data distributions. Such data heterogeneity often results in slower convergence rates and reduced effectiveness in model training performance. Our study introduces a client selection strategy tailored to address this issue, leveraging user network traffic behaviour. This strategy involves the prediction and classification of clients based on their network usage patterns while prioritizing user privacy. By strategically selecting clients whose data exhibit similar patterns for participation in FL training, our approach fosters a more uniform and representative data distribution across the network. Our simulations demonstrate that this targeted client selection methodology significantly reduces the training loss of FL models in HAPS networks, thereby effectively tackling a crucial challenge in implementing large-scale FL systems.
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Ohio > Athens County > Athens (0.04)
- (2 more...)
- Telecommunications (0.95)
- Information Technology > Security & Privacy (0.46)
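The selection strategy summarized above groups clients by their network usage patterns and admits clients with similar patterns into each training round. A minimal sketch of that idea, clustering synthetic per-client traffic features with k-means and sampling from one cluster, is given below; the feature set, cluster count, and selection rule are illustrative assumptions rather than the paper's method.

```python
# Sketch: cluster clients by network-traffic features and select one cluster
# per round so the participating clients have similar (less non-IID) data.
# Features, cluster count, and selection rule are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Per-client traffic features, e.g., [mean rate, peak rate, active hours]
traffic_features = rng.random((200, 3))

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(traffic_features)

def select_clients(labels, cluster_id, k=20):
    """Pick up to k clients whose traffic pattern falls in the chosen cluster."""
    candidates = np.flatnonzero(labels == cluster_id)
    return candidates[:k]

selected = select_clients(kmeans.labels_, cluster_id=0)
print("clients selected for this FL round:", selected)
```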
On the Burstiness of Distributed Machine Learning Traffic
Luangsomboon, Natchanon, Fazel, Fahimeh, Liebeherr, Jörg, Sobhani, Ashkan, Guan, Shichao, Chu, Xingjun
Traffic from distributed training of machine learning (ML) models makes up a large and growing fraction of the traffic mix in enterprise data centers. While work on distributed ML abounds, the network traffic generated by distributed ML has received little attention. Using measurements on a testbed network, we investigate the traffic characteristics generated by the training of the ResNet-50 neural network with an emphasis on studying its short-term burstiness. For the latter we propose metrics that quantify traffic burstiness at different time scales. Our analysis reveals that distributed ML traffic exhibits a very high degree of burstiness on short time scales, exceeding a 60:1 peak-to-mean ratio on time intervals as long as 5 ms. We observe that training software orchestrates transmissions in such a way that burst transmissions from different sources within the same application do not result in congestion and packet losses. An extrapolation of the measurement data to multiple applications underscores the challenges of distributed ML traffic for congestion and flow control algorithms.

This paper studies and analyzes the burstiness of traffic from training deep neural network (DNN) models as a root cause of short-lived surges of traffic, known as microbursts, which cause periods of high packet delay and loss in a data center network (DCN) even at low utilization. Since microbursts occur at a time scale of less than a millisecond [1], traditional traffic control methods are not effective at avoiding packet losses in such scenarios. Research on microbursts in DCNs has suggested a range of potential root causes, including the inherent burstiness of application traffic, confluence of traffic flows to a common destination (fan-in, incast), offloading of protocol processing at hosts, and traffic control algorithms such as packet scheduling and flow control [1]-[10]. While training of neural networks makes up a large fraction of the workload in data centers [11], to the best of our knowledge there does not exist a detailed analysis of distributed ML traffic and its potential impact on the creation of microbursts. The vast majority of network traffic from training DNN models is due to the exchange of gradients of model parameters. As modern DNN models involve millions and, in the case of large language models such as GPT, billions of parameters [12], the transmission of gradients creates huge data bursts. The measurement experiments are performed in a testbed network with a single switch with 100 Gbps line rates. We evaluate a server-based and a serverless mode of training. In server-based training, the nodes involved in the training, referred to as workers, exchange gradients with a dedicated server. Here, the transmissions to the server create a bottleneck.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- (2 more...)
- Telecommunications > Networks (1.00)
- Information Technology (1.00)
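The short-time-scale burstiness metric this paper emphasizes is a peak-to-mean ratio computed over fixed time intervals (reported to exceed 60:1 at 5 ms). Below is a minimal Python sketch of that ratio on a synthetic packet trace; the trace generation and window binning are illustrative assumptions standing in for the testbed measurements.

```python
# Sketch: peak-to-mean ratio of traffic volume over fixed-size time windows,
# the short-time-scale burstiness metric the abstract reports (e.g., >60:1 at
# 5 ms). The trace here is synthetic; the paper measures ResNet-50 gradient
# traffic on a 100 Gbps testbed.
import numpy as np

def peak_to_mean_ratio(timestamps_s, bytes_per_pkt, window_s=0.005):
    """Bin a packet trace into windows of `window_s` seconds and return
    (peak window bytes) / (mean window bytes)."""
    duration = timestamps_s.max() - timestamps_s.min()
    n_bins = max(1, int(np.ceil(duration / window_s)))
    bins = np.floor((timestamps_s - timestamps_s.min()) / window_s).astype(int)
    bins = np.clip(bins, 0, n_bins - 1)
    per_window = np.bincount(bins, weights=bytes_per_pkt, minlength=n_bins)
    return per_window.max() / per_window.mean()

# Synthetic bursty trace: most packets arrive inside short ~1 ms bursts.
rng = np.random.default_rng(2)
burst_starts = rng.uniform(0, 1.0, size=20)
timestamps = np.sort(np.concatenate(
    [start + rng.uniform(0, 0.001, 500) for start in burst_starts]))
sizes = np.full(timestamps.shape, 1500.0)  # bytes per packet
print("peak-to-mean ratio at 5 ms:", peak_to_mean_ratio(timestamps, sizes))
```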