AITopics | byte

Collaborating Authors

byte

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Slagle - 2024 - SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Kevin Slagle

Neural Information Processing SystemsFeb-18-2026, 10:43:50 GMT

In this work, we study the performance of byte-level and subword-level autoregressive models when trained using a fixed compute budget.

large language model, machine learning, spacebyte, (22 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > Dominican Republic (0.04)
(2 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.65)

Add feedback

f8f78f8043f35890181a824e53a57134-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 01:21:22 GMT

artificial intelligence, patch size, raster scan, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.72)

Add feedback

Modeling Million-byte Sequences with Multiscale Transformers Lili Y u Dániel Simig Colin Flaherty Armen Aghajanyan Luke Zettlemoyer Mike Lewis Meta AI

Neural Information Processing SystemsFeb-18-2026, 01:21:18 GMT

Sequences of millions of bytes are ubiquitous; for example, music, image, or video files typically consist of multiple megabytes.

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Flow-based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection - Appendix Haibao Y u 1, 2, Yingjuan T ang

Neural Information Processing SystemsFeb-13-2026, 16:56:23 GMT

Mean A verage Precision (mAP). For VIC3D object detection, we focus on the obstacles around the ego vehicle. There are two metrics used for evaluation: BEV@mAP and 3D@mAP . BEV@mAP evaluates the 3D boxes in the bird's-eye view and ignores the In our implementation, we ignore the transmission cost of calibration files and timestamps. For early fusion, we calculate the transmission cost of transmitting raw data.

artificial intelligence, machine learning, transmission cost, (16 more...)

Neural Information Processing Systems

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.88)

Add feedback

Blockwise Parallel Transformers for Large Context Models

Neural Information Processing SystemsFeb-8-2026, 13:45:42 GMT

Model sizes range from 1B to 70B.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback

Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth

Nair, Arjun S.

arXiv.org Machine LearningJan-7-2026

Large language model fine-tuning is bottlenecked by memory: a 7B parameter model requires 84GB--14GB for weights, 14GB for gradients, and 56GB for FP32 optimizer states--exceeding even A100-40GB capacity. We present Chronicals, an open-source training framework achieving 3.51x speedup over Unsloth through four synergistic optimizations: (1) fused Triton kernels eliminating 75% of memory traffic via RMSNorm (7x), SwiGLU (5x), and QK-RoPE (2.3x) fusion; (2) Cut Cross-Entropy reducing logit memory from 5GB to 135MB through online softmax computation; (3) LoRA+ with theoretically-derived 16x differential learning rates between adapter matrices; and (4) Best-Fit Decreasing sequence packing recovering 60-75% of compute wasted on padding. On Qwen2.5-0.5B with A100-40GB, Chronicals achieves 41,184 tokens/second for full fine-tuning versus Unsloth's 11,736 tokens/second (3.51x). For LoRA at rank 32, we reach 11,699 tokens/second versus Unsloth MAX's 2,857 tokens/second (4.10x). Critically, we discovered that Unsloth's reported 46,000 tokens/second benchmark exhibited zero gradient norms--the model was not training. We provide complete mathematical foundations: online softmax correctness proofs, FlashAttention IO complexity bounds O(N^2 d^2 M^{-1}), LoRA+ learning rate derivations from gradient magnitude analysis, and bin-packing approximation guarantees. All implementations, benchmarks, and proofs are available at https://github.com/Ajwebdevs/Chronicals with pip installation via https://pypi.org/project/chronicals/.

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2601.02609

Country: Europe (0.27)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Llamazip: Leveraging LLaMA for Lossless Text Compression and Training Dataset Detection

Dréano, Sören, Molloy, Derek, Murphy, Noel

arXiv.org Artificial IntelligenceNov-25-2025

This work introduces Llamazip, a novel lossless text compression algorithm based on the predictive capabilities of the LLaMA3 language model. Llamazip achieves significant data reduction by only storing tokens that the model fails to predict, optimizing storage efficiency without compromising data integrity. Key factors affecting its performance, including quantization and context window size, are analyzed, revealing their impact on compression ratios and computational requirements. Beyond compression, Llamazip demonstrates the potential to identify whether a document was part of the training dataset of a language model. This capability addresses critical concerns about data provenance, intellectual property, and transparency in language model training.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.17589

Country: Asia (0.28)

Genre: Research Report (0.50)

Industry: Law > Intellectual Property & Technology Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Binary BPE: A Family of Cross-Platform Tokenizers for Binary Analysis

Bommarito, Michael J. II

arXiv.org Artificial IntelligenceNov-25-2025

Sequence models for binary analysis are bottlenecked by byte-level tokenization: raw bytes waste precious context window capacity for transformers and other neural network architectures, and many existing text-oriented tokenizers fail on arbitrary 0x00--0xFF sequences. To address this issue, we introduce the Binary BPE tokenizer family, a set of cross-platform Byte Pair Encoding (BPE) tokenizers for executables trained on a large corpus of binaries spanning multiple platforms, architectures, and operating systems, including Linux, Windows, macOS, Android, and malware sources. We release trained tokenizers with vocabularies of 4K, 8K, 16K, 32K, and 64K tokens, enabling both systematic scaling studies and practical deployment from resource-constrained edge devices to high-throughput datacenters. These tokenizers discover interpretable patterns (ELF/PE headers, instruction sequences, cross-platform strings) while yielding multi-byte compression per token. On representative uncompressed executables (e.g., ELF/PE/Mach-O rather than compressed APKs), the Binary BPE tokenizers typically allow for roughly 2-3x more binary content per fixed-length transformer context window than raw bytes, enabling more efficient research and practical deployment for content identification, malware detection, reverse engineering, and optimization. We release the trained Binary BPE tokenizers on HuggingFace, providing a drop-in, open-source foundation for binary-focused language models and context-efficient agentic tools.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.17573

Country: North America (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Reproducibility Report: Test-Time Training on Nearest Neighbors for Large Language Models

Zhou, Boyang, Lindqvist, Johan, Li, Lindsey

arXiv.org Artificial IntelligenceNov-24-2025

We reproduce the central claims of Test-Time Training on Nearest Neighbors for Large Language Models (Hardt and Sun, 2024), which proposes adapting a language model at inference time by fine-tuning on retrieved nearest-neighbor sequences. Using pretrained RoBERTa embeddings indexed with Faiss, we retrieve 20 neighbors per test input and apply one gradient update per neighbor across GPT-2 (117M, 774M), GPT-Neo (1.3B), and R1-Distilled-Qwen2.5-1.5B. Our experiments confirm that test-time training significantly reduces perplexity and bits-per-byte metrics across diverse domains from The Pile, with the largest improvements in structured or specialized datasets such as GitHub and EuroParl. We further validate that models not pretrained on The Pile benefit more from this adaptation than models already trained on similar data, allowing smaller models to approach the performance of larger ones. Due to infrastructure limitations, we introduce a memory-efficient retrieval implementation that loads only required line offsets rather than entire files, reducing RAM requirements from over 128 GB per server to 32 GB. We also extend the original study by evaluating R1-Distilled-Qwen2.5-1.5B, showing that test-time training yields consistent gains even for modern reasoning-optimized architectures. Overall, our results support the robustness and generality of nearest-neighbor test-time training while highlighting practical considerations for reproducing large-scale retrieval-augmented adaptation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.16691

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Filters

Collaborating Authors

byte

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

1bfd87d2d92f0556819467dc08034f76-Paper-Conference.pdf

Slagle - 2024 - SpaceByte: Towards Deleting Tokenization from Large Language Modeling

f8f78f8043f35890181a824e53a57134-Supplemental-Conference.pdf

Modeling Million-byte Sequences with Multiscale Transformers Lili Y u Dániel Simig Colin Flaherty Armen Aghajanyan Luke Zettlemoyer Mike Lewis Meta AI

Flow-based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection - Appendix Haibao Y u 1, 2, Yingjuan T ang

Blockwise Parallel Transformers for Large Context Models

Chronicals: A High-Performance Framework for LLM Fine-Tuning with 3.51x Speedup over Unsloth

Llamazip: Leveraging LLaMA for Lossless Text Compression and Training Dataset Detection

Binary BPE: A Family of Cross-Platform Tokenizers for Binary Analysis

Reproducibility Report: Test-Time Training on Nearest Neighbors for Large Language Models