SpaceByte: Towards Deleting Tokenization from Large Language Modeling

Neural Information Processing Systems

Tokenization is widely used in large language models because it significantly improves performance. However, tokenization imposes several disadvantages, such as performance biases, increased adversarial vulnerability, decreased character-level modeling performance, and increased modeling complexity. To address these disadvantages without sacrificing performance, we propose SpaceByte, a novel byte-level decoder architecture that closes the performance gap between byte-level and subword autoregressive language modeling. SpaceByte consists of a byte-level Transformer model, but with extra, larger Transformer blocks inserted in the middle of the layers. We find that performance is significantly improved by applying these larger blocks only after certain bytes, such as space characters, which typically denote word boundaries. Our experiments show that for a fixed training and inference compute budget, SpaceByte outperforms other byte-level architectures and roughly matches the performance of tokenized Transformer architectures.
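
To make the architecture concrete, here is a minimal PyTorch sketch (our illustration, not the released implementation): small byte-level blocks run at every position, while a larger block runs only at positions following an assumed boundary byte such as a space. The dimensions, boundary set, and single-layer blocks are illustrative assumptions, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

# Assumed boundary set; the paper applies the larger blocks after
# space-like bytes that typically mark word boundaries.
BOUNDARY = (ord(" "), ord("\n"))

def boundary_mask(byte_ids):  # byte_ids: (T,) int64 tensor
    """True at positions whose previous byte is a boundary byte."""
    prev = torch.roll(byte_ids, 1)
    mask = torch.zeros_like(byte_ids, dtype=torch.bool)
    for b in BOUNDARY:
        mask |= prev == b
    mask[0] = True  # treat the first byte as a boundary
    return mask

# Toy forward pass: small blocks run at every byte; one larger
# block runs only at the (few) boundary positions.
d_small, d_large, T = 64, 256, 32
small = nn.TransformerEncoderLayer(d_small, nhead=4, batch_first=True)
large = nn.TransformerEncoderLayer(d_large, nhead=4, batch_first=True)
up, down = nn.Linear(d_small, d_large), nn.Linear(d_large, d_small)

byte_ids = torch.randint(0, 256, (T,))
x = nn.Embedding(256, d_small)(byte_ids).unsqueeze(0)  # (1, T, d_small)
x = small(x)                        # cheap computation at every byte
m = boundary_mask(byte_ids)
g = large(up(x[:, m]))              # expensive block at few positions
x = x.clone()
x[:, m] = x[:, m] + down(g)         # scatter global info back
```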


Assessing Historical Structural Oppression Worldwide via Rule-Guided Prompting of Large Language Models

Chatterjee, Sreejato, Tran, Linh, Nguyen, Quoc Duy, Kirson, Roni, Hamlin, Drue, Aquino, Harvest, Lyu, Hanjia, Luo, Jiebo, Dye, Timothy

arXiv.org Artificial Intelligence

Traditional efforts to measure historical structural oppression struggle with cross-national validity due to the unique, locally specified histories of exclusion, colonization, and social status in each country, and they often rely on structured indices that privilege material resources while overlooking lived, identity-based exclusion. We introduce a novel framework for oppression measurement that leverages Large Language Models (LLMs) to generate context-sensitive scores of lived historical disadvantage across diverse geopolitical settings. Using unstructured self-identified ethnicity utterances from a multilingual COVID-19 global study, we design rule-guided prompting strategies that encourage models to produce interpretable, theoretically grounded estimates of oppression. We systematically evaluate these strategies across multiple state-of-the-art LLMs. Our results demonstrate that LLMs, when guided by explicit rules, can capture nuanced forms of identity-based historical oppression within nations. This approach provides a complementary measurement tool that highlights dimensions of systemic exclusion, offering a scalable, cross-cultural lens for understanding how oppression manifests in data-driven research and public health contexts.

The study of racial and ethnic inequality remains central to sociological research, with an extensive literature documenting how structural oppression is reproduced in historical and contemporary contexts [1]-[3]. Oppression can be understood as a social hierarchy in which some groups subject other groups to lower status and to systemic exclusion, dehumanization, and disadvantage. In public health and sociology, this concept is closely aligned with definitions of systemic and structural racism, which describe racism as deeply embedded in laws, policies, institutional practices, and social norms that sustain widespread inequities, violence, and disadvantage over time [1]. Foundational works have demonstrated how ethnic and national hierarchies shape access to power, life opportunities, autonomy, and sovereignty, for example through institutionalized mechanisms such as legal structures, educational systems, and healthcare access [2].
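
As an illustration of what such rule-guided prompting can look like, here is a hypothetical sketch (ours, not the paper's exact protocol); the rule wording, the 0-1 scale, and the JSON output format are all assumptions.

```python
# Illustrative rule-guided prompt: explicit scoring rules are embedded
# in the prompt so the model returns an interpretable, bounded score.
# The rules and scale below are assumptions, not the paper's protocol.
RULES = """You are scoring historical structural oppression.
Rules:
1. Score the GROUP's lived historical disadvantage in COUNTRY, 0.0-1.0.
2. Consider legal exclusion, colonization, and social status, not income.
3. Return only JSON: {"score": <float>, "rationale": "<one sentence>"}."""

def build_prompt(ethnicity_utterance: str, country: str) -> str:
    """Combine the fixed rules with one self-identified utterance."""
    return (f"{RULES}\n\n"
            f"COUNTRY: {country}\n"
            f"GROUP (self-identified): {ethnicity_utterance}\n")

print(build_prompt("Quechua", "Peru"))
```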


Evaluation of Real-Time Preprocessing Methods in AI-Based ECG Signal Analysis

Freudenberg, Jasmin, Hahn, Kai, Weber, Christian, Fathi, Madjid

arXiv.org Artificial Intelligence

The increasing popularity of portable ECG systems and the growing demand for privacy-compliant, energy-efficient real-time analysis require new approaches to signal processing at the point of data acquisition. In this context, the edge domain is gaining importance, as it not only reduces latency but also enables a higher level of data security. The FACE project aims to develop an innovative machine learning solution for analysing long-term electrocardiograms that synergistically combines the strengths of edge and cloud computing. In this thesis, various preprocessing steps for ECG signals are analysed with regard to their applicability in the project. The selection of suitable methods in the edge domain is based in particular on criteria such as energy efficiency, processing capability, and real-time capability.
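
For flavour, here is a hedged sketch of one preprocessing step of the kind such a pipeline typically evaluates: a Butterworth band-pass to suppress baseline wander plus a 50 Hz notch for powerline interference, using SciPy. The cutoff values and sampling rate are common defaults, not values taken from the paper.

```python
import numpy as np
from scipy import signal

fs = 360                                   # assumed sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.1 * t)          # toy stand-in for an ECG trace
ecg += 0.4 * np.sin(2 * np.pi * 50 * t)    # powerline interference
ecg += 0.3 * np.sin(2 * np.pi * 0.2 * t)   # baseline wander

# Band-pass 0.5-40 Hz: removes wander and high-frequency noise.
b, a = signal.butter(4, [0.5, 40], btype="bandpass", fs=fs)
clean = signal.filtfilt(b, a, ecg)         # zero-phase filtering

# Notch out residual 50 Hz powerline interference.
bn, an = signal.iirnotch(50, Q=30, fs=fs)
clean = signal.filtfilt(bn, an, clean)
```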


Towards Explaining Monte-Carlo Tree Search by Using Its Enhancements

Kowalski, Jakub, Winands, Mark H. M., Wiśniewski, Maksymilian, Reda, Stanisław, Wilbik, Anna

arXiv.org Artificial Intelligence

Typically, research on Explainable Artificial Intelligence (XAI) focuses on black-box models within the context of a general policy in a known, specific domain. This paper advocates for the need for knowledge-agnostic explainability applied to the subfield of XAI called Explainable Search, which focuses on explaining the choices made by intelligent search techniques. It proposes Monte-Carlo Tree Search (MCTS) enhancements as a means of obtaining additional data and providing higher-quality explanations while remaining knowledge-free, and analyzes the most popular enhancements in terms of the specific types of explainability they introduce. So far, no other research has considered the explainability of MCTS enhancements. We present a proof-of-concept that demonstrates the advantages of utilizing enhancements.
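
To illustrate the kind of knowledge-free data the paper has in mind, here is a toy sketch (ours): UCB1, the bandit rule at the heart of MCTS selection, already accumulates visit counts and mean value estimates, and an explanation can be read straight off those statistics. The bandit setup and payoffs are illustrative assumptions.

```python
import math
import random

# One MCTS tree node reduced to a bandit: UCB1 selection accumulates
# the visit counts and value estimates that an explanation can cite.
actions = ["a", "b", "c"]
true_mean = {"a": 0.3, "b": 0.6, "c": 0.5}   # hidden payoffs (assumed)
N = {a: 0 for a in actions}                  # visit counts
Q = {a: 0.0 for a in actions}                # accumulated rewards

for t in range(1, 2001):
    def ucb(a):
        if N[a] == 0:
            return float("inf")              # try each action once
        return Q[a] / N[a] + math.sqrt(2 * math.log(t) / N[a])
    a = max(actions, key=ucb)
    reward = 1.0 if random.random() < true_mean[a] else 0.0
    N[a] += 1
    Q[a] += reward

# The explanation is read straight off the search statistics.
best = max(actions, key=lambda a: N[a])
print(f"chose {best!r}: {N[best]} visits, "
      f"mean value {Q[best] / N[best]:.2f}")
```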


Generative AI in Collaborative Academic Report Writing: Advantages, Disadvantages, and Ethical Considerations

Sadeghpour, Mahshid, Arakala, Arathi, Rao, Asha

arXiv.org Artificial Intelligence

The availability and abundance of GenAI tools to administer tasks traditionally managed by people have raised concerns, particularly within the education and academic sectors, as some students may rely heavily on these tools to complete the assignments designed to enable learning. This article focuses on informing students about the significance of investing their time during their studies in developing essential life-long learning skills using their own critical thinking, rather than depending on AI models that are susceptible to misinformation, hallucination, and bias. As we transition to an AI-centric era, it is important to educate students on how these models work, their pitfalls, and the ethical concerns associated with feeding data to such tools. Keywords: GenAI in Academic Writing · GenAI Ethics · GenAI Privacy Concerns.

1 Introduction
Writing academic reports and papers has been instrumental in helping students and researchers shape their ideas, organise their methods, and practice their communication skills, particularly when this process is combined with receiving constant feedback from experts. With the launch of OpenAI's first publicly available Large Language Model, namely ChatGPT (GPT-3.5), a significant concern arose within the academic and research community about the reliability of academic and research output. Evidence suggests that as individuals began discovering the availability and efficiency of Generative Artificial Intelligence tools in late 2022, there was a significant surge in retractions, amounting to more than 10,000 retracted papers [1]. The over-reliance of individuals on various Generative Artificial Intelligence (GenAI) tools for completing tasks that require a human's critical thinking has raised concerns.


Reviews: Deep Equilibrium Models

Neural Information Processing Systems

Based on the authors' response, I find the comparison against gradient checkpointing they provide satisfactory. Please ensure it is included in the final draft.

This work considers handling sequences of network layers with identical weights (i.e., weight-tied layers). Instead of directly computing the sequence, a quasi-Newton method is used to approximate the fixed point of the sequence. This has the advantage that the gradient has a simpler form, although one which must also be computed iteratively. The advantages are:
• Much lower memory usage, as intermediate tensors do not need to be stored for use in the backwards pass (approximately 4-10x lower for the considered models).
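
A minimal sketch of the idea under review (ours, with plain iteration standing in for the quasi-Newton solver): the fixed point z* = f(z*, x) of a weight-tied cell is computed by repeatedly overwriting a single activation tensor, which is where the memory saving comes from.

```python
import torch

# Toy deep-equilibrium solve; f is a hypothetical weight-tied cell and
# plain iteration stands in for the paper's quasi-Newton (Broyden) solver.
torch.manual_seed(0)
W = 0.1 * torch.randn(16, 16)   # small norm so the iteration contracts
U = torch.randn(16, 16)

def f(z, x):
    return torch.tanh(z @ W + x @ U)

def fixed_point(x, iters=100):
    z = torch.zeros_like(x)
    for _ in range(iters):
        z = f(z, x)             # z is overwritten: O(1) memory in depth
    return z

x = torch.randn(4, 16)
z_star = fixed_point(x)
print(torch.norm(f(z_star, x) - z_star))  # ~0: z_star is a fixed point
```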


Deep Learning and Machine Learning: Advancing Big Data Analytics and Management with Design Patterns

Chen, Keyu, Bi, Ziqian, Wang, Tianyang, Wen, Yizhu, Feng, Pohsun, Niu, Qian, Liu, Junyu, Peng, Benji, Zhang, Sen, Li, Ming, Pan, Xuanhe, Xu, Jiawei, Wang, Jinlang, Liu, Ming

arXiv.org Artificial Intelligence

This book, Design Patterns in Machine Learning and Deep Learning: Advancing Big Data Analytics Management, presents a comprehensive study of essential design patterns tailored for large-scale machine learning and deep learning applications. The book explores the application of classical software engineering patterns (Creational, Structural, Behavioral, and Concurrency patterns) to optimize the development, maintenance, and scalability of big data analytics systems. Through practical examples and detailed Python implementations, it bridges the gap between traditional object-oriented design patterns and the unique demands of modern data analytics environments. Key design patterns such as Singleton, Factory, Observer, and Strategy are analyzed for their impact on model management, deployment strategies, and team collaboration, providing invaluable insights into the engineering of efficient, reusable, and flexible systems. This volume is an essential resource for developers, researchers, and engineers aiming to enhance their technical expertise in both machine learning and software design.
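
As a taste of the material, here is a tiny Python illustration (our sketch, not an excerpt from the book) of one pattern it covers: a Singleton model registry, so that every component shares a single loaded model rather than re-loading it. The class and method names are hypothetical.

```python
class ModelRegistry:
    """Singleton: all callers share one registry and its cached models."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._models = {}
        return cls._instance

    def get(self, name, loader):
        if name not in self._models:
            self._models[name] = loader()  # load once, reuse everywhere
        return self._models[name]

# Both constructions return the same registry and the same cached model.
r1 = ModelRegistry()
r2 = ModelRegistry()
assert r1 is r2
model = r1.get("sentiment", loader=lambda: object())  # stand-in loader
assert r2.get("sentiment", loader=lambda: object()) is model
```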


Reviews: Neural Expectation Maximization

Neural Information Processing Systems

This paper presents some thought-provoking experiments in unsupervised entity recognition from time-series data. For me, the impact of the paper came in Figs 3 and 5, which showed a very human-like decomposition. I'm not convinced that analyzing a few static shapes is an important problem these days. To me, it seems like a "first step" toward a more significant problem of recognizing concurrent actions (in this case, they have actions like "flying triangle" and "flying 9", with occasional occlusions muddying the picture). For example, RNN-EM running on non-pixel input features (output from a static object detector (YOLO?)) seems like one reasonable comparison point.


The Unfairness of $\varepsilon$-Fairness

Fadina, Tolulope, Schmidt, Thorsten

arXiv.org Machine Learning

Fairness in decision-making processes is often quantified using probabilistic metrics. However, these metrics may not fully capture the real-world consequences of unfairness. In this article, we adopt a utility-based approach to more accurately measure the real-world impacts of decision-making processes. In particular, we show that if the concept of $\varepsilon$-fairness is employed, it can lead to outcomes that are maximally unfair in the real-world context. Additionally, we address the common issue of unavailable data on false negatives by proposing a reduced setting that still captures essential fairness considerations. We illustrate our findings with two real-world examples: college admissions and credit risk assessment. Our analysis reveals that while traditional probability-based evaluations might suggest fairness, a utility-based approach uncovers the actions needed to truly achieve equality. For instance, in the college admission case, we find that enhancing completion rates is crucial for ensuring fairness. In summary, this paper highlights the importance of considering the real-world context when evaluating fairness.
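
The paper's central contrast can be reproduced with a toy computation (our numbers, not the paper's model): two groups pass an $\varepsilon$-fairness check on acceptance rates, yet an assumed gap in completion rates makes the realized utilities very unequal.

```python
# Toy contrast between a probability-based check and a utility-based
# one. All rates below are illustrative assumptions.
eps = 0.05
accept = {"A": 0.50, "B": 0.47}        # acceptance probabilities
complete = {"A": 0.90, "B": 0.30}      # assumed completion rates
utility = {g: accept[g] * complete[g] for g in accept}

prob_fair = abs(accept["A"] - accept["B"]) <= eps
util_gap = abs(utility["A"] - utility["B"])
print(prob_fair)   # True: the epsilon-fairness test is satisfied
print(utility)     # {'A': 0.45, 'B': 0.141}
print(util_gap)    # ~0.31: large realized inequality
```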