The Zamba2 Suite: Technical Report
Glorioso, Paolo, Anthony, Quentin, Tokpanov, Yury, Golubeva, Anna, Shyam, Vasudev, Whittington, James, Pilault, Jonathan, Millidge, Beren
In this technical report, we present the Zamba2 series -- a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state-of-the-art performance against the leading open-weights models of their class while offering substantial gains in inference latency, throughput, and memory efficiency. The Zamba2 series builds upon our initial work with Zamba1-7B, optimizing its architecture, its training and annealing datasets, and training for up to three trillion tokens. We provide open-source weights for all models of the Zamba2 series, as well as instruction-tuned variants that are strongly competitive against comparable instruct-tuned models of their class. We additionally open-source the pretraining dataset, which we call Zyda-2, used to train the Zamba2 series of models. The models and datasets used in this work are openly available at https://huggingface.co/Zyphra
Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue
Ivey, Jonathan, Kumar, Shivani, Liu, Jiayu, Shen, Hua, Rakshit, Sushrita, Raju, Rohan, Zhang, Haotian, Ananthasubramaniam, Aparna, Kim, Junghwan, Yi, Bowen, Wright, Dustin, Israeli, Abraham, Møller, Anders Giovanni, Zhang, Lechen, Jurgens, David
Studying and building datasets for dialogue tasks is both expensive and time-consuming due to the need to recruit, train, and collect data from study participants. In response, much recent work has sought to use large language models (LLMs) to simulate both human-human and human-LLM interactions, as they have been shown to generate convincingly human-like text in many settings. However, to what extent do LLM-based simulations actually reflect human dialogues? In this work, we answer this question by generating a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset and quantifying how well the LLM simulations align with their human counterparts. Overall, we find relatively low alignment between simulations and human interactions, demonstrating a systematic divergence along multiple textual properties, including style and content. Further, in comparisons of English, Chinese, and Russian dialogues, we find that models perform similarly across all three. Our results suggest that LLMs generally perform better when the human interlocutor writes in a way that is more similar to the LLM's own style.
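One simple way to quantify how well simulated turns align with human ones is to compare per-turn lexical feature vectors. The sketch below is a minimal illustration using bag-of-words cosine similarity, not the paper's actual alignment metrics; the example dialogue turns are invented:

```python
from collections import Counter
import math


def style_vector(text):
    """Crude stylistic fingerprint: bag-of-words counts (lowercased)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def alignment(human_turns, simulated_turns):
    """Mean per-turn similarity between paired human and simulated turns."""
    sims = [cosine(style_vector(h), style_vector(s))
            for h, s in zip(human_turns, simulated_turns)]
    return sum(sims) / len(sims)


# Toy paired dialogue: human turns and their LLM-simulated counterparts.
human = ["hey can you help me fix this bug?", "thanks, that worked!"]
simulated = ["Hello, could you assist me with this bug?",
             "Thank you, that resolved it."]
score = alignment(human, simulated)  # a low score indicates divergence
```

A real study would use much richer features (style, content, register) than raw word counts, but the pairing-then-averaging structure is the same.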
Zamba: A Compact 7B SSM Hybrid Model
Glorioso, Paolo, Anthony, Quentin, Tokpanov, Yury, Whittington, James, Pilault, Jonathan, Ibrahim, Adam, Millidge, Beren
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable scale. Zamba is trained on 1T tokens from openly available datasets and is the best non-transformer model at this scale. Zamba pioneers a unique architecture combining a Mamba backbone with a single shared attention module, thus obtaining the benefits of attention at minimal parameter cost. Due to its architecture, Zamba is significantly faster at inference than comparable transformer models and requires substantially less memory for generation of long sequences. Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay. We open-source the weights and all checkpoints for Zamba, across both phase 1 and the annealing phase.
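The shared-attention idea can be illustrated with a structural sketch. The code below is a hypothetical, non-numerical mock-up in plain Python (no real attention or SSM math), showing only how a single attention module can be reused at multiple depths of a Mamba-style stack so that attention adds roughly one module's worth of parameters:

```python
class SharedAttention:
    """One attention block whose parameters are reused at every call site
    (placeholder: a real implementation would compute self-attention)."""
    def __init__(self):
        self.calls = 0  # track how often the shared weights are applied

    def __call__(self, x):
        self.calls += 1
        return x


class MambaBlock:
    """Placeholder for a Mamba (SSM) block."""
    def __call__(self, x):
        return x


class HybridModel:
    """A Mamba backbone interleaved with a single shared attention module."""
    def __init__(self, n_layers, attn_every):
        self.layers = [MambaBlock() for _ in range(n_layers)]
        self.shared_attn = SharedAttention()  # one set of attention weights
        self.attn_every = attn_every

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i % self.attn_every == 0:
                x = self.shared_attn(x)  # same module reused at each site
            x = layer(x)
        return x


model = HybridModel(n_layers=6, attn_every=2)
out = model.forward("token embeddings")  # placeholder input
```

With six layers and attention applied every second layer, the single shared module is invoked three times during a forward pass while contributing only one module's worth of parameters.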
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Lee, Jinhyuk, Dai, Zhuyun, Ren, Xiaoqi, Chen, Blair, Cer, Daniel, Cole, Jeremy R., Hui, Kai, Boratko, Michael, Kapadia, Rajvi, Ding, Wen, Luan, Yi, Duddu, Sai Meher Karthik, Abrego, Gustavo Hernandez, Shi, Weiqiang, Gupta, Nithi, Kusupati, Aditya, Jain, Prateek, Jonnalagadda, Siddhartha Reddy, Chang, Ming-Wei, Naim, Iftekhar
Text embedding models represent natural language as dense vectors, positioning semantically similar text near each other within the embedding space (Gao et al., 2021; Le and Mikolov, 2014; Reimers and Gurevych, 2019). These embeddings are commonly used for a wide range of downstream tasks including document retrieval, sentence similarity, classification, and clustering (Muennighoff et al., 2023). Instead of building separate embedding models for each downstream task, recent efforts seek to create a single embedding model supporting many tasks. Developing such general-purpose text embedding models presents a challenge: they require large amounts of training data to comprehensively cover desired domains and skills. Recent embedding efforts have therefore focused on using extensive collections of training examples (Li et al., 2023; Wang et al., 2022).
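The nearest-neighbor behavior described above can be shown with a toy retrieval example. The vectors below are hypothetical hand-written embeddings (a real model such as Gecko would produce much higher-dimensional ones); documents are ranked by cosine similarity to the query embedding:

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))


# Hypothetical pre-computed 3-d embeddings; the values are invented
# purely for illustration.
docs = {
    "doc_sports": [0.9, 0.1, 0.0],
    "doc_cooking": [0.1, 0.9, 0.1],
    "doc_finance": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # embedding of, say, a sports-related query

# Rank documents by similarity to the query in the embedding space.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

Because semantically related texts land near each other, the sports document ranks first for the sports-like query; the same ranking mechanism underlies retrieval, clustering, and similarity tasks.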
MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection
Piot, Paloma, Martín-Rodilla, Patricia, Parapar, Javier
Hate speech represents a pervasive and detrimental form of online discourse, often manifested through an array of slurs, from hateful tweets to defamatory posts. As such speech proliferates, it connects people globally and poses significant social, psychological, and occasionally physical threats to targeted individuals and communities. Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training. To unify these efforts, our study addresses the critical need for a comprehensive meta-collection, advocating for an extensive dataset to help counteract this problem effectively. We scrutinized over 60 datasets, selectively integrating the pertinent ones into MetaHate. This paper offers a detailed examination of existing collections, highlighting their strengths and limitations. Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models. These enhanced models are essential for effectively combating the dynamic and complex nature of hate speech in the digital realm.
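Building a meta-collection of this kind ultimately comes down to mapping each source dataset's label scheme onto one shared scheme. The sketch below is a hypothetical illustration of that normalization step, not MetaHate's actual schema; the example texts are neutral placeholders:

```python
def normalize(example, label_map):
    """Map a dataset-specific label onto a shared binary scheme
    (1 = hate speech, 0 = not hate speech)."""
    return {"text": example["text"], "label": label_map[example["label"]]}


# Two toy source datasets with incompatible label schemes (placeholder text).
dataset_a = [{"text": "example post A", "label": "hateful"},
             {"text": "example post B", "label": "none"}]
dataset_b = [{"text": "example post C", "label": 1}]

# Integrate both into one meta-collection under the shared scheme.
meta = ([normalize(ex, {"hateful": 1, "none": 0}) for ex in dataset_a]
        + [normalize(ex, {0: 0, 1: 1}) for ex in dataset_b])
```

Each source dataset contributes its own `label_map`, so adding a new collection only requires specifying how its labels project onto the unified scheme.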
Contributed: The power of AI in surgery
Artificial intelligence (AI), defined as algorithms that enable machines to perform cognitive functions (such as problem solving and decision-making), has been changing the face of healthcare for some time now through Machine Learning (ML) and Natural Language Processing (NLP). Its adoption in surgery, however, took longer than in other medical specialties, mainly because of limited information regarding the possibilities of computational implementation in practical surgery. Thanks to rapid recent developments, AI is currently perceived as a supplement to, not a replacement for, the skill of a human surgeon. And although the potential of the surgeon-patient-computer relationship is a long way from being fully explored, the use of AI in surgery is already driving significant changes for doctors and patients alike. For example, surgical planning and navigation have improved consistently through computed tomography (CT), ultrasound, and magnetic resonance imaging (MRI), while minimally invasive surgery (MIS), combined with robotic assistance, has resulted in decreased surgical trauma and improved patient recovery. Preoperative planning is the stage in which surgeons plan the surgical intervention based on the patient's medical records and imaging.
Channeling AI into Government Citizen Engagement (Contributed)
In recent years, the proliferation of digital technologies has created multiple customer service channels and touchpoints through which citizens can access online government services. Unfortunately, user experience is often overlooked in the design and deployment of these new digital services. Citizens' expectations of service are shaped not only by their interactions with government agencies, but also by their everyday digital experiences. For example, a recent Accenture survey of over 5,000 citizens from five countries found that as they encounter more user-friendly AI solutions in their daily lives, expectations for government use of these technologies increase. In this changing environment, the need for a convenient and seamless customer experience across all engagement channels has never been more pressing.