cohere
One Word Is Not Enough: Simple Prompts Improve Word Embeddings
Text embedding models are designed for sentence-level applications like retrieval and semantic similarity, and are primarily evaluated on sentence-level benchmarks. Their behavior on isolated words is less understood. We show that simply prepending semantic prompts to words before embedding substantially improves word similarity correlations. Testing 7 text embedding models, including text-embedding-3-large (OpenAI), embed-english-v3.0 (Cohere), voyage-3 (Voyage AI), all-mpnet-base-v2, and Qwen3-Embedding-8B, on 3 standard benchmarks (SimLex-999, WordSim-353, MEN-3000), we find that prompts like "meaning: {word}" or "Represent the semantic concept: {word}" improve Spearman correlations by up to +0.29 on SimLex-999. Some models fail completely on bare words (correlation = 0) but recover with prompts (+0.73 improvement). Our best results achieve correlation = 0.692 on SimLex-999 with embed-english-v3.0 (Cohere), correlation = 0.811 on WordSim-353, and correlation = 0.855 on MEN-3000 with text-embedding-3-large (OpenAI). These results outperform classic static embeddings like Word2Vec (correlation = 0.40) and even the best static method LexVec (correlation = 0.48) on SimLex-999, establishing a new state-of-the-art for pure embedding methods. This zero-shot technique requires no training and works with any text embedding model.
- Asia > China > Hong Kong (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.47)
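As a rough illustration of the prompt-prepending evaluation described in the word-embedding abstract above, the sketch below wraps each word in a template, embeds both words of a pair, and correlates cosine similarities with human scores. The `embed` argument is a hypothetical stand-in for any text embedding model (it is not a real API call), and the helper names are assumptions made for this example; only the template string comes from the abstract.

```python
import math

def apply_prompt(word, template="meaning: {word}"):
    """Wrap a bare word in a semantic prompt before embedding."""
    return template.format(word=word)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0

def evaluate(pairs, embed, template=None):
    """pairs: list of (word1, word2, human_score).
    embed: hypothetical callable mapping a string to a vector.
    template: prompt with a {word} slot, or None to embed bare words."""
    wrap = (lambda w: apply_prompt(w, template)) if template else (lambda w: w)
    predicted = [cosine(embed(wrap(a)), embed(wrap(b))) for a, b, _ in pairs]
    gold = [score for _, _, score in pairs]
    return spearman(predicted, gold)
```

Calling `evaluate(pairs, embed)` and `evaluate(pairs, embed, "meaning: {word}")` on the same pairs would give the bare-word and prompted correlations to compare.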
Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate
Ngueajio, Mikel K., Plaza-del-Arco, Flor Miriam, Chung, Yi-Ling, Rawat, Danda B., Curry, Amanda Cercas
Automated counter-narratives (CN) offer a promising strategy for mitigating online hate speech, yet concerns about their affective tone, accessibility, and ethical risks remain. We propose a framework for evaluating Large Language Model (LLM)-generated CNs across four dimensions: persona framing, verbosity and readability, affective tone, and ethical robustness. Using GPT-4o-Mini, Cohere's CommandR-7B, and Meta's LLaMA 3.1-70B, we assess three prompting strategies on the MT-Conan and HatEval datasets. Our findings reveal that LLM-generated CNs are often verbose and adapted for people with college-level literacy, limiting their accessibility. While emotionally guided prompts yield more empathetic and readable responses, there remain concerns surrounding safety and effectiveness.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
Major publishers sue AI startup Cohere over copyright infringement
Members of the News Media Alliance are suing the AI company Cohere, accusing it of stealing their journalism without permission to train its generative AI model. This is another salvo in the ongoing war between the people that make stuff and the AI algorithms that mimic the stuff that people make. The startup has also been accused of passing off large segments of entire articles to its users without proper attribution. "Rather than create their own content, they're stealing ours to compete with us without our permission, without compensation, and undermining our very business that feeds their machines in the first place," said Danielle Coffey, CEO of the News Media Alliance, which organized the lawsuit on behalf of its members. The suit also says the company has engaged in trademark infringement, suggesting that the algorithm would send articles to users with proper attribution, using the publisher's name, but the article itself would be filled with hallucinated and incorrect information. One example given in the suit involves a piece that The Guardian published about Hamas's attack on the Nova music festival in Israel, in which the AI conflated the terror attack with a 2020 shooting in Nova Scotia, Canada.
- North America > Canada > Nova Scotia (0.26)
- Asia > Middle East > Israel (0.26)
- North America > United States > New York (0.08)
- Media (1.00)
- Law > Intellectual Property & Technology Law (1.00)
StAyaL | Multilingual Style Transfer
Thakrar, Karishma, Lawrence, Katrina, Howard, Kyle
Stylistic text generation plays a vital role in enhancing communication by reflecting the nuances of individual expression. This paper presents a novel approach for generating text in a specific speaker's style across different languages. We show that by leveraging only 100 lines of text, an individual's unique style can be captured as a high-dimensional embedding, which can be used for both text generation and stylistic translation. This methodology breaks down the language barrier by transferring the style of a speaker between languages. The paper is structured into three main phases: augmenting the speaker's data with stylistically consistent external sources, separating style from content using machine learning and deep learning techniques, and generating an abstract style profile by mean pooling the learned embeddings. The proposed approach is shown to be topic-agnostic, with test accuracy and F1 scores of 74.9% and 0.75, respectively. The results demonstrate the potential of the style profile for multilingual communication, paving the way for further applications in personalized content generation and cross-linguistic stylistic transfer.
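The mean-pooling step mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration that assumes each line of the speaker's text has already been mapped to a fixed-length style embedding; the function name `mean_pool` is chosen for this example and is not taken from the paper.

```python
def mean_pool(embeddings):
    """Average a list of equal-length vectors into a single profile vector."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    n = len(embeddings)
    # Component-wise mean across all input vectors.
    return [sum(e[i] for e in embeddings) / n for i in range(dim)]
```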
ChemTEB: Chemical Text Embedding Benchmark, an Overview of Embedding Models Performance & Efficiency on a Specific Domain
Kasmaee, Ali Shiraee, Khodadad, Mohammad, Saloot, Mohammad Arshi, Sherck, Nick, Dokas, Stephen, Mahyar, Hamidreza, Samiee, Soheila
Recent advancements in language models have started a new era of superior information retrieval and content generation, with embedding models playing an important role in optimizing data representation efficiency and performance. While benchmarks like the Massive Text Embedding Benchmark (MTEB) have standardized the evaluation of general domain embedding models, a gap remains in specialized fields such as chemistry, which require tailored approaches due to domain-specific challenges. This paper introduces a novel benchmark, the Chemical Text Embedding Benchmark (ChemTEB), designed specifically for the chemical sciences. ChemTEB addresses the unique linguistic and semantic complexities of chemical literature and data, offering a comprehensive suite of tasks on chemical domain data. Through the evaluation of 34 open-source and proprietary models using this benchmark, we illuminate the strengths and weaknesses of current methodologies in processing and understanding chemical information. Our work aims to equip the research community with a standardized, domain-specific evaluation framework, promoting the development of more precise and efficient NLP models for chemistry-related applications. Furthermore, it provides insights into the performance of generic models in a domain-specific context. ChemTEB comes with open-source code and data, contributing further to its accessibility and utility.
- North America > United States (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Canada > Ontario > Hamilton (0.04)
- Materials > Chemicals (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
Evaluating the Usability of LLMs in Threat Intelligence Enrichment
Srikanth, Sanchana, Hasanuzzaman, Mohammad, Meem, Farah Tasnur
Large Language Models (LLMs) have the potential to significantly enhance threat intelligence by automating the collection, preprocessing, and analysis of threat data. However, the usability of these tools is critical to ensure their effective adoption by security professionals. Despite the advanced capabilities of LLMs, concerns about their reliability, accuracy, and potential for generating inaccurate information persist. This study conducts a comprehensive usability evaluation of five LLMs (ChatGPT, Gemini, Cohere, Copilot, and Meta AI), focusing on their user interface design, error handling, learning curve, performance, and integration with existing tools in threat intelligence enrichment. Utilizing a heuristic walkthrough and a user study methodology, we identify key usability issues and offer actionable recommendations for improvement. Our findings aim to bridge the gap between LLM functionality and user experience, thereby promoting more efficient and accurate threat intelligence practices by ensuring these tools are user-friendly and reliable.
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
- Government > Military (0.69)
- Health & Medicine (0.67)
Bias in Text Embedding Models
Rakivnenko, Vasyl, Maslej, Nestor, Cervi, Jessica, Zhukov, Volodymyr
Text embedding is becoming an increasingly popular AI methodology, especially among businesses, yet the potential of text embedding models to be biased is not well understood. This paper examines the degree to which a selection of popular text embedding models are biased, particularly along gendered dimensions. More specifically, this paper studies the degree to which these models associate a list of given professions with gendered terms. The analysis reveals that text embedding models are prone to gendered biases but in varying ways. Although there are certain inter-model commonalities, for instance, greater association of professions like nurse, homemaker, and socialite with female identifiers, and greater association of professions like CEO, manager, and boss with male identifiers, not all models make the same gendered associations for each occupation. Furthermore, the magnitude and directionality of bias can also vary on a model-by-model basis and depend on the particular words models are prompted with. This paper demonstrates that gender bias afflicts text embedding models and suggests that businesses using this technology need to be mindful of the specific dimensions of this problem.
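One simple way to quantify the kind of profession-to-gender association the abstract above describes is a difference of mean cosine similarities. The abstract does not state the paper's actual metric, so the score below is an assumption made for illustration, and the embedding vectors are taken as already computed by some model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def gender_association(profession_vec, female_vecs, male_vecs):
    """Mean similarity to female-term vectors minus mean similarity to
    male-term vectors. Positive values lean female, negative lean male."""
    f = sum(cosine(profession_vec, v) for v in female_vecs) / len(female_vecs)
    m = sum(cosine(profession_vec, v) for v in male_vecs) / len(male_vecs)
    return f - m
```

Computing this score for the same profession under several models, or with several different gendered term lists, would show the model-by-model and prompt-dependent variation the abstract reports.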
Characterizing Multimodal Long-form Summarization: A Case Study on Financial Reports
Cao, Tianyu, Raman, Natraj, Dervovic, Danial, Tan, Chenhao
As large language models (LLMs) expand the power of natural language processing to handle long inputs, rigorous and systematic analyses are necessary to understand their abilities and behavior. A salient application is summarization, due to its ubiquity and controversy (e.g., researchers have declared the death of summarization). In this paper, we use financial report summarization as a case study because financial reports not only are long but also use numbers and tables extensively. We propose a computational framework for characterizing multimodal long-form summarization and investigate the behavior of Claude 2.0/2.1, GPT-4/3.5, and Command. We find that GPT-3.5 and Command fail to perform this summarization task meaningfully. For Claude 2 and GPT-4, we analyze the extractiveness of the summary and identify a position bias in LLMs. This position bias disappears after shuffling the input for Claude, which suggests that Claude has the ability to recognize important information. We also conduct a comprehensive investigation on the use of numeric data in LLM-generated summaries and offer a taxonomy of numeric hallucination. We employ prompt engineering to improve GPT-4's use of numbers with limited success. Overall, our analyses highlight the strong capability of Claude 2 in handling long multimodal inputs compared to GPT-4.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Research Report (1.00)
- Financial News (0.94)
Evaluating Embedding APIs for Information Retrieval
Kamalloo, Ehsan, Zhang, Xinyu, Ogundepo, Odunayo, Thakur, Nandan, Alfonso-Hermelo, David, Rezagholizadeh, Mehdi, Lin, Jimmy
The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs. One particular type, suitable for dense retrieval, is a semantic embedding service that builds vector representations of input text. With a growing number of publicly available APIs, our goal in this paper is to analyze existing offerings in realistic retrieval scenarios, to assist practitioners and researchers in finding suitable services according to their needs. Specifically, we investigate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. For this purpose, we evaluate these services on two standard benchmarks, BEIR and MIRACL. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English, in contrast to the standard practice of employing them as first-stage retrievers. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost. We hope our work lays the groundwork for evaluating semantic embedding APIs that are critical in search and more broadly, for information access.
- North America > United States > Oregon (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.83)
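The two retrieval setups contrasted in the embedding-API abstract above, re-ranking BM25 candidates and combining BM25 with embeddings, can be sketched as follows. The candidate tuples are assumed to come from an existing BM25 search with document vectors already fetched from an embedding service. The linear interpolation in `hybrid` and its `alpha` weight are assumptions for this sketch, since the abstract does not say how the hybrid model fuses scores.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(query_vec, candidates, top_k=10):
    """candidates: list of (doc_id, bm25_score, doc_vec) from a BM25 first stage.
    Reorders them by embedding similarity alone, ignoring the BM25 score."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, _, vec in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

def hybrid(query_vec, candidates, alpha=0.5):
    """Blend min-max normalized BM25 scores with embedding similarity.
    alpha=1.0 uses BM25 only; alpha=0.0 uses embeddings only."""
    if not candidates:
        return []
    bm25_scores = [score for _, score, _ in candidates]
    low, high = min(bm25_scores), max(bm25_scores)
    span = high - low
    fused = []
    for doc_id, score, vec in candidates:
        normalized = (score - low) / span if span else 0.0
        fused.append((doc_id, alpha * normalized + (1 - alpha) * cosine(query_vec, vec)))
    fused.sort(key=lambda item: item[1], reverse=True)
    return fused
```

Re-ranking only needs embeddings for the short BM25 candidate list rather than the whole corpus, which is consistent with the abstract calling it the budget-friendly option.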
The economic trade-offs of large language models: A case study
Howell, Kristen, Christian, Gwen, Fomitchov, Pavel, Kehat, Gitit, Marzulla, Julianne, Rolston, Leanne, Tredup, Jadin, Zimmerman, Ilana, Selfridge, Ethan, Bradley, Joseph
Contacting customer service via chat is a common practice. Because employing customer service agents is expensive, many companies are turning to NLP that assists human agents by auto-generating responses that can be used directly or with modifications. Large Language Models (LLMs) are a natural fit for this use case; however, their efficacy must be balanced with the cost of training and serving them. This paper assesses the practical cost and impact of LLMs for the enterprise as a function of the usefulness of the responses that they generate. We present a cost framework for evaluating an NLP model's utility for this use case and apply it to a single brand as a case study in the context of an existing agent assistance product. We compare three strategies for specializing an LLM (prompt engineering, fine-tuning, and knowledge distillation) using feedback from the brand's customer service agents. We find that the usability of a model's responses can make up for a large difference in inference cost for our case study brand, and we extrapolate our findings to the broader enterprise space.
- Europe > Austria > Vienna (0.14)
- North America > United States > Washington > King County > Seattle (0.04)