customer review
From Reviews to Actionable Insights: An LLM-Based Approach for Attribute and Feature Extraction
Boughanmi, Khaled, Jedidi, Kamel, Jedidi, Nour
This research proposes a systematic, large language model (LLM) approach for extracting product and service attributes, features, and associated sentiments from customer reviews. Grounded in marketing theory, the framework distinguishes perceptual attributes from actionable features, producing interpretable and managerially actionable insights. We apply the methodology to 20,000 Yelp reviews of Starbucks stores and evaluate eight prompt variants on a random subset of reviews. Model performance is assessed through agreement with human annotations and predictive validity for customer ratings. Results show high consistency between LLMs and human coders and strong predictive validity, confirming the reliability of the approach. Human coders required a median of six minutes per review, whereas the LLM processed each in two seconds, delivering comparable insights at a scale unattainable through manual coding. Managerially, the analysis identifies attributes and features that most strongly influence customer satisfaction and their associated sentiments, enabling firms to pinpoint "joy points," address "pain points," and design targeted interventions. We demonstrate how structured review data can power an actionable marketing dashboard that tracks sentiment over time and across stores, benchmarks performance, and highlights high-leverage features for improvement. Simulations indicate that enhancing sentiment for key service features could yield 1-2% average revenue gains per store.
- North America > Canada > Alberta (0.14)
- North America > United States > Pennsylvania (0.04)
- North America > United States > New Jersey (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.94)
Does Using Counterfactual Help LLMs Explain Textual Importance in Classification?
Tan, Nelvin, Cheung, James Asikin, Shih, Yu-Ching, Yang, Dong, Salunkhe, Amol
Large language models (LLMs) are becoming useful in many domains due to their impressive abilities that arise from large training datasets and large model sizes. More recently, they have been shown to be very effective in textual classification tasks, motivating the need to explain the LLMs' decisions. Motivated by practical constrains where LLMs are black-boxed and LLM calls are expensive, we study how incorporating counterfactuals into LLM reasoning can affect the LLM's ability to identify the top words that have contributed to its classification decision. To this end, we introduce a framework called the decision changing rate that helps us quantify the importance of the top words in classification. Our experimental results show that using counterfactuals can be helpful.
- Media > Film (1.00)
- Leisure & Entertainment (0.83)
End-to-End Aspect-Guided Review Summarization at Scale
Boytsov, Ilya, DeGenova, Vinny, Balyasin, Mikhail, Walt, Joseph, Eusden, Caitlin, Rochat, Marie-Claire, Pierson, Margaret
We present a scalable large language model (LLM)-based system that combines aspect-based sentiment analysis (ABSA) with guided summarization to generate concise and interpretable product review summaries for the Wayfair platform. Our approach first extracts and consolidates aspect-sentiment pairs from individual reviews, selects the most frequent aspects for each product, and samples representative reviews accordingly. These are used to construct structured prompts that guide the LLM to produce summaries grounded in actual customer feedback. We demonstrate the real-world effectiveness of our system through a large-scale online A/B test. Furthermore, we describe our real-time deployment strategy and release a dataset of 11.8 million anonymized customer reviews covering 92,000 products, including extracted aspects and generated summaries, to support future research in aspect-guided review summarization.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (6 more...)
PRAISE: Enhancing Product Descriptions with LLM-Driven Structured Insights
Qidwai, Adnan, Mukhopadhyay, Srija, Khatiwada, Prerana, Roth, Dan, Gupta, Vivek
Accurate and complete product descriptions are crucial for e-commerce, yet seller-provided information often falls short. Customer reviews offer valuable details but are laborious to sift through manually. We present PRAISE: Product Review Attribute Insight Structuring Engine, a novel system that uses Large Language Models (LLMs) to automatically extract, compare, and structure insights from customer reviews and seller descriptions. PRAISE provides users with an intuitive interface to identify missing, contradictory, or partially matching details between these two sources, presenting the discrepancies in a clear, structured format alongside supporting evidence from reviews. This allows sellers to easily enhance their product listings for clarity and persuasiveness, and buyers to better assess product reliability. Our demonstration showcases PRAISE's workflow, its effectiveness in generating actionable structured insights from unstructured reviews, and its potential to significantly improve the quality and trustworthiness of e-commerce product catalogs.
- North America > United States > Pennsylvania (0.04)
- North America > United States > Arizona (0.04)
- North America > Canada > British Columbia (0.04)
- Asia > Singapore (0.04)
- Workflow (1.00)
- Research Report (0.82)
Enhancing Multilingual Sentiment Analysis with Explainability for Sinhala, English, and Code-Mixed Content
Rizvi, Azmarah, Thamindu, Navojith, Adhikari, A. M. N. H., Senevirathna, W. P. U., Kasthurirathna, Dharshana, Abeywardhana, Lakmini
Sentiment analysis is crucial for brand reputation management in the banking sector, where customer feedback spans English, Sinhala, Singlish, and code-mixed text. Existing models struggle with low-resource languages like Sinhala and lack interpretability for practical use. This research develops a hybrid aspect-based sentiment analysis framework that enhances multilingual capabilities with explainable outputs. Using cleaned banking customer reviews, we fine-tune XLM-RoBERTa for Sinhala and code-mixed text, integrate domain-specific lexicon correction, and employ BERT-base-uncased for English. The system classifies sentiment (positive, neutral, negative) with confidence scores, while SHAP and LIME improve interpretability by providing real-time sentiment explanations. Experimental results show that our approaches outperform traditional transformer-based classifiers, achieving 92.3 percent accuracy and an F1-score of 0.89 in English and 88.4 percent in Sinhala and code-mixed content. An explainability analysis reveals key sentiment drivers, improving trust and transparency. A user-friendly interface delivers aspect-wise sentiment insights, ensuring accessibility for businesses. This research contributes to robust, transparent sentiment analysis for financial applications by bridging gaps in multilingual, low-resource NLP and explainability.
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Yelp adds AI-powered scores to business pages
The crowd-sourced review site Yelp unveiled a new feature that uses AI and customer reviews to rate common facets of nightlife and food-related business. The new Review Insights feature is available now on the iOS version of the Yelp app, according to the company's official blog. Review Insights aggregates customer reviews and feeds them into a large language model (LLM), which will assign specific aspects -- like the vibe or service time -- a rating out of 100. Supposedly it will be able to infer customer sentiment about these parts of a business "even when a review doesn't explicitly mention one of the topics." Yelp will also be adding an AI-powered homepage in the coming weeks.
Leveraging AI and NLP for Bank Marketing: A Systematic Review and Gap Analysis
Gerling, Christopher, Lessmann, Stefan
This paper explores the growing impact of AI and NLP in bank marketing, highlighting their evolving roles in enhancing marketing strategies, improving customer engagement, and creating value within this sector. While AI and NLP have been widely studied in general marketing, there is a notable gap in understanding their specific applications and potential within the banking sector. This research addresses this specific gap by providing a systematic review and strategic analysis of AI and NLP applications in bank marketing, focusing on their integration across the customer journey and operational excellence. Employing the PRISMA methodology, this study systematically reviews existing literature to assess the current landscape of AI and NLP in bank marketing. Additionally, it incorporates semantic mapping using Sentence Transformers and UMAP for strategic gap analysis to identify underexplored areas and opportunities for future research. The systematic review reveals limited research specifically focused on NLP applications in bank marketing. The strategic gap analysis identifies key areas where NLP can further enhance marketing strategies, including customer-centric applications like acquisition, retention, and personalized engagement, offering valuable insights for both academic research and practical implementation. This research contributes to the field of bank marketing by mapping the current state of AI and NLP applications and identifying strategic gaps. The findings provide actionable insights for developing NLP-driven growth and innovation frameworks and highlight the role of NLP in improving operational efficiency and regulatory compliance. This work has broader implications for enhancing customer experience, profitability, and innovation in the banking industry.
- Research Report > New Finding (1.00)
- Overview (0.92)
- Marketing (1.00)
- Consumer Products & Services (1.00)
- Banking & Finance > Financial Services (1.00)
- (2 more...)
Learning to Predict Usage Options of Product Reviews with LLM-Generated Labels
Kohlenberg, Leo, Horns, Leonard, Sadrieh, Frederic, Kiele, Nils, Clausen, Matthis, Ketterer, Konstantin, Navasardyan, Avetis, Czinczoll, Tamara, de Melo, Gerard, Herbrich, Ralf
Annotating large datasets can be challenging. However, crowd-sourcing is often expensive and can lack quality, especially for non-trivial tasks. We propose a method of using LLMs as few-shot learners for annotating data in a complex natural language task where we learn a standalone model to predict usage options for products from customer reviews. We also propose a new evaluation metric for this scenario, HAMS4, that can be used to compare a set of strings with multiple reference sets. Learning a custom model offers individual control over energy efficiency and privacy measures compared to using the LLM directly for the sequence-to-sequence task. We compare this data annotation approach with other traditional methods and demonstrate how LLMs can enable considerable cost savings. We find that the quality of the resulting data exceeds the level attained by third-party vendor services and that GPT-4-generated labels even reach the level of domain experts. We make the code and generated labels publicly available.
- Europe > Germany > Brandenburg > Potsdam (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (4 more...)
LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research
Yang, Yi, Duan, Hanyu, Liu, Jiaxin, Tam, Kar Yan
The increasing use of text as data in social science research necessitates the development of valid, consistent, reproducible, and efficient methods for generating text-based concept measures. This paper presents a novel method that leverages the internal hidden states of large language models (LLMs) to generate these concept measures. Specifically, the proposed method learns a concept vector that captures how the LLM internally represents the target concept, then estimates the concept value for text data by projecting the text's LLM hidden states onto the concept vector. Three replication studies demonstrate the method's effectiveness in producing highly valid, consistent, and reproducible text-based measures across various social science research contexts, highlighting its potential as a valuable tool for the research community.
- North America > United States > New York (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (2 more...)
- Banking & Finance > Economy (1.00)
- Government > Regional Government > North America Government > United States Government (0.96)
Automatic Generation of Behavioral Test Cases For Natural Language Processing Using Clustering and Prompting
Li, Ying, Singh, Rahul, Joshi, Tarun, Sudjianto, Agus
Recent work in behavioral testing for natural language processing (NLP) models, such as Checklist, is inspired by related paradigms in software engineering testing. They allow evaluation of general linguistic capabilities and domain understanding, hence can help evaluate conceptual soundness and identify model weaknesses. However, a major challenge is the creation of test cases. The current packages rely on semi-automated approach using manual development which requires domain expertise and can be time consuming. This paper introduces an automated approach to develop test cases by exploiting the power of large language models and statistical techniques. It clusters the text representations to carefully construct meaningful groups and then apply prompting techniques to automatically generate Minimal Functionality Tests (MFT). The well-known Amazon Reviews corpus is used to demonstrate our approach. We analyze the behavioral test profiles across four different classification algorithms and discuss the limitations and strengths of those models.
- Leisure & Entertainment (1.00)
- Media (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)