meteor
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
The rapid development of large language and vision models (LLVMs) has been driven by advances in visual instruction tuning. Recently, open-source LLVMs have curated high-quality visual instruction tuning datasets and utilized additional vision encoders or multiple computer vision models to narrow the performance gap with powerful closed-source LLVMs. These advancements are attributed to the multifaceted information required for diverse capabilities, including fundamental image understanding, real-world knowledge of common-sense and non-object concepts (e.g., charts, diagrams, symbols, signs, and math problems), and step-by-step procedures for solving complex questions. Drawing on this multifaceted information, we present a new efficient LLVM, Mamba-based traversal of rationales (Meteor), which leverages multifaceted rationale to enhance understanding and answering capabilities. To embed lengthy rationales containing abundant information, we employ the Mamba architecture, which can process sequential data with linear time complexity. We introduce the new concept of traversal of rationale, which enables efficient embedding of rationale. The backbone multimodal language model (MLM) is then trained to generate answers with the aid of rationale. Through these steps, Meteor achieves significant improvements in vision-language performance across multiple evaluation benchmarks requiring diverse capabilities, without scaling up the model size or employing additional vision encoders and computer vision models.
The Best Meteor Shower of the Year Is Coming--Here's How to Watch
The highlight of the year, the Geminids are the most active and colorful meteor shower, offering the chance to see hundreds of shooting stars every hour when they peak in mid-December. If you want to get into stargazing in 2025, there's still a chance to catch some of the best meteor showers of the year. Also known as shooting stars, meteors happen when Earth's orbital path crosses a trail of debris left by a comet and that material burns up in Earth's atmosphere. Watching a meteor shower is one of the most accessible ways to engage with the night sky.
- Asia > Nepal (0.14)
- North America > United States > California (0.04)
- Europe > Slovakia (0.04)
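The linear-time property the Meteor abstract leans on can be sketched with a toy scalar state-space recurrence (illustrative only, not the actual Mamba kernel or Meteor's implementation): the hidden state is updated once per token, so a sequence of length T costs O(T) updates.

```python
def ssm_scan(xs, a=0.9, b=0.1, c=1.0):
    """Scan a 1-D input sequence through a scalar state-space model.

    h_t = a * h_{t-1} + b * x_t    (state update)
    y_t = c * h_t                  (readout)

    Toy sketch: real Mamba uses input-dependent, vector-valued
    parameters, but the single left-to-right pass is the same idea.
    """
    h, ys = 0.0, []
    for x in xs:  # one pass over the sequence: linear time
        h = a * h + b * x
        ys.append(c * h)
    return ys
```

Because each step only reads the previous state, embedding a long rationale does not incur the quadratic attention cost of a transformer over the same tokens.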
PARAN: Persona-Augmented Review ANswering system on Food Delivery Review Dataset
Park, Moonsoo, Yun, Jeongseok, Kim, Bohyung
Abstract--Personalized review response generation presents a significant challenge in domains where user information is limited, such as food delivery platforms. While large language models (LLMs) offer powerful text generation capabilities, they often produce generic responses when lacking contextual user data, reducing engagement and effectiveness. In this work, we propose a two-stage prompting framework that infers both explicit (e.g., user-stated preferences) and implicit (e.g., demographic or stylistic cues) personas directly from short review texts. These inferred persona attributes are then incorporated into the response generation prompt to produce user-tailored replies. To encourage diverse yet faithful generations, we adjust decoding temperature during inference. We evaluate our method using a real-world dataset collected from a Korean food delivery app, and assess its impact on precision, diversity, and semantic consistency. Our findings highlight the effectiveness of persona-augmented prompting in enhancing the relevance and personalization of automated responses without requiring model fine-tuning.
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
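The two-stage flow PARAN describes can be sketched as two chained prompts; the prompt wording, the `generate` callable, and the temperature values below are hypothetical placeholders, not the paper's actual prompts or settings.

```python
# Stage-1 prompt: infer a persona from the short review alone.
PERSONA_PROMPT = (
    "Read this food-delivery review and infer the reviewer's explicit "
    "preferences and any implicit demographic or stylistic cues:\n{review}"
)
# Stage-2 prompt: condition the reply on the inferred persona.
REPLY_PROMPT = (
    "Persona: {persona}\nReview: {review}\n"
    "Write a short owner reply tailored to this persona."
)

def respond(review, generate):
    """Two-stage persona-augmented response generation.

    `generate(prompt, temperature=...)` stands in for any LLM call.
    """
    # Low temperature: persona inference should be stable.
    persona = generate(PERSONA_PROMPT.format(review=review), temperature=0.3)
    # Higher temperature: encourage diverse yet faithful phrasing.
    return generate(REPLY_PROMPT.format(persona=persona, review=review),
                    temperature=0.9)
```

Keeping persona inference and reply generation as separate calls is what lets the second prompt carry user-specific context without any fine-tuning.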
TeluguST-46: A Benchmark Corpus and Comprehensive Evaluation for Telugu-English Speech Translation
Akkiraju, Bhavana, Bandarupalli, Srihari, Sambangi, Swathi, Ravuri, Vasavi, Saraswathi, R Vijaya, Vuppala, Anil Kumar
Despite Telugu being spoken by over 80 million people, speech translation research for this morphologically rich language remains severely underexplored. We address this gap by developing a high-quality Telugu--English speech translation benchmark from 46 hours of manually verified CSTD corpus data (30h/8h/8h train/dev/test split). Our systematic comparison of cascaded versus end-to-end architectures shows that while IndicWhisper + IndicMT achieves the highest performance due to extensive Telugu-specific training data, finetuned SeamlessM4T models demonstrate remarkable competitiveness despite using significantly less Telugu-specific training data. This finding suggests that with careful hyperparameter tuning and sufficient parallel data (potentially less than 100 hours), end-to-end systems can achieve performance comparable to cascaded approaches in low-resource settings. Our metric reliability study evaluating BLEU, METEOR, ChrF++, ROUGE-L, TER, and BERTScore against human judgments reveals that traditional metrics provide better quality discrimination than BERTScore for Telugu--English translation. The work delivers three key contributions: a reproducible Telugu--English benchmark, empirical evidence of competitive end-to-end performance potential in low-resource scenarios, and practical guidance for automatic evaluation in morphologically complex language pairs.
- Europe > Austria > Vienna (0.14)
- Asia > India > Telangana > Hyderabad (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
CryptoQA: A Large-scale Question-answering Dataset for AI-assisted Cryptography
Elfares, Mayar, Reisert, Pascal, Dietz, Tilman, Barman, Manpa, Zaki, Ahmed, Küsters, Ralf, Bulling, Andreas
Large language models (LLMs) excel at many general-purpose natural language processing tasks. However, their ability to perform deep reasoning and mathematical analysis, particularly for complex tasks as required in cryptography, remains poorly understood, largely due to the lack of suitable data for evaluation and training. To address this gap, we present CryptoQA, the first large-scale question-answering (QA) dataset specifically designed for cryptography. CryptoQA contains over two million QA pairs drawn from curated academic sources, along with contextual metadata that can be used to test the cryptographic capabilities of LLMs and to train new LLMs on cryptographic tasks. We benchmark 15 state-of-the-art LLMs on CryptoQA, evaluating their factual accuracy, mathematical reasoning, consistency, referencing, backward reasoning, and robustness to adversarial samples. In addition to quantitative metrics, we provide expert reviews that qualitatively assess model outputs and establish a gold-standard baseline. Our results reveal significant performance deficits of LLMs, particularly on tasks that require formal reasoning and precise mathematical knowledge. This shows the urgent need for LLM assistants tailored to cryptography research and development. We demonstrate that, by using CryptoQA, LLMs can be fine-tuned to exhibit better performance on cryptographic tasks.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Illinois (0.04)
- North America > United States > Florida > Escambia County > Pensacola (0.04)
Asm2SrcEval: Evaluating Large Language Models for Assembly-to-Source Code Translation
Hamedi, Parisa, Jelodar, Hamed, Bai, Samita, Meymani, Mohammad, Razavi-Far, Roozbeh, Ghorbani, Ali A.
Assembly-to-source code translation is a critical task in reverse engineering, cybersecurity, and software maintenance, yet systematic benchmarks for evaluating large language models on this problem remain scarce. In this work, we present the first comprehensive evaluation of five state-of-the-art large language models on assembly-to-source translation. We assess model performance using a diverse set of metrics capturing lexical similarity (BLEU, ROUGE, and METEOR), semantic alignment (BERTScore), fluency (Perplexity), and efficiency (prediction time). Our results reveal clear trade-offs: while certain models excel in text similarity metrics, others demonstrate lower perplexity or faster inference times. We further provide qualitative analyses of typical model successes and failure cases, highlighting challenges such as control flow recovery and identifier reconstruction. Taken together, our benchmark offers actionable insights into the strengths and limitations of current large language models for program translation, establishing a foundation for future research in combining accuracy with efficiency for real-world applications.
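Among the lexical metrics listed, ROUGE-L is defined from the longest common subsequence (LCS) of hypothesis and reference tokens. A small reimplementation for illustration (a benchmark like Asm2SrcEval would normally use an existing scoring library):

```python
def rouge_l_f1(hyp_tokens, ref_tokens):
    """ROUGE-L F1 from the LCS of two token lists via dynamic programming."""
    m, n = len(hyp_tokens), len(ref_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if hyp_tokens[i] == ref_tokens[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1      # extend the subsequence
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[m][n]
    if lcs == 0:
        return 0.0
    p, r = lcs / m, lcs / n                          # LCS precision / recall
    return 2 * p * r / (p + r)
```

Because LCS respects token order without requiring contiguity, ROUGE-L rewards recovering the overall statement sequence of the source even when identifiers differ, which matches the paper's observation that identifier reconstruction is a distinct failure mode.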
Reasoning-Guided Claim Normalization for Noisy Multilingual Social Media Posts
Sharma, Manan, Suneesh, Arya, Jain, Manish, Rajpoot, Pawan Kumar, Devadiga, Prasanna, Hazarika, Bharatdeep, Shrivastava, Ashish, Gurumurthy, Kishan, Suresh, Anshuman B, Baliga, Aditya U
We address claim normalization for multilingual misinformation detection - transforming noisy social media posts into clear, verifiable statements across 20 languages. The key contribution demonstrates how systematic decomposition of posts using Who, What, Where, When, Why and How questions enables robust cross-lingual transfer despite training exclusively on English data. Our methodology incorporates finetuning Qwen3-14B using LoRA with the provided dataset after intra-post deduplication, token-level recall filtering for semantic alignment and retrieval-augmented few-shot learning with contextual examples during inference. Our system achieves METEOR scores ranging from 41.16 (English) to 15.21 (Marathi), securing third rank on the English leaderboard and fourth rank for Dutch and Punjabi. The approach shows 41.3% relative improvement in METEOR over baseline configurations and substantial gains over existing methods. Results demonstrate effective cross-lingual generalization for Romance and Germanic languages while maintaining semantic coherence across diverse linguistic structures.
- Europe > Spain > Galicia > Madrid (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
TianHui: A Domain-Specific Large Language Model for Diverse Traditional Chinese Medicine Scenarios
Yin, Ji, He, Menglan, Zhang, Yujie, Zhang, Linshuai, Ma, Tingting, Tian, Ce, Wu, Jie, Xu, Lin, Jiang, Tao
Background: Currently, domain-specific large language models (LLMs) in traditional Chinese medicine (TCM) are primarily designed for clinical practice and medical education, yet they demonstrate substantial limitations when applied to research contexts owing to inadequate adaptability to complex tasks, thereby constraining their scientific utility. Moreover, the absence of comprehensive evaluation datasets and computational resource constraints hinder rigorous performance assessments and prevent extensive comparative or ablation experiments, ultimately resulting in suboptimal model performance and weakened persuasiveness. Objective: To address these challenges, this study proposed a method for constructing a specialized LLM for the TCM domain based on contextual data integration and domain knowledge fusion, and successfully developed a privatized LLM for the TCM profession, TianHui. Methods: First, we acquired a large amount of TCM data, including academic literature, published books, online public data, and other supplementary materials, and pre-processed them to generate a 0.97 GB unsupervised dataset and 611,312 QA pairs. Then, we adopted a phased training strategy (Pre-Training (PT) and Supervised Fine-Tuning (SFT)) and integrated three key technologies: Quantized Low-Rank Adaptation (QLoRA) parameter-efficient fine-tuning, DeepSpeed Stage 2 distributed training optimization, and FlashAttention-2 accelerated computation, to achieve optimal allocation of computational resources while guaranteeing training stability. Finally, we evaluated TianHui on 12 benchmark test datasets of different types and conducted extensive comparison and ablation experiments. Results: TianHui demonstrated excellent performance in 12 TCM-related application scenarios. It ranked in the top three on every evaluation index in six test datasets: APQ, TCMCD, HFR, HCCA, DHPE, and TLAW. Meanwhile, it achieved the best performance on all indicators of the other six test datasets: TCMEE, APR, GCPMI, TCMKQA, TCMRC, and ADTG.
- Health & Medicine > Diagnostic Medicine (0.46)
- Health & Medicine > Health Care Technology > Medical Record (0.46)
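The training stack TianHui describes (ZeRO Stage 2 partitioning plus QLoRA adapters) is conventionally driven by configuration like the following; these are representative settings expressed as Python dicts, not the paper's actual hyperparameters.

```python
# DeepSpeed ZeRO Stage 2: partitions optimizer state and gradients
# across GPUs to cut per-device memory (batch sizes are placeholders).
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# QLoRA-style adapter settings: a 4-bit quantized frozen base model
# with low-rank trainable adapters on the attention projections
# (rank/alpha/targets here are common defaults, not TianHui's).
qlora_config = {
    "bits": 4,
    "r": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
}
```

The phased strategy then reuses the same stack twice: a PT pass over the 0.97 GB unsupervised corpus, followed by an SFT pass over the QA pairs.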
Galton's Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising
Keon, Matt, Karim, Aabid, Lohana, Bhoomika, Karim, Abdul, Nguyen, Thai, Hamilton, Tara, Abbas, Ali
Large language models (LLMs) generate fluent text yet often default to safe, generic phrasing, raising doubts about their ability to handle creativity. We formalize this tendency as a Galton-style regression to the mean in language and evaluate it using a creativity stress test in advertising concepts. When ad ideas were simplified step by step, creative features such as metaphors, emotions, and visual cues disappeared early, while factual content remained, showing that models favor high-probability information. When asked to regenerate from simplified inputs, models produced longer outputs with lexical variety but failed to recover the depth and distinctiveness of the originals. We combined quantitative comparisons with qualitative analysis, which revealed that the regenerated texts often appeared novel but lacked true originality. Providing ad-specific cues such as metaphors, emotional hooks and visual markers improved alignment and stylistic balance, though outputs still relied on familiar tropes. Taken together, the findings show that without targeted guidance, LLMs drift towards mediocrity in creative tasks; structured signals can partially counter this tendency and point towards pathways for developing creativity-sensitive models.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Europe > United Kingdom (0.04)
- Europe > Ireland (0.04)
- Leisure & Entertainment (1.00)
- Automobiles & Trucks > Manufacturer (0.94)
- Media (0.93)
- Transportation > Ground > Road (0.46)
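The "lexical variety" contrast the Galton paper draws (regenerated ads are longer and lexically varied yet less original) is commonly quantified with a distinct-n score; the sketch below is a generic illustration, not the paper's exact metric.

```python
def distinct_n(tokens, n=2):
    """Share of unique n-grams among all n-grams in a token list.

    Higher values indicate more surface-level lexical variety;
    note this says nothing about semantic originality, which is
    exactly the gap the qualitative analysis in the paper probes.
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(grams)) / len(grams) if grams else 0.0
```

Comparing distinct-n between original and regenerated ad copy makes the regression-to-the-mean claim measurable: a regenerated text can score high here while still recycling familiar tropes.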
Align Where the Words Look: Cross-Attention-Guided Patch Alignment with Contrastive and Transport Regularization for Bengali Captioning
Anonto, Riad Ahmed, Zabin, Sardar Md. Saffat, Rahman, M. Saifur
Grounding vision--language models in low-resource languages remains challenging, as they often produce fluent text about the wrong objects. This stems from scarce paired data, translation pivots that break alignment, and English-centric pretraining that ignores target-language semantics. We address this with a compute-aware Bengali captioning pipeline trained on LaBSE-verified EN--BN pairs and 110k bilingual-prompted synthetic images. A frozen MaxViT yields stable visual patches, a Bengali-native mBART-50 decodes, and a lightweight bridge links the modalities. Our core novelty is a tri-loss objective: Patch-Alignment Loss (PAL) aligns real and synthetic patch descriptors using decoder cross-attention, InfoNCE enforces global real--synthetic separation, and Sinkhorn-based OT ensures balanced fine-grained patch correspondence. This PAL+InfoNCE+OT synergy improves grounding, reduces spurious matches, and drives strong gains on Flickr30k-1k (BLEU-4 12.29, METEOR 27.98, BERTScore-F1 71.20) and MSCOCO-1k (BLEU-4 12.00, METEOR 28.14, BERTScore-F1 75.40), outperforming strong CE baselines and narrowing the real--synthetic centroid gap by 41%.
- North America > United States (0.04)
- Asia > Singapore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
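The Sinkhorn-based OT term in the Bengali captioning tri-loss seeks a balanced patch-to-patch correspondence. A minimal sketch of Sinkhorn iterations over a cost matrix with uniform marginals (illustrative only; the paper's loss, cost definition, and hyperparameters may differ):

```python
import math

def sinkhorn_plan(cost, eps=0.1, n_iters=50):
    """Return an approximately doubly-balanced transport plan for a
    cost matrix (rows: real patches, cols: synthetic patches).

    eps is the entropic-regularization temperature; smaller values
    give sharper, closer-to-optimal plans but slower convergence.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel: high cost -> little transported mass.
    k = [[math.exp(-c / eps) for c in row] for row in cost]
    for _ in range(n_iters):
        # Alternately rescale rows then columns toward uniform marginals.
        for i in range(n):
            s = sum(k[i])
            k[i] = [v / (s * n) for v in k[i]]
        for j in range(m):
            s = sum(k[i][j] for i in range(n))
            for i in range(n):
                k[i][j] /= s * m
    return k
```

Because every row and column is pushed toward the same total mass, no single patch can absorb all the correspondence, which is what "balanced fine-grained patch correspondence" amounts to.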