Education
Black Friday Protein Powder Deals and Supplement Steals (2025)
From protein supplements and electrolytes to greens powders and energy drinks, these are the discounted picks worth snagging. The wellness industry is a wild marketplace. You can't trust the marketing alone, and FDA regulation is quite limited. It pays to be cautious. So for this year's Black Friday, we sifted through the markdowns, cross-checked claims, verified third-party tests, and sampled the supplements so you don't have to.
Reranking partisan animosity in algorithmic social media feeds alters affective polarization Science
We recruited participants through two online platforms, CloudResearch and Bovitz, targeting US residents over 18 years old who self-identified as either Republican or Democrat and were active users of X (SM section S1.1). Qualified individuals were invited to complete a screening task, which included installing a browser extension that analyzed their X feed. To ensure the interventions could have a meaningful impact on participants' feeds, only those with at least 5% of posts related to politics or social issues were invited to participate. Figure S1 summarizes the recruitment funnel, including the number of individuals at each stage of the process. Participants were not instructed to use X in any particular way, but they received daily reminders if they had not used the platform that day.
Mathematics is hard for mathematicians to understand too Science
At a recent conference on mathematics in the age of automated proofs, mathematician and Fields Medalist Akshay Venkatesh presented “How do we talk to our students about AI?” He quoted an email he'd received from a young student who asked, “Do you believe that mathematics is worth being studied in a world in which a machine can answer everything for you? What do you believe would be the ‘job’ of a mathematician in this world?” Venkatesh framed AI as an opportunity to correct what he called an “essential gap that has opened between the practice of mathematics and our values.” Mathematician William Thurston has explained these values by writing, “mathematics is not about numbers, equations, computations, or algorithms: it is about understanding.” But Venkatesh argued that the record on this is terrible, lamenting that “for a typical paper or talk, very few of us understand it.” He is not alone in thinking that something is wrong with the current state of mathematics research.
Estimation in high-dimensional linear regression: Post-Double-Autometrics as an alternative to Post-Double-Lasso
Hué, Sullivan, Laurent, Sébastien, Aiounou, Ulrich, Flachaire, Emmanuel
Post-Double-Lasso is becoming the most popular method for estimating linear regression models with many covariates when the purpose is to obtain an accurate estimate of a parameter of interest, such as an average treatment effect. However, this method can suffer from substantial omitted variable bias in finite samples. We propose a new method called Post-Double-Autometrics, which is based on Autometrics, and show that this method outperforms Post-Double-Lasso.
Revisiting Generalization Across Difficulty Levels: It's Not So Easy
Kordi, Yeganeh, Nayak, Nihal V., Zuo, Max, Nguyen, Ilana, Bach, Stephen H.
We investigate how well large language models (LLMs) generalize across different task difficulties, a key question for effective data curation and evaluation. Existing research is mixed regarding whether training on easier or harder data leads to better results, and whether those gains come on easier or harder test data. We address this question by conducting a systematic evaluation of LLMs' generalization across models, datasets, and fine-grained groups of example difficulty. We rank examples in six datasets using the outputs of thousands of different LLMs and Item Response Theory (IRT), a well-established difficulty metric in educational testing. Unlike prior work, our difficulty ratings are therefore determined solely by the abilities of many different LLMs, excluding human opinions of difficulty. With a more objective, larger-scale, and finer-grained analysis, we show that cross-difficulty generalization is often limited; training on either easy or hard data cannot achieve consistent improvements across the full range of difficulties. These results show the importance of having a range of difficulties in both training and evaluation data for LLMs, and that taking shortcuts with respect to difficulty is risky.
RoParQ: Paraphrase-Aware Alignment of Large Language Models Towards Robustness to Paraphrased Questions
Large Language Models (LLMs) often exhibit inconsistent behavior when answering paraphrased questions, suggesting a reliance on surface-level patterns rather than true semantic understanding. To address this limitation, we introduce RoParQ, a benchmark specifically constructed to evaluate cross-paraphrase consistency in closed-book multiple-choice QA. This benchmark is derived from standard datasets by generating paraphrases via proprietary models and selectively retaining examples that elicit inconsistent confidence from a judge model. We further propose XParaCon, a novel evaluation metric that quantifies a model's robustness by measuring the standard deviation of accuracies across question variants. Additionally, we implement a reasoning-based, paraphrase-aware Supervised Fine-Tuning (SFT) strategy designed to align models toward semantic invariance. Our experiments demonstrate that this targeted alignment significantly enhances robustness. Notably, fine-tuned lightweight models achieved consistency levels comparable to much larger pre-trained models. These results highlight the efficacy of our approach in mitigating superficial memorization and fostering more robust, reliable LLMs.
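To make the metric idea concrete, here is a minimal sketch of "standard deviation of accuracies across question variants". The abstract does not give the exact XParaCon formula, so everything here is an assumption: the function name `variant_accuracy_std` is hypothetical, the input is taken to be one row of correct/incorrect flags per paraphrase variant, and the population standard deviation is used without any further scaling.

```python
from statistics import pstdev
from typing import Sequence


def variant_accuracy_std(correct: Sequence[Sequence[bool]]) -> float:
    """Spread of accuracy across paraphrase variants (hypothetical helper).

    correct[v][q] is True when the model answered question q correctly
    under paraphrase variant v. Lower values mean more consistent behavior.
    """
    if not correct or any(len(row) == 0 for row in correct):
        raise ValueError("need at least one variant and one question per variant")
    # Accuracy of the model on each paraphrase variant of the question set.
    accuracies = [sum(row) / len(row) for row in correct]
    # Population standard deviation is an assumption; the paper may differ.
    return pstdev(accuracies)


# Three paraphrase variants of the same four questions (made-up flags).
flags = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]
spread = variant_accuracy_std(flags)
```

A model that answers every variant identically would score 0.0 under this sketch, regardless of how accurate it is, which is why such a measure is read alongside accuracy rather than instead of it.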
Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects
Rubaiyeat, Husne Ara, Mahmud, Hasan, Hasan, Md Kamrul
Bangla Sign Language Translation (BdSLT) has so far been severely constrained because the language itself is very low-resource. Creating standard sentence-level datasets for BdSLT is of immense importance for developing AI-based assistive tools for deaf and hard-of-hearing people in the Bangla-speaking community. In this paper, we present a dataset, IsharaKhobor, and two subsets of it to enable research. We also present the challenges of developing the dataset and suggest a way forward by benchmarking with landmark-based raw and RQE embeddings. We perform ablations on vocabulary restriction and canonicalization within the dataset, which resulted in two more datasets, IsharaKhobor_small and IsharaKhobor_canonical_small. The dataset is publicly available at: www.kaggle.com/datasets/hasanssl/isharakhobor [1].
Pessimistic Verification for Open Ended Math Questions
Huang, Yanxing, Tang, Zihan, Lin, Zejin, Li, Peng, Liu, Yang
The key limitation of verification performance lies in the ability to detect errors. With this intuition we designed several variants of pessimistic verification, which are simple workflows that could significantly improve the verification of open-ended math questions. In pessimistic verification we construct multiple parallel verifications for the same proof, and the proof is deemed incorrect if any one of them reports an error. This simple technique significantly improves performance across many math verification benchmarks without requiring substantial computational resources. Its token efficiency even surpassed extended long-CoT in test-time scaling. Our case studies further indicate that the majority of false negatives in stronger models are actually caused by annotation errors in the original dataset, so our method's performance is in fact underestimated. Self-verification for mathematical problems can effectively improve the reliability and performance of language model outputs, and it also plays a critical role in enabling long-horizon mathematical tasks. We believe that research on pessimistic verification will help enhance the mathematical capabilities of language models across a wide range of tasks.
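The aggregation rule described above (run several verifications of the same proof in parallel and reject if any one reports an error) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `pessimistic_verify` is a hypothetical name, and each verifier is modeled as a plain callable returning True when it finds no error, where in practice it would be a call to a language model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A verifier returns True when it finds no error in the proof (assumed interface).
Verifier = Callable[[str], bool]


def pessimistic_verify(proof: str, verifiers: Sequence[Verifier]) -> bool:
    """Accept the proof only if every parallel verification accepts it."""
    if not verifiers:
        raise ValueError("at least one verifier is required")
    # Run the independent verifications concurrently on the same proof.
    with ThreadPoolExecutor(max_workers=len(verifiers)) as pool:
        verdicts = list(pool.map(lambda verify: verify(proof), verifiers))
    # Pessimistic rule: a single reported error rejects the proof.
    return all(verdicts)


# Toy stand-ins for model-based verifiers (hypothetical checks).
def non_empty(proof: str) -> bool:
    return bool(proof.strip())


def no_placeholder(proof: str) -> bool:
    return "TODO" not in proof


accepted = pessimistic_verify("Assume n is even. Then n = 2k.", [non_empty, no_placeholder])
rejected = pessimistic_verify("Step 1: TODO", [non_empty, no_placeholder])
```

The design trades false positives for false negatives: adding verifiers can only make acceptance harder, so one overly strict verifier is enough to reject a correct proof.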
Hierarchical Ranking Neural Network for Long Document Readability Assessment
Zheng, Yurui, Chen, Yijun, Zhang, Shaohong
Readability assessment aims to evaluate the reading difficulty of a text. In recent years, while deep learning technology has been gradually applied to readability assessment, most approaches fail to consider either the length of the text or the ordinal relationship of readability labels. This paper proposes a bidirectional readability assessment mechanism that captures contextual information to identify regions with rich semantic information in the text, thereby predicting the readability level of individual sentences. These sentence-level labels are then used to assist in predicting the overall readability level of the document. Additionally, a pairwise sorting algorithm is introduced to model the ordinal relationship between readability levels through label subtraction. Experimental results on Chinese and English datasets demonstrate that the proposed model achieves competitive performance and outperforms other baseline models.
Introduction
Automatic Text Readability (ARA) research originated in the early 20th century, aiming to evaluate text reading difficulty and assist educators in recommending appropriate reading materials for learners [1]. Readability assessment approaches are generally classified into three paradigms: human evaluation, co-selection-based analysis, and content-based analysis. Human evaluation involves expert annotation or reader surveys; co-selection methods leverage user interaction data such as reading time or choices [2]; and content-based approaches infer readability using linguistic, syntactic, or semantic features extracted from the text itself. Early studies predominantly relied on experts' subjective evaluations and simple statistical features, such as sentence length and word complexity.