passport
Domain-Grounded Evaluation of LLMs in International Student Knowledge
Daitx, Claudinei, Amar, Haitham
Large language models (LLMs) are increasingly used to answer high-stakes study-abroad questions about admissions, visas, scholarships, and eligibility. Yet it remains unclear how reliably they advise students, and how often otherwise helpful answers drift into unsupported claims ("hallucinations"). This work provides a clear, domain-grounded overview of how current LLMs behave in this setting. We use a realistic question set drawn from the advising workflows of ApplyBoard, an EdTech platform that supports students from discovery to enrolment. With it, we evaluate two essentials side by side: accuracy (is the information correct and complete?) and hallucination (does the model add content not supported by the question or domain evidence?). Each question is categorized by domain scope: single-domain, or multi-domain when the answer must integrate evidence across areas such as admissions, visas, and scholarships. To reflect real advising quality, we grade answers with a simple three-level rubric: correct, partial, or wrong. The rubric is domain-coverage-aware. An answer is partial if it addresses only a subset of the required domains, and over-scoped if it introduces extra, unnecessary domains. Our scoring captures these patterns as under-coverage and as reduced relevance or hallucination, respectively. We also report faithfulness and answer-relevance measures, alongside an aggregate hallucination score, to capture how relevant and useful the answers are. All models are tested on the same questions for a fair, head-to-head comparison. Our goals are to: (1) give a clear picture of which models are most dependable for study-abroad advising; (2) surface common failure modes, where answers are incomplete, off-topic, or unsupported; and (3) offer a practical, reusable protocol for auditing LLMs before deployment in education and advising contexts.
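The coverage-aware rubric can be made concrete as a small grading function. This is a hypothetical sketch because the abstract does not specify how labels are computed. The names `grade` and `GradedAnswer`, the rule that any factual error yields "wrong", and the `relevance` proxy are all illustrative assumptions, not the authors' scoring code. Here `relevance` is the share of addressed domains that the question actually required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradedAnswer:
    label: str                # "correct", "partial", or "wrong"
    missing: frozenset[str]   # required domains the answer left out (under-coverage)
    extra: frozenset[str]     # domains addressed but not required (over-scoping)
    coverage: float           # share of required domains the answer addressed
    relevance: float          # hypothetical proxy: share of addressed domains that were required


def grade(required: set[str], addressed: set[str], factually_correct: bool) -> GradedAnswer:
    """Hypothetical coverage-aware grader for one answer.

    required: domains the question needs, e.g. {"admissions", "visas"}
    addressed: domains the answer actually covers
    factually_correct: a reviewer's judgment that the covered content is accurate
    """
    req, got = frozenset(required), frozenset(addressed)
    if not req:
        raise ValueError("a question must require at least one domain")
    covered = req & got
    missing = req - got
    extra = got - req
    coverage = len(covered) / len(req)
    relevance = len(covered) / len(got) if got else 0.0

    if not factually_correct or not covered:
        label = "wrong"      # inaccurate, or addresses none of the required domains
    elif missing:
        label = "partial"    # accurate but covers only a subset of required domains
    else:
        label = "correct"    # accurate and covers every required domain
    return GradedAnswer(label, missing, extra, coverage, relevance)


# Multi-domain question; the answer skips visas and drifts into scholarships.
result = grade({"admissions", "visas"}, {"admissions", "scholarships"}, factually_correct=True)
print(result.label, sorted(result.missing), sorted(result.extra), result.coverage, result.relevance)
# prints: partial ['visas'] ['scholarships'] 0.5 0.5
```

This sketch reports over-scoping as a separate `extra`/`relevance` signal rather than folding it into the label. That mirrors the abstract's distinction between under-coverage (a partial answer) and reduced relevance.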
- Asia > China (0.05)
- North America > United States > Texas (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Banking & Finance (0.99)
- Information Technology > Communications > Social Media (0.74)
- Information Technology > Communications > Mobile (0.55)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)
- North America > Canada (0.04)
- Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Hong Kong (0.04)
The new iPhone feature that could make wallets obsolete
Millions of Apple users can now use their device to replace passports, driver's licenses, state IDs, and credit cards. Apple users are one step closer to being able to viably travel around the country with nothing but an iPhone in their pocket. On Wednesday, the company announced "Digital ID," a new feature that lets users store a mobile version of their passport in the Wallet app. Once it is uploaded, iPhone and Apple Watch users can present their Digital ID to pass through TSA security checkpoints at 250 airports in the US.
- North America > United States > New York (0.05)
- Asia > Middle East > UAE > Dubai Emirate > Dubai (0.05)
- Asia > Japan (0.05)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence (1.00)
A Unified Representation Underlying the Judgment of Large Language Models
Lu, Yi-Long, Song, Jiajun, Wang, Wei
A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. Researchers have found decodable neural representations for distinct concepts in Large Language Models (LLMs), which suggests a modular architecture. Whether these representations are truly independent systems, however, remains an open question. Here we provide evidence for a convergent architecture for evaluative judgment. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we show that this axis drives a critical mechanism, which we identify as the subordination of reasoning. The VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. Our discovery offers a mechanistic account of response bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.
- North America > Mexico (0.04)
- Asia > East Asia (0.04)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- North America > Canada (0.04)
- Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
We are grateful to the reviewers for their constructive comments, which helped improve the quality and clarity of the paper.
Figure 1: Test accuracy on CIFAR100 as suggested by R1 (i.e. … In summary, when ambiguous passports are forged and used (e.g. … We will include the above results in the final draft.

Table 2: Summary of network complexity for V1, V2 and V3 schemes.

| | V1 | V2 | V3 |
|---|---|---|---|
| Training | Passport layers added; passports needed; 15-30% more training time | Passport layers added; passports needed; 100-125% more training time | Passport layers added; passports needed; trigger set needed; 100-150% more training time |
| Inferencing | Passport layers & passports needed; 10% more inferencing time | Passport layers & passports NOT needed; NO extra time incurred | Passport layers & passports NOT needed; NO extra time incurred |
| Verification | NO separate verification needed | Passport layers & passports needed | Trigger set needed (black-box verification); passport layers & passports needed (white-box verification) |
- Asia > China > Hong Kong (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)