typology
Predictive Querying for Autoregressive Neural Sequence Models
In reasoning about sequential events it is natural to pose probabilistic queries such as "when will event A occur next" or "what is the probability of A occurring before B", with applications in areas such as user modeling, language models, medicine, and finance. These types of queries are more complex to answer than next-event prediction, particularly for neural autoregressive models such as recurrent neural networks and transformers. This is partly because answering queries about the future requires marginalizing over large path spaces, which is not straightforward to do efficiently in such models. In this paper we introduce a general typology for predictive queries in neural autoregressive sequence models and show that such queries can be systematically represented by sets of elementary building blocks. We leverage this typology to develop new query estimation methods based on beam search, importance sampling, and hybrids of the two. Across four large-scale sequence datasets from different application domains, as well as for the GPT-2 language model, we demonstrate the ability to make query answering tractable for arbitrary queries in exponentially large predictive path spaces, and find clear differences in cost-accuracy tradeoffs between search and sampling methods.
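To make the marginalization problem concrete, here is a minimal sketch for the query "A occurs before B within K steps". It assumes a hypothetical toy model (a first-order Markov chain exposed through a made-up `next_token_probs` interface) in place of a neural network. It compares brute-force path enumeration, whose cost grows exponentially with the horizon, against naive forward Monte Carlo sampling. This is not the paper's estimator: the paper uses beam search, importance sampling, and hybrids, whereas this sketch uses plain sampling only as a baseline.

```python
import random

# Hypothetical stand-in for a neural autoregressive model: a first-order
# Markov chain over four tokens. A real RNN/transformer conditions on the
# whole history, which is what makes exact marginalization expensive.
TRANSITIONS = {
    "A": {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25},
    "B": {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25},
    "C": {"A": 0.1, "B": 0.2, "C": 0.4, "D": 0.3},
    "D": {"A": 0.3, "B": 0.1, "C": 0.3, "D": 0.3},
}


def next_token_probs(history):
    """Hypothetical model interface: p(x_{t+1} | x_{1:t})."""
    return TRANSITIONS[history[-1]]


def exact_a_before_b(history, horizon):
    """P(A occurs before B within `horizon` steps), by enumerating paths.

    A path stops as soon as A (query true) or B (query false) appears;
    paths through C and D keep branching, so the work grows as 2**horizon
    here and as |V|**horizon for a general vocabulary V."""
    if horizon == 0:
        return 0.0
    total = 0.0
    for token, p in next_token_probs(history).items():
        if token == "A":
            total += p
        elif token != "B":
            total += p * exact_a_before_b(history + [token], horizon - 1)
    return total


def sample_a_before_b(history, horizon, num_samples, rng):
    """Naive Monte Carlo estimate of the same query from forward samples."""
    hits = 0
    for _ in range(num_samples):
        path = list(history)
        for _ in range(horizon):
            probs = next_token_probs(path)
            token = rng.choices(list(probs), weights=list(probs.values()))[0]
            if token == "A":
                hits += 1
                break
            if token == "B":
                break
            path.append(token)
    return hits / num_samples


rng = random.Random(0)
print(exact_a_before_b(["C"], 10))
print(sample_a_before_b(["C"], 10, 20000, rng))
```

Enumeration is exact but infeasible for long horizons or large vocabularies; sampling has a fixed per-sample cost but a variance that is high for rare events, which is the kind of cost-accuracy tradeoff the abstract compares.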
A Data-driven Typology of Vision Models from Integrated Representational Metrics
Wu, Jialin, Saha, Shreya, Bo, Yiqing, Khosla, Meenakshi
Large vision models differ widely in architecture and training paradigm, yet we lack principled methods to determine which aspects of their representations are shared across families and which reflect distinctive computational strategies. We leverage a suite of representational similarity metrics, each capturing a different facet (geometry, unit tuning, or linear decodability), and assess family separability using multiple complementary measures. Metrics preserving geometry or tuning (e.g., RSA, Soft Matching) yield strong family discrimination, whereas flexible mappings such as Linear Predictivity show weaker separation. These findings indicate that geometry and tuning carry family-specific signatures, while linearly decodable information is more broadly shared. To integrate these complementary facets, we adapt Similarity Network Fusion (SNF), a method inspired by multi-omics integration. SNF achieves substantially sharper family separation than any individual metric and produces robust composite signatures. Clustering of the fused similarity matrix recovers both expected and surprising patterns: supervised ResNets and ViTs form distinct clusters, yet all self-supervised models group together across architectural boundaries. Hybrid architectures (ConvNeXt, Swin) cluster with masked autoencoders, suggesting convergence between architectural modernization and reconstruction-based training. This biology-inspired framework provides a principled typology of vision models, showing that emergent computational strategies, shaped jointly by architecture and training objective, define representational structure beyond surface design categories.
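RSA is one of the geometry-preserving metrics named above. The following is a minimal NumPy sketch of one common RSA variant: correlation-distance RDMs compared with a Spearman correlation over their upper triangles. It assumes model activations arrive as stimulus-by-unit arrays. The paper's exact RSA settings are not stated here, and the random arrays are placeholders for real activations.

```python
import numpy as np


def rdm(responses):
    """Representational dissimilarity matrix (1 - Pearson r between stimuli).

    `responses` has shape (n_stimuli, n_units)."""
    return 1.0 - np.corrcoef(responses)


def ranks(x):
    """Rank-transform a 1-D array (ties are not averaged)."""
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(len(x), dtype=float)
    return r


def rsa_score(responses_a, responses_b):
    """Spearman correlation between the upper triangles of two RDMs.

    Both models must see the same stimuli in the same order; their unit
    counts may differ, since only stimulus-by-stimulus geometry is compared."""
    rdm_a, rdm_b = rdm(responses_a), rdm(responses_b)
    upper = np.triu_indices(rdm_a.shape[0], k=1)
    return float(np.corrcoef(ranks(rdm_a[upper]), ranks(rdm_b[upper]))[0, 1])


rng = np.random.default_rng(0)
model_a = rng.normal(size=(20, 64))        # 20 stimuli x 64 units
model_b = model_a[:, rng.permutation(64)]  # same units, shuffled order
model_c = rng.normal(size=(20, 128))       # unrelated model, more units
print(rsa_score(model_a, model_b), rsa_score(model_a, model_c))
```

Shuffling units leaves the stimulus geometry unchanged, so the first score should be essentially 1. The unrelated random model should score near 0. This insensitivity to unit identity is what separates geometry-based metrics from tuning-based ones.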
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > France (0.04)
Constructing Political Coordinates: Aggregating Over the Opposition for Diverse News Recommendation
Earl, Eamon, Ding, Chen, Valenzano, Richard, Paulen-Patterson, Drai
In the past two decades, open access to news and information has increased rapidly, empowering educated political growth within democratic societies. News recommender systems (NRSs) have been shown to be useful in this process, minimizing political disengagement and information overload by providing individuals with articles on topics that matter to them. Unfortunately, NRSs often conflate underlying user interest with the partisan bias of the articles in a user's reading history and with the most popular biases present in the coverage of their favored topics. Over extended interaction, this can result in the formation of filter bubbles and the polarization of user partisanship. In this paper, we propose a novel embedding space called Constructed Political Coordinates (CPC), which models the political partisanship of users over a given topic space, relative to a larger sample population. We apply a simple collaborative filtering (CF) framework using CPC-based correlation to recommend articles sourced from oppositional users, who have different biases from the user in question. We compare against classical CF methods and find that CPC-based methods promote pointed bias diversity and better match the true political tolerance of users, while classical methods implicitly exploit biases to maximize interaction.
Recommender system (RS) utility has two main value measurements: users seeing content that they engage positively with, and content providers maximizing engagement with their content or platform. While the two are evidently correlated (e.g., a user who is not properly catered to will likely stop using the platform), the latter gives recommendation algorithms an incentive to shift a user's preferences to make them easier to cater to, resulting in higher expected long-term engagement [1].
Previous research [2] on the relationship between recommender systems and American political typology suggests that users with more extreme political preferences exhibit higher engagement metrics with their recommended news. Additionally, it found that their engagement can be maximized by recommending articles of which a dominant share express a single partisan bias. This creates an implicit incentive for a News Recommender System (NRS) to shift user preferences toward political extremes through selection bias, particularly in systems that optimize long-term value or leverage popularity [1]. This phenomenon results in the formation of filter bubbles, in which users are eventually shown only news perspectives that comply with their preexisting opinions, and users with heterogeneous partisanship across distinct topics have their political ideology homogenized over time.
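The CPC-based collaborative filtering described in the abstract can be illustrated with a rough sketch. Everything concrete here is an assumption, since the abstract does not define these details. CPC is modeled as each user's per-topic partisan bias z-scored against the population. "CPC-based correlation" is approximated by cosine similarity of those population-centered vectors. "Oppositional users" are the k users with the lowest similarity. All function names and the toy data are hypothetical.

```python
import numpy as np


def cpc_embed(bias_by_topic):
    """Hypothetical CPC embedding: z-score each user's per-topic partisan bias
    against the whole user population. Shape: (n_users, n_topics)."""
    mean = bias_by_topic.mean(axis=0)
    std = bias_by_topic.std(axis=0)
    std[std == 0] = 1.0  # topics with no spread contribute zeros
    return (bias_by_topic - mean) / std


def oppositional_users(cpc, user, k):
    """The k users whose CPC vectors point most against `user`'s vector
    (lowest cosine similarity)."""
    norms = np.linalg.norm(cpc, axis=1)
    norms[norms == 0] = 1.0
    unit = cpc / norms[:, None]
    similarity = unit @ unit[user]
    similarity[user] = np.inf  # never select the user themself
    return [int(i) for i in np.argsort(similarity)[:k]]


def recommend(cpc, reads, user, k=2, n=3):
    """Articles read by oppositional users but not by `user`, ranked by how
    many of those users read them (ties broken alphabetically)."""
    counts = {}
    for other in oppositional_users(cpc, user, k):
        for article in reads[other] - reads[user]:
            counts[article] = counts.get(article, 0) + 1
    return sorted(counts, key=lambda a: (-counts[a], a))[:n]


# Toy data: partisan bias in [-1, 1] (negative = left, positive = right)
# for 4 users over 3 topics, plus the articles each user has read.
bias = np.array([
    [0.8, 0.6, 0.7],
    [0.7, 0.5, 0.9],
    [-0.6, -0.8, -0.5],
    [-0.7, -0.4, -0.9],
])
reads = [{"a1", "a2"}, {"a1", "a3"}, {"a1", "b1", "b2"}, {"b1", "b3"}]
print(recommend(cpc_embed(bias), reads, user=0))  # ['b1', 'b2', 'b3']
```

Centering against the population before comparing users is what makes the similarity "relative to a larger sample population": a user counts as oppositional only relative to where everyone else sits, not in absolute terms.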
- North America > United States (1.00)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > Syria (0.04)
- Research Report (0.50)
- Overview (0.34)
- Government > Regional Government > North America Government > United States Government (1.00)
- Health & Medicine > Therapeutic Area (0.68)
- Media (0.68)
- Government > Military (0.67)
A Typology of Synthetic Datasets for Dialogue Processing in Clinical Contexts
Bedrick, Steven, Doğruöz, A. Seza, Nisioi, Sergiu
Synthetic datasets are used across linguistic domains and NLP tasks, particularly in scenarios where authentic data is limited (or even non-existent). One such domain is clinical (healthcare) contexts, where significant and long-standing challenges (e.g., privacy, anonymization, and data governance) have led to the development of an increasing number of synthetic datasets. One increasingly important category of clinical dataset is clinical dialogues, which are especially sensitive and difficult to collect and are therefore commonly synthesized. While such synthetic datasets have been shown to be sufficient in some situations, little theory exists to inform how they may best be used and generalized to new applications. In this paper, we provide an overview of how synthetic datasets are created, evaluated, and used for dialogue-related tasks in the medical domain. Additionally, we propose a novel typology for classifying types and degrees of data synthesis, to facilitate comparison and evaluation.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- North America > Canada > Ontario > Toronto (0.05)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.05)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Providers & Services (0.67)
- Health & Medicine > Consumer Health (0.67)
- Health & Medicine > Health Care Technology > Medical Record (0.47)
Decoding street network morphologies and their correlation to travel mode choice
Riascos-Goyes, Juan Fernando, Lowry, Michael, Guarín-Zapata, Nicolás, Ospina, Juan P.
Urban morphology has long been recognized as a factor shaping human mobility, yet comparative and formal classifications of urban form across metropolitan areas remain limited. Building on theoretical principles of urban structure and advances in unsupervised learning, we systematically classified the built environment of nine U.S. metropolitan areas using structural indicators such as density, connectivity, and spatial configuration. The resulting morphological types were linked to mobility patterns through descriptive statistics, marginal effects estimation, and post hoc statistical testing. Here we show that distinct urban forms are systematically associated with different mobility behaviors. For example, reticular morphologies are linked to significantly higher public transport use (marginal effect = 0.49) and reduced car dependence (-0.41), while organic forms are associated with increased car usage (0.44) and substantial declines in public transport (-0.47) and active mobility (-0.30). These effects are statistically robust (p < 1e-19), highlighting that the spatial configuration of urban areas plays a fundamental role in shaping transportation choices. Our findings extend previous work by offering a reproducible framework for classifying urban form and demonstrate the added value of morphological analysis in comparative urban research. These results suggest that urban form should be treated as a key variable in mobility planning and provide empirical support for incorporating spatial typologies into sustainable urban policy design.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.14)
- North America > United States > North Carolina > Wake County > Cary (0.14)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs
The complexity and inter-connectivity of entities involved in money laundering demand investigative reasoning over graph-structured data. This paper explores the use of large language models (LLMs) as reasoning engines over localized subgraphs extracted from a financial knowledge graph. We propose a lightweight pipeline that retrieves k-hop neighborhoods around entities of interest, serializes them into structured text, and prompts an LLM via few-shot in-context learning to assess suspiciousness and generate justifications. Using synthetic anti-money laundering (AML) scenarios that reflect common laundering behaviors, we show that LLMs can emulate analyst-style logic, highlight red flags, and provide coherent explanations. While this study is exploratory, it illustrates the potential of LLM-based graph reasoning in AML and lays groundwork for explainable, language-driven financial crime analytics.
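The retrieve, serialize, and prompt pipeline above can be sketched in plain Python. This is a minimal illustration under assumptions: the toy graph, the `(source, relation, target)` triple format, the `--[relation]-->` serialization, and the prompt wording are all hypothetical. No particular LLM API is called; the resulting string would be passed to whatever model client is in use.

```python
from collections import deque

# Hypothetical toy financial knowledge graph as (source, relation, target) triples.
EDGES = [
    ("acct_A", "sent $9,500 to", "acct_B"),
    ("acct_B", "sent $9,400 to", "acct_C"),
    ("acct_C", "sent $9,300 to", "shell_co_X"),
    ("acct_D", "shares address with", "acct_B"),
    ("acct_E", "sent $120 to", "acct_F"),
]


def k_hop_subgraph(edges, seed, k):
    """Triples induced on all nodes within k hops of `seed` (direction is
    ignored for reachability but kept in the returned triples)."""
    neighbors = {}
    for src, _, dst in edges:
        neighbors.setdefault(src, []).append(dst)
        neighbors.setdefault(dst, []).append(src)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:  # breadth-first search, stopping expansion at depth k
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nxt in neighbors.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return [(s, r, t) for s, r, t in edges if s in dist and t in dist]


def serialize(triples):
    """One line of structured text per edge."""
    return "\n".join(f"{s} --[{r}]--> {t}" for s, r, t in triples)


def build_prompt(examples, triples, seed):
    """Few-shot prompt: labeled example subgraphs, then the query subgraph."""
    parts = ["Assess whether the focal entity is suspicious of money laundering. "
             "Answer SUSPICIOUS or NOT SUSPICIOUS, then justify."]
    for text, label in examples:
        parts.append(f"Subgraph:\n{text}\nAssessment: {label}")
    parts.append(f"Subgraph around {seed}:\n{serialize(triples)}\nAssessment:")
    return "\n\n".join(parts)


examples = [("acct_E --[sent $120 to]--> acct_F", "NOT SUSPICIOUS: single small payment.")]
prompt = build_prompt(examples, k_hop_subgraph(EDGES, "acct_A", 2), "acct_A")
print(prompt)  # send this string to an LLM client of your choice
```

With k = 2 the chain of just-under-$10,000 transfers and the shared-address link both enter the prompt, while the third hop to `shell_co_X` is cut off. This shows how the choice of k bounds what evidence the model can reason over.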
- Law Enforcement & Public Safety > Fraud (1.00)
- Banking & Finance (1.00)
AMLgentex: Mobilizing Data-Driven Research to Combat Money Laundering
Östman, Johan, Callisen, Edvin, Chen, Anton, Ausmees, Kristiina, Gårdh, Emanuel, Zamac, Jovan, Goldsteine, Jolanta, Wefer, Hugo, Whelan, Simon, Reimegård, Markus
Money laundering enables organized crime by moving illicit funds into the legitimate economy. Although trillions of dollars are laundered each year, detection rates remain low because launderers evade oversight, confirmed cases are rare, and institutions see only fragments of the global transaction network. Since access to real transaction data is tightly restricted, synthetic datasets are essential for developing and evaluating detection methods. However, existing datasets fall short: they often neglect partial observability, temporal dynamics, strategic behavior, uncertain labels, class imbalance, and network-level dependencies. We introduce AMLGentex, an open-source suite for generating realistic, configurable transaction data and benchmarking detection methods. AMLGentex enables systematic evaluation of anti-money laundering systems under conditions that mirror real-world challenges. By releasing multiple country-specific datasets and practical parameter guidance, we aim to empower researchers and practitioners and provide a common foundation for collaboration and progress in combating money laundering.
- North America > United States (0.93)
- Europe > Sweden (0.14)
- Europe > Poland (0.14)
The meaning of prompts and the prompts of meaning: Semiotic reflections and modelling
Thellefsen, Martin, Dewi, Amalia Nurma, Sorensen, Bent
This paper explores prompts and prompting in large language models (LLMs) as dynamic semiotic phenomena, drawing on Peirce's triadic model of signs, his nine sign types, and the Dynacom model of communication. The aim is to reconceptualize prompting not as a technical input mechanism but as a communicative and epistemic act involving an iterative process of sign formation, interpretation, and refinement. The theoretical foundation rests on Peirce's semiotics, particularly the interplay between representamen, object, and interpretant, and the typological richness of signs (qualisign, sinsign, legisign; icon, index, symbol; rheme, dicent, argument), alongside the interpretant triad captured in the Dynacom model. Analytically, the paper positions the LLM as a semiotic resource that generates interpretants in response to user prompts, thereby participating in meaning-making within shared universes of discourse. The findings suggest that prompting is a semiotic and communicative process that redefines how knowledge is organized, searched, interpreted, and co-constructed in digital environments. This perspective invites a reimagining of the theoretical and methodological foundations of knowledge organization and information seeking in the age of computational semiosis.
- North America > United States > Indiana (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Annotating Errors in English Learners' Written Language Production: Advancing Automated Written Feedback Systems
Coyne, Steven, Galvan-Sosa, Diana, Spring, Ryan, Guerraoui, Camélia, Zock, Michael, Sakaguchi, Keisuke, Inui, Kentaro
Recent advances in natural language processing (NLP) have contributed to the development of automated writing evaluation (AWE) systems that can correct grammatical errors. However, while these systems are effective at improving text, they are not optimally designed for language learning. They favor direct revisions, often with a click-to-fix functionality that can be applied without considering the reason for the correction. Meanwhile, depending on the error type, learners may benefit most from simple explanations and strategically indirect hints, especially on generalizable grammatical rules. To support the generation of such feedback, we introduce an annotation framework that models the type and generalizability of each error. For error type classification, we introduce a typology focused on inferring learners' knowledge gaps by connecting their errors to specific grammatical patterns. Following this framework, we collect a dataset of annotated learner errors and corresponding human-written feedback comments, each labeled as a direct correction or a hint. With this data, we evaluate keyword-guided, keyword-free, and template-guided methods of generating feedback using large language models (LLMs). Human teachers examined each system's outputs, assessing them on criteria including relevance, factuality, and comprehensibility. We report on the development of the dataset and the comparative performance of the systems investigated.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Exploring Agentic Artificial Intelligence Systems: Towards a Typological Framework
Wissuchek, Christopher, Zschech, Patrick
Artificial intelligence (AI) systems are evolving beyond passive tools into autonomous agents capable of reasoning, adapting, and acting with minimal human intervention. Despite their growing presence, there is no structured framework for classifying and comparing these systems. This paper develops a typology of agentic AI systems, introducing eight dimensions that define their cognitive and environmental agency in an ordinal structure. Using a multi-phase methodological approach, we construct and refine this typology, which is then evaluated through a human-AI hybrid approach and further distilled into constructed types. The framework enables researchers and practitioners to analyze varying levels of agency in AI systems. By offering a structured perspective on the progression of AI capabilities, the typology provides a foundation for assessing current systems and anticipating future developments in agentic AI.
- Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.05)
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > Germany > Saxony > Dresden (0.04)
- Overview (0.93)
- Workflow (0.92)
- Research Report > New Finding (0.67)
- Information Technology (0.68)
- Government (0.46)