gini coefficient
Taiwan's economy is booming thanks to AI. Not everyone sees the benefits
For Li, an engineer at Taiwanese computer giant ASUS, the AI boom sweeping Taiwan has made it an exciting time to work in tech. Taiwan is a semiconductor powerhouse, producing about 90 percent of the most advanced chips used to power leading AI models such as ChatGPT and Claude. Still, Li worries that the spoils of Taiwan's AI windfall are not being shared equally. "Most industries unrelated to tech don't seem to be feeling the benefits, so it doesn't feel evenly distributed at the moment," Li said, explaining that many of his former classmates working outside of tech do not appear to be doing as well.
AEGIS: Authentic Edge Growth In Sparsity for Link Prediction in Edge-Sparse Bipartite Knowledge Graphs
Liu, Hugh Xuechen, Tatar, Kıvanç
Bipartite knowledge graphs in niche domains are typically data-poor and edge-sparse, which hinders link prediction. We introduce AEGIS (Authentic Edge Growth In Sparsity), an edge-only augmentation framework that resamples existing training edges -- either uniformly (simple) or with inverse-degree bias (degree-aware) -- thereby preserving the original node set and sidestepping fabricated endpoints. To probe authenticity across regimes, we consider naturally sparse graphs (the game design pattern (GDP) game-pattern network) and induce sparsity in denser benchmarks (Amazon, MovieLens) via high-rate bond percolation. We evaluate augmentations on two complementary metrics: AUC-ROC (higher is better) and the Brier score (lower is better), using two-tailed paired t-tests against sparse baselines. On Amazon and MovieLens, copy-based AEGIS variants match the baseline while the semantic KNN augmentation is the only method that restores AUC and calibration; random and synthetic edges remain detrimental. On the text-rich GDP graph, semantic KNN achieves the largest AUC improvement and Brier score reduction, and simple also lowers the Brier score relative to the sparse control. These findings position authenticity-constrained resampling as a data-efficient strategy for sparse bipartite link prediction, with semantic augmentation providing an additional boost when informative node descriptions are available.
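The edge-only resampling idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`percolate`, `resample_edges`, `brier_score`) are hypothetical, and the degree-aware weight `1 / (deg(u) + deg(v))` is only one assumed reading of "inverse-degree bias", since the abstract does not give the exact formula.

```python
import random
from collections import Counter


def percolate(edges, keep_prob, rng):
    """Bond percolation: keep each edge independently with probability keep_prob."""
    return [e for e in edges if rng.random() < keep_prob]


def resample_edges(edges, n_extra, mode="simple", rng=None):
    """Draw n_extra copies of existing edges; no new nodes or edges are invented.

    mode="simple"       -> uniform sampling over existing edges
    mode="degree_aware" -> weight 1 / (deg(u) + deg(v)); ASSUMED weighting
    """
    rng = rng or random.Random(0)
    if not edges:
        return []
    if mode == "simple":
        weights = [1.0] * len(edges)
    elif mode == "degree_aware":
        # Count degrees separately for the two sides of the bipartite graph.
        deg = Counter()
        for u, v in edges:
            deg[("left", u)] += 1
            deg[("right", v)] += 1
        weights = [1.0 / (deg[("left", u)] + deg[("right", v)]) for u, v in edges]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return rng.choices(edges, weights=weights, k=n_extra)


def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy (game, pattern) edges; purely illustrative data.
    edges = [("g1", "p1"), ("g1", "p2"), ("g2", "p1"), ("g3", "p3"), ("g3", "p1")]
    sparse = percolate(edges, keep_prob=0.6, rng=rng)
    augmented = sparse + resample_edges(sparse, 3, mode="degree_aware", rng=rng)
    print(len(sparse), len(augmented))
```

Because every sampled edge is a copy of an existing training edge, the node set is unchanged, which is the "authenticity" constraint the abstract describes.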
Computational Analysis of Conversation Dynamics through Participant Responsivity
Hughes, Margaret, Roy, Brandon, Poole-Dayan, Elinor, Roy, Deb, Kabbara, Jad
Growing literature explores toxicity and polarization in discourse, with comparatively less work on characterizing what makes dialogue prosocial and constructive. We explore conversational discourse and investigate a method for characterizing its quality built upon the notion of "responsivity" -- whether one person's conversational turn is responding to a preceding turn. We develop and evaluate methods for quantifying responsivity -- first through semantic similarity of speaker turns, and second by leveraging state-of-the-art large language models (LLMs) to identify the relation between two speaker turns. We evaluate both methods against a ground truth set of human-annotated conversations. Furthermore, selecting the better performing LLM-based approach, we characterize the nature of the response -- whether it responded to that preceding turn in a substantive way or not. We view these responsivity links as a fundamental aspect of dialogue but note that conversations can exhibit significantly different responsivity structures. Accordingly, we then develop conversation-level derived metrics to address various aspects of conversational discourse. We use these derived metrics to explore other conversations and show that they support meaningful characterizations and differentiations across a diverse collection of conversations.
Data for Inclusion: The Redistributive Power of Data Economics
While credit is often portrayed as the fuel of development, access to credit is unevenly distributed -- not merely as a function of income or collateral, but increasingly as a function of data visibility. In this context, the core hypothesis of this paper is that data, when governed ethically and reused efficiently, operates as a redistributive economic asset. The idea that being poor is more expensive is not new; it has been conceptualized as the "poverty premium" -- where low-income individuals pay higher effective prices for credit, insurance, and other services (Carrière-Swallow & Haksar, 2019). Yet what has changed is the infrastructure of decision-making: creditworthiness is increasingly determined by algorithmic systems whose inputs are not equitably distributed. Individuals with limited credit histories or fragmented digital footprints remain invisible, not due to financial incapacity, but due to informational exclusion. This asymmetry is not merely a market failure -- it is a structural inequality encoded in data regimes. We argue that positive credit data -- payment histories, utilization patterns, and account stability -- constitutes a nonrival input that, once generated, can be reused across institutions at near-zero marginal cost without diminishing its value (Jones & Tonetti, 2020; Acemoglu et al., 2023). However, the ability to extract value from such data remains highly uneven. In traditional credit markets, the absence of negative signals penalizes borrowers more than the presence of positive behavior benefits them.
Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning
Understanding internal representations of neural models is a core interest of mechanistic interpretability. Due to its large dimensionality, the representation space can encode various aspects about inputs. To what extent are different aspects organized and encoded in separate subspaces? Is it possible to find these "natural" subspaces in a purely unsupervised way? Somewhat surprisingly, we can indeed achieve this and find interpretable subspaces by a seemingly unrelated training objective. Our method, neighbor distance minimization (NDM), learns non-basis-aligned subspaces in an unsupervised manner. Qualitative analysis shows subspaces are interpretable in many cases, and encoded information in obtained subspaces tends to share the same abstract concept across different inputs, making such subspaces similar to "variables" used by the model. We also conduct quantitative experiments using known circuits in GPT-2; results show a strong connection between subspaces and circuit variables. We also provide evidence showing scalability to 2B models by finding separate subspaces mediating context and parametric knowledge routing. Viewed more broadly, our findings offer a new perspective on understanding model internals and building circuits.
The Urban Impact of AI: Modeling Feedback Loops in Next-Venue Recommendation
Mauro, Giovanni, Minici, Marco, Pappalardo, Luca
Next-venue recommender systems are increasingly embedded in location-based services, shaping individual mobility decisions in urban environments. While their predictive accuracy has been extensively studied, less attention has been paid to their systemic impact on urban dynamics. In this work, we introduce a simulation framework to model the human-AI feedback loop underpinning next-venue recommendation, capturing how algorithmic suggestions influence individual behavior, which in turn reshapes the data used to retrain the models. Our simulations, grounded in real-world mobility data, systematically explore the effects of algorithmic adoption across a range of recommendation strategies. We find that while recommender systems consistently increase individual-level diversity in visited venues, they may simultaneously amplify collective inequality by concentrating visits on a limited subset of popular places. This divergence extends to the structure of social co-location networks, revealing broader implications for urban accessibility and spatial segregation. Our framework operationalizes the feedback loop in next-venue recommendation and offers a novel lens through which to assess the societal impact of AI-assisted mobility, providing a computational tool to anticipate future risks, evaluate regulatory interventions, and inform the design of ethical algorithmic systems.
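One common way to quantify the kind of "collective inequality" described above is the Gini coefficient of per-venue visit counts. The abstract does not state which inequality measure the authors use, so the sketch below is an assumption for illustration; the function name `gini` and the toy visit counts are hypothetical.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 = perfectly equal).

    Uses the sorted-values formula
        G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with i = 1..n over values sorted in ascending order.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention: no visits means no measurable inequality
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Toy visit counts per venue; purely illustrative data.
    even_visits = [25, 25, 25, 25]
    concentrated_visits = [1, 2, 2, 95]
    print(gini(even_visits))
    print(gini(concentrated_visits))
```

A value near 0 means visits are spread evenly across venues; with n venues, the maximum (all visits at one venue) is (n - 1) / n.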
Holistic Evaluations of Topic Models
Topic models are gaining increasing commercial and academic interest for their ability to summarize large volumes of unstructured text. As unsupervised machine learning methods, they enable researchers to explore data and help general users understand key themes in large text collections. However, they risk becoming a 'black box', where users input data and accept the output as an accurate summary without scrutiny. This article evaluates topic models from a database perspective, drawing insights from 1140 BERTopic model runs. The goal is to identify trade-offs in optimizing model parameters and to reflect on what these findings mean for the interpretation and responsible use of topic models.