Goto

Collaborating Authors

 tract


TrAct: Making First-layer Pre-Activations Trainable

Neural Information Processing Systems

We consider the training of the first layer of vision models and notice the clear relationship between pixel values and gradient update magnitudes: the gradients arriving at the weights of a first layer are by definition directly proportional to (normalized) input pixel values. Thus, an image with low contrast has a smaller impact on learning than an image with higher contrast, and a very bright or very dark image has a stronger impact on the weights than an image with moderate brightness. In this work, we propose performing gradient descent on the embeddings produced by the first layer of the model. However, switching to discrete inputs with an embedding layer is not a reasonable option for vision models. Thus, we propose the conceptual procedure of (i) a gradient descent step on first layer activations to construct an activation proposal, and (ii) finding the optimal weights of the first layer, i.e., those weights which minimize the squared distance to the activation proposal. We provide a closed form solution of the procedure and adjust it for robust stochastic training while computing everything efficiently. Empirically, we find that TrAct (Training Activations) speeds up training by factors between 1.25x and 4x while requiring only a small computational overhead. We demonstrate the utility of TrAct with different optimizers for a range of different vision models including convolutional and transformer architectures.



From Hubs to Deserts: Urban Cultural Accessibility Patterns with Explainable AI

Pranto, Protik Bose, Islam, Minhazul, Saha, Ripon Kumar, Rivera, Abimelec Mercado, Abbasov, Namig

arXiv.org Artificial Intelligence

Cultural infrastructures, such as libraries, museums, theaters, and galleries, support learning, civic life, health, and local economies, yet access is uneven across cities. We present a novel, scalable, and open-data framework to measure spatial equity in cultural access. We map cultural infrastructures and compute a metric called Cultural Infrastructure Accessibility Score (CIAS) using exponential distance decay at fine spatial resolution, then aggregate the score per capita and integrate socio-demographic indicators. Interpretable tree-ensemble models with SHapley Additive exPlanation (SHAP) are used to explain associations between accessibility, income, density, and tract-level racial/ethnic composition. Results show a pronounced core-periphery gradient, where non-library cultural infrastructures cluster near urban cores, while libraries track density and provide broader coverage. Non-library accessibility is modestly higher in higher-income tracts, and library accessibility is slightly higher in denser, lower-income areas.


Investigating the Association Between Text-Based Indications of Foodborne Illness from Yelp Reviews and New York City Health Inspection Outcomes (2023)

Shaveet, Eden, Su, Crystal, Hsu, Daniel, Gravano, Luis

arXiv.org Artificial Intelligence

Foodborne illnesses are gastrointestinal conditions caused by consuming contaminated food. Restaurants are critical venues to investigate outbreaks because they share sourcing, preparation, and distribution of foods. Public reporting of illness via formal channels is limited, whereas social media platforms host abundant user-generated content that can provide timely public health signals. This paper analyzes signals from Yelp reviews produced by a Hierarchical Sigmoid Attention Network (HSAN) classifier and compares them with official restaurant inspection outcomes issued by the New York City Department of Health and Mental Hygiene (NYC DOHMH) in 2023. We evaluate correlations at the Census tract level, compare distributions of HSAN scores by prevalence of C-graded restaurants, and map spatial patterns across NYC. We find minimal correlation between HSAN signals and inspection scores at the tract level and no significant differences by number of C-graded restaurants. We discuss implications and outline next steps toward address-level analyses.



Downscaling human mobility data based on demographic socioeconomic and commuting characteristics using interpretable machine learning methods

Jiang, Yuqin, Popov, Andrey A., Duan, Tianle, Li, Qingchun

arXiv.org Artificial Intelligence

Understanding urban human mobility patterns at various spatial levels is essential for social science. This study presents a machine learning framework to downscale origin-destination (OD) taxi trips flows in New York City from a larger spatial unit to a smaller spatial unit. First, correlations between OD trips and demographic, socioeconomic, and commuting characteristics are developed using four models: Linear Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Neural Networks (NN). Second, a perturbation-based sensitivity analysis is applied to interpret variable importance for nonlinear models. The results show that the linear regression model failed to capture the complex variable interactions. While NN performs best with the training and testing datasets, SVM shows the best generalization ability in downscaling performance. The methodology presented in this study provides both analytical advancement and practical applications to improve transportation services and urban development.


Revisiting Broken Windows Theory

Cui, Ziyao, Jiang, Erick, Sortisio, Nicholas, Wang, Haiyan, Chen, Eric, Rudin, Cynthia

arXiv.org Artificial Intelligence

We revisit the longstanding question of how physical structures in urban landscapes influence crime. Leveraging machine learning-based matching techniques to control for demographic composition, we estimate the effects of several types of urban structures on the incidence of violent crime in New York City and Chicago. We additionally contribute to a growing body of literature documenting the relationship between perception of crime and actual crime rates by separately analyzing how the physical urban landscape shapes subjective feelings of safety. Our results are twofold. First, in consensus with prior work, we demonstrate a "broken windows" effect in which abandoned buildings, a sign of social disorder, are associated with both greater incidence of crime and a heightened perception of danger. This is also true of types of urban structures that draw foot traffic such as public transportation infrastructure. Second, these effects are not uniform within or across cities. The criminogenic effects of the same structure types across two cities differ in magnitude, degree of spatial localization, and heterogeneity across subgroups, while within the same city, the effects of different structure types are confounded by different demographic variables. Taken together, these results emphasize that one-size-fits-all approaches to crime reduction are untenable and policy interventions must be specifically tailored to their targets.


Modeling Urban Food Insecurity with Google Street View Images

Li, David

arXiv.org Artificial Intelligence

F ood insecurity is a significant social and public health issue that plagues many urban metropolitan areas around the world. Existing approaches to identifying food insecurity rely primarily on qualitative and quantitative survey data, which is difficult to scale. This project seeks to explore the effectiveness of using street-level images in modeling food insecurity at the census tract level. T o do so, we propose a two-step process of feature extraction and gated attention for image aggregation. W e evaluate the effectiveness of our model by comparing against other model architectures, interpreting our learned weights, and performing a case study. While our model falls slightly short in terms of its predictive power, we believe our approach still has the potential to supplement existing methods of identifying food insecurity for urban planners and policymakers.


NY Real Estate Racial Equity Analysis via Applied Machine Learning

Chalavadi, Sanjana, Pastor, Andrei, Leitch, Terry

arXiv.org Artificial Intelligence

This study analyzes tract-level real estate ownership patterns in New York State (NYS) and New York City (NYC) to uncover racial disparities. We use an advanced race/ethnicity imputation model (LSTM+Geo with XGBoost filtering, validated at 89.2% accuracy) to compare the predicted racial composition of property owners to the resident population from census data. We examine both a Full Model (statewide) and a Name-Only LSTM Model (NYC) to assess how incorporating geospatial context affects our predictions and disparity estimates. The results reveal significant inequities: White individuals hold a disproportionate share of properties and property value relative to their population, while Black, Hispanic, and Asian communities are underrepresented as property owners. These disparities are most pronounced in minority-majority neighborhoods, where ownership is predominantly White despite a predominantly non-White population. Corporate ownership (LLCs, trusts, etc.) exacerbates these gaps by reducing owner-occupied opportunities in urban minority communities. We provide a breakdown of ownership vs. population by race for majority-White, -Black, -Hispanic, and -Asian tracts, identify those with extreme ownership disparities, and compare patterns in urban, suburban, and rural contexts. The findings underscore persistent racial inequity in property ownership, reflecting broader historical and socio-economic forces, and highlight the importance of data-driven approaches to address these issues.


TrAct: Making First-layer Pre-Activations Trainable

Neural Information Processing Systems

We consider the training of the first layer of vision models and notice the clear relationship between pixel values and gradient update magnitudes: the gradients arriving at the weights of a first layer are by definition directly proportional to (normalized) input pixel values. Thus, an image with low contrast has a smaller impact on learning than an image with higher contrast, and a very bright or very dark image has a stronger impact on the weights than an image with moderate brightness. In this work, we propose performing gradient descent on the embeddings produced by the first layer of the model. However, switching to discrete inputs with an embedding layer is not a reasonable option for vision models. Thus, we propose the conceptual procedure of (i) a gradient descent step on first layer activations to construct an activation proposal, and (ii) finding the optimal weights of the first layer, i.e., those weights which minimize the squared distance to the activation proposal. We provide a closed form solution of the procedure and adjust it for robust stochastic training while computing everything efficiently.