 intermediate level


Neural Disaggregation via Spatially Coherent Architectures

arXiv.org Artificial Intelligence

Open data is frequently released spatially and temporally aggregated, usually to comply with privacy policies. Varying aggregation levels (e.g., zip code, census tract, city block) complicate the integration across variables needed to provide multi-variate training sets for downstream AI/ML systems. In this work, we consider models to disaggregate spatial data, learning a function from a low-resolution irregular partition (e.g., zip code) to a high-resolution irregular partition (e.g., city block). We propose a hierarchical architecture that aligns each geographic aggregation level with a layer in the network such that all aggregation levels can be learned simultaneously by including loss terms for all intermediate levels as well as the final output. We then consider additional loss terms that compare the re-aggregated output against ground truth to further improve performance. To balance the tradeoff between training time and accuracy, we consider three training regimes, including a layer-by-layer process that achieves competitive predictions with significantly reduced training time. For situations where limited historical training data is available, we study transfer learning scenarios and show that a model pre-trained on one city variable can be fine-tuned for another city variable using only a few hundred samples, highlighting the common dynamics among variables from the same built environment and underlying population. Evaluating these techniques on four datasets across two cities, three variables, and two application domains, we find that geographically coherent architectures provide a significant improvement over baseline models as well as typical heuristic methods, advancing our long-term goal of synthesizing any variable, at any location, at any resolution.
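The idea of a loss term on the re-aggregated output can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`reaggregate`, `mse`, `combined_loss`), the use of mean squared error, the `weight` parameter, and the toy block-to-zip mapping are all assumptions made for illustration.

```python
from collections import defaultdict


def reaggregate(fine_values, parent_of):
    """Sum fine-level values (e.g., city blocks) into their coarse parents (e.g., zip codes)."""
    totals = defaultdict(float)
    for unit, value in fine_values.items():
        totals[parent_of[unit]] += value
    return dict(totals)


def mse(pred, truth):
    """Mean squared error over the keys of `truth` (hypothetical choice of loss)."""
    return sum((pred.get(k, 0.0) - truth[k]) ** 2 for k in truth) / len(truth)


def combined_loss(fine_pred, fine_truth, coarse_truth, parent_of, weight=1.0):
    """Fine-level loss plus a weighted loss on the re-aggregated prediction."""
    fine_term = mse(fine_pred, fine_truth)
    coarse_term = mse(reaggregate(fine_pred, parent_of), coarse_truth)
    return fine_term + weight * coarse_term


# Toy data (made up): three blocks nested in two zip codes.
parent_of = {"b1": "z1", "b2": "z1", "b3": "z2"}
fine_pred = {"b1": 2.0, "b2": 3.0, "b3": 4.0}
fine_truth = {"b1": 1.0, "b2": 3.0, "b3": 5.0}
coarse_truth = {"z1": 4.0, "z2": 5.0}

loss = combined_loss(fine_pred, fine_truth, coarse_truth, parent_of)
print(loss)
```

In a real model the same structure would be expressed with differentiable tensor operations so that the re-aggregation term contributes gradients; the dictionary version here only shows the bookkeeping.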


When Does Bottom-up Beat Top-down in Hierarchical Community Detection?

arXiv.org Artificial Intelligence

Hierarchical clustering of networks consists in finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes of algorithms tackling this problem. Divisive ($\textit{top-down}$) algorithms recursively partition the nodes into two communities, until a stopping rule indicates that no further split is needed. In contrast, agglomerative ($\textit{bottom-up}$) algorithms first identify the smallest community structure and then repeatedly merge the communities using a $\textit{linkage}$ method. In this article, we establish theoretical guarantees for the recovery of the hierarchical tree and community structure of a Hierarchical Stochastic Block Model by a bottom-up algorithm. We also establish that this bottom-up algorithm attains the information-theoretic threshold for exact recovery at intermediate levels of the hierarchy. Notably, these recovery conditions are less restrictive compared to those existing for top-down algorithms. This shows that bottom-up algorithms extend the feasible region for achieving exact recovery at intermediate levels. Numerical experiments on both synthetic and real data sets confirm the superiority of bottom-up algorithms over top-down algorithms. We also observe that top-down algorithms can produce dendrograms with inversions. These findings contribute to a better understanding of hierarchical clustering techniques and their applications in network analysis.
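A bottom-up procedure of the kind described above can be sketched in a few lines of Python. The sketch assumes the smallest communities are already known and uses edge density between two groups as the linkage; the abstract does not specify the linkage, so this choice, the function name `bottom_up_merge`, and the toy graph are illustrative assumptions rather than the paper's algorithm.

```python
def bottom_up_merge(communities, edges):
    """Repeatedly merge the two groups with the highest edge density between them.

    communities: list of lists of node ids (the finest-level communities).
    edges: iterable of undirected (u, v) pairs.
    Returns the merge history as (group_a, group_b, linkage_value) tuples.
    """
    edge_set = {frozenset(e) for e in edges}
    clusters = [frozenset(c) for c in communities]
    merges = []

    def linkage(a, b):
        # Fraction of possible cross-group edges that are present (assumed linkage).
        present = sum(1 for i in a for j in b if frozenset((i, j)) in edge_set)
        return present / (len(a) * len(b))

    while len(clusters) > 1:
        value, ia, ib = max(
            (linkage(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        a, b = clusters[ia], clusters[ib]
        merges.append((sorted(a), sorted(b), value))
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [a | b]
    return merges


# Toy graph (made up): communities {0,1} and {2,3} are densely connected to
# each other, while {4,5} is only weakly attached.
communities = [[0, 1], [2, 3], [4, 5]]
edges = [(0, 1), (2, 3), (4, 5), (0, 2), (0, 3), (1, 2), (1, 4)]

merges = bottom_up_merge(communities, edges)
for step in merges:
    print(step)
```

The merge history read from first to last gives the tree from the leaves upward; the linkage values along it correspond to dendrogram heights.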


Ten Data Science Books That Are Worth Reading in 2022

#artificialintelligence

With exponential growth over the past years, the data science field has become very popular in the IT sector. Many businesses have started adopting data science techniques in order to derive meaningful information to make precise business decisions. Because of this, data science has become an in-demand skill and one of the most highly paid careers in the tech industry. In order to be a successful business data scientist, it is crucial to understand and know how to use complex algorithms to build models, manipulate different datasets found from various sources, and be able to analyze and present findings to non-technical audiences. With so many resources available, one can use them to learn more about data science, but nothing beats reading data science books.


Statistics With R - Intermediate Level

#artificialintelligence

If you want to learn how to perform the most useful statistical analyses in the R program, you have come to the right place. Now you don't have to scour the web endlessly in order to find how to do a Pearson or Spearman correlation, an independent t test or a factorial ANOVA, how to perform a sequential regression analysis or how to compute the Cronbach's alpha. Everything is here, in this course, explained visually, step by step. So, what will you learn in this course? First of all, you will learn how to perform association tests in R, both parametric and non-parametric: the Pearson correlation, the Spearman and Kendall correlation, the partial correlation and the chi-square test for independence.


A guide to the field of Deep Learning

#artificialintelligence

Since the list has gotten rather long, I have included an excerpt above; the full list is at the bottom of this post. At the entry level, the datasets used are small. Often, they easily fit into the main memory. If they don't already come pre-processed then it's only a few lines of code to apply such operations. Mainly you'll do so for the major domains Audio, Image, Time-series, and Text. Before diving into the large field of Deep Learning it's a good choice to study the basic techniques.


A checklist to track your Machine Learning progress

#artificialintelligence

Have you ever asked yourself where you currently are on your Machine Learning journey? And what's there that you can still learn about? This checklist helps you answer such questions. It provides an…


A Minesweeper Solver Using Logic Inference, CSP and Sampling

arXiv.org Artificial Intelligence

Minesweeper is a puzzle video game that has been proved to be an NP-complete problem. We use CSP, logic inference, and sampling to build a Minesweeper solver, and we limit each move selection to 5 seconds.
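The simplest form of logic inference on a Minesweeper board looks at one clue at a time. The Python sketch below is an illustration only, not the solver from the paper: the board encoding (`'?'` for hidden, `'F'` for flagged, integers for clues) and the function names are assumptions, and the CSP and sampling components are not shown.

```python
def neighbors(r, c, rows, cols):
    """All in-bounds cells adjacent to (r, c), including diagonals."""
    return [
        (r + dr, c + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
    ]


def infer(board):
    """Apply two single-clue rules once over the whole board.

    board: list of rows; an int 0-8 is a revealed clue, '?' is a hidden cell,
    'F' is a flagged mine (hypothetical encoding).
    Returns (safe_cells, mine_cells) as sets of (row, col).
    """
    rows, cols = len(board), len(board[0])
    safe, mines = set(), set()
    for r in range(rows):
        for c in range(cols):
            clue = board[r][c]
            if not isinstance(clue, int):
                continue
            nbrs = neighbors(r, c, rows, cols)
            hidden = [p for p in nbrs if board[p[0]][p[1]] == "?"]
            flagged = sum(1 for p in nbrs if board[p[0]][p[1]] == "F")
            remaining = clue - flagged
            if remaining == 0:
                # Every mine around this clue is already flagged.
                safe.update(hidden)
            elif remaining == len(hidden):
                # Every hidden neighbor must be a mine.
                mines.update(hidden)
    return safe, mines


board = [
    [1, "?", "?"],
    [1, 1, "?"],
    [0, 0, "?"],
]
safe, mines = infer(board)
print(sorted(safe), sorted(mines))
```

Cells that neither rule resolves, such as the top-right corner here, are where constraint solving across several clues or probability estimates from sampling would come in.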


Looking for Machine Learning Talent Among Data Scientists

#artificialintelligence

Data scientists have a variety of different skills that they bring to bear on Big Data projects. One valuable skill that is becoming popular in data science is machine learning. Machine learning is a method of data analysis that automates model building that allows computers to find hidden insights without being explicitly programmed to find a particular insight. Machine learning can be applied to data to help businesses quickly find clusters of similar objects (e.g., identify segments of customers) and to predict outcomes (e.g., identify customers who are at-risk of churning). While machine learning is a hot skill to possess, a recent study by Evans Data Corp. found that about a third of developers (36%) who are working on Big Data projects employ elements of machine learning.

