

Social Network Analysis: From Graph Theory to Applications with Python


Social network analysis is the process of investigating social structures through the use of networks and graph theory. This article introduces data scientists to the theory of social networks, with a short introduction to graph theory and information spread. It then dives into Python code, using NetworkX to construct and infer social networks from real datasets. We'll start with a brief introduction to a network's basic components: nodes and edges. Nodes (A, B, C, D, E in the example) usually represent entities in the network. They can hold self-properties (such as weight, size, position, or any other attribute) and network-based properties (such as degree, the number of neighbours, or cluster, the connected component the node belongs to).
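The node properties described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than NetworkX, and the edge list and the `weight` values are made up for the example; only the node names A to E come from the text.

```python
from collections import deque

# Hypothetical undirected edges between the nodes A..E (illustrative only).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]

# Self-properties are stored on the node itself; these values are invented.
nodes = {name: {"weight": 1.0} for name in "ABCDE"}

# Adjacency map: node -> set of neighbouring nodes.
adjacency = {name: set() for name in nodes}
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)


def degree(node):
    """Network-based property: the number of neighbours of `node`."""
    return len(adjacency[node])


def connected_component(start):
    """Network-based property: every node reachable from `start` (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(degree("A"))                       # 2
print(sorted(connected_component("D")))  # ['D', 'E']
```

In this toy graph, A, B, and C form one cluster, and D and E form another.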

Machine Learning : Handle missing data


In the previous blog post on machine learning, we saw how to import a library into Google Colab and how to run your first machine learning program using the data in a CSV file. If you haven't read that post, here is the link. I urge you to read it first. In today's post, I will explain how to work with your CSV file when data is missing, that is, when some of your rows contain blank cells.
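As a rough sketch of what handling blank cells can look like, the snippet below fills missing numeric values with the column mean. Mean imputation is only one possible strategy and is an assumption here, not necessarily the approach the post takes. The CSV content, the column names, and the helper `fill_missing_with_mean` are all hypothetical.

```python
import csv
import io
from statistics import mean

# Hypothetical CSV content; blank cells stand for missing data.
raw = """name,age,salary
Ann,34,72000
Bob,,48000
Cid,29,
"""


def fill_missing_with_mean(rows, column):
    """Convert `column` to float, replacing blank cells with the column mean."""
    present = [float(r[column]) for r in rows if r[column].strip()]
    fallback = mean(present)
    for r in rows:
        r[column] = float(r[column]) if r[column].strip() else fallback
    return rows


rows = list(csv.DictReader(io.StringIO(raw)))
for col in ("age", "salary"):
    fill_missing_with_mean(rows, col)

for r in rows:
    print(r)
```

Bob's missing age becomes the mean of the two known ages, and Cid's missing salary becomes the mean of the two known salaries.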

Forgetting in Deep Learning


Neural network models suffer from the phenomenon of catastrophic forgetting: a model can drastically lose its generalization ability on a task after being trained on a new task. This usually means that a new task will likely override the weights learned in the past (see Figure 1), and thus degrade the model's performance on past tasks. Unless this problem is fixed, a single neural network cannot adapt to a continual learning scenario, because it forgets existing knowledge when it learns new things. For realistic applications of deep learning, where continual learning can be crucial, catastrophic forgetting needs to be avoided. However, there has been only limited study of catastrophic forgetting and its underlying causes.
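The weight-overriding effect can be reproduced with a deliberately tiny toy: a one-parameter linear model trained with plain SGD on one task and then on a conflicting one. The tasks, learning rate, and epoch count below are illustrative assumptions, not taken from the article, and a single weight is far simpler than a real neural network.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on `data`."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.05, epochs=200):
    """Plain SGD on squared error, one sample at a time."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w


xs = [-1.0, -0.5, 0.5, 1.0]
task_a = [(x, 2.0 * x) for x in xs]   # hypothetical task A: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # hypothetical task B: y = -x

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)  # near zero: task A has been learned

w = train(w, task_b)            # sequential training on task B only
loss_a_after = mse(w, task_a)   # large: the weight for task A was overwritten

print(loss_a_before, loss_a_after)
```

Because the second training phase never sees task A, nothing stops the weight from moving to the value that suits task B, so the loss on task A rises sharply.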

Facebook and NYU trained an AI to estimate COVID outcomes


COVID-19 has infected more than 23 million Americans and killed 386,000 of them to date, since the global pandemic began last March. Complicating the public health response is the fact that we still know so little about how the virus operates -- such as why some patients remain asymptomatic while it ravages others. Effectively allocating resources like ICU beds and ventilators becomes a Sisyphean task when doctors can only guess as to who might recover and who might be intubated within the next 96 hours. However, a trio of new machine learning algorithms developed by Facebook's AI division (FAIR) in cooperation with NYU Langone Health can help predict patient outcomes up to four days in advance using just a patient's chest X-rays. The models can, respectively, predict patient deterioration based on either a single X-ray or a sequence, as well as determine how much supplemental oxygen the patient will likely need.

Researchers find race, gender, and style biases in art-generating AI systems


As research pushes the boundaries of what's possible with AI, the popularity of art created by algorithms -- generative art -- continues to grow. From creating paintings to inventing new art styles, AI-based generative art has been showcased in a range of applications. But a new study from researchers at Fujitsu investigates whether biases might creep into the AI tools used to create art. Leveraging models, they claim that current AI methods fail to take into account socioeconomic impacts and exhibit clear prejudices. In their work, the researchers surveyed academic papers, online platforms, and apps that generate art using AI, selecting examples that focused on simulating established art schools and styles.

Hot papers on arXiv from the past month – December 2020


Here are the most tweeted papers that were uploaded onto arXiv during December 2020. Results are powered by Arxiv Sanity Preserver. Abstract: Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for tasks such as semantic scene segmentation, object part segmentation, and object classification.

How, When, and Why Should You Normalize / Standardize / Rescale Your Data?


Before diving into this topic, let's start with some definitions. "Rescaling" a vector means to add or subtract a constant and then multiply or divide by a constant, as you would do to change the units of measurement of the data, for example, to convert a temperature from Celsius to Fahrenheit. "Normalizing" a vector most often means dividing by a norm of the vector. It also often refers to rescaling by the minimum and range of the vector, to make all the elements lie between 0 and 1, thus bringing all the values of numeric columns in the dataset to a common scale. "Standardizing" a vector most often means subtracting a measure of location and dividing by a measure of scale.
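The three definitions can be written out as short functions. This is a minimal sketch in plain Python, and the function names are chosen here for illustration. For standardizing, the mean and the population standard deviation are assumed as the measures of location and scale, and the Euclidean norm is assumed for normalizing; other choices are equally valid.

```python
import math
from statistics import mean, pstdev


def rescale(v, offset, factor):
    """Add a constant, then multiply by a constant."""
    return [(x + offset) * factor for x in v]


def normalize_by_norm(v):
    """Divide by the Euclidean (L2) norm; undefined for the zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def normalize_min_max(v):
    """Rescale by minimum and range so values lie in [0, 1].

    Undefined when all elements are equal (the range is zero).
    """
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v]


def standardize(v):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(v), pstdev(v)
    return [(x - mu) / sigma for x in v]


celsius = [0.0, 37.0, 100.0]
# Celsius to Fahrenheit in "add, then multiply" form: F = (C + 160/9) * 9/5.
fahrenheit = rescale(celsius, 160 / 9, 9 / 5)

print(fahrenheit)
print(normalize_by_norm([3.0, 4.0]))
print(normalize_min_max([2.0, 4.0, 6.0]))
print(standardize([2.0, 4.0, 6.0]))
```

After standardizing, the vector has mean 0 and standard deviation 1, whereas min-max normalization fixes the endpoints at 0 and 1 instead.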

An Existential Crisis in Neuroscience - Issue 94: Evolving


This week we are reprinting our top stories of 2020. This article first appeared online in our "Maps" issue in January, 2020. On a chilly evening last fall, I stared into nothingness out of the floor-to-ceiling windows in my office on the outskirts of Harvard's campus. As a purplish-red sun set, I sat brooding over my dataset on rat brains. I thought of the cold windowless rooms in downtown Boston, home to Harvard's high-performance computing center, where computer servers were holding on to a precious 48 terabytes of my data. I have recorded the 13 trillion numbers in this dataset as part of my Ph.D. experiments, asking how the visual parts of the rat brain respond to movement. Printed on paper, the dataset would fill 116 billion pages, double-spaced. When I recently finished writing the story of my data, the magnum opus fit on fewer than two dozen printed pages. Performing the experiments turned out to be the easy part.

The NLP Cypher


Around five percent of papers from the conference were on graphs, so there is lots to discuss. A new paper (with authors from every major big tech company) was recently published showing how one can attack language models like GPT-2 and extract information verbatim, such as personally identifiable information, just by querying the model. The extracted information came from the model's training data, which was based on scraped internet text. This is a big problem, especially when you train a language model on a private custom dataset. Looks like wants a new recommendation engine and they are offering up their dataset of over 1 million anonymized hotel reservations to get you in the game.

Response to Comment on "Meta-analysis reveals declines in terrestrial but increases in freshwater insect abundances"


Desquilbet et al. take issue with our data inclusion criteria and make several other dubious claims regarding data processing, analysis, and interpretation. Most of their concerns stem from disagreement on data inclusion criteria and analysis, misunderstanding of our goals, and unrealistic expectations. We maintain that our synthesis provides a state-of-the-art analysis of patterns of trends in insect abundances. In their Comment, Desquilbet et al. (1) argue for more rigorous methodology applied to broad-scale syntheses of biodiversity trends. They claim that a large proportion of the datasets used in our meta-analysis (2) are flawed.