

Mathematics: The Tao of Data Science · Harvard Data Science Review


Confucius once said, "Fish forget they live in water; people forget they live in the Tao" (Lin, 2007). Analogously, it may be easy for data scientists to forget they live in a world defined and permeated by mathematics. The two pieces, "Ten Research Challenge Areas in Data Science" by Jeannette M. Wing and "Challenges and Opportunities in Statistics and Data Science: Ten Research Areas" by Xuming He and Xihong Lin, provide an impressively complete list of data science challenges from luminaries in the field of data science. They have done an extraordinary job, so this response offers a complementary viewpoint from a mathematical perspective and evangelizes advanced mathematics as a key tool for meeting the challenges they have laid out. Notably, we pick up the themes of scientific understanding of machine learning and deep learning, computational considerations such as cloud computing and scalability, balancing computational and statistical considerations, and inference with limited data.

Stability Expanded, in Reality · Harvard Data Science Review


It is thought-provoking to read the pair of articles on 10 challenges in data science by Xuming He and Xihong Lin from a statistics perspective and Jeannette Wing from a computer science perspective. Unsurprisingly, there is a good overlap of important topics including multimodal and heterogeneous data, data privacy, fairness and interpretability, and causal inference or reasoning. This overlap reflects and confirms the foundational and shared roles of statistics and computer science in data science, which is the merging of statistical and computing thinking in the context of solving domain problems. The challenges in both articles are presented as separate, not integrated, topics, and mostly decoupled from domain problems, possibly because of the mandate of "10 challenges." In my mind, the most exciting 10 challenges in data science are to solve 10 pressing real-world data problems with positive impacts. For example, how is data science going to help control the spread of COVID-19 while allowing a healthy economy?

Ten Research Challenge Areas in Data Science · Harvard Data Science Review


To drive progress in the field of data science, we propose 10 challenge areas for the research community to pursue. Since data science is broad, with methods drawing from computer science, statistics, and other disciplines, and with applications appearing in all sectors, these challenge areas speak to the breadth of issues spanning science, technology, and society. We preface our enumeration with meta-questions about whether data science is a discipline. We then describe each of the 10 challenge areas. The goal of this article is to start a discussion on what could constitute a basis for a research agenda in data science, while recognizing that the field of data science is still evolving. Although data science builds on knowledge from computer science, engineering, mathematics, statistics, and other disciplines, data science is a unique field with many mysteries to unlock: fundamental scientific questions and pressing problems of societal importance.

Atlantis Highlights in Intelligent Systems


The proceedings series Atlantis Highlights in Intelligent Systems aims to publish high-quality peer-reviewed proceedings from conferences on research and applications in the field of intelligent systems. All proceedings in this series are open access, i.e. the articles published in them are immediately and permanently free to read, download, copy & distribute. Each volume is published under the CC BY-NC 4.0 user license which defines the permitted 3rd-party reuse of its articles. The online publication of each proceedings is sponsored by the conference organizers and hence no additional publication fees are required. Should you wish to publish a proceedings in this series, then please request a proceedings proposal form by sending an email to

Machine learning detects early signs of osteoarthritis


Osteoarthritis is the most common type of arthritis. Cartilage can sometimes wear down so much that the bones start to rub together. People with osteoarthritis can have joint pain, stiffness, or swelling. Some develop serious pain and disability from the disease. Doctors use a combination of medical history and lab or imaging tests to diagnose the condition.

How AI & Data Analytics Is Impacting Indian Legal System


In a survey conducted by Gurugram-based BML Munjal University (School of Law) in July 2020, it was found that about 42% of lawyers believed that in the next 3 to 5 years as much as 20% of regular, day-to-day legal work could be performed with technologies such as artificial intelligence. The survey also found that about 94% of law practitioners favoured research and analytics as the most desirable skills in young lawyers. Earlier this year, Chief Justice of India SA Bobde, in no uncertain terms, underlined that the Indian judiciary must equip itself to incorporate artificial intelligence into its system, especially for document management and cases of a repetitive nature. With more industries and professional sectors embracing AI and data analytics, the legal industry, albeit in a limited way, is no exception. According to the 2020 report of the National Judicial Data Grid, over the last decade, 3.7 million cases were pending across various courts in India, including high courts, district and taluka courts.

A Decade of Social Bot Detection · Communications of the ACM

On the morning of November 9, 2016, the world woke up to the shocking outcome of the U.S. Presidential election: Donald Trump was the 45th President of the United States of America, an unexpected event that still has tremendous consequences all over the world. Today, we know that a minority of social bots (automated social media accounts mimicking humans) played a central role in spreading divisive messages and disinformation, possibly contributing to Trump's victory [16,19]. In the aftermath of the 2016 U.S. elections, the world started to realize the gravity of widespread deception in social media. Following Trump's exploit, we witnessed the emergence of a strident dissonance between the multitude of efforts for detecting and removing bots, and the increasing effects these malicious actors seem to have on our societies [27,29]. This paradox opens a burning question: What strategies should we enforce in order to stop this social bot pandemic? In these times, during the run-up to the 2020 U.S. elections, the question appears more crucial than ever, particularly in light of the recently reported tampering with the electoral debate by thousands of AI-powered accounts. What struck social, political, and economic analysts after 2016 (deception and automation) has been a matter of study for computer scientists since at least 2010. Via a longitudinal analysis, we discuss the main trends of research in the fight against bots, the major results that were achieved, and the factors that make this never-ending battle so challenging. Capitalizing on lessons learned from our extensive analysis, we suggest possible innovations that could give us the upper hand against deception and manipulation. Studying a decade of endeavors in social bot detection can also inform strategies for detecting and mitigating the effects of other, more recent forms of online deception, such as strategic information operations and political trolls.

Spatial-Temporal Block and LSTM Network for Pedestrian Trajectories Prediction · Artificial Intelligence

Pedestrian trajectory prediction is critical for avoiding collisions in autonomous driving, but it is a challenging problem due to social forces and cluttered scenes. Such human-human and human-space interactions lead to many socially plausible trajectories. In this paper, we propose a novel LSTM-based algorithm. We tackle the problem by considering the static scene and the pedestrians, combining Graph Convolutional Networks and Temporal Convolutional Networks to extract features from pedestrians. Each pedestrian in the scene is regarded as a node, and we obtain the relationship between each node and its neighbors by graph embedding. An LSTM encodes these relationships so that our model predicts the trajectories of all nodes in crowd scenarios simultaneously. To effectively predict multiple possible future trajectories, we further introduce a Spatio-Temporal Convolutional Block to make the network flexible. Experimental results on two public datasets, ETH and UCY, demonstrate the effectiveness of our proposed ST-Block, and we achieve state-of-the-art performance in human trajectory prediction.

Ethical Machine Learning in Health Care · Artificial Intelligence

The use of machine learning (ML) in health care raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of health care. Specifically, we frame ethics of ML in health care through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to post-deployment considerations. We close by summarizing recommendations to address these challenges.