Collaborating Authors

Data Science

66 data science teams compete in challenge to help reopen Los Angeles


California is one of the hardest-hit states when it comes to coronavirus, with more than 200,000 total cases. Data scientists seeking ways to help the state reopen the economy participated in a two-week 2020 COVID-19 Computational Challenge (CCC) in mid-June. The challenge was to provide guidance for risk mitigation for Los Angeles County. Additionally, the solution "must incorporate the ethical protection of individual data and respect data privacy norms." The winning teams revealed location-based COVID-19 exposure in different L.A. communities, developed apps for people to calculate their potential for infection, and delivered applicable data-driven recommendations along with L.A.'s reopening stages, officials said.

Global Big Data Conference


Fraym is using artificial intelligence and machine learning to help aid organizations in Africa and South Asia identify populations at risk due to Covid-19 using new geospatial visualizations. Fraym identifies high-risk populations and how to best communicate with them – making it an invaluable tool for more than 40 organizations and governments fighting the pandemic, including the Nigerian CDC, Kenyan presidential office, Zambian public health policymakers and aid organizations in Pakistan. Fraym has mapped communities based on concentrations of common transmission variables and then combined this with household survey and remote sensing data to understand how these individuals consume news at a hyper-local level. The company is providing this information, which is at a 1-square kilometer level, for free to help fight the spread of Covid-19. Since March 2020, Fraym has produced more than 300 COVID-19 related data layers in nearly 20 different countries.

7 Open Source Data Science Projects


My aim, as always, was to keep the projects as diverse as possible so you can pick the ones that fit into your data science journey. If you're a beginner, I would suggest starting with the PalmerPenguins dataset as most folks aren't even aware of it right now. A great chance to get a head start. I would love to hear your thoughts on which open source project you found the most useful. Or let me know if you want me to feature any other data science projects here or in next month's edition.

More AI, ease of use will shape Sisense analytics platform


The Sisense analytics platform is known for its augmented analytics capabilities and ease of use, and as it moves forward it will do so with a new leader in charge of its product development. Just over a year after its acquisition of Periscope Data, a purchase that added capabilities aimed at data scientists to the features geared toward business users Sisense was already known for, the New York-based vendor is focused on third-generation analytics in which AI and business intelligence embedded throughout the workflow will be prominent. Most recently, Sisense updated its analytics platform with new natural language query capabilities and introduced Knowledge Graph, a graph analytics engine the vendor developed that was trained on more than 650 billion past analytic events and informs the machine learning capabilities of the query tool. Now, to help shape its vision, Sisense has added Ashley Kramer as its first chief product officer. Kramer began her career as a software engineering manager at NASA.

Feature Engineering in SQL and Python: A Hybrid Approach - KDnuggets


I knew SQL long before learning about Pandas, and I was intrigued by the way Pandas faithfully emulates SQL. Stereotypically, SQL is for analysts, who crunch data into informative reports, whereas Python is for data scientists, who use data to build (and overfit) models. Although they are almost functionally equivalent, I'd argue both tools are essential for a data scientist to work efficiently. From my experience with Pandas, I've noticed the following: Those problems were naturally solved when I began feature engineering directly in SQL. If you know a little bit of SQL, it's time to put it to good use.

Top 10 Big Data Startups in the United States to Watch In 2020


Data is growing by leaps and bounds, and the convergence of extremely large data sets, both structured and unstructured, defines Big Data. The increasing awareness of Internet of Things (IoT) devices among organizations and the volume, variety, velocity and veracity at which data is generated have caught the attention of the enterprise in a bid to enhance digital technologies and guide digital transformation. Analytics Insights estimates that the big data market size will grow at a CAGR of 10.9% globally, from US$ 193.5 billion in 2020 to US$ 301.5 billion by 2023. This region is witnessing significant developments in the big data market, which is gaining remarkable traction in the BFSI industry vertical. Numerai is the world's first hedge fund, to predict the stock market.

Apple Data Science Interview Questions


Apple Inc. is one of the biggest technology companies in the world that designs, develops, and sells consumer electronics, computer software, and online services. Apple is constantly in need of creative, passionate, and dedicated data scientists who can sit on any number of its teams. From its research-based artificial intelligence development team at Siri to its cloud-based architecture development team at iCloud, Apple has slowly but steadily been building data science teams to handle the avalanche of data accumulated on a daily basis. As with other big tech companies, the role of a data scientist at Apple varies a lot and depends on the team you are assigned to. This means the job will require everything from analytics to machine learning software design to plain engineering.

Databricks Contributes MLflow Machine Learning Platform to The Linux Foundation


Databricks, the company behind big data processing and analytics engine Apache Spark, contributes open source machine learning platform MLflow …

Defining data science, machine learning, and artificial intelligence


With the ever-increasing volume, variety, and velocity of available data, scientific disciplines have provided us with advanced mathematical tools, processes, and algorithms enabling us to use this data in meaningful ways. Data science (DS), machine learning (ML), and artificial intelligence (AI) are three such disciplines. A question that frequently comes up in many data-related discussions is: what is the difference between DS, ML, and AI? Can they even be compared? Depending on who you talk to, how many years of experience they have had, and what projects they have worked on, you may get widely different answers to the above question. In this blog, I will attempt to answer this based on my research, academic, and industry experience, and on having facilitated numerous conversations on the topic.