Data Science


Understanding satellite images: a data mining module for Sentinel images

#artificialintelligence

The Copernicus Access Platform Intermediate Layers Small Scale Demonstrator (CANDELA) project is a European Horizon 2020 research and innovation project for easy, interactive analysis of satellite images on a web platform. Among its objectives are the development of efficient data retrieval and image mining methods augmented with machine learning techniques, as well as interoperability capabilities, in order to fully benefit from the available assets, create additional value, and subsequently foster economic growth and development in the European member states (Candela, 2019). The potential target groups of users of the CANDELA platform are space industries and data professionals, data scientists, end users (e.g., governmental and local authorities), and researchers in the areas covered by the project use cases (e.g., urban expansion and agriculture, forest and vineyard monitoring, and assessment of natural disasters) (Candela, 2019). In our case, this activity will generate a large geographical and temporal volume of EO data to be ingested into the data analytics building blocks. Activity 2 provides tools for the fusion of multi-sensor Earth observation satellite data (comprising, besides Sentinel, several other contributing missions) with in-situ data and additional information from the web, such as social networks or Open Data, in order to pave the way for new applications and services.
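
To make the fusion idea a bit more concrete, here is a minimal, hypothetical sketch (not code from the CANDELA platform) of attaching values from a Sentinel-derived raster to in-situ point measurements, assuming the rasterio and geopandas libraries; the file names and the "raster_value" column are invented for the example.

```python
# Hypothetical sketch of one fusion step, not code from the CANDELA platform:
# attach values from a Sentinel-derived raster to in-situ point measurements.
# File names and the "raster_value" column are invented for the example.
import rasterio
import geopandas as gpd


def fuse_raster_with_in_situ(raster_path: str, points_path: str) -> gpd.GeoDataFrame:
    """Sample the raster's first band at each in-situ point location."""
    points = gpd.read_file(points_path)  # in-situ measurements as a point layer
    with rasterio.open(raster_path) as raster:
        points = points.to_crs(raster.crs)  # align coordinate reference systems
        coords = [(geom.x, geom.y) for geom in points.geometry]
        points["raster_value"] = [sample[0] for sample in raster.sample(coords)]
    return points


# Example usage with hypothetical inputs:
# fused = fuse_raster_with_in_situ("sentinel2_ndvi.tif", "field_measurements.geojson")
# print(fused[["raster_value"]].describe())
```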


Essential data science skills that no one talks about.

#artificialintelligence

The top results are long lists of technical terms, known as hard skills. Python, algebra, statistics, and SQL are some of the most popular ones. Then come the soft skills: communication, business acumen, teamwork, and so on. Let's pretend that you are a superhuman possessing all of the above abilities. You have been coding since the age of five, you are a Kaggle grandmaster, and your conference papers are guaranteed a best-paper award. There is still a very high chance that your projects will struggle to reach maturity and become full-fledged commercial products. Recent studies estimate that more than 85% of data science projects fail to reach production, and they cite numerous reasons for the failures.


How AI & Data Analytics Can Solve Supply Chain Pitfalls

#artificialintelligence

The supply chain is an ecosystem that affects businesses around the world, and the COVID-19 pandemic has thrown a monkey wrench into this previously undisturbed process. With region-specific restrictions, a limited supply of certain goods, and a constantly changing consumer mindset, almost all businesses are playing catch-up in addressing the needs of every consumer. Add to that the oil price war, and the result is near chaos for both consumers and businesses. It may be a gamble to implement new supply chain systems in these circumstances, but it's a bet that could pay dividends not just now but over the long term. Artificial intelligence (AI) and data analytics tools can provide the push companies need to keep their businesses afloat--and maybe even thrive--despite the global crisis.


The Evolution of Data Science … As I Remember It

#artificialintelligence

Those who cannot remember the past are condemned to repeat it. But the past is written by anyone with the will to write it down and the forum to distribute it, so it's valuable to understand different perspectives and the contexts that created them. The evolution of the term Data Science is a good example. I learned statistics in the 1970s in a department of behavioral scientists and educators rather than a department of mathematics. At that time, the image of statistics was framed by academic mathematical statisticians. They wrote the textbooks and controlled the jargon. Applied statisticians were the silent majority, a sizable group overshadowed by the academic celebrities. For me, reading Tukey's 1977 book Exploratory Data Analysis was a revelation.


A Tour of Dependable Computing Research in Latin America

Communications of the ACM

Computing technology has become pervasive and with it the expectation for its ready availability when needed, thus basically at all times. Dependability is the set of techniques to build, configure, operate, and manage computer systems to ensure that they are reliable, available, safe, and secure.1 But alas, faults seem to be inherent to computer systems. Components can simply crash or produce incorrect output due to hardware or software bugs or can be invaded by impostors that orchestrate their behavior. Fault tolerance is the ability to enable a system as a whole to continue operating correctly and with acceptable performance, even if some of its components are faulty.3 Fault tolerance is not new; von Neumann himself designed techniques for computers to survive faults.4
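
As a concrete illustration of the fault-tolerance idea described above, and of the kind of redundancy technique associated with von Neumann, here is a minimal, hypothetical sketch of triple modular redundancy with majority voting; the replica functions are invented stand-ins for redundant components, not anything from the cited works.

```python
# Minimal sketch of triple modular redundancy (TMR) with majority voting.
# The replica functions below are invented stand-ins for redundant components.
from collections import Counter
from typing import Callable, Sequence


def tmr_vote(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run redundant replicas on the same input and return the majority output.

    With three replicas, a single faulty replica is masked by the voter.
    """
    outputs = [replica(x) for replica in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority: too many replicas disagree")
    return value


# Example: the middle replica is faulty, but the voter still returns 16.
correct = lambda x: x * x
faulty = lambda x: x * x + 1
print(tmr_vote([correct, faulty, correct], 4))  # -> 16
```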


Digital Healthcare in Latin America

Communications of the ACM

The healthcare system in Latin America (LATAM) has made significant improvements in the last few decades. Nevertheless, it still faces significant challenges, including poor access to healthcare services, insufficient resources, and inequalities in health that may lead to decreased life expectancy, lower quality of life, and poor economic growth. Digital Healthcare (DH) enables the convergence of innovative technology with recent advances in neuroscience, medicine, and public healthcare policy. In this article, we discuss key DH efforts that can help address some of the challenges of the healthcare system in LATAM, focusing on two countries: Brazil and Mexico. We chose to study DH in the context of Brazil and Mexico as both countries are good representatives of the situation of the healthcare system in LATAM and face challenges similar to those of other LATAM countries. Brazil and Mexico have the largest economies in the region and account for approximately half of the population and geographic territory of LATAM.11


Chile's New Interdisciplinary Institute for Foundational Research on Data

Communications of the ACM

The Millennium Institute for Foundational Research on Data (IMFD) started its operations in June 2018, funded by the Millennium Science Initiative of the Chilean National Agency of Research and Development. IMFD is a joint initiative led by Universidad de Chile and Universidad Católica de Chile, with the participation of five other Chilean universities: Universidad de Concepción, Universidad de Talca, Universidad Técnica Federico Santa María, Universidad Diego Portales, and Universidad Adolfo Ibáñez. IMFD aims to be a reference center in Latin America for state-of-the-art research on the foundational problems with data, as well as its applications to tackling diverse issues ranging from scientific challenges to complex social problems. As tasks of this kind are interdisciplinary by nature, IMFD gathers a large number of researchers in several areas that include traditional computer science areas such as data management, Web science, algorithms and data structures, privacy and verification, information retrieval, data mining, machine learning, and knowledge representation, as well as some areas from other fields, including statistics, political science, and communication studies. IMFD currently hosts 36 researchers, seven postdoctoral fellows, and more than 100 students.


Using Data and Respecting Users

Communications of the ACM

Transaction data is like a friendship tie: both parties must respect the relationship, and if one party exploits it, the relationship sours. As data becomes increasingly valuable, firms must take care not to exploit their users, or they will sour those ties. Ethical uses of data cover a spectrum: at one end, using patient data in healthcare to cure patients is little cause for concern. At the other end, selling data to third parties who exploit users is serious cause for concern.2 Between these two extremes lies a vast gray area where firms need better ways to frame data risks and rewards in order to make better legal and ethical choices.


Council Post: Lack Of Cybersecurity Consideration Could Upend Industry 4.0

#artificialintelligence

Industry 4.0 signifies a seismic shift in the way modern factories and industrial systems operate. These systems involve large-scale integration across an entire ecosystem, where data from inside and outside the organization converges to create new products, predict market demand, and reinvent the value chain. In Industry 4.0, we see the convergence of information technology (IT) and operational technology (OT) at scale. This IT/OT convergence is pushing the boundaries of conventional corporate security strategies, where the focus has always been placed on protecting networks, systems, applications, and processed data involving people and information. In the context of manufacturing industries with smart factories and industrial systems, robotics, sensor technology, 3D printing, augmented reality, artificial intelligence, machine learning, and big data platforms work in tandem to deliver breakthrough efficiencies.


ML Guide: Feature Store vs Data Warehouse - Logical Clocks

#artificialintelligence

TL;DR: The feature store is a data warehouse of features for machine learning (ML). Architecturally, it differs from the traditional data warehouse in that it is a dual database, with one database (row-oriented) serving features at low latency to online applications and the other (column-oriented) storing large volumes of features, used by data scientists to create train/test datasets and by batch applications doing offline model scoring. Data warehouses democratized access to enterprise data by centralizing data in a single platform and then empowering business analysts with visual tools such as Tableau and Power BI. Analysts no longer needed to know what data resided where, or how to query it in each source platform; they could derive historical insights into the business using BI tools.
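
To make the dual-database idea concrete, here is a minimal, hypothetical sketch, not Logical Clocks' actual Feature Store API: SQLite stands in for the row-oriented online store and a Parquet file for the column-oriented offline store, with all table, column, and file names invented for the example.

```python
# Minimal sketch of the dual-database idea, not Logical Clocks' actual API:
# SQLite stands in for the row-oriented online store and a Parquet file for the
# column-oriented offline store. Table, column, and file names are invented.
# (Writing Parquet via pandas requires an engine such as pyarrow.)
import sqlite3

import pandas as pd


class ToyFeatureStore:
    def __init__(self, parquet_path: str = "features_offline.parquet"):
        self.online = sqlite3.connect(":memory:")  # row-oriented, keyed lookups
        self.online.execute(
            "CREATE TABLE features "
            "(customer_id TEXT PRIMARY KEY, avg_spend REAL, n_orders INTEGER)"
        )
        self.parquet_path = parquet_path  # column-oriented, bulk scans

    def ingest(self, df: pd.DataFrame) -> None:
        """Write the same feature rows to both the online and offline stores."""
        df.to_parquet(self.parquet_path)  # offline: used to build train/test sets
        rows = [
            (str(r.customer_id), float(r.avg_spend), int(r.n_orders))
            for r in df.itertuples(index=False)
        ]
        self.online.executemany("INSERT OR REPLACE INTO features VALUES (?, ?, ?)", rows)

    def get_online(self, customer_id: str):
        """Low-latency single-row read, as an online application does at serving time."""
        return self.online.execute(
            "SELECT avg_spend, n_orders FROM features WHERE customer_id = ?",
            (customer_id,),
        ).fetchone()

    def get_offline(self) -> pd.DataFrame:
        """Bulk columnar read, as a training or batch-scoring job would do."""
        return pd.read_parquet(self.parquet_path)


# Example usage with made-up data:
store = ToyFeatureStore()
store.ingest(pd.DataFrame({"customer_id": ["a1", "b2"],
                           "avg_spend": [12.5, 80.0],
                           "n_orders": [3, 7]}))
print(store.get_online("a1"))      # serving-time lookup -> (12.5, 3)
print(store.get_offline().head())  # training-time scan of all rows
```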