Computers have become adept at extracting patterns from very large collections of data. For example, shopping transactions can reveal consumers' preferences and message traffic on social networks can reveal political trends.
This article was contributed by Hassan Lâasri, consultant in data strategy, data governance, and data activation. Since McKinsey's report on big data in May 2011, we have entered an era where virtually everything we do on this planet is designed and digitized to generate data, consume it, or both. Recent projects, including the Metaverse, want to translate the earth into a virtual data planet. Since that report, data has been considered to be a strategic asset in any company whose business depends on data -- not just Google, Amazon, Meta, Apple, and Microsoft, all of which have paved the way.
Data Science (DS) and Machine Learning (ML) are the backbone of today's data-driven business decision-making. From a human viewpoint, ML often consists of multiple phases, from gathering requirements and datasets, to deploying a model, to supporting human decision-making; we refer to these stages together as the DS/ML lifecycle. There are also various personas in the DS/ML team, and these personas must coordinate across the lifecycle: stakeholders set requirements, data scientists define a plan, and data engineers and ML engineers support with data cleaning and model building. Later, stakeholders verify the model, domain experts use model inferences in decision-making, and so on. Throughout the lifecycle, refinements may be performed at various stages as needed. The work is so complex and time-consuming that there are not enough DS/ML professionals to meet job demand, and as much as 80% of their time is spent on low-level activities such as tweaking data, trying out algorithmic options, and tuning models. These two challenges -- the dearth of data scientists and the time-consuming low-level activities -- have stimulated AI researchers and system builders to explore an automated solution for DS/ML work: Automated Data Science (AutoML). Several AutoML algorithms and systems have been built to automate the various stages of the DS/ML lifecycle. For example, the ETL (extract/transform/load) task has been applied to the data readiness, pre-processing & cleaning stage, and has attracted research attention.
At Databricks, we have had the opportunity to help thousands of organizations modernize their data architectures to be cloud-first and extract value from their data at scale with analytics and AI. Over the past few years, we've been fortunate to engage directly with customers across industries and regions about their data-driven aspirations – and the roadblocks that slow down their ability to get there. While challenges vary greatly among industries and even individual organizations, we have developed a rich understanding of the top four habits of data and AI-driven organizations. Before diving into the habits, let's take a quick look at how organizations have approached enabling data strategies. First, data teams have made technology decisions over time that propel a way of thinking that is based around technology stacks: data warehousing, data engineering, streaming real-time data science, and machine learning.
There is enormous interest in and momentum around using AI to reduce the need for human monitoring while improving enterprise security. Machine learning and other techniques are used for behavioral threat analytics, anomaly detection and reducing false-positive alerts. At the same time, private and nation-state cybercriminals are applying AI to the other side of the security coin. Artificial intelligence is used to find vulnerabilities, shape exploits and conduct targeted attacks. How does an enterprise protect the tools it is building and secure those it is running during the production process?
AI is the simulation of human intelligence by computers. By applying machine learning algorithms, we can make 'intelligent' machines, which can employ cognitive reasoning to make decisions based on the data fed to them. Big data, on the other hand, is a blanket term for computational strategies and techniques applied to large datasets to mine information from them. Big data technology includes capturing and storing the data, and then analyzing it to make strategic decisions and improve business outcomes. Most companies deploy big data and AI in silos to structure their existing data sets and to develop machines that can think for themselves.
For many organizations, real-time data collection and data processing at scale can provide immense advantages for business and operational insights. The need for real-time data introduces technical challenges, and a successful real-time implementation requires skilled experts to build custom integrations. For customers looking to implement streaming real-time applications, our partner Confluent recently announced a new Databricks Connector for Confluent Cloud. This new fully managed connector is designed specifically for the data lakehouse and provides a powerful solution to build and scale real-time applications such as application monitoring, internet of things (IoT), fraud detection, personalization and gaming leaderboards. Organizations can now use an integrated capability that streams legacy and cloud data from Confluent Cloud directly into the Databricks Lakehouse for business intelligence (BI), data analytics and machine learning use cases on a single platform.
About Apptegy: Since our start in 2015, we've gone from a group of individuals to a community pushing toward the same goal of building a fantastic company with great people, great products, and most importantly, a great culture. To date, we have grown from a handful of school districts in Arkansas to thousands of school districts across the U.S. Apptegy is building products to empower school leaders to run better schools. We have the opportunity to help schools as they go through a radical shift in how they operate and to provide great technology to make that transition. Our Engineering team has grown significantly, and so too has the number of schools and users. We look forward to meeting with you and telling you more about this opportunity to be part of a growing company and engineering organization and to support a fast-scaling set of products.
Customer data platforms (CDPs) and data management platforms (DMPs) are regularly confused. The mix-up stems from the fact that marketers use both CDPs and DMPs to collect data and to create audiences. A customer data platform allows you to collect data from relevant touchpoints where customers interact with your business. A CDP will organize this data and relay it to your other martech tools. Armed with such data, you can better target your audience with messages honed in line with your brand communication strategy.
Some tech skills remain extraordinarily high-paying. The average tech salary broke six figures for the first time in 2021, according to a report by Dice, highlighting the "continued and sustained" demand for digital talent across all industries. The 2022 Tech Salary Report by jobs marketplace Dice found that the average salary for technologists rose by nearly 7% between 2020 and 2021, reaching $104,566. Dice said this marked the highest salary recorded in the 17 years it has been conducting the survey. IT chiefs took home the highest salaries in 2021, with an average of $151,983 per year.
Artificial Intelligence (AI) is one of the most high-profile technology developments in recent history. It would appear that there is no end to what AI can do. From driverless cars, dictation tools, translator apps, predictive analytics and application tracking, as well as retail tools such as smart shelves and carts, to apps that help people with disabilities, AI can be a powerful component of wonderful tech products and services. But it can also be used for nefarious purposes, and ethical considerations around the use of AI are in their infancy. In their book, Tools and Weapons, the authors talk about the need for ethics, and with good reason.