Successful artificial intelligence (AI) and machine learning (ML) initiatives bring value to the entire organization by delivering insights to the right person or system at the right time, within the right context. But many organizations are unable to do this because they are too focused on algorithms. Data science is more than neural networks and deep learning! Organizations instead need to leverage people, processes, and technology to infuse AI and ML into business processes. It sounds simple--like bread, only four ingredients: flour, water, yeast, and a bit of salt.
There is also a complimentary Domino project available. Many data scientists deliver value to their organizations by mapping, developing, and deploying an appropriate ML solution to address a business problem. Feature engineering helps data scientists assess tradeoff decisions about the impact of their ML models. It offers a framework for approaching ML, as well as techniques for extracting features from raw data for use within those models. As Domino seeks to help data scientists accelerate their work, we reached out to AWP Pearson for permission to excerpt the chapter "Manual Feature Engineering: Manipulating Data for Fun and Profit" from the book Machine Learning with Python for Everyone by Mark E. Fenner. Many thanks to AWP Pearson for granting permission to excerpt the work and enabling us to provide a complimentary, publicly viewable Domino project.

We are going to turn our attention away from expanding our catalog of models [as mentioned previously in the book] and instead take a closer look at the data. Feature engineering refers to manipulation--addition, deletion, combination, mutation--of the features. Remember that features are attribute-value pairs, so we could add or remove columns from our data table and modify values within columns. Feature engineering can be used in a broad sense and in a narrow sense. I'm going to use it in a broad, inclusive sense and point out some gotchas along the way.

Two drivers of feature engineering are (1) background knowledge from the domain of the task and (2) inspection of the data values. The first case includes a doctor's knowledge of important blood pressure thresholds or an accountant's knowledge of tax bracket levels. Another example is the use of body mass index (BMI) by medical providers and insurance companies.
While it has limitations, BMI is quickly calculated from body weight and height and serves as a surrogate for a characteristic that is very hard to measure accurately: the proportion of lean body mass. Inspecting the values of a feature typically means looking at a histogram of its distribution. In distribution-based feature engineering, we might see multimodal distributions--histograms with multiple humps--and decide to break the humps into bins.
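Both drivers mentioned above can be made concrete with a small sketch. The snippet below is an illustrative example, not code from the excerpted book: it derives BMI from two raw features, and bins a toy bimodal feature using hand-picked cut points (the `bin_edges` values are assumptions chosen as if by inspecting a histogram).

```python
import numpy as np

def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

# Domain-knowledge feature: one engineered column from two raw columns.
print(round(bmi(70.0, 1.75), 1))  # -> 22.9

# Inspection-driven feature: np.digitize assigns each value to a bin.
# The edges below are hypothetical, chosen by eye from the "humps".
values = np.array([1.2, 1.9, 5.1, 5.8, 9.7])
bin_edges = [3.0, 7.0]
bins = np.digitize(values, bin_edges)
print(bins.tolist())  # -> [0, 0, 1, 1, 2]
```

The binned column could then replace, or sit alongside, the raw values as a categorical feature.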
Below is a list of the topics I plan to cover. Note that while the topics are organized by lecture, some lectures are longer or shorter than others, and we may skip certain topics in favor of others if time is a concern. This section provides an overview of potential topics; the actual topics will be listed in the course calendar.
Two years ago, artificial intelligence (AI) reached the peak of absurd expectations, as I tried to capture in this post. Reality seems to have crept in since then: companies have been approaching AI by focusing on low-hanging fruit rather than moonshots. This is according to Deloitte's State of AI in the Enterprise, 2nd Edition, which portrays a world increasingly serious about AI. That said, there are still some head-scratchers in the data. Let's dive into the report.
My understanding so far has been that most of the research on text summarization has been done in English. However, I can't find any reliable numbers for this. My best idea so far has been to search Google Scholar for "automatic summarization" together with each language name and use the result counts as a rough estimate of the proportions. I get 42k results for English, 25k for French, and 24k for Spanish... More surprisingly, I find 46k for Chinese.
This project (code, data, and results) is publicly available on Domino. Crunchbase recently converted its backend database to a Neo4j graph database. This will give it great flexibility in the future, but for now the data is exposed much as it always has been: individual entities are retrieved, and attribute data must be used to form edges between them before any graph analysis. Aside from manually traversing links on the web pages, there are no provisions for graph analysis. To enable more powerful manipulation of this data, I created my "Visibly Connected" project during my time at Zipfian Academy.
In this paper, we focus on the classification of books using short descriptive texts (cover blurbs) and additional metadata. Building upon BERT, a deep neural language model, we demonstrate how to combine text representations with metadata and knowledge graph embeddings, which encode author information. Compared to the standard BERT approach, we achieve considerably better results on the classification task. For a more coarse-grained classification using eight labels, we achieve an F1-score of 87.20, while a detailed classification using 343 labels yields an F1-score of 64.70. We make the source code and trained models of our experiments publicly available.
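The abstract does not spell out how the modalities are joined; a common pattern for this kind of multimodal setup is simply concatenating the vectors before a classification head. The sketch below is a hypothetical illustration of that general idea, not the paper's architecture: the dimensions and random stand-in vectors are assumptions (768 matches BERT-base's hidden size; the metadata and author-embedding sizes are invented).

```python
import numpy as np

rng = np.random.default_rng(0)

text_vec = rng.random(768)             # stand-in for a BERT [CLS] embedding
meta_vec = np.array([1.0, 0.0, 0.3])   # e.g., one-hot genre flags, a price
author_vec = rng.random(50)            # stand-in for a knowledge-graph embedding

# Concatenate the modalities into one joint feature vector; a linear
# classifier (the "head") would then be trained on vectors like this.
combined = np.concatenate([text_vec, meta_vec, author_vec])
print(combined.shape)  # -> (821,)
```

More elaborate fusion schemes (gating, attention over modalities) exist, but concatenation is the usual baseline.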
The world needs more computing power, and Huawei wants to build the architectures needed to answer the call, spanning processors, networks, devices, and cloud services. It also believes artificial intelligence (AI) will fuel much of this need and is gearing up a "full stack" portfolio to tap the growing enterprise demand. Statistical computing, which is needed for hard-to-specify tasks such as voice and image recognition, will soon become mainstream, and Huawei believes AI computing will account for more than 80% of computing power used worldwide five years from now. At the Chinese tech vendor's annual Connect conference Wednesday, deputy chairman Ken Hu noted that training AI algorithms requires a metric ton of computing power, while more complex applications such as autonomous driving and weather forecasting require even more.

Touting its low latency and high speeds, Ericsson says 5G can introduce a multitude of new applications for businesses and give telcos the cost efficiencies they seek, but the persistent controversy over cybersecurity--specifically involving Huawei--is leading to uncertainty and a general slowdown in the market.
Are you looking for a new challenge or a fresh start? Do you leave work each day feeling frustrated? If you answered yes to these questions, SAIC is looking to help jump-start your career with new and exciting challenges. SAIC is a premier technology integrator, solving our nation's most complex modernization and systems engineering challenges across the defense, space, federal civilian, and intelligence markets. Our robust portfolio of offerings includes high-end solutions in systems engineering and integration; enterprise IT, including cloud services; cyber; software; advanced analytics and simulation; and training.
No longer simply the plot from futuristic Hollywood action movies or sci-fi novels, AI is now powering a future filled with potential. Is the financial services industry ready to deal with this kind of disruption? What happens when machines learn faster than humans, when automation drives service delivery, and when software makes decisions? How does the financial services industry ensure we've coded the right values into the technology in order to avoid unintended consequences in its application? Earlier this summer, TD Bank Group brought together some of the best thinkers in the AI field -- academics, bank technologists, fintechs, global consultants, public sector leaders -- and asked them to work with us to identify critical areas of focus for the financial services industry.