"Let me start by saying a few things that seem obvious," Geoffrey Hinton, "Godfather" of deep learning and one of the most celebrated scientists of our time, told a leading AI conference in Toronto in 2016. "If you work as a radiologist, you're like the coyote that's already over the edge of the cliff but hasn't looked down." Deep learning is so well-suited to reading images from MRIs and CT scans, he reasoned, that people should "stop training radiologists now" and that it's "just completely obvious within five years deep learning is going to do better." Fast forward to 2022, and not a single radiologist has been replaced. Rather, the consensus view nowadays is that machine learning for radiology is harder than it looks [1]; at least for now, humans and machines complement each other's strengths [2]. Deep learning is at its best when all we need are rough-and-ready results.
Few fields have been more filled with hype and bravado than artificial intelligence. It has flitted from fad to fad decade by decade, always promising the moon, and only occasionally delivering.
In Vision Transformer Part I, I discussed Vision Transformer (ViT), a fairly new image classification model introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020), and fine-tuned it on an X-ray threat detection dataset. In this post, I will discuss how I improved ViT's prediction performance using the Iterative Erasing Prediction Strategy, which iteratively masks the test image with interpolated ViT attention weights, highlighting the object of interest and raising the class confidence. When we judge whether an object is present in an image, we zoom in on the area that seems to contain the object and pay less and less attention to the surrounding areas, until we feel certain that the object is indeed present. This human attention process may take less than a second in the brain, so we might not even be aware of it. It was the motivation behind the algorithm, which iteratively "erases" (i.e. masks out) the less-attended regions of the image. Keep in mind that there is no training involved for this algorithm, as it is only a prediction heuristic for the testing stage.
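The erasing loop can be sketched roughly as follows. This is a hypothetical sketch, not the post's actual implementation: `iterative_erasing_predict`, `dummy_model`, and the multiplicative masking rule are all stand-ins, and a real ViT would supply the interpolated attention map.

```python
import numpy as np

def iterative_erasing_predict(image, model, num_iters=3):
    """Sketch of an iterative-erasing prediction heuristic.

    `model(image)` is assumed to return (class_probs, attention_map),
    where attention_map is already upsampled ("interpolated") to the
    image's spatial size and normalized to [0, 1]. No training happens
    here; the loop only reshapes the test-time input.
    """
    masked = image.astype(float)
    probs = None
    for _ in range(num_iters):
        probs, attn = model(masked)
        # "Erase" (down-weight) low-attention surroundings so the next
        # forward pass concentrates on the highlighted object.
        masked = masked * attn[..., None]
    return probs

def dummy_model(img):
    # Stand-in for a ViT: uniform class probabilities plus an attention
    # map that highlights the centre of the image.
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    attn = np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * (h / 4) ** 2))
    attn /= attn.max()
    return np.array([0.5, 0.5]), attn

print(iterative_erasing_predict(np.random.rand(8, 8, 3), dummy_model))
```

With a trained model, the hope is that the class confidence rises across iterations as distracting background is progressively suppressed.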
Cohere Inc., an AI startup founded by University of Toronto alumni that uses natural language processing to improve human-machine interactions, has raised US$125 million as it looks to open a new office in Silicon Valley, the Globe and Mail reports. The latest financing round, led by New York-based Tiger Global Management, comes only five months after Cohere secured US$40 million in venture capital financing, according to the Globe. Cohere's software platform helps companies infuse natural language processing capabilities into their business using tools like chatbots, without requiring AI expertise of their own. The company originated in a 2017 paper co-authored by CEO Aidan Gomez, who interned at the Google Brain lab of deep learning pioneer and University Professor Emeritus Geoffrey Hinton, a Cohere investor. Cohere's other co-founders are alumnus Nick Frosst, who also worked with Hinton at Google, and Ivan Zhang, a former U of T computer science student.
Reproducibility is an increasing concern in Artificial Intelligence (AI), particularly in the area of Deep Learning (DL). Being able to reproduce DL models is crucial for AI-based systems, as it is closely tied to tasks like training, testing, debugging, and auditing. However, DL models are challenging to reproduce due to issues like randomness in the software (e.g., DL algorithms) and non-determinism in the hardware (e.g., GPU). There are various practices to mitigate some of these issues, but many of them are either too intrusive or only work in a specific usage context. In this paper, we propose a systematic approach to training reproducible DL models. Our approach includes three main parts: (1) a set of general criteria to thoroughly evaluate the reproducibility of DL models for two different domains, (2) a unified framework which leverages a record-and-replay technique to mitigate software-related randomness and a profile-and-patch technique to control hardware-related non-determinism, and (3) a reproducibility guideline which explains the rationales and the mitigation strategies for conducting a reproducible training process for DL models. Case study results show our approach can successfully reproduce six open-source DL models and one commercial DL model.
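For flavour, the software-related randomness the abstract mentions is often first attacked by fixing every random seed the training code touches. This is a common baseline practice, not the paper's record-and-replay framework, and it does not address hardware non-determinism:

```python
import random
import numpy as np

def seed_everything(seed: int) -> None:
    # Baseline mitigation for software-level randomness: seed every RNG
    # the training pipeline uses. (DL frameworks add their own calls,
    # e.g. a torch.manual_seed equivalent, and GPU non-determinism
    # requires further, hardware-aware measures.)
    random.seed(seed)
    np.random.seed(seed)

seed_everything(42)
run_a = np.random.rand(3)   # pretend this drives weight init / shuffling
seed_everything(42)
run_b = np.random.rand(3)   # a "re-run" with the same seed
assert np.array_equal(run_a, run_b)  # identical randomness across runs
```

Seeding alone is exactly the kind of practice the paper calls insufficient on its own, which motivates the record-and-replay and profile-and-patch techniques.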
Artificial intelligence is largely a numbers game. When deep neural networks, a form of AI that learns to discern patterns in data, began surpassing traditional algorithms 10 years ago, it was because we finally had enough data and processing power to make full use of them. Today's neural networks are even hungrier for data and power. Training them requires carefully tuning the values of millions or even billions of parameters that characterize these networks, representing the strengths of the connections between artificial neurons. The goal is to find nearly ideal values for them, a process known as optimization, but training the networks to reach this point isn't easy.
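The optimization process described above can be shown in miniature. This toy example uses a single parameter and plain gradient descent (the names and the learning rate are illustrative, not from the article):

```python
import numpy as np

# Toy view of "optimization": nudge a parameter downhill on a loss
# surface. Here one weight w is fit so that w * x approximates y.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # true relation: y = 2x

w = 0.0                          # initial parameter value
lr = 0.05                        # learning rate
for _ in range(200):
    grad = 2 * np.mean((w * x - y) * x)  # d/dw of mean squared error
    w -= lr * grad                       # step against the gradient

print(round(w, 3))  # converges to 2.0
```

Real networks repeat this same update over millions or billions of parameters at once, which is why the article calls training hard: the loss surface in that many dimensions is anything but a smooth bowl.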
Machine Learning is being used in countless applications today. It is a natural fit in domains where no single algorithm works perfectly and where the algorithm must do a good job of predicting the right output on large amounts of unseen data. Unlike traditional algorithmic problems, where we expect exact optimal answers, machine learning applications can tolerate approximate answers. Deep Learning with neural networks has been the dominant methodology for training new machine learning models for the past decade. Its rise to prominence is often attributed to the ImageNet competition in 2012.
Half of long-term care (LTC) residents are malnourished, which increases hospitalization, mortality, and morbidity and lowers quality of life. Current tracking methods are subjective and time-consuming. This paper presents the automated food imaging and nutrient intake tracking (AFINI-T) technology designed for LTC. We propose a novel convolutional autoencoder for food classification, trained on an augmented UNIMIB2016 dataset and tested on our simulated LTC food intake dataset (12 meal scenarios; up to 15 classes each; top-1 classification accuracy: 88.9%; mean intake error: -0.4 mL $\pm$ 36.7 mL). Nutrient intake estimation by volume was strongly linearly correlated with nutrient estimates from mass ($r^2$ 0.92 to 0.99), with good agreement between methods ($\sigma$ = -2.7 to -0.01; zero within each of the limits of agreement). The AFINI-T approach is a deep-learning-powered computational nutrient sensing system that may provide a novel means of more accurately and objectively tracking LTC residents' food intake to support malnutrition prevention strategies.
Deep generative models of molecules have grown immensely in popularity. Trained on relevant datasets, these models are used to search through chemical space. The downstream utility of generative models for the inverse design of novel functional compounds depends on their ability to learn a training distribution of molecules. The simplest example is a language model that takes the form of a recurrent neural network and generates molecules using a string representation. More sophisticated are graph generative models, which sequentially construct molecular graphs and typically achieve state-of-the-art results. However, recent work has shown that language models are more capable than once thought, particularly in the low-data regime. In this work, we investigate the capacity of simple language models to learn distributions of molecules. For this purpose, we introduce several challenging generative modeling tasks by compiling especially complex distributions of molecules. On each task, we evaluate the ability of language models as compared with two widely used graph generative models. The results demonstrate that language models are powerful generative models, capable of adeptly learning complex molecular distributions -- and yield better performance than the graph models. Language models can accurately generate distributions of the highest-scoring penalized LogP molecules in ZINC15, multi-modal molecular distributions, as well as the largest molecules in PubChem.
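The string-based language-model approach the abstract describes can be illustrated with a minimal character-level RNN sampler. This is a sketch only: the weights below are random stand-ins for a model trained on a molecular dataset, the tiny SMILES-like alphabet is hypothetical, and the output is chemically meaningless until the model is actually trained.

```python
import numpy as np

# Minimal character-level RNN sampler over a toy SMILES-style alphabet.
rng = np.random.default_rng(0)
vocab = list("CNO()=12^$")          # '^' = start token, '$' = end token
V, H = len(vocab), 16               # vocabulary size, hidden size

Wxh = rng.normal(0, 0.1, (H, V))    # input-to-hidden weights (untrained)
Whh = rng.normal(0, 0.1, (H, H))    # hidden-to-hidden weights
Why = rng.normal(0, 0.1, (V, H))    # hidden-to-output weights

def sample(max_len=20):
    """Generate one string character by character, like a language model."""
    h = np.zeros(H)
    idx = vocab.index("^")
    out = []
    for _ in range(max_len):
        x = np.zeros(V)
        x[idx] = 1.0                     # one-hot encode previous character
        h = np.tanh(Wxh @ x + Whh @ h)   # recurrent state update
        logits = Why @ h
        p = np.exp(logits - logits.max())
        p /= p.sum()                     # softmax over next characters
        idx = rng.choice(V, p=p)
        if vocab[idx] == "$":            # end token terminates the molecule
            break
        out.append(vocab[idx])
    return "".join(out)

print(sample())
```

Training would fit the three weight matrices to maximize the likelihood of strings in the dataset; generation then follows exactly this sampling loop, which is what makes string-based models so simple relative to sequential graph construction.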
We have put together a list of the 10 most cited and discussed research papers in machine learning published over the past 10 years, from AlexNet to GPT-3. These are great readings for researchers new to the field and refreshers for experienced researchers. For each paper, we provide links to a short overview, author presentations, and a detailed paper walkthrough for readers with different levels of expertise. Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art.