Humans learn to speak before they can read or write, so why can't computers do the same? In this paper, we present a deep neural network model capable of rudimentary spoken language acquisition from untranscribed audio training data, whose only supervision comes in the form of contextually relevant visual images. We describe the collection of our dataset, comprising over 120,000 spoken audio captions for the Places image dataset, and evaluate our model on an image search and annotation task. We also provide visualizations suggesting that our model learns to recognize meaningful words within the caption spectrograms.
Prominent growth areas include virtual agents in banking, infotainment systems in automotive, managing vast amounts of data and tele-health in healthcare, and education, which has seen a significant shift to online learning due to pandemic-related social distancing. The number of IoT devices at home and in the workplace is also growing rapidly, from 7.6 billion active devices at the end of 2019 to an estimated 24.1 billion in 2030. In the more immediate future, it's estimated that by the end of 2020, 50% of all searches will be conducted via voice and 75% of U.S. homes will have at least one smart speaker. Tech providers thus have a growing need for vast amounts of speech data upon which to base reliable and comprehensive services. They have to protect against fraud and impersonation, recognize dialects and accents, and even identify a user's emotional state in order to respond in the most appropriate manner.
Artificial intelligence is a hot topic right now. Driven by a fear of losing out, companies in many industries have announced AI-focused initiatives. Unfortunately, most of these efforts will fail. They will fail not because AI is all hype, but because companies are approaching AI-driven innovation incorrectly. And this isn't the first time companies have made this kind of mistake.
If there's one thing that companies large and small can agree on, it's that deploying effective artificial intelligence (AI) is challenging. Not every organization has the funds, specialized teams, and annotators required for a large-scale AI deployment, and even those that do struggle to collect enough high-quality data to build accurate models quickly, or to update them with the right frequency. Deploying and maintaining AI at speed is essential for a competitive advantage in this rapidly evolving space, which is why many companies are looking to third-party options that enable them to scale quickly. In particular, organizations are increasingly relying on off-the-shelf, or pre-built, datasets to provide needed data conveniently and with limited risk. These datasets are cost-effective alternatives that can accelerate deployments and provide the last percentage point or two of accuracy required to meet desired confidence thresholds.