"What exactly is computer vision then? Computer vision is a research field working to equip computers with the ability to process and understand visual data, as sighted humans can. Human brains process the gigabytes of data passing through our eyes every second and translate that data into sight - that is, into discrete objects and entities we can recognise or understand. Similarly, computer vision aims to give computers the ability to understand what they are seeing, and act intelligently on that knowledge."
– Computer vision: Cheat Sheet. ZDNet.com (December 6, 2011), by Natasha Lomas.
Machine learning is the ability of computers to learn without being explicitly programmed. For example, iconoclastic author Tom Peters highlights 159 cognitive biases that impact management decision-making (Peters, Tom. Given that a computer is devoid of emotion and the hubris of human ego, it would seem logical that machine learning is not affected by cognitive bias. Machine learning technology is deployed today for many business uses, including self-driving cars, online recommendation, search engines, handwriting recognition, computer vision, online ad serving, pricing, prediction of equipment failure, credit scoring, fraud detection, OCR (optical character recognition), spam filtering and many others.
WASHINGTON – Launched today, the American College of Radiology (ACR) Data Science Institute (DSI) will work with government, industry and others to guide and facilitate the appropriate development and implementation of artificial intelligence (AI) tools to help radiologists improve medical imaging care. "Patients will benefit most from artificial intelligence if radiologists serve a leading role in guiding the technologies that best enhance medical imaging diagnosis and treatment," said James A. "The ACR Data Science Institute will create, gather, manage and integrate AI knowledge as these tools emerge to improve patient care."
About the American College of Radiology
The American College of Radiology (ACR), founded in 1924, is a professional medical society dedicated to serving patients and society by empowering radiology professionals to advance the practice, science and professions of radiological care.
Through deep learning, algorithms and data help machines improve their accuracy and knowledge over time, which means we don't have to do the grunt work normally required to be strategic because the program is doing it for us. Machines can now do this for us (well, Gravyty can), just like Facebook can tell us who to tag in photos through face recognition--but identifying donors is way more helpful and lucrative than that.
Chinese tech giant Baidu's text-to-speech system, Deep Voice, is making a lot of progress toward sounding more human. Baidu says that unlike previous text-to-speech systems, Deep Voice 2 finds shared qualities between the training voices entirely on its own, and without any previous guidance. "Deep voice 2 can learn from hundreds of voices and imitate them perfectly," a blog post says. In a research paper (PDF), Baidu concludes that its neural network can create voice pretty effectively even from small voice samples from hundreds of different speakers.
TL;DR Baidu's TTS system now supports multi-speaker conditioning, and can learn new speakers with very little data (a la LyreBird). I'm really excited about the recent influx of neural-net TTS systems, but all of them seem to be too slow for real-time dialog, or not publicly available, or both. Hoping that one of them gets a high-quality open-source implementation soon!
Next time you hear a voice generated by Baidu's Deep Voice 2, you might not be able to tell whether it's human. That's leaps and bounds better than early versions of Deep Voice, which took multiple hours to learn one voice. Then, it autonomously derives unique voices from that model -- unlike voice assistants like Apple's Siri, which require that a human record thousands of hours of speech that engineers tune by hand, Deep Voice 2 doesn't require guidance or manual intervention. Google's WaveNet, a product of the company's DeepMind division, generates voices by sampling real human speech and independently creating its own sounds in a variety of voices.
So by adding a few lines of code, developers can mix, match and customize AI functionalities to suit their needs, spanning functions such as translation, video deconstruction and search, gesture recognition and real-time captioning. Broadly speaking, therefore, Cognitive Services are plugin functionalities that developers can use to enable systems within their apps to hear, speak, understand and interpret human needs. For example, LUIS (Language Understanding Intelligent Service) helps developers integrate language models that understand users, using either prebuilt or customized models. The Custom Vision Service, meanwhile, makes it easy to create your own image recognition service.
It also has a lot of the AI-driven smarts you'll find in higher-end models like the Mavic Pro and the Phantom that help it handle things like object detection and automated flying. This isn't a totally new idea, but gesture control often requires some kind of external sensor or glove. Gesture control isn't magic, of course. You can't do a special wave and expect it to go all Michael Bay, but there are some pre-planned flight modes baked in that can be accessed through the DJI app.
The drone, which goes on pre-sale today and ships mid-June, seeks to dramatically shorten the distance from a consumer buying a drone (DJI said 46% of customers intend to buy one this year) to actually flying one. The quadcopter's 1/2.3-inch sensor can shoot 12-megapixel images, but only shoots up to 1080p video at 30 frames per second (fps), which may still be more than enough for most consumers.
At the Summit, I had the opportunity to share some thoughts on computer vision, and its impact on financial services. Activity using computer vision input has increased across the general technology landscape. As financial services players attempt to get ahead of the curve, what is the potential for computer vision? We think computer vision is most likely to transform insurance, commerce, capital markets, and banking.