Liem

AAAI Conferences

In this paper, we develop a new human computation algorithm for speech-to-text transcription that can potentially achieve the high accuracy of professional transcription using only microtasks deployed via an online task market or a game. The algorithm partitions audio clips into short 10-second segments for independent processing and joins adjacent outputs to produce the full transcription. Each segment is sent through an iterative dual-pathway structure in which participants in either path refine the transcriptions of others in their own path while being rewarded based on transcriptions in the other path, eliminating the need to check transcripts in a separate process. Initial experiments with local subjects show that the transcripts produced are on average 96.6% accurate.
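
The partition-and-join step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `partition` and `join_transcripts` are hypothetical, the segments are assumed not to overlap, and joining is assumed to be plain concatenation with a single space.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 10.0  # segment length stated in the abstract


def partition(duration: float, segment: float = SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Split the interval [0, duration) into consecutive (start, end) windows.

    Assumption: windows do not overlap; the last window may be shorter.
    """
    if duration < 0 or segment <= 0:
        raise ValueError("duration must be >= 0 and segment must be > 0")
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + segment, duration)
        bounds.append((start, end))
        start = end
    return bounds


def join_transcripts(pieces: List[str]) -> str:
    """Join per-segment transcripts in order, skipping empty segments.

    Assumption: adjacent outputs are simply concatenated with a space.
    """
    return " ".join(p.strip() for p in pieces if p.strip())
```

Under these assumptions, a 25-second clip would yield three windows (two of 10 seconds and one of 5 seconds), each of which could be transcribed independently before the results are joined.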


Automated or Manual Transcription Service: Which Is Better?

#artificialintelligence

Although speech recognition technology has improved considerably of late, it is still no match for a human transcriptionist in accuracy. Commercially available speech recognition software shows an average error rate of about 12% when transcribing phone conversations. Read on to learn more. Automated transcription is a process in which an audio or video file is converted into written form using voice and speech recognition technology. As in most areas of AI, transcription systems work by training specific software on high-quality datasets or examples.
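
The excerpt does not say how the error rate is measured. Assuming it refers to word error rate (WER), a common metric for transcription quality, the calculation might look like the sketch below. The function name `word_error_rate` is hypothetical, and the code uses plain whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, a 12% error rate would correspond to roughly 12 word-level edits for every 100 words in the reference transcript.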


How artificial intelligence and machine learning are unlocking content

#artificialintelligence

The pace of content creation has never been faster. Each day, the world generates 2.5 quintillion bytes of data, and more than 90 percent of all data in existence has been produced since 2016. In the process, that data is being hidden from search engines, locked away in multimedia formats that cannot be catalogued. A gold mine of information is waiting to be tapped if only the spoken word could be easily converted to text. Doing so by hand is time-consuming and, often, prohibitively expensive.


Transcribing Audio Sucks--So Make the Machines Do It

WIRED

A new voice-transcription technology can tell you not only what's being said, but who is saying it. The software, named Trint, can listen to an audio recording or a video of two or more speakers engaged in a natural conversation, then provide a written transcript of what each person said. While news organizations have invested heavily in video content, the ability to optimize those clips for search engines remains elusive. Trint's technology is still nascent, but it could eventually give new life to vast swaths of non-text-based media on the internet, like videos and podcasts, by making them readable to both humans and search engines. People could read podcasts they lack the time or ability to listen to.


Transcribing Audio Sucks--So Make the Machines Do It

@machinelearnbot

An unprecedented voice-transcription technology can tell you not only what's being said, but who is saying it. The web app, named Trint, can listen to an audio recording or a video of two or more speakers (or just one) engaged in natural speech, then provide a written transcript of what was said. Unlike Siri or Google Talk, Trint is designed to transcribe long blocks of text. While news organizations have invested heavily in video content, the ability to optimize those clips for search engines remains elusive. Trint's technology is still nascent, but it could eventually give new life to vast swaths of non-text-based media on the internet, like videos and podcasts, by making them readable to both humans and search engines.