 Machine Translation


Why Ambitious Predictions About A.I. Are Always Wrong

Slate

Since the very beginning of the computer revolution, researchers have dreamed of creating computers that would rival the human brain. Our brains are information machines that use inputs to generate outputs, and so are computers. How hard could it be to build computers that work as well as our brains? In 1954 a Georgetown-IBM team predicted that language translation programs would be perfected in three to five years. In 1965 Herbert Simon said that "machines will be capable, within twenty years, of doing any work a man can do."


The internet is excluding Asian-Americans who don't speak English

MIT Technology Review

And it starts right at the beginning. Instead of the Hmong word for "hello" or "welcome," she says, is "something else that said, like, 'your honor' or 'the queen' or 'the king' instead." Seeing something so simple done incorrectly was frustrating and off-putting. "Not only was it just probably churned through Google Translate, it wasn't even peer edited and reviewed to ensure that there was fluency and coherence," she says. Xiong says this kind of carelessness is common online--and it's one reason she and others in the Hmong community can feel excluded from politics.


Limited English Skills Can Mean Limited Access to the COVID-19 Vaccine

Slate

This story was published in partnership with Type Investigations with support from the Puffin Foundation. In California, non-English speakers were handed COVID-19 vaccination cards without information on what they mean. In Pennsylvania, people who speak Mandarin, Korean, and Japanese were unable to make vaccine appointments due to a lack of interpreters at hospital call centers. These are just a few of the examples captured in a new complaint filed on Friday to the U.S. Department of Health and Human Services' Office for Civil Rights, Federal Emergency Management Agency's Office of Equal Rights, and Department of Homeland Security's Office for Civil Rights and Civil Liberties. The complaint, brought by the National Health Law Program, finds widespread problems across the country that inhibit access to COVID-19 resources for people with limited English proficiency (LEP).


Translate All: Automating multiple file type batch translation with AWS CloudFormation

#artificialintelligence

This is a guest post by Cyrus Wong, an AWS Machine Learning Hero. You can learn more about and connect with AWS Machine Learning Heroes at the community page. On July 29, 2020, AWS announced that Amazon Translate now supports Microsoft Office documents, including .docx, The world is full of bilingual countries and cities like Hong Kong. I find myself always needing to prepare Office documents and presentation slides in both English and Chinese.


Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

arXiv.org Artificial Intelligence

Human evaluation of modern high-quality machine translation systems is a difficult problem, and there is increasing evidence that inadequate evaluation procedures can lead to erroneous conclusions. While there has been considerable research on human evaluation, the field still lacks a commonly-accepted standard procedure. As a step toward this goal, we propose an evaluation methodology grounded in explicit error analysis, based on the Multidimensional Quality Metrics (MQM) framework. We carry out the largest MQM research study to date, scoring the outputs of top systems from the WMT 2020 shared task in two language pairs using annotations provided by professional translators with access to full document context. We analyze the resulting data extensively, finding among other results a substantially different ranking of evaluated systems from the one established by the WMT crowd workers, exhibiting a clear preference for human over machine output. Surprisingly, we also find that automatic metrics based on pre-trained embeddings can outperform human crowd workers. We make our corpus publicly available for further research.
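
To make the idea of scoring by explicit error analysis concrete, here is a minimal Python sketch of an MQM-style score: each annotated error carries a severity, severities map to penalty weights, and a system's score is its total penalty averaged over the segments evaluated (lower is better). The weights, the `ErrorAnnotation` fields, the function name `mqm_style_scores`, and the sample data are all assumptions made for illustration; they are not taken from the abstract and should not be read as the paper's actual scheme.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical severity weights, chosen only for illustration.
SEVERITY_WEIGHTS = {"major": 5.0, "minor": 1.0}


@dataclass(frozen=True)
class ErrorAnnotation:
    """One error marked by an annotator (field names are assumptions)."""
    system: str
    segment_id: int
    category: str   # e.g. "accuracy/mistranslation"
    severity: str   # a key of SEVERITY_WEIGHTS


def mqm_style_scores(annotations, segments_per_system):
    """Return the average penalty per segment for each system (lower is better)."""
    totals = defaultdict(float)
    for ann in annotations:
        totals[ann.system] += SEVERITY_WEIGHTS[ann.severity]
    # Systems with no annotated errors get a score of 0.0.
    return {
        system: totals[system] / count
        for system, count in segments_per_system.items()
    }


# Made-up annotations for two outputs over the same two segments.
annotations = [
    ErrorAnnotation("system_a", 0, "accuracy/mistranslation", "major"),
    ErrorAnnotation("system_a", 1, "fluency/grammar", "minor"),
    ErrorAnnotation("human", 0, "style/awkward", "minor"),
]
scores = mqm_style_scores(annotations, {"system_a": 2, "human": 2})
ranking = sorted(scores, key=scores.get)  # best (lowest penalty) first
print(scores, ranking)
```

Because the score is built from individual error annotations rather than a single holistic rating, two evaluations can be compared error by error when their system rankings disagree.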


AI 50: America's Most Promising Artificial Intelligence Companies

#artificialintelligence

The Covid-19 pandemic was devastating for many industries, but it only accelerated the use of artificial intelligence across the U.S. economy. Amid the crisis, companies scrambled to create new services for remote workers and students, beef up online shopping and dining options, make customer call centers more efficient and speed development of important new drugs. Even as applications of machine learning and perception platforms become commonplace, a thick layer of hype and fuzzy jargon clings to AI-enabled software. That makes it tough to identify the most compelling companies in the space--especially those finding new ways to use AI that create value by making humans more efficient, not redundant. With this in mind, Forbes has partnered with venture firms Sequoia Capital and Meritech Capital to create our third annual AI 50, a list of private, promising North American companies that are using artificial intelligence in ways that are fundamental to their operations. To be considered, businesses must be privately-held and utilizing machine learning (where systems learn from data to improve on tasks), natural language processing (which enables programs to "understand" written or spoken language) or computer vision (which relates to how machines "see"). AI companies incubated at, largely funded through or acquired by large tech, manufacturing or industrial firms aren't eligible for consideration. Our list was compiled through a submission process open to any AI company in the U.S. and Canada. The application asked companies to provide details on their technology, business model, customers and financials like funding, valuation and revenue history (companies had the option to submit information confidentially, to encourage greater transparency). Forbes received several hundred entries, of which nearly 400 qualified for consideration. From there, our data partners applied an algorithm to identify 100 companies with the highest quantitative scores--and that also made diversity a priority. Next, a panel of expert AI judges evaluated the finalists to find the 50 most compelling companies (they were precluded from judging companies in which they have a vested interest). Among trends this year are what Sequoia Capital's Konstantine Buhler calls AI workbench companies, which build platforms tailored to different enterprises, including Dataiku, DataRobot, Domino Data and Databricks.


Family of Origin and Family of Choice: Massively Parallel Lexiconized Iterative Pretraining for Severely Low Resource Machine Translation

arXiv.org Artificial Intelligence

We translate a closed text that is known in advance into a severely low resource language by leveraging massive source parallelism. In other words, given a text in 124 source languages, we translate it into a severely low resource language using only ~1,000 lines of low resource data without any external help. Firstly, we propose a systematic method to rank and choose source languages that are close to the low resource language. We call the linguistic definition of language family Family of Origin (FAMO), and we call the empirical definition of higher-ranked languages using our metrics Family of Choice (FAMC). Secondly, we build an Iteratively Pretrained Multilingual Order-preserving Lexiconized Transformer (IPML) to train on ~1,000 lines (~3.5%) of low resource data. To translate named entities correctly, we build a massive lexicon table for 2,939 Bible named entities in 124 source languages, including many that occur only once; the table covers more than 66 severely low resource languages. Moreover, we also build a novel method of combining translations from different source languages into one. Using English as a hypothetical low resource language, we get a +23.9 BLEU increase over a multilingual baseline, and a +10.3 BLEU increase over our asymmetric baseline in the Bible dataset. We get a 42.8 BLEU score for Portuguese-English translation on the medical EMEA dataset. We also have good results for a real severely low resource Mayan language, Eastern Pokomchi.


The NLP Week: How NLTM can make India a world leader in Speech-to-Speech Translation

#artificialintelligence

India is a melting pot of multiple cultures, religions, diaspora and languages. Although 22 languages are recognised officially, more than 100 languages and dialects are spoken across the country. In the past decade, India has witnessed stupendous digital growth: in 2019, the number of smartphone users in rural areas surpassed that of urban India. There is a burgeoning market for digital products, going well beyond the borders of urban pockets. However, less than 1% of content on the Internet is in Indian languages.