AITopics

2109.13238

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(34 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(3 more...)

#artificialintelligence Oct-20-2021, 14:21:36 GMT

How AI is booming in Berlin

With a wave of new startups and cutting-edge research, Berlin is taking part in this revolution. With more than 300 companies in total, Berlin is a European melting pot for innovators and visionaries in the field of AI. AI is a buzzword, often associated with dystopian sci-fi scenarios such as robots outsmarting mankind, flying cars, or a suave-speaking computer operating system we fall in love with. But as we now know, we're surrounded by AI every day. Helpful chatbots, parking assistants, and face recognition, to name just a few examples, are making our lives more convenient.

artificial intelligence, berlin, institute, (7 more...)

#artificialintelligence

Country:

Europe > Germany > Berlin (0.16)
North America > United States (0.05)
Asia > China (0.05)

Industry:

Health & Medicine (1.00)
Information Technology > Robotics & Automation (0.37)
Transportation > Air (0.36)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.31)

#artificialintelligence Oct-20-2021, 03:55:53 GMT

AI: The Inverse Tower of Babbel

The Old Testament's 'Tower of Babel' story is an origin myth that tries to explain why humanity doesn't speak a single, universal language. According to the Bible, a united human race that spoke the same language arrived in the land of Shinar and decided to build a tower tall enough to reach heaven. Annoyed -- once again, it can probably be said -- by humanity's growing arrogance and budding hubris, God confounded humanity's speech, dividing its people into separate linguistic groups that couldn't understand one another. Just to ensure they didn't start comparing and contrasting their languages to reach some form of translation breakthrough, God dispersed humankind to all corners of the earth and set the stage for what is today a world of 6,500 languages. For God, it was a job well done, and the situation remained static for centuries, until tribes started trading with each other, armies started fighting one another, and diplomats initiated conflict resolution measures to try to end the wars that were often started due to misunderstandings of one kind or another.

defined crowd, machine translation, translation, (14 more...)

#artificialintelligence

Country:

South America (0.05)
North America > United States (0.05)
North America > Central America (0.05)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.50)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.32)

Jin, Zhijing, von Kügelgen, Julius, Ni, Jingwei, Vaidhya, Tejas, Kaushal, Ayush, Sachan, Mrinmaya, Schölkopf, Bernhard

Causal Direction of Data Collection Matters: Implications of Causal and Anticausal Learning for NLP

arXiv.org Artificial Intelligence Oct-19-2021

The principle of independent causal mechanisms (ICM) states that generative processes of real-world data consist of independent modules which do not influence or inform each other. While this idea has led to fruitful developments in the field of causal inference, it is not widely known in the NLP community. In this work, we argue that the causal direction of the data collection process bears nontrivial implications that can explain a number of published NLP findings, such as differences in semi-supervised learning (SSL) and domain adaptation (DA) performance across different settings. We categorize common NLP tasks according to their causal direction and empirically assay the validity of the ICM principle for text data using minimum description length. We conduct an extensive meta-analysis of over 100 published SSL and 30 DA studies, and find that the results are consistent with our expectations based on causal insights. This work presents the first attempt to analyze the ICM principle in NLP, and provides constructive suggestions for future modeling choices. Code available at https://github.com/zhijing-jin/icm4nlp

anticausal, computational linguistic, proceedings, (14 more...)

2110.03618

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Italy > Tuscany > Florence (0.04)
North America > Canada > Quebec > Montreal (0.04)
(27 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Oct-18-2021

Natural Language Processing for Smart Healthcare

Zhou, Binggui, Yang, Guanghua, Shi, Zheng, Ma, Shaodan

Smart healthcare has achieved significant progress in recent years. Emerging artificial intelligence (AI) technologies enable smart applications across a range of healthcare scenarios. As an essential technology powered by AI, natural language processing (NLP) plays a key role in smart healthcare due to its capability of analysing and understanding human language. In this work we review existing studies that concern NLP for smart healthcare from the perspectives of technique and application. From a technical point of view, we focus on feature extraction and modelling for various NLP tasks encountered in smart healthcare. On the application side, the discussion focuses on representative smart healthcare scenarios, including clinical practice, hospital management, personal care, public health, and drug development. We further discuss the limitations of current work and identify directions for future work.

application, healthcare, smart healthcare, (14 more...)

2110.15803

Country:

Asia > Macao (0.14)
Asia > China > Guangdong Province > Zhuhai (0.04)
North America > United States > New York > New York County > New York City (0.04)
(18 more...)

Genre:

Overview (0.87)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(4 more...)

arXiv.org Artificial Intelligence Oct-18-2021

Monotonic Simultaneous Translation with Chunk-wise Reordering and Refinement

Han, HyoJung, Ahn, Seokchan, Choi, Yoonjung, Chung, Insoo, Kim, Sangha, Cho, Kyunghyun

Recent simultaneous machine translation models are often trained on conventional full-sentence translation corpora, leading to either excessive latency or the need to anticipate as-yet-unarrived words when dealing with a language pair whose word orders differ significantly. This is unlike human simultaneous interpreters, who produce largely monotonic translations at the expense of the grammaticality of the sentence being translated. In this paper, we thus propose an algorithm to reorder and refine the target side of a full-sentence translation corpus, so that the words/phrases between the source and target sentences are aligned largely monotonically, using word alignment and non-autoregressive neural machine translation. We then train a widely used wait-k simultaneous translation model on this reordered-and-refined corpus. The proposed approach improves BLEU scores, and the resulting translations exhibit enhanced monotonicity with source sentences.
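
For readers unfamiliar with the wait-k policy mentioned in this abstract, the following is a minimal sketch of its read/write schedule only, not of the paper's reordering and refinement algorithm. It assumes the commonly described form of the policy: before writing target token t (1-indexed), the model has read min(k + t - 1, source length) source tokens. The function name `wait_k_schedule` and the action labels are illustrative assumptions.

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Return the READ/WRITE action sequence of a wait-k policy.

    Assumption: before emitting target token t (1-indexed), the policy
    has read min(k + t - 1, src_len) source tokens.
    """
    actions = []
    read = 0  # number of source tokens consumed so far
    for t in range(1, tgt_len + 1):
        needed = min(k + t - 1, src_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    # With k=2 the policy lags two source tokens behind, then alternates.
    print(wait_k_schedule(src_len=3, tgt_len=3, k=2))
```

A small k keeps latency low but forces the model to write before much source context has arrived, which is why a more monotonic target side, as the abstract proposes, is expected to help.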

computational linguistic, proceedings, translation, (14 more...)

2110.09646

Country:

Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
North America > United States > New York (0.04)
(13 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Multilingual Neural Machine Translation: Can Linguistic Hierarchies Help?

Saleh, Fahimeh, Buntine, Wray, Haffari, Gholamreza, Du, Lan

Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages, rather than training separate models for different languages. Learning a single model can enhance low-resource translation by leveraging data from multiple languages. However, the performance of an MNMT model is highly dependent on the type of languages used in training, as transferring knowledge from a diverse set of languages degrades the translation performance due to negative transfer. In this paper, we propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer. HKD generates a set of multilingual teacher-assistant models via a selective knowledge distillation mechanism based on the language groups, and then distils the ultimate multilingual model from those assistants in an adaptive way. Experimental results derived from the TED dataset with 53 languages demonstrate the effectiveness of our approach in avoiding the negative transfer effect in MNMT, leading to improved translation performance (about 1 BLEU point on average) compared to strong baselines.
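
As background for the distillation step described in this abstract, here is a minimal sketch of a generic knowledge distillation loss: the KL divergence between a teacher's and a student's softened output distributions. It is not the selective or adaptive HKD mechanism from the paper, whose details the abstract does not give; the function names and the `temperature` parameter are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over softened distributions (generic KD term)."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    # The loss is zero when the student matches the teacher, positive otherwise.
    print(distillation_loss(teacher, teacher))
    print(distillation_loss([3.0, 2.0, 1.0], teacher))
```

In a hierarchical setup like the one the abstract describes, a term of this kind would presumably be applied once per teacher or assistant model, but how those terms are selected and weighted is specific to the paper.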

knowledge distillation, mnmt model, translation, (15 more...)

2110.07816

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Louisiana (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems

Ding, Bosheng, Hu, Junjie, Bing, Lidong, Aljunied, Sharifah Mahani, Joty, Shafiq, Si, Luo, Miao, Chunyan

Much recent progress in task-oriented dialogue (ToD) systems has been driven by available annotation data across multiple domains for training. Over the last few years, there has been a move towards data curation for multilingual ToD systems that can serve people speaking different languages. However, existing multilingual ToD datasets either have limited coverage of languages due to the high cost of data curation, or ignore the fact that dialogue entities barely exist in countries speaking these languages. To tackle these limitations, we introduce a novel data curation method that generates GlobalWoZ -- a large-scale multilingual ToD dataset globalized from an English ToD dataset for three unexplored use cases. Our method is based on translating dialogue templates and filling them with local entities in the target-language countries. We release our dataset as well as a set of strong baselines to encourage research on learning multilingual ToD systems for real use cases.
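
The template-filling idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the GlobalWoZ pipeline: the `{name}`/`{area}` placeholder syntax, the Spanish template, and the entity values are all hypothetical, and the translation step is assumed to have already happened.

```python
# Hypothetical dialogue template, assumed to be already translated into Spanish.
TEMPLATES = {
    "es": "Busco un restaurante llamado {name} en {area}.",
}

# Hypothetical local entities for a target-language country (made-up values).
LOCAL_ENTITIES = {
    "es": [
        {"name": "Restaurante Ejemplo", "area": "Madrid"},
        {"name": "Casa Prueba", "area": "Sevilla"},
    ],
}


def fill_templates(lang):
    """Fill the translated template for `lang` with each local entity."""
    template = TEMPLATES[lang]
    return [template.format(**entity) for entity in LOCAL_ENTITIES[lang]]


if __name__ == "__main__":
    for utterance in fill_templates("es"):
        print(utterance)
```

Keeping the template separate from the entity list is what lets one translated template yield many utterances grounded in entities that actually exist where the language is spoken.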

target language, test data, use case, (16 more...)

2110.07679

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Singapore (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)
(20 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.93)
(2 more...)

Kuling, Grey, Curpen, Dr. Belinda, Martel, Anne L.

BI-RADS BERT & Using Section Tokenization to Understand Radiology Reports

Radiology reports are the main form of communication between radiologists and other clinicians, and contain important information for patient care. However, in order to use this information for research, it is necessary to convert the raw text into structured data suitable for analysis. Domain-specific contextual word embeddings have been shown to achieve impressive accuracy at such natural language processing tasks in medicine. In this work we pre-trained a contextual embedding BERT model using breast radiology reports and developed a classifier that incorporated the embedding with auxiliary global textual features in order to perform a section tokenization task. This model achieved 98% accuracy at segregating free-text reports into sections of information outlined in the Breast Imaging Reporting and Data System (BI-RADS) lexicon, a significant improvement over the Classic BERT model without auxiliary information. We then evaluated whether using section tokenization improved the downstream extraction of the following fields: modality/procedure, previous cancer, menopausal status, purpose of exam, breast density and background parenchymal enhancement. Using the BERT model pre-trained on breast radiology reports combined with section tokenization resulted in an overall accuracy of 95.9% in field extraction. This is an improvement of 17 percentage points over the overall accuracy of 78.9% for field extraction with models without section tokenization and with Classic BERT embeddings. Our work shows the strength of using BERT in radiology report analysis and the advantages of section tokenization in identifying key features of patient factors recorded in breast radiology reports.

bert, radiology report, section tokenization, (13 more...)

2110.07552

Country:

North America > Canada > Ontario > Toronto (0.29)
North America > United States > Virginia > Fairfax County > Reston (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Building Chinese Biomedical Language Models via Multi-Level Text Discrimination

Wang, Quan, Dai, Songtai, Xu, Benfeng, Lyu, Yajuan, Zhu, Yong, Wu, Hua, Wang, Haifeng

Pre-trained language models (PLMs), such as BERT and GPT, have revolutionized the field of NLP, not only in the general domain but also in the biomedical domain. Most prior efforts in building biomedical PLMs have resorted simply to domain adaptation and focused mainly on English. In this work we introduce eHealth, a biomedical PLM in Chinese built with a new pre-training framework. This new framework trains eHealth as a discriminator through both token-level and sequence-level discrimination. The former is to detect input tokens corrupted by a generator and select their original signals from plausible candidates, while the latter is to further distinguish corruptions of the same original sequence from those of the others. As such, eHealth can learn language semantics at both the token and sequence levels. Extensive experiments on 11 Chinese biomedical language understanding tasks of various forms verify the effectiveness and superiority of our approach. The pre-trained model is available to the public at https://github.com/PaddlePaddle/Research/tree/master/KG/eHealth and the code will also be released later.
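
To make the token-level detection target concrete, the sketch below builds per-token labels marking which positions a generator has corrupted. It covers only that labelling step, in the general style of replaced-token detection; it does not implement eHealth's candidate selection or its sequence-level discrimination. The function name and the English example tokens are assumptions for illustration.

```python
def replaced_token_labels(original, corrupted):
    """Return 1 for each position where the token was replaced, else 0.

    Assumption: the corrupted sequence is aligned one-to-one with the
    original, i.e. the generator only substitutes tokens in place.
    """
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]


if __name__ == "__main__":
    original = ["patient", "reports", "fever"]
    corrupted = ["patient", "denies", "fever"]  # hypothetical generator output
    print(replaced_token_labels(original, corrupted))
```

A discriminator trained against such labels predicts, for every input position, whether the token it sees is the original one.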

ehealth, proceedings, sequence, (17 more...)

2110.07244

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Telehealth (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)