API4AI Computer Vision engine is available on Eden AI
We are pleased to announce that API4AI has been integrated into the Eden AI API. API4AI is a cloud-native computer vision and AI platform for startups, enterprises, and individual developers. It builds its APIs on a complete cloud technology stack that provides full operability, scalability, and stable uptime. API4AI's goal is to create self-contained, out-of-the-box AI solutions that can be integrated into any application in a few simple steps. Eden AI offers multiple AI APIs across several technologies on its platform.
COLING 2022 Highlights
Recent metrics for natural language generation rely on pre-trained language models, for instance, BERTScore, BLEURT, and COMET. These metrics achieve a high correlation with human evaluations on standard benchmarks. However, it is unclear how they perform on styles and domains that aren't well represented in their training data. In other words, are these metrics robust? The authors found that BERTScore isn't robust to character-level perturbations.
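To make "character-level perturbation" concrete, here is a minimal sketch of how such a robustness check could be set up. It is not the authors' protocol: the three perturbation types (adjacent swap, deletion, insertion) and the `perturb` and `robustness_drop` helpers are hypothetical, and `score_fn` is a stand-in for any metric with a `(reference, candidate) -> float` interface. BERTScore itself is not called here.

```python
import random
import string
from typing import Callable


def perturb(text: str, kind: str, rng: random.Random) -> str:
    """Apply one character-level perturbation at a random position (illustrative only)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)  # position in [0, len(text) - 2]
    if kind == "swap":
        # Transpose two adjacent characters, e.g. "metric" -> "mteric".
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if kind == "delete":
        # Drop the character at position i.
        return text[:i] + text[i + 1:]
    if kind == "insert":
        # Insert a random lowercase letter before position i.
        return text[:i] + rng.choice(string.ascii_lowercase) + text[i:]
    raise ValueError(f"unknown perturbation kind: {kind}")


def robustness_drop(
    score_fn: Callable[[str, str], float],
    reference: str,
    candidate: str,
    kind: str,
    n_trials: int = 10,
    seed: int = 0,
) -> float:
    """Average score lost when the candidate is perturbed (larger means less robust)."""
    rng = random.Random(seed)  # seeded so the check is reproducible
    base = score_fn(reference, candidate)
    perturbed = [score_fn(reference, perturb(candidate, kind, rng)) for _ in range(n_trials)]
    return base - sum(perturbed) / n_trials
```

With a toy exact-match `score_fn`, any perturbation drops the score to zero; plugging in a learned metric instead would show how far its score moves under typo-like noise that a human judge would largely ignore.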
Similarity between Units of Natural Language: The Transition from Coarse to Fine Estimation
Capturing the similarity between units of human language is crucial for explaining how humans associate different objects, so its computation has received extensive attention in research and applications. With the ever-increasing amount of information around us, calculating similarity becomes increasingly complex. In many domains, such as legal or medical affairs, measuring similarity requires extra care and precision, because small changes within a language unit can have significant real-world effects. My research goal in this thesis is to develop regression models that capture similarities between language units in a more refined way. Similarity computation has come a long way, but approaches to debugging similarity measures are often based on continually fitting human judgment values. To address this, my goal is to develop an algorithm that precisely catches loopholes in a similarity calculation. Furthermore, most methods define the similarities they compute only vaguely, which makes them difficult to interpret. The proposed framework addresses both shortcomings: it continually improves the model by catching different loopholes, and every refinement of the model comes with a reasonable explanation. The regression model introduced in this thesis, called progressively refined similarity computation, combines attack testing with adversarial training and achieves state-of-the-art performance in handling edge cases.
IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models
Wang, Chenguang, Liu, Xiao, Song, Dawn
We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LMs). Recent studies have demonstrated that pre-trained LMs, such as BERT and GPT, may store linguistic and relational knowledge. In particular, LMs are able to answer "fill-in-the-blank" questions when given a pre-defined relation category. Instead of focusing on pre-defined relations, we create an OIE benchmark that aims to fully examine the open relational information present in pre-trained LMs. We accomplish this by turning pre-trained LMs into zero-shot OIE systems. Surprisingly, pre-trained LMs obtain competitive performance on both standard OIE datasets (CaRB and Re-OIE2016) and two new large-scale factual OIE datasets (TAC KBP-OIE and Wikidata-OIE) that we establish via distant supervision. For instance, the zero-shot pre-trained LMs outperform state-of-the-art supervised OIE methods in F1 score on our factual OIE datasets without using any training sets. Our code and datasets are available at https://github.com/cgraywang/IELM
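The comparison above is stated in terms of F1 over extracted relational triples. As a rough illustration of what that score measures, the sketch below computes F1 with exact matching of normalized (subject, relation, object) triples. This is a simplification and an assumption on our part: CaRB and the paper's factual datasets define their own matching criteria, and the `triple_f1` helper is hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def _normalize(triple: Triple) -> Triple:
    # Lowercase each part and collapse whitespace so formatting differences do not count.
    subj, rel, obj = (" ".join(part.lower().split()) for part in triple)
    return subj, rel, obj


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> float:
    """Exact-match F1 between predicted and gold triples (simplified scoring)."""
    pred: Set[Triple] = {_normalize(t) for t in predicted}
    ref: Set[Triple] = {_normalize(t) for t in gold}
    correct = len(pred & ref)
    if correct == 0:
        # Covers empty prediction or gold sets as well as no overlap.
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting ("Dante", "born in", "Florence") and ("Dante", "wrote", "Inferno") against a gold set of ("dante", "born in", "florence") and ("Dante", "died in", "Ravenna") gives one correct triple, so precision and recall are both 0.5 and F1 is 0.5.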
Radiology: Artificial Intelligence
Following the recent award of the Nobel Prize in Physics to Aspect, Clauser, and Zeilinger for their work in quantum mechanics, the journal's October 2022 tweet chat introduced the cutting-edge world of Quantum Machine Learning (QML) and its potential in healthcare. How is QML different from "classical" machine learning? First, to describe the basics of quantum computing, we'll use an analogy from MRI physics with the Bloch sphere (below). In classical computing (a), a binary digit ("bit") has a value of 0 (up) or 1 (down). In quantum computing (b), each quantum bit ("qubit") can exist in a superposition of 0 and 1, corresponding to any of infinitely many points on the surface of the sphere, although measuring it still returns either 0 or 1.
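The Bloch-sphere picture can be made concrete with a short calculation. A point on the sphere with polar angle theta (measured from the "up" pole, |0>) and azimuthal angle phi corresponds to the state |psi> = cos(theta/2)|0> + e^(i*phi) sin(theta/2)|1>, and a measurement yields 0 with probability |alpha|^2 and 1 with probability |beta|^2. The sketch below uses the standard textbook parameterization; the function names are our own and not from the tweet chat.

```python
import cmath
import math
from typing import Tuple


def qubit_from_bloch(theta: float, phi: float) -> Tuple[complex, complex]:
    """Amplitudes (alpha, beta) of cos(theta/2)|0> + e^(i*phi) sin(theta/2)|1>.

    theta is the polar angle from the "up" pole (|0>); phi is the azimuthal angle.
    """
    alpha = complex(math.cos(theta / 2), 0.0)
    beta = cmath.exp(1j * phi) * math.sin(theta / 2)
    return alpha, beta


def measurement_probabilities(alpha: complex, beta: complex) -> Tuple[float, float]:
    """Born rule: outcome 0 with probability |alpha|^2, outcome 1 with |beta|^2."""
    return abs(alpha) ** 2, abs(beta) ** 2
```

At the poles (theta = 0 or pi) the qubit behaves like a classical bit, always measuring 0 or 1; on the equator (theta = pi/2) both outcomes are equally likely whatever the value of phi. The phase phi does not change these probabilities, but it matters when qubits interfere in a quantum computation.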
AI and the Equality Machine: An Interview with Orly Lobel - TeachPrivacy
We often hear of the dark side of artificial intelligence (AI), how it will plunge us into a dystopian world of lost privacy and bad automated decisions, culminating in the robots killing us all. Professor Orly Lobel's The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future (Public Affairs, October 2022) offers a very different view – one of optimism. Orly's book is an exuberant and insightful account of the bright side of AI and related digital technologies. Her book is filled with fascinating facts and engaging stories. Orly Lobel is the Warren Distinguished Professor of Law; University Professor; and Director, Center for Employment and Labor Policy at the U.C. San Diego School of Law.
Artificial Intelligence and Natural Language Processing and Understanding in Space: A Methodological Framework and Four ESA Case Studies
Gómez-Pérez, José Manuel, García-Silva, Andrés, Leone, Rosemarie, Albani, Mirko, Fontaine, Moritz, Poncet, Charles, Summerer, Leopold, Donati, Alessandro, Roma, Ilaria, Scaglioni, Stefano
The European Space Agency is well known as a powerful force for scientific discovery in numerous areas related to Space. The amount and depth of the knowledge produced throughout the different missions carried out by ESA, and their contribution to scientific progress, are enormous, involving large collections of documents such as scientific publications, feasibility studies, technical reports, and quality management procedures, among many others. Through initiatives like the Open Space Innovation Platform, ESA also acts as a hub for new ideas coming from the wider community across different challenges, contributing to a virtuous circle of scientific discovery and innovation. Handling such a wealth of information, a large part of which is unstructured text, is a colossal task that goes beyond human capabilities and therefore requires automation. In this paper, we present a methodological framework based on artificial intelligence and natural language processing and understanding to automatically extract information from Space documents and generate value from it, and we illustrate this framework through several case studies implemented across different functional areas of ESA, including Mission Design, Quality Assurance, Long-Term Data Preservation, and the Open Space Innovation Platform. In doing so, we demonstrate the value of these technologies in several tasks, ranging from effortlessly searching and recommending Space information to automatically determining how innovative an idea is, answering questions about Space, and generating quizzes about quality procedures. Each of these accomplishments represents a step forward in the application of increasingly intelligent AI systems in Space, from structuring and facilitating information access to intelligent systems capable of understanding and reasoning with such information.
D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat
Yao, Binwei, Shi, Chao, Zou, Likai, Dai, Lingfeng, Wu, Mengyue, Chen, Lu, Wang, Zhen, Yu, Kai
In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides patients to disclose their symptoms according to clinical diagnostic criteria. Such a dialogue system differs from existing single-purpose human-machine dialogue systems because it combines task-oriented dialogue and chit-chat and has distinctive dialogue topics and procedures. However, due to the social stigma associated with mental illness, dialogue data related to depression consultation and diagnosis are rarely disclosed. Based on the clinical depression diagnostic criteria ICD-11 and DSM-5, we designed a 3-phase procedure to construct D4: a Chinese Dialogue Dataset for Depression-Diagnosis-Oriented Chat, which simulates the dialogue between doctors and patients during the diagnosis of depression and includes the diagnosis results and a symptom summary given by professional psychiatrists for each conversation. On top of the newly constructed dataset, four tasks mirroring the depression diagnosis process are established: response generation, topic prediction, dialogue summary, and severity classification of depressive episode and suicide risk. Multi-scale evaluation results demonstrate that a consultation dialogue system trained on our dataset is more empathy-driven and diagnostically accurate than rule-based bots.
Deep Learning is Human, Through and Through
Bengio and LeCun see no reason why deep learning systems cannot be made to reason. Said Bengio, "Humans also use some kind of neural nets in their brains, and I believe that there are ways to get to human-like reasoning with deep learning architectures." It was 10 years ago, in 2012, that deep learning made its breakthrough, when an innovative image-classification algorithm based on multi-layered neural networks suddenly turned out to perform spectacularly better than all previous algorithms. That breakthrough has led to deep learning's adoption in domains like speech and image recognition, automatic translation and transcription, and robotics. As deep learning was embedded into ever more everyday applications, more and more examples of what can go wrong also surfaced: artificial intelligence (AI) systems that discriminate, confirm stereotypes, make inscrutable decisions, and require a lot of data and sometimes also a huge amount of energy.
Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences
Ta, Hoang Thang, Gelbukha, Alexander, Sidorov, Grigori
Acknowledged as one of the most successful online cooperative projects in human society, Wikipedia has grown rapidly in recent years and continuously seeks to expand its content and disseminate knowledge to everyone globally. A shortage of volunteers causes Wikipedia many problems, including developing content for its more than 300 language editions. Machines that automatically generate content could therefore considerably reduce human effort on Wikipedia language projects. In this paper, we propose a mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models. The results are helpful not only for the data-to-text generation task but also for other related work in the field.
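To illustrate what "statements represented as quadruples and triples" and "mapping them to sentences" can mean, here is a deliberately naive sketch. It is not the authors' mapping process: the `Statement` class, the single-qualifier quadruple form, the use of human-readable labels instead of Wikidata IDs, and the substring-matching rule in `map_statements_to_sentences` are all hypothetical simplifications, and the example sentences are made up for the demonstration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Statement:
    """A Wikidata-style statement using labels (hypothetical simplification).

    Without a qualifier it is a triple (subject, property, value); with one
    (qualifier property, qualifier value) pair it becomes a quadruple.
    """
    subject: str
    prop: str
    value: str
    qualifier: Optional[Tuple[str, str]] = None

    def value_labels(self) -> List[str]:
        # Labels we expect a sentence expressing this statement to mention.
        labels = [self.value]
        if self.qualifier is not None:
            labels.append(self.qualifier[1])
        return labels


def map_statements_to_sentences(
    statements: Sequence[Statement], sentences: Sequence[str]
) -> Dict[Statement, str]:
    """Map each statement to the first sentence mentioning its subject and all value labels."""
    mapping: Dict[Statement, str] = {}
    for stmt in statements:
        for sentence in sentences:
            lowered = sentence.lower()
            if stmt.subject.lower() in lowered and all(
                label.lower() in lowered for label in stmt.value_labels()
            ):
                mapping[stmt] = sentence
                break  # keep only the first matching sentence
    return mapping


statements = [
    Statement("Marie Curie", "award received", "Nobel Prize in Physics", ("point in time", "1903")),
    Statement("Marie Curie", "place of birth", "Warsaw"),
]
sentences = [
    "Marie Curie was born in Warsaw.",
    "In 1903, Marie Curie shared the Nobel Prize in Physics.",
]
mapping = map_statements_to_sentences(statements, sentences)
```

Here the quadruple (with its "point in time" qualifier) maps to the second sentence and the triple maps to the first. A real pipeline would need alias handling, date formatting, and noise filtering, which is where the corpus evaluation described above comes in.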