Results


No Joke: Google's AI Is Smart Enough to Understand Your Humor

#artificialintelligence

Google's natural language AI is smart enough to define jokes. The ability to understand the nuances of human language will lead to better and more natural interactions with machines. Google wants to educate people about the benefits of these kinds of AI smarts through upcoming devices like its Pixel 7. Amid a flurry of new hardware including the Pixel 7, the Pixel Buds Pro and a new Pixel Tablet, Google dropped one development at its I/O developer conference that went largely unnoticed: Its AI can now understand jokes. Jokes, sarcasm and humor require understanding the subtleties of language and human behavior. When a comedian says something sarcastic or controversial, usually the audience can discern the tone and know it's more of an exaggeration, something that's learned from years of human interaction.


The Voice Synthesis Business: 2022 Update

#artificialintelligence

Sounds like the perfect technology for anyone who wants to advance Steve Bannon's 'flood the zone with shit' strategy.


Can You Code Empathy? with Pascale Fung

#artificialintelligence

ANJA KASPERSEN: Today I am very pleased to be joined by Pascale Fung. Pascale is a professor in the Department of Electronic and Computer Engineering and Department of Computer Science and Engineering at The Hong Kong University of Science and Technology. She is known globally for her pioneering work on conversational artificial intelligence (AI), computational linguistics, and was one of the earliest proponents of statistical and machine-learning approaches for natural language processing (NLP). She is now leading groundbreaking research on how to build intelligent systems that can understand and empathize with humans. I have really been looking forward to this conversation with you. Your professional accolades are many, most of which we will touch on during our conversation. However, for our listeners to get to know you a bit better, I would like us to go back to your upbringing during what I understand to be a very tenuous political period in China. I was born, spent my childhood, ...


LaMDA: Language Models for Dialog Applications

arXiv.org Artificial Intelligence

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows smaller improvements on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.


LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN

arXiv.org Artificial Intelligence

This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos. This framework generates a personalized keyshot summary for concurrent users by using the computational resource of the end-user device. State-of-the-art methods that acquire and process entire video data to generate video summaries are highly computationally intensive. In this regard, the proposed LTC-SUM method uses lightweight thumbnails to handle the complex process of detecting events. This significantly reduces computational complexity and improves communication and storage efficiency by resolving computational and privacy bottlenecks in resource-constrained end-user devices. These improvements were achieved by designing a lightweight 2D CNN model to extract features from thumbnails, which helped select and retrieve only a handful of specific segments. Extensive quantitative experiments on a set of 18 full feature-length videos (approximately 32.9 h in duration) showed that the proposed method is significantly more computationally efficient than state-of-the-art methods on the same end-user device configurations. Joint qualitative assessments of the results of 56 participants showed that participants gave higher ratings to the summaries generated using the proposed method. To the best of our knowledge, this is the first attempt at designing a fully client-driven personalized keyshot video summarization framework using thumbnail containers for feature-length videos.


Send in the clones: Using artificial intelligence to digitally replicate human voices

#artificialintelligence

Reporter Chloe Veltman reacts to hearing her digital voice double, "Chloney," for the first time, with Speech Morphing chief linguist Mark Seligman. The science behind making machines talk just like humans is very complex, because our speech patterns are so nuanced. "The voice is not easy to grasp," says Klaus Scherer, emeritus professor of the psychology of emotion at the University of Geneva. "To analyze the voice really requires quite a lot of knowledge about acoustics, vocal mechanisms and physiological aspects. So it is necessarily interdisciplinary, and quite demanding in terms of what you need to master in order to do anything of consequence."


Send in the clones: Using artificial intelligence to digitally replicate human voices

NPR Technology

Reporter Chloe Veltman reacts to hearing her digital voice double, "Chloney," for the first time, with Speech Morphing chief linguist Mark Seligman. The science behind making machines talk just like humans is very complex, because our speech patterns are so nuanced. "The voice is not easy to grasp," says Klaus Scherer, emeritus professor of the psychology of emotion at the University of Geneva. "To analyze the voice really requires quite a lot of knowledge about acoustics, vocal mechanisms and physiological aspects. So it is necessarily interdisciplinary, and quite demanding in terms of what you need to master in order to do anything of consequence."


Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has become a part of everyday conversation and our lives. It is considered the new electricity that is revolutionizing the world. Both industry and academia are investing heavily in AI. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the industry has seen many ups and downs due to over-expectations and related disappointments that have followed. The purpose of this book is to give a realistic picture of AI, its history, its potential and limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common representations for AI, methods, and machine learning are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision, and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but little use has been made of them in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such achievement seems impossible on the basis of present knowledge, and how AI could be improved. Finally, a summary is made of the current state of AI and what to do in the future. In the appendix, we look at the development of AI education, especially from the perspective of contents at our own university.


The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus

arXiv.org Artificial Intelligence

In order to address the increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web scale knowledge, lack of structure, inconsistent quality, and noise. To this end, we propose a new setup for evaluating existing KI-NLP tasks in which we generalize the background corpus to a universal web snapshot. We repurpose KILT, a standard KI-NLP benchmark initially developed for Wikipedia, and ask systems to use a subset of CCNet - the Sphere corpus - as a knowledge source. In contrast to Wikipedia, Sphere is orders of magnitude larger and better reflects the full diversity of knowledge on the Internet. We find that despite potential gaps of coverage, challenges of scale, lack of structure and lower quality, retrieval from Sphere enables a state-of-the-art retrieve-and-read system to match and even outperform Wikipedia-based models on several KILT tasks - even if we aggressively filter content that looks like Wikipedia. We also observe that while a single dense passage index over Wikipedia can outperform a sparse BM25 version, on Sphere this is not yet possible. To facilitate further research into this area, and minimise the community's reliance on proprietary black box search engines, we will share our indices, evaluation metrics and infrastructure.


Est-ce que vous compute? Code-switching, cultural identity, and AI

arXiv.org Artificial Intelligence

Cultural code-switching concerns how we adjust our overall behaviours, manners of speaking, and appearance in response to a perceived change in our social environment. We defend the need to investigate cultural code-switching capacities in artificial intelligence systems. We explore a series of ethical and epistemic issues that arise when bringing cultural code-switching to bear on artificial intelligence. Building upon Dotson's (2014) analysis of testimonial smothering, we discuss how emerging technologies in AI can give rise to epistemic oppression, and specifically, a form of self-silencing that we call 'cultural smothering'. By leaving the socio-dynamic features of cultural code-switching unaddressed, AI systems risk negatively impacting already-marginalised social groups by widening opportunity gaps and further entrenching social inequalities.