Azure AI is driving innovation and improving experiences for employees, users, and customers in a variety of ways, from increasing workday productivity to promoting inclusion and accessibility. The success of Azure AI (featuring Azure Cognitive Services, Azure Machine Learning, and Azure OpenAI Service) is built on a foundation of Microsoft Research, a wide range of Azure products that have been tested at scale within Microsoft apps, and Azure customers who use these services for the benefit of their end users. As 2023 begins, we are excited to highlight 10 use cases where Azure AI is used within Microsoft and beyond. Speech transcription and captioning in Microsoft Teams is powered by Azure Cognitive Services for Speech. Microsoft achieved human parity in conversational speech recognition when it reached a word error rate of 5.9 percent.
Feb 23 (Reuters) - Facebook-owner Meta (FB.O) is working on artificial intelligence research to generate worlds through speech, improve how people chat to voice assistants and translate between languages, CEO Mark Zuckerberg said on Wednesday, as he sketched out key steps to building the metaverse. Zuckerberg is betting that the metaverse, a futuristic idea of virtual environments where users can work, socialize and play, will be the successor to the mobile internet. "The key to unlocking a lot of these advances is AI," he said, speaking at the company's live-streamed "Inside the Lab" event. Zuckerberg said Meta was working on a new class of generative AI models that will allow people to describe a world and generate aspects of it. In a prerecorded demo, Zuckerberg showcased an AI concept called Builder Bot, where he appeared as a legless 3D avatar on an island and gave speech commands to create a beach and then add clouds, trees and even a picnic blanket.
Google has activated a safety feature that lets minors under 18 request that images of themselves be removed from search results, The Verge has reported. Google first announced the option back in August as part of a slate of new safety measures for kids, but it's now rolling out widely to users. Google said it will remove any images of minors "with the exception of cases of compelling public interest or newsworthiness." The requests can be made by minors, their parents, guardians, or other legal representatives. To do so, you'll need to supply the URLs you want removed, the name and age of the minor, and the name of the person acting on their behalf.
Grauman, Kristen, Westbury, Andrew, Byrne, Eugene, Chavis, Zachary, Furnari, Antonino, Girdhar, Rohit, Hamburger, Jackson, Jiang, Hao, Liu, Miao, Liu, Xingyu, Martin, Miguel, Nagarajan, Tushar, Radosavovic, Ilija, Ramakrishnan, Santhosh Kumar, Ryan, Fiona, Sharma, Jayant, Wray, Michael, Xu, Mengmeng, Xu, Eric Zhongcong, Zhao, Chen, Bansal, Siddhant, Batra, Dhruv, Cartillier, Vincent, Crane, Sean, Do, Tien, Doulaty, Morrie, Erapalli, Akshay, Feichtenhofer, Christoph, Fragomeni, Adriano, Fu, Qichen, Fuegen, Christian, Gebreselasie, Abrham, Gonzalez, Cristina, Hillis, James, Huang, Xuhua, Huang, Yifei, Jia, Wenqi, Khoo, Weslie, Kolar, Jachym, Kottur, Satwik, Kumar, Anurag, Landini, Federico, Li, Chao, Li, Yanghao, Li, Zhenqiang, Mangalam, Karttikeya, Modhugu, Raghava, Munro, Jonathan, Murrell, Tullie, Nishiyasu, Takumi, Price, Will, Puentes, Paola Ruiz, Ramazanova, Merey, Sari, Leda, Somasundaram, Kiran, Southerland, Audrey, Sugano, Yusuke, Tao, Ruijie, Vo, Minh, Wang, Yuchen, Wu, Xindi, Yagi, Takuma, Zhu, Yunyi, Arbelaez, Pablo, Crandall, David, Damen, Dima, Farinella, Giovanni Maria, Ghanem, Bernard, Ithapu, Vamsi Krishna, Jawahar, C. V., Joo, Hanbyul, Kitani, Kris, Li, Haizhou, Newcombe, Richard, Oliva, Aude, Park, Hyun Soo, Rehg, James M., Sato, Yoichi, Shi, Jianbo, Shou, Mike Zheng, Torralba, Antonio, Torresani, Lorenzo, Yan, Mingfei, Malik, Jitendra
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/
Liu, Haochen, Wang, Yiqi, Fan, Wenqi, Liu, Xiaorui, Li, Yaxin, Jain, Shaili, Liu, Yunhao, Jain, Anil K., Tang, Jiliang
In the past few decades, artificial intelligence (AI) technology has experienced swift developments, changing everyone's daily life and profoundly altering the course of human society. The intention of developing AI is to benefit humans, by reducing human labor, bringing everyday convenience to human lives, and promoting social good. However, recent research and AI applications show that AI can cause unintentional harm to humans, such as making unreliable decisions in safety-critical scenarios or undermining fairness by inadvertently discriminating against one group. Thus, trustworthy AI has attracted immense attention recently, which requires careful consideration to avoid the adverse effects that AI may bring to humans, so that humans can fully trust and live in harmony with AI technologies. Recent years have witnessed a tremendous amount of research on trustworthy AI. Here, we present a comprehensive survey of trustworthy AI from a computational perspective, to help readers understand the latest technologies for achieving trustworthy AI. Trustworthy AI is a large and complex area, involving various dimensions. In this work, we focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being. For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems. We also discuss the accordant and conflicting interactions among different dimensions and discuss potential aspects for trustworthy AI to investigate in the future.
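As a toy illustration of the Non-discrimination & Fairness dimension above (not drawn from the survey itself), the sketch below computes a demographic-parity gap: the largest difference in positive-prediction rates across groups. The function names and the synthetic data are hypothetical.

```python
def selection_rate(y_pred, groups, g):
    """Fraction of positive (1) predictions within group g."""
    preds = [p for p, grp in zip(y_pred, groups) if grp == g]
    return sum(preds) / len(preds)

def demographic_parity_gap(y_pred, groups):
    """Largest difference in positive-prediction rate across groups.

    A gap of 0 means every group receives positive predictions at the
    same rate; larger gaps indicate potential group-level discrimination.
    """
    rates = [selection_rate(y_pred, groups, g) for g in set(groups)]
    return max(rates) - min(rates)

# Toy predictions: group "a" is approved twice as often as group "b".
y_pred = [1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
print(demographic_parity_gap(y_pred, groups))  # 2/3 - 1/3 = 0.333...
```

Demographic parity is only one of many fairness criteria; others (e.g., equalized odds) also condition on the true labels.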
Facebook, one of the most popular social media applications with 2.6 billion users worldwide, is also a leading developer and active user of technology. Facebook uses its own AI-based translation to help users in different regions worldwide convert their news feed and Facebook Stories into their own languages. Taking its technology one step further, Facebook has trained an AI model that does not require transcribed data, and will use this model to build its speech recognition system. Facebook's unsupervised speech recognition model, wav2vec-U, is fed unknown data with no previously defined datasets; the system teaches itself to classify the data.
Zhang, Daniel, Mishra, Saurabh, Brynjolfsson, Erik, Etchemendy, John, Ganguli, Deep, Grosz, Barbara, Lyons, Terah, Manyika, James, Niebles, Juan Carlos, Sellitto, Michael, Shoham, Yoav, Clark, Jack, Perrault, Raymond
Welcome to the fourth edition of the AI Index Report. This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our connections with the Stanford Institute for Human-Centered Artificial Intelligence (HAI). The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. Its mission is to provide unbiased, rigorously vetted, and globally sourced data for policymakers, researchers, executives, journalists, and the general public to develop intuitions about the complex field of AI. The report aims to be the most credible and authoritative source for data and insights about AI in the world.
"And there's a pretty broad range of people that this will be helpful to. It's definitely a great help for people with a hearing disability, but also for international, distributed workforces who don't speak English as their native language. And education as well: online classes could benefit from captions, on top of the Live Notes that they can go back to, to facilitate learning." The transcription is not exactly pitch-perfect: some sentences don't make sense and words occasionally come out deformed.
In today's world, it is nearly impossible to avoid voice-controlled digital assistants. From interactive intelligent agents used by corporations and government agencies to personal devices, automated speech recognition (ASR) systems, combined with machine learning (ML) technology, are increasingly being used as an input modality that allows humans to interact with machines, ostensibly in the most common and simplest way possible: by speaking in a natural, conversational voice. Yet as a study published in May 2020 by researchers from Stanford University indicated, the accuracy of ASR systems from Google, Facebook, Microsoft, and others varies widely depending on the speaker's race. While this study focused only on the differing accuracy levels for a small sample of African American and white speakers, it points to a larger concern about ASR accuracy and phonological awareness, including the ability to discern and understand accents, tonalities, rhythmic variations, and speech patterns that may differ from the voices used to initially train voice-activated chatbots, virtual assistants, and other voice-enabled systems. The Stanford study, published in the journal Proceedings of the National Academy of Sciences, measured the error rates of ASR technology from Amazon, Apple, Google, IBM, and Microsoft by comparing each system's performance in understanding identical phrases (taken from pre-recorded interviews across two datasets) spoken by 73 black and 42 white speakers, then comparing the average word error rate (WER) for black and white speakers.
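The word error rate mentioned above is the standard ASR metric: the word-level edit distance (substitutions, insertions, deletions) between the reference transcript and the system's hypothesis, divided by the number of reference words. A minimal sketch (not the Stanford study's scoring code, and using made-up example sentences):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    r, h = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(r)][len(h)] / len(r)

# One substitution ("her" -> "a") and one deletion ("on") over 6 words.
print(wer("she had her dark suit on", "she had a dark suit"))  # 2/6 = 0.333...
```

In practice, production scoring also normalizes text first (casing, punctuation, number formats) so that cosmetic differences are not counted as errors.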