End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, which increases the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whispered speech) and to background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture, in a principled way, the temporal relationships between acoustic and visual features. This study explores this idea by proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are learned directly from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over an audio-only VAD baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when it is evaluated using the sensors of a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech recorded with a high-definition camera and a close-talking microphone).
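The results above are reported as F1-scores. As a reference point only, the sketch below computes a frame-level F1-score for binary speech (1) / non-speech (0) decisions. The helper name and the frame-level scoring convention are assumptions; the abstract does not say whether scoring is done per frame or per segment.

```python
def frame_f1(reference, hypothesis):
    """Frame-level F1-score for binary speech (1) / non-speech (0) labels.

    Hypothetical helper: assumes one label per frame and treats speech as
    the positive class.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)  # speech detected as speech
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)  # false alarms
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)  # missed speech
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


reference = [1, 1, 1, 0, 0, 1, 1, 0]
hypothesis = [1, 1, 0, 0, 1, 1, 1, 0]
print(round(frame_f1(reference, hypothesis), 3))  # 0.8
```

In this toy example there are four correctly detected speech frames, one false alarm, and one missed frame, so precision and recall are both 0.8.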
End-to-end Multimodal Emotion and Gender Recognition with Dynamic Weights of Joint Loss
Chae, Myungsu, Kim, Tae-Ho, Shin, Young Hoon, Kim, June-Woo, Lee, Soo-Young
Multi-task learning (MTL) is one of the methods for improving generalizability across multiple tasks. To perform multiple classification tasks with one neural network model, the losses of the individual tasks must be combined. Previous studies have mostly focused on predicting multiple tasks using a joint loss with static weights for training the model. The weights between tasks have received little consideration and are typically set uniformly or empirically. In this study, we propose a method to build the joint loss using dynamic weights in order to improve the total performance rather than the performance of individual tasks, and we apply this method to an end-to-end multimodal emotion and gender recognition model using audio and video data. This approach provides appropriate weights for each task's loss by the end of training. In our experiment, emotion and gender recognition with the proposed method yields a lower joint loss, computed as the negative log-likelihood, than with static weights. Our proposed model also shows better generalizability than the compared models. To the best of our knowledge, this is the first research to show the strength of dynamic weights of joint loss for maximizing total performance in emotion and gender recognition tasks.
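The abstract does not state the weighting rule. Purely to illustrate what dynamic joint-loss weights can look like, the sketch below swaps in Dynamic Weight Average (DWA, Liu et al., CVPR 2019), a published heuristic and not the authors' method: a task whose loss is falling slowly receives a larger weight in the next epoch. The function names, the temperature value, and the two-task (emotion, gender) setup are assumptions.

```python
import math


def dwa_weights(loss_history, temperature=2.0):
    """Dynamic Weight Average: one weight per task, summing to the number of tasks.

    loss_history: per-epoch lists of task losses, oldest first, e.g.
    [[emotion_loss, gender_loss], ...]. Swapped-in heuristic, not the
    weighting rule used in the paper.
    """
    num_tasks = len(loss_history[-1])
    if len(loss_history) < 2:
        return [1.0] * num_tasks  # not enough history yet: uniform weights
    last, before_last = loss_history[-1], loss_history[-2]
    # A ratio near 1 means the task's loss is barely decreasing.
    rates = [last[k] / before_last[k] for k in range(num_tasks)]
    exps = [math.exp(r / temperature) for r in rates]
    total = sum(exps)
    return [num_tasks * e / total for e in exps]


def joint_loss(task_losses, weights):
    """Weighted sum of per-task losses (e.g., negative log-likelihoods)."""
    return sum(w * loss for w, loss in zip(weights, task_losses))


history = [[1.0, 1.0], [0.5, 0.9]]  # emotion loss halved, gender loss barely moved
weights = dwa_weights(history)
print(weights[1] > weights[0])  # True: the slower-improving task gets more weight
print(joint_loss([0.4, 0.85], weights))
```

Because the weights are recomputed every epoch from the loss trajectory, they end training at values shaped by how each task progressed, in contrast to a fixed, hand-picked weighting.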
Blind Community Detection from Low-rank Excitations of a Graph Filter
Wai, Hoi-To, Segarra, Santiago, Ozdaglar, Asuman E., Scaglione, Anna, Jadbabaie, Ali
This paper considers a novel framework to detect communities in a graph from the observation of signals at its nodes. We model the observed signals as noisy outputs of an unknown network process, represented as a graph filter, that is excited by a set of low-rank inputs. Rather than learning the precise parameters of the graph itself, the proposed method retrieves the community structure directly. Furthermore, as in blind system identification methods, it does not require knowledge of the system excitation. The paper shows that communities can be detected by applying spectral clustering to the low-rank output covariance matrix obtained from the graph signals. The performance analysis indicates that the community detection accuracy depends on the spectral properties of the graph filter considered. We also show that the accuracy can be improved via a low-rank matrix decomposition method when the excitation signals are known. Numerical experiments demonstrate that our approach is effective for analyzing network data from diffusion, consumer, and social dynamics. The emerging field of network science and the availability of big data have motivated researchers to extend signal processing techniques to the analysis of signals defined on graphs, giving rise to a new area of research referred to as graph signal processing (GSP) [2]-[4].
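The detection step described above, spectral clustering of the output covariance, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the number of communities k is assumed known, the leading eigenvectors are found by simple orthogonal iteration, the rows of the eigenvector matrix are grouped with a basic k-means using farthest-first initialization, and all function names are hypothetical. The low-rank decomposition used when excitations are known is not included.

```python
import math


def sample_covariance(signals):
    """Sample covariance (n x n) of graph signals; each signal lists n node values."""
    m, n = len(signals), len(signals[0])
    mean = [sum(y[i] for y in signals) / m for i in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for y in signals:
        d = [y[i] - mean[i] for i in range(n)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += d[i] * d[j] / m
    return cov


def top_eigenvectors(mat, k, iters=300):
    """k leading eigenvectors of a symmetric PSD matrix by orthogonal iteration.

    Returns an n x k matrix whose i-th row is the spectral embedding of node i.
    """
    n = len(mat)
    # Deterministic starting block (assumed not orthogonal to the leading subspace).
    q = [[1.0 + ((i + 1) * (c + 2)) % 5 for c in range(k)] for i in range(n)]
    for _ in range(iters):
        z = [[sum(mat[i][j] * q[j][c] for j in range(n)) for c in range(k)]
             for i in range(n)]
        cols = []  # Gram-Schmidt orthonormalisation of the columns of z
        for c in range(k):
            v = [z[i][c] for i in range(n)]
            for u in cols:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v)) or 1.0
            cols.append([a / norm for a in v])
        q = [[cols[c][i] for c in range(k)] for i in range(n)]
    return q


def dist2(p, r):
    return sum((a - b) ** 2 for a, b in zip(p, r))


def kmeans_rows(points, k, iters=100):
    """Basic k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def detect_communities(signals, k):
    """Spectral clustering on the sample covariance of the observed graph signals."""
    return kmeans_rows(top_eigenvectors(sample_covariance(signals), k), k)


# Toy data: nodes 0-2 follow one latent signal, nodes 3-5 another, plus a small
# deterministic perturbation standing in for noise (so the covariance is
# approximately rank 2 with a two-block structure).
signals = []
for t in range(30):
    a, b = math.cos(t), math.sin(1.7 * t)
    eps = [0.01 * (((7 * i + 3 * t) % 5) - 2) for i in range(6)]
    signals.append([a + eps[0], a + eps[1], a + eps[2], b + eps[3], b + eps[4], b + eps[5]])
print(detect_communities(signals, 2))  # nodes 0-2 share one label, nodes 3-5 the other
```

When the covariance is dominated by a low-rank term whose column space is spanned by community indicator vectors, the leading eigenvectors are (nearly) constant within each community, so nodes in the same community map to (nearly) the same row and k-means separates them.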
Analysis of Network Lasso For Semi-Supervised Regression
We characterize the statistical properties of network Lasso for semi-supervised regression problems involving network-structured data. This characterization is based on the connectivity properties of the empirical graph, which encodes the similarities between individual data points. Loosely speaking, network Lasso is accurate if the available label information is well connected with the boundaries between clusters of the network-structured dataset. We make this property precise using the notion of network flows. In particular, the existence of a sufficiently large network flow over the empirical graph implies a network compatibility condition which, in turn, ensures the accuracy of network Lasso.
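For readers unfamiliar with network Lasso, a commonly used form of the estimator for scalar labels is given below as a reference sketch. The notation is assumed rather than taken from the paper, which may normalize the loss term differently: $\mathcal{V}$ is the node set of the empirical graph, $\mathcal{M} \subseteq \mathcal{V}$ the set of labeled nodes with labels $y_i$, $\mathcal{E}$ the edge set with weights $W_{ij} > 0$, and $\lambda > 0$ trades data fidelity against total variation over the graph.

```latex
\hat{\mathbf{x}} \in \arg\min_{\mathbf{x} \in \mathbb{R}^{|\mathcal{V}|}}
  \frac{1}{|\mathcal{M}|} \sum_{i \in \mathcal{M}} \left( x_i - y_i \right)^2
  + \lambda \sum_{\{i,j\} \in \mathcal{E}} W_{ij} \left| x_i - x_j \right|
```

The total-variation penalty favors signals that are piecewise constant over well-connected clusters, which is why the placement of labeled nodes relative to cluster boundaries governs accuracy.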
K-medoids Clustering of Data Sequences with Composite Distributions
Wang, Tiexing, Li, Qunwei, Bucci, Donald J., Liang, Yingbin, Chen, Biao, Varshney, Pramod K.
This paper studies clustering of data sequences using the k-medoids algorithm. All the data sequences are assumed to be generated from unknown continuous distributions, which form clusters, with each cluster containing a composite set of closely located distributions (based on a certain distance metric between distributions). The maximum intra-cluster distance is assumed to be smaller than the minimum inter-cluster distance, and both values are assumed to be known. The goal is to group the data sequences together if their underlying generative distributions (which are unknown) belong to the same cluster. K-medoids algorithms based on distance metrics between distributions are proposed for both known and unknown numbers of distribution clusters. Upper bounds on the error probability and convergence results in the large-sample regime are also provided. It is shown that the error probability decays exponentially fast as the number of samples in each data sequence goes to infinity. The error exponent has a simple form regardless of the distance metric applied when certain conditions are satisfied. In particular, the error exponent is characterized when either the Kolmogorov-Smirnov distance or the maximum mean discrepancy is used as the distance metric. Simulation results are provided to validate the analysis.
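A minimal sketch of the known-number-of-clusters case follows, assuming the empirical two-sample Kolmogorov-Smirnov statistic as the distance between data sequences and a standard alternating k-medoids update with farthest-first initialization. The paper's exact initialization and update rules, and its variant for an unknown number of clusters (which relies on the known intra- and inter-cluster distance bounds), are not reproduced here; the function names are hypothetical.

```python
import bisect
import random


def ks_distance(x, y):
    """Empirical two-sample Kolmogorov-Smirnov statistic sup_t |F_x(t) - F_y(t)|."""
    xs, ys = sorted(x), sorted(y)
    best = 0.0
    for t in xs + ys:  # both ECDFs only jump at sample points
        fx = bisect.bisect_right(xs, t) / len(xs)
        fy = bisect.bisect_right(ys, t) / len(ys)
        best = max(best, abs(fx - fy))
    return best


def k_medoids(sequences, k, max_iter=100):
    """Alternating k-medoids on pairwise KS distances; returns (labels, medoids)."""
    n = len(sequences)
    dist = [[ks_distance(a, b) for b in sequences] for a in sequences]
    medoids = [0]  # farthest-first initialisation
    while len(medoids) < k:
        medoids.append(max((i for i in range(n) if i not in medoids),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        # Assign each sequence to its closest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep an empty cluster's medoid unchanged
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


rng = random.Random(0)
sequences = [
    [rng.gauss(0.0, 1.0) for _ in range(200)],
    [rng.gauss(0.1, 1.0) for _ in range(200)],  # close to the first distribution
    [rng.gauss(3.0, 1.0) for _ in range(200)],
    [rng.gauss(3.1, 1.0) for _ in range(200)],  # close to the third distribution
]
labels, medoids = k_medoids(sequences, k=2)
print(labels)  # expected grouping: sequences {0, 1} and {2, 3}
```

Medoids, unlike k-means centroids, are always actual data sequences, so the method needs only pairwise distances between sequences and never has to average distributions.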
Report: AI drives VC investment as Canada hits $900 million USD for second straight quarter
PwC has released its latest MoneyTree report for Q2 2018, finding that the AI sector continued to thrive in Canadian venture capital. AI funding increased 104 percent in Q2 2018 compared to the previous quarter, with $222 million CAD ($169 million USD) invested across 13 deals. Total quarterly AI deals and investment reached an all-time high this quarter; the second-highest quarter was Q2 2017, when $209 million CAD ($159 million USD) was invested across 12 deals, boosted by Element AI's historic Series A. "Approximately half of the deal volume this quarter went to businesses that provide analytics tools to their customers," said Dave Planques, national deals leader at PwC Canada. "These companies are supporting enterprises in making better data-driven decisions." "If you think of this industry as analytics on one end of the spectrum to artificial intelligence on the other, the Canadian tech sector is firing on all of those cylinders."
ELICA: An Automated Tool for Dynamic Extraction of Requirements Relevant Information
Abad, Zahra Shakeri Hossein, Gervasi, Vincenzo, Zowghi, Didar, Barker, Ken
Requirements elicitation requires extensive knowledge and a deep understanding of the problem domain in which the final system will be situated. However, in many software development projects, analysts are required to elicit the requirements from an unfamiliar domain, which often causes communication barriers between analysts and stakeholders. In this paper, we propose a requirements ELICitation Aid tool (ELICA) to help analysts better understand the target application domain through dynamic extraction and labeling of requirements-relevant knowledge. To extract the relevant terms, we leverage the flexibility and power of Weighted Finite State Transducers (WFSTs) in dynamic modeling of natural language processing tasks. In addition to the information conveyed through text, ELICA captures and processes nonlinguistic information about the intentions of speakers, such as their confidence level, analytical tone, and emotions. The extracted information is made available to the analysts as a set of labeled snippets with highlighted relevant terms, which can also be exported as an artifact of the Requirements Engineering (RE) process. The application and usefulness of ELICA are demonstrated through a case study. This study shows how preexisting relevant information about the application domain and the information captured during an elicitation meeting, such as the conversation and stakeholders' intentions, can be captured and used to support analysts in achieving their tasks.
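The abstract does not describe ELICA's transducers, so the following is only a toy illustration of the general idea of WFST-based term labeling: a small transducer maps input words to labels (B and I for the beginning and inside of a domain term, O otherwise), weights are costs in the tropical semiring (min, +), and the cheapest path ending in a final state gives the labeling. The vocabulary, labels, weights, the `<any>` catch-all arc (a simplification), and the function name are all hypothetical.

```python
# Arcs: state -> list of (input word, output label, cost, next state).
# Hypothetical toy transducer: "nurse" and "record" are single-word domain
# terms, "medical record" is a two-word term, and any word may be labelled O.
ARCS = {
    0: [("nurse", "B", 0.3, 0),
        ("record", "B", 0.3, 0),
        ("medical", "B", 0.2, 1),
        ("<any>", "O", 1.0, 0)],  # catch-all arc: matches every word
    1: [("record", "I", 0.0, 0)],
}
START, FINALS = 0, {0}


def label_terms(words, arcs=ARCS, start=START, finals=FINALS):
    """Cheapest (min, +) path through the transducer for a linear input.

    Equivalent to composing the input sentence with the WFST and running a
    shortest-path search; returns one output label per word.
    """
    best = {start: (0.0, [])}  # state -> (cost so far, labels so far)
    for word in words:
        nxt = {}
        for state, (cost, labels) in best.items():
            for inp, out, weight, target in arcs.get(state, []):
                if inp != word and inp != "<any>":
                    continue
                cand = (cost + weight, labels + [out])
                if target not in nxt or cand[0] < nxt[target][0]:
                    nxt[target] = cand  # keep only the cheapest path per state
        best = nxt
    finished = [best[s] for s in best if s in finals]
    if not finished:
        raise ValueError("no accepting path for this input")
    return min(finished, key=lambda item: item[0])[1]


print(label_terms("the nurse updates the medical record".split()))
# ['O', 'B', 'O', 'O', 'B', 'I']
```

Because the two-word path "medical record" costs less than labeling both words separately, the decoder prefers the multi-word term; a sentence ending in "medical" cannot finish in the non-final state 1 and falls back to labeling it O.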
Robot Truck Upstart Embark Hauls In $30 Million To Take On Waymo, Uber
Embark co-founders Alex Rodrigues, left, and Brandon Moak with their fleet of autonomous semi-trucks at the startup's operations center in Ontario, California. Ask Embark Trucks CEO Alex Rodrigues how his small autonomous tech startup can compete with giants in the space like Alphabet Inc.'s Waymo or Uber, and the confident 22-year-old is ready with an answer. "We're able to move really fast," he told Forbes aboard the cab of one of Embark's sensor-laden Peterbilt semi-trucks as it barreled down the I-10 on a sunny morning, hauling a commercial load from Ontario, California, to Phoenix. As required by law, a safety driver's hands are on the wheel, but the big rig is driving itself down the busy highway. "Waymo may have the 'conglomerate advantage' of build once, use many times," he said, because its new robot truck program has the same tech that goes into its self-driving minivans.
…And the technologies that could save it!
Since man hunted and got a taste for the meat of the aurochs, later domesticated into the ancestors of modern cattle breeds, the market for beef has grown steadily. The last 10 years have not been so kind, with plummeting beef consumption and higher prices. There is some light, as meat-intensive diets like paleo and keto have turned some consumers back to beef, but just as the cattle industry has become more consolidated, sophisticated and consumer-focused, it is ironically facing some of the greatest existential threats to its 10,000-year existence. Whether touted as sustainable and welfare-friendly or dismissed as 'fake meat', meat grown in petri dishes is clearly intended to displace the consumption of red meat. Despite concerns about how 'friendly' the technology really is, meat producers such as Cargill and Tyson Foods have invested in startups in this market. Environmentalists advocating 'Meatless Mondays' and other consumer-level initiatives have been unremitting in their attacks on the meat industry. These action groups have sometimes used dubious data to support their contention that cattle, and specifically beef, use more water and more resources and emit more greenhouse gases than other human choices. Their relentless attacks appear to be having an effect on red meat consumption in the US and Europe.
OpenHouse.AI: Disrupting Real Estate through Transparency
In the meantime, the market has evolved into an on-demand economy driven by information. Today, home buyers' increasing access to information allows them, in some ways, to circumvent the agent. The need for instant gratification, and for the knowledge to find the best value at the lowest cost, is slowly reshaping this industry. How real estate succeeds in the next decade will depend on the changing habits of home buyers and on a willingness to gradually dismantle the current market structure to improve buyer access to information while creating new sources of value. One start-up based in Calgary and Toronto is paving the way for this disruption and is challenging the players, realtors and home buyers alike, to think differently about these transactions.