SMILK, linking natural language and data from the web

arXiv.org Artificial Intelligence

As part of the SMILK Joint Lab, we studied the use of Natural Language Processing to: (1) enrich knowledge bases and link data on the web, and conversely (2) use this linked data to contribute to the improvement of text analysis and the annotation of textual content, and to support knowledge extraction. The evaluation focused on brand-related information retrieval in the field of cosmetics. This article describes each step of our approach: the creation of ProVoc, an ontology to describe products and brands; the automatic population of a knowledge base mainly based on ProVoc from heterogeneous textual resources; and the evaluation of an application that takes the form of a browser plugin providing additional knowledge to users browsing the web.


Sequential Attention GAN for Interactive Image Editing via Dialogue

arXiv.org Machine Learning

In this paper, we introduce a new task, interactive image editing via conversational language, where users can guide an agent to edit images via multi-turn dialogue in natural language. In each dialogue turn, the agent takes a source image and a natural language description from the user as the input, and generates a target image following the textual description. Two new datasets are created for this task, Zap-Seq and DeepFashion-Seq, collected via crowdsourcing. For this task, we propose a new Sequential Attention Generative Adversarial Network (SeqAttnGAN) framework, which applies a neural state tracker to encode both source image and textual descriptions, and generates high-quality images in each dialogue turn. To achieve better region-specific text-to-image generation, we also introduce an attention mechanism into the model. Experiments on the two datasets, including quantitative evaluation and user study, show that our model outperforms state-of-the-art approaches in both image quality and text-to-image consistency.


Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

arXiv.org Machine Learning

We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by extensive simulations of derivative-free methods on these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.
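
To illustrate what a two-point zero-order evaluation looks like in practice, the following is a minimal sketch and not the paper's algorithm or analysis. It estimates a gradient from two cost evaluations along a random direction and uses it for descent. The function names, the smoothing radius, the step size, and the toy quadratic cost (a stand-in, not a linear-quadratic system) are all assumptions made for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def two_point_gradient(cost, x, radius, rng):
    """Two-point zero-order gradient estimate of `cost` at `x`.

    Only two cost evaluations are used: at x + radius*u and x - radius*u,
    where u is a random unit direction.
    """
    dim = len(x)
    u = random_unit_vector(dim, rng)
    plus = cost([xi + radius * ui for xi, ui in zip(x, u)])
    minus = cost([xi - radius * ui for xi, ui in zip(x, u)])
    scale = dim * (plus - minus) / (2.0 * radius)
    return [scale * ui for ui in u]


def zero_order_descent(cost, x0, steps=200, step_size=0.1, radius=0.01, seed=0):
    """Plain descent driven by the two-point estimate (hypothetical settings)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = two_point_gradient(cost, x, radius, rng)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x


def toy_cost(x):
    """Hypothetical stand-in cost (a simple quadratic), not an LQ system."""
    return sum(xi * xi for xi in x)
```

Calling `zero_order_descent(toy_cost, [1.0, -2.0, 0.5])` drives the toy cost toward zero without ever computing a derivative. A one-point variant would use a single cost evaluation per estimate, at the price of a noisier estimate.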


CNN based Multi-Instance Multi-Task Learning for Syndrome Differentiation of Diabetic Patients

arXiv.org Machine Learning

Syndrome differentiation in Traditional Chinese Medicine (TCM) is the process of understanding and reasoning about body condition, and is the essential step and premise of effective treatment. However, due to its complexity and lack of standardization, it is challenging to achieve. In this study, we consider each patient's record as a one-dimensional image and symptoms as pixels, in which missing and negative values are represented by zero pixels. The objective is to find relevant symptoms first and then map them to proper syndromes, which is similar to the object detection problem in computer vision. Inspired by this, we employ multi-instance multi-task learning combined with a convolutional neural network (MIMT-CNN) for syndrome differentiation, which takes region proposals as input and outputs image labels directly. The neural network consists of region proposal generation, a convolutional layer, a fully connected layer, and a max pooling (multi-instance pooling) layer followed by the sigmoid function in each syndrome prediction task for image representation learning and final result generation. On the diabetes dataset, it performs better than all other baseline methods. Moreover, it generates results stably and reliably, even on a dataset with a small sample size, a large number of missing values, and noise.
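
The multi-instance pooling step followed by a per-task sigmoid can be sketched as below. This is a minimal illustration under assumptions, not the MIMT-CNN implementation: the per-instance scores are taken as given (in the paper they would come from the convolutional and fully connected layers), and the function names and input layout are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def multi_instance_predict(instance_scores):
    """Pool instance scores into one probability per task.

    instance_scores[i][t] is the (assumed precomputed) score of
    instance i, e.g. one region proposal, for task t, e.g. one syndrome.
    Max pooling keeps the strongest instance per task; the sigmoid
    turns that pooled score into a record-level probability.
    """
    num_tasks = len(instance_scores[0])
    pooled = [max(row[t] for row in instance_scores) for t in range(num_tasks)]
    return [sigmoid(score) for score in pooled]
```

For example, with two instances and two tasks, `multi_instance_predict([[-1.0, 0.0], [2.0, -3.0]])` pools the scores to `[2.0, 0.0]` before applying the sigmoid, so the second task's probability is exactly 0.5.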


A New Font, Sans Forgetica, Helps You Remember What You Read

WIRED

Remember all those classics you devoured in comp-lit class? Research shows that we retain an embarrassingly small sliver of what we read. In an effort to help college students boost that percentage, a team made up of a designer, a psychologist, and a behavioral economist at Australia's RMIT University recently introduced a new typeface, Sans Forgetica, that uses clever tricks to lodge information in your brain. The font-makers drew on the psychological theory of "desirable difficulty"--that is, we learn better when we actively overcome an obstruction. Sans Forgetica is purposefully hard to decipher, forcing the reader to focus.


Mask-aware networks for crowd counting

arXiv.org Machine Learning

The crowd counting problem aims to count the number of objects within an image or a video frame and is usually solved by estimating the density map generated from the object location annotations. The values in the density map, by nature, take two possible states: zero, indicating no object nearby, or a non-zero value, indicating the presence of objects and giving the local object density. In contrast to traditional methods, which do not differentiate the density prediction of these two states, we propose to use a dedicated network branch to predict the object/non-object mask and then combine its prediction with the input image to produce the density map. Our rationale is that the mask prediction could be better modeled as a binary segmentation problem and the difficulty of estimating the density could be reduced if the mask is known. A key to the proposed scheme is the strategy of incorporating the mask prediction into the density map estimator. To this end, we study five possible solutions, and via analysis and experimental validation we identify the most effective one. Through extensive experiments on five public datasets, we demonstrate the superior performance of the proposed approach over the baselines and show that our network can achieve state-of-the-art performance.


Attention-based Recurrent Neural Network for Urban Vehicle Trajectory Prediction

arXiv.org Artificial Intelligence

As the number of positioning sensors and location-based devices increases, a huge amount of spatial and temporal data is collected and accumulated. These data are expressed as trajectory data by connecting the data points in chronological sequence, and they contain movement information of any moving object. In this study, urban vehicle trajectory prediction is studied using trajectory data of vehicles in an urban traffic network. In previous work, a Recurrent Neural Network model for urban vehicle trajectory prediction was proposed. To further improve the model, in this study we propose an Attention-based Recurrent Neural Network (ARNN) model for urban vehicle trajectory prediction. In the proposed model, we use an attention mechanism to incorporate network traffic state data into urban vehicle trajectory prediction. The model is evaluated using Bluetooth data collected in Brisbane, Australia, which contain the movement information of private vehicles. The performance of the model is evaluated with five metrics: BLEU-1, BLEU-2, BLEU-3, BLEU-4, and METEOR. The results show that the ARNN model performs better than the RNN model.
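
To show how an n-gram metric applies to trajectories, the sketch below computes the clipped n-gram precision that BLEU-n scores are built from, treating a trajectory as a sequence of location labels. It is a simplified illustration, not the evaluation code used in the study: full BLEU additionally combines the precisions with a geometric mean and a brevity penalty, and the function names and example labels are hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """All contiguous length-n subsequences of `seq`, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def clipped_ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (the "clipping" used by BLEU).
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total
```

With an observed trajectory `["A", "B", "C", "D"]` and a predicted trajectory `["A", "B", "C", "E"]`, three of the four predicted locations match (precision 0.75 for n = 1) and two of the three predicted transitions match (precision 2/3 for n = 2), so higher-order n-grams penalize errors in the visiting order more strongly.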


Human Pose and Path Estimation from Aerial Video using Dynamic Classifier Selection

arXiv.org Machine Learning

We consider the problem of estimating human pose and trajectory by an aerial robot with a monocular camera in near real time. We present a preliminary solution whose distinguishing feature is a dynamic classifier selection architecture. In our solution, each video frame is corrected for perspective using projective transformation. Then, two alternative feature sets are used: (i) Histogram of Oriented Gradients (HOG) of the silhouette, (ii) Convolutional Neural Network (CNN) features of the RGB image. The features (HOG or CNN) are classified using a dynamic classifier. A class is defined as a pose-viewpoint pair, and a total of 64 classes are defined to represent a forward walking and turning gait sequence. Our solution provides three main advantages: (i) Classification is efficient due to dynamic selection (4-class vs. 64-class classification). (ii) Classification errors are confined to neighbors of the true viewpoints. (iii) The robust temporal relationship between poses is used to resolve the left-right ambiguities of human silhouettes. Experiments conducted on both fronto-parallel videos and aerial videos confirm that our solution can achieve accurate pose and trajectory estimation in both scenarios. We found that using HOG features provides higher accuracy than using CNN features. For example, applying the HOG-based variant of our scheme to the 'walking on a figure 8-shaped path' dataset (1652 frames) achieved estimation accuracies of 99.6% for viewpoints and 96.2% for number of poses.


An Active Information Seeking Model for Goal-oriented Vision-and-Language Tasks

arXiv.org Machine Learning

As Computer Vision algorithms move from passive analysis of pixels to active reasoning over semantics, the breadth of information algorithms need to reason over has expanded significantly. One of the key challenges in this vein is the ability to identify the information required to make a decision, and select an action that will recover this information. We propose a reinforcement-learning approach that maintains a distribution over its internal information, thus explicitly representing the ambiguity in what it knows, and needs to know, towards achieving its goal. Potential actions are then generated according to particles sampled from this distribution. For each potential action, a distribution of the expected answers is calculated, and the value of the information gained is obtained, as compared to the existing internal information. We demonstrate this approach applied to two vision-language problems that have attracted significant recent interest: visual dialogue and visual query generation. In both cases the method actively selects actions that will best reduce its internal uncertainty, and outperforms its competitors in achieving the goal of the challenge.
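
The idea of scoring a potential action by the information its answer is expected to provide can be sketched with a standard expected-information-gain calculation. This is a generic illustration, not the paper's model: it assumes a small finite set of hypotheses with known answer likelihoods rather than sampled particles, measures uncertainty with Shannon entropy, and uses hypothetical function names.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)


def expected_information_gain(prior, likelihood):
    """Expected entropy reduction from observing the answer to one action.

    prior[h] is the current belief in hypothesis h.
    likelihood[h][a] is the (assumed known) probability of answer a
    if hypothesis h is true.
    """
    num_answers = len(likelihood[0])
    gain = entropy(prior)
    for a in range(num_answers):
        joint = [prior[h] * likelihood[h][a] for h in range(len(prior))]
        p_answer = sum(joint)
        if p_answer > 0.0:
            posterior = [j / p_answer for j in joint]
            gain -= p_answer * entropy(posterior)
    return gain
```

With a uniform belief over two hypotheses, an action whose answer identifies the hypothesis (`[[1.0, 0.0], [0.0, 1.0]]`) has a gain of one bit, while an action whose answer is independent of the hypothesis (`[[0.5, 0.5], [0.5, 0.5]]`) has a gain of zero; an agent following this criterion would pick the first.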


Technological Advances in Applied Intelligence (IEA/AIE-2018)

AI Magazine

The 31st International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE-2018) was held at Concordia University in Montreal, Canada, June 25–28, 2018. This report summarizes the conference. IEA/AIE 2018 continued the tradition of emphasizing applications of applied intelligent systems to solve real-life problems in all areas, including engineering, science, industry, automation and robotics, business and finance, medicine and biomedicine, bioinformatics, cyberspace, and human-machine interactions.