Collaborating Authors


Imaging Sciences R&D Laboratories in Argentina

Communications of the ACM

We use the term imaging sciences to refer to the broad spectrum of scientific and technological contexts that involve images in digital format, including, among others, image and video processing, scientific visualization, computer graphics, animations in games and simulators, and remote sensing imagery, as well as the wide set of associated application areas that have become ubiquitous during the last decade in science, art, human-computer interaction, entertainment, social networks, and many others. As a discipline combining mathematics, engineering, and computer science, it arose in a few universities in Argentina mostly in the form of elective classes and small research projects in electrical engineering or computer science departments. Only in the mid-2000s did initiatives start to appear that aimed to generate joint activities and to give the discipline identity and visibility. In this short paper, we present a brief history of the three laboratories with the most relevant research and development (R&D) activities in the discipline in Argentina: the Imaging Sciences Laboratory of the Universidad Nacional del Sur, the PLADEMA Institute at the Universidad Nacional del Centro de la Provincia de Buenos Aires, and the Image Processing Laboratory at the Universidad Nacional de Mar del Plata. The Imaging Sciences Laboratory of the Electrical and Computer Engineering Department of the Universidad Nacional del Sur, in Bahía Blanca, began its activities in the 1990s as a pioneer in Argentina and Latin America in research and teaching in computer graphics and visualization.

AI that scans a construction site can spot when things are falling behind


The system uses a GoPro camera mounted on top of a hard hat. When managers tour a site once or twice a week, the camera captures video footage of the whole project and uploads it to image recognition software, which compares the status of many thousands of objects on site, such as electrical sockets and bathroom fittings, with a digital replica of the building. The AI also uses the video feed to track the camera's position in the building to within a few centimeters, so it can identify the exact location of the objects in each frame. The system can track the status of around 150,000 objects several times a week, says Danon. For each object, the AI can tell which of three or four states it is in, from not yet begun to fully installed.
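
The core of the progress check described above is a comparison between each object's observed installation state and the state the digital plan expects. A minimal sketch of that comparison, where the state names, object ids, and function are illustrative assumptions rather than the vendor's actual API:

```python
# Toy sketch of the progress check the article describes: compare each
# tracked object's observed installation state against the planned state.
# All names and states here are illustrative assumptions, not the vendor's API.
from enum import IntEnum

class State(IntEnum):
    NOT_STARTED = 0
    IN_PROGRESS = 1
    INSTALLED = 2

def behind_schedule(observed, planned):
    """Return the ids of objects whose observed state lags the plan."""
    return sorted(oid for oid, want in planned.items()
                  if observed.get(oid, State.NOT_STARTED) < want)

planned = {"socket_12": State.INSTALLED, "sink_3": State.IN_PROGRESS}
observed = {"socket_12": State.IN_PROGRESS}   # sink_3 not yet detected at all
print(behind_schedule(observed, planned))     # ['sink_3', 'socket_12']
```

An object absent from the observations defaults to not-started, so items the camera never saw also get flagged.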

Machine Vision: A Boon for the Manufacturing Industry


FREMONT, CA: Machine vision is one of the most important additions to the manufacturing sector, providing automated inspection capabilities as part of QC procedures. Meanwhile, the world of automation is becoming more complex over time. With rapid developments in areas such as imaging techniques, robot interfaces, CMOS sensors, machine and deep learning, embedded vision, data transmission standards, and image processing capabilities, vision technology can benefit the manufacturing industry at multiple levels. New imaging techniques have brought new application opportunities.

Art with AI: Turning photographs into artwork with Neural Style Transfer


Please note: I reserve the rights to all the media used in this blog (photographs, animations, videos, etc.); they are my work, except the seven mentioned artworks by artists, which were used as style images. GIFs might take a while to load; please be patient, or open the post in a browser instead. "The world today doesn't make sense, so why should I paint pictures that do?" -- Pablo Picasso. Here are the results; some combinations produced astounding artwork. Here's an image of a bride and graffiti; combining them results in an output similar to a doodle painting. Here, you can see the buildings popping up in the background.
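
Neural Style Transfer (Gatys et al.) captures "style" through Gram matrices of CNN feature maps: correlations between channels that describe texture independently of layout. A minimal numpy sketch of that representation, where `features` stands in for a CNN layer's activations (this is the standard formulation, not this blog's exact code):

```python
# Minimal sketch of the Gram-matrix style representation at the core of
# Neural Style Transfer. 'features' stands in for one CNN layer's
# activations; in practice these come from a pretrained network like VGG.
import numpy as np

def gram_matrix(features):
    """features: (channels, height, width) activation map.
    Returns the (channels, channels) matrix of channel correlations."""
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.T / (h * w)

def style_loss(gen_features, style_features):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(gen_features) - gram_matrix(style_features)
    return float((diff ** 2).mean())
```

The generated image is then optimized so its Gram matrices match the style image's while its raw activations stay close to the content image's.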

Machine Vision: You CAN Fix What You Can't See - Railway Age


RAILWAY AGE, SEPTEMBER 2020 ISSUE: Whether it's the track structure or the equipment that operates on it, there are many things the naked eye cannot readily see. Increasingly, machine vision technology is becoming the best way to identify potential flaws before they lead to failures. "The various machine vision technologies deployed detect thousands of conditions each year that could potentially lead to accidents," says Robert Coakley, Director of Business Development, ENSCO Rail. Compared to manual visual inspections, he says, autonomous machine vision offers advantages in speed, reduced track occupancy, and greater inspection frequency and consistency. The equipment is installed on revenue service trains, can perform inspections at track speed, and does not require the additional track occupancy of a hi-rail vehicle.

Visual Methods for Sign Language Recognition: A Modality-Based Review Artificial Intelligence

Sign language visual recognition from continuous multi-modal streams remains one of the most challenging fields. Recent advances in human action recognition exploit GPU-based learning from massive data and are getting closer to human-like performance, paving the way for interactive services for the deaf and hearing-impaired communities, a population expected to grow considerably in the years to come. This paper reviews the human action recognition literature with sign-language visual understanding as its scope. The methods analyzed are organized mainly according to the types of unimodal inputs they exploit, their multi-modal combinations, and their pipeline steps. In each section, we detail and compare the related datasets and approaches, then distinguish the still-open contribution paths suitable for the creation of sign-language-related services. Special attention is paid to the approaches and commercial solutions handling facial expressions and continuous signing.

TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval Artificial Intelligence

The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2019 represented a continuation of four tasks from TRECVID 2018. In total, 27 teams from various research organizations worldwide completed one or more of the following four tasks:

1. Ad-hoc Video Search (AVS)
2. Instance Search (INS)
3. Activities in Extended Video (ActEV)
4. Video to Text Description (VTT)

This paper is an introduction to the evaluation framework, tasks, data, and measures used in the workshop.

What is the simplest entry into NN image classification systems, as a C-callable library?


The data set would be astronomy sub-images that are either bad (edge-of-chip artifacts, bright-star saturation and spikes, internal reflections, chip flaws) or good (populated with fuzzy-dot stars and galaxies and asteroids and stuff). Let's say the typical image is 512x512, but it varies a lot. Because the bad features tend to be big, I'd probably like to bin the images down to, say, 64x64 for compactness and speed. It has to run fast on tens of thousands of images. I'm sort of tempted by the solution of adopting PlaidML as my back end (if I understand its role correctly), because it can compile the problem for many architectures, like CUDA, CPU-only, and OpenCL.
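
The binning step itself is cheap in plain numpy, independent of whichever NN back end ends up doing the classification. A sketch of block-average binning from 512x512 down to 64x64, assuming the input dimensions are (or are cropped to be) divisible by the target size:

```python
# Block-average binning of a 2-D image, as a preprocessing step before
# classification. Assumes dimensions divisible by out_size (extra rows/cols
# are cropped). This is an illustrative sketch, not a full pipeline.
import numpy as np

def bin_image(img, out_size=64):
    """Downsample a 2-D array to out_size x out_size by averaging blocks."""
    h, w = img.shape
    bh, bw = h // out_size, w // out_size
    cropped = img[:bh * out_size, :bw * out_size]
    return cropped.reshape(out_size, bh, out_size, bw).mean(axis=(1, 3))

img = np.arange(512 * 512, dtype=np.float64).reshape(512, 512)
small = bin_image(img)
print(small.shape)  # (64, 64)
```

Averaging (rather than subsampling) keeps saturated-star and reflection features visible as bright blocks even after an 8x reduction.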

Subverting Privacy-Preserving GANs: Hiding Secrets in Sanitized Images Artificial Intelligence

Unprecedented data collection and sharing have exacerbated privacy concerns and led to increasing interest in privacy-preserving tools that remove sensitive attributes from images while maintaining information useful for other tasks. Currently, state-of-the-art approaches use privacy-preserving generative adversarial networks (PP-GANs) for this purpose, for instance to enable reliable facial expression recognition without leaking users' identity. However, PP-GANs do not offer formal proofs of privacy; instead, they rely on experimentally measuring information leakage, using the classification accuracy of deep learning (DL)-based discriminators on the sensitive attributes. In this work, we question the rigor of such checks by subverting existing privacy-preserving GANs for facial expression recognition. We show that it is possible to hide sensitive identification data in the sanitized output images of such PP-GANs for later extraction, which can even allow reconstruction of the entire input images, while still satisfying the privacy checks. We demonstrate our approach via a PP-GAN-based architecture and provide qualitative and quantitative evaluations using two public datasets. Our experimental results raise fundamental questions about the need for more rigorous privacy checks of PP-GANs, and we provide insights into the social impact of these findings.
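
The general idea of a hidden, later-extractable payload in an innocuous-looking image can be made concrete with classic least-significant-bit steganography. To be clear, this is not the paper's method (the paper embeds the secret through the generator itself); it is just a toy illustration of the threat model:

```python
# Toy illustration of hiding bits in an image via least-significant-bit
# (LSB) steganography. NOT the paper's method; it embeds its secret through
# the PP-GAN generator. This only makes "hidden, extractable payload" concrete.
import numpy as np

def embed(img, bits):
    """Overwrite the LSB of the first len(bits) pixels with the payload."""
    out = img.copy().ravel()
    out[:len(bits)] = (out[:len(bits)] & 0xFE) | bits
    return out.reshape(img.shape)

def extract(img, n):
    """Read back the first n hidden bits."""
    return img.ravel()[:n] & 1

img = np.random.default_rng(1).integers(0, 256, (8, 8), dtype=np.uint8)
payload = np.array([1, 0, 1, 1, 0, 1, 0, 0], dtype=np.uint8)
stego = embed(img, payload)
```

Each pixel changes by at most 1, so the stego image is visually indistinguishable from the original, which is exactly why accuracy-based privacy checks can miss such channels.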

Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes Artificial Intelligence

Finding objects is essential for almost any daily-life visual task. Saliency models have been useful for predicting fixation locations in natural images, but they are static, i.e., they provide no information about the time sequence of fixations. Nowadays, one of the biggest challenges in the field is to go beyond saliency maps and predict a sequence of fixations related to a visual task, such as searching for a given target. Bayesian observer models have been proposed for this task, as they represent visual search as an active sampling process. Nevertheless, they have mostly been evaluated on artificial images, and how they adapt to natural images remains largely unexplored. Here, we propose a unified Bayesian model for visual search guided by saliency maps as prior information. We validated our model with a visual search experiment in natural scenes, recording eye movements. We show that, although state-of-the-art saliency models perform well in predicting the first two fixations in a visual search task, their performance degrades to chance afterward. This suggests that saliency maps alone are good at modeling bottom-up first impressions but are not enough to explain scanpaths when top-down task information is critical. Thus, we propose to use them as priors for Bayesian searchers. This approach yields behavior very similar to humans' over the whole scanpath, both in the percentage of targets found as a function of fixation rank and in scanpath similarity, reproducing the entire sequence of eye movements.
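
The key idea, a saliency map acting as the prior over target locations, with each fixation's evidence updating the posterior, can be sketched in a few lines. This is a toy grid-world illustration of the general scheme, not the authors' actual model:

```python
# Toy sketch of the abstract's idea: a saliency map is the prior over
# candidate target locations, and each fixation's (noisy) evidence updates
# the posterior via Bayes' rule. Illustrative only, not the paper's model.
import numpy as np

def update_posterior(prior, likelihood):
    """Elementwise Bayes update over a grid of candidate locations."""
    post = prior * likelihood
    return post / post.sum()

rng = np.random.default_rng(0)
saliency = rng.random((8, 8))
prior = saliency / saliency.sum()      # saliency map as the prior
likelihood = np.ones((8, 8))
likelihood[3, 5] = 10.0                # a fixation yields evidence favoring (3, 5)
posterior = update_posterior(prior, likelihood)
next_fix = np.unravel_index(posterior.argmax(), posterior.shape)
```

In a full searcher this update repeats every fixation, with the likelihood derived from a foveated visibility model, so saliency dominates early fixations and accumulated evidence dominates later ones, matching the degradation-to-chance result above.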