A digital image is a representation of a real picture as a set of numbers that a computer can store and manage. But what is meant by "a set of numbers"? Each number, or small set of numbers, corresponds to a picture element, commonly known as a pixel. For each pixel, the imaging device records values describing some of its properties, such as its brightness, light intensity, or color, and these values are arranged in rows and columns that match the pixel's vertical and horizontal position in the picture.
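The "rows and columns of numbers" idea can be made concrete with a tiny sketch in NumPy (the values below are made up for illustration):

```python
import numpy as np

# A digital image is just numbers arranged in rows and columns.
# Here is a toy 4x4 grayscale "image": each entry is one pixel's
# brightness (0 = black, 255 = white), and its row/column index
# encodes its vertical/horizontal position in the picture.
image = np.array([
    [  0,  64, 128, 255],
    [ 32,  96, 160, 224],
    [ 64, 128, 192, 255],
    [  0,   0, 128, 128],
], dtype=np.uint8)

print(image.shape)   # (rows, columns) -> (4, 4)
print(image[0, 3])   # brightness of the pixel at row 0, column 3 -> 255

# A color image stores a small set of numbers per pixel, e.g. three
# channels (R, G, B), giving an array of shape (rows, columns, 3).
color = np.zeros((4, 4, 3), dtype=np.uint8)
color[0, 0] = (255, 0, 0)   # top-left pixel set to pure red
print(color.shape)          # (4, 4, 3)
```

Indexing a pixel is then just indexing the array by its row and column, which is exactly the "vertical and horizontal position" described above.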
Computer Vision, or CV, can be defined as a field of study that aims to develop techniques enabling computers to "see", and to understand the content of digital images such as photographs and videos. Images and text are all around us these days. Smartphones now have cameras that capture high-resolution images at a touch, and sharing photos and videos has never been easier, thanks to social media platforms like Instagram and Facebook. With messaging apps like WhatsApp and Telegram, staying connected keeps getting simpler by the day.
The advent of social media platforms has been a catalyst for the development of digital photography and has engendered a boom in vision applications. With this motivation, we introduce a large-scale dataset termed 'Photozilla', which includes over 990k images belonging to 10 different photographic styles. The dataset is then used to train 3 classification models to automatically classify images into the relevant style, which resulted in an accuracy of ~96%. With the rapid evolution of digital photography, we have seen new types of photography styles emerging at an exponential rate. On that account, we present a novel Siamese-based network that uses the trained classification models as the base architecture to adapt to and classify unseen styles with only 25 training samples. We report an accuracy of over 68% for identifying 10 other distinct types of photography styles. The dataset can be found at https://trisha025.github.io/Photozilla/
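The few-shot idea behind a Siamese-style classifier can be sketched, very roughly, as nearest-support matching over learned embeddings. The sketch below is an illustrative stand-in, not the paper's architecture: the style names ("macro", "aerial") and the random vectors standing in for a trained backbone's embeddings are entirely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_by_nearest_support(query_emb, support_embs, support_labels):
    """Siamese-style inference: compare the query embedding against every
    support-set embedding and return the label of the closest one."""
    dists = np.linalg.norm(support_embs - query_emb, axis=1)
    return support_labels[int(np.argmin(dists))]

# Two unseen styles with a few labeled support samples each (the paper
# uses 25 per style). Random vectors stand in for backbone embeddings;
# the two clusters are separated so the demo is well-posed.
support_embs = np.vstack([
    rng.normal(loc=0.0, size=(3, 8)),   # hypothetical style "macro"
    rng.normal(loc=5.0, size=(3, 8)),   # hypothetical style "aerial"
])
support_labels = ["macro"] * 3 + ["aerial"] * 3

query = rng.normal(loc=5.0, size=8)     # embedding near the "aerial" cluster
print(classify_by_nearest_support(query, support_embs, support_labels))
```

The appeal of this setup is that adding a new style requires only a handful of labeled examples for the support set, with no retraining of the embedding backbone.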
There are now two ways of creating digital images with a camera: a software-centric computational photography approach, or the traditional hardware-centric optical photography approach. The former uses AI to help enhance the final image; the latter relies on the quality of the camera's components. The two techniques may differ, but they are not at all on a collision course.
Wouldn't it be convenient if you could see who's outside your door before you even open it? With the Wireless IP 1081P Smart Video Camera Doorbell, you can get the security and peace of mind you're looking for. It connects to your home WiFi and transmits a real-time video feed to your phone, tablet, or another device of your choice. You'll receive an alert any time movement is detected, thanks to built-in motion sensors, along with a crisp, high-quality photo. It's even equipped with night vision, so you'll be able to see what's going on in the dark. And with its built-in dual speakers, you'll be able to talk with whoever is at the door as well.
The Bokeh Effect is one of the most desirable effects in photography for rendering artistic and aesthetic photos. Usually, generating this effect requires a DSLR camera with particular aperture and shutter settings, as well as certain photography skills. In smartphones, computational methods and additional sensors are used to overcome the physical lens and sensor limitations to achieve such an effect. Most existing methods utilize additional sensor data or a pretrained network for fine depth estimation of the scene, and sometimes use a pretrained portrait segmentation module to segment salient objects in the image. For these reasons, such networks have many parameters, are runtime-intensive, and are unable to run on mid-range devices. In this paper, we use an end-to-end Deep Multi-Scale Hierarchical Network (DMSHN) model for direct Bokeh effect rendering of images captured from a monocular camera. To further improve the perceptual quality of the effect, a stacked model consisting of two DMSHN modules is also proposed. Our model does not rely on any pretrained network module for monocular depth estimation or saliency detection, thus significantly reducing the model size and runtime. Stacked DMSHN achieves state-of-the-art results on the large-scale EBB! dataset with around 6x less runtime than the current state-of-the-art model when processing HD-quality images.
Digital watermarking has been widely used to protect the copyright and integrity of multimedia data. Previous studies mainly focus on designing watermarking techniques that are robust to attacks that destroy the embedded watermarks. However, emerging deep learning based image generation technology raises a new open issue: whether it is possible to generate fake watermarked images for circumvention. In this paper, we make the first attempt to develop digital image watermark fakers using generative adversarial learning. Assuming that a set of paired original and watermarked images generated by the targeted watermarker is available, we use it to train a watermark faker with U-Net as the backbone: its input is an original image, and, after a domain-specific preprocessing, it outputs a fake watermarked image. Our experiments show that the proposed watermark faker can effectively crack digital image watermarkers in both the spatial and frequency domains, suggesting the risk of such forgery attacks.
In today's post, we are going to quickly find out why we should prefer raw images to the more frequently used JPEG images when training neural nets. But before we jump into why, let's first quickly revisit what JPEG images are and how they are generated. According to Wikipedia, JPEG or JPG is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. Essentially, what this means is that while you are taking a photograph with your camera (say, a mobile phone camera), the camera's post-processor automatically does some processing to "compress" the image, reducing its file size or memory footprint without any perceivable quality degradation. At the heart of this compression are a Discrete Cosine Transform (DCT) stage followed by a Quantization stage, and it is these two stages that ultimately affect or manipulate the image's intensity values.
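To see where the information loss actually happens, here is a much-simplified sketch of the DCT + quantization step on a single 8x8 block. The flat quantization table below is a toy stand-in (real JPEG tables vary per frequency and per quality setting), but the mechanism, transform, divide, round, is the same:

```python
import numpy as np

def dct2_matrix(n=8):
    """Orthonormal DCT-II basis matrix, the transform JPEG applies to
    each 8x8 block of (level-shifted) pixel values."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2 / n)

# One 8x8 block of a smooth horizontal gradient, level-shifted by -128
# as JPEG does before the transform.
block = np.tile(np.arange(0, 64, 8, dtype=float), (8, 1)) - 128

D = dct2_matrix()
coeffs = D @ block @ D.T          # 2-D DCT: frequency-domain coefficients

# Quantization: divide by a table and round. This rounding is the lossy
# part -- small high-frequency coefficients collapse to zero.
q_table = np.full((8, 8), 16.0)   # toy table; real tables are not flat
quantized = np.round(coeffs / q_table)

print(np.count_nonzero(quantized), "of 64 coefficients survive")

# Decoding: dequantize and inverse DCT. The result is close to, but no
# longer exactly equal to, the original intensity values.
reconstructed = D.T @ (quantized * q_table) @ D
print("max reconstruction error:", np.max(np.abs(reconstructed - block)))
```

This is exactly why the post argues for raw images: the rounding step permanently alters the intensity values, so a network trained on JPEGs never sees what the sensor actually recorded.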
Super-resolution is a fundamental problem in computer vision which aims to overcome the spatial limitation of camera sensors. While significant progress has been made in single image super-resolution, most algorithms only perform well on synthetic data, which limits their applications in real scenarios. In this paper, we study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images. We focus on two issues of existing super-resolution algorithms: lack of realistic training data and insufficient utilization of visual information obtained from cameras. To address the first issue, we propose a method to generate more realistic training data by mimicking the imaging process of digital cameras. For the second issue, we develop a two-branch convolutional neural network to exploit the radiance information originally recorded in raw images. In addition, we propose a dense channel-attention block for better image restoration as well as a learning-based guided filter network for effective color correction. Our model is able to generalize to different cameras without deliberately training on images from specific camera types. Extensive experiments demonstrate that the proposed algorithm can recover fine details and clear structures, and achieve high-quality results for single image super-resolution in real scenes.
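The idea of generating training pairs by mimicking the camera's imaging process can be illustrated with a deliberately simplified degradation pipeline: blur, downsample, add sensor-like noise. This is only a sketch of the general idea; the paper's actual pipeline models the camera far more faithfully (raw data, ISP steps, etc.), and the function and parameter names here are made up.

```python
import numpy as np

def degrade(hr, scale=2, blur_sigma=1.0, noise_std=2.0, seed=0):
    """Toy synthesis of a low-resolution training input from a
    high-resolution image: Gaussian blur -> downsample -> noise."""
    rng = np.random.default_rng(seed)
    # Build a 1-D Gaussian kernel and apply it separably (rows, then cols).
    radius = int(3 * blur_sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * blur_sigma**2))
    k /= k.sum()
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, hr)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, blurred)
    lr = blurred[::scale, ::scale]                  # spatial downsampling
    lr = lr + rng.normal(0, noise_std, lr.shape)    # additive sensor-like noise
    return np.clip(lr, 0, 255)

hr = np.tile(np.linspace(0, 255, 64), (64, 1))      # synthetic HR gradient image
lr = degrade(hr)
print(hr.shape, "->", lr.shape)                     # (64, 64) -> (32, 32)
```

The (lr, hr) pair then serves as one training example; the gap the paper targets is precisely that such hand-crafted degradations are less realistic than modeling the actual camera.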
COURSE DESCRIPTION: The aim of this advanced undergraduate course is to introduce students to computing with visual data (images and video). We will cover acquisition, representation, and manipulation of visual information from digital photographs (image processing), image analysis and visual understanding (computer vision), and image synthesis (computational photography). Key algorithms will be presented, ranging from classical techniques to modern deep learning methods (e.g. ConvNets, GANs), with an emphasis on using these techniques to build practical systems. This hands-on emphasis will be reflected in the programming assignments, in which students will have the opportunity to acquire their own images and develop, largely from scratch, the image analysis and synthesis tools for solving applications.