
LET'S RECREATE OLD SWEET MEMORIES THROUGH DEEP LATENT SPACE TRANSLATION

Created 2 years ago
CHITHIRACH (@CHITHIRACH)

Deep Latent Space Translation

Restoring old photographs is special to us because it recreates sweet old memories. Even as time elapses, one can still evoke the past by viewing them. Deep latent space translation is a powerful technique for transforming images from one domain to another, and it can restore old photos that suffer from severe degradation through a deep learning approach. The approach uses deep neural networks to map the input image to a lower-dimensional latent space and then manipulates the image in that latent space to generate the desired output image. The mapping between the input and the latent space is learned during training using a dataset of paired input and output images. The network consists of two parts: an encoder network that maps the input image to the latent space, and a decoder network that maps the latent space to the output image. The "latent space" is a lower-dimensional space learned by the network during training; it can be thought of as a compressed representation of the input image that captures its essential features.
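As a rough illustration, the encoder/decoder idea can be sketched in a few lines of PyTorch. This is a minimal toy sketch, not the paper's actual architecture; layer sizes and names are illustrative assumptions.

```python
# Toy encoder/decoder sketch: the encoder compresses an image into a compact
# latent code; the decoder generates an image back from that latent code.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, latent_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),  # H/2
            nn.Conv2d(32, latent_ch, 4, stride=2, padding=1),     # H/4
        )
    def forward(self, x):
        return self.net(x)  # compact latent representation

class Decoder(nn.Module):
    def __init__(self, latent_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(z)

enc, dec = Encoder(), Decoder()
img = torch.rand(1, 3, 64, 64)   # stand-in "old photo"
z = enc(img)                     # latent code, spatially 4x smaller: (1, 64, 16, 16)
restored = dec(z)                # image generated from the latent: (1, 3, 64, 64)
```

Manipulating `z` before decoding (here it is passed through unchanged) is what lets the network generate varied outputs that share the input's underlying structure.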

By manipulating the image in the latent space, the network can generate a range of outputs that share the same underlying structure as the input image. Deep latent space translation can be used for a variety of tasks, including old photo restoration, style transfer, and image-to-image translation. It produces high-quality restored images with a level of detail that is often difficult to achieve through manual restoration, because deep learning algorithms can learn and restore minute details that may not be visible to the human eye. It can be used to restore a wide range of old photographs, including those that are faded, creased, torn, or otherwise damaged.

Fig. 1: Old photo restoration results. For each image pair, the left image is the input while the retouched output is shown on the right.

Main challenges of restoration of old photos:

1. Generalization issue: old photos contain far more complex degradation that is hard to model realistically, and there always exists a domain gap between synthetic and real photos. As such, a network trained purely on synthetic data usually cannot generalize well to real photos.

2. Mixed degradation issue: the defects of old photos are a compound of multiple degradations, essentially requiring different strategies for restoration. Unstructured defects such as film noise, blurriness, and colour fading can be restored with spatially homogeneous filters by making use of surrounding pixels within the local patch; structured defects such as scratches and blotches, on the other hand, should be inpainted by considering the global context to ensure structural consistency.
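The two defect families can be illustrated with a small NumPy sketch. This is a hedged toy example of my own, not the paper's degradation model: unstructured degradation is applied everywhere, while a structured defect is localized and recorded in a binary mask for later inpainting.

```python
# Toy mixed-degradation synthesis: film grain + colour fade (unstructured),
# plus a synthetic scratch with its binary mask (structured).
import numpy as np

rng = np.random.default_rng(0)
clean = rng.random((64, 64, 3))                      # stand-in clean photo

# Unstructured: spatially homogeneous noise and a global contrast fade.
noisy = clean + rng.normal(0.0, 0.05, clean.shape)   # film grain
faded = 0.6 * noisy + 0.4 * noisy.mean()             # colour/contrast fading

# Structured: a vertical "scratch", with the mask that marks where it is.
mask = np.zeros((64, 64), dtype=bool)
mask[:, 30:33] = True                                # 3-pixel-wide scratch
degraded = faded.copy()
degraded[mask] = 1.0                                 # scratch pixels blown out
```

The mask is exactly what a later inpainting stage needs: scratch pixels carry no usable information, so they must be filled from the surrounding context rather than filtered locally.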

Different steps of restoring images through deep latent space translation

Restoration via latent space translation:

Clean images and old photos are treated as images from distinct domains. Images are translated across three domains: the real photo domain R, the synthetic domain X where images suffer from artificial degradation, and the corresponding ground-truth domain Y comprising images without degradation. Such triplet domain translation is crucial in this task, as it leverages the unlabelled real photos as well as a large amount of synthetic data associated with ground truth.

Fig. 2: Illustration of the translation method with three domains. The domain gap between ZX and ZR is reduced in the shared latent space.

Architecture of the restoration network:

Fig. 3: Architecture of the restoration network.

(1.) First, two VAEs (variational autoencoders) are trained: VAE1 for real photos r ∈ R and synthetic images x ∈ X, with their domain gap closed by jointly training an adversarial discriminator; VAE2 for clean images y ∈ Y. With the VAEs, images are transformed into a compact latent space.

(2.) Then, we learn the mapping that restores the corrupted images to clean ones in the latent space.
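The first stage can be sketched with a minimal VAE in PyTorch. The layer sizes are illustrative assumptions, and the adversarial discriminator that aligns ZR and ZX is omitted for brevity; both VAE1 and VAE2 would share this structure.

```python
# Minimal VAE sketch: the encoder predicts mean and log-variance, a latent
# code is sampled via the reparameterization trick, and the decoder
# reconstructs the input from that code.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, dim=64, latent=16):
        super().__init__()
        self.enc = nn.Linear(dim, 2 * latent)   # outputs mean and log-variance
        self.dec = nn.Linear(latent, dim)
    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

vae1 = TinyVAE()   # would be trained on real + synthetic degraded images
vae2 = TinyVAE()   # would be trained on clean ground-truth images
x = torch.rand(8, 64)
recon, mu, logvar = vae1(x)
# KL term pulling the latent distribution toward a standard normal prior:
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
```

Training would minimize a reconstruction loss plus this KL term per VAE, with the adversarial loss additionally closing the gap between the two latent distributions of VAE1.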

Restoration through latent mapping: With the latent codes captured by the VAEs, in the second stage we leverage the synthetic image pairs {x, y} and learn the image restoration by mapping between their latent spaces. The benefit of latent restoration is threefold. First, as R and X are aligned into the same latent space, the mapping from ZX to ZY will also generalize well to restoring images in R. Second, the mapping in a compact, low-dimensional latent space is in principle much easier to learn than in the high-dimensional image space. In addition, since the two VAEs are trained independently, the reconstructions of the two streams do not interfere with each other: the generator GY can always produce an absolutely clean image without degradation given the latent code ZY mapped from ZX, whereas degradations would likely remain if we learned the translation at the pixel level.
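The second stage, mapping the degraded latent toward the clean latent, can be sketched as a small residual network in PyTorch. The block count, channel width, and L1 training loss here are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of the latent mapping T: residual blocks translate a degraded
# latent z_x toward the clean latent z_y in the compact latent space.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, z):
        return z + self.body(z)   # residual: learn only the correction

mapping = nn.Sequential(*[ResBlock() for _ in range(3)])
z_x = torch.rand(1, 64, 16, 16)   # latent code of a degraded image
z_y_hat = mapping(z_x)            # predicted clean latent code
# Training would minimize a distance to the true clean latent, e.g. L1:
z_y = torch.rand(1, 64, 16, 16)
loss = (z_y_hat - z_y).abs().mean()
```

Because the mapping operates on small latent tensors rather than full-resolution images, it is cheap, and any mapped code decoded by GY stays on the clean-image manifold.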

Multiple degradation restoration:

Latent restoration using residual blocks alone concentrates on local features, due to the limited receptive field of each layer. The restoration of structured defects, however, requires plausible inpainting, which must consider long-range dependencies to ensure global structural consistency. Since legacy photos often contain mixed degradations, the restoration network has to support both mechanisms simultaneously. Toward this goal, the latent restoration network is enhanced with a global branch, as shown in Figure 3, which comprises a nonlocal block that considers the global context, followed by several residual blocks. Whereas the original nonlocal block is unaware of the corruption area, this block explicitly utilizes the mask input so that pixels in the corrupted region are not used for completing those areas. Since the context considered is only a part of the feature map, the module specifically designed for latent inpainting is referred to as a partial nonlocal block.
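The core of the partial nonlocal idea can be sketched as masked attention over the feature map. This is a simplified illustration of my own (the paper's block also includes learned embeddings and fusion with the local branch): affinities are computed between all positions, but corrupted positions are excluded as keys, so holes are completed only from intact context.

```python
# Hedged sketch of a partial nonlocal block: attention that never
# attends to corrupted positions (mask == 1).
import torch

def partial_nonlocal(feat, mask):
    """feat: (B, C, H, W) latent features; mask: (B, 1, H, W), 1 = corrupted."""
    b, c, h, w = feat.shape
    q = feat.flatten(2).transpose(1, 2)              # (B, HW, C) queries
    k = feat.flatten(2)                              # (B, C, HW) keys
    attn = q @ k / c ** 0.5                          # (B, HW, HW) affinities
    corrupt = mask.flatten(2).bool()                 # (B, 1, HW) key mask
    attn = attn.masked_fill(corrupt, float('-inf'))  # exclude hole pixels as keys
    attn = attn.softmax(dim=-1)
    out = (attn @ q).transpose(1, 2).reshape(b, c, h, w)
    return out

feat = torch.rand(1, 8, 4, 4)
mask = torch.zeros(1, 1, 4, 4)
mask[..., 1, 1] = 1                                  # one corrupted position
out = partial_nonlocal(feat, mask)                   # same shape as feat
```

Every output position, including the hole itself, is now a weighted combination of uncorrupted context pixels only, which is exactly the behaviour needed for globally consistent inpainting.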

Fig. 4: Partial nonlocal block. The left shows the principle: pixels within the hole areas are inpainted from the context pixels outside the corrupted region. The right shows the detailed implementation.

Defect Region Detection:

Since the global branch of the restoration network requires a mask m as guidance, a scratch detection network is trained in a supervised way to produce the mask automatically, using a mixture of a real scratched dataset and a synthetic dataset. Specifically, let {si, yi | si ∈ S, yi ∈ Y} denote the training pairs, where si and yi are the scratched image and the corresponding binary scratch mask respectively. We use the cross-entropy loss to minimize the difference between the predicted mask ŷi and yi:

L(ŷi, yi) = −Σ [ αi · yi · log ŷi + (1 − αi) · (1 − yi) · log(1 − ŷi) ]

Since the scratch regions are often a small portion of the whole image, a weight αi is used to remedy the imbalance between positive and negative pixel samples. To determine the value of αi, we compute the positive/negative proportion of yi on the fly. Combining synthetic structured degradations with a small amount of labelled real data, the scratch detection network achieves good results.
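One common way to realize this class-balanced loss is sketched below in NumPy. Setting αi to the fraction of negative pixels is an assumption on my part that matches the stated intent (up-weighting the rare scratch pixels); it is not necessarily the paper's exact formula.

```python
# Class-balanced binary cross-entropy: alpha is computed per image from the
# scratch-pixel proportion so the rare positive class is up-weighted.
import numpy as np

def weighted_bce(y_pred, y_true, eps=1e-7):
    """y_pred: predicted scratch probabilities; y_true: binary scratch mask."""
    pos_frac = y_true.mean()               # proportion of scratch pixels, on the fly
    alpha = 1.0 - pos_frac                 # rare positives get the large weight
    y_pred = np.clip(y_pred, eps, 1 - eps) # avoid log(0)
    loss = -(alpha * y_true * np.log(y_pred)
             + (1 - alpha) * (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

y_true = np.zeros((32, 32))
y_true[:, 10] = 1                          # thin scratch: ~3% positive pixels
y_pred = np.full((32, 32), 0.1)            # uniform low scratch probability
loss = weighted_bce(y_pred, y_true)
```

Without the weighting, a network could minimize the loss simply by predicting "no scratch" everywhere; the per-image αi prevents that degenerate solution.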

Face Enhancement:

Given a real degraded photo r, we hope to reconstruct the degraded faces rf in r into a more detailed and cleaner version with the proposed face enhancement network Gf. Classical pixel-wise translation methods cannot solve such a blind restoration problem well, because the degradation prior is totally unknown. Here, the problem is solved from the perspective of generative models.

Fig. 5: The progressive generator network for face enhancement. Starting from a latent vector z, the network progressively up-samples the feature map by deconvolution. The degraded face is injected at different resolutions in a spatially conditioned manner.

The face enhancement network is jointly trained with the previous restoration network to ensure better generalization ability, i.e., rf is the output of the triplet domain translation network. We found that such a training scheme effectively suppresses generated artifacts. First, the face regions of arbitrary photos are located, and then these regions are refined with the proposed enhancement network. Because a generative model is used, colour shifting sometimes exists between the reconstructed faces and the input degraded faces. We solve this issue by histogram matching. Finally, the reconstructed face is combined with the original input photo using linear blending to produce the final result.
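The final compositing steps, histogram matching followed by linear blending, can be sketched in NumPy. This is a single-channel toy version of these standard operations (the blending weight and region sizes are illustrative assumptions), not the paper's exact implementation.

```python
# Histogram matching to fix colour shift, then linear blending of the
# enhanced face region back into the original photo.
import numpy as np

def match_histogram(source, reference):
    """Remap source values so their distribution follows the reference (one channel)."""
    s_vals, s_idx, s_cnt = np.unique(source.ravel(),
                                     return_inverse=True, return_counts=True)
    r_vals, r_cnt = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_cnt) / source.size
    r_cdf = np.cumsum(r_cnt) / reference.size
    matched = np.interp(s_cdf, r_cdf, r_vals)  # map source CDF onto reference values
    return matched[s_idx].reshape(source.shape)

def blend(photo, face, mask, alpha=0.8):
    """Linear blend of the enhanced face into the photo within the face mask."""
    return np.where(mask, alpha * face + (1 - alpha) * photo, photo)

rng = np.random.default_rng(0)
photo = rng.random((16, 16))
face = rng.random((16, 16)) * 0.5 + 0.5        # enhanced face with colour shift
mask = np.zeros((16, 16), dtype=bool)
mask[4:12, 4:12] = True                        # located face region
result = blend(photo, match_histogram(face, photo), mask)
```

Outside the face mask the photo is untouched; inside it, the colour-corrected face is blended in smoothly instead of being pasted with a hard seam.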

CONCLUSION:

Let me conclude by stating that deep latent space translation has proved to be an effective, state-of-the-art technique for restoring old photographs, letting us recreate our old memories.

REFERENCE:

[1] Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen, "Old Photo Restoration via Deep Latent Space Translation".
