We developed an automated video colorization framework that minimizes color flickering across frames. When image colorization techniques are applied to the successive frames of a video, each frame is treated as an independent colorization task, so the colors of a scene are not necessarily kept consistent from one frame to the next.
The proposed solution is a novel deep recurrent encoder-decoder architecture that maintains temporal and contextual coherence between consecutive frames of a video. A high-level semantic feature extractor automatically identifies the context of a scene, and a custom fusion layer combines the spatial and temporal features of the frame sequence.
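To make the idea concrete, the following is a minimal PyTorch sketch of this kind of recurrent encoder-decoder: a per-frame spatial encoder, a global semantic descriptor, a fusion layer that merges the two, a convolutional recurrent state carried across frames, and a decoder that predicts the ab chrominance channels. The layer shapes, the simplified recurrent cell, and the RecurrentColorizer class are illustrative assumptions, not the exact architecture used in this work.

import torch
import torch.nn as nn

class RecurrentColorizer(nn.Module):
    """Illustrative sketch: per-frame encoder-decoder with a convolutional
    recurrent state and a fusion layer merging spatial features with a
    global semantic vector (channel sizes are assumptions)."""

    def __init__(self, feat_ch=64, sem_dim=128, hidden_ch=64):
        super().__init__()
        # Spatial encoder: grayscale (L) frame -> downsampled feature map
        self.encoder = nn.Sequential(
            nn.Conv2d(1, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # High-level semantic extractor: globally pooled descriptor of the frame
        self.semantic = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_ch, sem_dim), nn.ReLU(inplace=True),
        )
        # Fusion layer: tile the semantic vector over the spatial grid and merge
        self.fuse = nn.Conv2d(feat_ch + sem_dim, hidden_ch, kernel_size=1)
        # Simple convolutional recurrent cell: new state from fused features + old state
        self.recurrent = nn.Conv2d(hidden_ch * 2, hidden_ch, kernel_size=3, padding=1)
        # Decoder: upsample the temporal state back to frame resolution, predict ab
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(hidden_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),  # ab chrominance in [-1, 1]
        )

    def forward(self, frames):  # frames: (batch, time, 1, H, W) grayscale sequence
        b, t, _, h, w = frames.shape
        state = None
        outputs = []
        for i in range(t):
            feats = self.encoder(frames[:, i])        # (b, feat_ch, H/4, W/4)
            sem = self.semantic(feats)                # (b, sem_dim)
            sem_map = sem[:, :, None, None].expand(-1, -1, feats.size(2), feats.size(3))
            fused = torch.relu(self.fuse(torch.cat([feats, sem_map], dim=1)))
            if state is None:
                state = torch.zeros_like(fused)
            state = torch.tanh(self.recurrent(torch.cat([fused, state], dim=1)))
            outputs.append(self.decoder(state))       # (b, 2, H, W) ab channels
        return torch.stack(outputs, dim=1)            # (b, t, 2, H, W)

# Example: colorize an 8-frame 64x64 grayscale clip
model = RecurrentColorizer()
clip = torch.rand(1, 8, 1, 64, 64)
ab = model(clip)
print(ab.shape)  # torch.Size([1, 8, 2, 64, 64])

Because the recurrent state is carried from frame to frame, each prediction is conditioned on the preceding frames, which is what discourages the per-frame color flicker described above.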
Additionally, we developed a dataset and an accompanying benchmark for video colorization.