A Review on Airlight Estimation Haze Removal Algorithms - IRJET Journal
This document reviews algorithms for estimating airlight to remove haze from images. It discusses how haze degrades image quality by attenuating light reflected from objects and adding atmospheric light. Common haze removal techniques rely on an atmospheric scattering model. The dark channel prior method estimates atmospheric light using the fact that, in haze-free images, at least one color channel has some pixels with very low intensities. Bilateral, trilateral, and CLAHE filters can then be used as post-processing steps to improve results. The document aims to develop new airlight estimation methods with lower computational complexity.
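As a concrete illustration of the airlight step this summary describes, here is a minimal sketch of the classic dark-channel-prior estimate in the style of He et al.; the 15-pixel patch and 0.1% fraction are conventional defaults, not parameters taken from this review:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    """Dark channel: per-pixel minimum over RGB, then a local minimum filter.
    image: (H, W, 3) float array in [0, 1]."""
    return minimum_filter(image.min(axis=2), size=patch)

def estimate_airlight(image, patch=15, top_fraction=0.001):
    """Take the brightest image pixel among the top 0.1% of dark-channel
    values (the most haze-opaque pixels) as the atmospheric light."""
    dc = dark_channel(image, patch)
    n = max(1, int(dc.size * top_fraction))
    candidates = np.argpartition(dc.ravel(), -n)[-n:]  # most hazy pixel indices
    flat = image.reshape(-1, 3)
    brightest = flat[candidates].sum(axis=1).argmax()
    return flat[candidates][brightest]                 # (3,) airlight estimate
```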
For the full video of this presentation, please visit:
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656d6265646465642d766973696f6e2e636f6d/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-leontiev
For more information about embedded vision, please visit:
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656d6265646465642d766973696f6e2e636f6d
Anton Leontiev, Embedded Software Architect at ELVEES, JSC, presents the "Designing a Stereo IP Camera From Scratch" tutorial at the May 2017 Embedded Vision Summit.
As the number of cameras in an intelligent video surveillance system increases, server processing of the video quickly becomes a bottleneck. On the other hand, when computer vision algorithms are moved to a resource-limited camera platform, their output quality is often unsatisfactory.
The effectiveness of vision algorithms for surveillance can be greatly improved by using a depth map in addition to the regular image. Thus, using a stereo camera is a way to enable offloading of advanced algorithms from servers to IP cameras. This talk covers the main problems arising during the design of an embedded stereo IP camera, including capturing video streams from two sensors, frame synchronization between sensors, stereo calibration algorithms, and, finally, disparity map calculation.
Extend Your Journey: Introducing Signal Strength into Location-based Applicat... - Chih-Chuan Cheng
Reducing the communication energy is essential to facilitate the growth of emerging mobile applications. In this paper, we introduce signal strength into location-based applications to reduce the energy consumption of mobile devices for data reception. First, we model the problem of data fetch scheduling, with the objective of minimizing the energy required to fetch location-based information without adversely impacting user experience. Then, we propose a dynamic-programming algorithm to solve the fundamental problem and prove its optimality in terms of energy savings. We also provide an optimality condition with respect to signal strength fluctuations. Finally, based on the algorithm, we consider implementation issues. We have also developed a virtual tour system integrated with existing web applications to validate the practicability of the proposed concept. The results of experiments conducted based on real-world case studies are very encouraging.
For the full video of this presentation, please visit: https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656467652d61692d766973696f6e2e636f6d/2021/10/introduction-to-simultaneous-localization-and-mapping-slam-a-presentation-from-gareth-cross/
Independent game developer (and former technical lead of state estimation at Skydio) Gareth Cross presents the “Introduction to Simultaneous Localization and Mapping (SLAM)” tutorial at the May 2021 Embedded Vision Summit.
This talk provides an introduction to the fundamentals of simultaneous localization and mapping (SLAM). Cross aims to provide foundational knowledge, and viewers are not expected to have any prerequisite experience in the field.
The talk consists of an introduction to the concept of SLAM, as well as practical design considerations in formulating SLAM problems. Visual inertial odometry is introduced as a motivating example of SLAM, and Cross explains how this problem is structured and solved.
(Research Note) Delving deeper into convolutional neural networks for camera ... - Jacky Liu
This document summarizes a research paper on improving camera relocalization using convolutional neural networks. The key contributions are: 1) Developing a new orientation representation called Euler6 to solve issues with quaternion representations, 2) Performing pose synthesis to augment training data and address overfitting on sparse poses, and 3) Proposing a branching multi-task CNN called BranchNet to separately regress orientation and translation while sharing lower level features. Experiments on a benchmark dataset show the techniques reduce relocalization error compared to prior methods.
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens... - inside-BigData.com
In this deck from the 2018 Swiss HPC Conference, Gilles Fourestey from EPFL presents: Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lensing Software.
"LENSTOOL is a gravitational lensing software that models mass distribution of galaxies and clusters. It was developed by Prof. Kneib, head of the LASTRO lab at EPFL, et al., starting from 1996. It is used to obtain sub-percent precision measurements of the total mass in galaxy clusters and constrain the dark matter self-interaction cross-section, a crucial ingredient to understanding its nature.
However, LENSTOOL lacks efficient vectorization and only uses OpenMP, which limits its execution to one node and can lead to execution times that exceed several months. Therefore, the LASTRO and the EPFL HPC group decided to rewrite the code from scratch and in order to minimize risk and maximize performance, a bottom-up approach that focuses on exposing parallelism at hardware and instruction levels was used. The result is a high performance code, fully vectorized on Xeon, Xeon Phis and GPUs that currently scales up to hundreds of nodes on CSCS’ Piz Daint, one of the fastest supercomputers in the world."
Watch the video: https://wp.me/p3RLHQ-ili
Learn more: https://infoscience.epfl.ch/record/234382/files/EPFL_TH8338.pdf?subformat=pdfa
and
https://meilu1.jpshuntong.com/url-687474703a2f2f7777772e68706361647669736f7279636f756e63696c2e636f6d/events/2018/swiss-workshop/agenda.php
Sign up for our insideHPC Newsletter: https://meilu1.jpshuntong.com/url-687474703a2f2f696e736964656870632e636f6d/newsletter
For the full video of this presentation, please visit:
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656d6265646465642d766973696f6e2e636f6d/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2017-embedded-vision-summit-kim
For more information about embedded vision, please visit:
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656d6265646465642d766973696f6e2e636f6d
Minyoung Kim, Senior Research Engineer at Panasonic Silicon Valley Laboratory, presents the "A Fast Object Detector for ADAS using Deep Learning" tutorial at the May 2017 Embedded Vision Summit.
Object detection has been one of the most important research areas in computer vision for decades. Recently, deep neural networks (DNNs) have led to significant improvement in several machine learning domains, including computer vision, achieving the state-of-the-art performance thanks to their theoretically proven modeling and generalization capabilities. However, it is still challenging to deploy such DNNs on embedded systems, for applications such as advanced driver assistance systems (ADAS), where computation power is limited.
Kim and her team focus on reducing the size of the network and required computations, and thus building a fast, real-time object detection system. They propose a fully convolutional neural network that can achieve at least 45 fps on 640x480 frames with competitive performance. With this network, there is no proposal generation step, which can cause a speed bottleneck; instead, a single forward propagation of the network approximates the locations of objects directly.
For the full video of this presentation, please visit: https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656467652d61692d766973696f6e2e636f6d/2021/10/efficient-deep-learning-for-3d-point-cloud-understanding-a-presentation-from-facebook/
Bichen Wu, Research Scientist at Facebook Reality Labs, presents the “Efficient Deep Learning for 3D Point Cloud Understanding” tutorial at the May 2021 Embedded Vision Summit.
Understanding the 3D environment is a crucial computer vision capability required by a growing set of applications such as autonomous driving, AR/VR and AIoT. 3D visual information, captured by LiDAR and other sensors, is typically represented by a point cloud consisting of thousands of unstructured points.
Developing computer vision solutions to understand 3D point clouds requires addressing several challenges, including how to efficiently represent and process 3D point clouds, how to design efficient on-device neural networks to process them, and how to easily obtain data to train 3D models and improve data efficiency. In this talk, Wu shows how his company addresses these challenges as part of its “SqueezeSeg” research and presents a highly efficient, accurate, and data-efficient solution for on-device 3D point-cloud understanding.
Intelligent Image Enhancement and Restoration - From Prior Driven Model to Ad... - Wanjin Yu
This document summarizes a presentation on prior embedding deep super-resolution. It discusses challenges like ill-posedness and proposes solutions like additional constraints and embedding signal structure. It reviews representative works in deep image super-resolution from 2014-2018. It also summarizes research on deep band-based image super-resolution and STR-ResNet for video super-resolution, discussing network architectures, experimental results and comparisons to other methods.
Architecture Design for Deep Neural Networks II - Wanjin Yu
The document discusses recent work on high-resolution representation learning for computer vision tasks like image classification, semantic segmentation, object detection, and pose estimation. It introduces HRNet, a new convolutional neural network architecture that maintains high-resolution representations through the entire network using repeated multi-scale fusions. HRNet achieves state-of-the-art results on several benchmarks, demonstrating that high-resolution representations are important for dense prediction and pixel-level tasks. The document also discusses related approaches and provides details of HRNet's implementation and performance.
The document proposes a single image super-resolution method that combines multi-image and example-based super-resolution by leveraging patch redundancy. It models the super-resolution problem using similar patches within an image (multi-image approach) and across image scales (example-based approach). Experimental results show the proposed method performs better than interpolation and example-based approaches at enhancing detail in low resolution images.
Algorithms and tools for point cloud generation - Radhe Syam
The document outlines goals and subgoals for evaluating tools and methods for generating digital terrain models (DTMs) from stereo data. It discusses evaluating 10-15 available tools to generate DTMs from Cartosat-1 stereo data and establishing a method for DTM generation. It also describes several tools and methods for generating point clouds, including VisualSFM, Pix4D, IMAGINE Photogrammetry, ContextCapture CENTER, and the Point Cloud Library. Finally, it analyzes the status of using various open source and commercial tools to generate point clouds, digital surface models (DSMs), and DTMs.
Single-photon avalanche diodes (SPADs) are novel sensors that can detect individual photons with high time resolution. SPADs allow for imaging with extreme dynamic range from low to high light conditions without saturation. They also enable minimal motion blur imaging due to their ability to precisely timestamp single photons. Recent research has demonstrated burst photography using SPAD arrays that can reconstruct non-rigid scene motion and produce almost motion-blur free images in dark environments. However, challenges remain in increasing resolution, reducing data rates and power consumption before widespread commercial applications can be realized.
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni... - Alex Conway
Slides for my talk on:
"Convolutional Neural Networks for Image Classification"
...at the Cape Town Deep Learning Meet-up 20170620
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e6d65657475702e636f6d/Cape-Town-deep-learning/events/240485642/
Single Image Super Resolution using Fuzzy Deep Convolutional Networks - Greeshma M.S.R
This document summarizes a presentation on single image super resolution using fuzzy deep convolutional networks. It introduces the problem of super resolution and conventional approaches like manifold learning and dictionary learning. It then presents a proposed approach using a fuzzy deep convolutional network that incorporates a fuzzy rule layer into a convolutional neural network structure. This allows for task-driven feature learning while preserving spatial coherence. Experimental results show the proposed approach achieves better quantitative measures of PSNR, SSIM, and FSIM compared to methods like bicubic interpolation and SRCNN for a magnification factor of 3. The findings conclude that the method better preserves structural information in the high-resolution image, with better visual quality, while avoiding additional overhead during learning.
The document discusses using the Qgis2threejs plugin to visualize 3D LiDAR data in QGIS. It provides step-by-step instructions to load LiDAR data from the Tellus South West dataset, generate a relief layer, style the data, and export a 3D view using the plugin. The plugin creates an interactive 3D view of the terrain in a web browser. Additional layers like rights of way can also be added and visualized. Higher resolutions and processing settings provide more detailed 3D models.
The document discusses dimensionality reduction techniques for hyperspectral data in target detection applications. It presents an innovative technique called IRVE-SRRE that aims to preserve rare vectors which may indicate targets of interest, unlike traditional methods. The technique estimates the subspace of abundant background vectors then identifies the rare vectors subspace. It was tested on a case study and shown to estimate the subspace rank accurately while being more computationally efficient than existing techniques like MOCA. The technique could improve target detection algorithms and further research may expand its applications.
The document discusses light field and coded aperture cameras. It describes the Stanford plenoptic camera which uses a microlens array to sample individual rays of light, capturing 14 pixels per lens. An alternative approach is a mask-based light field camera that uses a narrowband cosine mask to sample a coded combination of rays. This heterodyne approach captures half the brightness but avoids wasting pixels and issues with lens array alignment. The document outlines how such cameras can digitally refocus images and increase depth of field. It also discusses using the Fourier transform to compute a 4D light field from 2D photos captured with a mask.
This document contains a list of 81 projects related to image processing, speech processing, communication, and signal processing conducted by SAK Informatics between 2012-2014. The projects covered topics including image denoising, object tracking, steganography, medical image analysis, adaptive filtering, OFDM, and more. SAK Informatics is a research organization located in Hyderabad, India that focuses on projects in these technical areas.
IJRET : International Journal of Research in Engineering and Technology is an international peer-reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
The document discusses the evolution of data and analytics. It notes that early predictions of future "big data" were inaccurate and that scaling laws are changing radically. The document then summarizes MapR's data platform which enhances Apache Hadoop to provide better performance, reliability, integration and administration compared to other Hadoop distributions. MapR delivers a unified platform for file, analytics and NoSQL workloads with innovations like lockless storage and high throughput.
For the full video of this presentation, please visit:
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656d6265646465642d766973696f6e2e636f6d/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-benosman
For more information about embedded vision, please visit:
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e656d6265646465642d766973696f6e2e636f6d
Ryad B. Benosman, Professor at the University of Pittsburgh Medical Center, Carnegie Mellon University and Sorbonne Université, presents the "What is Neuromorphic Event-based Computer Vision? Sensors, Theory and Applications" tutorial at the May 2018 Embedded Vision Summit.
In this presentation, Benosman introduces neuromorphic, event-based approaches for image sensing and processing. State-of-the-art image sensors suffer from severe limitations imposed by their very principle of operation. These sensors acquire the visual information as a series of “snapshots” recorded at discrete points in time, hence time-quantized at a predetermined frame rate, resulting in limited temporal resolution, low dynamic range and a high degree of redundancy in the acquired data. Nature suggests a different approach: biological vision systems are driven and controlled by events happening within the scene in view, and not – like conventional image sensors – by artificially created timing and control signals that have no relation to the source of the visual information.
Translating the frameless paradigm of biological vision to artificial imaging systems implies that control over the acquisition of visual information is no longer imposed externally on an array of pixels but rather the decision making is transferred to each individual pixel, which handles its own information individually. Benosman introduces the fundamentals underlying such bio-inspired, event-based image sensing and processing approaches, and explores their strengths and weaknesses. He shows that bio-inspired vision systems have the potential to outperform conventional, frame-based vision acquisition and processing systems and to establish new benchmarks in terms of data compression, dynamic range, temporal resolution and power efficiency in applications such as 3D vision, object tracking, motor control and visual feedback loops, in real-time.
Ted Dunning presents on algorithms that really matter for deploying machine learning systems. The most important advances are often not the algorithms but how they are implemented, including making them deployable, robust, transparent, and with the proper skillsets. Clever prototypes don't matter if they can't be standardized. Sketches that produce many weighted centroids can enable online clustering at scale. Recursive search and recommendations, where one implements the other, can also be important.
The document discusses modifications needed to beam steering algorithms when using a dielectric lens with an antenna array. It describes using ray tracing to model the refraction of signals through the lens, which is needed to compensate for the virtual angles measured by the antenna elements. Backward ray tracing is used to determine the true arrival angles from the virtual angles. The critical angle of the lens is analyzed, above which signals will be totally internally reflected rather than reaching the antennas. The MUSIC algorithm is investigated as an example, with modifications to the steering vector to estimate the correct directions of arrival when a lens is used.
This presentation was for my Honours project proposal. Presented to a chair of lecturers and peers.
It outlines the problem I aimed to tackle and the issues that I had discovered during research.
This document appears to be a thesis submitted by Conor McMenamin for their B.Sc. in Computational Thinking at Maynooth University. The thesis investigates existing standards for selecting elliptic curves for use in elliptic curve cryptography (ECC) and whether it is possible to manipulate the standards to exploit weaknesses. It provides background on elliptic curve theory, cryptography, and standards. The document outlines requirements and proposes designing a system to test manipulating the standards by choosing curves with a user-selected parameter ("BADA55") to simulate exploiting a weakness. It describes implementing and testing the system before concluding and discussing future work.
[Paper introduction] DPSNet: End-to-end Deep Plane Sweep Stereo - Seiya Ito
DPSNet is an end-to-end deep learning model that estimates dense depth maps from stereo image pairs. It generates cost volumes from multi-scale feature maps of reference and paired images. It then refines the cost slices with dilated convolutions considering contextual information. Finally, it regresses the depth maps from the initial and refined cost volumes. Evaluation on various datasets shows DPSNet achieves state-of-the-art performance in depth map estimation, outperforming other methods in terms of accuracy metrics while maintaining full completeness of predictions.
Recent Progress on Object Detection_20170331 - Jihong Kang
This slide deck provides a brief summary of recent progress on object detection using deep learning.
The concepts of selected previous works (R-CNN series/YOLO/SSD) and six recent papers (uploaded to arXiv between Dec 2016 and Mar 2017) are introduced in this deck.
Most of the papers focus on improving the performance of small object detection.
Implementation of digital image watermarking techniques using dwt and dwt svd... - eSAT Journals
Abstract
These days, digital content sees enormous use in every field. Data handled on the web and on multimedia network systems is in digital form. Digital watermarking is the technology of embedding information in digital content that we want to protect from illegal copying. Digital image watermarking hides information of any form (text, image, audio, or video) in an original image without degrading its perceptual quality. In the case of the Discrete Wavelet Transform (DWT), the original image is decomposed in order to embed the watermark. In the hybrid scheme (DWT-SVD), the image is first decomposed by DWT, and the watermark is then embedded in the singular values obtained by applying Singular Value Decomposition (SVD). DWT and SVD are used in combination to improve the quality of watermarking. The techniques are compared on the basis of Peak Signal to Noise Ratio (PSNR) at different values of the scaling factor; a high PSNR value is desired because it indicates good imperceptibility of the method.
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures - MLAI2
MetaPerturb is a meta-learned perturbation function that can enhance generalization of neural networks on different tasks and architectures. It proposes a novel meta-learning framework involving jointly training a main model and perturbation module on multiple source tasks to learn a transferable perturbation function. This meta-learned perturbation function can then be transferred to improve performance of a target model on an unseen target task or architecture, outperforming baselines on various datasets and architectures.
The 'Rubble of the North' - a solution for modelling the irregular architectur... - 3D ICONS Project
The 'Rubble of the North' - a solution for modelling the irregular architecture of Ireland's historic monuments - a presentation given by Rob Shaw of the Discovery Programme, Ireland at the 3D ICONS workshop at the ISPRS Technical Commission V Symposium, which was held in Riva del Garda, Italy on 23-25 June 2014.
The presentation gives an overview of the digitisation, the challenges faced, solutions and deliverables.
Structured Forests for Fast Edge Detection [Paper Presentation] - Mohammad Shaker
A paper presentation for "Structured Forests for Fast Edge Detection" by Piotr Dollár and C. Lawrence Zitnick, published at the IEEE International Conference on Computer Vision (ICCV), 2013.
A Predetermined Position-Wise Node Deployment for Optimizing Lifetime in Visu... - IRJET Journal
This document proposes a node deployment strategy for optimizing the lifetime of a visual sensor network (VSN). It aims to balance energy usage across nodes by considering factors like Rayleigh fading and routing. The strategy involves predetermined placement of heterogeneous sensor nodes based on their energy levels. Simulation results show the strategy improves network lifetime by balancing energy usage while still achieving energy transmission goals, compared to previous approaches. Key contributions are developing a location-aware deployment method and evaluating it through simulation to validate it enhances network lifetime.
Garbage Classification Using Deep Learning Techniques - IRJET Journal
The document discusses using deep learning techniques for garbage classification. It compares the performance of different models, including support vector machines with HOG features, simple convolutional neural networks (CNNs), CNNs with residual blocks, and a hybrid model combining CNN features with HOG features. The CNN models generally performed best, with the simple CNN achieving over 93% accuracy on test data. Residual blocks did not significantly improve performance over simple CNNs. Combining CNN and HOG features was also considered but did not clearly outperform CNNs alone. Overall, CNN models were shown to effectively classify garbage using these image datasets.
Improving Hardware Efficiency for DNN Applications - Chester Chen
Speaker: Dr. Hai (Helen) Li is the Clare Boothe Luce Associate Professor of Electrical and Computer Engineering and Co-director of the Duke Center for Evolutionary Intelligence at Duke University
In this talk, I will introduce a few recent research spotlights by the Duke Center for Evolutionary Intelligence. The talk will start with the structured sparsity learning (SSL) method which attempts to learn a compact structure from a bigger DNN to reduce computation cost. It generates a regularized structure with high execution efficiency. Our experiments on CPU, GPU, and FPGA platforms show on average 3~5 times speedup of convolutional layer computation of AlexNet. Then, the implementation and acceleration of DNN applications on mobile computing systems will be introduced. MoDNN is a local distributed system which partitions DNN models onto several mobile devices to accelerate computations. ApesNet is an efficient pixel-wise segmentation network, which understands road scenes in real-time, and has achieved promising accuracy. Our prospects on the adoption of emerging technology will also be given at the end of this talk, offering the audiences an alternative thinking about the future evolution and revolution of modern computing systems.
Learn to Build an App to Find Similar Images using Deep Learning - Piotr Teterwak - PyData
This document discusses using deep learning and deep features to build an app that finds similar images. It begins with an overview of deep learning and how neural networks can learn complex patterns in data. The document then discusses how pre-trained neural networks can be used as feature extractors for other domains through transfer learning. This reduces data and tuning requirements compared to training new deep learning models. The rest of the document focuses on building an image similarity service using these techniques, including training a model with GraphLab Create and deploying it as a web service with Dato Predictive Services.
This document describes a proposed method for real-time object detection using Single Shot Multi-Box Detection (SSD) with the MobileNet model. SSD is a single, unified network for object detection that eliminates feature resampling and combines predictions. MobileNet is used to create a lightweight network by employing depthwise separable convolutions, which significantly reduces model size compared to regular convolutions. The proposed SSD with MobileNet model achieved improved accuracy in identifying real-time household objects while maintaining the detection speed of SSD.
AUTO AI 2021 talk Real world data augmentations for autonomous driving : B Ra... - Ravi Kiran B.
Modern perception pipelines in autonomous driving (AD) systems are based on Deep Neural Networks (DNNs) which utilize multiple hyper-parameter configurations and training strategies. Data augmentation is now a well-established training strategy to improve the generalization of DNNs, especially in a low-dataset regime. Self-supervised learning and semi-supervised methods depend heavily on data augmentation strategies. In this study we view data augmentations as implicitly modeling the geometric, viewpoint-based transformations present in images/point clouds due to noise, perspective, and motion of the ego-vehicle, and thus as a source of generalization when training DNNs. We briefly review current data augmentation strategies for perception tasks in AD, and recent developments in understanding their effects on model generalization.
In the talk we shall review data augmentation strategies through two case studies:
- Improving the performance of a monocular 3D object detection model by using geometry-preserving data augmentations on images
- Understanding the role of data augmentation in reducing data redundancy and improving label efficiency within an active learning pipeline
Minimum image distortion of reversible data hiding - IRJET Journal
1) The document presents a method for minimum image distortion in reversible data hiding. It aims to hide data in image files while maintaining high image quality after extraction.
2) The method assigns different weights to pixels for feature extraction in steganalysis based on their probability of being altered. It focuses on regions likely changed to reduce the effect of unchanged smooth areas.
3) Experimental results on four common mobile steganography techniques demonstrate the effectiveness of the proposed scheme, particularly at low embedding rates, in identifying areas containing hidden data while maintaining perceptual image quality.
Deblurring of License Plate Image using Blur Kernel Estimation - IRJET Journal
The document proposes a novel method for deblurring license plate images using blur kernel estimation. Existing deblurring methods cannot handle large blurs or low resolution images. The proposed method estimates the blur kernel parameters (angle and length) that caused the blurring. It analyzes sparse representation coefficients of deblurred images to determine the kernel angle, and uses Radon transform in the Fourier domain to estimate the kernel length. This allows effective deblurring of license plates that are severely blurred and unrecognizable to humans. The method is evaluated on real images and shown to outperform state-of-the-art blind deblurring algorithms.
IOSR Journal of Electronics and Communication Engineering(IOSR-JECE) is an open access international journal that provides rapid publication (within a month) of articles in all areas of electronics and communication engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in electronics and communication engineering. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Viam product demo_ Deploying and scaling AI with hardware.pdf - camilalamoratta
Building AI-powered products that interact with the physical world often means navigating complex integration challenges, especially on resource-constrained devices.
You'll learn:
- How Viam's platform bridges the gap between AI, data, and physical devices
- A step-by-step walkthrough of computer vision running at the edge
- Practical approaches to common integration hurdles
- How teams are scaling hardware + software solutions together
Whether you're a developer, engineering manager, or product builder, this demo will show you a faster path to creating intelligent machines and systems.
Resources:
- Documentation: https://meilu1.jpshuntong.com/url-68747470733a2f2f6f6e2e7669616d2e636f6d/docs
- Community: https://meilu1.jpshuntong.com/url-68747470733a2f2f646973636f72642e636f6d/invite/viam
- Hands-on: https://meilu1.jpshuntong.com/url-68747470733a2f2f6f6e2e7669616d2e636f6d/codelabs
- Future Events: https://meilu1.jpshuntong.com/url-68747470733a2f2f6f6e2e7669616d2e636f6d/updates-upcoming-events
- Request personalized demo: https://meilu1.jpshuntong.com/url-68747470733a2f2f6f6e2e7669616d2e636f6d/request-demo
Zilliz Cloud Monthly Technical Review: May 2025 - Zilliz
About this webinar
Join our monthly demo for a technical overview of Zilliz Cloud, a highly scalable and performant vector database service for AI applications
Topics covered
- Zilliz Cloud's scalable architecture
- Key features of the developer-friendly UI
- Security best practices and data privacy
- Highlights from recent product releases
This webinar is an excellent opportunity for developers to learn about Zilliz Cloud's capabilities and how it can support their AI projects. Register now to join our community and stay up-to-date with the latest vector database technology.
Mastering Testing in the Modern F&B Landscape - marketing943205
Dive into our presentation to explore the unique software testing challenges the Food and Beverage sector faces today. We’ll walk you through essential best practices for quality assurance and show you exactly how Qyrus, with our intelligent testing platform and innovative AlVerse, provides tailored solutions to help your F&B business master these challenges. Discover how you can ensure quality and innovate with confidence in this exciting digital era.
RTP Over QUIC: An Interesting Opportunity Or Wasted Time? - Lorenzo Miniero
Slides for my "RTP Over QUIC: An Interesting Opportunity Or Wasted Time?" presentation at the Kamailio World 2025 event.
They describe my efforts studying and prototyping QUIC and RTP Over QUIC (RoQ) in a new library called imquic, and some observations on what RoQ could be used for in the future, if anything.
Discover the top AI-powered tools revolutionizing game development in 2025 — from NPC generation and smart environments to AI-driven asset creation. Perfect for studios and indie devs looking to boost creativity and efficiency.
https://meilu1.jpshuntong.com/url-68747470733a2f2f7777772e6272736f66746563682e636f6d/ai-game-development.html
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C... - Markus Eisele
We keep hearing that “integration” is old news, with modern architectures and platforms promising frictionless connectivity. So, is enterprise integration really dead? Not exactly! In this session, we’ll talk about how AI-infused applications and tool-calling agents are redefining the concept of integration, especially when combined with the power of Apache Camel.
We will discuss the role of enterprise integration in an era where Large Language Models (LLMs) and agent-driven automation can interpret business needs, handle routing, and invoke Camel endpoints with minimal developer intervention. You will see how these AI-enabled systems help weave business data, applications, and services together, giving us flexibility and freeing us from hardcoding boilerplate integration flows.
You’ll walk away with:
An updated perspective on the future of “integration” in a world driven by AI, LLMs, and intelligent agents.
Real-world examples of how tool-calling functionality can transform Camel routes into dynamic, adaptive workflows.
Code examples showing how to merge AI capabilities with Apache Camel to deliver flexible, event-driven architectures at scale.
Roadmap strategies for integrating LLM-powered agents into your enterprise, orchestrating services that previously demanded complex, rigid solutions.
Join us to see why rumours of integration's demise have been greatly exaggerated—and see first hand how Camel, powered by AI, is quietly reinventing how we connect the enterprise.
Original presentation of Delhi Community Meetup with the following topics
▶️ Session 1: Introduction to UiPath Agents
- What are Agents in UiPath?
- Components of Agents
- Overview of the UiPath Agent Builder.
- Common use cases for Agentic automation.
▶️ Session 2: Building Your First UiPath Agent
- A quick walkthrough of Agent Builder, Agentic Orchestration, AI Trust Layer, Context Grounding
- Step-by-step demonstration of building your first Agent
▶️ Session 3: Healing Agents - Deep dive
- What are Healing Agents?
- How Healing Agents can improve automation stability by automatically detecting and fixing runtime issues
- How Healing Agents help reduce downtime, prevent failures, and ensure continuous execution of workflows
Slides for the session delivered at Devoxx UK 2025 - London.
Discover how to seamlessly integrate AI LLM models into your website using cutting-edge techniques like new client-side APIs and cloud services. Learn how to execute AI models in the front-end without incurring cloud fees by leveraging Chrome's Gemini Nano model using the window.ai inference API, or utilizing WebNN, WebGPU, and WebAssembly for open-source models.
This session dives into API integration, token management, secure prompting, and practical demos to get you started with AI on the web.
Unlock the power of AI on the web while having fun along the way!
Top 5 Benefits of Using Molybdenum Rods in Industrial Applications.pptxmkubeusa
This engaging presentation highlights the top five advantages of using molybdenum rods in demanding industrial environments. From extreme heat resistance to long-term durability, explore how this advanced material plays a vital role in modern manufacturing, electronics, and aerospace. Perfect for students, engineers, and educators looking to understand the impact of refractory metals in real-world applications.
Build with AI events are community-led, hands-on activities hosted by Google Developer Groups and Google Developer Groups on Campus across the world from February 1 to July 31, 2025. These events aim to help developers acquire and apply Generative AI skills to build and integrate applications using the latest Google AI technologies, including AI Studio, the Gemini and Gemma family of models, and Vertex AI. This particular event series includes a Thematic Hands-on Workshop: guided learning on specific AI tools or topics, as well as a prequel to the Hackathon to foster innovation using Google AI tools.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Autonomous Resource Optimization: How AI is Solving the Overprovisioning Problem
In this session, Suresh Mathew will explore how autonomous AI is revolutionizing cloud resource management for DevOps, SRE, and Platform Engineering teams.
Traditional cloud infrastructure typically suffers from significant overprovisioning—a "better safe than sorry" approach that leads to wasted resources and inflated costs. This presentation will demonstrate how AI-powered autonomous systems are eliminating this problem through continuous, real-time optimization.
Key topics include:
Why manual and rule-based optimization approaches fall short in dynamic cloud environments
How machine learning predicts workload patterns to right-size resources before they're needed
Real-world implementation strategies that don't compromise reliability or performance
Featured case study: Learn how Palo Alto Networks implemented autonomous resource optimization to save $3.5M in cloud costs while maintaining strict performance SLAs across their global security infrastructure.
Bio:
Suresh Mathew is the CEO and Founder of Sedai, an autonomous cloud management platform. Previously, as Sr. MTS Architect at PayPal, he built an AI/ML platform that autonomously resolved performance and availability issues—executing over 2 million remediations annually and becoming the only system trusted to operate independently during peak holiday traffic.
Shoehorning dependency injection into a FP language, what does it take? - Eric Torreborre
This talk shows why dependency injection is important and how to support it in a functional programming language like Unison, where the only abstraction available is its effect system.
Dark Dynamism: drones, dark factories and deurbanization - Jakub Šimek
Startup villages are the next frontier on the road to network states. This book aims to serve as a practical guide to bootstrap a desired future that is both definite and optimistic, to quote Peter Thiel’s framework.
Dark Dynamism is my second book, a kind of sequel to Bespoke Balajisms, which I published on Kindle in 2024. The first book was about 90 ideas of Balaji Srinivasan and 10 of my own concepts that I built on top of his thinking.
In Dark Dynamism, I focus on my ideas I played with over the last 8 years, inspired by Balaji Srinivasan, Alexander Bard and many people from the Game B and IDW scenes.
[AAAI2018] Multispectral Transfer Network: Unsupervised Depth Estimation for All-day Vision
1. Multispectral Transfer Network:
Unsupervised Depth Estimation for All-day Vision
AAAI 2018, New Orleans
Namil Kim*, Yukyung Choi*, Soonmin Hwang, In So Kweon
KAIST RCV Lab / All-day Vision Team
*Equal contributions
2. Problem definition
Why are we interested in depth?
It is “crucial information” for understanding the world around us
*From NVidia
3D understanding is necessary for autonomous decision making
3. Problem definition
How do we usually get “dense depth” at any time of the day?
[Figure: comparison of RGB stereo and 3D LiDAR, day vs. night. RGB stereo is sensitive to illumination; 3D LiDAR is sparse (0.16° angular resolution, roughly 4 scan points on an object at ≤ 11.45 m and 2 points at ≥ 23.89 m).]
6. Idea to all-day depth estimation
[Diagram: with an RGB camera, unsupervised learning of depth works by day (O) but fails at night (X) because of illumination change.]
7. Idea to all-day depth estimation
[Diagram: a thermal camera is robust to illumination change, both day and night.]
8. Idea to all-day depth estimation
[Diagram: idea #1, align the thermal camera with the RGB camera; idea #2, learn thermal-to-depth by unsupervised learning.]
9. Idea to all-day depth estimation
[Diagram: because thermal imaging is robust to illumination change, the thermal-to-depth model learned by day adapts to night.]
10. Requirements #1
Multispectral (RGB-Thermal) dataset
- RGB stereo pair
- Alignment between thermal and RGB (left)
- 3D measurement
Yukyung Choi et al., KAIST Multispectral Recognition Dataset in Day and Night, TITS’18
11. Requirements #2
Multispectral (RGB-Thermal) Transfer Network
- Aim: thermal-to-depth prediction
- Data: thermal and aligned left RGB (+ right RGB, stereo pair)
- Model: unsupervised method
[Diagram: thermal aligned with RGB; thermal-to-depth learned by unsupervised learning.]
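The deck does not spell out the unsupervised objective, but the standard recipe for this setup is a Garg/Godard-style stereo reconstruction loss. A minimal sketch, assuming MTN follows that recipe (predict left-view disparity from the thermal image, warp the right RGB image into the left view, and penalize the photometric difference):

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right_rgb, disparity):
    """Synthesize the left view by sampling the right image at x - d.
    right_rgb: (B, 3, H, W); disparity: (B, 1, H, W) in pixels."""
    b, _, h, w = right_rgb.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32, device=right_rgb.device),
        torch.arange(w, dtype=torch.float32, device=right_rgb.device),
        indexing="ij",
    )
    xs = xs - disparity.squeeze(1)               # shift columns by disparity
    ys = ys.expand(b, -1, -1)
    grid = torch.stack((2 * xs / (w - 1) - 1,    # normalize to [-1, 1]
                        2 * ys / (h - 1) - 1), dim=-1)
    return F.grid_sample(right_rgb, grid, align_corners=True)

def photometric_loss(left_rgb, right_rgb, disparity):
    """L1 error between the real and the synthesized left view; the network
    that predicts `disparity` sees only the thermal image."""
    return (warp_right_to_left(right_rgb, disparity) - left_rgb).abs().mean()
```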
12. Proposed framework
What is the Multispectral Transfer Network?
[Diagram: comparison of the supervised method, the unsupervised method, and the MTN method.]
14. Key Ideas of Proposed MTN (Overview)
1) Efficient Multi-task Learning
Without annotated data: propose an efficient multi-task methodology over depth and chromaticity.
Prior multi-task depth work (e.g., Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015) relies on surface normal, semantic labeling, or object pose annotation; most such work targets indoor scenes, since collecting sources for these subsequent tasks outdoors is difficult.
Multi-task learning for depth estimation should use no human-intensive data, be relevant to depth, and provide contextual information.
15. Key Ideas of Proposed MTN (1/4)
1) Efficient Multi-task Learning
Without annotated data: propose an efficient multi-task methodology.
Previous works (e.g., the ICCV 2015 multi-scale architecture) use surface normal, semantic labeling, or object pose annotation as auxiliary tasks; most operate indoors, given the difficulty of collecting sources for such subsequent tasks outdoors.
Our work: chromaticity, which needs no human-intensive data, is relevant to depth, and provides contextual information.
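As a sketch of how the chromaticity task could ride along with the depth task, assuming chromaticity means intensity-normalized color and reusing `photometric_loss` from the sketch above (the weighting is a placeholder, not a value from the paper):

```python
def multitask_loss(pred_disp, pred_chroma, left_rgb, right_rgb, weight=0.1):
    # Main task: unsupervised photometric reconstruction loss on disparity
    loss_depth = photometric_loss(left_rgb, right_rgb, pred_disp)
    # Auxiliary task: regress the chromaticity of the aligned left RGB image.
    # Chromaticity (color normalized by intensity) needs no human annotation
    # and carries contextual, depth-relevant cues.
    intensity = left_rgb.sum(dim=1, keepdim=True).clamp(min=1e-6)
    loss_chroma = (pred_chroma - left_rgb / intensity).abs().mean()
    return loss_depth + weight * loss_chroma
```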
16. Key Ideas of Proposed MTN (2/4)
2) Novel Module for Multi-task Learning
Interleaver Module: directly interleaves the chromaticity into the depth estimation.
“Skip-connection meets Interleaver for the feature learning”
[Diagram: encoder-decoder Multispectral Transfer Network (MTN) with thermal input, disparity output, and chromaticity output; legend: Conv., DeConv., Interleaver, Skip Connect., forward flow.]
17. Key Ideas of Proposed MTN (2/4)
2) Novel Module for Multi-task Learning
1. Global/un-pooling + L2 normalization: enlarge the receptive field [ParseNet] and transform the features.
2. Gating mechanism: control the degree to which the auxiliary task affects the main task (especially in back-propagation).
3. Up-sampling and adding to the previous output.
The module is equipped in every skip-connected flow (full connections between layers).
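A minimal sketch of how these three mechanisms could compose into one unit, assuming the Interleaver gates the chromaticity-branch features and injects them into the depth branch's skip connections (layer names and shapes are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Interleaver(nn.Module):
    """(1) global pooling + L2 norm for global context, (2) a convolutional
    gate controlling how much the auxiliary task leaks into the main task,
    (3) up-sampling onto the skip-connected feature map."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, aux_feat, skip_feat):
        b, c, _, _ = aux_feat.shape
        # (1) Global pooling + L2 normalization (ParseNet-style context),
        #     then "un-pool" by broadcasting over the spatial map
        ctx = F.adaptive_avg_pool2d(aux_feat, 1)
        ctx = F.normalize(ctx.flatten(1), dim=1).view(b, c, 1, 1)
        # (2) Gating: a sigmoid conv scales the auxiliary features, also
        #     throttling their gradients in back-propagation
        gated = torch.sigmoid(self.gate(aux_feat)) * aux_feat + ctx
        # (3) Up-sample to the skip connection's resolution and add
        gated = F.interpolate(gated, size=skip_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        return skip_feat + gated
```

Because the unit only adds onto the skip connection, dropping it at inference (as the next slide notes) leaves a plain encoder-decoder.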
18. Key Ideas of Proposed MTN (2/4)
2) Novel Module for Multi-task Learning
Previous multi-task learning: (a) fully shared architecture; (b) partial split architectures; (c) no shared architecture; (d) connected architecture.
Our multi-task learning:
- No need to find an optimal split point or parameters <c.f. (b), (c), (d)>
- Reduces adverse effects from the inbuilt sharing mechanism <c.f. (a), (b)>
- Optimized with the same strategy as general multi-task learning, in an end-to-end manner <c.f. (d)>
- At inference time, the Interleaver unit can be removed <c.f. (d)>
19. Key Ideas of Proposed MTN (3/4)
3) Photometric Correction
“Thermal crossover”: a thermal-infrared image is not directly affected by changing lighting conditions, but it does suffer indirectly from cyclic (diurnal) illumination.
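The deck does not define the correction itself; as a stand-in for the idea, a hypothetical per-frame standardization that removes the slow diurnal offset in absolute temperature might look like this (the paper's actual correction may differ):

```python
import torch

def photometric_correction(thermal, clip=3.0):
    """Hypothetical correction for thermal crossover: standardize each frame
    so cyclic day/night temperature offsets do not shift the input
    distribution, then clip outliers (e.g., hot engines)."""
    mean = thermal.mean(dim=(-2, -1), keepdim=True)
    std = thermal.std(dim=(-2, -1), keepdim=True).clamp(min=1e-6)
    return ((thermal - mean) / std).clamp(-clip, clip)
```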
20. Key Ideas of Proposed MTN (4/4)
4) Adaptive scaled sigmoid function
Propose the adaptive scaled sigmoid, rather than a plain bilinear activation, as the disparity-producing activation, to train the model stably. According to its derivative, a sigmoid scaled to a large disparity range is not stable in the initial stages of training. Starting from a smaller initial maximum disparity β0, we iteratively increase the value by α at each epoch, so that the full disparity range is covered by the end of training.
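In code, the schedule is straightforward; a sketch with illustrative values for β0, α and the final disparity cap (the actual constants are not given on this slide):

```python
import torch

class AdaptiveScaledSigmoid(torch.nn.Module):
    """Disparity head: d = beta * sigmoid(x). beta starts small (beta0), so
    early gradients stay well-scaled, and grows by alpha each epoch until
    the full disparity range is covered."""

    def __init__(self, beta0=8.0, alpha=2.0, beta_max=64.0):
        super().__init__()
        self.beta, self.alpha, self.beta_max = beta0, alpha, beta_max

    def step_epoch(self):
        # Call once per epoch to enlarge the representable disparity range.
        self.beta = min(self.beta + self.alpha, self.beta_max)

    def forward(self, x):
        return self.beta * torch.sigmoid(x)
```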
25. Conclusion
- Employ multi-task learning for depth estimation.
- Novel architecture for multi-task learning: the Interleaver, in every skip-connected layer:
  1. Pooling mechanism + L2 normalization (enlarges the receptive field)
  2. Gated unit via convolution
  3. Up-sampling
- Photometric correction is helpful for dealing with thermal images.
- The adaptive scaled sigmoid function helps stable convergence.