Conference on Intelligent Robots and Systems
San Diego, CA, USA, Oct 29 - Nov 2, 2007
[Fig. 2. Concept of our proposed method — previous and current frames; temporal encoding along the time axis (t−2, t−1, t) and spatial encoding along neighboring areas (i−1, i, i+1), with moving and static areas distinguished]

[Fig. 3. Coded structured light patterns for spatio-temporal selection — patterns at t = 2 and t = 3, with code values 0–7 arranged periodically and shifted between frames]
– G(0, y), ···, G(7, y) – are periodically arranged in a single image. The projected pattern also shifts to the left after a certain period of time, and the gray code pattern on the same pixel changes G(0, y), G(1, y), G(2, y), ··· every unit time. In this manner, we can obtain a space code at each pixel along the time axis.

C. Spatio-temporal selection type encoding algorithm

1) Calibration between a projection pattern and a captured image: The shifting light pattern is projected by a projector onto the measured objects, and the projection results are then captured as an image by a camera whose angle of view is different from that of the projector. To calculate space codes from such a captured image, the type of gray code pattern expressed in Eq. (1) that is projected at each pixel of the captured image must be identified.

Fig. 4 shows the spatial relationship between the xy coordinate system of the projector and the ξη coordinate system of the camera. We assume that the x axis and the ξ axis are parallel for the light patterns projected from the projector and the images captured by the camera.

[Fig. 4. Spatial relationship between the projector coordinate system (x, y), with rectangular areas R(x, y), and the camera coordinate system (ξ, η) of the captured image, with pixels F(ξ, η) and areas numbered 0, 1, 2, ···]

Eq. (5); and we then judge the type of gray code pattern that is projected on the pixel (ξ, η) at time t.

2) Selectable encoding types: Fig. 5 shows the types of gray code patterns that are projected at a part of a certain line for eight frames when the light patterns defined by Eqs. (1) and (2) are projected. Here, the vertical axis represents time, and the horizontal axis the number of the rectangular area. In the figure, we show an example of spatio-temporal selection type encoding at a pixel (ξ, η); four bits are in the time direction and three bits are in the space direction, and they are referred to for obtaining a space code value. Thus, we can obtain a space code value at a pixel (ξ, η) in a rectangular area i by using not only temporal encoding or spatial encoding in a single direction, but also spatio-temporal selection type encoding, which refers to image information both temporally and spatially.

In this paper, we introduce n selectable space code values {}^{p}X(ξ, η, t) at a pixel (ξ, η) at time t, whose numbers of referred bits in time and space, excluding the specified pixel, are (n−1, 0), (n−2, 1), ···, (0, n−1), respectively. Here, n is the number of bits of the space codes, and p (= 1, ···, n) is a parameter that selects the encoding type, such that the encoding is close to purely temporal encoding when p is small, while it is close to purely spatial encoding when p is large.

Here, g(ξ, η, t) is a binarized image obtained from the camera at time t, which corresponds to the coded structured light patterns. We abbreviate the binarized image g(ξ, η, t) that belongs to a rectangular area i as {}_{i}g_t, and the space code
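Since the space code described above is carried by gray code patterns, the bit sequence observed at one pixel must be converted from Gray code back to an ordinary binary index before it can be used as a code value. The following sketch only illustrates that generic step; the helper names (`to_gray`, `from_gray`) and the 3-bit observation are our own assumptions, not the paper's Eq. (1) pattern definition.

```python
def to_gray(v: int) -> int:
    # Reflected binary (Gray) code: adjacent values differ in exactly
    # one bit, which limits decoding errors at pattern boundaries.
    return v ^ (v >> 1)

def from_gray(g: int) -> int:
    # Invert by cumulatively XOR-ing the shifted value.
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v

# A pixel observed over successive frames contributes one bit per
# projected pattern; here we assume a 3-bit code, i.e. eight code
# values 0..7 as in Fig. 3 (the observed bits are hypothetical).
bits = [1, 1, 0]            # most significant bit first
g = 0
for b in bits:
    g = (g << 1) | b
print(from_gray(g))         # -> 4
```

The single-bit-change property of the Gray code is what makes a misread bit at an area boundary cost at most one code level, rather than an arbitrary jump.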
[Fig. 6. Types of encoding — for each encoding type p = 1, ···, 8, the temporally referred bits, spatially referred bits, and the coding pixel]

{}^{6}X_t = {}_{i-4}g'_t \; {}_{i-3}g'_t \; {}_{i}g_{t-2} \; {}_{i}g_{t-1} \; {}_{i}g_t \; {}_{i+1}g'_t \; {}_{i+2}g'_t \; {}_{i+3}g'_t    (11)

{}^{7}X_t = {}_{i-4}g'_t \; {}_{i-3}g'_t \; {}_{i-2}g'_t \; {}_{i}g_{t-1} \; {}_{i}g_t \; {}_{i+1}g'_t \; {}_{i+2}g'_t \; {}_{i+3}g'_t    (12)

{}^{8}X_t = {}_{i-4}g'_t \; {}_{i-3}g'_t \; {}_{i-2}g'_t \; {}_{i-1}g'_t \; {}_{i}g_t \; {}_{i+1}g'_t \; {}_{i+2}g'_t \; {}_{i+3}g'_t    (13)

Fig. 6 shows the eight space encoding types defined by Eqs. (6)∼(13). The example of Fig. 5 is the case of p = 4. Space encoding that refers only to information along the time axis (p = 1) is effective for measuring objects with no motion. When p = 8, the space encoding refers only to information in a single projected image. This encoding type is effective for measuring the shapes of moving objects because its accuracy is independent of the motion of the objects; however, a disadvantage is that measurement errors may increase when undulated shapes are measured.

3) Adaptive selection of encoding types: Next, we introduce a criterion for the adaptive selection of the encoding types defined by Eqs. (6)∼(13), which focuses on the brightness changes in the space code images. The introduced criterion is based on a frame differencing feature, which is obtained by differencing the space code image T(ξ, η, t) at time t with the space code image S(ξ, η, t−1) at time t−1. Here, T(ξ, η, t) is the space code image that refers only to information along the time axis, and S(ξ, η, t−1) is the space code image in the previous frame selected by the below-mentioned Eq. (15). After differencing, the criterion is computed for every divided square image region whose unit is s × s pixels by summing the differencing result as follows:

D(\xi', \eta', t) = \sum_{(\xi, \eta) \in q(\xi', \eta')} T(\xi, \eta, t) \oplus S(\xi, \eta, t-1).    (14)

Here, the operator ⊕ denotes an exclusive-OR operation, and q(ξ′, η′) denotes the ξ′-th and η′-th s × s pixel block area in the ξ and η directions, respectively. T(ξ, η, t0) is initially set to S(ξ, η, t0 − 1) at time t0, which is the start time for generating space codes. The criterion D(ξ′, η′, t) corresponds to the number of pixels in the block area q(ξ′, η′) where the value of T(ξ, η, t) differs from that of S(ξ, η, t−1).

The space code image T(ξ, η, t) is not always encoded correctly in the measurement of moving objects because it refers only to information along the time axis. Meanwhile, the space code image S(ξ, η, t), which is generated by the spatio-temporal selection type encoding defined by Eq. (15), is robust to errors caused by the motion of the measured objects. Based on these properties of the space code images, the criterion D(ξ′, η′, t) is defined with a frame differencing calculation for every block area in order to detect the coding errors caused by motion.

Based on the criterion D(ξ′, η′, t), the spatio-temporal encoding type p = p(ξ′, η′, t) is selected from the encoding types {}^{p}X_t defined by Eqs. (6)∼(13) for every block area q(ξ′, η′) at time t; subsequently, the spatio-temporal selection type space code image S(ξ, η, t) is obtained by using the calculated encoding types for all the block areas. In this paper, the spatio-temporal encoding type p = p(ξ′, η′, t) is decided as follows:

p(\xi', \eta', t) =
\begin{cases}
n & (D(\xi', \eta', t) = s^2) \\
\dfrac{n D(\xi', \eta', t)}{s^2} + 1 & (\text{otherwise})
\end{cases}    (15)

Fig. 7 shows an example of our introduced criterion for spatio-temporal encoding in the case that a human opens and closes his fingers. In this figure, (a) the space code image T(ξ, η, t) by temporal reference, (b) the space code image S(ξ, η, t−1) by temporal-spatial reference in the previous frame, and (c) the selected spatio-temporal encoding types p(ξ′, η′, t) are shown. Here, the value of p increases as the tone becomes darker, as shown in Fig. 7(c).

It can be seen that p increases in the areas where code errors are caused by the motion of the fingers; in other words, space codes that are robust to motion are selected. Meanwhile, p decreases in the areas where motion is slight; in other words, space codes that enable accurate shape measurement for static scenes are selected. Thus, we can select spatio-temporal encoding types adaptively according to the defined criterion based on frame differencing features.

[Fig. 7. Criterion for spatio-temporal encoding — current and previous frames: (a) code image by temporal reference, (b) code image by temporal-spatial reference, (c) an image showing the level of brightness change]

4) Calculation of three-dimensional information: In order to calculate three-dimensional information from the space code image S(ξ, η, t), S(ξ, η, t) must be transformed based on the relationship between the locations of the camera and the projector, as was the case in the previously reported coded structured light pattern projection methods. In this paper, we transform the space code image into three-dimensional information at each pixel using the method described in reference [4].

The three-dimensional coordinate X(t) is obtained from the pixel position (ξ, η) on the camera and its space code value S(ξ, η, t) by solving the following simultaneous equations with a 3 × 4 camera transform matrix
C and a 3 × 4 matrix P,

H_c \begin{pmatrix} \xi \\ \eta \\ 1 \end{pmatrix} = C \begin{pmatrix} X(t) \\ 1 \end{pmatrix},    (16)

H_p \begin{pmatrix} S(\xi, \eta, t) \\ 1 \end{pmatrix} = P \begin{pmatrix} X(t) \\ 1 \end{pmatrix}.    (17)

[Fig. 8. Configuration of experimental system — high-speed camera (1024 × 1024 pixels) and projector with DMD (1024 × 768 pixels); labeled dimensions: 18 cm, 110 cm, 37 cm × 37 cm]

[Fig. 10. Experimental result for a rotating plane — frames at 0.09 s and 0.12 s]
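Eliminating the scale factors H_c and H_p from Eqs. (16) and (17) leaves three linear equations in X(t) per pixel: two from the camera pixel (ξ, η) and one from the projector stripe coordinate fixed by the space code. The function below is our own minimal illustration of that solve, not the implementation of reference [4]; in any real use, the matrices C and P would come from calibration.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def triangulate(C, P, xi, eta, s):
    """Solve Eqs. (16)-(17) for the 3-D point X(t).

    C, P   : 3x4 camera / projector matrices (nested lists).
    xi, eta: pixel position on the camera.
    s      : space code value S(xi, eta, t), used as the projector
             stripe coordinate.
    """
    # Eliminating Hc from Eq. (16) gives two linear equations;
    # eliminating Hp from Eq. (17) gives one more, since the space
    # code fixes only one projector coordinate.
    rows = [
        [xi * C[2][j] - C[0][j] for j in range(4)],
        [eta * C[2][j] - C[1][j] for j in range(4)],
        [s * P[2][j] - P[0][j] for j in range(4)],
    ]
    A = [r[:3] for r in rows]      # A @ X = b
    b = [-r[3] for r in rows]
    d = det3(A)
    # Cramer's rule: replace each column of A by b in turn.
    X = []
    for k in range(3):
        Ak = [row[:] for row in A]
        for i in range(3):
            Ak[i][k] = b[i]
        X.append(det3(Ak) / d)
    return X
```

For example, with a synthetic pinhole camera C = [I | 0] and an identical projector shifted one unit along x, `triangulate(C, P, 0.25, 0.5, 0.0)` recovers the point [1.0, 2.0, 4.0].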
[Fig. 11. Experimental result for a waving paper — frames at 0.1 s, 0.2 s, 0.3 s, 0.4 s]

[Fig. 12. Experimental result for a moving hand — frames at 0.03 s, 0.06 s, 0.09 s, 0.12 s]

figures on the right are three-dimensional images calculated by our proposed method, and their interval is set to 0.03 s in the figure. Here, we have also confirmed the acquisition of three-dimensional shapes at a frame rate of 500 fps. The measured object is an 8.5 cm × 10.0 cm plane, which is rotated at approximately 5.5 rotations per second by a DC motor. It can be seen that three-dimensional shape information can be acquired even if the measured object moves quickly.

Next, we show the measured three-dimensional shape of a waving paper in Fig. 11, whose interval is 0.1 s. Here, the measured object is an A4 sheet of paper (297 mm × 210 mm), which is waved in the vertical direction. From the figure, it can be seen that the temporal changes in the three-dimensional shape, which are generated by wave propagation, become visible.

Finally, we show the measured three-dimensional shape of a human hand whose fingers move quickly in Fig. 12; the interval is 0.03 s. Here, the time required for a human to change the fingers' shape from "scissors" to "paper" is approximately 0.1 s or more. It can be seen that we can acquire the three-dimensional shape of an object with a complicated shape, such as a human hand; further, quick movements of the fingers and changes in the height of the back of the hand can also be detected in the form of a three-dimensional shape.

V. CONCLUSION

In this paper, we propose a spatio-temporal selection type coded structured light projection method for the acquisition of three-dimensional images at a high-speed frame rate, which can select adaptive space encoding types according to the temporal changes in the coded images. We also built a verification system that is composed of a high-speed projector and a high-speed video camera, and we evaluated the effectiveness of our proposed method by showing the results of three-dimensional shape measurement for various objects moving quickly at a high-speed frame rate such as 1000 fps.

REFERENCES

[1] S. Yoshimura, T. Sugiyama, K. Yonemoto and K. Ueda. A 48 kframes/s CMOS image sensor for real-time 3-D sensing and motion detection, in ISSCC Tech. Dig., 2001, pp. 94–95.
[2] Y. Oike, M. Ikeda and K. Asada. A CMOS image sensor for high-speed active range finding using column-parallel time domain ADC and position encoder, IEEE Trans. Electron Devices, vol. 50, no. 1, pp. 152–158, 2003.
[3] J.L. Posdamer and M.D. Altschuler. Surface measurement by space-encoded projected beam systems, Comput. Gr. Image Process., vol. 18, no. 1, pp. 1–17, 1982.
[4] S. Inokuchi, K. Sato and F. Matsuda. Range imaging system for 3-D object recognition, in Proc. Int. Conf. Patt. Recog., 1984, pp. 806–808.
[5] D. Bergmann. New approach for automatic surface reconstruction with coded light, Proc. SPIE, vol. 2572, pp. 2–9, 1995.
[6] D. Caspi, N. Kiryati and J. Shamir. Range imaging with adaptive color structured light, IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 5, pp. 470–480, 1998.
[7] M. Maruyama and S. Abe. Range sensing by projecting multiple slits with random cuts, IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 6, pp. 647–651, 1993.
[8] N.G. Durdle, J. Thayyoor and V.J. Raso. An improved structured light technique for surface reconstruction of the human trunk, in Proc. Can. Conf. Elect. Comput. Eng., 1998, pp. 874–877.
[9] L. Zhang, B. Curless and S.M. Seitz. Rapid shape acquisition using color structured light and multi-pass dynamic programming, in Proc. IEEE Int. Symp. 3D Data Processing Visualization and Transmission, 2002, pp. 24–36.
[10] J. Salvi, J. Batlle and E. Mouaddib. A robust-coded pattern projection for dynamic 3D scene measurement, Pattern Recog. Lett., vol. 19, pp. 1055–1065, 1998.