Author: John Robert Naylor, Timeslice USA, 415 N. State St, Ste 190, Lake Oswego, OR 97034, USA. Email: john@timeslicefilms.com. SMPTE Member: Yes.
Author: Callum Rex Reid, Digicave Ltd, 3 Orange Row, Brighton, East Sussex, BN1 1UQ, UK. Email: callum@digicave.com. SMPTE Member: No.
Pub ID: 2011 Stereo 3D Conference
Pub Date: June 20, 2011
The authors are solely responsible for the content of this technical presentation. The technical presentation does not necessarily reflect the official position of the Society of Motion Picture and Television Engineers (SMPTE), and its printing and distribution does not constitute an endorsement of views which may be expressed. This technical presentation is subject to a formal peer-review process by the SMPTE Board of Editors, upon completion of the conference. Citation of this work should state that it is a SMPTE meeting paper. EXAMPLE: Author's Last Name, Initials. 2010. Title of Presentation, Meeting name and location.: SMPTE. For information about securing permission to reprint or reproduce a technical presentation, please contact SMPTE at jwelch@smpte.org or 914-761-1100 (3 Barker Ave., White Plains, NY 10601).
Introduction
The system described in this paper specifically looks at achieving sculptural photography within standard creative-industry working practices and realistic budget constraints, for use in online interactive content as Free Viewpoint Media (FVM). Let's start by defining sculptural photography as an expression of 3D scanning technology.
Active Scanning
- Lighting: lasers, IR lamps, structured filters, special-purpose sensors and processors (e.g. time-of-flight)
- Scanning: normally constrained to be flat; usually progressive
- Comments: niche market supply; resolution of custom sensors, some of which are only 200x200; degree to which the wavelengths used behave the same as visible light; rolling shutter effects of progressive scan restrict the talent's ability to move

Passive Scanning
- Lighting: digital stills cameras, ranging from specialist instrumentation units to off-the-shelf commercial products; can be what the Director of Photography wants
- Scanning: whole frame
- Comments: mass market supply; camera resolution, dynamic range, triggering, and sophistication of analysis algorithms; mass market demand potential

Table 1: Comparison of Passive and Active 3D Scanning Methods

From this analysis, Digicave and Timeslice chose to develop a sculptural photography system based on passive scanning, to better satisfy the technical, artistic, and economic criteria that will enable sculptural photography as a mass-market phenomenon. The rest of this paper describes the capture system and image analysis pipeline that have been developed for sculptural photography, and some of their constraints and trade-offs. It concludes by discussing delivery mechanisms for sculptural photographs, and steps towards the capture of photographic sculptures in motion.
robust, at the cost of a 10% loss in the maximum resolution provided by the cameras in use. The process relies on targeting each camera at a common fixation point, at which we have placed a tracking target, and centering, leveling, focusing, and framing each camera in the array by eye. The array is then triggered, and off-the-shelf motion tracking software is used to stabilize the tracking target in the array sequence. This process produces stable-looking sequences, at the cost of lost pixels around the border of each individual frame. The entire initial calibration can be accomplished by an experienced crew in less than an hour. The level of calibration achieved by this process is good enough to produce stabilized sequences for on-set preview, but the stabilized sequence is not used for the extraction of 3D models; this second level of calibration is covered later in this paper. Staying calibrated is as important as getting calibrated in the first place, and is accomplished by paying attention to numerous practical details, such as ensuring that the camera supports are stable and isolated from sources of vibration, wind, and rapid changes in temperature.
Configuration
How many cameras? Where do we put them? Where do we point them? These are the three questions that drive the configuration of a sculptural photography rig. The number of cameras is driven by resolution and subject size: the higher the resolution, the fewer cameras are needed; and the smaller the subject matter, the fewer. With today's 12 Mega-Pixel (MP) devices and a 2m diameter action area, we get good results with a 6m diameter circular rig of 36 cameras fixated on the center of the circle, with a camera height of 1m to get even vertical coverage of the (mainly 2m tall human) subjects. Note that all cameras are located at equal spacing on the rig's equator. Somewhat counterintuitively, we have found that satellite cameras at higher latitudes or the poles are not needed to capture the form and image of human subjects, and it is quicker and easier to build and calibrate rigs that do not contain them. We have experimented with other configurations and cannot claim that the simple arrangement described above is in any way optimal, apart from its ease of assembly and operation.
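The ring geometry described above can be sketched in a few lines of code. This is an illustrative reconstruction, not the authors' rig software; the function name and the assumption that the fixation point sits on the rig axis at camera height are ours.

```python
import math

# Sketch of the described layout: 36 cameras equally spaced on a 6 m
# diameter circle at 1 m height, all fixated on the rig's central axis.
NUM_CAMERAS = 36
RIG_RADIUS_M = 3.0      # 6 m diameter
CAMERA_HEIGHT_M = 1.0

def ring_positions(n=NUM_CAMERAS, r=RIG_RADIUS_M, h=CAMERA_HEIGHT_M):
    """Equally spaced camera positions on the rig's equator."""
    positions = []
    for i in range(n):
        theta = 2.0 * math.pi * i / n   # 10 degrees apart for n = 36
        positions.append((r * math.cos(theta), r * math.sin(theta), h))
    return positions

cams = ring_positions()
```

Each camera's optical axis then points from its position toward the common fixation point, which is what the tracking-target stabilization described earlier relies on.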
On-set Preview
Converting an image sequence into a photo-realistic 3D model is currently heavily compute-intensive, so it's important that the sequences input to the process are going to match the client's expectations. For this purpose we have developed a facility to view each stabilized rig capture within 20s of each take, so that clients can quickly select the shots they wish to move forward to the next phase of the pipeline: image processing.
Model Extraction
Extracting 3D models from picture sequences captured using the methods described above is achieved with the following sequence of operations, based on algorithms developed by Hernández (2010):
1. Calibrate
2. Create a low-complexity visual hull of the subject
3. Refine the mesh
4. Apply the texture as captured
5. Touch up both the model and the textures by hand, if necessary
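The visual hull step can be illustrated with a toy silhouette-carving sketch: a voxel is kept only if it projects inside every camera's silhouette. For simplicity this uses two orthographic views rather than the rig's 36 calibrated projective cameras, and all shapes are fabricated; it shows the principle, not the paper's implementation.

```python
import numpy as np

# Toy visual-hull carving: intersect the back-projections of each
# silhouette into a voxel grid. A voxel survives only if it projects
# inside EVERY silhouette.

N = 32  # voxel grid resolution (illustrative)

# Fabricated silhouettes: a disc as seen along the x axis, and an
# unconstraining all-true mask as seen along the y axis.
yy, zz = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
front = (yy - N / 2) ** 2 + (zz - N / 2) ** 2 < (N / 3) ** 2
side = np.ones((N, N), dtype=bool)

def carve(silhouettes):
    """Intersect the orthographic back-projections of each (axis, mask) pair."""
    hull = np.ones((N, N, N), dtype=bool)
    for axis, sil in silhouettes:
        # Extrude the 2D mask along its viewing axis via broadcasting,
        # then intersect it with the running hull estimate.
        hull &= np.expand_dims(sil, axis=axis)
    return hull

hull = carve([(0, front), (1, side)])  # here: a cylinder along the x axis
```

With only two views the hull is very rough, which is exactly why the pipeline treats it as a low-complexity starting point to be refined with point-cloud data.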
The following figures illustrate the first four stages of this pipeline.
Calibration
This is achieved using the same plates that were used for the on-set preview stabilization described above. The calibration routine in this part of the workflow calculates a solution for each camera in the rig (position, gimbal angles, focal length) based on photogrammetric analysis of the tracking target. With a solidly constructed rig and stable environmental conditions, it is only necessary to recalibrate the rig after 3 hours' use. The process typically takes 5 minutes.
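A photogrammetric solve of this kind minimizes reprojection error: the distance between the tracking target's observed image points and where a candidate camera solution predicts them. The sketch below shows that quantity under a simple pinhole model; all numbers and function names are illustrative, not the rig's actual parameters or software.

```python
import numpy as np

# Illustrative pinhole-model reprojection error, the quantity a
# photogrammetric calibration routine drives toward zero.

def project(points_world, R, t, f):
    """Project world points into image-plane coordinates."""
    p_cam = (R @ points_world.T).T + t      # rotate/translate into camera frame
    return f * p_cam[:, :2] / p_cam[:, 2:3]  # perspective divide

def reprojection_error(observed_px, points_world, R, t, f):
    """RMS distance between observed and predicted target points."""
    predicted = project(points_world, R, t, f)
    return np.sqrt(np.mean(np.sum((observed_px - predicted) ** 2, axis=1)))

# Synthetic target: four coplanar points 3 m in front of an identity camera.
target = np.array([[-0.1, -0.1, 3.0], [0.1, -0.1, 3.0],
                   [0.1, 0.1, 3.0], [-0.1, 0.1, 3.0]])
R, t, f = np.eye(3), np.zeros(3), 4000.0  # focal length in pixels (assumed)
obs = project(target, R, t, f)
err = reprojection_error(obs, target, R, t, f)  # 0 for a perfect solution
```

A solver adjusts position, orientation, and focal length per camera until this error is minimized across all views of the target.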
Initial Capture
Figure 2: Hull, Datamap, and Refinement

Figure 2 illustrates the rough visual hull that is initially extracted from the array capture.
Reading left to right, the next picture is a map of the amount and quality of the point cloud data that has been inferred by the algorithm's first pass. Notice that there is a paucity of data in the area below the knees of the subject, indicated by dark areas. In contrast, from the waist upwards the algorithm has a large amount of good point cloud data with which to attempt refinement of the rough visual hull. The results of the refinement step are shown in the third picture. Note the improved detail around the arms and hands, the shirt collar, and the corrected orientation of the head. Despite looking like one, the final picture in this sequence is not a photograph, but a rendering of the refined model with the texture applied.
Figure 3: Low Data Management

Figure 3 shows in close-up the areas with gaps in the point cloud data, indicated by the dark areas in the first two pictures. The third picture illustrates how the algorithm's heuristics create a smooth, even mesh over these areas and, in the main, produce convincing results, especially on low-complexity areas such as those illustrated here. It's important to produce an even mesh to achieve the best results when the model is rendered.
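One common way to produce a smooth, even mesh over low-data regions is Laplacian smoothing: vertices without supporting point-cloud data are relaxed toward the centroid of their neighbours, while well-supported vertices stay anchored. This is a stand-in illustration of the idea, not the refinement algorithm the pipeline actually uses.

```python
import numpy as np

# Stand-in sketch of smoothing over low-data areas: simple Laplacian
# relaxation, where each free vertex moves toward its neighbours' centroid
# while data-anchored vertices are pinned in place.

def laplacian_smooth(vertices, neighbors, anchored, iterations=10, step=0.5):
    """vertices: (V, 3) array; neighbors: list of index lists per vertex;
    anchored: boolean mask of vertices pinned by good point-cloud data."""
    v = vertices.copy()
    for _ in range(iterations):
        centroids = np.array([v[nbrs].mean(axis=0) for nbrs in neighbors])
        free = ~anchored
        v[free] += step * (centroids[free] - v[free])  # relax toward neighbours
    return v

# Toy chain of three vertices: the middle one has no data and sticks out,
# so smoothing pulls it back into line with its anchored neighbours.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0], [2.0, 0.0, 0.0]])
nbrs = [[1], [0, 2], [1]]
anchored = np.array([True, False, True])
smoothed = laplacian_smooth(verts, nbrs, anchored)
```

The anchoring mask is what lets the heuristic fill gaps without degrading the areas where the point cloud is dense and reliable.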
Figure 4 illustrates the level of skin and facial detail that can be captured using 12 MP cameras at a range of approximately 3m from the subject. The first image is a direct capture from one of the cameras. It shows that skin and facial hair detail is adequately captured for 2D photographic purposes, and that this level of detail is also adequate when used to texture the extracted model, when the distance between it and the virtual camera is similar to that of the initial capture. The center image in this sequence illustrates the ability of the pipeline to extract a large amount of good positional data from the subject's facial area.
Figure 5: Present and Future Resolution Performance

Reading Figure 5 from left to right shows where the current state of the art starts to hit its technological limits. By pushing the virtual camera closer to the rendered subject, we see, in the first picture, that a noticeable amount of skin detail has been lost. The second two pictures in this figure were also captured with 12MP cameras, but at a much closer distance to the subject, to emulate the performance of 25MP cameras in the large rig. They convincingly demonstrate that the algorithm will produce much better output with higher-resolution input.
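The emulation trick follows from simple geometry: linear resolution on the subject scales with the square root of the pixel count and inversely with distance. The back-of-envelope below uses the figures mentioned in the text (12 MP at 3 m, emulating 25 MP); the exact emulation distance is our arithmetic, not a number stated in the paper.

```python
import math

# Back-of-envelope for emulating a higher-resolution sensor by moving
# closer: pixels-per-mm on the subject scale with sqrt(megapixels) and
# inversely with camera-to-subject distance.
mp_current, mp_target = 12.0, 25.0
distance_m = 3.0

linear_gain = math.sqrt(mp_target / mp_current)  # ~1.44x more pixels per mm
emulation_distance = distance_m / linear_gain    # ~2.08 m
```

In other words, moving a 12 MP camera from 3 m to roughly 2.1 m gives the subject the same pixel density a 25 MP camera would deliver from the full rig radius.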
Motion Capture
Figure 6: Capturing Motion

Figure 6 illustrates the results obtained with a subject that is moving in a dynamic manner. Note the integrity of the form and its freedom from rolling shutter artifacts. This is a starting point for full motion capture in 3D.
Figure 7: Compression Effects

Figure 7 illustrates the dramatic perceptual improvement that is gained by using self-lit models. Both of the models shown comprise roughly 6,000 polygons, compressed from a 200,000+ polygon original. The one on the left is self-lit; the one on the right uses modeled lighting. This effect can be exploited to make Internet delivery of sculptural photographs both convincing and efficient.
Conclusion
In this paper we have shown that the capture of photo-realistic 3D models is practical using passive 3D scanning techniques, coupled with photogrammetric analysis and commercially available digital cameras. We have demonstrated the limits that the state of the art imposes on the ability to capture skin detail with high fidelity, but also shown that these will likely be overcome by the next generation of DSLRs. Finally, we have demonstrated the viability of the content to be experienced interactively, via efficient Internet delivery, to underscore our claim that what we term Sculptural Photography is a practical new form of digital content. Glimpsing the future, we plan to extend the technology to full motion sculptural photography. The challenge will be to increase the frame rate achievable by a camera array, initially to produce frame-based motion capture. Parallel algorithmic development will deploy Artificial Intelligence methods to generate Inverse Kinematic metadata from the captured forms, thereby delivering models that are tractable to key-frame animation and motion path editing.
References
Hernández, C. and Vogiatzis, G. (2010). Shape from Photographs: A Multi-view Stereo Pipeline. In Computer Vision: Detection, Recognition and Reconstruction, Cipolla, Battiato, Farinella (Eds.). Springer-Verlag.

Kanade, Takeo et al. (1997). Constructing Virtual Worlds from Real Scenes. IEEE MultiMedia, Vol. 4, Issue 1. IEEE, Los Alamitos, CA.
Macmillan, Tim (2010). Stereo Image Acquisition using Camera Arrays. International Conference on Stereoscopic 3D for Media and Entertainment. SMPTE. New York.