David Drascic
Human Engineering Research and Consulting
241 Logan Avenue
Toronto, Ontario
Canada M4M 2N2
This article appeared in a slightly different form as:
"Stereoscopic Vision and Augmented Reality",
Scientific Computing and Automation 9(7), 31-34.
I have repaired some of the (minor) editing, and updated some
terminology, but it is essentially identical to the published
version.
Introduction
Future military operations will take place in increasingly hostile
environments. As technology advances, many non-military operations are
extending into hazardous environments as well, such as the ocean bed, the
interior of volcanoes, and outer space. Efficient deployment of teleoperated
and telemanaged robots will be essential for successful interaction with these
environments. Autonomous robotics, where the robot is capable of acting
without human intervention, is far from being achievable in unstructured
environments such as battlefields, bomb disposal scenarios, weapons handling,
and hazardous materials management. For the foreseeable future, remotely
controlled systems will depend on human intelligence and perception.
The effectiveness of human-machine systems is often determined by the quality
of the human-machine interface. Unfortunately, most existing telerobots are
equipped with standard monoscopic video (MV) displays as the main source
of information to the operator. MV displays eliminate all binocular depth cues
(i.e. eye convergence and disparity), as well as several monocular depth cues
(e.g. texture gradient). The loss of these important depth cues results in
situations where the location of objects in the remote scene is ambiguous.
While motion parallax or multiple views can sometimes resolve these
ambiguities, operating conditions may render these options unfeasible.
A related problem is the difficulty in estimating absolute sizes with an MV
system. It is difficult to determine whether an obstacle is too steep to
climb, or if a depression is deep enough to present a hazard. One British
study reported that using standard MV systems made bomb squad personnel
reluctant to use their remote manipulator. (Robinson, M. "Remote control
vehicle guidance using stereoscopic displays", Proc. Human Factors Society
Meeting, 1984)
Human Engineering Research and Consulting (HERC) recently investigated the
benefits of using 3-D, or stereoscopic video (SV) for teleoperation
applications in the Canadian Armed Forces. SV provides an immediate and
compelling sense of depth, which can greatly simplify teleoperation tasks
requiring delicate manipulation.
Stereoscopic Video Application Research
Stereoscopic video systems use two cameras to pick up images from two slightly
different perspectives, one for each eye of the operator. The display system
must channel these two different images to the appropriate eyes. The most
practical system, employing standard television equipment, uses an
alternating field approach. The images from the left and right cameras
are displayed alternately on the monitor. Special glasses are equipped with
liquid crystal shutters that switch between opaque and clear. These shutters are
electronically synchronised with the monitor, so that the left eye only sees
the image from the left camera, and the right eye only sees the image from the
right camera.
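The alternating field approach described above can be sketched in a few
lines of Python. This is only an illustration, not the software used in
the systems discussed here: the function names are hypothetical, and the
choice of showing the left image first is an assumption.

```python
def multiplex_fields(left_images, right_images):
    """Build the display sequence for an alternating-field stereo signal.

    Returns a list of (eye, image) pairs that alternates left, right,
    left, right. Starting with the left eye is an assumption.
    """
    sequence = []
    for left, right in zip(left_images, right_images):
        sequence.append(("left", left))
        sequence.append(("right", right))
    return sequence


def shutter_state(eye):
    """Return (left_shutter_clear, right_shutter_clear) for the field shown.

    Only the shutter matching the displayed field is clear, so each eye
    sees only the image from its own camera.
    """
    return (eye == "left", eye == "right")


if __name__ == "__main__":
    for eye, image in multiplex_fields(["L0", "L1"], ["R0", "R1"]):
        print(eye, image, shutter_state(eye))
```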
Since 1987, Prof. Paul Milgram of the Department of Industrial
Engineering at the University of Toronto and David Drascic, under
contract for the Defence and Civil Institute of Environmental Medicine
(DCIEM), have conducted a number of experiments at the University of
Toronto to investigate the benefits of SV for novice operators
attempting typical defence-oriented telerobotic tasks. In one
experiment, subjects performed a positioning task related to bomb
disposal teleoperation that required careful alignment of the telerobot
in depth. The difficulty of the task was varied by changing the
precision requirements. The results indicate that operators need
considerably less training to become proficient at this type of
telerobotic task, and can perform faster and with fewer errors when
using an SV display.
At the lowest level of difficulty, it was found that the benefit of SV
faded as subjects repeated a single task again and again. However,
whenever the task changed, the advantages of SV were once again
immediately apparent. At the highest levels of difficulty, the
performance advantages of SV were found even after subjects had
performed the same task many times. Since defence-related
teleoperation tasks, such as bomb disposal and hazardous materials
management, are all characterised by an unpredictable and changing
environment, operators will not have the luxury of repeating a task
several times. Thus even for very simple tasks, it is reasonable to
expect the benefits of SV to be significant and important. For
difficult tasks, it can mean the difference between success and
failure.
More recently, Human Engineering Research and Consulting (HERC), in
conjunction with DCIEM, conducted an investigation into the benefits of
using SV for teleoperation applications in the Canadian Armed Forces
for experienced telerobot operators. Using several tasks related to
bomb-disposal teleoperation, these experiments showed that even expert
operators perform better when using SV. More importantly, the
operators strongly preferred SV to MV, judging it highly desirable for
a variety of tasks, and rating it more usable and more comfortable to
use than a comparable MV display.
Stereoscopic Video Systems
All the research described above was performed using an NTSC-based SV
system, originally developed by Milgram and Drascic, and later updated
at DCIEM. This system uses standard cameras, monitors and video
equipment. The SV signal is a standard video signal that can be
recorded with any VCR. This system can be implemented for under
US$4,000 without cameras. NTSC monitors have an image refresh rate of
60 Hz. Using the alternating field SV technique, each eye sees only
half of these images, and thus has a 30 Hz image update rate. As a
result there is a perceptible flicker in the image that many operators
find distracting at first. Nonetheless, operators of all skill levels
adapted very quickly to this SV system, most strongly preferring it to
the MV system. No eye-strain attributable to the SV system was
reported even after several hours of use; in fact, most operators rated
the SV display more comfortable and more usable than the original MV
display.
Until recently, the high cost and technical complexity of flickerless
SV systems have limited their use, but the recent introduction of 120 Hz
SV systems has made it possible to consider these systems for a wide
range of new applications. Several different systems are available,
ranging in price up to US$15,000. DCIEM has obtained one of these
systems and is considering it as an alternative to the low-end NTSC SV
system. It is expected that the flicker-free display will be more
easily accepted by the operators and should result in greater user
satisfaction with the display. Initial results are encouraging, but
cross-talk (seeing the right image with the left eye, and vice versa)
due to phosphor persistence in the 120 Hz monitor is distracting. It
remains to be seen whether the lack of flicker will outweigh the
greater cross-talk and considerably greater expense.
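The flicker figures quoted above follow from simple arithmetic: with
the alternating field technique, each eye receives half of the images
the monitor displays. A minimal sketch (the function name is
hypothetical):

```python
def per_eye_rate_hz(display_rate_hz, eyes=2):
    """Image update rate seen by each eye when the display alternates
    between the eyes on successive images."""
    return display_rate_hz / eyes


if __name__ == "__main__":
    for rate in (60, 120):
        print(rate, "Hz display:", per_eye_rate_hz(rate), "Hz per eye")
```

A 60 Hz NTSC monitor therefore gives each eye 30 Hz, while a 120 Hz
system gives each eye 60 Hz, the same rate a single eye receives from
an ordinary NTSC display.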
Augmenting Reality with ARGOS
Improving the display of a telerobot is only one aspect of the
human-machine interface. Another very important aspect is the method
used to communicate human goals and instructions to the telerobot.
Most telerobots in use today are almost entirely manual, requiring the
constant attention of the operator. Great strides have been made in
giving telerobots a certain degree of intelligence at executing
low-level tasks. Robots have been created that are capable of driving
from one location to another while avoiding obstacles, or reconfiguring
a multi-joint manipulator to move the end-effector to a new location.
In order to use one of these systems in an interactive telerobotic
situation, the operator needs to be able to communicate precise
three-dimensional co-ordinates to the telerobot. Such co-ordinates may be
known or defined in well-specified environments, such as a laboratory,
but until recently there was no practical technique available for
specifying such co-ordinates in the field.
Since 1989, Drascic and Milgram have been breaking new ground by
combining computer generated stereoscopic graphics with live
stereoscopic video (SV), a technology they dub ARGOS, which stands for
"Augmented Reality through Graphic Overlays on Stereo-video". Using
ARGOS it is possible to create virtual objects that appear to exist in
the video image. By generating a carefully calibrated virtual
pointer of some sort, and allowing the operator to adjust the
position of this pointer in the three-dimensional video space, it is
possible for the operator to indicate a precise destination for the
telerobot, or to indicate a path for it to follow. Positioning a
virtual pointer is a much simpler task than driving a telerobot. Using
such an interface would reduce operator workload considerably.
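To make the idea of a calibrated virtual pointer concrete, the sketch
below draws a 3-D point at the matching location in the left and right
images. It assumes an idealised pair of parallel pinhole cameras; the
article does not describe the camera geometry or calibration procedure
actually used in ARGOS, and the function and parameter names
(focal_px, baseline, cx, cy) are hypothetical.

```python
def project_pointer(point, focal_px, baseline, cx, cy):
    """Project a 3-D point into the left and right images.

    point    -- (x, y, z) relative to the midpoint between the cameras,
                with z measured along the viewing direction
    focal_px -- focal length expressed in pixels (assumed known)
    baseline -- separation between the two cameras
    cx, cy   -- pixel co-ordinates of the image centre
    Returns ((u_left, v), (u_right, v)).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the cameras")
    half = baseline / 2.0
    u_left = cx + focal_px * (x + half) / z   # left camera at x = -baseline/2
    u_right = cx + focal_px * (x - half) / z  # right camera at x = +baseline/2
    v = cy + focal_px * y / z                 # same row in both images
    return (u_left, v), (u_right, v)


if __name__ == "__main__":
    # Illustrative values only: pointer 2 m straight ahead.
    print(project_pointer((0.0, 0.0, 2.0), 800.0, 0.1, 320.0, 240.0))
```

The horizontal offset between the two drawn positions is the disparity
that makes the pointer appear at the chosen depth in the video scene.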
An experiment was conducted to determine how accurately subjects could
align a virtual pointer with real world targets. This experiment
showed that the calibration of the graphics with the video was
successful, and that subjects could align the virtual pointer
essentially as well as they could a real pointer in the video space, at
the limits of their depth perception as determined by the display
system, i.e. one pixel.
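The one-pixel limit can be turned into a depth figure under a
simplifying assumption. For parallel pinhole cameras, depth equals
focal length times baseline divided by disparity, so a one-pixel change
in disparity corresponds to a depth step that grows with distance. The
sketch below uses that model; it is not the analysis from the
experiment, and all names and numbers are illustrative.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Depth of a point given its disparity in pixels (parallel cameras)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px


def depth_step_for_one_pixel(depth, focal_px, baseline):
    """How much nearer a point at `depth` appears if its disparity
    grows by exactly one pixel."""
    disparity_px = focal_px * baseline / depth
    return depth - depth_from_disparity(disparity_px + 1.0, focal_px, baseline)


if __name__ == "__main__":
    # Hypothetical set-up: 800-pixel focal length, 0.1 m baseline.
    for depth in (1.0, 2.0, 4.0):
        print(depth, "m:", depth_step_for_one_pixel(depth, 800.0, 0.1), "m")
```

With these hypothetical values, one pixel of disparity at a distance of
2 m corresponds to a depth step of roughly 0.05 m, and the step becomes
coarser as the target moves further away.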
ARGOS is the foundation of the University of Toronto's Augmented
Reality system. Much media attention has been devoted to the
phenomenon of Virtual Reality, which generally entails immersing
people in completely artificial computer-generated worlds, using as
many different senses as possible to complete the illusion. By
contrast, Augmented Reality does not attempt to create a virtual
world; instead, its goal is to allow the user to perceive the real
world more clearly and with greater understanding than is possible
using ordinary vision.
Several different kinds of Augmented Reality systems exist. ARGOS is
one of the simplest and most robust, because it uses a standard monitor
as the stereoscopic display device. Other augmented reality systems
use immersive head-mounted displays, but there are many perceptual and
calibration issues that remain to be resolved before these systems can
be used by industry.
Since the virtual pointer can be used to specify single points in the
remote space, it is a simple extension to create a virtual
tape-measure, so that the operator can make measurements of the
locations and sizes of remote objects.
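A virtual tape-measure of this kind reduces to two steps: recover the
3-D position of each pointer placement, then take the distance between
them. The sketch below assumes the same idealised parallel pinhole
camera model as a stand-in for the real calibration, which the article
does not detail; the function and parameter names are hypothetical.

```python
import math


def triangulate(u_left, u_right, v, focal_px, baseline, cx, cy):
    """Recover (x, y, z) from the pointer's pixel position in each image.

    Assumes parallel cameras separated by `baseline`, a focal length of
    `focal_px` pixels, and image centre (cx, cy).
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline / disparity
    x = (u_left - cx) * z / focal_px - baseline / 2.0
    y = (v - cy) * z / focal_px
    return (x, y, z)


def tape_measure(point_a, point_b):
    """Straight-line distance between two 3-D points."""
    return math.dist(point_a, point_b)


if __name__ == "__main__":
    # Illustrative pixel positions for two pointer placements.
    a = triangulate(340.0, 300.0, 240.0, 800.0, 0.1, 320.0, 240.0)
    b = triangulate(420.0, 380.0, 240.0, 800.0, 0.1, 320.0, 240.0)
    print(tape_measure(a, b))
```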
As a further example of Augmented Reality, consider a space-going
telerobot. All video images in space suffer the same problem with
shadows: because there is no air in space to scatter light, shadows are
completely black, and anything in shadow is completely invisible.
However, since the dimensions of everything sent into space are very
well known, it is possible to use ARGOS to generate the missing images,
carefully drawn to appear at the correct location in the video
image.
In other situations, objects that may be invisible to normal vision may
be detectable with other sensors. In many underwater situations,
normal vision is good only for a very limited distance. While it is
easier to see through murky depths with SV than with MV, operators are
still very limited. However, using radar, sonar, and infra-red
cameras, it is possible to sense objects that would otherwise be
invisible. If the information from these sensors is sent to the ARGOS
computer, appropriately shaped graphic objects can be drawn at the
correct position in space, in effect making visible what is normally
invisible.
Similarly, information from various medical imaging sensors, such as
CAT, PET, and NMR scanners can be used to generate graphic images of
the interior of the human body. These images can be super-imposed onto
a live video image of the body using ARGOS, and seen in three
dimensions, providing a clear advantage over systems that use flat
two-dimensional displays.
Improving the human-machine interface of telerobots will enable them to
fulfil the myriad tasks they will be facing in the future.
Stereoscopic Video and Augmented Reality can greatly improve the
feedback of information from the remote machine to the human operator,
and tools such as the Virtual Pointer can greatly facilitate the
communication of human instructions to the machine.
The Author
David Drascic has been working in the field of telerobotics and
stereoscopic displays since 1987. He received his MASc in Industrial
Engineering from the University of Toronto in 1991. He founded Human
Engineering Research and Consulting (HERC) in 1990. Further
information on the research described herein can be obtained by
contacting Prof. P. Milgram, Industrial Engineering, University of
Toronto, 4 Taddle Creek Road, Toronto, Ontario, Canada, M5S 1A4.