What are the most important points when overlaying CG images with real-space images?
- Aratani
- Well, the most important, yet most difficult, point is the alignment of the real-space images and the CG overlays. For example, let's say we want to show a CG of a vase sitting on top of a real table. It won't look realistic if the vase is floating in the air or embedded in the table. What we need to do is accurately align the virtual object with its position in real space. Alignment has been a central challenge for Canon's MR technology for many years, and research and development continue today.
How is alignment achieved?

- Aratani
- Alignment is achieved by determining the position and orientation of the video cameras mounted on the HMD, which capture images from the user's point of view. That position and orientation are then assigned to the virtual cameras used to render the CG, so that the CG images line up with the real-space images when they are superimposed.
We can divide alignment technology into two main parts: the vision system and the sensor system. The vision system makes use of the cameras mounted on the HMD. We use registration markers specially developed by Canon, which are placed on the real object in advance and whose three-dimensional positions are measured. Then, when we align the real object with the CG images to create an MR experience, the markers are detected in the images captured by the camera. We can then calculate the current position and orientation of the camera relative to the three-dimensional positions determined earlier.
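As an illustration of the idea of recovering a camera pose from markers whose positions were measured in advance, here is a minimal sketch. It is not Canon's method: it assumes a simplified two-dimensional world, assumes the markers have already been detected and expressed as points in the camera's own coordinate frame, and uses a standard least-squares rigid alignment. The function name `estimate_pose_2d` is hypothetical.

```python
import math

def estimate_pose_2d(world_pts, camera_pts):
    """Least-squares rigid alignment in 2D (illustrative assumption).

    Returns (theta, tx, ty) such that, for each marker,
    world ~= R(theta) * camera + (tx, ty).
    (tx, ty) is then the camera position and theta its heading
    in the world frame.
    """
    n = len(world_pts)
    # Centroids of the markers in each frame.
    cwx = sum(p[0] for p in world_pts) / n
    cwy = sum(p[1] for p in world_pts) / n
    ccx = sum(p[0] for p in camera_pts) / n
    ccy = sum(p[1] for p in camera_pts) / n

    s_dot = 0.0    # sum of dot products of centered pairs
    s_cross = 0.0  # sum of 2D cross products of centered pairs
    for (wx, wy), (cx, cy) in zip(world_pts, camera_pts):
        ax, ay = cx - ccx, cy - ccy  # camera-frame point, centered
        bx, by = wx - cwx, wy - cwy  # world-frame point, centered
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    # Rotation that best maps camera-frame points onto world points.
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated camera centroid onto the world centroid.
    tx = cwx - (c * ccx - s * ccy)
    ty = cwy - (s * ccx + c * ccy)
    return theta, tx, ty
```

A real vision system works from 2D image projections of 3D markers and solves a perspective pose problem instead, but the structure is the same: known marker positions plus current observations yield the camera's position and orientation.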
So, you need real-space measurement data and reference points, right? In that case, couldn't you use something like a pattern on the floor or a picture on the wall?
- Aratani
- In principle, that would be possible. From the standpoint of appearance and providing a natural experience, the use of patterns and pictures offers many advantages. But because detection accuracy is not as good as when using markers, it's difficult to make use of them in an actual situation.
I see. What about the sensor system?
- Aratani
- There are many different sensors, such as gyro-sensors, infrared position and orientation sensors, and magnetic-type position sensors. Magnetic sensors, for example, are vulnerable to the effects of surrounding magnetic fields, so depending on where they are used, their accuracy may not be very good. Therefore, we use a hybrid vision and sensor system, which uses markers photographed by the cameras to correct for errors in the sensor measurements, providing highly accurate position matching.
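The hybrid idea described here, using marker observations to correct errors in the sensor measurements, can be sketched as follows. This is only an illustration under stated assumptions, not the actual system: it tracks position only (orientation is omitted), models the sensor error as a slowly varying offset, and blends in the vision-based measurement with a simple smoothing gain. The class name `HybridTracker` and the `gain` parameter are hypothetical.

```python
class HybridTracker:
    """Corrects a drifting sensor position with marker-based fixes (sketch)."""

    def __init__(self, gain=0.5):
        self.gain = gain                # how quickly the offset follows new fixes
        self.offset = (0.0, 0.0, 0.0)   # current estimate of the sensor error

    def update(self, sensor_pos, vision_pos=None):
        """Return the corrected position for this frame.

        sensor_pos: (x, y, z) reported by the position sensor.
        vision_pos: (x, y, z) computed from detected markers, or None
                    when no markers are visible in this frame.
        """
        if vision_pos is not None:
            # Error observed in this frame: vision minus sensor.
            error = tuple(v - s for v, s in zip(vision_pos, sensor_pos))
            # Move the stored offset toward the observed error.
            self.offset = tuple(
                o + self.gain * (e - o) for o, e in zip(self.offset, error)
            )
        # Apply the latest offset, even in frames without markers.
        return tuple(s + o for s, o in zip(sensor_pos, self.offset))
```

With this design, frames in which markers are visible refine the correction, and frames without markers still benefit from the most recent estimate of the sensor error.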
OK, so we now understand the position matching, but the user's position and orientation are always changing, right? How do you deal with that?
- Aratani
- The cameras mounted on the HMD shoot at 30 frames per second. The markers are detected in the image in each frame and matched with the sensor measurements to determine the camera's position, so the system keeps up with changes in the user's movements. In actuality, there is a delay of about two to three thirtieths of a second (that is, two to three frames) in the image display, but this delay is not enough to affect the experience.
- Matsui
- This delay also includes the time required for CG rendering.
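To put the quoted delay into more familiar units, the arithmetic can be written out as a short sketch. The 30 frames per second and the two-to-three-frame delay come from the interview; the helper name `delay_ms` is hypothetical.

```python
FRAME_RATE_HZ = 30  # camera frame rate stated in the interview

def delay_ms(frames, frame_rate_hz=FRAME_RATE_HZ):
    """Convert a delay expressed in frames into milliseconds."""
    return frames * 1000.0 / frame_rate_hz

low = delay_ms(2)   # about 66.7 ms
high = delay_ms(3)  # 100.0 ms
print(f"display delay: {low:.1f} ms to {high:.1f} ms")
```

In other words, the end-to-end delay, CG rendering included, is on the order of 67 to 100 milliseconds.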
What kinds of things have you done to improve the rendering speed?
- Matsui
- In addition to software technology, there are many methods to achieve high-speed rendering. One particularly effective way is to load the CG data into the graphics board memory. Improvements have been made in the way data is written to the board and the way the rendering commands are written. But despite this, just as in computer-aided design (CAD), many problems can occur when dealing with a huge amount of original CG data, often running to tens or hundreds of millions of polygons (the basic element of CG rendering).
In terms of securing the position matching and rendering speed, does the length of cable connecting the HMD to the PC have any effect?

- Nakanishi
- Yes, it has a major effect. The cables we use are about 8 mm thick. Displaying the left and right images and transmitting the camera images takes a total of four channels, and the limit of this technology is about 10 m in length. Any longer than this and the transmission speed would be so slow we wouldn't be able to use it. But if the cable is too thick, the HMD is going to feel very heavy and awkward for the user. One solution we're thinking of is to compress the data and transmit it wirelessly, but we're still working on this.

