
Summary

Over the past year or so there has been large growth in the sales of smartphones [16], enabling users to be connected to the internet wherever they are. The major selling points of these devices are the power of their processors and the third party application stores available on a number of platforms. However, it is widely believed that the bottleneck of mobile devices is the screen size, which limits the amount of data that can be presented to the user [27, 41], and this is leading some manufacturers to begin embedding projectors into their devices to give users a new method of displaying and sharing information [13, 12, 5].

This project researches the possibilities of using such a device with a steerable projector to give the user an automatic way to enhance their shopping experience. The user would wear their mobile device around their neck, such that the projector and camera point outward in front of them. As the user picks up items of interest, an image processing algorithm registers the product and the system searches for a suitable projection area in which to project relevant information (e.g. ratings, reviews, related products). The potential areas for such a product are any situation where a user may wish to be given information but does not want or need to give input to the system. Apart from the retail use already mentioned, an example is engineering work where someone does not have a free hand to use a separate device but would benefit from additional information.

If a company were to invest in this product, it would be best to first create a smartphone application for some of the leading smartphone platforms, using a traditional touchscreen interface. This would be followed by research and development into the use of projector phones, integrating them into the existing application. By doing this the company can sell the product to retailers on a software-as-a-service model as early as possible, and once projector phones become more widely adopted, the company could introduce the full system. I would expect such a company to require an investment of £1.3m and to be sold after 5 years for approximately £15m. This is based on working in the UK and selling the service to UK retailers only.

There is currently no product that offers the features of an online e-commerce store in a retail store setting. After discussing the product with a top e-commerce consultant, he said that retailers are very keen to enter the smartphone market, showing there is a customer base for such a product. For retailers, projecting information to the user is a new method that can be used to show targeted advertising and product ratings. This should improve sales, since research has shown that customers are willing to pay 20-99% more for a 5 star rated product than a 4 star rated product [9].

Current research into such systems has focused on highly set-up and modelled environments using much larger projector systems, or on user interaction with the projecting device. This research is into the development of a passive, ubiquitous system with a focus on presenting information to the user, expecting no user input.

The main achievements of the project so far are:

- Wrote an Android application (in Java) which communicates with a laptop (via Bluetooth) to control a GUI (in Python)
- Learnt to program in Python and handle threading with a Python GUI
- Created an algorithm which aims to find the largest empty space in a camera view (see page 36)
- Implemented a client-server protocol to use an image recognition technique
- Created custom hardware to create a steerable projection

Contents
1 Motivation
  1.1 Inspiration
  1.2 Application Areas
  1.3 Specialist Knowledge
  1.4 Benefits
  1.5 Aims and Objectives
      1.5.1 Structure

2 Related Work - Projection
  2.1 Projection Technology
      2.1.1 LCD Projection
      2.1.2 Digital Light Processing (DLP)
      2.1.3 Scan Beam Laser Projection
      2.1.4 Holographic Projection
  2.2 Projector Form Factor and Design
      2.2.1 Projector Phones
      2.2.2 Steerable Projection
  2.3 Keystoning
      2.3.1 Embedded Light Sensors
      2.3.2 Smarter Presentations
  2.4 Projection Uses and Interactions
      2.4.1 View & Share
      2.4.2 Search Light
      2.4.3 Projection Technology Summary
  2.5 MobileEye's Projection System

3 Related Work - Object/Image Recognition
  3.1 Image Recognition
      3.1.1 SIFT
      3.1.2 Indexing Scale Invariance
      3.1.3 SURF - Detector
      3.1.4 SURF - Descriptor

4 Smartphones
  4.1 Camera Technology
      4.1.1 Well Adjusting Capacitors
      4.1.2 Multiple Capture
      4.1.3 Spatially Varying Pixel Exposures
      4.1.4 Time to Saturation
  4.2 Platform
      4.2.1 Dalvik Virtual Machine
      4.2.2 Application Development
      4.2.3 Security, Intents and Receivers
      4.2.4 Bluetooth and Camera API

5 Project Execution
  5.1 Tool and Language Choices
      5.1.1 Mobile Application
      5.1.2 Hardware for Projection Rotation
      5.1.3 Projection UI
      5.1.4 Object Recognition
  5.2 Space Finding Algorithm
      5.2.1 Histogram Data
      5.2.2 Hill Climbing
      5.2.3 Area Extraction
      5.2.4 Application Structure

6 Project Status
  6.1 Current Status
      6.1.1 Projector UI
      6.1.2 Android Application
      6.1.3 Image Recognition
      6.1.4 Hardware
      6.1.5 Aims Achieved
  6.2 Future Work
      6.2.1 Depth
      6.2.2 Automatic Object Recognition
      6.2.3 Space Finding Algorithm

A Space Finding Appendix
  A.1 First Averaging and Thresholding Tests
  A.2 Averaging and Thresholding Improvements
  A.3 Dalvik

Chapter 1 Motivation
There are a number of factors in the motivation of this project. Smartphones are becoming more powerful and the platforms which run on them are becoming more stable and open, offering third party developers access to the device's hardware through standard and reliable APIs. These mobile devices are able to store and play high resolution media, yet the screen size acts as a bottleneck [27, 41]. New projection technologies have made it possible to put projectors inside mobile devices. Manufacturers intend these projectors to be used for viewing media and overhead projection [48], but research in the HCI field has been ongoing into new interfaces that could be created. The aim of this project is to research the development of a ubiquitous computing system that will take advantage of this new hardware, to recognise relevant products in a user's scene (from the mobile device's camera) and find a suitable area within which to project relevant data to the user. From this point onward, when I refer to the MobileEye system, I will be referring to this system: a projector phone capable of recognising items and projecting relevant information into the user's environment.

1.1 Inspiration

One of the biggest inspirations for this project is the gestural interface commonly known as SixthSense [39] (previously called Wear Ur World [40]). It comprises a camera and a projector; the projector displays information onto the user's surroundings and the camera is used to interact with the display through gestures. This is an impressive and thought provoking implementation of such a technology. However, it is hard to gauge the stage of the implementation from the video demonstration, and many of the functionalities appear staged. The main method of interaction with the system is through gestures which are registered by coloured finger caps. The caps enable the user to perform tasks such as taking pictures by making a frame gesture, drawing on a projected screen, watching videos on a newspaper and much more [37, 38]. But each of these requires some form of visual marker. The use of visual markers means the implementation can be done effectively, giving reliable projection areas (where appropriate) and gesture recognition. Using the markers eases the image processing required to distinguish a hand gesture from fingers alone or to determine suitable projection areas. The implementation runs on a laptop computer carried around on the user's back; the reason for this may have been to enable the use of existing software (e-mail client, drawing applications etc.) which was shown in the demonstration, but it does hide any constraints that might apply to a mobile device.

Figure 1.1: Examples of the SixthSense features. Notice the finger caps and markers used for each interaction.

The area I was most intrigued by was the possibility of the system projecting onto newspapers, books and products, similar to that shown in [38]. SixthSense implements this, but projection is done directly onto the book cover, regardless of what is covering the projection area, and the newspaper is calibrated for the projector by the use of markers (as seen in D of figure 1.1). MobileEye is an attempt to address these problems: to consider where (or even when) it is best to project information and whether there is a way of achieving this without altering the environment the system is used in.

1.2 Application Areas

Because personal projection is such a new technology and the research into methods suitable for its use is still developing, there are no products of this kind on the market. Some manufacturers have begun adding projection to their devices to try to overcome the screen size bottleneck, such as the Samsung Halo mobile phone and the Nikon S1000pj digital camera [13, 12, 5]. The MobileEye system is a product in itself, suitable for use in a retail setting: when a user picks up a product, a book for instance, it would be desirable (on behalf of the customer and the retail store) to project information about the book, such as ratings and related products. This will encourage the user to buy products with a high rating, and if the product has a low rating, they can be recommended other more popular choices. But the whole MobileEye product could be used in any environment or situation which requires or benefits from additional information based on the user's current scene. Another example is engineers who do not have their hands free but require reference material; the projector can project this information onto the user's surroundings without the user's input, enabling them to keep their focus on the task at hand. The keystoning and space finding techniques are also suitable for use in the existing consumer products previously mentioned, since at present their projection is static and it is the user's responsibility to move and calibrate the projector accordingly.

1.3 Specialist Knowledge

To complete this project, a number of areas need to be researched to cover each component.

Projectors and HCI. Mobile projection is a new research area with a number of papers targeted at how people interact with these devices. I need to look into what research has been done and whether it supports or goes against the method of interaction I am proposing, where the projection gives information to the user expecting no input from the user. This also includes any research into steerable projection which may influence the design of the hardware or highlight any key problems that may occur.

Object Recognition. I expect this to be an extremely broad topic with a number of alternatives, each with varying levels of success. The aim will be to look into some of these choices and make an informed decision as to the most suitable method for MobileEye. It is worth pointing out that I expect this image processing to be outsourced to a computer with stronger computational capabilities than what is available to the mobile device.

Space Finding Algorithm. This is the opposite of the object recognition algorithm, in that the aim will be to find a part of the image view with no noise. I expect this to be developed from a number of other methods and techniques that will inspire the algorithm used for MobileEye, which will be targeted at running on a mobile device.

Each of these topics is discussed in the relevant chapter of related work.

1.4 Benefits

There seems to be little research into SixthSense-like devices; the main focus is on using projector phones in a stationary way, i.e. the user has to stand in front of a suitable surface and project or overlay some information onto the wall. So I am aiming to create a basic implementation that could then be built upon and changed by others to test their own take on MobileEye as well as add to it. The space finding algorithm obviously is not limited to a mobile device and may have applications in other research areas. I also think there are some interesting research possibilities in interacting with multiple projectors on a single screen, where each projector only takes up a section of the available projection space. At the moment a number of projects aim to stitch together projection views to create a single projection area, but what I am suggesting is that each projector be responsible for a small section of the available space. I would not expect this research to benefit consumers directly, but as I have said, I hope researchers may find specific aspects of it useful or wish to develop the system as a whole further.

1.5 Aims and Objectives

- Research the possibility of having a space finding algorithm to aid projection
- Implement and improve such an algorithm to run on a mobile device
- Perform a study into the feasibility of a ubiquitous system requiring no user input
- Research image processing algorithms for object recognition
- Implement object recognition suitable to run on, or be used by, a mobile device

1.5.1 Structure

The remainder of this report covers related work (on projection and on object/image recognition), the development of the implementation, the current project status and future work.

Chapter 2 Related Work - Projection


2.1 Projection Technology

There are a number of technologies used for producing and projecting images onto a surface. Since MobileEye is intended to be used in a consumer device, there are several key factors required to make a technology suitable, including size, power consumption and brightness.

2.1.1 LCD Projection

LCD projection is one of the most widely used technologies in larger projector systems, although it is not used in smaller form factor projectors. LCD systems function by taking a bright light source and passing it through a number of dichroic mirrors. Each dichroic mirror acts as a light filter, allowing only certain wavelengths (colours) through the mirror while reflecting others [29]. By passing the light through the dichroic mirrors, the light source is divided into red, green and blue beams. These beams are then passed through three individual LCD chips, which create three different coloured versions of the image. These images are passed into a prism and combined to form the projected image [29]. See figure 2.1 for an illustration.

2.1.2 Digital Light Processing (DLP)

Digital light processing (DLP) projection is the technique most commonly implemented in consumer devices [48]. DLP works by shining a light source onto a set of micro-mirrors. These mirrors are formed together into a digital micro-mirror device (DMD) (see figure 2.2). When a digital signal, such as an image or video, is passed into the DLP system, the mirrors change angle with respect to the amount of light needed to create a greyscale version of the image. Each mirror moves thousands of times a second; the more an individual mirror is switched on, the lighter the pixel becomes [31]. This method on its own only produces a greyscale image. To introduce colour, a colour wheel is inserted between the light source and the DMD. The mirrors then move synchronously with the colour wheel, projecting a number of different coloured versions of the image [29]. The human eye then merges these colours into the intended colour. See figure 2.3 for a full diagram.

One problem DLP projection suffers from is that if the colour wheel spins too slowly, it can cause adverse effects for some viewers. The problem is known as the rainbow effect; this occurs when the colours appear to be oscillating, i.e. the red, green and blue images do not merge but appear one after the other [46]. It is also possible to have more than the RGB colours on the colour wheel; however, the speed at which the colour wheel rotates will probably need to be increased to ensure the rainbow effect does not occur.

Figure 2.1: LCD Projection

Figure 2.2: Digital Micro-Mirror Device (DMD)

Figure 2.3: Full DLP System

Figure 2.4: Scanned Laser Beam Projection

2.1.3 Scan Beam Laser Projection

PicoP® is a proprietary engine which displays images using a scanned-beam laser technique and works as follows. Three lasers (red, green and blue) each have a lens close to their output. The lens provides a low numerical aperture (in lasers this means the laser's point is well defined and not blurred or faded over a large area). The three beams are joined with dichroic elements into a single white beam; the dichroic elements act as a colour filter, allowing certain wavelengths to pass through while reflecting others. This white beam is relayed onto a MEMS (Micro-Electro-Mechanical System) scanning mirror, which is used to scan over the surface [50]. The combined lasers essentially create a single pixel of the correct colour and establish the focus, and the 2D MEMS scanner then paints the pixels of the image by moving to position the laser beam correctly for each pixel [25]. See figure 2.4 for an image illustrating the process.

The advantage of this technique is that it has unlimited focal length (something which traditional projectors suffer from [30]), has low power consumption and is suitable for implementation in small form factors. A problem laser projection can suffer from is speckle contrast. Speckle occurs when a number of waves of different phases add together to give a wave with high intensity [54]. This is prominent in laser projection and gives the viewer a perceived lower image quality, although there are methods to prevent it from occurring.

2.1.4 Holographic Projection

Light Blue Optics has come up with a method of holographic projection, something which has previously been infeasible due to its computational complexity. However, LBO (Light Blue Optics) have found a way to use the holographic projection method in real time.

Figure 2.5: Holographic Projection

Traditional holographic projection works by taking a hologram h(u,v), which is often a fixed structure referred to as a diffraction pattern, and illuminating it with light of wavelength λ, which is then passed through a lens. An image F(x,y) is produced at the back focal length of the lens due to the relation between the hologram and its discrete Fourier transform shown in 2.5 [21]:

F(x,y) = F[h(u,v)]   (2.1)

The main problem with this approach is that to calculate the hologram h(u,v) you would need to use the inverse Fourier transform of the image, which would give a fully complex result, and there is no microdisplay that can handle this information (where the microdisplay would need to store h(u,v)). LBO's approach was to quantise this result (make h(u,v) a set of phase-only values), making it feasible to use on a microdisplay, at the cost of reduced image quality. The loss in quality is overcome by displaying a number of versions of a single video frame at a fast enough rate that the human eye blends the images together to create a high quality image. This is applied to each frame to give high quality video projection. Like scan beam laser projection, this technique adds colour to the display through the use of red, green and blue lasers and has unlimited focal length. While LBO's holographic technique does still suffer from speckle contrast, the method of displaying multiple versions of a single video frame gives effective speckle reduction in itself, and additional speckle reduction methods can be applied on top. LBO's system is energy efficient as it turns lasers off when they are not required, is eye safe and has a small form factor.

2.2 Projector Form Factor and Design

2.2.1 Projector Phones

An interesting aspect of projector phones is where a projector and camera should be placed on such a device. Since these devices are so new, this is something that should be considered to offer the most flexible and suitable option. Enrico Rukzio came up with an interesting mapping of possible positions for each component. Overlaid onto this are the possible uses of each component position, which of these uses are currently being researched, and what the manufacturers expect the system to be used for [48]; see figure 2.6. The most interesting observation is that most research with these devices is happening with layouts of components not used by manufacturers. This is clearly in part due to the simple fact that the intended uses of these products are extremely different from the uses being researched, but it does pose a barrier to getting some of these research techniques into consumer products.

Figure 2.6: Camera Projector Placement

There is one form factor ignored in this diagram, which is steerable projection. Most steerable projection methods that have been implemented are on traditional, larger projectors (discussed below), but these designs could meet two or more of these projector positions while still only requiring a single projection component. It may also be possible to switch the position of cameras through the use of steerable mirrors as well; this would enable manufacturers to satisfy a large number of uses from a single device. Alternatively, a form factor which moves the entire projector could be implemented, similar to the WowWee Cinemin Swivel pico projector shown in figure 2.7.

2.2.2 Steerable Projection

One of the biggest restrictions on projector systems, regardless of the technology used, is that the image is projected directly forward. There are some obvious problems and advantages to using mirrors to reflect and move the target projection. The Everywhere Display is a research project at IBM [44] in which a steerable projector with a camera was used to change any surface into a touchscreen interface. A common problem with projection is that the projected image becomes distorted if the projector is not orthogonal to the projection surface. This is an evident problem with steerable projection, as the surface being projected onto is changing. In the Everywhere Display this problem was initially overcome by pre-computation and 3D modelling of the environment [44]. From this 3D model, they treated the projector as a camera in the 3D world, placed a texture map of the image they wished to see, and used this information to create the pre-warped image that, when projected in the real world, would appear undistorted [43]. This method was time consuming and future research in the project led to a different approach being implemented. The method proposed was to place a paper pattern in the scene so that a transformation P can be found between the camera and the surface by finding the four corners of the paper pattern [45].

Figure 2.7: WowWee's Cinemin Swivel Projector [56].

The relationship between the paper pattern corners and the corners observed by the camera is defined by:

P C = B   (2.2)

where C is the matrix of the four corners from the point of view of the camera and B is the matrix of points on the paper pattern. Intuitively this makes sense: given any point in the camera frame, if we apply some transformation we should be able to calculate the corresponding point on the projected surface (where the paper pattern lies). To calculate P, the pseudo-inverse is computed (the pseudo-inverse generalises the matrix inverse to non-square m x n matrices):

P = B C^T (C C^T)^-1   (2.3)

This only gives us the relation between the camera and the projection surface, but to calibrate the projection, a relationship H between the camera and projector is needed. This can be obtained by projecting a pattern consisting of four points D. When this pattern is projected it gives four points on the projected surface, the set of points E. These can be observed by the camera, giving the points F in the camera frame. We know the points E are related directly to both the projector's frame and the points viewed by the camera. Viewing this as a transformation needed to change the original points D into the points E, we can define the relationship as P' D = E. The relationship to the camera frame is given similarly by P F = E, where the relationship between the camera and surface (P) has already been calculated. From this we know P' D = P F; the pseudo-inverse is then calculated, which can be used to calculate H by taking the inverse of P':

P' = P F D^T (D D^T)^-1   (2.4)
H = D D^T (P F D^T)^-1   (2.5)

From this the transformation can be applied to warp the mesh. While this is similar to the keystoning correction needed for this project, it would be preferable not to be restricted by the requirement of having a paper pattern on the projected surface.

Figure 2.8: Everywhere Display Prototype

A third alternative suggested in the paper was to find four corresponding points in two 2D projective spaces, which were then manually adjusted to get a good version of the pre-warped image. This is again unsuitable for MobileEye, as we do not want the user to interact with the system. In terms of hardware for the Everywhere Display, a movable circular mirror was placed over the projection lens. The hardware was constructed using the system from a disco light, which was controlled through a host computer. For an image of this, see figure 2.8. The aim of the Everywhere Display was to use its features in retail and workspace settings.

A similar research project [22] used a steerable projector to project onto elements of the room it was placed in. This projector was fixed to a moving base and the surfaces suitable for projection were predefined. The interesting aspect of this work was that the space was divided into small square blocks; the items to be projected were assigned to the most suitable block, giving good results for small areas. An example of this is shown in figure 2.9.

Figure 2.9: Steerable projection - annotation by using a grid layout to decide where elements belong

An alternative to pre-computing a 3D model of the environment was proposed in [19], which did not create a 3D model, but rather pre-computed spaces in its environment during initialisation through image processing techniques, again using projected markers to calibrate.

2.3 Keystoning

2.3.1 Embedded Light Sensors

In [33] a keystoning calibration technique was successfully achieved by placing light sensors at each corner of interest on the object being projected onto. A pattern is then projected onto the surface; these patterns are black and white and hence give the light sensors bright or dark values, which are relayed back to a host PC. This gives a highly accurate calibration which can be applied to planar surfaces or 3D objects. The reason for using multiple patterns is that they narrow down the position of the points of interest by using horizontal and vertical bar patterns which decrease in size. This calibration can be done in just under a second (although it is hoped that this can be reduced with high speed projection technology), and the method scales up well with the resolution of the projection. After the patterns have been projected, a homography matrix can be created which is used to correctly keystone the image. This method again suffers from the same issues as the Everywhere Display in that it requires the manipulation of the environment to function, as well as needing a way to relay the data from the sensors to the mobile device. It does however offer some interesting characteristics for 3D projection.

2.3.2 Smarter Presentations

The most relevant work on keystoning for MobileEye is [52, 53], where it is assumed that the intrinsic parameters of the camera and projector are unknown and that the projection surface is flat. These are the same assumptions I would expect to make for MobileEye. Keystoning is achieved by projecting a calibration pattern (similar to the method proposed in [47]), then finding common points of the projection surface in the camera frame. From this a homography can be computed, provided there are four or more common points which are known in both the projector and camera frames. Given a projected image, we take some point Cp = (xp, yp) in the projector image, which is projected onto some unknown point on a planar surface. All we know is that there exists some transform that can be applied to Cp to get the point on the projected surface Cs. If we then observe this point with the camera, we are given a point in the image frame Cc = (xc, yc), which has a transformation from the surface to the camera, the same as suggested in the Everywhere Display [45]. Because the projective views of the projector and camera are viewing the same points on the surface, there is a homography between the two frames. So, treating the projector as a camera, Cp = H Cc, and we obtain the homogeneous coordinates as:

[x1]   [H11 H12 H13] [xc]
[x2] = [H21 H22 H23] [yc]   (2.6)
[x3]   [H31 H32 H33] [ 1]

where xp = x1/x3 and yp = x2/x3. The above equation can be re-written as:

xp = (H11 xc + H12 yc + H13) / (H31 xc + H32 yc + H33)   (2.7)
yp = (H21 xc + H22 yc + H23) / (H31 xc + H32 yc + H33)   (2.8)

This can be re-arranged further to:

ax^T h = 0   (2.9)
ay^T h = 0   (2.10)

where:

h = (H11, H12, H13, H21, H22, H23, H31, H32, H33)^T   (2.11)
ax = (xc, yc, 1, 0, 0, 0, -xp xc, -xp yc, -xp)^T   (2.12)
ay = (0, 0, 0, xc, yc, 1, -yp xc, -yp yc, -yp)^T   (2.13)

Now, given a set of four or more points, we can create a linear system of equations A h = 0 and solve the problem in a least squares manner, where A stacks the ax and ay rows for each of the N point correspondences:

A = (ax1^T; ay1^T; ... ; axN^T; ayN^T)   (2.14)

By writing the sum of squares error of A h = 0 (the sum of squares is a mathematical method to calculate the deviation of a set of points from the mean), we get:

f(h) = (1/2) (A h)^T (A h)   (2.15)

When this is multiplied out and the derivative of f with respect to h is taken, the result is A^T A h = 0. From this we can obtain h as the eigenvector corresponding to the smallest eigenvalue of A^T A [32]. This homography is then used as a pre-warp calibration for the projected content, and further refinements are achieved by projecting a more complex pattern. The paper goes on to keystone the image not from the point of view of the camera, but from the point of view of the audience (viewing a projector screen), which is done through image processing techniques. However, MobileEye assumes that the user's and the camera's perspectives are similar enough that the warping need not be edited beyond this homography. One final point worth mentioning is that this paper also includes a method for interacting with the projected screen using a laser pointer. This is not a concern for MobileEye, but could lead to a richer interaction technique for users of the system.
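As an illustration of the least squares step above, the sketch below builds the matrix A from point correspondences and recovers h as the eigenvector of A^T A with the smallest eigenvalue. This is only a minimal sketch of the standard direct linear transform estimation, not the implementation from [53]; it assumes Apache Commons Math 3 is available for the eigen-decomposition.

import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.EigenDecomposition;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

public class HomographyEstimator {

    /**
     * Estimates the 3x3 homography H mapping camera points (xc, yc) to
     * projector points (xp, yp), given N >= 4 correspondences.
     */
    public static double[][] estimate(double[][] cameraPts, double[][] projectorPts) {
        int n = cameraPts.length;
        double[][] a = new double[2 * n][9];
        for (int i = 0; i < n; i++) {
            double xc = cameraPts[i][0], yc = cameraPts[i][1];
            double xp = projectorPts[i][0], yp = projectorPts[i][1];
            // Row ax from equation (2.12).
            a[2 * i] = new double[] {xc, yc, 1, 0, 0, 0, -xp * xc, -xp * yc, -xp};
            // Row ay from equation (2.13).
            a[2 * i + 1] = new double[] {0, 0, 0, xc, yc, 1, -yp * xc, -yp * yc, -yp};
        }

        RealMatrix bigA = new Array2DRowRealMatrix(a);
        RealMatrix ata = bigA.transpose().multiply(bigA); // 9x9 symmetric matrix

        // h is the eigenvector of A^T A with the smallest eigenvalue.
        EigenDecomposition eig = new EigenDecomposition(ata);
        double[] values = eig.getRealEigenvalues();
        int smallest = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[smallest]) smallest = i;
        }
        RealVector h = eig.getEigenvector(smallest);

        double[][] homography = new double[3][3];
        for (int i = 0; i < 9; i++) {
            homography[i / 3][i % 3] = h.getEntry(i);
        }
        return homography;
    }
}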

2.4 Projection Uses and Interactions

2.4.1 View & Share

There are obvious concerns with the use of projection technology, in that there is a shift between private and public interactions. A good illustration of handling public and private data is the implementation of View & Share, a mobile application which enables users to view photos and media on a projected surface, while other users can download photos from the presenter freely and easily [27]. In the View & Share application there is a public and a private option. By changing to a private viewing session, all participants are forced to view and share images on their mobile devices, turning off the projection. This is an obvious and required feature, but to build a truly pervasive computing experience there needs to be a strict rule set on what is acceptable to project. Besides illustrating this barrier between private and public interaction, View & Share also illustrated the sharing of data as well as hardware: through projection, downloading and connectivity, the projector is essentially lent out. View & Share gave control to one participant in the group [27], but it could be possible to give a portion of the projectable area to each user with their own projector phone. If laser projectors are truly able to turn off pixels as suggested in [25], it could be possible to use a space finding algorithm to divide the available space of a projector between multiple parties, leading to some interesting functionalities and interactions.

2.4.2 Search Light

One particularly common interface technique emerging for projector phones is the analogy of using the projector phone as a torch light in the dark to reveal different information. A good example of this is [34], where a paper map is annotated with Google Latitude markers of the user's friends. To see all of the friends, the user moves the projector phone around the map to reveal more information. This technique has been applied before, but on the device's screen, meaning the user has to switch attention between the background (the paper map) and the display. In a study aimed specifically at the use of the torchlight interface for annotating points of interest on a map, it was found that the projector method was faster than a magic lens approach [51]. It was proposed that this was because users no longer had to change their attention between the background and the screen display.

It is interesting to consider how this interaction might be applicable to this research project. MobileEye intends to remove the user's participation in the positioning of the projector, but there are a few scenarios where this may play a factor:

- The user changes their orientation to move the projection across the projected surface, wanting this feature.
- The user takes hold of the device, wishing to switch to and make use of this mode.

With the first idea, the compass or features of the scene (from the camera) could be used to determine movement. I would expect that this would be quite unnatural to the user and would only be required for a few limited applications. However, the second method would be a lot more natural to the user; at this point I would expect the user to hold the device such that the screen is facing up and the projection is straight ahead. From this state the user could interact with the device's screen and the projection screen independently. One such suggested method of interaction is the idea of using the projector phone as a search light, with a crosshair in the centre of the screen and a button click to select an item the crosshair is hovering over [48].

2.4.3 Projection Technology Summary

DLP is the technique implemented in most consumer products (pico projectors, mobile devices etc.), most likely because of its maturity. However, it makes sense to use a laser based projection technique, as it offers a projection with unlimited focal length, something that DLP requires to be handled manually. While it is difficult to gauge the pros and cons of a scan-beam projection system compared to holographic projection, it is clear that both of these systems offer the right features to be used in a consumer device (low power consumption, unlimited focal length, small form factor). The only difference that can be noted is the extra speckle reduction in LBO's system.

2.5 MobileEye's Projection System

There has been a large amount of research into steerable projection and into interfaces for mobile projector phones, but there seems to be little, if any, overlap between the two areas. One of the biggest and least researched problems that the MobileEye system will suffer from is that the intention is to project onto unknown surfaces in an unknown environment. Most steerable applications overcome this by having a statically placed projection system (i.e. the projector is steerable but only ever kept in one environment) and then pre-computing suitable areas or using a 3D model to determine them. With the release of new projector phones and the first ever workshop on personal projection having taken place on May 17th 2010, an algorithm that could offer better selection of projection areas could enable a number of new uses and applications in this research field.

After finding a suitable space within which to project, a calibration step would be ideal. This would need to be done before each projection attempt, because each surface would be new. The method for the calibration will be the same technique used in [53], which requires no intrinsic parameters of the camera or projector. The reason for using this previous work over other alternatives is that it requires no alteration of the environment, unlike [39, 45], and no pre-computation. I would expect that only an initial estimate of the calibration would be needed, as this will speed up the process, but I will also be making the assumption that the keystoning will calibrate the projection with respect to the camera, which the user will be wearing. This means a satisfactory result should be given for the user.

The best projection technology for this project is either a scan-beam laser or a holographic projection method, because they offer a projection that is always in focus (a problem that DLP and LCD methods suffer from) and because they are suitable for use in consumer products. The most suitable form factor would be to have the projector and camera on the back of the device, since a user will be expected to wear the device as a pendant. Ideally, flexibility would be given by having the projector projecting upwards with a steerable mirror over it, with some mechanism for completely retracting the mirror, enabling the projection to project straight ahead (i.e. up). This then allows the user the option of naturally holding the device and using it in a torch light manner.

The existing interaction techniques may lead to using the device as if it had separate states, where the device can be left passively to give the user information or put into a state to be used for interaction. This would be useful because, if the projection system gives the user some interesting information which the user then wishes to act upon, the state can change while still using the projector (if suitable).


Chapter 3 Related Work - Object/Image Recognition


MobileEye will need some method of recognising products and objects, which will be used to obtain information to project onto the user's environment. The main method will be to attempt to match an image against previously seen or stored images. In this chapter a number of possible algorithms to perform this task are discussed.

3.1 Image Recognition

The task of identifying correspondences between two images can be divided into three parts:

- Interest points are selected from the image; these will be things like corners, blobs (areas of an image which are brighter or darker than their surroundings [17]) and T-junctions.
- Each interest point's surrounding pixels are used to create a neighbourhood vector, known as a descriptor.
- The final stage is matching: given two similar images, the descriptors should be able to indicate a reliable match (a minimal sketch of this stage is given below).
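The following sketch illustrates the matching stage only, using a brute-force nearest-neighbour search with a distance-ratio check between descriptor vectors. It is a simplified, hypothetical example rather than the matching scheme of any specific algorithm discussed below, and the 0.8 threshold is just a commonly quoted value.

public class DescriptorMatcher {

    /**
     * Returns, for each query descriptor, the index of its best match in the
     * database, or -1 if the ratio test rejects the match.
     */
    public static int[] match(float[][] queryDescriptors, float[][] dbDescriptors) {
        int[] matches = new int[queryDescriptors.length];
        for (int q = 0; q < queryDescriptors.length; q++) {
            double best = Double.MAX_VALUE, secondBest = Double.MAX_VALUE;
            int bestIndex = -1;
            for (int d = 0; d < dbDescriptors.length; d++) {
                double dist = euclidean(queryDescriptors[q], dbDescriptors[d]);
                if (dist < best) {
                    secondBest = best;
                    best = dist;
                    bestIndex = d;
                } else if (dist < secondBest) {
                    secondBest = dist;
                }
            }
            // Accept only if the best match is clearly better than the second best.
            matches[q] = (best < 0.8 * secondBest) ? bestIndex : -1;
        }
        return matches;
    }

    private static double euclidean(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}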

3.1.1 SIFT

The Scale Invariant Feature Transform (SIFT) is based on observations of neurons in the temporal cortex of mammals [35]. SIFT works as follows. First a difference of Gaussians is applied to the image; this is then used to find local maxima and minima, which become points of interest. By blurring the image, the keys become partially invariant to local variations such as 3D projection. The difference of Gaussians is the method of subtracting one blurred image from another blurred image (where each image is blurred by a different amount); it is an approximation of the Laplacian of Gaussian (which is a blob detector). These keys are then indexed using a nearest neighbour approach, which uses a best-bin-first search method. Best-bin-first search is an algorithm based on the kd-tree data structure.

Figure 3.1: The top row is the same image at different resolutions

The kd-tree data structure is a binary search tree where each node is an n-dimensional hyperplane. This gives some useful properties when trying to find the closest neighbour in nearest neighbour searches. Best-bin-first search performs a similar search to kd-trees, doing a depth first search and comparing each node to determine which branch to traverse. However, instead of backtracking as done by kd-trees, it simply selects the closest bin, and in SIFT it was found that good results are obtained by weighting certain nodes and then only traversing the tree to the 200th depth. From these reliable keys, a Hough transform is used to cluster keys with similar pose. The Hough transform has been used in the past to identify arbitrary shapes, by looking at the set of pixels identified as part of a line or shape and creating the appropriate shape [18]. In the SIFT algorithm the Hough transform is used to group similar keys into bins. The final bins are sorted by size and least-squares is used to match image points to an affine transformation. By matching the model with the image features, outliers can be found and removed. Once all outliers are removed, if at least 3 matching features remain, a match has been identified. The key advantage of SIFT compared to its predecessors is that it is scale and illumination invariant. It can also handle partial occlusion and was developed with speed as a main concern.
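As a rough illustration of the difference of Gaussians step described above, the sketch below blurs a greyscale image at two nearby scales and subtracts the results; extrema of this response across position and scale are the candidate keypoints. This is an unoptimised, illustrative sketch, not the SIFT implementation (which builds a full scale-space pyramid).

public class DifferenceOfGaussians {

    /** Separable Gaussian blur with border clamping. */
    public static float[][] blur(float[][] img, double sigma) {
        int radius = (int) Math.ceil(3 * sigma);
        double[] kernel = new double[2 * radius + 1];
        double norm = 0;
        for (int i = -radius; i <= radius; i++) {
            kernel[i + radius] = Math.exp(-(i * i) / (2 * sigma * sigma));
            norm += kernel[i + radius];
        }
        for (int i = 0; i < kernel.length; i++) {
            kernel[i] /= norm;
        }

        int h = img.length, w = img[0].length;
        float[][] horizontal = new float[h][w];
        float[][] out = new float[h][w];
        // Horizontal pass.
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double acc = 0;
                for (int k = -radius; k <= radius; k++) {
                    int xx = Math.min(w - 1, Math.max(0, x + k));
                    acc += kernel[k + radius] * img[y][xx];
                }
                horizontal[y][x] = (float) acc;
            }
        }
        // Vertical pass.
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double acc = 0;
                for (int k = -radius; k <= radius; k++) {
                    int yy = Math.min(h - 1, Math.max(0, y + k));
                    acc += kernel[k + radius] * horizontal[yy][x];
                }
                out[y][x] = (float) acc;
            }
        }
        return out;
    }

    /** Difference of two Gaussian blurs, approximating the Laplacian of Gaussian. */
    public static float[][] dog(float[][] img, double sigma, double k) {
        float[][] fine = blur(img, sigma);
        float[][] coarse = blur(img, sigma * k);
        float[][] result = new float[img.length][img[0].length];
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[0].length; x++) {
                result[y][x] = coarse[y][x] - fine[y][x];
            }
        }
        return result;
    }
}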

3.1.2 Indexing Scale Invariance

This method was aimed at creating a detector that would be scale, rotation and affine invariant while maintaining a suitable method to index the images, making it easier to look up matching images. The detector proposed in [36] handles scale invariance by creating several resolutions of an image, where each resolution is created by applying a Gaussian kernel which blurs the image. Then, given a point in an image, a characteristic scale is found by calculating the result of some function F, which creates the scale space (i.e. applies the Gaussian kernel), and these results are used to find a local maximum. This function F can be any one of a number of functions applied to each scaled image, and their work showed that the Laplacian gave the most promising results. To help understand this, see figure 3.1. The method described so far used all the points satisfying the property that the characteristic scale between two images equals the scale factor between them. This however revealed a number of inaccuracies and unstable results, so interest points were used instead of all possible points.

Figure 3.2: An example of integral images. The value of A is simply A, of B it is A + B, of C it is C + A and of D it is A + B + C + D.

The interest points were selected using a Harris detector (corner and edge detection) [28], since it gives reliable results and works with rotation and scale invariance. From these interest points the Laplacian function was applied to test the matches over the scale space, so that matching points could be found with good reliability. Indexing of these points is then achieved by turning each interest point of an image into a descriptor, maintaining rotation invariance by using the gradient direction of the point. Matches between points are determined using the Mahalanobis distance (a statistical method used to measure the distance between unknown and known sets by considering the correlations between them). Image retrieval from a database is achieved by a voting system, where each point of a query image that matches a point in the database gives a vote for that image. The image with the most votes is then determined to be the most similar.
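The voting scheme just described can be summarised with a short sketch: every matched interest point casts a vote for the database image it came from, and the image with the most votes wins. This is a hypothetical simplification for illustration only, reusing the match indices from the earlier matching sketch.

import java.util.HashMap;
import java.util.Map;

public class VotingRetrieval {

    /**
     * imageIdForDbPoint[i] gives the database image that descriptor i belongs to;
     * matches[q] gives the matched database descriptor for query point q (or -1).
     * Returns the id of the database image with the most votes, or -1 if none.
     */
    public static int bestImage(int[] matches, int[] imageIdForDbPoint) {
        Map<Integer, Integer> votes = new HashMap<Integer, Integer>();
        for (int match : matches) {
            if (match < 0) continue; // unmatched query point
            int imageId = imageIdForDbPoint[match];
            Integer current = votes.get(imageId);
            votes.put(imageId, current == null ? 1 : current + 1);
        }
        int best = -1;
        int bestVotes = 0;
        for (Map.Entry<Integer, Integer> entry : votes.entrySet()) {
            if (entry.getValue() > bestVotes) {
                best = entry.getKey();
                bestVotes = entry.getValue();
            }
        }
        return best;
    }
}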

3.1.3 SURF - Detector

SURF uses a technique referred to as integral images, originally proposed by [55]. The idea behind integral images is that the value at some pixel x = (x, y) is calculated as the sum of all pixels to the left of and above the pixel. This can be done in one pass over the original image; see figure 3.2. SURF's detector is based on the Hessian-Laplacian detector used in [36] (referred to as the Harris-Laplacian in [36]). The proposed detector for finding points of interest is called Fast-Hessian. Given a point X = (x, y) in some image I, the Hessian matrix is defined as:

H(X, σ) = | Lxx(X, σ)  Lxy(X, σ) |
          | Lxy(X, σ)  Lyy(X, σ) |   (3.1)

for the point X and scale σ, where Lxx(X, σ) is the convolution of the Gaussian second order derivative in x, taken at scale σ, with the original image I at the point X (and similarly for Lyy and Lxy). This is the Laplacian of Gaussian; SIFT proposed that an approximation of it (the difference of Gaussians) could be used, and this gave successful results. So Fast-Hessian uses an approximation of the second order derivatives to give box filters, which can be efficiently calculated using integral images. These approximations of the second order Gaussian derivatives for σ = 1.2 can be seen in figure 3.3; the box filters are named Dxx, Dxy and Dyy respectively. Once these filters have been created they can be applied to the image I. Because a filter of any size can be applied efficiently to an integral image (i.e. any filter can be applied with the same number of operations), there is no dependency on waiting for the first application of a filter before applying a second filter, which is traditionally the case. This is important, as it means the scale space can be created in parallel, giving a speed increase.
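The sketch below shows the integral image idea used by SURF: a summed-area table computed in one pass, after which any box filter response can be evaluated with four look-ups regardless of the filter size. It is a generic illustration, not the SURF implementation itself.

public class IntegralImage {

    /**
     * Builds a (h+1) x (w+1) summed-area table with a zero border,
     * so that sums near the image edges need no special cases.
     */
    public static long[][] build(int[][] grey) {
        int h = grey.length, w = grey[0].length;
        long[][] ii = new long[h + 1][w + 1];
        for (int y = 1; y <= h; y++) {
            for (int x = 1; x <= w; x++) {
                ii[y][x] = grey[y - 1][x - 1]
                        + ii[y - 1][x] + ii[y][x - 1] - ii[y - 1][x - 1];
            }
        }
        return ii;
    }

    /**
     * Sum of the pixels in the box with top-left (x0, y0) and bottom-right
     * (x1, y1), inclusive, using four table look-ups.
     */
    public static long boxSum(long[][] ii, int x0, int y0, int x1, int y1) {
        return ii[y1 + 1][x1 + 1] - ii[y0][x1 + 1] - ii[y1 + 1][x0] + ii[y0][x0];
    }
}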

Figure 3.3: SURF box filters; the left two are the discretised and cropped Gaussian second order partial derivatives and the right two are the box filter approximations.

The filter sizes in SURF increase (9x9, 15x15, 21x21, 27x27 etc.) due to the form of the box filters and the discrete nature of the integral images. As the filter sizes get larger, the increase between each layer grows as well, doubling for each octave (6, 12, 24). So the first octave may have 9x9, 15x15, 21x21 and 27x27 filters, and the next octave would have 39x39, 51x51 and so on. This has the property that the Gaussian derivatives scale with the filter size. For example, the 9x9 filter was the approximation of a σ = 1.2 Gaussian second order derivative, which is the scale factor s (i.e. s = 1.2). So for the 15x15 filter, σ = (15/9) x 1.2 = 2 = s. Interest points are then selected by applying non-maximum suppression to a 3 x 3 x 3 neighbourhood (3 x 3 pixels in 3 images of the scale space); this is simply a method used to detect edges by checking whether a pixel is a local maximum along a gradient direction [24]. The maxima of the Hessian determinant are then interpolated in image and scale space, which is the final step to give the points of interest.

3.1.4 SURF - Descriptor

To store a description of each interest point, Haar wavelet responses are taken in the x and y directions in a neighbourhood of 6s around the interest point (where s is the scale at which the interest point was detected). The size of the wavelets is 4s, which means the wavelets can become large at high scale factors; because of this, integral images are used again. The responses of the wavelets are weighted by a Gaussian centred at the interest point with σ = 2.5s and represented as vectors. A sliding window of 60 degrees is then used to sum up the horizontal and vertical vectors to give a new vector. The longest new vector is the chosen orientation of the interest point. The window size of 60 degrees was chosen experimentally. There is a proposed version of SURF called U-SURF which skips this step, the idea being that everything will be roughly upright. It is difficult to say whether this is applicable to MobileEye or not; while the user should be wearing the camera in such a way that we could programmatically rotate the image so everything was approximately upright, there is no data showing the tolerance to any rotation (i.e. a product slightly rotated left or right relative to the camera might cause big problems).

Now that we have an interest point and an orientation, a box is created around the interest point with size 20s (s = scale factor). This is divided into a 4x4 grid of sub-regions and each sub-region is given 5x5 evenly spaced sample points. Again, Haar wavelet responses are taken along the horizontal and vertical directions (in relation to the region orientated around the interest point). These responses are weighted by a Gaussian positioned at the centre of the interest point with σ = 3.3s, and are summed up for each sub-region. Then, to include information about the polarity of the intensity changes (light to dark vs. dark to light), the sums of the absolute values of each sub-region's sample points are also extracted. This is used to create a feature vector:

v = (Σ dx, Σ dy, Σ |dx|, Σ |dy|)   (3.2)

where Σ dx is the sum of the wavelet responses of a sub-region along the horizontal axis and Σ dy along the vertical axis. Over the 4x4 sub-regions, with these four values for each, v has a length of 64. The reason for setting the sub-regions to a size of 4x4 is again through experimenting for the best results. The final detector element used is the trace of the Hessian matrix for the interest point, which gives the sign of the Laplacian; this is just a small addition that gives a good speed increase during matching.
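A small sketch of how the 64-element vector is assembled is shown below. It assumes the per-sample Haar responses dx and dy for each of the 16 sub-regions have already been computed (and Gaussian weighted), and simply accumulates the four sums per sub-region; it illustrates the structure of equation 3.2 and is not a full SURF implementation.

public class SurfDescriptor {

    /**
     * dx[r][i] and dy[r][i] hold the Haar wavelet responses of the i-th
     * sample point (of 25) in sub-region r (of 16).
     * Returns the 64-element descriptor (4 values per sub-region).
     */
    public static double[] assemble(double[][] dx, double[][] dy) {
        double[] v = new double[64];
        for (int r = 0; r < 16; r++) {
            double sumDx = 0, sumDy = 0, sumAbsDx = 0, sumAbsDy = 0;
            for (int i = 0; i < dx[r].length; i++) {
                sumDx += dx[r][i];
                sumDy += dy[r][i];
                sumAbsDx += Math.abs(dx[r][i]);
                sumAbsDy += Math.abs(dy[r][i]);
            }
            // (sum dx, sum dy, sum |dx|, sum |dy|) for this sub-region.
            v[4 * r] = sumDx;
            v[4 * r + 1] = sumDy;
            v[4 * r + 2] = sumAbsDx;
            v[4 * r + 3] = sumAbsDy;
        }
        return v;
    }
}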


Chapter 4 Smartphones
4.1 Camera Technology

This may seem like an insignificant point, but there is one very apparent characteristic of camera technologies that has proven to be helpful in the space finding algorithm developed for this project, and it requires acknowledging for any future implementation. CCD and CMOS are the two main competing camera technologies, both with a number of advantages and disadvantages. Here I will focus on the techniques used to improve the dynamic range of these sensors. Dynamic range is the ability of a chip to handle both bright and dark areas in an image. It is hard to produce a good example of this in picture form, but in figure 4.1 you can see an illustration hidden by a shadow. In the left hand image the illustration is barely noticeable, whereas in the right hand image it is clearly visible. The lighting was not changed, only the angle of the camera. What is happening is that the scene is exceeding the dynamic range of the image sensor, causing the image to be clipped in the dark and/or bright regions. There are a number of methods that can be used to improve the dynamic range of these chips, some of which are outlined below.

4.1.1 Well Adjusting Capacitors

This method was created originally for CCD sensors but has been applied to CMOS sensors [23]. Both CCD and CMOS active pixel sensors work by direct integration [26]. Direct integration is when a circuit is reset to a voltage VReset before exposure to light.

Figure 4.1: Dynamic range of the Nexus One Smartphone.

Figure 4.2: Direct integration circuit on the left and saturation levels of the charge on the right.

Figure 4.3: Example of the saturation levels of a single well capacity adjustment.

During the exposure to light, the photocurrent (current produced by photons hitting the sensor) drains a capacitor CD. After a set exposure time, the negative charge of this capacitor is read. In figure 4.2 you can see the diagram of the circuit on the left; on the right is a diagram of the charge read after the integration time (or exposure time) tint. Well adjusting capacitors operate by changing the saturation value of the circuit over the exposure time. This is achieved through the use of a control gate which alters the clipping value. The clipping value is the amount of charge that can be stored; after the charge exceeds this, the extra charge is passed through the control gate to a sink region (ground). The clipping value acts as the saturation value Qsat and is determined by the magnitude of the voltage given to the control gate. This voltage is altered according to some function which can be defined for the best results [49]. Figure 4.3 shows an example of the well capacitance being adjusted once during the exposure time.

4.1.2 Multiple Capture

Multiple capture simply takes several photos of the same scene with varying exposures; bright parts of the image are captured by short exposure times and dark areas are captured with long exposures. These images are then merged together, and some research has shown that averaging over the images can give good results [26].
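As a naive sketch of the averaging idea mentioned above, the code below merges a set of aligned exposures of the same scene by taking the per-pixel mean. A real high dynamic range merge would weight pixels by exposure time and discard saturated values; this is only an illustration of the averaging approach, and the method name is hypothetical.

public class ExposureMerge {

    /** Averages a set of aligned greyscale frames (values 0-255) pixel by pixel. */
    public static int[][] averageExposures(int[][][] frames) {
        int h = frames[0].length, w = frames[0][0].length;
        int[][] merged = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                for (int[][] frame : frames) {
                    sum += frame[y][x];
                }
                merged[y][x] = (int) (sum / frames.length);
            }
        }
        return merged;
    }
}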

4.1.3 Spatially Varying Pixel Exposures

This technique sacrifices spatial resolution for high dynamic range images by assigning multiple sensor elements to each pixel. A filter is applied to each sensor element, giving four values for a single pixel, each with a different exposure; so a bright pixel will likely have both a maximum value and a lower value, and a dark pixel will likely have both a zero and a non-zero value [42]. A further explanation of this is given in figure 4.4.

Figure 4.4: Spatially varying pixel exposure. e0, e1, e2 and e3 each represent a different exposure time for a pixel, where e0 < e1 < e2 < e3, giving a set of images varying in space and exposure.

4.1.4 Time to Saturation

This method of increasing the dynamic range works by calculating the time it takes for each individual pixel to reach the saturation point. This is achieved by giving each pixel a local processor which is triggered by the magnitude of the light intensity [20]. Once light hits a sensor, it calculates the saturation time, and the shortest saturation time determines the exposure, which is applied to all the sensors. However, having a processor per pixel can increase the pixel size by an unacceptable amount [26].

4.2 Platform

Here I am going to discuss the Android platform and some unique features it has that are relevant to the MobileEye implementation. I won't be covering why the Android platform was selected; that is discussed on page 33.

4.2.1 Dalvik Virtual Machine

Google originally started with extremely tight constraints on its expectation of what hardware Android could run on, setting the target of having 64MB of RAM for its low end devices. This has led to a number of design choices to enable applications to run using as little memory as possible, since the platform is intended to enable multitasking. It is expected that after all the high level processes (libraries and background services) have been started, there is only approximately 20MB of memory left from the original 64MB. The first major change the Dalvik Virtual Machine makes is to transform the compiled Java class files into the .dex file format. Traditionally, n Java classes will generate n class files when compiled. A dex file merges these together and generates a shared pool of constants, which achieves more sharing than a set of separate class files.

Figure 4.5: Small example of Zygote memory sharing.

A shared pool is where constants are stored: strings, methods, types, etc. A simple example of this being done on a set of class files is shown on page 52. This gives impressive results in terms of file size, a dex file often coming out at least 2% smaller than a compressed jar file (dex files are later compressed into an Android package, an .apk file). Like other OSs, memory is divided into different kinds, and Dalvik identifies the following four: clean shared, clean private, dirty shared and dirty private. Clean memory is simply data that the OS knows it can drop or replace without fear of disrupting the system (i.e. data backed up by a file that can be re-read). The clean shared and clean private memory hold the libraries and application-specific dex files. The private dirty memory holds the address space assigned to that specific process, which serves as the application heap. For the shared dirty memory, Dalvik has a process called Zygote, which loads the classes and libraries it believes will be used by a number of applications, preventing each one from loading its own copy. Zygote is responsible for creating a fork whenever a new application is to be started, and the forked child process then becomes the main application. This is described as a standard Unix fork, which would suggest it is at this point that the application is given its own address space. Copy-on-write semantics are used to share the Zygote shared dirty memory, as shown in Figure 4.5. This later plays an important part in garbage collection in the Dalvik VM, which is why it is mentioned here. Normally on a standard Unix fork the child process is given its own copy of its parent's memory; copy-on-write, however, allows processes to share the memory until a child process attempts to write to it, at which point it is given its own copy. Each process has its own garbage collection, and Zygote has been the deciding factor in how the garbage collection marks are stored. There are generally two methods of storing the mark bits that indicate to the garbage collector (GC) what it should do with the data.

One method stores the mark bits with each object; the second stores the mark bits separately (in parallel). Because Zygote shares its memory, if one process marked an object to be cleaned, the GC would attempt to clean up Zygote's objects, which would then affect the other processes using the same memory. (Note: the copy-on-write memory is not the same memory as the shared memory; the copy-on-write memory is the Zygote heap.) Finally, the Android platform does a number of things to improve the efficiency of the code beyond the optimisations of the Java compiler. On installation, the platform performs verification to ensure the dex file is well formed and correct. Optimisations are then applied to the code, such as static linking, inlining of native methods, etc. These optimisations are done at install time to save work later on, and I imagine this is also the point at which a standard dex file is converted into an optimised dex file suitable for that version of the Android platform.

4.2.2

Application Development

The recommended method for development on the Android platform is through the Eclipse IDE with the Android plug-in. There are a number of versions of Android; the main versions on devices in the public's hands are 1.5, 1.6 and 2.1. The provided emulator is fairly robust and offers a wide range of tools and hardware simulation; however, the camera on the emulator is extremely poor. It offers a moving test pattern with no way to feed in data. For this reason development needs to be done directly on the device, but there is little extra effort in doing so, as the emulator and devices integrate with the IDE through the same mechanism, the Android debug bridge (adb). The general structure of an application is that each individual screen view is an Activity. An Activity is designed to give an application the ability to handle its reaction as it goes through the activity lifecycle. The activity lifecycle is Android's way of handling an application as it becomes the top of the application stack, is pushed down the stack by other applications, and is eventually brought back to the top or removed from the stack. To handle this, each Activity can override the following methods[1]:

public class Activity extends ApplicationContext {
    protected void onCreate(Bundle savedInstanceState);
    protected void onStart();
    protected void onRestart();
    protected void onResume();
    protected void onPause();
    protected void onStop();
    protected void onDestroy();
}

Because MobileEye intends to maintain Bluetooth and socket connections with other machines, these need to be handled correctly through these methods. The Activity of each screen can be thought of as being responsible for the UI thread and its elements. As an example, if you try to edit a TextView (a widget that displays a string) from a different thread, it will cause an error in the application.

To overcome this, a Handler is used, which acts as a message queue between other threads and the main UI thread. This is extremely flexible and can be used for transmitting Messages or Runnables, where a Message is a data structure which can store certain information and a Runnable is an easy way to implement a thread when you only need the run() method of the Thread class and don't wish to create a separate sub-class of Thread. Android considers an application to be unresponsive if the UI locks up for longer than 5 seconds [6], at which point the system will offer to force close the application. This means the image processing needs to be done on a worker thread, while the communications need their own threads to be able to read and respond to messages as quickly as possible.
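As a minimal sketch of this Handler pattern (the class, field and method names are my own, not from the MobileEye code), a worker thread does the heavy work and posts a Runnable back to a Handler created on the UI thread:

import android.os.Handler;
import android.widget.TextView;

public class WorkerExample {
    // Created on the UI thread, so Runnables posted to it run on the UI thread.
    private final Handler uiHandler = new Handler();
    private final TextView statusText;   // hypothetical widget from the Activity layout

    public WorkerExample(TextView statusText) {
        this.statusText = statusText;
    }

    public void startProcessing() {
        new Thread(new Runnable() {
            public void run() {
                final String result = doImageProcessing();   // long-running work, off the UI thread
                // Hand the result back to the UI thread via the Handler's message queue.
                uiHandler.post(new Runnable() {
                    public void run() {
                        statusText.setText(result);           // safe: runs on the UI thread
                    }
                });
            }
        }).start();
    }

    private String doImageProcessing() {
        return "done";   // placeholder for the real work
    }
}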

4.2.3

Security, Intents and Receivers

The Android platform has a security feature whereby an application must state what hardware and information it wishes to have access to in its AndroidManifest file. This file stores the list of Activities and can be used to identify any broadcast receivers. The reason for this is that a user must give the application permission to use these at installation. Intents are used to launch an 'intention' to do something. An example of this would be turning on Bluetooth from your application. This can't be done directly for security reasons, so a system application is called to handle it for us. Such an intent would be performed as [4]:

if (!mBluetoothAdapter.isEnabled()) {
    Intent enableBtIntent = new Intent(BluetoothAdapter.ACTION_REQUEST_ENABLE);
    startActivityForResult(enableBtIntent, REQUEST_ENABLE_BT);
}

This intent is acknowledged by the system, and any application registered to handle it will be launched. The idea behind this is that an application can supplement its features through other applications[10]. BroadcastReceivers are used to receive information from the system; a simple example of this is the buttons on the handsfree set of the device. The system will register each button press and pass it to any broadcast receiver that wants to be aware of that event.
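As an illustrative sketch of a broadcast receiver for the handsfree button (the receiver class and its handling logic are my own example, not the MobileEye implementation; it would also need to be registered in the AndroidManifest file or via Context.registerReceiver()):

import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.view.KeyEvent;

// Receives ACTION_MEDIA_BUTTON broadcasts, e.g. the button on a wired handsfree kit.
public class HandsfreeButtonReceiver extends BroadcastReceiver {
    public void onReceive(Context context, Intent intent) {
        if (Intent.ACTION_MEDIA_BUTTON.equals(intent.getAction())) {
            KeyEvent event = (KeyEvent) intent.getParcelableExtra(Intent.EXTRA_KEY_EVENT);
            if (event != null && event.getAction() == KeyEvent.ACTION_DOWN) {
                // React to the button press, e.g. trigger object recognition.
            }
        }
    }
}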

4.2.4

Bluetooth and Camera api

The Bluetooth api for Android is fairly simple, and the api documentation includes example code for turning on the Bluetooth radio from an application and performing a scan of nearby devices. The documentation for the Camera api is fairly well explained, but there are aspects of it that have little to no documentation. The main problem is the preview callback provided by the api: while this is exactly what is needed, the data is given as just a byte[], with little explanation of how it is structured. The documentation claims the default format is YCbCr 420 SP, but I couldn't find any information about this format, suggesting it is specific to the Android platform and not widely used. The closest format that is widely explained is YCbCr. To try to avoid having to read in and convert this to a friendlier format, I tried to change the preferences of the camera's encoding to RGB 888, RGB 565 or RGB 332. These formats store the red, green and blue values using R8 G8 B8 bits, R5 G6 B5 bits or R3 G3 B2 bits, where R8 means 8 bits to represent red, G8 means 8 bits to represent green, and so on.

Figure 4.6: YUV 420 format and byte stream[14].

However, this doesn't affect the encoding used for the preview (i.e. the setting appears to be ignored). Further research pointed me to a forum post discussing the format of the image data [2]. It was claimed that the G1 used a different format altogether, the YUV 420 semi-planar encoding. This encoding starts with a set of luminance values for the image, Y (one luminance value per pixel), and the U and V components are applied one value to every four Y values, with all the U values appearing after the Y values and the V values after the U values. An illustration of this is shown in Figure 4.6; however, the Android implementation is meant to differ slightly, with the U and V values alternating one after the other rather than being grouped together as shown in Figure 4.6. In both the YUV and YCbCr formats the Y value represents the luminance of the image, which is suitable for obtaining a greyscale version of the image. I found some code from an open source project that extracts this, and I have adapted it to obtain a scaled version of the image at read-in time, to give some minor speed up[15].
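A minimal sketch of extracting the greyscale data from a preview frame, assuming the semi-planar layout described above where the first width x height bytes are the Y values (this is my own illustration rather than the adapted ZXing code [15]):

// Extracts the luminance (Y) plane from a YUV 420 semi-planar preview frame.
// The first width*height bytes are the Y values, one per pixel; U and V follow and are ignored.
public final class YuvUtils {

    public static int[] luminance(byte[] previewData, int width, int height) {
        int[] grey = new int[width * height];
        for (int i = 0; i < grey.length; i++) {
            grey[i] = previewData[i] & 0xFF;   // bytes are signed in Java, mask to 0..255
        }
        return grey;
    }

    // Simple subsampling at read-in time: keep every step-th pixel in both directions.
    public static int[] scaledLuminance(byte[] previewData, int width, int height, int step) {
        int outW = width / step, outH = height / step;
        int[] grey = new int[outW * outH];
        for (int y = 0; y < outH; y++) {
            for (int x = 0; x < outW; x++) {
                grey[y * outW + x] = previewData[(y * step) * width + (x * step)] & 0xFF;
            }
        }
        return grey;
    }
}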


Chapter 5 Project Execution


The implementation consisted of a number of components which have been separated as much as possible to give the most flexibility. The key components of the system were:

A mobile application handling the image processing to search for free space
Hardware to perform projection rotation
A UI to be projected onto a surface (done in Python)
A client-server application to handle object recognition (done in Java)

Each of these is discussed in depth below, outlining my choices and the reasoning behind them.

5.1

Tool and Language Choices

5.1.1

Mobile Application

The main requirements of the mobile platform used for this work were that it gave access to the camera, had a means of communicating with Bluetooth devices, and had internet connectivity. The decision was heavily based on hardware, since I only have access to an Android device, but also because the platform is free and has one of the most active development communities. The language used on Android is Java, although there is a native SDK (NDK) counterpart which allows development in C or C++. According to the Android NDK site, the NDK provides increased performance for "self-contained, CPU-intensive operations that don't allocate much memory"[3]. I would expect MobileEye to be CPU-intensive but also to allocate a lot of memory by handling a number of images, so the NDK would seem to offer possible improvements; however, the documentation at [3] also explains that most applications gain increased complexity from using the NDK. Because of this I decided the Java SDK would be the most suitable for development, since Java is a language I am comfortable developing in, which meant I could spend more time focusing on the problem and on learning Python. The mobile device I have used for the implementation is the Nexus One [8], which has a 5 megapixel camera, a 1GHz processor and 512MB of RAM. It needs to be kept in mind that this processor is fairly fast compared to some other smartphones. The techniques implemented have not been developed to favour the high dynamic range techniques used in the Nexus One's camera (it is unclear whether it is a CMOS or CCD camera chip, both of which can offer high dynamic range), although it does help the algorithm to perform well.

Figure 5.1: A diagram of the University of Bristol's current steerable projector.

5.1.2

Hardware for Projection Rotation

Ideally a laser-based projection unit would be used for this project; again, the choice was dictated by available hardware. I have used a Samsung DLP projector which is approximately 5x2x3.75 inches (WxHxD). The main disadvantage is that it is larger than a mobile device and requires manual focusing. The advantage is that it is brighter than a smaller, handheld projection unit; this is good for implementation but hides the limitations of an actual implementation of this system. I was lucky enough to be given some guidance for implementing the system for rotating the projection. Internal research at the University of Bristol had produced a custom piece of hardware which used a Bluetooth-controlled motor to move a mirror over a handheld projector. This took several months to build, there was no guarantee that it could be used for a projector the size of the one available to me, and it wasn't able to rotate horizontally. For these reasons I created a custom piece of hardware which is controlled manually to position a projection both vertically and horizontally. The custom hardware was designed to strap around the projector and consists of two parts: a fixed base attached to the projector unit, with a circle cut into it larger than the projection lens diameter, and a second piece with the mirror attached to it on hinges, with small hooks that fit onto the base part. The hinges on the mirror give the vertical positioning, and the second piece can rotate around the base, giving horizontal positioning. See Figure 5.2 for photos. An ideal steerable projection system would be similar to the one used in the Everywhere Displays projector (Section 2.8). This would obviously need to be made at a much smaller scale and would most likely need to be developed by someone with a stronger background in hardware development. Ideally a USB connection would be the best solution for communicating with the device to control its movement, but this would require further research into the apis available for using a USB-connected device.

5.1.3

Projection UI

UI design is something I have never spent much time on in previous development; I have only been exposed to the Java and Python languages for UI implementations, and in my experience Java was far more complex than Python. In terms of the backend of the program, it had to be capable of communicating via Bluetooth with the mobile device and updating the UI accordingly.

Figure 5.2: Images of the steerable projection hardware. Section A shows the fixed base on the left and the rotating platform on the right. Section B shows these two pieces fitted together on the projector. Section C shows how the mobile device would be attached, and section D shows the mirror fitted to the hardware.


Both languages are capable of doing this, so I decided to use Python, since my previous experience with Python resulted in a much quicker development process than Java. I used a GUI builder called Glade to develop the layout of the UI, then used some example code from the PyBluez documentation [11] (PyBluez is a wrapper for the Linux Bluetooth protocol). The example code helped create the Java connection as well (using the correct UUIDs etc.). The resulting application works extremely well; I implemented some code to rotate the image after the projection has been rotated (when the mirror is rotated to project left or right, the resulting projected image rotates too). The Bluetooth link has occasional connection problems if the mobile device connects at about the same time as a connection timeout. The timeouts can be disabled, but this results in the connection never being closed when the program wishes to end. Although I expect there is some way to overcome this, the timeout approach has the extra benefit of regularly ensuring a live connection is available and resetting otherwise.

5.1.4

Object Recognition

I used a binary of the FabMap application, both to see what results were possible and to see whether any interesting mapping data could be obtained from its use. Because it was a binary, it only required some configuration changes to work with a custom set of initialising images. To integrate this with the application I used a Java client-server socket implementation. The reason for this is that it seems infeasible for the image processing required here to be done on a mobile device; instead it would be carried out on a server (most likely implemented using a MapReduce technique, since it is a highly parallelisable task). Because I don't have access to a server that could host this application online safely, I implemented the client-server application on the same machine that runs the projector UI. During the initialisation of the mobile application an IP address is required to use this feature, but it may be skipped if not needed. The reason for choosing Java was simply that I had used it before and it was able to run shell scripts from within the program, which meant I could easily run the FabMap algorithm after downloading the image from the mobile device. Obviously in a real product this might be replaced by a different language or platform to ensure safe and secure methods for external api use, and a custom implementation of the FabMap algorithm would be used rather than a binary.
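A rough sketch of what the server side looks like (the port, file path and binary name below are placeholders, not the actual configuration): it accepts a length-prefixed image over a socket, writes it to disk, runs the recognition binary as an external process and returns its output:

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal recognition server: receive an image over a socket, run an external binary on it.
public class RecognitionServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(5000);              // placeholder port
        while (true) {
            Socket client = server.accept();
            DataInputStream in = new DataInputStream(client.getInputStream());
            int length = in.readInt();                             // simple length-prefixed protocol
            byte[] image = new byte[length];
            in.readFully(image);

            File imageFile = new File("/tmp/query.jpg");           // placeholder path
            FileOutputStream file = new FileOutputStream(imageFile);
            file.write(image);
            file.close();

            // Run the recognition binary (name and arguments are hypothetical).
            Process p = new ProcessBuilder("./fabmap", imageFile.getPath())
                    .redirectErrorStream(true).start();
            BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
            StringBuilder result = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                result.append(line).append('\n');
            }
            p.waitFor();

            // Send the result back to the mobile client and close the connection.
            PrintWriter writer = new PrintWriter(client.getOutputStream(), true);
            writer.println(result.toString().trim());
            client.close();
        }
    }
}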

5.2

Space Finding Algorithm

There is no clear research into methods of finding free space in a scene, so for this project I developed a custom space finding algorithm designed to work on a mobile device at a speed acceptable for finding areas and projecting in real time. The original idea was to use an existing edge detection method to detect where item borders are and, from this, detect where larger, appropriate areas might be. However, the time taken to convolve an edge detection kernel with an image was too long while still keeping a reasonably sized image to work on. To overcome this I decided to try to achieve a similar result through thresholding. Here I will outline the process performed to create the final algorithm.

Figure 5.3: Initial histogram data from a phone image.

5.2.1

Histogram Data

The first step was to consider the output of a small image from the mobile device and its histogram representation, as shown in Figure 5.3. It is clear that the data is highly noisy and suffers from extremes (note the number of pixels with a value of approximately 254). But from a human perspective you could estimate about 3 or 4 Gaussian distributions that could be used to segment the image. While it would be possible to run through the array and pick out the biggest values, it would be difficult to calculate the approximate mean and variance with any reliability. Because of this I implemented a simple smoothing technique, which averaged each histogram value with the two values either side of it. This had two advantages: it made it clearer where the peaks of the distributions were and how wide they were. A number of sample images with their approximate thresholded images are shown in the appendix on page 45. The effect of this on the data is similar to blurring the image: if you apply a Gaussian blur to an image, the effect on the histogram is similar to the result achieved through this method. The great advantage of this approach is that the image size does not become the dominant concern in terms of efficiency, since the pixel values are only read once to build the histogram (essentially a bucket sort), and then only the histogram data, at most 255 values, is used in the further algorithms. To speed up the algorithm I reduced the data further to 64 buckets, giving groups

of 4 (values 0-4, 5-9, 10-14 etc). Putting the data into these buckets performs a good level of averaging in itself (since extreme values are pooled with their smaller neighbours); to eliminate any further noise, the averaging method proposed above is then applied (as shown in the appendix on page 51). The next problem was to perform the hill climbing that would find these peaks and estimate the width of each peak (its variance). Before describing the hill climbing technique, it is worth noting that in the majority of images there tend to be approximately three dominant distributions that represent the data. This is an assumption based on the results of the images used during development. It is used to heavily model the expected data, but it has proven to be extremely successful in the results and always accounted for the majority of pixels in the image.
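A sketch of this step, reconstructed from the description above rather than taken from the project source (the exact bucket boundaries and smoothing window may differ):

// Builds a 64-bucket histogram of greyscale values (0..255) and smooths it.
public final class HistogramStep {

    public static int[] histogram64(int[] grey) {
        int[] hist = new int[64];
        for (int v : grey) {
            hist[v >> 2]++;              // 256 values / 64 buckets = 4 values per bucket
        }
        return hist;
    }

    // Moving-average smoothing over neighbouring buckets.
    public static int[] smooth(int[] hist) {
        int[] out = new int[hist.length];
        for (int i = 0; i < hist.length; i++) {
            int lo = Math.max(0, i - 1);
            int hi = Math.min(hist.length - 1, i + 1);
            int sum = 0;
            for (int j = lo; j <= hi; j++) sum += hist[j];
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }
}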

5.2.2

Hill Climbing

The original method of hill climbing searched for the peaks in the data, selected the top three, and traversed along each peak until a trough was found on either side. The peaks were searched for using preset points along the graph; the motivation behind this was to avoid any noise still present in the data. This was later improved to a single pass, maintaining pointers to the minimum, peak and maximum bucket of each distribution. The results of hill climbing can be seen in Figure 5.4. As you can see, the results are highly satisfactory if you take the highest-valued peak, which represents the piece of paper. This, however, is an idealistic image that happened to work for the method outlined so far. The problem with only considering the highest-valued peak of the top three is that the top two peaks may actually be part of the same region of the image, and treating them as separate distributions can give results like those shown in Figure 5.5. The simplest solution is to notice that these two peaks meet at a trough, which gives a suitable merging rule. There are situations where this isn't suitable: if the image has peaks at either end of the histogram which happen to meet in the middle, it is highly unlikely that they represent the same region, so a limit is placed on how far apart peaks can be and still be merged into the same distribution.
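The single-pass peak search could look roughly like the following sketch (my own simplification; the real implementation keeps the top three distributions and applies the distance-limited merge rule described above):

import java.util.ArrayList;
import java.util.List;

// Single pass over the smoothed histogram: for each rise-then-fall run, record the
// left trough (min), the peak bucket and the right trough (max) of the distribution.
public final class HillClimb {

    public static class Distribution {
        public final int minBucket, peakBucket, maxBucket;
        public Distribution(int min, int peak, int max) {
            minBucket = min; peakBucket = peak; maxBucket = max;
        }
    }

    public static List<Distribution> findDistributions(int[] hist) {
        List<Distribution> found = new ArrayList<Distribution>();
        int start = 0;          // left trough of the current distribution
        int peak = 0;           // current peak bucket
        boolean rising = true;
        for (int i = 1; i < hist.length; i++) {
            if (hist[i] >= hist[i - 1]) {
                if (!rising) {                      // a trough: the previous distribution ends here
                    found.add(new Distribution(start, peak, i - 1));
                    start = i - 1;
                    rising = true;
                }
                peak = i;                           // still climbing
            } else {
                rising = false;                     // descending from the peak
            }
        }
        found.add(new Distribution(start, peak, hist.length - 1));
        // The real algorithm then keeps the three largest peaks and merges neighbouring
        // peaks that meet at a trough, provided they are not too far apart.
        return found;
    }
}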

5.2.3

Area Extraction

So far I've explained how the segmentation is achieved; from this the largest free region needs to be identified. The main point of segmenting was to extract the largest and brightest area, but the problem is identifying where this largest area is. Given a group of pixels assigned to the same threshold, the challenge is to find the largest region of these pixels while ignoring any noise that may exist in the group. My initial thought was to implement a full region-growing algorithm which would identify all the groups in the image matched to the threshold and then select the largest region, but this would spend a lot of computation on the smaller regions. Instead I used the much simpler method of computing a point a = (x, y) as the average over all the pixels assigned to the group. From this point a box is grown vertically and horizontally until one direction can no longer grow; the other directions then continue to grow until the box consumes as much space as possible. The reason for taking this approach over the alternatives (which are discussed in future work, Section 6.2) is that its simplicity gives good time complexity and the end results are good. There are, however, two major drawbacks to this method.

Figure 5.4: Hill climbing results for a white piece of paper. On the left: the cumulative values of each pixel group (0-4, 5-9 etc). The highlighted values (orange) represent the peak group of each distribution, and the colours along the side represent each distribution. On the right: the corresponding thresholded images for each distribution, where white pixels indicate that they are part of that distribution / group / threshold.

Figure 5.5: This image has been split into three regions (illustrated by white, grey and black pixels); from the histogram, the peaks of the white and grey regions are 34 and 42.

Figure 5.6: An example of where the space finding algorithm will choose a less than optimal solution. If the centre point (shown as the red circle in this image) is placed between two regions not classified as the same region by the thresholding (shown as the solid black bars), the region growing will grow to the width of this column and then grow vertically (Region 1). Ideally region 2 or 3 would be selected, as the projection area is far larger.

The first drawback is that the averaging technique works well only if there is a large enough section within the scene contributing most to the average a. This means that smaller areas which may be suitable for projection won't be found, since only a small amount of noise in other parts of the image will move the centre point a far off (possibly into a section of the image not included in the threshold), causing the region growing to simply fail. The other problem is the region growing of the box itself: it does not try any alternative growth strategies, growing evenly in all directions until each direction reaches a boundary. If the centre point a happens to land between two small columns of a large space, as illustrated in Figure 5.6, then it will fill this column and grow directly down, despite a far more optimal solution existing elsewhere in the image.
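A sketch of the area extraction, again reconstructed from the description (mask marks the pixels belonging to the chosen threshold group; the real implementation may differ in detail):

// Mean point of the group, then grow a box around it one row or column at a time
// while each newly added edge stays entirely inside the group.
public final class AreaExtraction {

    public static final class Box {
        public final int left, top, right, bottom;
        Box(int l, int t, int r, int b) { left = l; top = t; right = r; bottom = b; }
    }

    // mask[y * width + x] is true when pixel (x, y) belongs to the chosen threshold group.
    public static Box largestBox(boolean[] mask, int width, int height) {
        long sx = 0, sy = 0, n = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (mask[y * width + x]) { sx += x; sy += y; n++; }
        if (n == 0) return null;
        int cx = (int) (sx / n), cy = (int) (sy / n);
        if (!mask[cy * width + cx]) return null;    // centre fell outside the group: give up

        int left = cx, right = cx, top = cy, bottom = cy;
        boolean grew = true;
        while (grew) {
            grew = false;
            if (left > 0 && columnClear(mask, width, left - 1, top, bottom)) { left--; grew = true; }
            if (right < width - 1 && columnClear(mask, width, right + 1, top, bottom)) { right++; grew = true; }
            if (top > 0 && rowClear(mask, width, top - 1, left, right)) { top--; grew = true; }
            if (bottom < height - 1 && rowClear(mask, width, bottom + 1, left, right)) { bottom++; grew = true; }
        }
        return new Box(left, top, right, bottom);
    }

    private static boolean columnClear(boolean[] mask, int width, int x, int top, int bottom) {
        for (int y = top; y <= bottom; y++) if (!mask[y * width + x]) return false;
        return true;
    }

    private static boolean rowClear(boolean[] mask, int width, int y, int left, int right) {
        for (int x = left; x <= right; x++) if (!mask[y * width + x]) return false;
        return true;
    }
}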

5.2.4

Application Structure

The Android application is governed by a set of states. These states are used to give the system as much stability as possible. The states are as follows:

Initialising: The application sets up any Bluetooth or image processing connections on the first two screens, but these connections aren't actually opened until the camera activity is opened. The reason is that it makes it easier to handle a connection that fails (in terms of which activity needs to finish). When the camera activity starts up, the Bluetooth and object recognition servers are connected; if either of these fails, the application is reset. It is possible to skip these connections as they aren't required, it just gives the application limited use. Once the connections are established, the state is changed.

Find Area: The first stage of finding a projectable area. If a suitably sized area is found, the state changes; otherwise the state remains unchanged and the search is repeated.

Test Projection Area: This state waits for approximately 3?? seconds, with each image frame being tested to check that the projection area's mean pixel value stays the same. If this changes beyond a threshold, the state is set back to finding an

area. After 3 seconds the state is changed to projecting markers, and the Bluetooth connection is informed to project the markers.

Setting Up Markers: This state has the sole purpose of acting as a time-out. The user is required to indicate by a button press that the markers have been set up (moved into position); it is assumed that the area remains constant during this period. This would be different with a mechanical motor, as it would be far faster than the manual configuration. If the button isn't pressed, the state is set back to finding an area; otherwise the application scans to locate the marker.

Projecting Markers: This state expects to see the marker in the centre of the screen and will find the corners of the marker. If this is successful, the state moves on to projecting data and the coordinates are sent to the UI (over the Bluetooth connection), where they would be used to apply keystoning and swap the marker for the appropriate image. If this doesn't succeed, the state is changed back to finding an area.

Projecting Data: This is the final state and sets a new mean image pixel value. If this goes above another threshold, the system resets its state again.

Throughout these states it is possible to take a photo to send for object recognition. This information is sent when changing from the projecting markers state to projecting data.
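The state set could be captured as a simple enum (the names below paraphrase the states above; the actual application may represent them differently):

// States governing the MobileEye Android application, as described above.
public enum AppState {
    INITIALISING,          // opening the Bluetooth and object recognition connections
    FIND_AREA,             // searching each frame for a large enough projectable area
    TEST_PROJECTION_AREA,  // checking the area's mean pixel value stays stable for ~3 seconds
    SETTING_UP_MARKERS,    // waiting for the user to confirm the manual mirror set-up
    PROJECTING_MARKERS,    // locating the projected marker's corners in the camera view
    PROJECTING_DATA        // information is being projected; watch the mean value for changes
}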


Chapter 6 Project Status


6.1 Current Status

The current status of the implementation is a reliable and flexible system. However, there are a number of changes that could advance the system's performance, and a number of changes that would bring the system closer to a final implementation. Below is an outline of each section and my thoughts on what would be most suitable to change.

6.1.1

Projector UI

The Python UI has worked extremely well and performs reliably. By using the Bluetooth connection it should be usable with most Bluetooth-enabled mobile devices; the only requirement is the use of the same protocols used in the MobileEye system, which simply means using the appropriate XML calls between the mobile device and the laptop connected to the projector. The only thing that wasn't implemented was the keystoning of the projected image. My implementation finds and passes four corresponding points from the camera frame to the Python UI application, but nothing is done with the data. The methods outlined in [52, 53] should be reasonably easy to calculate (the standard homography formulation is sketched below), although they may require an external image processing library to apply the transformation (i.e. OpenCV's Python wrapper). The code for handling the rotation for the steerable projection (when the projection is steered left or right, the projected image rotates as well) was disabled, as the hardware wasn't used for the final implementation, so it may need re-enabling for future use. Further work would be needed if a laser-based projection were used, so that the background colour of the application is changed from grey to black. The advantage with laser projection is that these pixels would be completely switched off; at the moment the UI projects a white/grey default colour, creating an undesired border around any projection. Discuss the no-colour consideration. Discuss the algorithms for maximising the space used.
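For reference, the keystone correction of [52, 53] amounts to estimating a planar homography H from the four point correspondences and pre-warping the UI image with its inverse; written in standard notation (my own, not quoted from those papers):

\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} \sim \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad (x_p, y_p) = (x'/w',\; y'/w')

H is defined only up to scale, so its eight remaining degrees of freedom are fixed by the four corner correspondences (two equations each); the UI image is then pre-warped by H^{-1} so that the projection appears rectangular on the surface.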

6.1.2

Android Application

I am fairly happy with the Android application as it stands; the Activity lifecycle handles the client-server protocol and Bluetooth connections appropriately, and the memory usage of the application seems fairly stable. I believe the application could be fine-tuned to give a slight speed up, but nothing too extensive.

Future work may be needed to make this application work on older versions of Android, but the only real api differences I would expect are in the camera and text-to-speech apis. I implemented the code to handle button presses from the handsfree kit, the main idea being that a limited amount of interaction could be given to the device through this rather than the touch screen (to trigger object recognition and also to indicate when manual set-up of the steerable projection was done). This would be ideal for any future implementations with full

6.1.3

Image Recognition

I didn't get the time to see what level of results could be achieved by using the FabMap algorithm and an interior mapping system. I had hoped the binary of the FabMap implementation would be more flexible than it was; the nature of its implementation led to extremely slow performance. I think this part of the implementation is the one that requires the most attention. The following are the things I would like to change:

Implement a custom version of a standard SURF image recognition system, then implement and compare the FabMap algorithm
Maximise the speed of the system by exploiting the parallelisable parts of the SURF pipeline, using a MapReduce system for matching and scale space creation

While there are a number of techniques available, Google are improving their image search features and are showing signs of releasing apis to use their service[7]. While there is no time scale for the release of this service, Google's vast computing resources and access to images would give them a number of advantages that would be difficult to match without investing a lot of time and effort.

6.1.4

Hardware

The hardware wasn't particularly useful by the end of the project. Because of the size of the projector there was only one way to position the phone so that the device's screen was visible, but this meant it wasn't possible to move the mirror vertically enough to position the projection everywhere in the camera's view. I think a smaller projection system would overcome this problem, but making the same version of the hardware work on such a small device may prove difficult, so a different design might be needed.

6.1.5

Aims Achieved

Discuss the aims, what was achieved and what wasn't.

6.2

Future Work

6.2.1

Depth

At the moment there is no calibration of the projector to handle depth. This is overcome by manually moving the mirror, but in a real-life implementation the camera and projector can't be aligned properly without depth. A clear example of this is given in Figure 6.1. Notice in the example that the projector and camera are next to each other at the same height.

Figure 6.1: Illustration of how the depth of a projection area is required. The camera angles (both vertical and horizontal), cv and ch, remain the same while the angles needed to centre the projection vary (ph1 and ph2). Note: the vertical angle doesn't change and would be kept the same as the camera angle.

This means that the vertical angle between the camera and the centre of the projectable area will be the same as the angle needed to centre the projection. You could obviously swap this orientation so that the camera and projector were vertically aligned (i.e. one below the other) and thus at different heights; this would swap everything around (the horizontal angles would be the same while the vertical angle now changes with depth). This could be overcome simply by projecting differently coloured markers across the vertical section of the screen, which could then be recognised by the camera and used to work out the difference between the centre of the projectable surface and the coloured marker (the camera should be able to calculate the distance from the centre projected colour marker). Having multiple markers means that if the centre marker was not projected onto the surface, the spacing between the other markers could be used to infer some level of depth, to estimate the amount of rotation needed to get the centre marker onto the surface, which can then be refined.
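As a rough sketch of the geometry (my own simplification, assuming a pinhole model and a purely horizontal offset b between the camera and the projector): if the surface lies at depth d and the camera turns through a horizontal angle c_h to centre it, the projector must turn through

p_h = \arctan\!\left( \tan c_h - \frac{b}{d} \right)

so without an estimate of d the projector angle cannot be recovered from the camera angle alone, which is exactly what the coloured markers are intended to provide.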

6.2.2

Automatic Object Recognition

Discuss how it might be possible to achieve this on a Mobile Device.

6.2.3

Space Finding Algorithm

The algorithm defining the free space within which to project is fairly reliable and stable. My biggest concern is that the area extraction suffers from the problems outlined on page 38. What might be possible (and achievable in a fast time) is to divide the image into a grid (whose size can be determined by performance), where during the segmentation (as pixels are assigned to groups) the grid is filled with a score: each cell is given a point for each pixel in its region which is a member of the group. A much simpler region growing method is then performed on the grid to find the maximum-scoring area. The key is that a large enough grid cell simplifies the process while giving a good overview of the regions. From the grid region an approximate centre point is defined, and the area extraction continues as normal, instead of using a weighted point to define the centre.
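A sketch of the grid-scoring idea (hypothetical, since it was not implemented; the full proposal would also region-grow over neighbouring high-scoring cells, which the sketch below omits by simply taking the single best cell):

// Score each grid cell by how many of its pixels belong to the chosen group,
// then use the best cell's centre to seed the area extraction.
public final class GridScore {

    public static int[] bestCellCentre(boolean[] mask, int width, int height, int cell) {
        int cols = width / cell, rows = height / cell;
        int[][] score = new int[rows][cols];
        for (int y = 0; y < rows * cell; y++)
            for (int x = 0; x < cols * cell; x++)
                if (mask[y * width + x]) score[y / cell][x / cell]++;

        int bestR = 0, bestC = 0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                if (score[r][c] > score[bestR][bestC]) { bestR = r; bestC = c; }

        // Centre of the best-scoring cell, used in place of the weighted mean point.
        return new int[] { bestC * cell + cell / 2, bestR * cell + cell / 2 };
    }
}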

Appendix A Space Finding Appendix


A.1 First Averaging and Thresholding Tests

[Images: sample camera frames with their approximate thresholded images.]

A.2

Averaging and Thresholding Improvements

[Images: results after the bucket reduction and averaging improvements.]

A.3

Dalvik

[Diagram: example of merging a set of class files into a single dex file with a shared constant pool.]

Bibliography
[1] Activity.
[2] [android-developers] Re: Android camera preview filter using camera.PreviewCallback.onPreviewFrame.
[3] Android NDK.
[4] Bluetooth.
[5] Coolpix 1000pj.
[6] Designing for responsiveness.
[7] Google will make Goggles a platform.
[8] Nexus One.
[9] Online consumer-generated reviews have significant impact on offline purchase behavior.
[10] OpenIntents.
[11] PyBluez documentation.
[12] Samsung i8520 Halo Android 2.1 phone with 3.7-inch Super AMOLED and pico projector.
[13] World's first video projector mobile phone: EPOQ EGP-PP01.
[14] YUV.
[15] ZXing (Zebra Crossing).
[16] Smartphone sales to overtake mobile phone sales by 2012, Nov 2009.
[17] Blob detection, May 2010.
[18] Hough transform, May 2010.
[19] Stanislaw Borkowski, Olivier Riff, and James L. Crowley. Projecting rectified images in an augmented environment. IEEE International Workshop on Projector-Camera Systems, Oct 2003.
[20] Vladimir Brajovic and Takeo Kanade. A sorting image sensor: An example of massively parallel intensity-to-time processing for low-latency computational sensors. IEEE International Conference on Robotics and Automation, Apr 1996.
[21] Edward Buckley. Invited paper: Holographic laser projection technology. SID Symposium Digest of Technical Papers, 39(1), May 2008.

[22] Andreas Butz and Christian Schmitz. Annotating real world objects using a steerable projector-camera unit. IEEE International Workshop on Projector-Camera Systems, June 2005.
[23] Steven Decker, R. Daniel McGrath, Kevin Brehmer, and Charles G. Sodini. A 256 x 256 CMOS imaging array with wide dynamic range pixels and column-parallel digital output. IEEE Journal of Solid-State Circuits, 33(12), Dec 1998.
[24] Cornelia Fermüller. Non-maximum suppression.
[25] Mark Freeman, Mark Champion, and Sid Madhavan. Scanned laser pico projectors: Seeing the big picture (with a small device).
[26] Abbas El Gamal. High dynamic range image sensors. Stanford University website, 2002.
[27] Andrew Greaves and Enrico Rukzio. View & share: Supporting co-present viewing and sharing of media using personal projection. International Journal of Mobile Human Computer Interaction, 2010.
[28] Chris Harris and Mike Stephens. A combined corner and edge detector. Proceedings of the Alvey Vision Conference, 1988.
[29] Larry J. Hornbeck. Digital Light Processing™ for High-Brightness, High-Resolution Applications. Texas Instruments Incorporated, P.O. Box 655012, MS-41, Dallas, TX 75265, February 1997.
[30] Texas Instruments. DLP® projectors glossary.
[31] Texas Instruments. How DLP technology works, 2009.
[32] David Kriegman. Homography estimation. CSE 252A, 2007.
[33] Johnny C. Lee, Paul H. Dietz, Dan Maynes-Aminzade, Ramesh Raskar, and Scott E. Hudson. Automatic projector calibration with embedded light sensors. ACM Symposium on User Interface Software and Technology (UIST), Oct 2004.
[34] Markus Löchtefeld, Michael Rohs, Johannes Schöning, and Antonio Krüger. Marauder's light: Replacing the wand with a mobile camera projector unit. Mobile and Ubiquitous Multimedia, 2009.
[35] David G. Lowe. Object recognition from local scale-invariant features. ICCV, 1999.
[36] Krystian Mikolajczyk and Cordelia Schmid. Indexing based on scale invariant interest points. ICCV, 2001.
[37] Pranav Mistry. Pranav Mistry: The thrilling potential of SixthSense technology, November 2009.
[38] Pranav Mistry and Pattie Maes. Pattie Maes and Pranav Mistry demo SixthSense, 03 2009.
[39] Pranav Mistry and Pattie Maes. SixthSense: A wearable gestural interface. SIGGRAPH Asia 2009, Sketch, Yokohama, Japan, December 2009.
[40] Pranav Mistry, Pattie Maes, and Liyan Chang. WUW - Wear Ur World - a wearable gestural interface. CHI EA '09, Apr 2009.

[41] Andrew Molineux, Enrico Rukzio, and Andrew Greaves. Search light interactions with personal projector. Ubiprojection 2010: 1st Workshop on Personal Projection at Pervasive 2010, May 2010.
[42] Shree K. Nayar and Tomoo Mitsunaga. High dynamic range imaging: Spatially varying pixel exposures. IEEE CVPR, 2000.
[43] Claudio Pinhanez. The Everywhere Displays projector: A device to create ubiquitous graphical interfaces. Proc. of Ubiquitous Computing 2001 (Ubicomp'01), Sep 2001.
[44] Claudio Pinhanez. Using a steerable projector and a camera to transform surfaces into interactive displays. Conference on Human Factors in Computing Systems, 2001.
[45] Claudio S. Pinhanez, Frederik C. Kjeldsen, Anthony Levas, Gopal S. Pingali, Mark E. Podlaseck, and Paul B. Chou. IBM research report: Ubiquitous interactive graphics. IBM Research Report RC22495 (W0205-143), May 2002.
[46] Projector.com. Projector display types: CRT or DLP or LCD?
[47] Ramesh Raskar and Paul Beardsley. A self-correcting projector. IEEE Computer Vision and Pattern Recognition (CVPR), Dec 2001.
[48] Enrico Rukzio and Paul Holleis. Projector phone interactions: Design space and survey. Workshop on Coupled Display Visual Interfaces at AVI 2010, May 2010.
[49] Michel Sayag. Non-linear photosite response in CCD imagers (patent).
[50] Michael Schmitt and Ulrich Steegmüller. Green laser meets mobile projection requirements. Optics and Laser Europe, pages 17-19, 2008.
[51] Johannes Schöning, Markus Löchtefeld, Michael Rohs, Antonio Krüger, and Sven Kratz. Map Torchlight: A mobile augmented reality camera projector unit. Conference on Human Factors in Computing Systems, 2009.
[52] Rahul Sukthankar, Robert G. Stockton, and Matthew D. Mullin. Automatic keystone correction for camera-assisted presentation interfaces. Advances in Multimodal Interfaces - Proceedings of ICMI, 2000.
[53] Rahul Sukthankar, Robert G. Stockton, and Matthew D. Mullin. Smarter presentations: Exploiting homography in camera-projector systems. Proceedings of the International Conference on Computer Vision, 2001.
[54] Jahja I. Trisnadi. Speckle contrast reduction in laser projection displays. Proc. SPIE Projection Displays VIII, Ming H. Wu, Ed., 4657:131-137, April 2002.
[55] Paul Viola and Michael Jones. Rapid object detection using a boosted cascade of simple features. CVPR, 2001.
[56] WowWee. Cinemin Swivel.

