Submitted in partial fulfilment of the requirements for the award of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING
BY
KADIYALA BHARATHI (16831A0563)
CERTIFICATE
The results embodied in this Technical Seminar report have not been submitted to any other
University or Institute for the award of any Degree or Diploma.
I further declare that this Technical Seminar report has not previously been
submitted, either in part or in full, for the award of any degree or diploma by
any other organization or university.
KADIYALA BHARATHI
(16831A0563)
Acknowledgement
"Task successful" makes everyone happy, but the happiness would be gold without
glitter if we did not acknowledge the people who supported us in making it a success.
We would like to express our sincere thanks and gratitude to our Principal, Dr.
SREENATHA REDDY and Head of the Department Dr. S. DEEPAJOTHI, Department of
Computer Science and Engineering, Guru Nanak Institute of Technology for having guided
us in developing the requisite capabilities for taking up this project.
On a more personal note, we thank our beloved parents and friends for their moral
support during the course of this seminar.
ABSTRACT
CONTENTS
CERTIFICATE
DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
INTRODUCTION
THE BENEFITS OF GOING SERVERLESS
THE RISKS OF GOING SERVERLESS
ADVANTAGES
DISADVANTAGES
CONCLUSION
Chapter 1
1. Direct Objects
Direct objects should be continuously viewable to the user and
functionally rendered. They should have a real-world counterpart, and their use
in the interface should mimic their real-world use. A simple push button is a
good example of a direct object: in the real world and in the interface, buttons are
visually similar and are activated (pushed) in the same way.
• Nose Mapping
In nose mapping, the face[1] (specifically the nose) is tracked and its
motion is mapped to the cursor.
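The nose-to-cursor mapping above can be sketched as a small function. This is a minimal illustration, not the system's actual implementation; the gain, screen size, and normalized tracker output are assumptions for the example.

```python
def nose_to_cursor(nx, ny, screen_w=1920, screen_h=1080, gain=1.5):
    """Map a tracked nose position (normalized 0..1 image coordinates)
    to a screen cursor position, amplifying motion around the center."""
    # Re-center, apply gain so small head motions cover the whole screen,
    # then clamp back into the unit square before scaling to pixels.
    cx = min(max((nx - 0.5) * gain + 0.5, 0.0), 1.0)
    cy = min(max((ny - 0.5) * gain + 0.5, 0.0), 1.0)
    return int(cx * (screen_w - 1)), int(cy * (screen_h - 1))

nose_to_cursor(0.5, 0.5)  # nose at image center -> cursor near screen center
```

The gain greater than 1 lets comfortable head motion span the full screen, which is the usual design choice in head-tracking pointers.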
4. For each state si, a set of transition rules that associate an event ei,j, j = 1 . . .
mi (informally, the output of one or more selectors in the bank bi) with either
a state of a different index, or si (the null transition). By convention, the first
transition event to fire defines the transition for that state.
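The transition-rule convention above can be sketched as a tiny state machine. The state names, event names, and rule table here are illustrative assumptions, not taken from the original system.

```python
class Behavior:
    """Minimal sketch of a VICon behavior as a state machine: each state
    carries an ordered list of (event, next_state) rules; by convention
    the first rule whose event fires defines the transition."""
    def __init__(self, rules, initial="s1"):
        self.rules = rules          # state -> [(event_name, next_state), ...]
        self.state = initial

    def step(self, fired_events):
        # fired_events: the set of selector outputs observed this frame
        for event, nxt in self.rules.get(self.state, []):
            if event in fired_events:
                self.state = nxt
                return nxt
        return self.state           # no rule fired: the null transition

# Hypothetical rule table for a button-press behavior:
rules = {"s1": [("motion", "s2")],
         "s2": [("color_blob", "s3"), ("no_motion", "s1")],
         "s3": [("press", "accept")]}
b = Behavior(rules)
b.step({"motion"})  # -> "s2"
```

Because rules are ordered, simultaneous events resolve deterministically: the first listed event wins, matching the "first transition event to fire" convention.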
The parser modeling is explained with the example of the button-press VICon
from above. A possible sequence of selectors is:
1. A simple motion selector defines the trigger condition to switch from the
distinguished initial state s1.
4. Gesture recognition.
It is easy to see that processing under this framework is efficient because of the
selector ordering from simple to complex, wherein parsing halts as soon as one
selector in the sequence is not satisfied.
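The simple-to-complex ordering can be sketched as a cascade that stops at the first failing selector. The selectors and thresholds below are invented for illustration; only the halting behavior mirrors the framework.

```python
def run_cascade(selectors, frame):
    """Simple-to-complex selector cascade: evaluate cheap selectors first
    and halt as soon as one is not satisfied, so expensive processing
    (e.g. gesture recognition) runs only on promising frames."""
    for name, selector in selectors:     # ordered cheap -> expensive
        if not selector(frame):
            return name                  # parsing halted at this selector
    return "accepted"

# Hypothetical selector bank over per-frame measurements:
selectors = [
    ("motion",  lambda f: f["diff"] > 10),        # thresholded differencing
    ("color",   lambda f: f["skin_pixels"] > 50), # skin-color blob present
    ("gesture", lambda f: f["gesture_score"] > 0.8),
]
run_cascade(selectors, {"diff": 3, "skin_pixels": 0, "gesture_score": 0.0})
# halts immediately at the cheap "motion" selector
```

Most frames contain no interaction, so they are rejected by the cheapest test and the expensive selectors almost never run; that is the source of the efficiency claimed above.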
The intent of the framework is that a parser will not only accept certain input
but may also return other relevant information: duration, location, etc. A VICon
history is defined as a sequence h0 . . . hm, where the length m implicitly represents
the duration of the current interaction and each hj, j = 1 . . . m, is a snapshot of the
behavior's current state and any additional relevant information. When a behavior
enters its distinguished initial state, its history is reset: m = 0. The history begins
to track the user's actions when the behavior leaves its initial state. A snapshot is
created for every subsequent frame and concatenated into the history, thereby adding
another dimension that the VICon can employ during parsing. The key
factor differentiating the VICs paradigm from traditional interfaces is that a given
VICon may have multiple exit conditions, determined by different
parsing streams, each triggering a different output signal[1].
Chapter 3
4D Touchpad
3.1 Introduction to 4D
There are several candidates for the fourth dimension:
• Fragrance
• Sense of touch
• Time
Of these, time as the fourth dimension is the most common. Here the concept
of 4D combines space and time within a single coordinate system, typically
with three spatial dimensions (length, width, height) and one temporal
dimension (time)[9]. Dimensions are components of a coordinate grid typically
used to locate a point in a certain defined "space", for example, on the globe
by latitude and longitude. In spacetime, a coordinate grid that spans the 3+1
dimensions locates events (rather than just points in space), so time is added
to the grid as another dimension and another axis. This way, you know both where
and when something is. Consider figure 6.1, which illustrates a 4D example: if
the inner cube becomes the outer one and vice versa instantaneously, the
coordinates can be represented only with the help of a fourth dimension, time. As a
simple example, consider an ant at a point (1,1,1), which is an example of 3D.
The presence of the ant at (1,1,1) at 9 am, i.e. (1,1,1,9), is an illustration of 4D.
ROI of the 4DT. Since any projection of the disturbance onto a camera plane
results in the loss of one dimension (discussed in the next chapter), we
use two cameras to verify the contact of the object with the surface of the
table[1,2].
Figure 5.2: Schematics of 4DT
The image of the screen to be displayed to the user is projected onto the table
from the projector via the mirror. The mirror is mounted such that the screen is
shown clearly on the table. Special adjustment is made in calibrating the cameras
so as to obtain the correct sequence of actions by the user.
Chapter 4
Working of 4D Touchpad
The 4DT is based on the 3D-2D projection-based mode of the VICs framework.
The fundamental idea behind VICs is that expensive global image processing
with user modeling and tracking is not necessary in general vision-based HCI.
Instead, interface components operating under simple-to-complex rules in local
image regions provide more robust and less costly functionality with 3 spatial
dimensions and 1 temporal dimension. VICs provides a framework, including
the VICon, parser modeling, and dynamic modeling discussed in chapter 5, on
which the 4DT works[2]. As seen in the previous chapter, the 4DT
comprises a pair of cameras with a wide baseline and a projector, all
directed at a table. The projector is placed underneath the table while the
cameras are positioned above to remove user occlusion in the projected images.
Now, whenever a disturbance enters the ROI of the VICs framework for the
4DT, the following steps are performed to identify the user's intention:
First, the 3D view seen by the camera is projected onto a plane using
perspective projection. This is achieved by representing the user's point of view
by a point in this 3D space. From this point, the user has a certain "view" of our
3D world. This view is drawn below as a pyramid. All rays of light that pass
through this "pyramid" originate from objects that the user sees. When they pass
through this pyramid, they also pass through the front plane. In the end, the whole
3D-to-2D projection amounts to projecting the entire 3D world onto that plane
in front of the pyramid[3] (coloured grey in the figures). Imagine we want to
project a cube onto a plane. If we draw the situation, it would look somewhat
like this (note that the base we use has been rotated a bit):
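The cube-onto-a-plane projection can be sketched with the standard pinhole model: a point (X, Y, Z) in camera coordinates maps to (f·X/Z, f·Y/Z). The focal length and cube coordinates below are illustrative, not values from the 4DT setup.

```python
def project(points, f=1.0):
    """Perspective projection of 3D camera-frame points onto the image
    plane at focal length f: (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    return [(f * x / z, f * y / z) for (x, y, z) in points]

# Unit cube placed with z in [2, 3]: the far face projects smaller than
# the near face, which is what makes the drawing look like a frustum.
near = [(0, 0, 2), (1, 0, 2), (1, 1, 2), (0, 1, 2)]
far  = [(0, 0, 3), (1, 0, 3), (1, 1, 3), (0, 1, 3)]
project(near)  # near-face corners land at scale 1/2
project(far)   # far-face corners land at scale 1/3
```

The division by Z is exactly the "loss of one dimension" the text mentions: distinct 3D points along the same viewing ray collapse to one image point, which is why the 4DT needs a second camera to verify contact.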
4.2.1 Homography
Solving,

x̃l,i = αβ H x̃r,i

where αβ is a scale factor and H = HlHr is the homography that maps points on
the right-side image to points on the left-side image.
In the 4D touchpad, the projection of the camera and the actual figure (Figure 6.3)
are homographically analysed using the above equations.
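Applying a homography to an image point can be sketched as follows: multiply the homogeneous point by the 3x3 matrix, then divide by the third coordinate. The plain-list representation and the translation-only H used for the demonstration are assumptions for the example.

```python
def apply_homography(H, x, y):
    """Map an image point through a 3x3 homography (plain nested lists),
    then de-homogenize by dividing out the third coordinate."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

# A pure translation by (5, -2) written as a homography, to check the math:
H = [[1, 0, 5],
     [0, 1, -2],
     [0, 0, 1]]
apply_homography(H, 10, 10)  # -> (15.0, 8.0)
```

The de-homogenization step is why homographies are only defined up to scale, which is where the scale factor in the equation above comes from.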
The vision system is responsible for the correct detection of press actions on the
table and other gestures related to the VICons. Since the projection onto a
camera plane results in the loss of one dimension, we use two cameras to verify
the contact of the object with the surface of the table. The rectification process
described corrects both camera images such that all points in the plane of
the table appear at the same position in both camera images. This can be used
for simple stereo calculation. In practice, a small region above the surface is
considered. Because of the rectification, a simple button-press detector can be
built just by segmenting color regions with a skin-color detector and subtracting
the resulting color blobs in the two cameras from each other. The common
region (Figure 6.4) between the two images represents the part of the object that
has actual contact with the plane of the interface. A graph of the depth
resolution (Figure 6.5) of our system is analysed. The high depth discrimination
is due to the wide baseline of the stereo system[1,2].
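The press detector described above can be sketched by intersecting the skin masks from the two rectified views: a pixel belongs to the contact region only if both cameras agree on it. The toy masks below are assumptions for illustration; a real system would segment skin color per camera first.

```python
def contact_region(mask_left, mask_right):
    """After rectification, points on the table plane coincide in both
    camera images, so skin pixels present in BOTH masks are in contact
    with the surface; pixels seen in only one view lie above the plane."""
    h, w = len(mask_left), len(mask_left[0])
    return [[mask_left[r][c] and mask_right[r][c] for c in range(w)]
            for r in range(h)]

# Toy 1x4 "images": the fingertip fills the last two columns in the left
# view but only the last column in the right view, so only the last
# column is actually touching the table.
left  = [[False, True, True, True]]
right = [[False, False, False, True]]
contact_region(left, right)
```

The off-plane part of the finger appears at different positions in the two rectified images (it has parallax), so subtracting or intersecting the blobs isolates exactly the on-plane contact patch.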
Segmentation of objects above the plane is even easier. For both cameras, we
can subtract the current frame from a stored background frame, yielding a mask of
modified regions. Then, we can take the difference between the two modified masks
to find all pixels not on the plane and use it in more intensive
computations such as 3D gesture recognition.
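The two-step segmentation above can be sketched on flat pixel rows: background subtraction per camera, then the difference between the two masks. The threshold and pixel values are illustrative assumptions.

```python
def off_plane_pixels(curr_l, bg_l, curr_r, bg_r, thresh=20):
    """Sketch of off-plane segmentation: per camera, background
    subtraction yields a changed-pixel mask; after rectification, a
    pixel changed in one view but not the other cannot lie on the
    table plane, so the mask difference marks off-plane pixels."""
    changed_l = [abs(c - b) > thresh for c, b in zip(curr_l, bg_l)]
    changed_r = [abs(c - b) > thresh for c, b in zip(curr_r, bg_r)]
    # symmetric difference of the two masks = pixels not on the plane
    return [a != b for a, b in zip(changed_l, changed_r)]

# A bright object appears at column 0 only in the left view and at
# column 2 only in the right view: both columns are off the plane.
off_plane_pixels([200, 30, 30], [30, 30, 30],
                 [30, 30, 200], [30, 30, 30])
```

These off-plane masks are cheap to compute per frame, so they make a good input to the heavier 3D gesture-recognition stage mentioned above.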
1. The user enters the local region. Visually, there is a notable disturbance in the
local region's appearance. A simple thresholded image-differencing algorithm
can detect the disturbance.
2. The finger moves onto the button, presenting itself as a large color blob in the
local region.
3. The finger pushes the button. From one camera, it is difficult to reliably detect
the pushing (nothing physically changes as a result of the user action), but we
can assume the pushing action has a certain, fixed duration.
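Step 1 above, the thresholded image-differencing detector, can be sketched over the VICon's local region. The thresholds and the flat pixel-row representation are assumptions for the example.

```python
def disturbance_detected(curr, prev, pixel_thresh=25, count_thresh=3):
    """Thresholded image differencing over a VICon's local region:
    a disturbance fires when enough pixels change by more than a
    per-pixel threshold between consecutive frames."""
    changed = sum(1 for c, p in zip(curr, prev)
                  if abs(c - p) > pixel_thresh)
    return changed >= count_thresh

prev = [10, 10, 10, 10, 10, 10]        # empty local region
curr = [10, 90, 95, 88, 10, 10]        # a finger entering the region
disturbance_detected(curr, prev)       # -> True
```

The two thresholds trade off sensitivity against noise: the per-pixel threshold rejects sensor noise, while the count threshold rejects isolated flickering pixels.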
In practice, manually designing such parsers is tedious, and it is generally difficult
to guarantee robust recognition. Therefore, we use pattern-recognition
techniques to automatically learn a model for each low-level gesture's
spatiotemporal signature. Various other gesture-recognition algorithms can also
be used for this modeling.
A natural gesture language implements the VICs interaction model on the 4DT
platform. The gesture language comprises a vocabulary of individual gestures
and a gesture grammar that promotes natural interaction by grouping
individual gesture words into gesture sentences[1]. Table 1 lists the vocabulary
of individual gesture words. Grouping these individual words together into a
comprehensive gesture language gives the composite gesture sentences in Table
2.
Table 6.2: An intuitive set of gesture sentences.

Gesture     Sentence
Pushing     Push + Silence
Twisting    Press-right + Twist + Silence
Dropping    Pick + Drop + Silence
Flipping    Pick + Flip + Silence
Moving      Pick + Move + Drop + Silence
Stopping    Stop + Silence
Resizing    Grab + Resize + Stop + Silence
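The grammar of Table 6.2 can be sketched as a lookup from recognized gesture-word streams to sentence names. The matcher itself is an illustrative sketch; only the word sequences come from the table.

```python
# Gesture sentences from Table 6.2: each is a fixed sequence of gesture
# words terminated by "Silence".
SENTENCES = {
    "Pushing":  ["Push", "Silence"],
    "Twisting": ["Press-right", "Twist", "Silence"],
    "Dropping": ["Pick", "Drop", "Silence"],
    "Flipping": ["Pick", "Flip", "Silence"],
    "Moving":   ["Pick", "Move", "Drop", "Silence"],
    "Stopping": ["Stop", "Silence"],
    "Resizing": ["Grab", "Resize", "Stop", "Silence"],
}

def parse_sentence(words):
    """Return the gesture sentence matching the recognized word stream,
    or None if the stream fits no entry in the grammar."""
    for name, pattern in SENTENCES.items():
        if words == pattern:
            return name
    return None

parse_sentence(["Pick", "Move", "Drop", "Silence"])  # -> "Moving"
```

The trailing "Silence" word acts as an end-of-sentence marker, so the recognizer knows when a sequence of low-level gestures is complete and can be committed as one action.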
Chapter 5
Future Work
3. Increasing the Region Of Interest (ROI) of the VIC cue so that the user has
enough space for interaction.
4. Using new hardware technology so that the cost of the 4DT is reduced and
the accuracy of detection is increased.
Advantages
References
[1] Jason J. Corso, Guangqi Ye, Darius Burschka, and Gregory D. Hager, "A
Practical Paradigm and Platform for Video-Based Human-Computer
Interaction", 2008.
[5] Johnny Bistrom, Alessandro Cogliati, "Post-WIMP User Interface Model for
3D Web Applications", 2005.
[6] Wikipedia, Human-computer interaction,
http://en.wikipedia.org/wiki/Human%E2%80%93computer_interaction