
INTRODUCTION

Overview
YouTube is one of the staples of modern society. Through this single website you can access
a vast number of videos from all over the world at the click of a button. You can find almost anything
on YouTube. YouTube has revolutionized everything from entertainment to education, opened new
doors for talented individuals looking to be discovered, and brought people together from all over
the globe.

Before YouTube existed we could learn about distant countries and cultures by reading books and
articles online, watching documentaries and looking at published pictures. But YouTube allows us
to take things a step further and learn about places and cultures around the world from the actual
people who live there.
I think it’s absolutely amazing that with a quick YouTube search we can watch a Zimbabwean
music video, take an educational trip to Bhutan, the last Shangri La, or find out everything there
is to know about Iceland. It is opening up new doors for people, who were once ignorant to the
world around them, to become aware of cultures around the globe and to see that there are all sorts
of interesting people living on our planet, interesting music and dance that we’ve never heard
before, and so much to see, learn and soak up.

YouTube has opened new doors for talent discovery. Never before has it been so easy for talented
singers, dancers, actors and artists to achieve instant fame. Today, a person can upload a video of
himself singing to YouTube and, if it gets seen by the right person, he could receive millions of
views, get a record deal and begin a whole new life overnight.
For example, consider Justin Bieber. When Justin’s mom uploaded some videos of her son singing to
YouTube to share with family and close friends, she had no idea that the videos would be seen by
music manager Scooter Braun, or that her son would get the opportunity to meet Usher and become
a pop icon.
YouTube has given people around the globe a platform through which they can post videos to
show the world, first-hand, what is going on in their countries; to show the world the reality of
situations that may not be being broadcast on the news or may be hidden by the government; and
to let people know about different issues that they may not have heard about and to get viewers
to take action.

One of the most amazing examples of YouTube as a platform for spreading the truth is the story
of Neda Agha-Soltan, who was killed in protests over the Iranian election of 2009. The world had
no idea what was going on in Iran until a video of Neda being killed was uploaded to YouTube. It
spread like wildfire and soon the entire world jumped into action, doing what they could to stop
the troubles in Iran and show their support.

YouTube has also changed the face of entertainment. We used to watch television, go to the
movies, read books. Today we can spend hours on YouTube, watching video after video. YouTube
has shows, movies, home movies, animated shorts, web series and more and we can watch it all
from home for free.
And YouTube entertainment isn’t limited to YouTube alone. YouTube clips are being shown on
the news, and on popular television shows like Good Morning America and Ellen. Whereas people
used to ask friends, over dinner, whether or not they had seen the latest box office hit, now people
ask if their friends have seen the latest viral video and odds are they have. Even if they didn’t see
the clip on YouTube, they probably saw it on the news or on their favorite variety show.

YouTube is full of how-to videos, tutorials and lectures. If you have a question or want to know
how something is done, all you have to do is search for it on YouTube and odds are there is an
educational video waiting to teach you everything you need to know.
In fact, some people are even beginning to use YouTube as a tool for offering a free education to
people around the world. For instance, hedge fund analyst Salman Khan quit his job to start
offering a free education on the web, via YouTube. Through the ‘Khan Academy’, Khan offers
over 1,400 tutorials, teaching about everything from math and finance, to physics, chemistry and
biology. With videos like Khan’s on YouTube, all that a person needs is an internet connection
and they have access to a fantastic education, even in third world countries.

Many videos are very long, running four hours or more: a cricket match or a football match, for
example. For people who miss the live stream and are too busy to watch a full game, a shorter
video is assembled from the important events of that game. Such videos are called highlights, and
they let people with little time catch up on the game.

Video size varies with the quality and duration of the video, and can range from megabytes to
gigabytes. A recording of a full cricket match can run to several gigabytes, whereas the highlights
of the same match will be measured in megabytes.
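
As a rough illustration of how duration drives file size, the sketch below multiplies a bitrate by a duration. The 5 Mbit/s bitrate and the two durations are assumed figures chosen for illustration, not measurements from this report, and the function names are hypothetical.

```python
def estimate_video_size_bytes(bitrate_bits_per_second: float, duration_seconds: float) -> float:
    """Approximate file size: bits per second times seconds, divided by 8 bits per byte."""
    return bitrate_bits_per_second * duration_seconds / 8


def to_megabytes(size_bytes: float) -> float:
    """Convert bytes to megabytes (1 MB = 1,000,000 bytes)."""
    return size_bytes / 1_000_000


# Assumed figures: a 4-hour match and a 5-minute highlights clip,
# both encoded at an assumed 5 Mbit/s.
ASSUMED_BITRATE = 5_000_000
full_match = estimate_video_size_bytes(ASSUMED_BITRATE, 4 * 60 * 60)
highlights = estimate_video_size_bytes(ASSUMED_BITRATE, 5 * 60)

print(f"full match: {to_megabytes(full_match):.1f} MB")
print(f"highlights: {to_megabytes(highlights):.1f} MB")
```

Under these assumptions the full match comes to about 9 GB and the highlights to under 200 MB. Real files also depend on the codec, resolution and audio track.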
[Figure: YouTube views worldwide]

Now the question arises: why use video data instead of image data? To understand why YouTube is
so popular nowadays, and why the amount of video data has grown so much in this generation, we
first have to understand video itself: what video is, the types and formats of video, and why
video data is used in place of image data.

What is video?

Video is an electronic medium for the recording, copying, playback, broadcasting, and display
of moving visual media.
Video was first developed for mechanical television systems, which were quickly replaced
by cathode ray tube (CRT) systems which were later replaced by flat panel displays of several
types.
Video systems vary in display resolution, aspect ratio, refresh rate, color capabilities and other
qualities. Analog and digital variants exist and can be carried on a variety of media, including radio
broadcast, magnetic tape, optical discs, computer files, and network streaming.
Characteristics of Video

Frame rate, the number of still pictures per unit of time of video, ranges from six or eight frames
per second (frame/s) for old mechanical cameras to 120 or more frames per second for new
professional cameras. Film is shot at the slower frame rate of 24 frames per second, which slightly
complicates the process of transferring a cinematic motion picture to video. The minimum frame
rate to achieve a comfortable illusion of a moving image is about sixteen frames per second.
Aspect ratio describes the proportional relationship between the width and height of video screens
and video picture elements. All popular video formats are rectangular, and so can be described by
a ratio between width and height.
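
To make the width-to-height ratio concrete, here is a minimal sketch that reduces a pixel resolution to its simplest ratio. The function name and the sample resolutions are illustrative assumptions.

```python
from math import gcd
from typing import Tuple


def aspect_ratio(width: int, height: int) -> Tuple[int, int]:
    """Reduce a width x height resolution to its simplest width:height ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return width // divisor, height // divisor


# Sample resolutions chosen for illustration.
print(aspect_ratio(1920, 1080))  # (16, 9)
print(aspect_ratio(640, 480))    # (4, 3)
```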
Video quality can be measured with formal metrics like Peak signal-to-noise ratio (PSNR) or
through subjective video quality assessment using expert observation.
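
As an illustration of the PSNR metric mentioned above, the sketch below uses the standard definition, PSNR = 10 * log10(MAX^2 / MSE). It assumes 8-bit frames (MAX = 255) given as flat sequences of pixel values. The function name and sample frames are hypothetical.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], distorted: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in decibels between two equally sized frames."""
    if len(reference) == 0 or len(reference) != len(distorted):
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixel positions.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Toy 4-pixel frames; every pixel of the distorted frame is off by 1.
original = [10, 20, 30, 40]
noisy = [11, 21, 31, 41]
print(f"{psnr(original, noisy):.2f} dB")
```

A higher PSNR means the distorted frame is closer to the reference. For a whole video the value is usually computed per frame and then averaged.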
Format of Video

Different layers of video transmission and storage each provide their own set of formats to choose
from.
For transmission, there is a physical connector and signal protocol. A given physical link can carry
certain display standards that specify a particular refresh rate, display resolution, and color space.
Many analog and digital recording formats are in use, and digital video clips can also be stored on
a computer file system as files, which have their own formats. In addition to the physical format
used by the data storage device or transmission medium, the stream of ones and zeros that is sent
must be in a particular digital video coding format, of which a number are available.
Types of Video Data
There are two types of video:

Analog Video

Analog video is a video signal transferred by an analog signal. An analog color video signal
contains the luminance or brightness (Y) and the chrominance (C) of an analog television image. When
combined into one channel, it is called composite video, as is the case with NTSC, PAL and SECAM,
among others.
Analog video may be carried in separate channels, as in two channel S-Video (YC) and multi-
channel component video formats.
Analog video is used in both consumer and professional television production applications.

[Figure: Analog video]

Digital Video

Digital video signal formats with higher quality have been adopted, including serial digital
interface (SDI), Digital Visual Interface (DVI), High-Definition Multimedia Interface (HDMI)
and DisplayPort Interface, though analog video interfaces are still used and widely available.
There exist different adaptors and variants.
Challenges in video making

Video Cost: Making a video involves various costs, including cameras, crew, lighting and
voice-over.

Video Expertise: Video editing is not a simple task. You may have shot footage with
your digital camera or your cellphone, or even downloaded it from websites. Stitching clips
together, creating transitions between them, and adding music and titles can be hard and
time-consuming.

Time Factor: Everything from capturing to delivering a video is time-consuming. Editing a video
takes a lot of time, and animated video in particular takes a long time to edit.

Why to use video data instead of image data

People are visual beings. Online consumers retain only 10 to 20% of the information they read
or hear, but once that information is paired with visual elements, the share retained
goes up to 65%. At any given time, the human brain can process only a very limited
amount of information. In addition, any information that can be processed faster will take priority
in grabbing a person’s attention.
Video is the most important visual element used in content marketing today. By 2019, 80% of
internet traffic will consist of video traffic. Generally, online audiences prefer watching video
to viewing images or reading plain text. In fact, four times as many consumers would rather watch
a video about a product than read about it or look at images, and 90% of consumers state that
watching a product video is very helpful in making a purchasing decision.

Chapter 2.
In this day and age there is a surge in demand for video data. People prefer watching a
video to going through a book in order to gain knowledge. And video is abundant not only in the
field of learning but in the field of entertainment media as well. The demand for video data
keeps rising steeply, and people show an immense affinity for video data over textual data.
The way people use the Internet for sharing experiences makes much of this data easily
accessible.
For example, on YouTube more than three hundred hours of footage are uploaded every minute,
much of it documenting real-life social situations and interactions. Since the early 2000s, social science
researchers have developed new ways to use such data to study the dynamics of social life. They trace
situations or events step by step to explain an outcome such as violence, mass panic, or teamwork
in emergency rooms. To do so, they focus on aspects such as people’s interactions, movements, fields
of vision, exchanges of glances or gestures, and actors’ facial expressions and body postures. We
have combined related approaches and insights from applied research into a methodological
framework for conducting such analyses, labelled Video Data Analysis.
For this purpose, video is divided into two kinds:
1. Raw video
2. Edited video

1. Raw Video
What do we mean by raw video?
Raw video is uncompressed video that still contains many uninformative events. It is commonly used
by video cameras, video monitors, video recording devices (including general purpose computers),
and in video processors that perform functions such as image resizing, image rotation,
deinterlacing, and text and graphics overlay. When performing operations on video, it is
best to work on raw video, as it maintains the best possible quality and gives the best
output when compressed after editing.
2. Edited Video
Edited video is compressed video: raw video from which the uninformative events have been
removed. After editing, most meaningless clips are cut, so edited videos
usually have compact structures.

Nowadays, with the popularity of camera devices, a sheer volume of videos is captured and
shared online. Every day, vast amounts of video data flood social networking platforms on the
Internet. This gives users convenient access to video data. However, it also makes browsing the
data time-consuming. So we urgently need an efficient way to handle this huge volume of video data.
Fortunately, video summarization can be of great assistance in the era of data explosion. It offers
viewers the gist of a video by generating a compact version of its content. There are two main
categories of video summary. One is the storyboard, which consists of key-frames. The other is the video
skim, which is composed of video segments, namely key-shots. Usually, video shots are generated
by uniform cutting or by segmentation models. The two versions of video summary have individual
advantages: a storyboard represents the video with just a few frames, while a video skim
retains the dynamic characteristics of the original video. Both not only provide a
viewer-friendly way of browsing video, but also have a wide range of applications, such as activity
recognition, event detection, and video embedding.
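
The difference between the two summary types can be sketched on frame indices alone. The example below cuts a video into shots by uniform cutting, then builds a storyboard and a video skim from a selection of key-shots. The function names are hypothetical, and taking the middle frame of each shot as its key-frame is a simplifying assumption; choosing which shots count as key-shots is the job of the summarization model and is hard-coded here.

```python
from typing import List, Tuple

Shot = Tuple[int, int]  # half-open frame range [start, end)


def uniform_shots(n_frames: int, shot_length: int) -> List[Shot]:
    """Cut frames 0..n_frames-1 into consecutive shots of equal length (the last may be shorter)."""
    if n_frames < 0 or shot_length <= 0:
        raise ValueError("n_frames must be >= 0 and shot_length must be > 0")
    return [(start, min(start + shot_length, n_frames))
            for start in range(0, n_frames, shot_length)]


def storyboard(key_shots: List[Shot]) -> List[int]:
    """One key-frame per key-shot; here simply the middle frame (an assumption)."""
    return [(start + end - 1) // 2 for start, end in key_shots]


def video_skim(key_shots: List[Shot]) -> List[int]:
    """Every frame of every key-shot, in order, so motion is preserved."""
    return [frame for start, end in key_shots for frame in range(start, end)]


shots = uniform_shots(n_frames=10, shot_length=4)
key_shots = [shots[0], shots[2]]  # hypothetical selection made by a summarizer

print(shots)                  # [(0, 4), (4, 8), (8, 10)]
print(storyboard(key_shots))  # [1, 8]
print(video_skim(key_shots))  # [0, 1, 2, 3, 8, 9]
```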
In the early years, researchers focused on the summarization of edited videos, which have been
preprocessed by editors, such as news, TV programs and video ads. Edited videos usually
have compact structures, and most shots are informative. To summarize this kind of video,
researchers focus on developing models that exploit video structure information (i.e., the
relationships among shots) and then select the most representative elements. Most of these models
are based on low-level appearance and motion features.
Methodology:

A general framework is built by exploiting the mutual benefit between edited video and raw video
summarization. In practice, this is a challenging problem, since edited videos and raw videos have
different structures. However, despite these differences, edited and raw videos
also share some commonalities in summarization. Ideally, both edited and raw video
summaries should contain important objects (importance) and representative shots of
the video content (representativeness), with little redundancy (diversity). Last but not
least, the storyline of the summary should be smooth enough for the viewer to understand the
video content easily (storyness).

This report is based on the idea that the summaries of edited video and raw video share
similar properties, so we build a general summarization framework for both. The proposed
framework can be divided into the following steps:

First, to measure the properties of video summary, we design four models, i.e., importance,
representativeness, diversity and storyness. It is worth mentioning that these property models
consider both the characteristics of edited video and raw video, and are applicable to the
summarization of the two kinds of videos.

Second, to balance the influence of the four property models, a score function is built with the
weighted combination of them. The weights of the property models, denoted as property-weight,
are learned in a supervised manner. It is important to note that we learn respective property-weight
for edited video and raw video summarization, which is more reasonable than setting a common
property-weight for both of them.
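
The weighted combination described in this step can be sketched as follows. All numbers are hypothetical placeholders: in the framework the property-weights are learned in a supervised manner, and the four property scores come from the property models, neither of which is reproduced here.

```python
from typing import Dict

PROPERTIES = ("importance", "representativeness", "diversity", "storyness")


def summary_score(property_scores: Dict[str, float], property_weight: Dict[str, float]) -> float:
    """Score a candidate summary as the weighted sum of its four property scores."""
    return sum(property_weight[name] * property_scores[name] for name in PROPERTIES)


# Hypothetical property-weights: one set per video type, as the text recommends.
EDITED_WEIGHT = {"importance": 0.2, "representativeness": 0.4, "diversity": 0.2, "storyness": 0.2}
RAW_WEIGHT = {"importance": 0.4, "representativeness": 0.2, "diversity": 0.3, "storyness": 0.1}

# Hypothetical property scores for one candidate summary.
candidate = {"importance": 0.5, "representativeness": 1.0, "diversity": 0.5, "storyness": 0.0}

print(f"scored as edited video: {summary_score(candidate, EDITED_WEIGHT):.2f}")
print(f"scored as raw video:    {summary_score(candidate, RAW_WEIGHT):.2f}")
```

The same candidate receives a different score under each property-weight, which is why the text argues for learning the weights separately for edited and raw video.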

Third, to augment the training data in learning the property weight, the training set is formed by
both edited videos and raw videos. Furthermore, to reduce the structure mess caused by the rough
mixture, a new parameter, denoted as mixing coefficient, is designed for the training videos.
Briefly, each training video is equipped with a pair of mixing-coefficients, reflecting its relevance
to edited video and raw video summarization.
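
The text does not spell out how the mixing coefficients enter the training objective. One plausible reading, shown below purely as an assumption, is that each training video's loss is weighted by its coefficient for the task being trained. The class, field names and numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingVideo:
    name: str
    loss: float        # hypothetical per-video training loss under the current weights
    mix_edited: float  # relevance to edited-video summarization
    mix_raw: float     # relevance to raw-video summarization


def mixed_loss(videos: List[TrainingVideo], target: str) -> float:
    """Average per-video loss, weighted by each video's mixing coefficient for the target task."""
    if target not in ("edited", "raw"):
        raise ValueError("target must be 'edited' or 'raw'")
    coefficients = [v.mix_edited if target == "edited" else v.mix_raw for v in videos]
    total = sum(coefficients)
    if total == 0:
        raise ValueError("no training video is relevant to this target")
    return sum(c * v.loss for c, v in zip(coefficients, videos)) / total


# Hypothetical combined training set of edited and raw videos.
training_set = [
    TrainingVideo("news_clip", loss=2.0, mix_edited=1.0, mix_raw=0.0),
    TrainingVideo("headcam_walk", loss=4.0, mix_edited=0.0, mix_raw=1.0),
    TrainingVideo("travel_vlog", loss=6.0, mix_edited=0.5, mix_raw=0.5),
]

print(f"loss for edited-video weights: {mixed_loss(training_set, 'edited'):.3f}")
print(f"loss for raw-video weights:    {mixed_loss(training_set, 'raw'):.3f}")
```

Under this reading, a video that resembles both types contributes to both objectives, while a clearly edited or clearly raw video influences only its own.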

The contributions of the proposed framework are summarized as follows:


1) We build a unified framework for the summarization of both edited videos and raw videos,
which considers the commonalities and differences between the two kinds of videos.
2) We design four models to capture the properties of video summary. They are applicable to both
edited video and raw video summarization. Moreover, the score function developed with the
weighted combination of the four property models can measure the summary quality
comprehensively.
3) We propose to construct a combined training set with both edited videos and raw videos, which
can address the lack of training data. Moreover, the video structure mess in the
training set is reduced by the mixing coefficient.
Chapter 3
Now the problem arises detecting

First Person View

First person narrative is a point of view (who is telling a story) where the story is narrated by one
character at a time. This character may be speaking about him or herself or sharing events that he
or she is experiencing. First person can be recognized by the use of I or we.

In first person, we only see the point of view of one character. While this character may share
details about others in the story, we are only told what the speaker knows. An author may switch
from character to character, but still use first person narrative. This way, we may learn about what
other characters think and feel, but we are still limited in our knowledge because we must rely on
what the character shares.

We are not only limited by what the character shares, but what the character knows. He or she may
not have all the information or knowledge about events. We would also not know what other
characters are thinking. We also have to decide if the speaker is reliable with the information that
he or she gives. An unreliable narrator is a character and storyteller that cannot be trusted.

There are many different ways that an author may use first person narrator. There may be
an interior monologue, an inner voice or stream of consciousness. There may be a dramatic
monologue, a poetic form where a speaker reveals him or herself. There might be a plural first-
person perspective, where an author would represent a group. Finally, there might even be
a peripheral narrator, a first-person narrator that is not a main character.
Third person perspective

Third-person perspective puts you in direct control of a character that you can actually see in front
of you. In grammar, the third person is referred to as he, she or it. This means that you are not directly
the person you are controlling. You do not control his thoughts or what he looks at (the first-person
perspective, the you, gives you better control over putting you into someone else’s body even if,
in a grammar sense, it does not really count as first-person).

Advantages over First Person View

Visible character
Visible actions: jumps, climbing, taking over
Wider field of view
Close-up three-dimensional objects are hard to portray realistically with stereoscopic 3D and head
tracking, e.g. the protruding guns in any FPS

Challenges for first person video

In first person video, the person shooting the video is outside the frame. The camera can capture
everything except the person holding it.

Datasets
Some types of datasets

TVSum
SumMe
YouTube
OVP
UT Egocentric
ADL
FPVSum

Dataset | Types of videos | No. of videos | Duration | Categories | Annotations | Edited/raw
-------------------------------------------------------------------------------------------
TVSum | User generated videos of events | 50 | 2 to 10 minutes | YouTube videos | Fully (frame level) | edited
SumMe | Human summaries | 25 | 1 to 6 minutes | 1st Person- Base Jumping, Bike polo, Scuba, Valparaiso Downhill, Uncut Evening flight | fully | raw
YouTube | Youtube videos | - | 120 to 500 seconds | News, sports, cartoon | annotated | both
OVP | Diverse datasets | 5 | 1.5 to 6.5 minutes | - | annotated | -
UT Egocentric | Head mounted cameras | 4 | 3 to 5 hours | eating, shopping, driving, attending lecture, cooking | annotated | raw
ADL | Object bounding boxes and keyframes used | 20 | 20 to 60 minutes | - | - | Edited and raw
FPVSum | First person vs | 98 | Over 7 hours | - | Not fully annotated | raw
