Prepared By
INDIA
December 2012
Assignment 1
2. The system supports a maximum of 3 clients; each can enter or leave the system at any
time. The client GUI can be designed as shown in Figure 1.1.
3. The chat room is a long-lived server component; there is no GUI on the server side.
Implementation of Chat application using socket programming in Java
4. All messages are to be broadcast to all clients connected to the chat room.
1. Hardware Requirement
2. Software Requirement
1.4 Theory
In this assignment you will implement a chat server. A client process sends a string
(message) to the server, and in response the server process sends the same string to
all available client processes, i.e. it broadcasts the message.
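The broadcast behaviour described above can be sketched without any networking. The following is a minimal illustration only: the class name ChatRoomSketch, its methods and the use of PrintWriter as a stand-in for a client connection are assumptions made for this example, not part of the assignment text. The limit of 3 clients comes from the problem statement.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChatRoomSketch {
    // Limit taken from the problem statement (maximum 3 clients).
    static final int MAX_CLIENTS = 3;

    // Client name -> output channel; a PrintWriter stands in for a socket stream here.
    private final Map<String, PrintWriter> clients = new LinkedHashMap<String, PrintWriter>();

    // Registers a client; returns false when the room is full or the name is taken.
    synchronized boolean join(String name, PrintWriter out) {
        if (clients.size() >= MAX_CLIENTS || clients.containsKey(name)) {
            return false;
        }
        clients.put(name, out);
        return true;
    }

    // Removes a client; clients may leave at any time.
    synchronized void leave(String name) {
        clients.remove(name);
    }

    // Sends one message to every connected client, including the sender.
    synchronized void broadcast(String from, String message) {
        for (PrintWriter out : clients.values()) {
            out.println(from + ": " + message);
            out.flush();
        }
    }

    public static void main(String[] args) {
        ChatRoomSketch room = new ChatRoomSketch();
        StringWriter alice = new StringWriter();
        StringWriter bob = new StringWriter();
        room.join("alice", new PrintWriter(alice));
        room.join("bob", new PrintWriter(bob));
        room.broadcast("alice", "hello");
        System.out.print(bob);
    }
}
```

In a real server each PrintWriter would wrap the output stream of an accepted socket, and join and leave would be called from the thread that handles that client.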
Passed To:
DatagramSocket.receive();
DatagramSocket.send();
DatagramSocketImpl.receive();
DatagramSocketImpl.send();
MulticastSocket.send();
This class defines a socket that can receive and send unreliable datagram packets
over the network using the UDP protocol.
Public Constructors
public DatagramSocket();
In the implementation with TCP sockets you have to make use of the following classes:
public class ServerSocket
Public Constructors
Public Methods
Methods
1.5 Procedure
1. Create a DatagramSocket.
2. Create a DatagramPacket with the specification of the Remote Host, port number
and the data to be sent.
3. When a connection request arrives, accept it using the accept method of the server
socket.
4. Read from and write to the socket using DataInputStream and DataOutputStream
respectively.
1. Create a socket with the specification of the host machine (server) and the port
number.
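The UDP steps above (create a DatagramSocket, then create a DatagramPacket carrying the remote host, port and data) can be sketched as follows. This is a minimal illustration under stated assumptions: the class name UdpLoopbackDemo and the method sendAndReceive are hypothetical, both sockets run in one process on the loopback address, and the receiver port is chosen by the operating system.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class UdpLoopbackDemo {
    // Sends one datagram to a local receiver socket and returns what was received.
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0 = any free port
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // do not block forever if the packet is lost

            // Packet with the specification of the remote host, port number and data.
            byte[] data = message.getBytes(StandardCharsets.UTF_8);
            DatagramPacket out =
                new DatagramPacket(data, data.length, loopback, receiver.getLocalPort());
            sender.send(out);

            // Receive into a fixed-size buffer; getLength() gives the actual payload size.
            byte[] buffer = new byte[1024];
            DatagramPacket in = new DatagramPacket(buffer, buffer.length);
            receiver.receive(in);
            return new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("hello"));
    }
}
```

UDP gives no delivery guarantee, which is why the sketch sets a receive timeout; a chat server built on UDP has to handle lost or reordered messages itself.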
Implement a chat application with one server and multiple clients using both TCP and
UDP. Compare the use of TCP and UDP sockets with respect to this application. Which
is better suited?
From this assignment you learn how to build client and server applications that
communicate using sockets, and how reliable (TCP) and unreliable (UDP)
communication take place between them.
5. Name some case studies of distributed systems that you have studied.
6. If you were asked to design a distributed system for a client, which design issues
would you consider?
10. What is the Difference between Networked System and Distributed System?
17. Explain the difference between message-oriented communication and stream-oriented
communication.
1. Hardware Requirement
Implementation of Remote Method Invocation using Java RMI
2. Software Requirement
2.4 Theory
The client looks up the server name in the registry to obtain a remote reference.
The stub serializes the parameters and sends them to the skeleton; the skeleton invokes
the remote method and serializes the result back to the stub.
The stub is responsible for sending the remote call over to the server-side skeleton:
it opens a socket to the remote server, marshals the object parameters and forwards
the data stream to the skeleton.
A skeleton contains a method that receives the remote calls, unmarshals the
parameters, and invokes the actual remote object implementation.
2.5 Procedure
/* SampleServer.java */
import java.rmi.*;
public interface SampleServer extends Remote
{
public int sum(int a,int b) throws RemoteException;
}
The server uses the RMISecurityManager to protect its resources while engaging
in remote communication.
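/* SampleServerImpl.java */
import java.rmi.*;
import java.rmi.server.*;
import java.rmi.registry.*;
public class SampleServerImpl extends UnicastRemoteObject
        implements SampleServer
{
    SampleServerImpl() throws RemoteException
    {
        super();
    }
    // Implementation of the remote method declared in SampleServer
    public int sum(int a, int b) throws RemoteException
    {
        return a + b;
    }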
The server must bind its name to the registry; the client will look up this name.
Use the java.rmi.Naming class to bind the server name to the registry. In this example
the name is SAMPLE-SERVER.
In the main method of your server object, the RMI security manager is created
and installed.
//RMIServer.java
public static void main(String args[])
{
    try
    {
        //create a local instance of the object
        SampleServerImpl Server = new SampleServerImpl();
        //put the local instance in the registry; the name must match the
        //client's lookup URL exactly, so it must not contain stray spaces
        Naming.rebind("SAMPLE-SERVER", Server);
        System.out.println("Server waiting.....");
    }
    catch (java.net.MalformedURLException me)
    {
        System.out.println("Malformed URL: " + me.toString());
    }
    catch (RemoteException re)
    {
        System.out.println("Remote exception: " + re.toString());
    }
}
In order for the client object to invoke methods on the server, it must first look
up the name of the server in the registry.
The name specified in the URL must exactly match the name that the server
has bound to the registry.
The remote method invocation is programmed using the remote object reference
(remoteObject) as prefix and the remote method name sum as suffix, i.e.
remoteObject.sum(a, b).
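//RMIClient.java
import java.rmi.*;
import java.rmi.server.*;
public class SampleClient
{
    public static void main(String[] args)
    {
        //get the remote object from the registry
        try
        {
            System.out.println("Security Manager loaded");
            String url = "//localhost/SAMPLE-SERVER";
            SampleServer remoteObject = (SampleServer)Naming.lookup(url);
            //invoke the remote method; the operands 1 and 2 are illustrative values
            System.out.println("1 + 2 = " + remoteObject.sum(1, 2));
        }
        catch (Exception e)
        {
            System.out.println("Client exception: " + e.toString());
        }
    }
}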
Step 4 and 5: Compile the Java source files and generate the client stubs and server
skeletons
Once the interface and its implementation are compiled, you need to generate the stub
and skeleton code. The RMI system provides an RMI compiler (rmic) that takes your
compiled remote object class and produces the stub code by itself.
RMI applications need a running registry. The registry must be started manually
by calling rmiregistry.
The rmiregistry uses port 1099 by default. You can also bind rmiregistry to a
different port by giving the new port number as an argument: rmiregistry <new port>
On Windows, you have to type at the command line: start rmiregistry
Advancements:
Create an RMI application for following requirements
3. Simple Calculator
4. Time Server
5. Echo Server
6. String Operations
You have to develop an RMI server on which the database resides. The RMI client will
have a GUI with functions such as insert, delete and update.
Implementation of Client-Server
architecture using Socket
Programming in Linux
Imagine a client-server architecture (as shown in Figure 3.1) in which a user stores a file
on a server. The main server splits that file into two or more fragments and stores each
fragment on a separate storage server. When the client retrieves the file from the main
server, the main server fetches the fragments from the storage servers and presents them
to the user as one file.
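The split and reassemble steps that the main server performs can be sketched in memory as follows. This is a minimal illustration only: the class name FileFragmenter, the methods split and join, and the choice of nearly equal contiguous fragments are assumptions for this example; the assignment itself does not prescribe a fragmentation scheme, and the transfer to the storage servers over sockets is left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FileFragmenter {
    // Splits the file content into 'parts' contiguous fragments of nearly equal size.
    static List<byte[]> split(byte[] data, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<byte[]> fragments = new ArrayList<byte[]>();
        int size = (data.length + parts - 1) / parts; // ceiling division
        for (int i = 0; i < parts; i++) {
            int from = Math.min(i * size, data.length);
            int to = Math.min(from + size, data.length);
            fragments.add(Arrays.copyOfRange(data, from, to));
        }
        return fragments;
    }

    // Concatenates the fragments in order to rebuild the original content.
    static byte[] join(List<byte[]> fragments) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] fragment : fragments) {
            out.write(fragment, 0, fragment.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "hello world".getBytes();
        List<byte[]> fragments = split(original, 2);
        System.out.println(new String(join(fragments)));
    }
}
```

The main server must remember which storage server holds which fragment, and in what order, so that join receives the fragments in their original sequence.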
1. Hardware Requirement
2. Software Requirement
3.4 Theory
In this assignment you will implement a client-server architecture using sockets. A socket
is a communication mechanism that allows client/server systems to be developed either
locally, on a single machine, or across a network. The client and the main server
communicate using sockets, and so do the main server and the storage (fragment) servers.
3.5 Procedure
1. The server creates a socket by calling the socket system call; the socket cannot be
shared with another process.
#include <sys/types.h>
#include <sys/socket.h>
4. The server accepts incoming requests by calling accept. When the server calls accept,
a new socket is created that is distinct from the named socket and is used for
communication with the client.
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
5. The client creates a socket by using the socket system call and sends a connection
request to the server through the connect system call.
int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
7. Finally, the client and the server call close to close the connection.
int close(int sockfd);
Simple network client example:
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
    int sockfd;
    int len;
    struct sockaddr_in address;
    int result;
    char ch = 'A';
    //Creating and naming the socket
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = inet_addr("127.0.0.1");
    //The port must be in network byte order
    address.sin_port = htons(1234);
    len = sizeof(address);
    //Connect our socket to server socket
    result = connect(sockfd, (struct sockaddr *) &address, len);
    if(result == -1)
    { perror("oops:client1"); exit(1); }
    //Read and Write via sockfd
    write(sockfd, &ch, 1);
    read(sockfd, &ch, 1);
    printf("\n Server says : %c\n", ch);
    close(sockfd);
    exit(0);
}
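Simple network server example:
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <stdlib.h>
int main()
{
    int server_sockfd, client_sockfd;
    socklen_t server_len, client_len;
    struct sockaddr_in server_address;
    struct sockaddr_in client_address;
    //Create and name the socket
    server_sockfd = socket(AF_INET, SOCK_STREAM, 0);
    server_address.sin_family = AF_INET;
    server_address.sin_addr.s_addr = inet_addr("127.0.0.1");
    server_address.sin_port = htons(1234);
    server_len = sizeof(server_address);
    //The remainder of this listing is missing in the source; the lines below are an
    //assumed minimal completion that matches the client above
    bind(server_sockfd, (struct sockaddr *) &server_address, server_len);
    listen(server_sockfd, 5);
    while(1)
    {
        char ch;
        client_len = sizeof(client_address);
        client_sockfd = accept(server_sockfd, (struct sockaddr *) &client_address, &client_len);
        //Read one character, increment it and send it back
        read(client_sockfd, &ch, 1);
        ch++;
        write(client_sockfd, &ch, 1);
        close(client_sockfd);
    }
}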
$ cc -o Serverapp server2.c
$ cc -o Clientapp client2.c
$ ./Serverapp &
$ ./Clientapp
From this assignment you can study how to write a socket program in C under Linux.
Perform case study on cloud computing which will include Definition, Benefits,
Drawbacks, All the services like Process as a Service, Platform as a Service, Info as a
Service, Integration as a Service, Security as a Service, Storage as a Service,
Governance or Management as a Service, TAAS, Infrastructure as a Service.
4.3 Theory
Definition: Cloud computing is a technology that uses the internet and central remote
servers to maintain data and applications.
Cloud computing allows consumers and businesses to use applications without
installation and to access their personal files from any computer with internet access.
This technology allows for much more efficient computing by centralizing storage,
memory, processing and bandwidth.
Case Study on Cloud Computing
You do not need your own software or server to use them. All a consumer needs is
an internet connection to start sending emails. The server and the email
management software are all in the cloud (internet) and are managed entirely by the
cloud service provider (Yahoo, Google, etc.).
3. Location independent resource pooling: processing and storage demands are
balanced across a common infrastructure with no particular resource assigned to any
individual user.
5. Pay per use: consumers are charged fees based on their usage of a combination of
computing power, bandwidth use and/or storage.
Architecture
Advantages of cloud computing
1. Reduced Cost
Cloud technology is paid incrementally, saving organizations money.
2. Increased Storage
Organizations can store more data than on private computer systems.
3. Highly Automated
No longer do IT personnel need to worry about keeping software up to date.
4. Flexibility
Cloud computing offers much more flexibility than past computing methods.
5. More Mobility
Employees can access information wherever they are, rather than having to remain
at their desks.
reputation, and any sign of a security breach would result in a loss of clients and
business.
2. Dependency (loss of control)
3. Cost
Higher costs. While cloud hosting is a lot cheaper than traditional technologies in the
long run, the fact that it is currently new and has to be researched and improved
actually makes it more expensive. Data centers have to buy or develop the software
that will run the cloud, rewire the machines and fix unforeseen problems (which are
always there). This makes their initial cloud offers more expensive. As in all other
industries, the first customers pay a higher price and have to deal with more issues
than those who switch later (although it would be very hard to create and improve
new technologies without these initial adopters).
4. Decreased flexibility
This is only a temporary problem (like the others on this list), but current technologies
are still in the testing stages, so they do not really offer the flexibility they promise.
Of course, that will change in the future, but some current users might have to deal
with the fact that their cloud server is difficult or impossible to upgrade without
losing some data, for example.
Laboratory Manual
Prepared By
Prof.Shah Sahil K.
INDIA
December 2013
Assignment 1
Implementation of Conflation
Algorithm
JDK 1.7
1.4 Theory
Information Retrieval
Calvin Mooers coined the term "information retrieval" in 1950. In the context of library and
information science, it means getting back information that is, in a way, hidden from normal sight
or vision.
According to J.H. Shera, it is "the process of locating and selecting data relevant to a given
requirement." Calvin Mooers defines it as "searching and retrieval of information from storage,
according to specification by subject."
The conflation algorithm is mainly useful for developing an automated text processing system which,
by means of computable methods and with the minimum of human intervention, generates from the input
text (full text, abstract, or title) a document representative adequate for use in an automatic
retrieval system. A document will be indexed by a name if one of its significant words occurs as a
member of that class. Such a system will usually consist of three parts:
1. Removal of high frequency words
2. Suffix stripping
3. Detecting equivalent stems
Luhn proposed that the frequency of word occurrence in an article furnishes a useful measurement of
word significance. Luhn used Zipf's Law as a null hypothesis to specify two cut-offs, an upper and a
lower, thus excluding non-significant words. The words exceeding the upper cut-off were considered to
be common and those below the lower cut-off rare, and therefore not contributing significantly to the
content of the article. He thus devised a counting technique for finding significant words. The same is
shown by using a plot of frequency versus rank.
Stop words
These are very common words that occur frequently in a sentence but carry little meaning of their
own, and so do not contribute to the relevance of the sentence.
Examples of stop words include, but are not limited to: a, an, the, is, was, are, were, he, she, it.
Non words
These are the characters/notations used to format a sentence properly.
Examples of non words include all formatting (or special) characters such as ? , ; : & etc.
Figure 1.1: Relation between frequency of word and significance of word [Luhn's idea]
The removal of high frequency words, stop words or fluff words is one way of implementing Luhn's
upper cut-off. This is normally done by comparing the input text with a stop word list of words which
are to be removed. The advantages of the process are not only that non-significant words are removed
and will therefore not interfere during retrieval, but also that the size of the total document file can be
reduced by between 30 and 50 per cent.
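The comparison against a stop word list can be sketched as follows. This is a minimal illustration, assuming the short stop word list given as an example above and treating every non-letter character as a non word; the class name StopWordFilter and the method preprocess are hypothetical, and a real stop list would be much longer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordFilter {
    // Example stop words taken from the list above; real stop lists are much longer.
    static final Set<String> STOP_WORDS = new HashSet<String>(Arrays.asList(
        "a", "an", "the", "is", "was", "are", "were", "he", "she", "it"));

    // Lower-cases the text, drops non words (anything that is not a letter)
    // and removes every token that appears in the stop word list.
    static List<String> preprocess(String text) {
        List<String> kept = new ArrayList<String>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("The server is connected; it was connecting & the client?"));
    }
}
```

A hash set is used so that each lookup against the stop list takes constant time regardless of the length of the list.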
Terms with a common stem will usually have similar meanings, for example: CONNECT, CONNECTED,
CONNECTING, CONNECTION, CONNECTIONS. Performance of an IR system will be
improved if term groups such as this are conflated into a single term. This may be done by removal of
the various suffixes -ED, -ING, -ION, -IONS, etc. to leave the single term CONNECT. In addition, the
suffix stripping process will reduce the total number of terms in the IR system, and hence reduce the
size and complexity of the data in the system, which is always advantageous.
The algorithm assumes the following definitions. A consonant in a word is a letter other than A, E,
I, O or U, and other than Y preceded by a consonant. Any letter that is not a consonant is a vowel.
Every consonant is represented by c and every vowel by v. A list ccc... of length greater than 0
will be denoted by C, and a list vvv... of length greater than 0 will be denoted by V. Any word, or
part of a word, therefore has one of the four forms:
CVCV ... C
CVCV ... V
VCVC ... C
VCVC ... V
These may all be represented by the single form [C]VCVC ... [V], where the square brackets denote
arbitrary presence of their contents. Using (VC)^m to denote VC repeated m times, this may again be
written as:
[C](VC)^m[V]
m will be called the measure of any word or word part when represented in this form.
Some examples are as follows:
m = 0: TR, EE, TREE, Y, BY
m = 1: TROUBLE, OATS, TREES, IVY
m = 2: TROUBLES, PRIVATE, OATEN, ORRERY
The rules for removing a suffix are given in the form:
(condition) S1 -> S2
This means that if a word ends with the suffix S1 and the stem before S1 satisfies the given condition,
S1 is replaced by S2. The condition is usually given in terms of m, e.g.:
(m > 1) EMENT ->
Here S1 is EMENT and S2 is null. This would map REPLACEMENT to REPLAC, since REPLAC is
a word part for which m = 2.
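The measure m and the EMENT rule can be checked with a short program. This is a minimal sketch, not a full stemmer: the class name PorterMeasure and its methods are hypothetical, only one rule is shown, and the condition m > 1 for removing EMENT is taken from Porter's published rule.

```java
public class PorterMeasure {
    // A consonant is a letter other than A, E, I, O, U, and other than Y preceded by a consonant.
    static boolean isConsonant(String word, int i) {
        char c = Character.toLowerCase(word.charAt(i));
        if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u') {
            return false;
        }
        if (c == 'y') {
            // Y is a vowel only when the preceding letter is a consonant.
            return i == 0 || !isConsonant(word, i - 1);
        }
        return true;
    }

    // Counts the VC pairs in the form [C](VC)^m[V], i.e. the vowel-to-consonant transitions.
    static int measure(String word) {
        int m = 0;
        boolean previousWasVowel = false;
        for (int i = 0; i < word.length(); i++) {
            boolean consonant = isConsonant(word, i);
            if (consonant && previousWasVowel) {
                m++;
            }
            previousWasVowel = !consonant;
        }
        return m;
    }

    // Applies the single rule (m > 1) EMENT -> null; other words are returned in lower case unchanged.
    static String stripEment(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("ement")) {
            String stem = w.substring(0, w.length() - "ement".length());
            if (measure(stem) > 1) {
                return stem;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(measure("TROUBLES") + " " + stripEment("REPLACEMENT"));
    }
}
```

A word such as CEMENT is left unchanged by this rule, because the stem C has m = 0 and so fails the condition.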
For two stems to be equivalent they must match except for their endings, which themselves must appear
in the list as equivalent.
For example, stems such as ABSORB- and ABSORPT- are conflated because there is an entry in the
list defining B and PT as equivalent stem-endings if the preceding characters match.
Document representative
It is a list of significant words (words having a high frequency of occurrence). These are often
referred to as the document's index terms or keywords.
1.5 Procedure
1. A text file is taken as an input to the conflation algorithm.
3. Process the input file to remove the stop words and non words. This step is known as document
preprocessing.
5. Detect the equivalent stems and find the frequency of occurrence of each term in the document.
6. Based on Luhn's idea, decide the upper cut-off (the maximum frequency value of a term) and the
lower cut-off (it can be decided based on the maximum frequency value). Apply Luhn's idea to decide
the significant word set.
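Steps 5 and 6 can be sketched as follows. This is a minimal illustration under stated assumptions: the class name LuhnCutoff and its methods are hypothetical, the input is a list of already stemmed terms, and the two cut-off values are passed in by the caller because the procedure leaves their exact choice open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LuhnCutoff {
    // Counts how often each (already stemmed) term occurs in the document.
    static Map<String, Integer> frequencies(List<String> terms) {
        Map<String, Integer> freq = new TreeMap<String, Integer>();
        for (String term : terms) {
            Integer old = freq.get(term);
            freq.put(term, old == null ? 1 : old + 1);
        }
        return freq;
    }

    // Keeps the terms whose frequency lies between the lower and upper cut-off (inclusive).
    // The cut-off values are assumptions supplied by the caller.
    static List<String> significant(Map<String, Integer> freq, int lower, int upper) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : freq.entrySet()) {
            int f = entry.getValue();
            if (f >= lower && f <= upper) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
            "connect", "connect", "connect", "server", "server", "client",
            "data", "data", "data", "data", "data");
        System.out.println(significant(frequencies(terms), 2, 4));
    }
}
```

The resulting list is the document representative; terms above the upper cut-off are treated as too common and terms below the lower cut-off as too rare.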
Concept of IR Models
2.3 Theory
2.3.1 Clustering
Clustering can be considered the most important unsupervised learning problem; like every other
problem of this kind, it deals with finding a structure in a collection of unlabeled data. Clustering
can be defined as the process of organizing objects into groups whose members are similar in some
way. A cluster is therefore a collection of objects which are similar to each other and dissimilar
to the objects belonging to other clusters. Clustering is commonly classified into the following types:
2. Hierarchical Clustering
Implementation of Single Pass Clustering Algorithm
Clustering algorithms which require only one pass over the file of object descriptions are known as
single-pass algorithms.
Given a collection of clusters and a threshold value h, if the highest similarity of a new document n
to some cluster is more than h, the document n is appended to that cluster; if no such cluster exists,
a new cluster is generated which contains only the document n. Single pass clustering is therefore
suitable for incremental clustering of temporal data (or a data stream), since once a document is
assigned to a cluster, the assignment is not changed in the future.
Algorithm
2. The first object becomes the cluster representative (or centroid) of the first cluster.
3. Each subsequent object is matched against all cluster representatives existing at its processing
time. When a new document (object descriptor) d_i (i > 1) comes in, calculate its similarity
to every cluster C using the cosine similarity between the cluster representative and the document.
4. A given object (document) is assigned to one cluster (or more if overlap is allowed) according to
some condition (threshold value) on the matching function.
5. When an object is assigned to a cluster, the representative for that cluster is recomputed.
If D_1, D_2, ..., D_n are the documents in the cluster and each D_i is represented by a numerical
vector (d_1, d_2, ..., d_t), then the centroid C of the cluster is given by
C = \frac{1}{n} \sum_{i=1}^{n} \frac{D_i}{\|D_i\|}
where \|D_i\| = \sqrt{d_1^2 + d_2^2 + \dots + d_t^2}
6. If an object fails a certain test (condition), it becomes the cluster representative of a new cluster.
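The algorithm steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name SinglePassClustering and its methods are hypothetical, overlap between clusters is not allowed, and the centroid is taken to be the mean of the length-normalised member vectors, which is one reading of the centroid definition in step 5.

```java
import java.util.ArrayList;
import java.util.List;

public class SinglePassClustering {
    static class Cluster {
        final List<double[]> members = new ArrayList<double[]>();
        double[] centroid;
    }

    // Euclidean norm of a vector.
    static double norm(double[] v) {
        double sum = 0;
        for (double x : v) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity; defined as 0 when either vector is all zeros.
    static double cosine(double[] a, double[] b) {
        double dot = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
        }
        double denominator = norm(a) * norm(b);
        return denominator == 0 ? 0 : dot / denominator;
    }

    // Centroid = mean of the length-normalised document vectors (assumed form).
    static double[] centroid(List<double[]> docs) {
        double[] c = new double[docs.get(0).length];
        for (double[] d : docs) {
            double n = norm(d);
            if (n == 0) {
                continue;
            }
            for (int i = 0; i < c.length; i++) {
                c[i] += d[i] / n;
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= docs.size();
        }
        return c;
    }

    // One pass over the documents: join the most similar cluster if its similarity
    // is more than the threshold, otherwise start a new cluster.
    static List<Cluster> cluster(double[][] docs, double threshold) {
        List<Cluster> clusters = new ArrayList<Cluster>();
        for (double[] doc : docs) {
            Cluster best = null;
            double bestSimilarity = -1;
            for (Cluster c : clusters) {
                double s = cosine(c.centroid, doc);
                if (s > bestSimilarity) {
                    bestSimilarity = s;
                    best = c;
                }
            }
            if (best == null || bestSimilarity <= threshold) {
                best = new Cluster();
                clusters.add(best);
            }
            best.members.add(doc);
            best.centroid = centroid(best.members); // recompute the representative
        }
        return clusters;
    }

    public static void main(String[] args) {
        double[][] docs = { {1, 0}, {0.9, 0.1}, {0, 1} };
        System.out.println(cluster(docs, 0.8).size() + " clusters");
    }
}
```

Because each document is seen once, the result depends on the order in which documents arrive; this is a known property of single pass methods.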
2.4 Procedure
1. 4-5 text files are taken as input to the single pass clustering algorithm. These input files should
be represented in term vs document matrix form (vector space model representation).
2. Pass each input text file (document) serially through the algorithm until all documents are covered.
3.3 Theory
For a set of attributes or features A and a set of values V for a text document, a record R is a subset
of the Cartesian product A x V in which each attribute has one and only one value. Thus R is a set of
ordered pairs of the form (attribute, value). For example, the record for a document which has
been processed by an automatic content analysis algorithm would be
R = {(K1, x1), (K2, x2), ..., (Km, xm)}
Records are collected into logical units called files. They enable one to refer to a set of records by
name, the file name. The records within a file are often organized according to relationships between
the records. This logical organization has become known as a file structure (or data structure).
3.3.2 Indexing
In general, indexing is the technique of mapping identifiers to sets of objects in order to speed up
searching for the objects. From the IR perspective, the objects are documents or document
representatives.
Inverted index/file structure
An inverted file is a file structure in which every list contains only one record. Remember that a list
is defined with respect to a keyword K, so every K-list contains only one record. This implies that the
directory will be such that ni = hi for all i, that is, the number of records containing Ki will equal the
number of Ki-lists. So the directory will have an address for each record containing Ki. For document
retrieval this means that given a keyword we can immediately locate the addresses of all the documents
containing that keyword. The definition of inverted files does not require that the addresses in the
directory are in any order. However, to facilitate operations such as conjunction ('and') and disjunction
('or') on any two inverted lists, the addresses are normally kept in record number order. This means
that 'and' and 'or' operations can be performed with one pass through both lists. The penalty we pay
is of course that the inverted file becomes slower to update.
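The one-pass 'and' and 'or' operations on two inverted lists kept in record number order can be sketched as follows. This is a minimal illustration: the class name PostingListOps and the representation of a list as a sorted int array without duplicates are assumptions for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class PostingListOps {
    // Conjunction: record numbers present in both sorted lists, found in one pass.
    static List<Integer> and(int[] a, int[] b) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    // Disjunction: record numbers present in either sorted list, without duplicates.
    static List<Integer> or(int[] a, int[] b) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j >= b.length || (i < a.length && a[i] < b[j])) {
                out.add(a[i++]);
            } else if (i >= a.length || b[j] < a[i]) {
                out.add(b[j++]);
            } else {
                out.add(a[i]); // equal record numbers: emit once, advance both lists
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] k1 = {1, 3, 5, 7};
        int[] k2 = {3, 4, 7, 9};
        System.out.println(and(k1, k2) + " " + or(k1, k2));
    }
}
```

Both methods advance each index at most once per element, so the cost is proportional to the combined length of the two lists. With unsorted lists every lookup would need a search, which is the reason for keeping the addresses in order.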
3.4 Procedure
1. 3-4 text files are taken as input in order to build the inverted index structure.
3. For each distinct keyword, maintain a data structure containing the keyword and a list of
(document no., position of keyword in the whole document) pairs.
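The data structure of step 3 can be sketched as a map from keyword to a list of (document number, position) pairs. This is a minimal illustration under stated assumptions: the class name InvertedIndexBuilder is hypothetical, documents are passed as strings instead of being read from files, document numbers start at 1, positions are word offsets starting at 0, and stop word removal and stemming are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndexBuilder {
    // keyword -> list of {document number, word position} pairs
    static Map<String, List<int[]>> build(String[] documents) {
        Map<String, List<int[]>> index = new TreeMap<String, List<int[]>>();
        for (int doc = 0; doc < documents.length; doc++) {
            int position = 0;
            for (String word : documents[doc].toLowerCase().split("[^a-z0-9]+")) {
                if (word.isEmpty()) {
                    continue;
                }
                List<int[]> postings = index.get(word);
                if (postings == null) {
                    postings = new ArrayList<int[]>();
                    index.put(word, postings);
                }
                // Documents are processed in order, so postings stay sorted by document number.
                postings.add(new int[] { doc + 1, position });
                position++;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] docs = { "socket server client", "client socket" };
        for (Map.Entry<String, List<int[]>> entry : build(docs).entrySet()) {
            StringBuilder line = new StringBuilder(entry.getKey() + ":");
            for (int[] p : entry.getValue()) {
                line.append(" (").append(p[0]).append(",").append(p[1]).append(")");
            }
            System.out.println(line);
        }
    }
}
```

Given a keyword, a single map lookup returns every document containing it, which is the property the inverted file is designed for.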
Implementation of Feature
Extraction in 2D Color Images
4.3 Theory
Transforming the input data into a set of features is called feature extraction. If the features
are carefully chosen, the feature set is expected to capture the relevant information from the
input data, so that the desired task can be performed using this reduced representation instead of the
full-size input. Alternatively, feature extraction can be described as a method of capturing the visual
content of images for indexing and retrieval. Features of images used in Multimedia-IR can be of the
following types:
2. Domain-specific features
These features depict the characteristics of the image domain. Ex: fingerprints, human face,
eye retina.
3. General features
Ex: color, texture, shape, height, width, aspect ratio.
The issue of choosing the features to be extracted should consider following concerns:
The features should carry enough information about the image and should not require any domain-
specific knowledge for their extraction.
They should be easy to compute in order for the approach to be feasible for a large image collection
and rapid retrieval.
They should relate well with the human perceptual characteristics since users will finally determine
the suitability of the retrieved images.
Because perception is subjective, there is no single best representation for a feature. Color
is one of the most widely used features in image retrieval.
4.4 Procedure
Process of Feature extraction
2. Scan the input image in a single pass and maintain a count of the number of pixels found at each
feature value (color, intensity, texture etc.).
3. Each 8-bit image consists of gray levels/bins 0-255. The extraction process involves finding the
pixels (x, y) in the image which have a particular gray level. This process can be applied to the
whole image.
4. The final output will be 256 gray levels/bins, each containing the pixels having the respective
gray level value. These extracted values can be used to generate a histogram. (In this case, it is a
graph showing the number of pixels in an image at each different intensity value found in that image.)
For an 8-bit gray scale image there are 256 different possible intensities, and so the histogram will
graphically display 256 numbers showing the distribution of pixels amongst those gray scale values.
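The single-pass counting described in the procedure can be sketched as follows. This is a minimal illustration: the class name GrayHistogram is hypothetical, and the image is assumed to be already loaded into a two-dimensional int array of gray levels in the range 0-255, so image file decoding is left out.

```java
public class GrayHistogram {
    // Scans the image once and counts how many pixels fall into each of the 256 gray levels.
    static int[] histogram(int[][] pixels) {
        int[] bins = new int[256];
        for (int[] row : pixels) {
            for (int value : row) {
                if (value < 0 || value > 255) {
                    throw new IllegalArgumentException("gray level out of range: " + value);
                }
                bins[value]++;
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        int[][] image = {
            { 0,   0, 128 },
            { 255, 128, 128 }
        };
        int[] bins = histogram(image);
        System.out.println(bins[0] + " " + bins[128] + " " + bins[255]);
    }
}
```

The 256 counts are the extracted feature vector; two images can then be compared through their histograms instead of pixel by pixel.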
4. Define feature extraction. How is it useful in reducing the storage space of multimedia documents?
Case Study
5.2 Theory
Explain the presentation topic by clearly stating each point thoroughly. Use examples, diagrams to
make the explanation more effective.