
Y9MC93007

Managing Multidimensional Historical Aggregate Data in Unstructured P2P Networks

INTRODUCTION
PEER-TO-PEER (P2P) networks have become very popular in the last few years. Nowadays, they are the most widespread approach for exchanging data among large communities of users in the file-sharing context. In this scenario, the success of P2P-based solutions is strictly related to the use of lossy data compression techniques (such as MPEG formats), which yield reasonable detail levels in representing large amounts of information and make data exchange feasible in practice by significantly reducing data transmission costs. However, the problem of suitably extending data-compression-based solutions to application contexts other than file sharing has not been deeply investigated yet. Specifically, no P2P-based solution has imposed itself as an effective evolution of traditional distributed databases. This is quite surprising, as the huge amount of resources provided by P2P networks (in terms of storage capacity, computing power, and data transmission capability) could effectively support data management.

In this scenario, information is represented as points in a multidimensional space whose dimensions correspond to different perspectives over the data: users explore the data and retrieve aggregates by issuing range queries, i.e., queries specifying an aggregate operator and the range of the data domain from which the aggregate information should be retrieved. Specifically, we will consider the case of analytical applications dealing with historical data, which typically require huge computation and storage capabilities due to the large amount of data that needs to be accessed to evaluate queries. Although the multidimensional data model is substantially more complex than the representation paradigm adopted in the file-sharing context (where data are organized according to (name, file) pairs), analytical applications dealing with historical multidimensional data and file-sharing applications share a fundamental aspect: they can rely on lossy data compression. In fact, analogously to tools for reproducing audio and/or video files, a lot of applications dealing with multidimensional data can effectively accomplish their tasks even in the case that only an approximate representation of the data is available.
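To make the notion of a range query concrete, the following is a minimal Java sketch of such a query. The class name RangeQuery, the Aggregate enum, and the array-based bounds are our own illustrative assumptions, not the framework's actual API.

enum Aggregate { SUM, COUNT, AVG }

class RangeQuery {
    final Aggregate op;    // the aggregate operator to apply
    final double[] lo, hi; // inclusive bounds, one pair per dimension

    RangeQuery(Aggregate op, double[] lo, double[] hi) {
        this.op = op;
        this.lo = lo;
        this.hi = hi;
    }

    // True if a data point falls inside the queried hyper-rectangle.
    boolean contains(double[] point) {
        for (int d = 0; d < lo.length; d++) {
            if (point[d] < lo[d] || point[d] > hi[d]) return false;
        }
        return true;
    }
}

For instance, new RangeQuery(Aggregate.SUM, new double[]{0, 1990}, new double[]{50, 2000}) would ask for the sum of all measures falling in the region [0, 50] x [1990, 2000] of a two-dimensional domain.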

ABSTRACT:

A P2P-based framework supporting the extraction of aggregates from historical multidimensional data is proposed, which provides efficient and robust query evaluation. When a data population is published, the data are summarized in a synopsis, consisting of an index built on top of a set of subsynopses (storing compressed representations of distinct data portions). The index and the subsynopses are distributed across the network, and suitable replication mechanisms taking into account the query workload and network conditions are employed to provide the appropriate coverage for both the index and the subsynopses.

Index Terms: P2P networks, multidimensional data management, data compression.

OBJECTIVES:

P2P is any distributed network architecture composed of participants that make a portion of their resources (such as processing power, disk storage, or network bandwidth) directly available to other network participants.

Description: Our aim is devising a P2P-based framework supporting the analysis of multidimensional historical data. Specifically, our efforts will be devoted to combining the amenities of P2P networks and data compression to provide support for the evaluation of range queries, possibly trading off efficiency against accuracy of answers. The framework should enable members of an organization to cooperate by sharing their resources (both storage and computational) to host (compressed) data and perform aggregate queries on them, while preserving their autonomy. A framework with these characteristics can be useful in different application contexts. For instance, consider the case of a worldwide virtual organization with users interested in geographical data, as well as the case of a real organization on an enterprise network. In both cases, even users who are not continuously interested in performing data analysis can make a part of their resources available for supporting analysis tasks needed by others, as long as their own capability of performing local tasks is preserved.

1.1.1 Challenges
The management of compressed data on unstructured P2P networks is an intriguing issue, but it poses several research challenges, which we discuss in the following.

1.1.1.1 Compression

A compression technique must be devised which is able to create prone-to-be-distributed data synopses supporting the efficient evaluation of aggregates, possibly affected by tolerable error rates. Several compression techniques have been proposed in the literature in the centralized scenario, which are able to compute synopses that enable aggregate queries to be estimated with very good accuracy. These techniques could be employed in the P2P scenario to build a single synopsis to be replicated over the network. However, in this case:

- although the cost of disk storage is continuously and rapidly decreasing, it may still be difficult to find peers for which hosting replicas of synopses has a negligible cost, while autonomy is a requirement in our setting: using traditional compression techniques, synopses providing reasonable error rates may have a non-negligible size (usually not under 1 percent of the size of the original data set, e.g., 100 GB from a 10 TB data set);

- although compressing the data certainly makes replication less resource-consuming, replicating the entire synopsis each time would require storage and network resources that could be saved if only some specific portion of the synopsis could be replicated.

These drawbacks would be overcome if the compressed synopsis were subdivided into tiny subsynopses which are independently replicated and disseminated on the network when needed. Peers would, therefore, be asked to host replicas of small chunks of data. This way, the autonomy requirement would not result in a limit on the overall size of the synopsis (since the whole storage capacity of the network could be employed to store it), thus enabling the construction of synopses that provide high-quality estimates. However, it is well known that the naive solution of obtaining independent subsynopses by first partitioning the data and then compressing each portion separately is less effective than compressing the data altogether. This is analogous to what generally happens with common lossless compression algorithms, such as LZW [43]: compressing n pieces of a text separately results in n independent synopses whose overall size is larger than the synopsis obtained by compressing the whole text.
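The LZW observation above is easy to reproduce. The following self-contained Java sketch uses GZIP from java.util.zip (standing in for LZW purely as an illustration; the class name and sample text are our own) to compare compressing a redundant text as a whole against compressing it as n independent pieces:

import java.io.ByteArrayOutputStream;
import java.util.zip.GZIPOutputStream;

public class SplitVsWhole {

    // Compress a byte array with GZIP and return the compressed size in bytes.
    static int gzipSize(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(data);
        gz.close();
        return bos.size();
    }

    public static void main(String[] args) throws Exception {
        // Build a highly redundant text, as in a typical corpus.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("managing multidimensional historical aggregate data ");
        }
        byte[] whole = sb.toString().getBytes();

        // Compress the whole text at once.
        int wholeSize = gzipSize(whole);

        // Compress the same text as n = 10 independent pieces.
        int n = 10, piecesSize = 0, chunk = whole.length / n;
        for (int i = 0; i < n; i++) {
            byte[] piece = new byte[chunk];
            System.arraycopy(whole, i * chunk, piece, 0, chunk);
            piecesSize += gzipSize(piece);
        }

        // The n independent outputs are typically larger overall.
        System.out.println("whole: " + wholeSize + " bytes, pieces: " + piecesSize + " bytes");
    }
}

On redundant inputs, the separate outputs are larger in total because each piece must rebuild its own dictionary; this is exactly the size/accuracy tension that the partitioning step of the framework has to manage.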
1.1.1.2 Indexing

Once a compression technique yielding subsynopses with the desired properties has been defined, the problem of making the compressed data efficient to locate over the network must be tackled. Appropriate techniques are thus needed to distribute the compressed data and index them for efficient access. In unstructured P2P systems, no indexing mechanism is generally provided, and data accessibility is achieved by disseminating replicas. Thus, users explore the network (through flooding or random walking) until the desired data are located. This approach could be adapted to our context by simply disseminating several copies of all the subsynopses; however, this way, answering a range query could require several subsynopses to be located, thus making query evaluation bandwidth-consuming. A better way to address this issue is to design an indexing mechanism that supports the efficient location of the subsynopses involved in the query evaluation. Hence, the challenge is to define an indexing technique supporting the location of our subsynopses in an unstructured network, such that the portions of the index and the responsibility of hosting it can be dynamically distributed among the peers, while preserving their autonomy.

1.1.1.3 Replication

A replication scheme capable of maintaining appropriate levels of coverage w.r.t. the evolution of user interests and network conditions must be designed, to ensure accessibility and robustness. Existing replication strategies for unstructured P2P networks treat data sets as atomic objects, as they create a number of replicas of a data set each time a query is posed on it. As explained before, in our scenario, this would limit both the size of the synopsis (thus affecting the accuracy of the compressed data) and the frequency of replica creations (thus limiting the responsiveness w.r.t. volatility); moreover, the index itself must be properly replicated too. Hence, the challenge is to exploit the fragmentation of the synopsis and the multidimensionality of the data to detect the specific portions of data and index which need to be replicated, that is, the regions of the data in which users are interested most, or whose accessibility is not satisfactory, as well as the portions of the index referencing these regions. Based on this, a fine-grained replication strategy must be devised which, instead of blindly creating replicas of the whole synopsis, makes new copies of specific portions of data and index when needed.

REQUIREMENTS ANALYSIS

Requirements analysis is done in order to understand the problem that the software system is to solve. For example, the problem could be automating an existing manual process, or developing a completely new automated system, or a combination of the two. For large systems which have a large number of features, and which need to perform many different tasks, understanding the requirements of the system is a major task. The emphasis in requirements analysis is on identifying what is needed from the system, not how the system will achieve its goals.


This task is complicated by the fact that there are often at least two parties involved in software development: a client and a developer. The developer usually does not understand the client's problem domain, and the client often does not understand the issues involved in software systems. This causes a communication gap, which has to be adequately bridged during requirements analysis.

In most software projects, the requirements phase ends with a document describing all the requirements. In other words, the goal of the requirements specification phase is to produce the software requirements specification document. The person responsible for the requirements analysis is often called the analyst. There are two major activities in this phase: problem understanding (or analysis) and requirements specification. In problem analysis, the analyst has to understand the problem and its context. Such analysis typically requires a thorough understanding of the existing system, parts of which must be automated.

Once the problem is analyzed and the essentials understood, the requirements must be specified in the requirements specification document. For requirements specification in the form of a document, some specification language has to be selected (for example: English, regular expressions, tables, or a combination of these). The requirements document must specify all functional and performance requirements; the formats of inputs, outputs, and any required standards; and all design constraints that exist due to political, economic, environmental, and security reasons.

The phase ends with validation of the requirements specified in the document. The basic purpose of validation is to make sure that the requirements specified in the document actually reflect the real requirements or needs, and that all requirements are specified. Validation is often done through a requirements review, in which a group of people, including representatives of the client, critically review the requirements specification.

SOFTWARE REQUIREMENT OR ROLE OF SOFTWARE REQUIREMENT SPECIFICATION (SRS)


IEEE (Institute of Electrical and Electronics Engineers) defines a requirement as:

1. A condition or capability needed by a user to solve a problem or achieve an objective;


2. A condition or capability that must be met or possessed by a system to satisfy a contract, standard, specification, or other formally imposed document.

Note that in software requirements we are dealing with the requirements of the proposed system, that is, the capabilities that the system, which is yet to be developed, should have. It is because we are dealing with specifying a system that does not exist in any form that the problem of requirements becomes complicated. Regardless of how the requirements phase proceeds, the Software Requirement Specification (SRS) is a document that completely describes what the proposed software should do, without describing how the system will do it. The basic goal of the requirements phase is to produce the SRS, which describes the complete external behavior of the proposed software.

Existing System

In the existing system, the multidimensional data model is substantially more complex than the representation paradigm adopted in the file-sharing context (where data are organized according to (name, file) pairs). Nevertheless, analytical applications dealing with historical multidimensional data and file-sharing applications share a fundamental aspect: they can rely on lossy data compression.

Analytical applications dealing with historical data typically require huge computation and storage capabilities, due to the large amount of data that needs to be accessed to evaluate queries. However, the problem of suitably extending data-compression-based solutions to application contexts other than file sharing has not been deeply investigated yet. Like tools for reproducing audio and/or video files, many applications dealing with multidimensional data can effectively accomplish their tasks even in the case that only an approximate representation of the data is available.

Proposed System
We propose a framework for sharing and performing analytical queries on historical multidimensional data in unstructured peer-to-peer networks.


In our approach, participants make their resources (and possibly their data, in a suitable compressed format) available to the other peers in exchange for the possibility of accessing and posing range queries against the data published by others. Our solution is based on suitable data summarization and indexing techniques, and on mechanisms for data distribution and replication that properly take into account the need to preserve the autonomy of peers, as well as the interest exhibited by the users in the data, to support efficient query evaluation. The experimental results showed the effectiveness of our approach in providing fast and accurate query answers, and in ensuring the robustness that is mandatory in peer-to-peer settings.

Modules Explanation
Our proposal is a framework supporting the sharing and the analysis of compressed historical multidimensional data over an unstructured P2P network. From the user standpoint, two tasks are supported: data publication and data querying.

Data Publication

Let p be a peer which is willing to share a historical multidimensional data set D so that the other peers can pose aggregate range queries against it. In order to make its data suitable for being distributed across the network, p builds a synopsis of D by first appropriately partitioning D, and then compressing each portion of data in the partition. Peer p also builds an index over these subsynopses, which, again, is properly fragmented in order to make it prone to be distributed. Finally, the subsynopses and the index portions are disseminated across the network, along with metadata about D. The assignment of data and index portions to peers takes into account the willingness of peers to share their resources.

Data Querying

Exploration queries can be issued by peers to discover the shared data sets in which they may be interested. These queries specify criteria that are matched against the metadata associated with each available data set.
The result of the exploration process is a set of matching data sets and, for each of them, a set of peers that should be contacted to start the evaluation of range queries, i.e., peers hosting portions of the distributed index that are thus capable of appropriately routing range queries.

The main contributions of this paper may be summed up as follows:

1. a compression technique for building an indexed aggregate structure over a multidimensional data population, prone to be distributed and accessed across a P2P network;

2. a storage model which employs additional data structures to support efficient and robust query answering over compressed data in an unstructured P2P network;

3. a dynamic replication scheme capable of maintaining appropriate levels of coverage w.r.t. the evolution of the query workload and the network conditions.

Partitioning the Data Domain

The aim of the partitioning step is to divide the data domain into nonoverlapping blocks. These blocks will be compressed separately, yielding distinct subsynopses. The assignment of different amounts of storage space to the blocks for representing their subsynopses should depend on the differences in homogeneity among the blocks. Intuitively enough, the more homogeneous the data inside a block, the smaller the amount of information needed to effectively accomplish its summarization.
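The text does not spell out here how homogeneity is measured; one plausible stand-in is the variance of the values inside a block, as in the following hedged Java sketch (the class and method names are ours, and the paper may define homogeneity differently):

public class BlockHomogeneity {

    // Variance of the values falling inside a block:
    // lower variance = more homogeneous = cheaper to summarize.
    static double variance(double[] values) {
        double mean = 0;
        for (double v : values) mean += v;
        mean /= values.length;
        double var = 0;
        for (double v : values) var += (v - mean) * (v - mean);
        return var / values.length;
    }

    public static void main(String[] args) {
        double[] homogeneousBlock = {5, 5, 5, 6, 5};   // needs little space
        double[] skewedBlock      = {1, 90, 3, 70, 2}; // needs more space
        System.out.println(variance(homogeneousBlock)); // small
        System.out.println(variance(skewedBlock));      // large
    }
}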

Splitting Blocks of the Partition

Employing this splitting strategy aims at creating blocks with similar degrees of homogeneity, while refining the partition toward more and more homogeneous blocks. Specifically, having blocks with similar homogeneity is likely to yield subsynopses with similar accuracy, while having blocks that are as homogeneous as possible is likely to enhance the accuracy of each subsynopsis.

2.1.2 Distributing Blocks of the Current Partition
First, a fixed portion Bmin of the overall storage budget B is assigned to every block (the meaning of Bmin will be clearer in the following), and then the remainder of B is distributed on the basis of the homogeneity of the blocks. That is, if the current partition consists of k blocks b1, ..., bk, then each bi is assigned Bmin plus a share of the remaining space that depends on the homogeneity of bi. The value of Bmin is the amount of space needed to store the most compact representation of a block according to the compression technique adopted.
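A plausible reading of the allocation rule, assuming (our assumption; the exact weighting does not survive in this copy) that the remainder B - k*Bmin is shared according to homogeneity-based weights w(bi) that sum to one, is:

B(b_i) = B_{\min} + (B - k \cdot B_{\min}) \cdot w(b_i), \qquad \sum_{i=1}^{k} w(b_i) = 1

with less homogeneous blocks receiving larger weights w(b_i), so that harder-to-compress regions obtain more space.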


Feasibility Study
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are:

ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY

This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited. The expenditures must be justified. The developed system is well within the budget, and this was achieved because most of the technologies used are freely available; only the customized products had to be purchased.

TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements; only minimal or no changes should be required for implementing this system.

SOCIAL FEASIBILITY

This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods that are employed to educate the user about the system and to make him familiar with it.


His level of confidence must be raised so that he is also able to offer constructive criticism, which is welcomed, as he is the final user of the system.


SPIRAL MODEL
The SPIRAL MODEL was defined by Barry Boehm in his 1988 article, "A Spiral Model of Software Development and Enhancement." This model was not the first model to discuss iterative development, but it was the first model to explain why the iteration matters.


As originally envisioned, the iterations were typically six months to two years long. Each phase starts with a design goal and ends with the client reviewing the progress thus far. Analysis and engineering efforts are applied at each phase of the project, with an eye toward the end goal of the project.

The steps of the Spiral Model can be generalized as follows:

1. The new system requirements are defined in as much detail as possible. This usually involves interviewing a number of users representing all the external or internal users and other aspects of the existing system.

2. A preliminary design is created for the new system.

3. A first prototype of the new system is constructed from the preliminary design. This is usually a scaled-down system, and represents an approximation of the characteristics of the final product.

4. A second prototype is evolved by a fourfold procedure: evaluating the first prototype in terms of its strengths, weaknesses, and risks; defining the requirements of the second prototype; planning and designing the second prototype; and constructing and testing the second prototype.

At the customer's option, the entire project can be aborted if the risk is deemed too great. Risk factors might involve development cost overruns, operating-cost miscalculation, or any other factor that could, in the customer's judgment, result in a less-than-satisfactory final product.


The existing prototype is evaluated in the same manner as was the previous prototype, and if necessary, another prototype is developed from it according to the fourfold procedure outlined above.

The preceding steps are iterated until the customer is satisfied that the refined prototype represents the final product desired.

The final system is constructed, based on the refined prototype. The final system is thoroughly evaluated and tested. Routine maintenance is carried out on a continuing basis to prevent large-scale failures and to minimize downtime.

The following diagram shows how the spiral model works:


Fig 1.0 - Spiral Model


ADVANTAGES:
Estimates (i.e., budget, schedule, etc.) become more realistic as work progresses, because important issues are discovered earlier. The model is better able to cope with the changes that software development generally entails.


SPIRAL MODEL DESCRIPTION


The development spiral consists of four quadrants, as shown in the figure above:

Quadrant 1: Determine objectives, alternatives, and constraints.
Quadrant 2: Evaluate alternatives; identify, resolve risks.
Quadrant 3: Develop, verify, next-level product.
Quadrant 4: Plan next phases.

Although the spiral, as depicted, is oriented toward software development, the concept is equally applicable to systems, hardware, and training, for example. To better understand the scope of each spiral development quadrant, let's briefly address each one.

QUADRANT 1: DETERMINE OBJECTIVES, ALTERNATIVES, AND CONSTRAINTS


Activities performed in this quadrant include:

Establish an understanding of the system or product objectives, namely performance, functionality, and the ability to accommodate change.


Investigate implementation alternatives, namely design, reuse, procure, and procure/modify.

Investigate constraints imposed on the alternatives, namely technology, cost, schedule, support, and risk.

Once the system's or product's objectives, alternatives, and constraints are understood, Quadrant 2 (Evaluate alternatives; identify, resolve risks) is performed.

QUADRANT 2: EVALUATE ALTERNATIVES, IDENTIFY, RESOLVE RISKS


Engineering activities performed in this quadrant select an alternative approach that best satisfies technical, technology, cost, schedule, support, and risk constraints. The focus here is on risk mitigation. Each alternative is investigated and prototyped to reduce the risk associated with the development decisions. Boehm describes these activities as follows: "This may involve prototyping, simulation, benchmarking, reference checking, administering user questionnaires, analytic modeling, or combinations of these and other risk resolution techniques." The outcome of the evaluation determines the next course of action. If critical operational and/or technical issues (COIs/CTIs) such as performance and interoperability (i.e., external and internal) risks remain, more detailed prototyping may need to be added before progressing to the next quadrant. Dr. Boehm notes that if the alternative chosen is operationally useful and robust enough to serve as a low-risk base for future product evolution, the subsequent risk-driven steps would be the evolving series of evolutionary prototypes going toward the right (hand side of the graphic); the option of writing specifications would be addressed but not exercised. This brings us to Quadrant 3.


QUADRANT 3: DEVELOP, VERIFY, NEXT-LEVEL PRODUCT


If a determination is made that the previous prototyping efforts have resolved the COIs/CTIs, activities to develop, verify, next-level product are performed. As a result, the basic waterfall approach may be employed, meaning concept of operations, design, development, integration, and test of the next system or product iteration. If appropriate, incremental development approaches may also be applicable.

QUADRANT 4: PLAN NEXT PHASES


The spiral development model has one characteristic that is common to all models: the need for advanced technical planning and multidisciplinary reviews at critical staging or control points. Each cycle of the model culminates with a technical review that assesses the status, progress, maturity, merits, and risk of development efforts to date; resolves critical operational and/or technical issues (COIs/CTIs); and reviews plans and identifies COIs/CTIs to be resolved for the next iteration of the spiral. Subsequent implementations of the spiral may involve lower-level spirals that follow the same quadrant paths and decision considerations.


HARDWARE CONFIGURATION

Hard disk : 40 GB
RAM : 512 MB
Processor : Pentium IV
Monitor : 17" Color Monitor

SOFTWARE CONFIGURATION

Front End : Java, Swing, JDBC
Tools : NetBeans IDE 6.7
Operating System : Windows XP
Back End : MS Access


JAVA TECHNOLOGY
Java technology is both a programming language and a platform.

THE JAVA PROGRAMMING LANGUAGE


The Java programming language is a high-level language that can be characterized by all of the following buzzwords:

Simple
Architecture neutral
Object oriented
Portable
Distributed
High performance
Interpreted
Multithreaded
Robust
Dynamic
Secure
With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, you first translate a program into an intermediate language called Java byte codes, the platform-independent codes interpreted by the interpreter on the Java platform.

The interpreter parses and runs each Java byte code instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.

You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it's a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java byte codes help make "write once, run anywhere" possible. You can compile your program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or an iMac.
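As a concrete (if minimal) walkthrough of this compile-once, interpret-everywhere cycle, consider a hypothetical HelloWorld.java:

// HelloWorld.java -- compiled once to byte codes, run on any Java VM.
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, world");
    }
}

Compiling it once with javac HelloWorld.java produces the platform-independent HelloWorld.class byte codes; running java HelloWorld then hands those byte codes to whatever Java VM is installed, on any operating system.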

THE JAVA PLATFORM



A platform is the hardware or software environment in which a program runs.

We've already mentioned some of the most popular platforms, like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it's a software-only platform that runs on top of other hardware-based platforms. The Java platform has two components:

The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)

You've already been introduced to the Java VM. It's the base for the Java platform and is ported onto various hardware-based platforms. The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The next section, "What Can Java Technology Do?", highlights what functionality some of the packages in the Java API provide. The following figure depicts a program that is running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Native code is code that, after you compile it, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time byte code compilers can bring performance close to that of native code without threatening portability.

WHAT CAN JAVA TECHNOLOGY DO?



The most common types of programs written in the Java programming language are applets and applications. If you've surfed the Web, you're probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser. However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs. An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet. A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server. How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:

The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.

Applets: The set of conventions used by applets.

Networking: URLs, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.

Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.

Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.



Software components: Known as JavaBeans™; can plug into existing component architectures.

Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).

Java Database Connectivity (JDBC™): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

HOW WILL JAVA TECHNOLOGY CHANGE MY LIFE?


We can't promise you fame, fortune, or even a job if you learn the Java programming language. Still, it is likely to make your programs better, and it requires less effort than other languages. We believe that Java technology will help you do the following:

Get started quickly: Although the Java programming language is a powerful object-oriented language, it's easy to learn, especially for programmers already familiar with C or C++.



Write less code: Comparisons of program metrics (class counts, method counts, and so on) suggest that a program written in the Java programming language can be four times smaller than the same program in C++.

Write better code: The Java programming language encourages good coding practices, and its garbage collection helps you avoid memory leaks. Its object orientation, its JavaBeans component architecture, and its wide-ranging, easily extendible API let you reuse other people's tested code and introduce fewer bugs.

Develop programs more quickly: Your development time may be as much as twice as fast as writing the same program in C++. Why? You write fewer lines of code, and it is a simpler programming language than C++.

Avoid platform dependencies with 100% Pure Java: You can keep your program portable by avoiding the use of libraries written in other languages. The 100% Pure Java™ Product Certification Program has a repository of historical process manuals, white papers, brochures, and similar materials online.

Write once, run anywhere: Because 100% Pure Java programs are compiled into machine-independent byte codes, they run consistently on any Java platform.

Distribute software more easily: You can upgrade applets easily from a central server. Applets take advantage of the feature of allowing new classes to be loaded on the fly, without recompiling the entire program.

Finally, we decided to proceed with the implementation using Java networking, and for dynamically updating the cache table we use an MS Access database.

Java has two things: a programming language and a platform. Java is a high-level programming language that is all of the following:

SIMPLE
ARCHITECTURE-NEUTRAL
OBJECT-ORIENTED
DISTRIBUTED
INTERPRETED
ROBUST
SECURE
PORTABLE
HIGH-PERFORMANCE
MULTITHREADED
DYNAMIC

Java is also unusual in that each Java program is both compiled and interpreted. With a compiler, you translate a Java program into an intermediate language called Java byte codes; the platform-independent code instructions are parsed and run on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The figure illustrates how this works.

Figure: Java Program -> Compiler -> Interpreter -> My Program


You can think of Java byte codes as the machine code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it's a Java development tool or a Web browser that can run Java applets, is an implementation of the Java VM. The Java VM can also be implemented in hardware.

Java byte codes help make "write once, run anywhere" possible. You can compile your Java program into byte codes on any platform that has a Java compiler. The byte codes can then be run on any implementation of the Java VM. For example, the same Java program can run on Windows NT, Solaris, and Macintosh.

SWINGS:

This introduction to using Swing in Java will walk you through the basics of Swing. It covers how to create a window, add controls, position the controls, and handle events from the controls.


The Main Window


Almost all GUI applications have a main or top-level window. In Swing, such a window is usually an instance of JFrame or JWindow. The difference between those two classes is in simplicity: JWindow is much simpler than JFrame (the most noticeable differences are visual; JWindow does not have a title bar, and does not put a button in the operating system task bar). So, your applications will almost always start with a JFrame. Though you can instantiate a JFrame and add components to it, a good practice is to encapsulate and group the code for a single visual frame in a separate class. Usually, I subclass JFrame and initialize all visual elements of that frame in the constructor. Always pass a title to the parent class constructor; that String will be displayed in the title bar and on the task bar. Also, remember to always initialize the frame size (by calling setSize(width, height)), or your frame will not be noticeable on the screen.

package com.neuri.handsonswing.ch1;

import javax.swing.JFrame;

public class MainFrame extends JFrame {
    public MainFrame() {
        super("My title");
        setSize(300, 300);
    }
}

Now you have created your first frame, and it is time to display it. The main frame is usually displayed from the main method, but resist the urge to put the main method in the frame class.


Always try to separate the code that deals with visual presentation from the code that deals with application logic; starting and initializing the application is part of the application logic, not a part of the visual presentation. A good practice is to create an Application class that will contain the initialization code.

package com.neuri.handsonswing.ch1;

public class Application {
    public static void main(String[] args) {
        // perform any initialization
        MainFrame mf = new MainFrame();
        mf.show();
    }
}

If you run the code now, you will see an empty frame. When you close it, something not quite obvious will happen (or, better said, will not happen): the application will not end. Remember that the frame is just the visual part of the application, not the application logic; if you do not request application termination when the window closes, your program will still run in the background (look for it in the process list). To avoid this problem, add the following line to the MainFrame constructor:


setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

Before Java 2 version 1.3, you had to register a window listener and then act on the window-closing event by stopping the application. Since Java 2 version 1.3, you can specify with this shortcut a simple action that will happen when the window is closed. The other options are HIDE_ON_CLOSE (the default: the window is closed but the application still runs) and DO_NOTHING_ON_CLOSE (a rather strange option that ignores a click on the X button in the upper right corner).
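For comparison, the pre-1.3 idiom mentioned above would look roughly like this; OldStyleFrame is our own illustrative name, and the listener API (java.awt.event) is standard:

import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import javax.swing.JFrame;

public class OldStyleFrame extends JFrame {
    public OldStyleFrame() {
        super("Old-style close handling");
        setSize(300, 300);
        // Pre-1.3 idiom: listen for the window-closing event and exit explicitly.
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                System.exit(0);
            }
        });
    }
}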

Adding Components
Now is the time to add some components to the window. In Swing (and the Swing predecessor, AWT) all visual objects are subclasses of the Component class. The Composite pattern was applied here to group visual objects into Containers, special components that can contain other components. Containers can specify the order, size, and position of embedded components (and all of this can be calculated automatically, which is one of the best features of Swing). JButton is a component class that represents a general-purpose button; it can have a text caption or an icon, and it can be pressed to invoke an action. Let's add a button to the frame (note: add imports for javax.swing.* and java.awt.* to the MainFrame source code so that you can use all the components). When you work with JFrame, you want to put objects into its content pane, a special container intended to hold the window contents. Obtain the reference to that container with the getContentPane() method.

Container content = getContentPane();
content.add(new JButton("Button 1"));


If you try to add more buttons to the frame, most likely only the last one added will be displayed. That is because the default behavior of the JFrame content pane is to display a single component, resized to cover the entire area.

Grouping Components
To put more than one component into a place intended for a single component, group them into a container. JPanel is a general-purpose container that is perfect for grouping a set of components into a larger component. So, let's put the buttons into a JPanel:

JPanel panel = new JPanel();
panel.add(new JButton("Button 1"));
panel.add(new JButton("Button 2"));
panel.add(new JButton("Button 3"));
content.add(panel);
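The introduction promised event handling as well; the following small, self-contained sketch (our own demo class, using the standard ActionListener API) shows how one of these buttons could react to a click:

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.*;

public class ButtonDemo {
    public static void main(String[] args) {
        JFrame frame = new JFrame("Events");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        JPanel panel = new JPanel();
        JButton button = new JButton("Button 1");
        // React when the button is pressed.
        button.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                System.out.println("Button 1 pressed");
            }
        });
        panel.add(button);
        frame.getContentPane().add(panel);
        frame.setSize(300, 300);
        frame.setVisible(true);
    }
}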


Layout Management Basics


One of the best features of Swing is automatic component positioning and resizing. This is implemented through a mechanism known as layout management. Special objects, layout managers, are responsible for sizing, aligning, and positioning components. Each container can have a layout manager, and the type of layout manager determines the layout of the components in that container. There are several types of layout managers, but the two you will use most frequently are FlowLayout (orders components one after another, without resizing) and BorderLayout (has a central part and four edge areas; the component in the central part is resized to take as much space as possible, and components in the edge areas are not resized). In the previous examples, you have used both of them: FlowLayout is the default for a JPanel (that is why all three buttons are displayed without resizing), and BorderLayout is the default for JFrame content panes (that is why a single component is shown covering the entire area). The layout for a container is defined using the setLayout method (or, usually, in the constructor). So, you could change the layout of the content pane to FlowLayout and add several components, to see them all on the screen.


The best choice for the window content pane is usually a BorderLayout with a central content part and a bottom status (or button) part. The top part can optionally contain a toolbar. Now, let's introduce a new component, JTextArea, and combine several components and layouts. JTextArea is basically a multiline editor. Initialize the frame content pane explicitly to BorderLayout, put a new JTextArea into the central part, and move the button panel below.

package com.neuri.handsonswing.ch1;

import java.awt.*;
import javax.swing.*;

public class MainFrame extends JFrame {
    public MainFrame() {
        super("My title");
        setSize(300, 300);
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        Container content = getContentPane();
        content.setLayout(new BorderLayout());
        JPanel panel = new JPanel(new FlowLayout());
        panel.add(new JButton("Button 1"));
        panel.add(new JButton("Button 2"));
        panel.add(new JButton("Button 3"));
        content.add(panel, BorderLayout.SOUTH);
        content.add(new JTextArea(), BorderLayout.CENTER);
    }
}

Notice that the layouts for the content pane and the button panel are explicitly defined. Also notice the last two lines of code: this is the other version of the add method, which allows you to specify the way the component is added. In this case, we specify the area of the BorderLayout layout manager. The central part is called BorderLayout.CENTER, and the other areas are called BorderLayout.NORTH (top), BorderLayout.SOUTH (bottom), BorderLayout.WEST (left), and BorderLayout.EAST (right). If you get confused about this, just remember land maps from your geography classes.

JAVA DATABASE CONNECTIVITY (JDBC)

JDBC AND ODBC IN JAVA:


The most popular and widely accepted database connectivity standard, called Open Database Connectivity (ODBC), is used to access relational databases. It offers the ability to connect to almost all the databases on almost all platforms. Java applications can also use ODBC to communicate with a database. Then why do we need JDBC? There are several reasons:

ODBC was completely written in the C language and it makes an extensive use of pointers. API calls from Java to native C code have a number of drawbacks in the security, implementation, robustness, and automatic portability of applications.

ODBC is hard to learn. It mixes simple and advanced features together, and it has complex options even for simple queries.

ODBC drivers must be installed on the client's machine.


ARCHITECTURE OF JDBC:

The JDBC architecture contains three layers:

JDBC Application
JDBC Driver Manager
JDBC Drivers

APPLICATION LAYER:

A Java program wants to get a connection to a database. It needs the information from the database to display on the screen, to modify the existing data, or to insert data into a table.

DRIVER MANAGER:

This layer is the backbone of the JDBC architecture. When it receives a connection request from the JDBC application layer, it tries to find the appropriate driver by iterating through all the available drivers which are currently registered with the driver manager. After finding the right driver, it connects the application to the appropriate database.

JDBC DRIVER LAYER:

This layer accepts the SQL calls from the application and converts them into native calls to the database, and vice versa. A JDBC driver is responsible for ensuring that an application has consistent and uniform access to any database.

When a request is received from the application, the JDBC driver passes the request to the ODBC driver; the ODBC driver communicates with the database, sends the request, and gets the results. The results are then passed to the JDBC driver and, in turn, to the application. So, the JDBC driver has no knowledge about the actual database; it only knows how to pass the application's request to the ODBC and get the results from the ODBC.

How do the JDBC and the ODBC interact with each other? The reason is that both the JDBC API and ODBC are built on an interface called the Call Level Interface (CLI). Because of this, the JDBC driver translates the request into an ODBC call, the ODBC then converts the request again and presents it to the database, and the results of the request are fed back through the same channel in reverse.
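Putting the pieces together, a minimal sketch of this request path against the project's MS Access back end might look as follows. The DSN name "cacheDB" and the table name "CacheTable" are hypothetical, and the sun.jdbc.odbc bridge driver shown here shipped with the JDKs of that era (it was removed in Java 8):

import java.sql.*;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Load the JDBC-ODBC bridge driver.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        // "cacheDB" is a hypothetical ODBC Data Source Name for the Access file.
        Connection con = DriverManager.getConnection("jdbc:odbc:cacheDB");
        Statement stmt = con.createStatement();
        // The SQL travels JDBC driver -> ODBC driver -> database, as described above.
        ResultSet rs = stmt.executeQuery("SELECT * FROM CacheTable");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}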

ABOUT MS-ACCESS:

What is a Database?


A database is a collection of information that's related to a particular subject or purpose, such as tracking customer orders or maintaining a music collection. If your database isn't stored on a computer, or only parts of it are, you may be tracking information from a variety of sources that you're having to coordinate and organize yourself.

Using Microsoft Access, you can manage all your information from a single database file. Within the file, divide your data into separate storage containers called tables; view, add, and update table data by using online forms; find and retrieve just the data you want by using queries; and analyze or print data in a specific layout by using reports. Allow users to view, update, or analyze the database's data from the Internet or an intranet by creating data access pages.

To store your data, create one table for each type of information that you track. To bring the data from multiple tables together in a query, form, report, or data access page, define relationships between the tables.

To find and retrieve just the data that meets conditions that you specify, including data from multiple tables, create a query. A query can also update or delete multiple records at the same time, and perform predefined or custom calculations on your data.

To easily view, enter, and change data directly in a table, create a form. When you open a form, Microsoft Access retrieves the data from one or more tables, and displays it on the screen with the layout you choose in the Form Wizard, or a layout that you create from scratch.

To analyze your data or present it in a certain way in print, create a report. For example, you might print one report that groups data and calculates totals, and another report with different data formatted for printing mailing labels.

To make data available on the Internet or an intranet for interactive reporting, data entry, or data analysis, use a data access page. Microsoft Access retrieves the data from one or more tables and displays it on the screen with the layout you choose in the Page Wizard, or a layout that you create from scratch. Users can interact with the data by using features on the data access page.

Tables: What they are and how they work


A table is a collection of data about a specific topic, such as products or suppliers. Using a separate table for each topic means that you store that data only once, which makes your database more efficient and reduces data-entry errors. Tables organize data into columns (called fields) and rows (called records). A common field relates two tables so that Microsoft Access can bring together the data from the two tables for viewing, editing, or printing. In table Design view, you can create an entire table from scratch, or add, delete, or customize the fields in an existing table. In table Datasheet view, you can add, edit, view, or otherwise work with the data in a table.

You can also display records from tables that are related to the current table by displaying subdatasheets within the main datasheet. With some restrictions, you can work with the data in subdatasheets in many of the same ways that you work with data in the main datasheet.

Queries: What they are and how they work


You use queries to view, change, and analyze data in different ways. You can also use them as the source of records for forms, reports, and data access pages. The most common type of query is a select query. A select query retrieves data from one or more tables by using criteria you specify and then displays it in the order you want.

Forms: What they are and how they work


You can use forms for a variety of purposes. Most of the information in a form comes from an underlying record source. Other information in the form is stored in the form's design. You create the link between a form and its record source by using graphical objects called controls. The most common type of control used to display and enter data is a text box.

Modules: What they are and how they work

WHAT IS A MODULE?
A module is a collection of Visual Basic for Applications declarations and procedures that are stored together as a unit. There are two basic types of modules: class modules and standard modules. Each procedure in a module can be a Function procedure or a Sub procedure.

CLASS MODULES
Form and report modules are class modules that are associated with a particular form or report. Form and report modules often contain event procedures that run in response to an event on the form or report. You can use event procedures to control the behavior of your forms and reports, and their response to user actions, such as clicking the mouse on a command button.

When you create the first event procedure for a form or report, Microsoft Access automatically creates an associated form or report module. Procedures in your form and report modules can call procedures you have added to standard modules. In Access 95, class modules existed only in association with a form or report. In Access 97 or later, class modules can also exist independently of a form or report, and this type of class module is listed in Modules under Objects in the Database window. You can use a class module in Modules to create a definition for a custom object.

STANDARD MODULES
Standard modules contain general procedures that aren't associated with any other object and frequently used procedures that can be run from anywhere within your database. You can view the list of standard modules in your database by clicking Modules under Objects in the Database window. Form, report, and standard modules are also listed in the Object Browser.

WHY DEFINE RELATIONSHIPS?


After you've set up different tables for each subject in your Microsoft Access database, you need a way of telling Microsoft Access how to bring that information back together again. The first step in this process is to define relationships between your tables. After you've done that, you can create queries, forms, and reports to display information from several tables at once. For example, a single order form might include information from five tables.

HOW DO RELATIONSHIPS WORK?


In the previous example, the fields in five tables must be coordinated so that they show information about the same order. This coordination is accomplished with relationships between tables. A relationship works by matching data in key fields, usually a field with the same name in both tables. In most cases, these matching fields are the primary key from one table, which provides a unique identifier for each record, and a foreign key in the other table. For example, employees can be associated with the orders they're responsible for by creating a relationship between the Employees table and the Orders table using the EmployeeID fields.

A ONE-TO-MANY RELATIONSHIP
A one-to-many relationship is the most common type of relationship. In a one-to-many relationship, a record in Table A can have many matching records in Table B, but a record in Table B has only one matching record in Table A.

A MANY-TO-MANY RELATIONSHIP
In a many-to-many relationship, a record in Table A can have many matching records in Table B, and a record in Table B can have many matching records in Table A. This type of relationship is only possible by defining a third table (called a junction table) whose primary key consists of two fields: the foreign keys from both Table A and Table B. A many-to-many relationship is really two one-to-many relationships with a third table. For example, the Orders table and the Products table have a many-to-many relationship that's defined by creating two one-to-many relationships to the Order Details table.
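The same Orders/Products/Order Details pattern can be sketched as SQL DDL executed over JDBC. The table and column names below are examples only; the project's own ACCESS_TB table (described later in this document) plays exactly this junction role between USER_TB and FILES_TB:

import java.sql.*;

public class JunctionTableDemo {
    // Illustrative many-to-many setup; names are examples, not the project schema.
    static void createTables(Connection con) throws SQLException {
        try (Statement stmt = con.createStatement()) {
            stmt.execute("CREATE TABLE ORDERS (ORDER_ID NUMBER(10) PRIMARY KEY)");
            stmt.execute("CREATE TABLE PRODUCTS (PRODUCT_ID NUMBER(10) PRIMARY KEY)");
            // The junction table's primary key is the pair of foreign keys,
            // turning one many-to-many into two one-to-many relationships.
            stmt.execute("CREATE TABLE ORDER_DETAILS ("
                + "ORDER_ID NUMBER(10) REFERENCES ORDERS(ORDER_ID), "
                + "PRODUCT_ID NUMBER(10) REFERENCES PRODUCTS(PRODUCT_ID), "
                + "QUANTITY NUMBER(5), "
                + "PRIMARY KEY (ORDER_ID, PRODUCT_ID))");
        }
    }
}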

A ONE-TO-ONE RELATIONSHIP
In a one-to-one relationship, each record in Table A can have only one matching record in Table B, and each record in Table B can have only one matching record in Table A. This type of relationship is not common, because most information related in this way would be in one table. You might use a one-to-one relationship to divide a table with many fields, to isolate part of a table for security reasons, or to store information that applies only to a subset of the main table. For example, you might want to create a table to track employees participating in a fundraising soccer game.

DEFINING RELATIONSHIPS

You define a relationship by adding the tables that you want to relate to the Relationships window, and then dragging the key field from one table and dropping it on the key field in the other table. You can also define relationships by using the keyboard.

The kind of relationship that Microsoft Access creates depends on how the related fields are defined:

A one-to-many relationship is created if only one of the related fields is a primary key or has a unique index.

A one-to-one relationship is created if both of the related fields are primary keys or have unique indexes.

A many-to-many relationship is really two one-to-many relationships with a third table whose primary key consists of two fields: the foreign keys from the two other tables.


Design Phase

The purpose of the design phase is to plan a solution to the problem specified by the requirements document. This phase is the first step in moving from the problem domain to the solution domain. The design of a system is perhaps the most critical factor affecting the quality of the software, and it has a major impact on the later phases, particularly testing and maintenance. The output of this phase is the design document, which is similar to a blueprint or plan for the solution and is used later during implementation, testing, and maintenance.

The design activity is often divided into two separate phases: system design and detailed design. System design, which is sometimes also called top-level design, aims to identify the modules that should be in the system, the specifications of these modules, and how they interact with each other to produce the desired results. At the end of system design, all the major data structures, file formats, and output formats, as well as the major modules in the system and their specifications, are decided.

During detailed design, the internal logic of each of the modules specified in system design is decided. In this phase, further details of the data structures and the algorithmic design of each module are specified. The logic of a module is usually specified in a high-level design description language, which is independent of the target language in which the software will eventually be implemented. In system design the focus is on identifying the modules, whereas during detailed design the focus is on designing the logic for each of the modules. In other words, in system design the attention is on what components are needed, while in detailed design the issue is how the components can be implemented in software.

During the design phase, two separate documents are often produced: one for the system design and one for the detailed design. Together, these documents completely specify the design of the system; that is, they specify the different modules in the system and the internal logic of each module.


A design methodology is a systematic approach to creating a design by applying a set of techniques and guidelines. Most methodologies focus on system design. The two basic principles used in any design methodology are problem partitioning and abstraction. A large system cannot be handled as a whole, so for design it is partitioned into smaller subsystems.

Abstraction is a concept related to problem partitioning. When partitioning is used during design, the design activity focuses on one part of the system at a time. Since the part being designed interacts with other parts of the system, a clear understanding of the interaction is essential for properly designing the part. For this, abstraction is used. An abstraction of a system or a part defines the overall behavior of the system at an abstract level without giving the internal details. While working with one part of a system, a designer needs to understand only the abstractions of the other parts with which that part interacts. The use of abstraction allows the designer to practice the "divide and conquer" technique effectively by focusing on one part at a time, without worrying about the details of other parts.

Like every other phase, the design phase ends with verification of the design. If the design is not specified in some executable language, the verification has to be done by evaluating the design documents. One way of doing this is through reviews. Typically, at least two design reviews are held: one for the system design and one for the detailed design.

Software Development Life Cycle


This document plays a vital role in the software development life cycle (SDLC), as it describes the complete requirements of the system. It is meant for use by the developers and will be the basis for the testing phase. Any changes made to the requirements in the future will have to go through a formal change approval process.

The trends of increasing technical complexity of systems, coupled with the need for repeatable and predictable process methodologies, have driven system developers to establish system development models, or software development life cycle models. Nearly three decades ago, the operations in an organization used to be limited, so it was possible to maintain them using manual procedures. But with the growing operations of organizations, the need to automate the various activities increased, since manual procedures were becoming very difficult, slow, and complicated. Maintaining records on paper for a company with over a thousand employees, for example, is definitely a cumbersome job. So, at that time, more and more companies started going for automation. Since there were a lot of organizations opting for automation, it was felt that some standard and structured procedure or methodology should be introduced in the industry, so that the transition from manual to automated systems became easy. The concept of the system life cycle came into existence then. The life cycle model emphasized the need to follow some structured approach towards building a new or improved system. Many models were suggested. The waterfall model was among the very first models that came into existence. Later on, many other models, such as the prototype model and the rapid application development model, were also introduced.

System development begins with the recognition of user needs. Then there is a preliminary investigation stage, which includes evaluation of the present system, information gathering, a feasibility study, and request approval. The feasibility study covers technical, economic, legal, and operational feasibility. In economic feasibility, a cost-benefit analysis is done. After that come the detailed design, implementation, testing, and maintenance stages. In this section, we'll be learning about the various stages that make up a system's life cycle. In addition, different life cycle models will be discussed, including the Waterfall model, the Prototype model, the Object-Oriented model, the Spiral model, and the Dynamic Systems Development Method (DSDM).

Object Oriented Analysis:


An object-oriented system is composed of objects. The behavior of the system is achieved through collaboration between these objects, and the state of the system is the combined state of all the objects in it. Collaboration between objects involves them sending messages to each other. The exact semantics of message sending between objects varies depending on what kind of system is being modeled. In some systems, "sending a message" is the same as "invoking a method".

Object-oriented analysis aims to model the problem domain, the problem we want to solve by developing an object-oriented (OO) system. The sources of the analysis are written requirement statements and/or written use cases; UML diagrams can be used to illustrate the statements. An analysis model does not take into account implementation constraints, such as concurrency, distribution, persistence, or inheritance, nor how the system will be built. The model of a system can be divided into multiple domains, each of which is analyzed separately and represents a separate business, technological, or conceptual area of interest.

The result of object-oriented analysis is a description of what is to be built, using concepts and relationships between concepts, often expressed as a conceptual model. Any other documentation that is needed to describe what is to be built is also included in the result of the analysis; that can include a detailed user interface mock-up document. The implementation constraints are decided during the object-oriented design (OOD) process.

OBJECT ORIENTED DESIGN


Object-oriented design (OOD) is an activity in which the designers look for logical solutions to a problem, using objects. Object-oriented design takes the conceptual model that is the result of object-oriented analysis and adds the implementation constraints imposed by the environment, the programming language, and the chosen tools, as well as the architectural assumptions chosen as the basis of the design. The concepts in the conceptual model are mapped to concrete classes, to abstract interfaces in APIs, and to the roles that the objects take in various situations. The interfaces and their implementations for stable concepts can be made available as reusable services. Concepts identified as unstable in object-oriented analysis form the basis for policy classes that make decisions and implement environment-specific or situation-specific logic or algorithms. The result of object-oriented design is a detailed description of how the system can be built, using objects.

Object-oriented software engineering (OOSE) is an object modeling language and methodology. OOSE was developed by Ivar Jacobson in 1992 while at Objectory AB. It is the first object-oriented design methodology to employ use cases to drive software design. It also uses other design products similar to those used by OMT.


The tool Objectory was created by the team at Objectory AB to implement the OOSE methodology. After success in the marketplace, other tool vendors also supported OOSE. After Rational bought Objectory AB, the OOSE notation, methodology, and tools were superseded.

As one of the primary sources of the Unified Modeling Language (UML), concepts and notation from OOSE have been incorporated into UML. The methodology part of OOSE has since evolved into the Rational Unified Process (RUP), and the OOSE tools have been replaced by tools supporting UML and RUP. OOSE has thus been largely replaced by the UML notation and the RUP methodology.

Unified Modeling Language


The heart of object-oriented problem solving is the construction of a model. The model abstracts the essential details of the underlying problem from its usually complicated real world. Several modeling tools are wrapped under the heading of the UML, which stands for Unified Modeling Language. This section presents important highlights of the UML. At the center of the UML are its nine kinds of modeling diagrams, which we describe here:

Use case diagrams
Class diagrams
Object diagrams
Sequence diagrams
Collaboration diagrams
Statechart diagrams
Activity diagrams
Component diagrams
Deployment diagrams



Why is UML important?


Let's look at this question from the point of view of the construction trade. Architects design buildings. Builders use the designs to create buildings. The more complicated the building, the more critical the communication between architect and builder. Blueprints are the standard graphical language that both architects and builders must learn as part of their trade. Writing software is not unlike constructing a building. The more complicated the underlying system, the more critical the communication among everyone involved in creating and deploying the software. In the past decade, the UML has emerged as the software blueprint language for analysts, designers, and programmers alike. It is now part of the software trade. The UML gives everyone, from business analyst to designer to programmer, a common vocabulary to talk about software design.

The UML is applicable to object-oriented problem solving. Anyone interested in learning UML must be familiar with the underlying tenet of object-oriented problem solving: it all begins with the construction of a model. A model is an abstraction of the underlying problem. The domain is the actual world from which the problem comes. Models consist of objects that interact by sending each other messages. Think of an object as "alive." Objects have things they know (attributes) and things they can do (behaviors or operations). The values of an object's attributes determine its state. Classes are the "blueprints" for objects. A class wraps attributes (data) and behaviors (methods or functions) into a single distinct entity. Objects are instances of classes.
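To make the class-as-blueprint idea concrete, here is a minimal Java sketch. The Train name is borrowed from the glossary below; everything else is invented for illustration:

// A class wraps attributes (data) and behaviors (methods) into one entity.
public class Train {
    // Attributes: things the object knows; their values determine its state.
    private final String id;
    private int speed;

    public Train(String id) { this.id = id; }

    // Behavior: something the object can do, possibly changing its state.
    public void accelerate(int delta) { speed += delta; }

    public static void main(String[] args) {
        Train t = new Train("T-101");  // an object: one instance of the class
        t.accelerate(40);              // sending a message == invoking a method here
        System.out.println(t.id + " now at " + t.speed + " km/h");
    }
}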

The following business and technical terms are used in the model:

Accounting Periods (Business): A defined period of time for which performance reports may be extracted (normally 4-week periods).

Association (Technical): A relationship between two or more entities. Implies a connection of some type, for example one entity uses the services of another, or one entity is connected to another over a network link.

Class (Technical): A logical entity encapsulating data and behavior. A class is a template for an object: the class is the design, the object the runtime instance.

Component Model (Technical): Provides a detailed view of the various hardware and software components that make up the proposed system. It shows both where these components reside and how they inter-relate with other components. Component requirements detail what responsibilities a component has to supply functionality or behavior within the system.

Customer (Business): A person or a company that requests an entity to transport goods on their behalf.

Deployment Architecture (Technical): A view of the proposed hardware that will make up the new system, together with the physical components that will execute on that hardware. Includes specifications for machine, operating system, network links, backup units, etc.

Deployment Model (Technical): A model of the system as it will be physically deployed.

Extends Relationship (Technical): A relationship between two use cases in which one use case 'extends' the behavior of another. Typically this represents optional behavior in a use case scenario; for example, a user may optionally request a list or report.

Includes Relationship (Technical): A relationship between two use cases in which one use case 'includes' the behavior of the other. This is indicated where there are specific business use cases which are used from many other places; for example, updating a train record may be part of many larger business processes.

Use Case (Technical): A Use Case represents a discrete unit of interaction between a user (human or machine) and the system. A Use Case is a single unit of meaningful work; for example, creating a train, modifying a train, and creating orders are all Use Cases. Each Use Case has a description of the functionality that will be built in the proposed system. A Use Case may 'include' another Use Case's functionality or 'extend' another Use Case with its own behavior. Use Cases are typically related to 'actors': an actor is a human or machine entity that interacts with the system to perform meaningful work.

1.1 ACTORS

Actors are the users of the system being modeled. Each Actor will have a well-defined role, and in the context of that role have useful interactions with the system.

A person may perform the role of more than one Actor, although they will only assume one role during one use case interaction.


An Actor role may be performed by a non-human system, such as another computer program.

Figure 2: Actors (the identified actors include SecuritySpecialist)

Use Cases
Use case diagrams represent the functionality of the system from a user's point of view. Use cases are used during requirements elicitation and analysis to represent the functionality of the system. Use cases focus on the behavior of the system from an external point of view. Actors are external entities that interact with the system. Examples of actors include users like an administrator or a bank customer, or another system like a central database.

Use-Cases:

We have identified two actors in these diagrams: the actual Machine Users and the Unix Developers. The Machine User can begin using the system; this represents whichever method the user will use in order to make initial interaction with the system. For example, they may need to turn the system on via a button, simply turn the key in the ignition, or some other method. They can also view a page, click on a link or back button, scroll up and down, and close the system. The Unix Developer inherits all these use cases, as well as being able to upload an html file and view a list of problems.

(Use case diagram: Node 1 and Node 2, with the use cases Send, Partition, Merge, and Receive.)

Class Diagram:

(Class diagram "System": ClientStart, ClientSocket, ClientFrame, Connect, Login, UploadForm, TimeGraph, and FileSplitter, with their attributes, operations, and associations.)

(Class diagram "Class Model": ClientFile, Users, AdminServerFrame, Server, ServerStart, and DataBase, with their attributes, operations, and associations; ClientFile and Users implement java.io.Serializable.)

We have identified five classes in total: a Lexer class and a Parser class (which comprise the Analyser package), a ParsedTreeStructure class, a Renderer class, and a Frontend class. The Lexer's job is to build a set of tokens from a source file. The Parser uses the tokens built and deciphers their types. It then builds the tokens seen into nodes and passes them to the ParsedTreeStructure class, where a tree structure of nodes is stored. This tree is then used by the Renderer class to form a model of the page, which, in turn, is used by the Frontend in order to display the final rendered page.
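A skeleton of this five-class pipeline might look as follows; all signatures are invented for illustration, and the tokenizing and rendering are deliberately naive:

import java.util.Arrays;
import java.util.List;

class Lexer {
    // Builds a set of tokens from source text (naive whitespace split here).
    List<String> tokenize(String source) { return Arrays.asList(source.trim().split("\\s+")); }
}

class ParsedTreeStructure {
    final List<String> nodes = new java.util.ArrayList<>(); // tree flattened to a list here
    void addNode(String node) { nodes.add(node); }
}

class Parser {
    // Deciphers token types and passes the resulting nodes to the tree structure.
    ParsedTreeStructure parse(List<String> tokens) {
        ParsedTreeStructure tree = new ParsedTreeStructure();
        tokens.forEach(tree::addNode);
        return tree;
    }
}

class Renderer {
    // Forms a model of the page from the parsed tree.
    String render(ParsedTreeStructure tree) { return String.join(" ", tree.nodes); }
}

public class Frontend {
    public static void main(String[] args) {
        List<String> tokens = new Lexer().tokenize("<html> <body> hello </body> </html>");
        String page = new Renderer().render(new Parser().parse(tokens));
        System.out.println("Displaying: " + page); // the final rendered page
    }
}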

Activity Diagrams:
These activity diagrams show how the use cases interact with the system and interface. The User starts by initially interacting with the system. The main page is then rendered by the system and displayed by the interface, which the user can view. From here the user can click on a link, scroll, or close the system. If they choose to click a link, the system renders the new page and it is displayed by the interface, which brings the user back to viewing. If the user chooses to scroll, the system will readjust the page and the interface will display the new snapshot of the page, which also brings the user back to viewing. If the user chooses to close the system, the activity diagram finishes in an exit state.

The Unix Developer can do all of the above, as it inherits all of the Machine User's use cases. On top of this, they can upload an html file, which will then begin to be rendered by the system. If problems are found with the code, the interface will display a list of problems which the developer can view. Otherwise, if no problems are found, the page will be fully rendered by the system and then displayed by the interface, which the developer can view. Once the developer has viewed the page or the problems, they can decide to load a new html file. If they do, they go back to Upload New File; if not, the activity reaches an exit state.


(Activity diagram: the Send, Split, Merge, and Receive activities.)

Sequence Diagram:
A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner. For instance, a UML 1.x diagram might describe the sequence of messages of a (simple) restaurant system: a Patron ordering food and wine, drinking the wine, then eating the food, and finally paying for the food. The dotted lines extending downwards indicate the timeline; time flows from top to bottom. The arrows represent messages (stimuli) from an actor or object to other objects. For example, the Patron sends the message 'pay' to the Cashier. Half arrows indicate asynchronous method calls. The UML 2.0 sequence diagram supports notation similar to the UML 1.x sequence diagram, with added support for modeling variations to the standard flow of events.


(Sequence diagram between Node1 and Node2, via the Send/Receive and Split/Merge components: 1. the file to be sent; 2. splitting the file; 3. merging the file; 4. receiving the file; 5. the file has been received.)
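The split/merge behavior shown in the diagram corresponds to the FileSplitter class in the class diagram. Below is a minimal sketch of splitting a file into fixed-size parts and merging them back; the part naming and sizes are illustrative, not the project's actual on-disk format, and the Java 9+ stream helpers readNBytes and transferTo are used:

import java.io.*;

public class SplitMergeDemo {
    // Split src into sequentially numbered parts of at most partSize bytes each.
    static int split(File src, int partSize) throws IOException {
        int parts = 0;
        try (InputStream in = new FileInputStream(src)) {
            byte[] chunk;
            while ((chunk = in.readNBytes(partSize)).length > 0) {
                try (OutputStream out = new FileOutputStream(src.getName() + ".part" + parts)) {
                    out.write(chunk);
                }
                parts++;
            }
        }
        return parts;
    }

    // Merge the numbered parts back, in order, into dest.
    static void merge(String baseName, int parts, File dest) throws IOException {
        try (OutputStream out = new FileOutputStream(dest)) {
            for (int i = 0; i < parts; i++) {
                try (InputStream in = new FileInputStream(baseName + ".part" + i)) {
                    in.transferTo(out);
                }
            }
        }
    }
}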

Collaboration Diagram:
This diagram shows the collaboration between the 5 classes and demonstrates the order of events that will take place when a page is loaded. The Lexer will build tokens based on the text it has been given, the Parser then deciphers types of tokens that are generated by the Lexer class. These are then parsed to the Tree Structure in the form of nodes, which are then assembled into a tree structure. Next the Renderer class uses these nodes to build the page and finally the Frontend class takes what the Renderer has created and displays it for the user.
(Collaboration diagram between Node1 and Node2, via the Send/Receive and Split/Merge components: 1. the file to be sent; 2. splitting the file; 3. merging the file; 4. receiving the file; 5. the file has been received. The page-rendering collaboration ends with displayPage() on the :Frontend object.)


State Diagram:
State diagrams are used to give an abstract description of the behavior of a system. This behavior is analyzed and represented as a series of events that can occur in one or more possible states. Each diagram usually represents objects of a single class and tracks the different states of its objects through the system. State diagrams can be used to graphically represent finite state machines. This representation was introduced by Taylor Booth in his 1967 book "Sequential Machines and Automata Theory". Another possible representation is the state transition table.

(State diagram: Node1 and Node2 moving through the Send, Split/Merge, Receive, and Database states.)

Component Diagram:
Components are wired together by using an assembly connector to connect the required interface of one component with the provided interface of another component. This illustrates the service consumer service provider relationship between the two components.


An assembly connector is a connector between two components that defines that one component provides the services that another component requires. An assembly connector is defined from a required interface or port to a provided interface or port.

(Component diagram: Node 1 and Node 2 wired through the Send, Split/Merge, and Receive interfaces.)

DEPLOYMENT DIAGRAM:
A deployment diagram shows the configuration of run-time processing nodes and the components that live on them. It is used for modeling the topology of the hardware on which your system executes.
A deployment diagram in the Unified Modeling Language models the physical deployment of artifacts on nodes. To describe a web site, for example, a deployment diagram would show what hardware components ("nodes") exist (e.g., a web server, an application server, and a database server), what software components ("artifacts") run on each node (e.g., web application, database), and how the different pieces are connected.


(Deployment diagram: Node 1 and Node 2 connected through Send/Receive and Split/Merge.)

Schema1 consists of three tables:

USER_TB: USERNAME VARCHAR2(20 BYTE), USERPASSWORD VARCHAR2(20 BYTE), USERCODE NUMBER(5), ACTIVE VARCHAR2(20 BYTE), UPLOAD_FLAG VARCHAR2(20 BYTE), DOWNLOAD_FLAG VARCHAR2(20 BYTE)

FILES_TB: FILECODE NUMBER(5), FIELNAME VARCHAR2(30 BYTE), ULD_USER_CODE NUMBER(10)

ACCESS_TB: FILECODE NUMBER(10), USERCODE NUMBER(10), ACCESS_FLAG NUMBER(10)
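Reconstructed from the diagram, the Schema1 tables could be created as follows. This is a sketch: the connection details are placeholders, and since the diagram marks no keys or constraints, none are asserted here:

import java.sql.*;

public class SchemaSetup {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:oracle:thin:@localhost:1521:XE", "user", "password");
             Statement stmt = con.createStatement()) {
            stmt.execute("CREATE TABLE USER_TB (USERNAME VARCHAR2(20), "
                + "USERPASSWORD VARCHAR2(20), USERCODE NUMBER(5), ACTIVE VARCHAR2(20), "
                + "UPLOAD_FLAG VARCHAR2(20), DOWNLOAD_FLAG VARCHAR2(20))");
            stmt.execute("CREATE TABLE FILES_TB (FILECODE NUMBER(5), "
                + "FIELNAME VARCHAR2(30), ULD_USER_CODE NUMBER(10))");
            stmt.execute("CREATE TABLE ACCESS_TB (FILECODE NUMBER(10), "
                + "USERCODE NUMBER(10), ACCESS_FLAG NUMBER(10))");
        }
    }
}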


ACCESS_TB

Database: Oracle; Stereotype: table; Package: Schema1
Created on 4/1/2011. Last modified on 4/1/2011.

Columns (no column is a primary key, not null, or unique):
FILECODE      NUMBER(10)
USERCODE      NUMBER(10)
ACCESS_FLAG   NUMBER(10)

Relationships: USER_TB to ACCESS_TB.

FILES_TB

Database: Oracle; Stereotype: table; Package: Schema1
Created on 4/1/2011. Last modified on 4/1/2011.

Columns (no column is a primary key, not null, or unique):
FILECODE       NUMBER(5)
FIELNAME       VARCHAR2(30)
ULD_USER_CODE  NUMBER(10)

Relationships: USER_TB to FILES_TB.

USER_TB

Database: Oracle; Stereotype: table; Package: Schema1
Created on 4/1/2011. Last modified on 4/1/2011.

Columns (no column is a primary key, not null, or unique):
USERNAME       VARCHAR2(20)
USERPASSWORD   VARCHAR2(20)
USERCODE       NUMBER(5)
ACTIVE         VARCHAR2(20)
UPLOAD_FLAG    VARCHAR2(20)
DOWNLOAD_FLAG  VARCHAR2(20)

Relationships: USER_TB to ACCESS_TB; USER_TB to FILES_TB.

Development Phase
Once the design is complete, most of the major decisions about the system have been made. The goal of the coding phase is to translate the design of the system into code in a given programming language. For a given design, the aim of this phase is to implement the design in the best possible manner. The coding phase affects both testing and maintenance profoundly: well-written code reduces the testing and maintenance effort. Since the testing and maintenance costs of software are much higher than the coding cost, the goal of coding should be to reduce the testing and maintenance effort. Hence, during coding the focus should be on developing programs that are easy to write, and simplicity and clarity should be strived for.

An important concept that helps the understandability of programs is structured programming. The goal of structured programming is to arrange the control flow of the program so that the program text is organized as a sequence of statements and, during execution, the statements are executed in the order in which they appear. For structured programming, a few single-entry, single-exit constructs should be used. These constructs include selection (if-then-else) and iteration (while-do, repeat-until, etc.). With these constructs it is possible to build a program as a sequence of single-entry, single-exit constructs.

There are many methods available for verifying the code. Some methods are static in nature, that is, they do not involve execution of the code; examples of such methods are data flow analysis, code reading, and code reviews. Testing, a method that involves executing the code, is used very heavily. In the coding phase, the entire system is not tested together; rather, the different modules are tested separately. This testing of modules is called "unit testing". Consequently, this phase is often referred to as "coding and unit testing". The output of this phase is the verified and unit tested code of the different modules.
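As a small illustration of the single-entry, single-exit constructs just mentioned, here is a method built purely from sequence, selection, and iteration (an invented example):

// Structured control flow: a plain sequence of single-entry, single-exit constructs.
public class StructuredDemo {
    static int countPositives(int[] values) {
        int count = 0;
        int i = 0;
        while (i < values.length) {   // iteration (while-do)
            if (values[i] > 0) {      // selection (if-then-else)
                count = count + 1;
            }
            i = i + 1;
        }
        return count;                 // single exit point
    }

    public static void main(String[] args) {
        System.out.println(countPositives(new int[] {3, -1, 7, 0})); // prints 2
    }
}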


System Testing
Testing is the major quality control measure employed during software development. Its basic function is to detect errors in the software. During requirements analysis and design, the output is a document that is usually textual and non-executable. After the coding phase, computer programs are available that can be executed for testing purposes. This implies that testing not only has to uncover errors introduced during coding, but also errors introduced during the previous phases. Thus, the goal of testing is to uncover requirement, design, or coding errors in the programs.

Consequently, different levels of testing are employed. The starting point of testing is unit testing. Here, a module is tested separately, often by the coder himself, simultaneously with the coding of the module. The purpose is to execute the different parts of the module code to detect coding errors. After this, the modules are gradually integrated into subsystems, which are then themselves integrated to eventually form the entire system. During the integration of modules, integration testing is performed. The goal of this testing is to detect design errors while focusing on the interconnections between modules. After the system is put together, system testing is performed. Here the system is tested against the system requirements to see if all the requirements are met and the system performs as specified by the requirements. Finally, acceptance testing is performed to demonstrate the operation of the system to the client, on the client's real-life data.

For testing to be successful, proper selection of test cases is essential. There are two different approaches to selecting test cases: functional testing and structural testing. In functional testing, the software for the module to be tested is treated as a black box, and test cases are decided based on the specifications of the system or module. For this reason, this form of testing is also called "black box testing". The focus is on testing the external behavior of the system. In structural testing, the test cases are decided based on the logic of the module to be tested. Structural testing is sometimes called "glass box testing". Structural testing is used for the lower levels of testing, and functional testing is used for the higher levels.

Testing is an extremely critical and time-consuming activity. It requires proper planning of the overall testing process. Frequently the testing process starts with a test plan, which identifies all the testing-related activities that must be performed, specifies the schedule, allocates the resources, and sets out guidelines for testing. The test plan also specifies the manner in which the modules will be integrated. Then, for the different test units, a test case specification document is produced, which lists all the different test cases, together with the expected outputs, that will be used for testing. During the testing of a unit, the specified test cases are executed and the actual results are compared with the expected outputs. The final outputs of the testing phase are the test report and the error report, or a set of such reports (one for each unit tested). Each test report contains the set of test cases and the result of executing the code with those test cases. The error report describes the errors encountered and the action taken to remove those errors.


Fundamentals of Software Testing

Testing is basically a process to detect errors in a software product. Before going into the details of testing techniques, one should know what errors are. In day-to-day life we say that whenever something goes wrong, there is an error. This definition is quite broad. When we apply this concept to software products, we say that whenever there is a difference between what is expected of the software and what is actually achieved, there is an error. If the output of the system differs from what was required, it is due to an error. This output can be a numeric or alphabetic value, a formatted report, or some specific behavior of the system. In case of an error there may be a change in the format of the output, some unexpected behavior from the system, or some value different from the expected one. These errors can be due to wrong analysis, wrong design, or some fault on the developer's part. All these errors need to be discovered before the system is implemented at the customer's site, because a system that does not perform as desired is of no use: all the effort put into building it goes to waste. So testing is done, and it is as important and crucial as any other stage of system development. For different types of errors there are different testing techniques. In the sections that follow, we'll try to understand those techniques.

OBJECTIVES OF TESTING
First of all, the objective of testing should be clear. We can define testing as a process of executing a program with the aim of finding errors. To perform testing, test cases are designed. A test case is a particular, made-up, artificial situation to which a program is exposed so as to find errors. So a good test case is one that finds undiscovered errors. If testing is done properly, it uncovers errors, and after fixing those errors we have software that is being developed according to specifications.

TEST INFORMATION FLOW


Testing is a complete process. For testing we need two types of inputs. The first is the software configuration; it includes the software requirement specification, design specifications, and the source code of the program. The second is the test configuration, which is basically the test plan and procedures.


The software configuration is required so that the testers know what is to be expected and tested, whereas the test configuration is the testing plan, that is, the way the testing will be conducted on the system. It specifies the test cases and their expected values, and also specifies whether any tools for testing are to be used. Test cases are required to know what specific situations need to be tested. When tests are evaluated, the test results are compared with the expected results, and if there is some error, debugging is done to correct it. Testing is a way to learn about the quality and reliability of the software. The error rate, that is, the occurrence of errors, is evaluated, and this data can be used to predict the occurrence of errors in the future.

Fig 9.1 Testing Process

TEST CASE DESIGN

We now know that test cases are an integral part of testing, so we need to know more about test cases and how they are designed. The most desired and obvious expectation from a test case is that it should be able to find the most errors with the least amount of time and effort. A software product can be tested in two ways. In the first approach, only the overall functioning of the product is tested: inputs are given and outputs are checked. This approach is called black box testing. It does not care about the internal functioning of the product. The other approach is called white box testing. Here the internal functioning of the product is tested, and each procedure is tested for its accuracy. It is more intensive than black box testing. But for the overall product, both these techniques are crucial: there should be a sufficient number of tests in both categories to test the whole product.


White Box Testing


White box testing focuses on the internal functioning of the product. For this, the individual procedures are tested. White box testing tests the following:

Loops of the procedure
Decision points
Execution paths

For performing white box testing, the basis path testing technique is used. We will illustrate how to use this technique in the following section.

BASIS PATH TESTING


Basis path testing is a white box testing technique proposed by Tom McCabe. These tests guarantee that every statement in the program is executed at least once during testing. The basis set is the set of independent execution paths of a procedure.

FLOW GRAPH NOTATION


Before the basis path procedure is discussed, it is important to know the simple notation used for the representation of control flow. This notation is known as a flow graph. A flow graph depicts control flow using the following constructs, which combine to produce the flow graph for a particular procedure.

(Flow graph constructs: sequence, if, while, until, and case.)

BASIC TERMINOLOGY ASSOCIATED WITH THE FLOW GRAPH

Node: Each flow graph node represents one or more procedural statements. Each node that contains a condition is called a predicate node.

Edge: An edge is the connection between two nodes. The edges between nodes represent the flow of control. An edge must terminate at a node, even if the node does not represent any useful procedural statements.

Region: A region in a flow graph is an area bounded by edges and nodes.

Cyclomatic complexity: An independent path is an execution flow from the start point to the end point. Since a procedure contains control statements, there are various execution paths, depending on the decisions taken at the control statements. The cyclomatic complexity gives the number of such independent execution paths. It thus provides an upper bound on the number of tests that must be produced, because for each independent path a test should be conducted to see whether it actually reaches the end point of the procedure.
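For a concrete, invented example: the method below has two predicate nodes (the loop condition and the if), so its cyclomatic complexity is V(G) = P + 1 = 3, and three independent paths form a basis set for testing:

// Two predicate nodes, so V(G) = P + 1 = 3. A basis set of independent paths:
//   1) n = 0:                the loop body is never entered
//   2) loop entered, i even: the if branch is taken
//   3) loop entered, i odd:  the if branch is not taken
public class ComplexityDemo {
    static int sumOfEvens(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {  // predicate node 1
            if (i % 2 == 0) {           // predicate node 2
                sum += i;
            }
        }
        return sum;
    }
}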


Black Box Testing


Black box testing tests the overall functional requirements of the product. Inputs are supplied to the product and the outputs are verified. If the outputs obtained are the same as the expected ones, then the product meets its functional requirements. In this approach the internal procedures are not considered; it is conducted at the later stages of testing. Black box testing uncovers the following types of errors:

1. Incorrect or missing functions
2. Interface errors
3. External database access errors
4. Performance errors
5. Initialization and termination errors

The following techniques are employed during black box testing

EQUIVALENCE PARTITIONING
In equivalence partitioning, a test case is designed so as to uncover a group or class of errors. This limits the number of test cases that might otherwise need to be developed. Here the input domain is divided into classes or groups of data. These classes are known as equivalence classes, and the process of making equivalence classes is called equivalence partitioning. Equivalence classes represent a set of valid or invalid states for an input condition. An input condition can be a range, a specific value, a set of values, or a boolean value; depending on the type of input, the equivalence classes are defined. For defining equivalence classes, the following guidelines should be used:

1. If an input condition specifies a range, one valid and two invalid equivalence classes are defined.
2. If an input condition requires a specific value, then one valid and two invalid equivalence classes are defined.
3. If an input condition specifies a member of a set, then one valid and one invalid equivalence class are defined.
4. If an input condition is boolean, then one valid and one invalid equivalence class are defined.

For example, suppose the range is 0 < count < 1000. Then we form one valid equivalence class with that range of values and two invalid equivalence classes: one with values at or below the lower bound of the range (count <= 0) and the other with values at or above the upper bound (count >= 1000).
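For the 0 < count < 1000 range above, one representative test per equivalence class is enough. Here is a sketch with a hypothetical accept method as the unit under test (run with java -ea so the assertions fire):

public class EquivalenceDemo {
    // Hypothetical unit under test: accepts only counts in the valid range.
    static boolean accept(int count) { return count > 0 && count < 1000; }

    public static void main(String[] args) {
        assert  accept(500);   // valid class:   0 < count < 1000
        assert !accept(-5);    // invalid class: count <= 0
        assert !accept(1500);  // invalid class: count >= 1000
        System.out.println("one test per equivalence class passed");
    }
}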

BOUNDARY VALUE ANALYSIS


It has been observed that programs that work correctly for a set of values in an equivalence class fail on some special values. These values often lie on the boundary of the equivalence class. Test cases that have values on the boundaries of equivalence classes are therefore likely to be error-producing, so selecting such test cases is the aim of boundary value analysis. In boundary value analysis, we choose inputs for a test case from an equivalence class such that the input lies at the edge of the equivalence class. Boundary values for each equivalence class, including the equivalence classes of the output, should be covered. Boundary value test cases are also called extreme cases. Hence, a boundary value test case is a set of input data that lies on the edge or boundary of a class of input data, or that generates output that lies at the boundary of a class of output data.

In the case of ranges, for boundary value analysis it is useful to select the boundary elements of the range and an invalid value just beyond each of its two ends (for the two invalid equivalence classes). For example, if the range is 0.0 <= x <= 1.0, then the test cases are 0.0 and 1.0 for valid inputs, and -0.1 and 1.1 for invalid inputs.

For boundary value analysis, the following guidelines should be used:

For input ranges bounded by a and b, test cases should include the values a and b and values just above and just below a and b, respectively.

If an input condition specifies a number of values, test cases should be developed to exercise the minimum and maximum numbers, and values just above and below these limits.

If internal data structures have prescribed boundaries, a test case should be designed to exercise the data structure at its boundary.
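Continuing the 0.0 <= x <= 1.0 example in code, with a hypothetical inRange method as the unit under test, the boundary-value cases are the two boundaries themselves plus a value just beyond each end:

public class BoundaryDemo {
    // Hypothetical unit under test.
    static boolean inRange(double x) { return x >= 0.0 && x <= 1.0; }

    public static void main(String[] args) {
        double[] valid   = {0.0, 1.0};   // the boundary elements of the range
        double[] invalid = {-0.1, 1.1};  // just beyond each end of the range
        for (double x : valid)   System.out.println(x + " -> " + inRange(x)); // expect true
        for (double x : invalid) System.out.println(x + " -> " + inRange(x)); // expect false
    }
}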

Y9MC93007

Managing Multidimensional Historical Aggregate Data in Unstructured P2P Networks

Now we know how the testing of a software product is done, but testing software is not an easy task, since the size of the software developed for various systems is often very large. Testing needs a specific, systematic procedure that guides the tester in performing different tests at the correct time. This systematic procedure is a testing strategy, which should be followed in order to test the system thoroughly. Performing testing without a testing strategy would be very cumbersome and difficult. Testing strategies are discussed in the following pages of this chapter.

Strategic Approach towards Software Testing


Developers are under great pressure to deliver more complex software on increasingly aggressive schedules and with limited resources. Testers are expected to verify the quality of such software in less time and with even fewer resources. In such an environment, solid, repeatable, and practical testing methods and automation are a must. In a software development life cycle, bug can be injected at any stage. Earlier the bugs are identified, more cost saving it has. There are different techniques for detecting and eliminating bugs that originate in respective phase. Software testing strategy integrates software test case design techniques into a wellplanned series of steps that result in the successful construction of software. Any test strategy incorporate test planning, test case design, test execution, and the resultant data collection and evaluation. Testing is a set of activities. These activities so planned and conducted systematically that it leaves no scope for rework or bugs. Various software-testing strategies have been proposed so far. All provide a template for testing. Things that are common and important in these strategies are Testing begins at the module level and works outward : tests which are carried out, are done at the module level where major functionality is tested and then it works toward the integration of the entire system. Different testing techniques are appropriate at different points in time: Under different circumstances, different testing methodologies are to be used which will be the decisive factor for software robustness and scalability. Circumstance essentially means the level at which the testing is being done (Unit testing, system testing, Integration testing etc.) and the purpose of testing. The developer of the software conducts testing and if the project is big then there is a testing team: All programmers should test and verify that their results are SIDDHARTH INSTITUTE OF PG STUDIES Page | 80


The developer of the software conducts testing, and if the project is big then there is a testing team: all programmers should test and verify that their results conform to the specification given to them while coding. Where programs are big enough, or a collective effort is involved in coding, the responsibility for testing lies with the team as a whole.

Debugging and testing are altogether different processes: testing aims to find errors, whereas debugging is the process of fixing those errors. Nevertheless, debugging should be incorporated into the testing strategy. A software test strategy must include low-level tests that exercise the source code and high-level tests that validate system functions against customer requirements.

Unit Testing
We know that the smallest unit of software design is the module. Unit testing is performed to check the functionality of these units, and it is done before the modules are integrated together to build the overall system. Since the modules are small in size, individual programmers can do unit testing on their respective modules, so unit testing is basically white-box oriented: procedural design descriptions are used, and control paths are tested to uncover errors within individual modules. Unit testing can be done for more than one module at a time. The following tests are performed during unit testing:

Module interface test: it is checked whether information properly flows into the program unit and properly comes out of it.
Local data structures: these are tested to see whether the local data within the unit (module) are stored properly.
Boundary conditions: it is observed that software often fails at boundary conditions, so these are tested to ensure that the module works properly at its boundary conditions.
Independent paths: all independent paths are tested to see that they properly execute their task and terminate at the end of the program.
Error-handling paths: these are tested to check whether errors are handled properly.
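As a hedged sketch of what such unit tests can look like in practice, the following JUnit 5 example exercises the module interface, a boundary condition, and an error-handling path of a hypothetical Partitioner module (the class and its behavior are assumptions made for illustration, not part of the project code):

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertArrayEquals;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

// Hypothetical module under test: splits a data set into a number of partitions.
class Partitioner {
    static int[] partitionSizes(int totalRecords, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        int[] sizes = new int[partitions];
        for (int i = 0; i < partitions; i++) {
            // Distribute records as evenly as possible across the partitions.
            sizes[i] = totalRecords / partitions + (i < totalRecords % partitions ? 1 : 0);
        }
        return sizes;
    }
}

class PartitionerUnitTest {
    @Test
    void moduleInterfaceReturnsOneSizePerPartition() {
        // Module interface test: data flows into and out of the unit as expected.
        assertEquals(3, Partitioner.partitionSizes(10, 3).length);
    }

    @Test
    void boundaryConditionOfASinglePartition() {
        // Boundary condition: the smallest valid number of partitions.
        assertArrayEquals(new int[] {10}, Partitioner.partitionSizes(10, 1));
    }

    @Test
    void errorHandlingPathRejectsInvalidInput() {
        // Error-handling path: invalid input must raise a clear error.
        assertThrows(IllegalArgumentException.class,
                () -> Partitioner.partitionSizes(10, 0));
    }
}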

See Fig. 9.4 for an overview of unit testing.


Fig. 9.4 Unit Testing

Unit Testing Procedure

Fig. 9.5 Unit Test Procedure

Unit testing begins after the source code is developed, reviewed, and verified for correct syntax; the design documents help in making test cases. Though each module performs a specific task, it is not a standalone program: it may need data from some other module, or it may need to send data or control information to another module. Since in unit testing each module is tested individually, the need to obtain data from, or pass data to, other modules is met by the use of stubs and drivers, which simulate those modules. A driver is basically a program that accepts test case data, passes that data to the module being tested, and prints the relevant results. Similarly, stubs are programs used to replace modules that are subordinate to the module being tested; a stub does minimal data manipulation, prints verification of entry, and returns. Fig. 9.5 illustrates this unit test procedure. Drivers and stubs represent overhead, because they are developed but are not part of the product; this overhead can be reduced if they are kept very simple. Once the individual modules are tested, they are integrated to form bigger program structures, so the next stage of testing deals with the errors that occur while integrating modules. That is why the next form of testing is called integration testing, which is discussed next.
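The following minimal sketch shows a driver and a stub in Java; all class names are hypothetical, and the example assumes a module that averages values obtained from a subordinate data extraction module that is not yet available:

// Interface to the subordinate module that is not yet implemented.
interface DataSource {
    double[] fetchRange(int from, int to);
}

// The module being unit tested.
class AverageModule {
    private final DataSource source;
    AverageModule(DataSource source) { this.source = source; }

    double averageOver(int from, int to) {
        double[] values = source.fetchRange(from, to);
        double sum = 0.0;
        for (double v : values) sum += v;
        return values.length == 0 ? 0.0 : sum / values.length;
    }
}

// Stub: replaces the subordinate module; does minimal data manipulation,
// prints verification of entry, and returns canned data.
class DataSourceStub implements DataSource {
    public double[] fetchRange(int from, int to) {
        System.out.println("stub entered: fetchRange(" + from + ", " + to + ")");
        return new double[] {2.0, 4.0, 6.0};
    }
}

// Driver: accepts test case data, passes it to the module under test,
// and prints the relevant results.
public class AverageModuleDriver {
    public static void main(String[] args) {
        AverageModule module = new AverageModule(new DataSourceStub());
        System.out.println("expected 4.0, got " + module.averageOver(0, 2));
    }
}

Keeping the stub and the driver this simple is exactly how the overhead they represent can be contained.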


Integration Testing
Unit testing ensures that all modules have been tested and that each works properly individually. However, unit testing does not guarantee that these modules will work correctly once they are integrated together as a whole system; it is observed that many errors crop up when the modules are joined together. Integration testing uncovers the errors that arise when modules are integrated to build the overall system. The following types of errors may arise:

Data can be lost across an interface; that is, the data coming out of one module does not go into the intended module.
Sub-functions, when combined, may not produce the desired major function.
Individually acceptable imprecision may be magnified to unacceptable levels. For example, suppose one module works with an error precision of ±10 units and another module uses the same error precision. When the two modules are combined and their outputs need to be multiplied, the error precision of the result can be magnified to the order of ±100 units, which may not be acceptable to the system.
Global data structures can present problems. For example, suppose a system has a global memory and the combined modules all access it; because so many functions are accessing the same memory, low-memory problems can arise.

There are two approaches to integration testing: one is top-down integration and the other is bottom-up integration. We discuss these approaches next.

1. Top-Down Integration in Integration Testing


Top-down integration is an incremental approach to the construction of the program structure. In top-down integration, the control hierarchy is identified first, that is, which module is driving or controlling which other module. The main control module, the modules subordinate to it, and ultimately those subordinate to them are integrated into some bigger structure. For the integration, a depth-first or a breadth-first approach is used.


Fig. 9.6 Top-down integration

In the depth-first approach, all modules on a control path are integrated first. In Fig. 9.6, the sequence of integration would be (M1, M2, M3), M4, M5, M6, M7, and M8. In the breadth-first approach, all modules directly subordinate at each level are integrated together; using breadth-first for Fig. 9.6, the sequence of integration would be (M1, M2, M8), (M3, M6), M4, M7, and M5. Another approach to integration is bottom-up integration, which we discuss next.

2. Bottom-Up Integration in Integration Testing


Bottom-up integration testing starts at the atomic module level; atomic modules are those at the lowest levels of the program structure. Since modules are integrated from the bottom up, the processing required for modules subordinate to a given level is always available, so stubs are not required in this approach. Bottom-up integration is implemented with the following steps:

1. Low-level modules are combined into clusters that perform a specific software sub-function. These clusters are sometimes called builds.
2. A driver (a control program for testing) is written to coordinate test case input and output.
3. The build is tested.


4. Drivers are removed and clusters are combined moving upward in the program structure.
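As a minimal sketch of step 2, the driver below coordinates test case input and output for one hypothetical cluster; the Compressor and Indexer classes are illustrative placeholders rather than the project's actual modules:

// Hypothetical low-level modules forming one cluster ("build") that together
// implement a "publish" sub-function.
class Compressor {
    static byte[] compress(String data) {
        return data.getBytes(); // placeholder for a real compression step
    }
}

class Indexer {
    static int index(byte[] block) {
        return block.length % 8; // placeholder: assign the block to a bucket
    }
}

// Build driver: feeds test cases through the cluster and prints the results.
public class PublishBuildDriver {
    public static void main(String[] args) {
        String[] testCases = {"", "aggregate", "historical data"};
        for (String input : testCases) {
            int bucket = Indexer.index(Compressor.compress(input));
            System.out.println("input=\"" + input + "\" -> bucket " + bucket);
        }
    }
}

Once the build passes, the driver is discarded and the cluster is combined with the modules above it (step 4).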

Fig. 9.7 (a) Program modules (b) Bottom-up integration applied to the program modules in (a)

Fig. 9.7 shows how bottom-up integration is done. Whenever a new module is added as part of integration testing, the program structure changes.


There may be new data flow paths, new I/O, or new control logic. These changes may cause problems with functions in already-tested modules that were working fine previously.

To detect these errors, regression testing is done. Regression testing is the re-execution of some subset of tests that have already been conducted, to ensure that changes have not propagated unintended side effects in the program. In other words, regression testing helps ensure that changes (due to testing or for any other reason) do not introduce undesirable behavior or additional errors. As integration testing proceeds, the number of regression tests can grow quite large, and it is impractical and inefficient to re-execute every test for every program function once a change has occurred. Therefore, the regression test suite should be designed to include only those tests that address one or more classes of errors in each of the major program functions.
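One common way of keeping such a subset manageable, sketched here under the assumption that JUnit 5 is used, is to mark the representative tests with a tag so that the build tool can be configured to re-execute only the tagged regression suite after each change:

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertTrue;

class QueryEvaluationRegressionTest {

    // Tests tagged "regression" form the subset re-executed after each change;
    // e.g., Maven Surefire's "groups" option can select this tag.
    @Tag("regression")
    @Test
    void rangeQueryStillReturnsAnAnswerAfterIntegration() {
        // One representative test for a class of errors in a major function.
        assertTrue(evaluateRangeQuery(0, 10) >= 0.0);
    }

    // Hypothetical stand-in for the system's range query evaluation.
    private double evaluateRangeQuery(int from, int to) {
        return (to - from) * 1.0;
    }
}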

Validation Testing
After integration testing we have an assembled package that is free from module and interfacing errors. At this stage, a final series of software tests, called validation testing, begins. Validation succeeds when the software functions in a manner that can be reasonably expected by the customer. The major question here is: what are the expectations of the customers? These expectations are defined in the software requirements specification identified during the analysis of the system.

The specification contains a section titled Validation Criteria; the information contained in that section forms the basis for validation testing. Software validation is achieved through a series of black-box tests that demonstrate conformity with the requirements. A test plan describes the classes of tests to be conducted, and a test procedure defines the specific test cases that will be used in an attempt to uncover errors in conformity with the requirements. After each validation test case has been conducted, one of two possible conditions exists: either the function or performance characteristics conform to the specification and are accepted, or a deviation from the specification is uncovered and a deficiency list is created. A deviation or error discovered at this stage of a project can rarely be corrected prior to the scheduled completion; it is often necessary to negotiate with the customer to establish a method for resolving the deficiencies.

ALPHA AND BETA TESTING

For a software developer, it is difficult to foresee how the customer will really use a program: instructions for use may be misinterpreted, strange combinations of data may be regularly used, and output that seemed clear to the tester may be unintelligible to a user in the field. When custom software is built for one customer, a series of acceptance tests is conducted to enable the customer to validate all requirements. The acceptance test is conducted by the customer rather than by the developer, and it can range from an informal test drive to a planned and systematically executed series of tests. In fact, acceptance testing can be conducted over a period of weeks or months, thereby uncovering cumulative errors that might degrade the system over time.

If software is developed as a product to be used by many customers, it is impractical to perform formal acceptance tests with each one. Most software product builders use a process called alpha and beta testing to uncover errors that only the end user seems able to find.

The customer conducts alpha testing at the developer's site. The software is used in a natural setting, with the developer present, recording errors and usage problems; alpha tests are thus conducted in a controlled environment. The beta test, in contrast, is conducted at one or more customer sites by the end user(s) of the software, and the developer is not present. The beta test is therefore a live application of the software in an environment that cannot be controlled by the developer. The customer records all problems encountered during beta testing and reports them to the developer at regular intervals. Based on the problems reported during beta testing, the software developer makes modifications and then prepares the software product for release to the entire customer base.


Functional Requirements
Functional requirements specify which outputs should be produced from the given inputs. They describe the relationship between the input and output of the system. For each functional requirement, a detailed description of all data inputs and their sources, as well as the range of valid inputs, must be specified.

All the operations to be performed on the input data to obtain the output should be specified.

1. Server Management
2. Peer Management
3. Data Publishing
4. Data Extraction

NON-FUNCTIONAL REQUIREMENTS


The project's non-functional requirements include the following:
Updating work status
Problem resolution
Error occurrence in the system
Customer requests


Server:


Client Page:


Partition Value:


Send File:


Partition Files:

Partition Metafile:


CONCLUSIONS AND FUTURE WORK


We proposed a framework for sharing and performing analytical queries on historical multidimensional data in unstructured peer-to-peer networks. In our approach, participants make their resources (and possibly their data, in a suitable compressed format) available to the other peers in exchange for the possibility of accessing and posing range queries against the data published by others. Our solution is based on suitable data summarization and indexing techniques, and on mechanisms for data distribution and replication that take into account both the need to preserve the autonomy of peers and the interest exhibited by users in the data, so as to support efficient query evaluation. The experimental results showed the effectiveness of our approach in providing fast and accurate query answers, and in ensuring the robustness that is mandatory in peer-to-peer settings.

Future work will be devoted to considering data updates. On the one hand, updates along the temporal dimension can be managed relatively easily: new data could, in fact, be treated as a new data set to be partitioned, compressed, indexed, and distributed independently of the old data; queries over a time range that involves synopses referring to different time intervals could simply be split into subqueries to be processed separately. On the other hand, removing the assumption of consolidated/historical data makes the problem much more complex, as updates can affect the homogeneity of the data, making the results of both the partitioning and the compression steps obsolete. The crucial objective is therefore to avoid computing the partitioning and constructing the subsynopses from scratch, by detecting the regions of data whose features are not significantly affected by the update. This would limit the computational load for computing the up-to-date synopsis, as well as the network traffic for replacing old data. The absence of centralized coordination in our setting poses further challenges, as it makes it necessary to devise a nontrivial mechanism for distinguishing between old and new data during query evaluation.

