An Entity Relationship model (ER model) is an abstract way to describe a database. It is a visual representation of different data using conventions that describe how these data are related to each other. There are three basic elements in ER models:
Entities are the things about which we seek information. Attributes are the data we collect about the entities. Relationships provide the structure needed to draw information from multiple entities.
Entities and Attributes
Entity type: a set of similar objects, or a well-defined category of entities. A rectangle represents an entity set. Ex: students, courses. We often just say entity and mean entity type.
Attribute: describes one aspect of an entity type; usually [and best when] single-valued and indivisible (atomic).
Types of Attribute:
Simple and Composite Attributes
A simple attribute consists of a single atomic value and cannot be subdivided. Examples: salary, age, sex.
A composite attribute is an attribute that can be further subdivided. For example, the attribute ADDRESS can be subdivided into street, city, state, and zip code; Name can be subdivided into First Name, Middle Name, Last Name.
Single-Valued and Multi-Valued Attributes
A single-valued attribute can have only one value per entity. For example, a person has only one date of birth and one age. A single-valued attribute can still be simple or composite: date of birth is a composite attribute and age is a simple attribute, but both are single-valued. Examples: age, city, customer id.
A multi-valued attribute can have multiple values. For instance, a person may have multiple phone numbers, email ids, or several college degrees. Multi-valued attributes are shown by a double line connecting to the entity in the ER diagram.
Stored and Derived Attributes
A stored attribute supplies the value from which a derived attribute is computed. For example, Date of Birth is a stored attribute; the value of the attribute AGE can be derived by subtracting the Date of Birth (DOB) from the current date.
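The stored/derived distinction can be shown with a short sketch: here AGE is computed on demand from the stored Date of Birth rather than stored itself (the dates used are made-up examples).

```python
from datetime import date

def derive_age(dob: date, today: date) -> int:
    """Derive the AGE attribute from the stored Date of Birth attribute."""
    years = today.year - dob.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years
```

Only DOB needs to live in the database; AGE stays correct without ever being updated.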
A derived attribute is an attribute whose value is derived from a stored attribute. Example: age, whose value is derived from the stored attribute Date of Birth.
Keys
Super key: an attribute or set of attributes that uniquely identifies an entity; there can be many of these.
Composite key: a key requiring more than one attribute.
Candidate key: a superkey such that no proper subset of its attributes is also a superkey (a minimal superkey, with no unnecessary attributes).
Primary key: the candidate key chosen for identifying entities and accessing records. Unless otherwise noted, "key" means primary key.
Alternate key: a candidate key not used as the primary key.
Secondary key: an attribute or set of attributes commonly used for accessing records, but not necessarily unique.
Foreign key: an attribute that is the primary key of another table, used to establish a relationship with that table, where it appears as an attribute also.
Graphical Representation in E-R diagram
Rectangle: entity
Ellipse: attribute (underlined attributes are [part of] the primary key)
Double ellipse: multi-valued attribute
Dashed ellipse: derived attribute, e.g. age is derivable from birthdate and the current date
Relationships
Relationship: connects two or more entities into an association/relationship
Student (entity type) is related to Department (entity type) by MajorsIn (relationship type).
Relationship types may also have attributes in the E-R model. When they are mapped to the relational model, the attributes become part of the relation. A relationship is represented by a diamond on the E-R diagram.
Cardinality of Relationships
Cardinality is the number of entity instances to which another entity set can map under the relationship. This does not reflect a requirement that an entity has to participate in a relationship; participation is a separate concept.
One-to-one: X-Y is 1:1 when each entity in X is associated with at most one entity in Y, and each entity in Y is associated with at most one entity in X.
One-to-many: X-Y is 1:M when each entity in X can be associated with many entities in Y, but each entity in Y is associated with at most one entity in X.
Many-to-many: X-Y is M:M when each entity in X can be associated with many entities in Y, and each entity in Y can be associated with many entities in X ("many" means one or more, and sometimes zero).
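To make the one-to-many case concrete, here is a minimal sketch (with made-up sample data) of how Student, Department, and the MajorsIn relationship from the earlier example map to a relational layout: the "many" side carries a foreign key referencing the "one" side.

```python
# Hypothetical sample data. Department is the "one" side, Student the
# "many" side, so each student row carries the department's primary key
# as a foreign key implementing the MajorsIn relationship.
departments = {10: {"name": "Mathematics"}, 20: {"name": "Computing"}}
students = [
    {"id": 1, "name": "Ann", "majors_in": 10},
    {"id": 2, "name": "Bob", "majors_in": 10},
    {"id": 3, "name": "Cy",  "majors_in": 20},
]

def students_of(dept_id):
    """Join across the relationship: all students majoring in a department."""
    return [s["name"] for s in students if s["majors_in"] == dept_id]
```

Each department maps to many students, while each student references at most one department, which is exactly the 1:M constraint described above.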
Total participation: Every member of the entity set must participate in the relationship.
Represented by a double line from the entity rectangle to the relationship diamond. E.g., a Class entity cannot exist unless related to a Faculty member entity in this example (not necessarily at Juniata). You can set this double line in Dia. In the relational model we will use the REFERENCES clause.
Key constraint: If every entity participates in exactly one relationship, both a total participation constraint and a key constraint hold. E.g., a class is taught by only one faculty member.
Partial participation: Not every entity instance must participate. Represented by a single line from the entity rectangle to the relationship diamond. E.g., a Textbook entity can exist without being related to a Class, or vice versa.
Strong and Weak Entities
Strong Entity vs. Weak Entity
An entity set that does not have sufficient attributes to form a primary key is termed a weak entity set; an entity set that has a primary key is termed a strong entity set. A weak entity is existence dependent: its existence depends on the existence of an identifying entity set. The discriminator (or partial key) distinguishes the entities of a weak entity set from one another. The primary key of a weak entity set is formed by the primary key of the identifying entity set together with the discriminator of the weak entity set. A weak entity is indicated by a double rectangle in the ER diagram, and we underline the discriminator of a weak entity set with a dashed line.
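As a small sketch of the composite-key rule, using a hypothetical Course/Section example not taken from the text: the primary key of the weak entity combines the identifying entity's primary key with the discriminator.

```python
# Hypothetical example: Section is a weak entity identified by the
# primary key of its identifying strong entity (Course) plus its own
# discriminator (partial key), section_no.
sections = [
    {"course_id": "CS101", "section_no": 1, "room": "A1"},
    {"course_id": "CS101", "section_no": 2, "room": "B2"},
]

def section_pk(section):
    """Primary key of the weak entity = identifying PK + discriminator."""
    return (section["course_id"], section["section_no"])
```

Neither field alone identifies a section, but the combination does, which is why the discriminator on its own is only a partial key.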
2. Make an ER Diagram of a Library Management System (all three levels).
A Library Management System (LMS) provides a simple GUI (graphical user interface) for the library staff to manage the functions of the library effectively. Usually, when a book is returned or issued, it is noted down in a register, after which data entry is done to update the status of the books. This process takes time, and prompt updating cannot be guaranteed; such anomalies in the update process can cause loss of books. So a more user-friendly interface that can update the database instantly is in great demand in libraries. E-R Diagram for LMS:
3. Explain a decision table and its parts. Make a decision table for a report card.
A decision table is an excellent tool to use in both testing and requirements management. Essentially it is a structured exercise to formulate requirements when dealing with complex business rules. Decision tables are used to model complicated logic. They make it easy to see that all possible combinations of conditions have been considered, and when conditions are missed, it is easy to see this. A decision table is a good way to deal with combinations of things (e.g. inputs). This technique is sometimes also referred to as a cause-effect table, because there is an associated logic diagramming technique called cause-effect graphing that was sometimes used to help derive the decision table (Myers describes this as a combinatorial logic network). However, most people find it more useful just to use the table itself.
Decision tables provide a systematic way of stating complex business rules, which is useful for developers as well as for testers. Decision tables can be used in test design whether or not they are used in specifications, as they help testers explore the effects of combinations of different inputs and other software states that must correctly implement business rules.
It helps the developers to do a better job, and can also lead to better relationships with them. Testing combinations can be a challenge, as the number of combinations can often be huge. Testing all combinations may be impractical, if not impossible. We have to be satisfied with testing just a small subset of combinations, but the choice of which combinations to test and which to leave out is also important. If you do not have a systematic way of selecting combinations, an arbitrary subset will be used, and this may well result in an ineffective test effort.
The four quadrants:
Conditions
Condition alternatives
Actions
Action entries
Each condition corresponds to a variable, relation or predicate whose possible values are listed among the condition alternatives. Each action is a procedure or operation to perform, and the entries specify whether (or in what order) the action is to be performed for the set of condition alternatives the entry corresponds to. Many decision tables include in their condition alternatives the don't-care symbol, a hyphen. Using don't-cares can simplify decision tables, especially when a given condition has little influence on the actions to be performed. In some cases, entire conditions thought to be important initially are found to be irrelevant when none of their alternatives influence which actions are performed.
Aside from the basic four-quadrant structure, decision tables vary widely in the way the condition alternatives and action entries are represented. Some decision tables use simple true/false values to represent the alternatives to a condition (akin to if-then-else), other tables may use numbered alternatives (akin to switch-case), and some tables even use fuzzy logic or probabilistic representations for condition alternatives. In a similar way, action entries can simply represent whether an action is to be performed (check the actions to perform) or, in more advanced decision tables, the sequencing of actions to perform (number the actions to perform).
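Since the question also asks for a report-card decision table, here is a minimal executable sketch. The grade boundaries (75/60/40) and grade names are assumed for illustration only; a real report card would substitute the institution's own rules. Each rule pairs condition alternatives with an action, and None plays the role of the don't-care entry described above.

```python
# Rules of a hypothetical report-card decision table. Each rule pairs
# condition alternatives with an action (the grade); None is the
# "don't care" entry for a condition the rule ignores.
RULES = [
    # (avg >= 75, avg >= 60, avg >= 40) -> action
    ((True,  None,  None),  "Distinction"),
    ((False, True,  None),  "First Division"),
    ((False, False, True),  "Pass"),
    ((False, False, False), "Fail"),
]

def grade(average):
    alternatives = (average >= 75, average >= 60, average >= 40)
    for conditions, action in RULES:
        # A rule fires when every non-don't-care entry matches.
        if all(c is None or c == a for c, a in zip(conditions, alternatives)):
            return action
```

Because the rules enumerate every combination of the three conditions, it is easy to check that no case has been missed, which is the main selling point of the technique.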
4. Explain the various types of cohesion and coupling, along with diagrams.
In software engineering, coupling or dependency is the degree to which each program module relies on each of the other modules. Coupling is usually contrasted with cohesion: low coupling often correlates with high cohesion, and vice versa. The software quality metrics of coupling and cohesion were invented by Larry Constantine, an original developer of Structured Design and an early proponent of these concepts (see also SSADM), based on characteristics of good programming practices that reduced maintenance and modification costs. Low coupling is often a sign of a well-structured computer system and a good design, and when combined with high cohesion, supports the general goals of high readability and maintainability.
In computer programming, cohesion refers to the degree to which the elements of a module belong together. Thus, it is a measure of how strongly related each piece of functionality expressed by the source code of a software module is. Cohesion is an ordinal type of measurement and is usually expressed as high cohesion or low cohesion. Modules with high cohesion tend to be preferable because high cohesion is associated with several desirable traits of software, including robustness, reliability, reusability, and understandability, whereas low cohesion is associated with undesirable traits such as being difficult to maintain, test, reuse, and even understand.
Types of coupling
Conceptual model of coupling
Coupling can be "low" (also "loose" and "weak") or "high" (also "tight" and "strong"). Some types of coupling, in order of highest to lowest, are as follows:
Procedural programming
A module here refers to a subroutine of any kind, i.e. a set of one or more statements having a name and preferably its own set of variable names.
Content coupling (high): Content coupling (also known as pathological coupling) occurs when one module modifies or relies on the internal workings of another module (e.g., accessing local data of another module). Therefore, changing the way the second module produces data (location, type, timing) will lead to changing the dependent module.
Common coupling: Common coupling (also known as global coupling) occurs when two modules share the same global data (e.g., a global variable). Changing the shared resource implies changing all the modules using it.
External coupling: External coupling occurs when two modules share an externally imposed data format, communication protocol, or device interface. This is basically related to communication with external tools and devices.
Control coupling: Control coupling is one module controlling the flow of another, by passing it information on what to do (e.g., passing a what-to-do flag).
Stamp coupling (data-structured coupling): Stamp coupling occurs when modules share a composite data structure and use only a part of it, possibly a different part (e.g., passing a whole record to a function that only needs one field of it). This may lead to changing the way a module reads a record because a field that the module does not need has been modified.
Data coupling: Data coupling occurs when modules share data through, for example, parameters. Each datum is an elementary piece, and these are the only data shared (e.g., passing an integer to a function that computes a square root).
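The difference between stamp and data coupling can be shown in a few lines. In this sketch (hypothetical names), the first function receives a whole employee record but uses only the salary field, while the second receives just the elementary datum it needs.

```python
# Stamp coupling: the whole employee record is passed, though only the
# salary field is used, so the function depends on the record's layout.
def monthly_pay_stamp(employee):
    return employee["salary"] / 12

# Data coupling: only the elementary datum actually needed is passed.
def monthly_pay_data(salary):
    return salary / 12
```

Renaming or restructuring any other field of the employee record can break the first function but never the second, which is why data coupling ranks lower (better) on the scale above.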
Message coupling (low): This is the loosest type of coupling. It can be achieved by state decentralization (as in objects), with component communication done via parameters or message passing (see Message passing).
No coupling: Modules do not communicate at all with one another.
Object-oriented programming
Subclass coupling: Describes the relationship between a child and its parent. The child is connected to its parent, but the parent is not connected to the child.
Temporal coupling: When two actions are bundled together into one module just because they happen to occur at the same time.
In recent work, various other coupling concepts have been investigated and used as indicators for different modularization principles used in practice.
Disadvantages
Tightly coupled systems tend to exhibit the following developmental characteristics, which are often seen as disadvantages:
1. A change in one module usually forces a ripple effect of changes in other modules.
2. Assembly of modules might require more effort and/or time due to the increased inter-module dependency.
3. A particular module might be harder to reuse and/or test because dependent modules must be included.
Performance issues
Whether loosely or tightly coupled, a system's performance is often reduced by message and parameter creation, transmission, translation (e.g. marshaling) and message interpretation overhead. A simple message, such as a reference to a string, array or data structure, requires less overhead to create than a complicated message such as a SOAP message. Longer messages require more CPU and memory to produce. To optimize runtime performance, message length must be minimized and message meaning must be maximized.
Message Transmission Overhead and Performance
Since a message must be transmitted in full to retain its complete meaning, message transmission must be optimized. Longer messages require more CPU and memory to transmit and receive. Also, when necessary, receivers must reassemble a message into its original state to completely receive it. Hence, to optimize runtime performance, message length must be minimized and message meaning must be maximized.
Message Translation Overhead and Performance
Message protocols and messages themselves often contain extra information (i.e., packet, structure, definition and language information). Hence, the receiver often needs to translate a message into a more refined form by removing extra characters and structure information and/or by converting values from one type to another. Any sort of translation increases CPU and/or memory overhead. To optimize runtime performance, message form and content must be reduced and refined to maximize their meaning and reduce translation.
Message Interpretation Overhead and Performance
All messages must be interpreted by the receiver. Simple messages such as integers might not require additional processing to be interpreted. However, complex messages such as SOAP messages require a parser and a string transformer to exhibit their intended meanings. To optimize runtime performance, messages must be refined and reduced to minimize interpretation overhead.
Solutions
One approach to decreasing coupling is functional design, which seeks to limit the responsibilities of modules along functionality. Coupling increases between two classes A and B if:
A has an attribute that refers to (is of type) B.
A calls on services of an object B.
A has a method that references B (via return type or parameter).
A is a subclass of (or implements) class B.
Low coupling refers to a relationship in which one module interacts with another module through a simple and stable interface and does not need to be concerned with the other module's internal implementation (see Information Hiding).
Systems such as CORBA or COM allow objects to communicate with each other without having to know anything about the other object's implementation. Both of these systems even allow objects to communicate with objects written in other languages.
Coupling versus Cohesion
Coupling and cohesion are terms which occur together very frequently. Coupling refers to the interdependencies between modules, while cohesion describes how related the functions within a single module are. Low cohesion implies that a given module performs tasks which are not very related to each other and hence can create problems as the module becomes large.
Module coupling
One metric associated with this concept follows. For data and control flow coupling:
di: number of input data parameters
ci: number of input control parameters
do: number of output data parameters
co: number of output control parameters
gd: number of global variables used as data
gc: number of global variables used as control
w: number of modules called (fan-out)
r: number of modules calling the module under consideration (fan-in)
Coupling(C) = 1 - 1/(di + 2*ci + do + 2*co + gd + 2*gc + w + r)
Coupling(C) takes a larger value the more coupled the module is. The number ranges from approximately 0.67 (low coupling) to 1.0 (highly coupled). For example, if a module has only a single input and a single output data parameter (and, say, one calling module), C = 1 - 1/(1 + 1 + 1) ≈ 0.67.
If a module has 5 input and 5 output data parameters, an equal number of control parameters, and accesses 10 items of global data, with a fan-in of 3 and a fan-out of 4, then C = 1 - 1/(5 + 2*5 + 5 + 2*5 + 10 + 4 + 3) = 1 - 1/47 ≈ 0.98.
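The parameters listed above can be combined into a single figure. The sketch below implements a Dhama-style coupling metric, C = 1 - 1/(di + 2*ci + do + 2*co + gd + 2*gc + w + r), which matches the 0.67-to-1.0 range quoted in the text; treat the exact formula and the worked numbers as assumptions reconstructed from that parameter list.

```python
def coupling(di, ci, do, co, gd, gc, w, r):
    """Dhama-style module coupling: control parameters and global control
    variables are weighted twice as heavily as plain data items."""
    m = di + 2 * ci + do + 2 * co + gd + 2 * gc + w + r
    return 1 - 1 / m

# Minimal module: one input datum, one output datum, one caller.
low = coupling(di=1, ci=0, do=1, co=0, gd=0, gc=0, w=0, r=1)

# Heavily coupled module: 5 data params each way, 5 control params each
# way, 10 globals used as data, fan-out 4, fan-in 3.
high = coupling(di=5, ci=5, do=5, co=5, gd=10, gc=0, w=4, r=3)
```

The larger denominator of the second module pushes its score toward 1.0, mirroring the worked example in the text.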
COUPLING
An indication of the strength of interconnections between program units. Highly coupled systems have program units dependent on each other; loosely coupled systems are made up of units that are independent or almost independent. Modules are independent if they can function completely without the presence of the other. Obviously, modules can't be completely independent of each other: they must interact so that they can produce the desired outputs. The more connections between modules, the more dependent they are, in the sense that more info about one module is required to understand the other module. Three factors: number of interfaces, complexity of interfaces, type of info flow along interfaces. We want to minimize the number of interfaces between modules, minimize the complexity of each interface, and control the type of info flow. An interface of a module is used to pass information to and from other modules. In general, modules are tightly coupled if they use shared variables or if they exchange control info. Loose coupling: info held within a unit, interfacing with other units via parameter lists. Tight coupling: shared global data. If you need only one field of a record, don't pass the entire record. Keep the interface as simple and small as possible. Two types of info flow: data or control.
Passing or receiving back control info means that the action of the module will depend on this control info, which makes it difficult to understand the module. Interfaces with only data communication result in lowest degree of coupling, followed by interfaces that only transfer control data. Highest if data is hybrid.
Ranked highest to lowest:
1. Content coupling: one module directly references the contents of the other, e.g., one module modifies local data values or instructions in another module (can happen in assembly language), refers to local data in another module, or branches into a local label of another.
2. Common coupling: access to global data; modules bound together by global data structures.
3. Control coupling: passing control flags (as parameters or globals) so that one module controls the sequence of processing steps in another module.
4. Stamp coupling: similar to common coupling, except that global variables are shared selectively among routines that require the data (e.g., packages in Ada). More desirable than common coupling because fewer modules will have to be modified if a shared data structure is modified. An entire data structure is passed but only parts of it are needed.
5. Data coupling: use of parameter lists to pass data items between routines.
COHESION
A measure of how well the parts of a module fit together. A component should implement a single logical function or a single logical entity, and all the parts should contribute to the implementation. Many levels of cohesion:
1. Coincidental cohesion: the parts of a component are not related but simply bundled into a single component. Harder to understand and not reusable.
2. Logical association: similar functions such as input, error handling, etc. are put together; the functions fall in the same logical class. May pass a flag to determine which ones are executed. The interface is difficult to understand, and code for more than one function may be intertwined, leading to severe maintenance problems. Difficult to reuse.
3. Temporal cohesion: statements activated at a single time, such as start-up or shut-down, are brought together (initialization, clean-up). The functions are weakly related to one another but more strongly related to functions in other modules, so maintenance may require changing lots of modules.
4. Procedural cohesion: a single control sequence, e.g., a loop or a sequence of decision statements. Often cuts across functional lines. May contain only part of a complete function, or parts of several functions. The functions are still weakly connected, and again unlikely to be reusable in another product.
5. Communicational cohesion: operate on the same input data or produce the same output data. May perform more than one function. Generally acceptable if alternative structures with higher cohesion cannot be easily identified; still problems with reusability.
6. Sequential cohesion: output from one part serves as input for another part. May contain several functions or parts of different functions.
7. Informational cohesion: performs a number of functions, each with its own entry point, with independent code for each function, all performed on the same data structure. Different from logical cohesion because the functions are not intertwined.
8. Functional cohesion: each part is necessary for the execution of a single function, e.g., compute square root or sort the array. Usually reusable in other contexts; maintenance is easier.
9. Type cohesion: modules that support a data abstraction.
This is not strictly a linear scale: functional cohesion is much stronger than the rest, while the first two are much weaker than the others. Often many levels may be applicable when considering two elements of a module. The cohesion of a module is taken as the highest level of cohesion that is applicable to all elements in the module.
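The two ends of the scale can be contrasted in a few lines. In this sketch (hypothetical functions), utilities() bundles unrelated tasks behind a selector flag, the flag-driven interface that logical association warns about, while square() and reverse() each implement a single logical function.

```python
# Logical association: unrelated tasks bundled into one "utilities"
# module, with a flag deciding which one actually executes.
def utilities(task, value):
    if task == "square":
        return value * value
    if task == "reverse":
        return value[::-1]

# Functional cohesion: each module implements exactly one logical
# function, so each is understandable and reusable on its own.
def square(n):
    return n * n

def reverse(s):
    return s[::-1]
```

The functionally cohesive versions can be reused and tested independently, while the utilities() interface forces every caller to know the flag vocabulary.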
5. Explain project selection techniques and the data dictionary with the help of examples.
One of the biggest decisions that any organization has to make concerns the projects it will undertake. Once a proposal has been received, there are numerous factors that need to be considered before an organization decides to take it up. The most viable option needs to be chosen, keeping in mind the goals and requirements of the organization. How, then, do you decide whether a project is viable? How do you decide if the project at hand is worth approving? This is where project selection methods come into use. Choosing a project using the right method is therefore of the utmost importance. This is what will ultimately define the way the project is to be carried out. But the question then arises as to how you would go about finding the right methodology for your particular organization. At this point, you need careful guidance on the project selection criteria, as a small mistake could be detrimental to your project as a whole and, in the long run, to the organization as well.
Selection Methods
There are various project selection methods practised by modern business organizations. These methods have different features and characteristics, so each selection method is best for different organizations. Although there are many differences between these project selection methods, the underlying concepts and principles are usually the same. Following is an illustration of two such methods (the benefit measurement and constrained optimization methods):
As the value of one project would need to be compared against the other projects, you could use the benefit measurement methods. This could include various techniques, of which the following are the most common:
Scoring model: You and your team could come up with certain criteria that you want your ideal project objectives to meet. You could then give each project a score based on how it rates on each of these criteria, and choose the project with the highest score.
When it comes to the discounted cash flow method, the future cash flows of a project are restated in today's money by discounting them, taking into account the interest that money earns over time. The higher the present value of the project, the better it would be for your organization.
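As a concrete sketch of the discounted cash flow idea, the snippet below discounts a list of projected yearly cash flows back to their value today; the cash-flow figure and the 10% discount rate are made-up assumptions, not values from the text.

```python
def present_value(cash_flows, rate):
    """Sum of cf_t / (1 + rate)**t for each future year t (t starts at 1)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# Hypothetical project: it pays 110 one year from now; at a 10% rate
# that is worth about 100 today.
pv = present_value([110.0], 0.10)
```

Comparing such present values across candidate projects is exactly the comparison that the benefit measurement approach calls for.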
The rate of return received from the money is what is known as the IRR. Here again, you need to be looking for a high rate of return from the project.
The mathematical approach is commonly used for larger projects. The constrained optimization methods require several calculations in order to decide whether or not a project should be rejected. Cost-benefit analysis is used by several organizations to assist them in making their selections. Going by this method, you would have to consider all the positive aspects of the project, which are the benefits, and then deduct the negative aspects (or the costs) from the benefits. Based on the results you receive for the different projects, you could choose which option would be the most viable and financially rewarding. These benefits and costs need to be carefully considered and quantified in order to arrive at a proper conclusion. Questions that you may want to consider asking in the selection process are:
Would this decision help me to increase organizational value in the long run?
How long will the equipment last?
Would I be able to cut down on costs as I go along?
In addition to these methods, you could also consider choosing based on opportunity cost. When choosing any project, you would need to keep in mind the profits that you would make if you decide to go ahead with the project. Profit optimization is therefore the ultimate goal. You need to consider the difference between the profits of the project you are primarily interested in and the next best alternative.
Implementation of the Chosen Method: The methods mentioned above can be carried out in various combinations. It is best that you try out different methods, as in this way you would be able to make the best decision for your organization, considering a wide range of factors rather than concentrating on just a few. Careful consideration would therefore need to be given to each project.
Conclusion: These methods are time-consuming but absolutely essential for efficient business planning. It is always best to have a good plan from the inception, with a list of criteria to be considered and goals to be achieved. This will guide you through the entire selection process and will also ensure that you make the right choice.
A data dictionary is a collection of data about data. It maintains information about the definition, structure, and use of each data element that an organization uses. There are many attributes that may be stored about a data element. Typical attributes used in CASE tools (Computer Assisted Software Engineering) are:
Name
Aliases or synonyms
Default label
Description
Source(s)
Date of origin
Users
Programs in which used
Change authorizations
Access authorization
Data type
Length
Units (cm, degrees C, etc.)
Range of values
Frequency of use
Input/output/local
Conditional values
Parent structure
Subsidiary structures
Repetitive structures
Physical location: record, file, database
A data dictionary is invaluable for documentation purposes, for keeping control information on corporate data, for ensuring consistency of elements between organizational systems, and for use in developing databases. Data dictionary software packages are commercially available, often as part of a CASE package or DBMS. DD software allows for consistency checks and code generation, and is also used in DBMSs to generate reports.
The terms data dictionary and data repository indicate a more general software utility than a catalogue. A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as the DDL and DML compilers, the query optimiser, the transaction processor, report generators, and the constraint enforcer. On the other hand, a data dictionary is a data structure that stores metadata, i.e., (structured) data about data. The software package for a stand-alone data dictionary or data repository may interact with the software modules of the DBMS, but it is mainly used by the designers, users and administrators of a computer system for information resource management. These systems are used to maintain information on system hardware and software configuration, documentation, applications and users, as well as other information relevant to system administration.
If a data dictionary system is used only by the designers, users, and administrators, and not by the DBMS software, it is called a passive data dictionary; otherwise, it is called an active data dictionary. A passive data dictionary is updated manually and independently of any changes to the DBMS (database) structure. With an active data dictionary, the dictionary is updated first, and changes occur in the DBMS automatically as a result.
Database users and application developers can benefit from an authoritative data dictionary document that catalogs the organization, contents, and conventions of one or more databases. This typically includes the names and descriptions of the various tables (records or entities) and their contents (fields), plus additional details such as the type and length of each data element. Another important piece of information that a data dictionary can provide is the relationship between tables. This is sometimes referred to in Entity-Relationship diagrams or, if using Set descriptors, by identifying in which Sets database tables participate.

In an active data dictionary, constraints may be placed upon the underlying data. For instance, a range may be imposed on the value of numeric data in a data element (field), or a record in a table may be forced to participate in a set relationship with another record type. Additionally, a distributed DBMS may have certain location specifics described within its active data dictionary (e.g. where tables are physically located). The data dictionary consists of record types (tables) created in the database by system-generated command files, tailored for each supported back-end DBMS. Command files contain SQL statements for CREATE TABLE, CREATE UNIQUE INDEX, ALTER TABLE (for referential integrity), etc., using the specific statement required by that type of database. There is no universal standard as to the level of detail in such a document.

Middleware
In the construction of database applications, it can be useful to introduce an additional layer of data dictionary software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native "low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the requirements of a typical application. For example, a high-level data dictionary can provide alternative entity-relationship models tailored to suit different applications that share a common database. Extensions to the data dictionary can also assist in query optimization against distributed databases. Additionally, DBA functions are often automated using restructuring tools that are tightly coupled to an active data dictionary.
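The idea of system-generated command files driven by dictionary entries can be sketched in a few lines. The dictionary structure, table name, and columns below are hypothetical examples, not part of any real product:

```python
# Minimal sketch of a data dictionary entry being turned into a
# back-end CREATE TABLE command file. The entry contents are
# hypothetical illustrations.
dictionary = {
    "Customer": {
        "columns": {"CustomerID": "INTEGER", "Title": "VARCHAR(4)"},
        "primary_key": "CustomerID",
    }
}

def generate_create_table(table, entry):
    # Render each column as "name type", then append the key constraint.
    cols = ", ".join(f"{name} {dtype}" for name, dtype in entry["columns"].items())
    pk = f", PRIMARY KEY ({entry['primary_key']})"
    return f"CREATE TABLE {table} ({cols}{pk});"

ddl = generate_create_table("Customer", dictionary["Customer"])
print(ddl)
# CREATE TABLE Customer (CustomerID INTEGER, Title VARCHAR(4), PRIMARY KEY (CustomerID));
```

An active dictionary would emit one such statement per record type, using the SQL dialect of each supported back-end DBMS.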
Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities, which can substantially reduce the amount of programming required to build menus, forms, reports, and other components of a database application, including the database itself. For example, PHPLens includes a PHP class library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases. Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts, and SQL code for menus and forms with data validation and complex joins. For the ASP.NET environment, Base One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance enhancement (caching and index utilization), application security, and extended data types. Visual DataFlex provides the ability to use DataDictionaries as class files to form a middle layer between the user interface and the underlying database. The intent is to create standardized rules to maintain data integrity and enforce business rules throughout one or more related applications.

Platform-specific examples
Data description specifications (DDS) allow the developer to describe data attributes in file descriptions that are external to the application program that processes the data, in the context of an IBM System i.

The table below is an example of a typical data dictionary entry (some field names were lost in extraction and are shown as "…"). The IT staff uses this to develop and maintain the database.

Field Name      Data Type    Other information
CustomerID      Autonumber   Primary key field; Field size 4
Title           Text         Lookup: Mr, Mrs, Miss, Ms; Field size 15
…               Text         Indexed; Field size 15
…               …            Format: Medium Date; Range check: >=01/01/1930
HomeTelephone   Text         Field size: 12; Presence check
6. Explain data flow diagrams and pseudo-code, with five differences between a physical DFD and a logical DFD.
To understand the differences between a physical and a logical DFD, we first need to know what a DFD is. DFD stands for data flow diagram, and it represents graphically the flow of data in an organization, particularly in its information system. A DFD enables a user to see where information enters the organization, where it goes inside the organization, and how it finally leaves the organization. A DFD does not, however, indicate whether the information is processed sequentially or in parallel. There are two types of DFDs, known as physical and logical DFDs. Though both serve the same purpose of representing data flow, there are some differences between the two, which are discussed below. Any DFD begins with an overview DFD that describes the system to be designed in a nutshell. A logical data flow diagram, as the name indicates, concentrates on the business: it describes the events that take place in a business and the data generated by each such event. A physical DFD, on the other hand, is more concerned with how the flow of information is represented. It is usual practice to use DFDs to represent the logical flow and processing of data. However, it is prudent to evolve the logical DFD after first developing a physical DFD that reflects all the persons in the organization performing various operations and how data flows between all these persons.
What is the difference between a physical DFD and a logical DFD? While a logical DFD places no obligation on the developer to depict how the system is constructed, a physical DFD must show how the system has been constructed. There are certain features of the logical DFD that make it popular among organizations.
A logical DFD makes communication easier for the employees of an organization, leads to more stable systems, allows better understanding of the system by analysts, is flexible and easy to maintain, and allows the user to remove redundancies easily. On the other hand, a physical DFD is clear on the division between manual and automated processes, gives a detailed description of processes, identifies temporary data stores, and adds more controls to make the system more efficient and simple. Data Flow Diagrams (DFDs) are used to show the flow of data through a system in terms of the inputs, processes, and outputs.
External Entities
Data either comes from or goes to external entities. They are the source or destination (sometimes called a source or sink) of data that is considered external to the system. An external entity could be a person or group that provides data to the system or receives data from it. It is depicted as an oval (see below) and identified by a noun. External entities are not part of the system but are needed to provide the sources of data used by the system. Fig 1 below shows an example of an external entity:
Customer
Fig 1 External Entity

Processes and Data Flows
Data passed to or from an external entity must be processed in some way. The passing of data (a data flow) is shown on the DFD as an arrow; the direction of the arrow defines the direction of the flow of data. All data flows between external entities and processes (in either direction) need to be named. Fig 2 below shows an example of a data flow:

Customer details

Fig 2 Data Flow

Process
A process processes data that emanates from external entities or data stores. The process could be manual, mechanised, or automated/computerised. A process will use or alter the data in some way. It is identified from a scenario by a verb or action. Each process is given a unique number and a name. An example of a process is shown in Fig 3 below:

1 Add New Customer
Fig 3 - Process
Data Stores
A data store is a point where data is held; it receives and provides data through data flows. Examples of data stores are transaction records, data files, reports, and documents. A data store could be a filing cabinet or magnetic media. Data stores are named in the singular and numbered. A manual store such as a filing cabinet is numbered with an M prefix; a D is used as the prefix for an electronic store such as a relational table. An example of an electronic data store (D1 Customer) is shown in Fig 4 below.
Fig 4 Data Store

Rules
There are certain rules that must be applied when drawing DFDs. These are explained below:
An external entity cannot be connected to another external entity by a data flow.
An external entity cannot be connected directly to a data store.
An external entity must pass data to, or receive data from, a process using a data flow.
A data store cannot be directly connected to another data store.
A data store cannot be directly connected to an external entity.
A data store can pass data to, or receive data from, a process.
A process can pass data to and receive data from another process.
Data must flow from an external entity to a process and then be passed on to another process or a data store.
These rules can be summarised in a connection table (Yes means a direct data flow is allowed):

            Entity   Process   Store
Entity      No       Yes       No
Process     Yes      Yes       Yes
Store       No       Yes       No
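The connection rules above can be captured as a small lookup table and used to check a DFD mechanically. The encoding below is an illustrative sketch, not part of any DFD tool:

```python
# Allowed direct data-flow connections between DFD element types,
# following the rules: only a process may connect to anything; entities
# and stores may only exchange data via a process.
ALLOWED = {
    ("entity", "entity"): False,
    ("entity", "process"): True,
    ("entity", "store"): False,
    ("process", "entity"): True,
    ("process", "process"): True,
    ("process", "store"): True,
    ("store", "entity"): False,
    ("store", "process"): True,
    ("store", "store"): False,
}

def flow_allowed(src, dst):
    # Look up whether a data flow may run directly from src to dst.
    return ALLOWED[(src, dst)]

print(flow_allowed("store", "store"))     # False: store-to-store is illegal
print(flow_allowed("entity", "process"))  # True: entities feed processes
```

A drawing tool could run every arrow in a diagram through `flow_allowed` to flag rule violations automatically.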
There are different levels of DFD, depending on the level of detail shown.

Level 0 or context diagram
The context diagram shows the top-level process, the whole system, as a single process rectangle. It shows all external entities and all data flows to and from the system. Analysts draw the context diagram first to show the high-level processing in a system. An example of a context diagram is shown in Fig 6 below:
[Figure: the Customer external entity exchanging invoice details with the Car Sales System, shown as a single process.]
Fig 6 Context Diagram for a Car Sales System

Level 1 DFD
This level of DFD shows all external entities that are on the context diagram, all the high-level processes, and all data stores used in the system. Each high-level process may contain sub-processes. These are shown on lower-level DFDs.
A Level 1 DFD for the Car Sales scenario is shown in Fig 7 below:
[Figure: Level 1 DFD showing external entities Customer, Management, Sales and Staff; processes 1 Add New Customer, 2 Create Monthly Sales Report, 3 Add New Sale, 4 Add New Car Details, 5 Update Customer and 6 Create Customer Invoice; and data stores D1 Customer, D2 Car, D3 Sales and D4 Staff, connected by data flows such as customer details, updated customer details, car details, new car details, sales details, staff details, monthly report details, customer order details and invoice details.]
Level 2 DFDs
Each Level 1 DFD process may contain further internal processes, which are shown on a Level 2 DFD. The numbering system used in the Level 1 DFD is continued, and each process in the Level 2 DFD is prefixed by the Level 1 DFD process number followed by a unique number for each sub-process, i.e. for process 1, the sub-processes are 1.1, 1.2, 1.3, etc. See Fig 8 below.
[Figure: Level 2 DFD for process 3 Add New Sale, showing sub-processes 3.1 Validate Order, 3.2 Generate New Sale and 3.3 Add Staff to Order; external entity Customer; and data stores D1 Customer, D2 Car, D3 Sales and D4 Staff, with data flows such as validated order details, validated staff details, staff details, car details and sales details.]
Fig 8 Level 2 DFD for Level 1 Process Add New Sale

Each of the Level 2 DFDs could also have sub-processes and could be decomposed further into lower-level DFDs, i.e. 1.1.1, 1.1.2, 1.1.3, etc. More than 3 levels for a DFD would become unmanageable.

Lowest Level DFDs and Process Specification
Once the DFD has been decomposed into its lowest level, each of the lower-level DFDs can be described using pseudo-code (structured English), a flow chart, or a similar process specification method that can be used by a programmer to code each process or function. For example, the Level 2 DFD for the Add New Sale process could be described as a process that contains 3 sub-processes: Validate Order, Add Staff to Order and Generate New Sale. The structured English could be written thus:

Open Customer File
If existing customer
    Check Customer Details
Else
    Add customer details
End If
Open Car File
If car available then
    Open Sale File
    Add customer to sale
    Set car to unavailable
    Add car to sale
    Add staff details
    Calculate price
    Generate Invoice
    Close Sale File
    Close Customer File
    Close Car File
    Inform User of successful sale
    Exit process
Else
    Inform User of problem
    Exit process
    Close Customer File
    Close Car File
End If

The above example is not carved in stone, as the analyst may decide to write separate functions to validate customer and car details, and the Generate New Sale process could include other sub-processes. All that matters is that the underlying processing logic solves the problem. For example, if you look at Figure 8 there is a process named Validate Order, which has the dual purpose of checking both the customer details (is the customer a current customer? if not, add to the customer file) and the car details (is the car available? if not, stop the sale process). A separate process called Validate Order could be created, but I have written the structured English to show a logical sequence in which only if the car is available do we begin the transaction of creating the sale. I have also assumed that the staff dealing with the sale will know their own details, so there would be no need for the process named Add Staff to Order. Like all analysis and design processes, the process of producing DFDs and writing structured English is an iterative process.
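The structured English for Add New Sale maps naturally onto ordinary code. The sketch below mirrors its branching logic; the in-memory lists and dictionaries are hypothetical stand-ins for the Customer, Car and Sale files:

```python
# Sketch of the Add New Sale logic described in the structured English.
# The data structures are illustrative stand-ins for the actual files.
def add_new_sale(customers, cars, sales, customer, car_id, staff, price):
    if customer not in customers:      # "If existing customer ... Else"
        customers.append(customer)     # "Add customer details"
    car = cars.get(car_id)
    if car and car["available"]:       # "If car available then"
        car["available"] = False       # "Set car to unavailable"
        sales.append({"customer": customer, "car": car_id,
                      "staff": staff, "price": price})
        return "successful sale"       # "Inform User of successful sale"
    return "problem"                   # "Inform User of problem"

customers, cars, sales = [], {"C1": {"available": True}}, []
print(add_new_sale(customers, cars, sales, "Smith", "C1", "Jones", 9000))
# successful sale
print(add_new_sale(customers, cars, sales, "Brown", "C1", "Jones", 9000))
# problem -- the car was marked unavailable by the first sale
```

As the text notes, the decomposition is a design choice: customer and car validation could equally be split into separate functions.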
7. Explain coding techniques and types of codes.
Information must be encoded into signals before it can be transported across communication media. More precisely, the waveform pattern of voltage or current used to represent the 1s and 0s of a digital signal on a transmission link is called digital-to-digital line encoding. There are different encoding schemes available:
Digital-to-Digital Encoding
This is the representation of digital information by a digital signal.
There are basically the following types of digital-to-digital encoding available: Unipolar, Polar, and Bipolar.
Unipolar
Unipolar encoding uses only one voltage level: a 1 is represented by a positive value and a 0 remains idle (zero). Since unipolar line encoding has one of its states at 0 volts, it is also called Return to Zero (RTZ), as shown in the figure. A common example of unipolar line encoding is the TTL logic levels used in computers and digital logic.
A unipolar-encoded signal has a DC (Direct Current) component and therefore cannot travel through media such as microwave links or transformers. It has a low noise margin and needs extra hardware for synchronization purposes. It is well suited where the signal path is short. For long distances, it produces stray capacitance in the transmission medium and therefore never returns to zero, as shown in the figure.
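A unipolar line code can be sketched as a simple mapping from bits to voltage levels; the 5 V signalling level below is an assumed value for illustration:

```python
# Unipolar encoding sketch: bit 1 maps to a positive level, bit 0 stays at 0 V.
def unipolar(bits, level=5.0):  # 5 V is an assumed signalling level
    return [level if b == 1 else 0.0 for b in bits]

signal = unipolar([1, 0, 1, 1, 0])
print(signal)  # [5.0, 0.0, 5.0, 5.0, 0.0]

# The DC component is the mean level, which is non-zero: this is why
# unipolar signals cannot pass through transformers.
dc = sum(signal) / len(signal)
print(dc)  # 3.0
```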
Polar
Polar encoding uses two voltage levels, one positive and one negative. For example, the RS-232D interface uses polar line encoding. The signal does not return to zero; it is either a positive voltage or a negative voltage. Polar encoding may be classified as non-return to zero (NRZ), return to zero (RZ), and biphase. NRZ may be further divided into NRZ-L and NRZ-I. Biphase also has two different categories: Manchester and Differential Manchester encoding. Polar line encoding is the simplest pattern that eliminates most of the residual DC problem. The figure shows polar line encoding. It has the same synchronization problem as unipolar encoding. The added benefit of polar encoding is that it reduces the power required to transmit the signal by one-half.
Non-Return to Zero (NRZ)
In NRZ-L, the level of the signal is 1 if the amplitude is positive and 0 if the amplitude is negative. In NRZ-I, the signal is inverted whenever a 1 bit appears. The figure explains the concepts of NRZ-L and NRZ-I more precisely.
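The difference between the two NRZ variants is easy to see in code. The sketch below uses +1/-1 as the two assumed polar levels:

```python
# NRZ-L: the signal level follows the bit value directly.
def nrz_l(bits):
    return [+1 if b == 1 else -1 for b in bits]

# NRZ-I: the level inverts on every 1 bit and stays the same on a 0 bit.
def nrz_i(bits, start=-1):
    level, out = start, []
    for b in bits:
        if b == 1:
            level = -level   # invert on a 1
        out.append(level)
    return out

print(nrz_l([1, 0, 1, 1]))   # [1, -1, 1, 1]
print(nrz_i([1, 0, 1, 1]))   # [1, 1, -1, 1]
```

Note that in NRZ-I the same bit pattern produces different levels depending on history, which is what makes it insensitive to accidental polarity inversion on the line.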
Return to Zero (RZ)
RZ uses three values to represent the signal: positive, negative, and zero. Bit 1 is represented when the signal changes from positive to zero; bit 0 is represented when the signal changes from negative to zero. The figure explains the RZ concept.
Biphase
Biphase is implemented in two different ways: Manchester and Differential Manchester encoding. In Manchester encoding, a transition happens at the middle of each bit period: a low-to-high transition represents a 1 and a high-to-low transition represents a 0. In Differential Manchester encoding, a transition at the beginning of a bit time represents a zero. These encodings can detect errors during transmission because of the transition during every bit period; the absence of an expected transition indicates an error condition.
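The two biphase schemes can be sketched by emitting two half-bit levels per bit. This follows the conventions stated in the text (low-to-high is a 1 in Manchester; a start-of-bit transition is a 0 in Differential Manchester); other references use the opposite polarity:

```python
# Manchester: every bit has a mid-bit transition; low-to-high encodes a 1,
# high-to-low encodes a 0 (two half-bit levels per bit).
def manchester(bits):
    out = []
    for b in bits:
        out += [-1, +1] if b == 1 else [+1, -1]
    return out

# Differential Manchester: a transition at the START of the bit time
# encodes a 0; the mid-bit transition is always present, for clocking.
def diff_manchester(bits, level=+1):
    out = []
    for b in bits:
        if b == 0:
            level = -level        # start-of-bit transition -> 0
        out += [level, -level]    # mandatory mid-bit transition
        level = -level
    return out

print(manchester([1, 0]))         # [-1, 1, 1, -1]
```

Because every bit period is guaranteed a mid-bit transition, a receiver that sees a missing transition can flag a line error, as described above.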
They have no DC component, and there is always a transition available for synchronizing the receive and transmit clocks.
Bipolar
Bipolar encoding uses three voltage levels: positive, negative, and zero. Bit 0 occurs at the zero level of amplitude. Bit 1 occurs alternately at the positive and negative voltage levels, which is why this scheme is also called Alternate Mark Inversion (AMI). There is no DC component because of the alternate polarity of the pulses for 1s. The figure describes bipolar encoding.
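The alternating polarity of AMI can be sketched in a few lines, using +1/-1 as the assumed pulse levels:

```python
# Bipolar AMI sketch: 0 -> zero level; successive 1s alternate polarity,
# which removes the DC component from the encoded signal.
def ami(bits):
    out, polarity = [], +1
    for b in bits:
        if b == 1:
            out.append(polarity)
            polarity = -polarity  # alternate mark inversion
        else:
            out.append(0)
    return out

encoded = ami([1, 0, 1, 1, 0, 1])
print(encoded)       # [1, 0, -1, 1, 0, -1]
print(sum(encoded))  # 0 -- the alternating pulses cancel, so no DC component
```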
Analog to Digital
Analog-to-digital encoding is the representation of analog information by a digital signal. Techniques include PAM (Pulse Amplitude Modulation) and PCM (Pulse Code Modulation).
Digital to Analog
These include ASK (Amplitude Shift Keying), FSK (Frequency Shift Keying), PSK (Phase Shift Keying), QPSK (Quadrature Phase Shift Keying), and QAM (Quadrature Amplitude Modulation).
Analog to Analog
These are the amplitude modulation, frequency modulation, and phase modulation techniques.
Codecs (Coders and Decoders)
Codec stands for coder/decoder (or compression/decompression) in data communication. The conversion of analog to digital is necessary in situations where it is advantageous to send analog information across a digital circuit. Certainly, this is often the case in carrier networks, where huge volumes of analog voice are digitized and sent across high-capacity digital circuits. The device that accomplishes the analog-to-digital conversion is known as a codec. Codecs code an analog input into a digital format on the transmitting side of the connection, reversing the process, or decoding the information, on the receiving side in order to reconstitute the analog signal. Codecs are widely used to convert analog voice and video to digital format, and to reverse the process on the receiving end.
8. Explain algorithms for error detection (modulus eleven code and modulo n code) with the help of algorithms and examples.
In information theory and coding theory, with applications in computer science and telecommunication, error detection and correction (or error control) are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data. Error correction may generally be realized in two different ways:
Automatic repeat request (ARQ) (sometimes also referred to as backward error correction): This is an error control technique whereby an error detection scheme is combined with requests for retransmission of erroneous data. Every block of data received is checked using the error detection code in use, and if the check fails, retransmission of the data is requested; this may be done repeatedly, until the data can be verified.
Forward error correction (FEC): The sender encodes the data using an error-correcting code (ECC) prior to transmission. The additional information (redundancy) added by the code is used by the receiver to recover the original data. In general, the reconstructed data is what is deemed the "most likely" original data.
ARQ and FEC may be combined, such that minor errors are corrected without retransmission, and major errors are corrected via a request for retransmission: this is called hybrid automatic repeat request (HARQ). Error detection is most commonly realized using a suitable hash function (or checksum algorithm). A hash function adds a fixed-length tag to a message, which enables receivers to verify the delivered message by recomputing the tag and comparing it with the one provided. There exists a vast variety of different hash function designs. However, some are of particularly widespread use because of either their simplicity or their suitability for detecting certain kinds of errors (e.g., the cyclic redundancy check's performance in detecting burst errors). Random-error-correcting codes based on minimum distance coding can provide a suitable alternative to hash functions when a strict guarantee on the minimum number of errors to be detected is desired. Repetition codes, described below, are special cases of error-correcting codes: although rather inefficient, they find applications for both error correction and detection due to their simplicity.

Repetition codes
A repetition code is a coding scheme that repeats the bits across a channel to achieve error-free communication. Given a stream of data to be transmitted, the data is divided into blocks of bits. Each block is transmitted some predetermined number of times. For example, to send the bit pattern "1011", the four-bit block can be repeated three times, producing "1011 1011 1011". However, if this twelve-bit pattern is received as "1010 1011 1011", where the first block is unlike the other two, it can be determined that an error has occurred. Repetition codes are very inefficient and can be susceptible to problems if the error occurs in exactly the same place in each group (e.g., "1010 1010 1010" in the previous example would be accepted as correct). The advantage of repetition codes is that they are extremely simple, and they are in fact used in some transmissions of numbers stations.

Parity bits
A parity bit is a bit that is added to a group of source bits to ensure that the number of set bits (i.e., bits with value 1) in the outcome is even or odd. It is a very simple scheme that can be used to detect a single error, or any other odd number (i.e., three, five, etc.) of errors, in the output. An even number of flipped bits will make the parity bit appear correct even though the data is erroneous. Extensions and variations on the parity bit mechanism are horizontal redundancy checks, vertical redundancy checks, and "double", "dual", or "diagonal" parity (used in RAID-DP).

Checksums
A checksum of a message is a modular arithmetic sum of message code words of a fixed word length (e.g., byte values). The sum may be negated by means of a ones'-complement operation prior to transmission to detect errors resulting in all-zero messages. Checksum schemes include parity bits, check digits, and longitudinal redundancy checks.
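The two simplest schemes above, repetition and parity, can be sketched directly:

```python
# Sketches of two simple error-detection schemes described above.

# Repetition code: each block is sent three times and the receiver takes
# a bitwise majority vote (the "1011" example from the text).
def majority_decode(copies):
    return "".join(
        "1" if sum(c[i] == "1" for c in copies) > len(copies) // 2 else "0"
        for i in range(len(copies[0]))
    )

received = ["1010", "1011", "1011"]   # first copy corrupted in transit
print(majority_decode(received))       # 1011 -- the single error is voted out

# Even parity: the appended bit makes the total number of 1s even, so any
# odd number of flipped bits is detectable (an even number is not).
def add_even_parity(bits):
    return bits + [sum(bits) % 2]

def parity_ok(bits_with_parity):
    return sum(bits_with_parity) % 2 == 0

word = add_even_parity([1, 0, 1, 1])   # three 1s -> parity bit is 1
print(word)                             # [1, 0, 1, 1, 1]
print(parity_ok(word))                  # True
word[2] ^= 1                            # flip one bit in transit
print(parity_ok(word))                  # False -- the error is detected
```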
Some checksum schemes, such as the Damm algorithm, the Luhn algorithm, and the Verhoeff algorithm, are specifically designed to detect errors commonly introduced by humans in writing down or remembering identification numbers.

Cyclic redundancy checks (CRCs)
A cyclic redundancy check (CRC) is a single-burst-error-detecting cyclic code and non-secure hash function designed to detect accidental changes to digital data in computer networks. It is not suitable for detecting maliciously introduced errors. It is characterized by specification of a so-called generator polynomial, which is used as the divisor in a polynomial long division over a finite field, taking the input data as the dividend, and where the remainder becomes the result. Cyclic codes have favorable properties in that they are well suited for detecting burst errors. CRCs are particularly easy to implement in hardware, and are therefore commonly used in digital networks and storage devices such as hard disk drives. Even parity is a special case of a cyclic redundancy check, where the single-bit CRC is generated by the divisor x + 1.

Cryptographic hash functions
The output of a cryptographic hash function, also known as a message digest, can provide strong assurances about data integrity, whether changes of the data are accidental (e.g., due to transmission errors) or maliciously introduced. Any modification to the data will likely be detected through a mismatching hash value. Furthermore, given some hash value, it is infeasible to find some input data (other than the one given) that will yield the same hash value. If an attacker can change not only the message but also the hash value, then a keyed hash or message authentication code (MAC) can be used for additional security. Without knowing the key, it is infeasible for the attacker to calculate the correct keyed hash value for a modified message.

Error-correcting codes
Any error-correcting code can be used for error detection. A code with minimum Hamming distance d can detect up to d − 1 errors in a code word. Using minimum-distance-based error-correcting codes for error detection can be suitable if a strict limit on the minimum number of errors to be detected is desired. Codes with minimum Hamming distance d = 2 are degenerate cases of error-correcting codes and can be used to detect single errors. The parity bit is an example of a single-error-detecting code.
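The CRC long division over GF(2) can be sketched with XOR arithmetic. Using the divisor x + 1 (binary 11) reproduces the even-parity special case mentioned above:

```python
# CRC sketch via polynomial long division over GF(2). With the divisor
# x + 1 (bits [1, 1]), the 1-bit remainder equals the even-parity bit.
def crc_remainder(data_bits, divisor_bits):
    # Append zero check bits, then XOR the divisor in wherever the
    # current leading bit is 1 (binary long division without borrows).
    data = data_bits + [0] * (len(divisor_bits) - 1)
    for i in range(len(data_bits)):
        if data[i]:
            for j, d in enumerate(divisor_bits):
                data[i + j] ^= d
    return data[len(data_bits):]  # the remainder becomes the check bits

msg = [1, 0, 1, 1]
print(crc_remainder(msg, [1, 1]))  # [1] -- msg has an odd number of 1s
```

Real CRCs use longer generator polynomials (e.g. 16 or 32 bits), but the division procedure is the same.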
In digital data transmission, errors occur due to noise. The probability of error, or bit error rate p, depends on the signal-to-noise ratio, the modulation type, and the method of demodulation.
For example, if p = 0.1 we would expect, on average, 1 error in every 10 bits; p = 0.1 states that every bit has a 1/10 probability of being in error. Depending on the type of system and many other factors, error rates typically range from 10^-1 to 10^-5 or better. Information transferred via a digital system is usually packaged into a structure (a block of bits) called a message block or frame. A typical message block contains the following:
Synchronization pattern to mark the start of the message block
Destination and sometimes source addresses
System control/commands
Information
Error control coding check bits
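The impact of a bit error rate p on whole blocks can be illustrated as follows. Assuming independent bit errors (an assumption, since real channels often have burst errors), an N-bit block arrives error-free with probability (1 - p)^N:

```python
# Probability that an N-bit block contains at least one error, assuming
# independent bit errors of probability p per bit.
def p_block_error(p, n):
    return 1 - (1 - p) ** n

# Even a modest bit error rate ruins most blocks of any useful length:
print(round(p_block_error(0.1, 10), 3))  # 0.651
print(p_block_error(1e-5, 100))          # roughly 1e-3 for a 100-bit block
```

This is why error control coding is added to message blocks: without it, the block rejection rate would track these probabilities directly.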
The total number of bits in the block may vary widely (from, say, 32 bits to several hundred bits) depending on the requirement. Clearly, if the bits are subject to an error rate p, there is some probability that a message block will be received with 1 or more bits in error. In order to counteract the effects of errors, error control coding techniques are used to either:
a) detect errors (error detection), or
b) detect and correct errors (error detection and correction).
Broadly, there are two types of error control codes:
a) Block codes: parity codes, array codes, repetition codes, cyclic codes, etc.
b) Convolutional codes
BLOCK CODES
A block code is a coding technique which generates C check bits for M message bits to give a stand-alone block of M + C = N bits.
The sync bits are usually not included in the error control coding, because message synchronization must be achieved before the message and check bits can be processed. The code rate is given by

Rate = M / (M + C) = M / N

where
M = number of message bits
C = number of check bits
N = M + C = total number of bits.

The code rate is a measure of the proportion of user-assigned bits (M) to the total bits in the block (N). For example:
i) A single parity bit (C = 1) applied to a block of 7 message bits gives a code rate R = 7 / (7 + 1) = 7/8.
ii) A code with M = 4 message bits in a block of N = 7 bits has a code rate R = 4/7.
iii) A repetition code in which each of the M message bits is transmitted m times, and the receiver carries out a majority vote on each bit, has a code rate

Rate = M / (mM) = 1/m.
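The three code rates above can be checked directly. The sketch below uses exact fractions to avoid rounding:

```python
# Code rate R = M / (M + C) for a block code with M message bits
# and C check bits.
from fractions import Fraction

def code_rate(m, c):
    return Fraction(m, m + c)

print(code_rate(7, 1))  # 7/8 -- single parity bit on 7 message bits
print(code_rate(4, 3))  # 4/7 -- 4 message bits in a 7-bit block
print(code_rate(1, 2))  # 1/3 -- each bit repeated m = 3 times (1/m)
```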
Consider messages transferred from a Source to a Destination, and assume that the Destination is able to check the received messages and detect errors. If no errors are detected, the Destination will accept the messages. If errors are detected, there are two forms of error correction.
a) Automatic Repeat Request (ARQ)
In an ARQ system, the destination sends an acknowledgement (ACK) message back to the source if errors are not detected, and a negative acknowledgement (NAK) message back to the source if errors are detected. If the source receives an ACK for a message, it sends the next message. If the source receives a NAK, it repeats the same message. This process repeats until all the messages are accepted by the destination.
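A stop-and-wait ARQ exchange can be sketched as follows. The single-parity check and the one-shot noisy channel are illustrative assumptions, not part of any real protocol:

```python
# Stop-and-wait ARQ sketch: the source retransmits a message until the
# destination's error check passes (an implicit ACK); a failed check is
# an implicit NAK. Even parity is the assumed error-detection code.
def has_even_parity(bits):
    return sum(bits) % 2 == 0

def arq_send(message, channel):
    attempts = 0
    while True:
        attempts += 1
        received = channel(message)
        if has_even_parity(received):  # check passes -> ACK, stop
            return received, attempts
        # check fails -> NAK: loop and retransmit the same message

# Illustrative channel that corrupts one bit on the first attempt only.
state = {"first": True}
def noisy_channel(bits):
    bits = list(bits)
    if state["first"]:
        state["first"] = False
        bits[0] ^= 1               # flip a bit -> parity check fails
    return bits

message = [1, 0, 1, 0]             # even parity
received, attempts = arq_send(message, noisy_channel)
print(received, attempts)          # [1, 0, 1, 0] 2
```

The first transmission is NAKed by the parity check; the retransmission gets through, so two attempts are needed.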
b) Forward Error Correction (FEC)
The error control code may be powerful enough to allow the destination to attempt to correct the errors by further processing. This is called Forward Error Correction; no ACKs or NAKs are required. Many systems are hybrid in that they use both ARQ (ACK/NAK) and FEC strategies for error correction.
Successful, False & Lost Message Transfer
The process of checking the received messages for errors gives two possible outcomes:
a) Errors not detected: messages accepted
b) Errors detected: messages rejected
"Errors not detected" does not mean that errors are not present. Error control codes cannot detect every possible error or combination of errors. However, if errors are not detected, the destination has no alternative but to accept the message, true or false. That is, if errors are not detected we may conclude either
a) that there were no errors, i.e. the messages accepted are true; in other words, a successful message transfer; or
b) that there were undetected errors, i.e. a message accepted was false; in other words, a false message transfer.
If errors are detected, the destination does not accept the message and may either request a re-transmission (ARQ system) or process the block further in an attempt to correct the errors (FEC). In processing the block for error correction, again there are two possible outcomes:
a) the processor may get it right, i.e. correct the errors and give a successful message transfer;
b) the processor may get it wrong, i.e. not correct the errors, in which case there is a false message transfer.
Some codes have a range of abilities to detect and correct errors. For example, a code may be able to detect and correct 1 error (a single-bit error) and detect 2, 3, and 4 bits in error, but not correct them. Thus even with FEC, some messages may still be rejected, and we think of these as lost messages. These ideas are illustrated below:
MESSAGE TRANSFERS
Consider message transfer between two computers, e.g. it is required to transfer the contents of Computer A to Computer B.
COMPUTER A
COMPUTER B
As discussed, of the messages transferred to Computer B, some may be rejected (lost) and some will be accepted, and the accepted messages will be either true (successful transfer) or false. Obviously the requirement is for a high probability of successful transfer (ideally 1), a low probability of false transfer (ideally 0), and a low probability of lost messages. In particular, the false rate should be kept low, even at the expense of an increased lost-message rate. Note that in some messages there may be in-built redundancy: for example, the text message REPAUT FOR WEDLESDAY can be corrected by the reader, using the redundancy of English, to REPORT FOR WEDNESDAY. However, if this is followed by 10 JUNE we would ?? Other examples, where there is little or no redundancy, are car registration numbers, account numbers, etc., generally numeric or unstructured alphanumeric information. There is thus a need for a low false rate appropriate to the function of the system, and it is important for the information in Computer B to be correct even if it takes a long time to transfer. Error control coding may be considered further in two main ways.