
Information Theoretic SOP Expression Minimization Technique

Md. Faisal Kabir, Salahuddin Aziz, Suman Ahmmed and Chowdhury Mofizur Rahman
Department of Computer Science & Engineering, United International University, Dhaka, Bangladesh. E-mail: faisal@uiu.ac.bd, saziz@uiu.ac.bd, suman@uiu.ac.bd, cmr@uiu.ac.bd

Abstract: The efficient design of multiple Boolean functions has become important and necessary in Computer Aided Design for Circuits and Systems (CADCS), especially now that chips have reached densities of tens of thousands of transistors per chip, the regime known as Very Large Scale Integration (VLSI). Simplifying Boolean expressions with conventional approaches, such as the observation-based graphical K-map or algebraic simplification procedures, becomes tedious or even impossible once the number of variables in a truth table exceeds a certain limit. In this paper we propose an information theory based circuit design approach for deriving minimal Sum of Product (SOP) expressions for an unlimited number of variables. We have verified our approach on a number of cases and conclude that it is a better alternative to conventional approaches, particularly when the number of variables rules them out. The key feature of the proposed method is that it performs a hill-climbing search through the state space of Boolean variables, using an information theoretic heuristic to find the minimal SOP expression.

Keywords: K-Map, SOP Expression, Information Gain, Entropy, Classification.

I. INTRODUCTION

An effective design for a digital circuit is always an important issue. Several approaches exist [6] to simplify and derive a circuit design from the required input/output functions. Deriving an efficient, minimal expression for a required circuit is a real challenge for circuit designers. Usually the design of a circuit starts from the truth table of a Boolean function. The algebraic expression of a circuit derived from the truth table contains a large number of product terms, which need to be simplified and minimized in order to produce a design that is cost effective, simple to implement, and easy to understand. Conventional approaches usually employ the graphical K-map or various algebraic simplification methods to minimize a Boolean expression. However, these approaches become frustrating and unworkable as the number of Boolean variables grows. In this paper we propose an information theory based approach to simplifying Boolean expressions. The proposed method performs a hill-climbing search through the state space of Boolean variables [8], [11] to build the required expression trees [1-5], [7-11]. The proposed method is useful when conventional approaches become computationally expensive and difficult to apply because of an excessive number of Boolean variables. In our approach we develop an information based decision table to determine tentative SOP expressions, and we then revise those expressions by applying logical rules to finalize the minimal SOP. In the remainder of the paper we describe the detailed methodology of the proposed method, briefly describe the relevant underpinning theories, present empirical results and performance, and finally outline future research directions.

II. PROPOSED METHOD

The main objective of this work is to propose a new information based method for finding the minimal sum of product expression of any Boolean function. To design a digital circuit, one needs to derive a SOP expression that covers all the positive instances (those with output one) while excluding the negative ones (those with output zero). The proposed method first finds the information gain of every variable in the truth table (information gain is discussed in detail in Section III). It then identifies the variable with the highest information gain, the classifier variable, and splits the truth table instances into two groups: one containing the instances where the classifier variable is one, the other containing the instances where it is zero. Thus we get two branches from a single root. The same process is repeated for each branch, splitting its instances on the best classifier variable, until every leaf of the tree contains only positive or only negative instances. The leaf nodes containing all positive instances in different parts of the expression tree constitute the Boolean expression; if the tree has four such leaves, the expression has four product terms. Note that selecting the appropriate variable at each step of tree construction is crucial to arriving at a minimum expression. After collecting the product terms from the expression tree, we minimize each term by removing redundant variables: a variable is removed from a product term if its removal does not cause the term to cover any negative instance. The major steps of minimizing a SOP expression are summarized in Figure 1 and explained further as follows:

Step 1: Create the truth table for the given Boolean function.
Step 2: Find the variable set X = {Xi}.

1-4244-1551-9/07/$25.00 © 2007 IEEE.


Figure 1: Flow chart for minimizing SOP expression

Step 3: Calculate the information gain of every element of X according to the information gain measure.
Step 4: Determine the best variable Ci, the one associated with the most information gain.
Step 5: Divide the total instances into two groups on the basis of Ci = 1 and Ci = 0:
        Y = {instances | Ci = 1} and Z = {instances | Ci = 0}
Step 6: If some leaf does not contain all +ve or all -ve instances, go to Step 7; otherwise go to Step 8.

Step 7: Find the information gain of the elements of Y and Z, find the best classifier variable Ci for each leaf, and split again, returning to Step 6.
Step 8: Read off the SOP expression by going down from the root to the leaves covering only +ve instances.
Step 9: Discard redundant variables from each product of the SOP expression.
Step 10: Collect the minimal number of products covering all the +ve instances.

A minimal runnable sketch of these steps follows.
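To make the ten steps concrete, here is a compact sketch in Python. It is our own illustration rather than the authors' implementation: the helper names (entropy, gain, build_products, prune) and the row representation, a dict of variable assignments paired with the output bit, are assumptions.

```python
from itertools import product as bit_patterns
from math import log2

def entropy(rows):
    """Entropy of a collection of (assignment, output) rows (Step 3)."""
    if not rows:
        return 0.0
    p = sum(y for _, y in rows) / len(rows)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gain(rows, var):
    """Expected entropy reduction from splitting the rows on var."""
    split = [[r for r in rows if r[0][var] == b] for b in (0, 1)]
    return entropy(rows) - sum(len(s) / len(rows) * entropy(s) for s in split)

def build_products(rows, variables, path=()):
    """Steps 4-8: greedily grow the expression tree; return the
    variable/value paths that end in all-positive leaves."""
    if not rows:
        return []
    outputs = {y for _, y in rows}
    if outputs == {1}:
        return [dict(path)]              # all-positive leaf -> one product
    if outputs == {0} or not variables:
        return []
    best = max(variables, key=lambda v: gain(rows, v))        # Step 4
    rest = [v for v in variables if v != best]
    return [p for b in (0, 1)                                 # Step 5
            for p in build_products([r for r in rows if r[0][best] == b],
                                    rest, path + ((best, b),))]

def prune(term, rows):
    """Step 9: drop a literal whenever its removal covers no negative row."""
    for var in list(term):
        trial = {k: v for k, v in term.items() if k != var}
        if not any(y == 0 and all(a[k] == v for k, v in trial.items())
                   for a, y in rows):
            del term[var]
    return term

# Step 1: truth table of the example analyzed later (Table 1, Y = AB + EF).
names = "ABCDEF"
rows = [(dict(zip(names, bits)), (bits[0] & bits[1]) | (bits[4] & bits[5]))
        for bits in bit_patterns((0, 1), repeat=6)]
terms = [prune(t, rows) for t in build_products(rows, list(names))]
distinct = {tuple(sorted(t.items())) for t in terms}   # Step 10: deduplicate
```

On the 64-row example of Table 1, build_products returns the products AB, A'EF and AB'EF, and pruning plus deduplication leaves AB and EF, matching the worked example later in the paper.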


III. INFORMATION THEORETIC MEASURE

A. Best Classifier

The central choice in our algorithm is which variable to test at each step of constructing the product expression. We would like to select the variable that is most useful for separating the instances of the truth table that have different outputs. What is a good quantitative measure of the worth of a variable? We define a statistical property, information gain, that determines how well a given variable separates the instances according to their target outputs. Our algorithm uses this information gain measure to select among the candidate variables at each step while finding the individual product expressions.

B. Entropy Measure

In order to define information gain precisely, we begin by defining a measure commonly used in information theory, called entropy, which characterizes the (im)purity of an arbitrary collection of instances. Given a collection S containing positive and negative instances of the target Boolean function, the entropy of S relative to this Boolean output is

    Entropy(S) = - p0 log2 p0 - p1 log2 p1

where p0 is the proportion of positive instances in S and p1 is the proportion of negative instances in S. To illustrate, suppose S is a collection of 14 instances of some Boolean concept, including 9 positive and 5 negative instances (we adopt the notation [9+, 5-] to summarize such a sample of data). Then the entropy of S relative to this Boolean classification is

    Entropy([9+, 5-]) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940

Notice that the entropy is 0 if all members of S belong to the same class, and 1 if the collection contains equal numbers of positive and negative instances. If the collection contains unequal numbers of positive and negative instances, the entropy lies between 0 and 1. One interpretation of entropy from information theory is that it specifies the minimum number of bits of information needed to encode the classification of an arbitrary member of S (i.e., a member of S drawn at random with uniform probability). For example, if p0 is 1, the receiver knows the drawn example will be positive, so no message need be sent, and the entropy is zero. If p0 is 0.5, one bit is required to indicate whether the drawn example is positive or negative. If p0 is 0.8, a collection of messages can be encoded using on average less than one bit per message by assigning shorter codes to the likely positive instances and longer codes to the less likely negative ones. Here we consider entropy in the special case where the target output is Boolean, so the entropy can be written compactly as

    Entropy(S) = - Σi pi log2 pi

where pi is the proportion of S belonging to class i and the sum runs over the two classes.

C. Information Gain Measure

Given entropy as a measure of the impurity of a collection of instances, we can now define a measure of the effectiveness of a variable in classifying the instances of the truth table. The measure we use, called information gain, is simply the expected reduction in entropy caused by partitioning the instances according to that variable. More precisely, the information gain Gain(S, A) of a variable A relative to a collection of instances S is defined as

    Gain(S, A) = Entropy(S) - Σ (v ∈ Values(A)) (|Sv| / |S|) Entropy(Sv)

where v is either 0 or 1 and Sv is the subset of S for which variable A has value v. The first term is just the entropy of the original collection S, and the second term is the expected value of the entropy after S is partitioned using variable A: the sum of the entropies of the subsets Sv, each weighted by the fraction |Sv| / |S| of instances that belong to Sv. Gain(S, A) is therefore the expected reduction in entropy caused by knowing the value of variable A. Put another way, Gain(S, A) is the information provided about the target function value given the value of A; it is the number of bits saved when encoding the target value of an arbitrary member of S by knowing the value of variable A. Information gain is precisely the measure used by our algorithm to select the best variable at each step in growing the tree. The use of information gain to evaluate the relevance of variables is summarized in Figure 2, in which the information gain of two different variables, C and D, is computed in order to determine which is the better variable for classifying the instances.

Figure 2: C provides greater information gain than D, relative to the target classification. Here E stands for entropy and S for the original collection of instances. Given an initial collection S of 9 positive instances (output 1) and 5 negative instances (output 0), [9+, 5-], sorting these by C produces the collections [3+, 4-] (C = 1) and [6+, 1-] (C = 0). The information gain of this partitioning is 0.151, compared with a gain of only 0.048 for the variable D.
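The figures quoted above can be checked directly. A minimal sketch, assuming only the class counts stated in the caption of Figure 2 (the helper H is our own):

```python
from math import log2

def H(pos, neg):
    """Entropy of a collection with pos positive and neg negative members."""
    n = pos + neg
    return sum(-k / n * log2(k / n) for k in (pos, neg) if k)

e_S = H(9, 5)                                            # 0.940
gain_C = e_S - (7 / 14) * H(3, 4) - (7 / 14) * H(6, 1)   # ≈ 0.151
# Gain(S, D) = 0.048 follows from the same formula with D's split counts,
# which are not stated in the caption.
```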


D. i. Building the SOP_EXPRESSION Tree

SOP_EXPRESSION is a greedy algorithm that grows the tree top-down, at each node selecting the variable that best classifies the local instances. This process continues until the tree perfectly separates the instances, or until all variables have been used. The pseudo code for building an expression tree is as follows:

Function SOP_EXPRESSION(variables, instances)
    if all instances have the same output (either 1 or 0) then
        return the expression read along the path from the root to this leaf
    else
        best = CHOOSE_VARIABLES(variables, instances)   // using the impurity measure
        tree = a new tree with root best
        for each value i in {1, 0} of best do
            instances_i = the instances with best = i
            subtree_i = SOP_EXPRESSION(variables - {best}, instances_i)
            add a branch to tree with label i and subtree subtree_i
        return tree

E. An Illustrative Example

To illustrate the operation of our algorithm, consider the truth table in Table 1. Here the target variable is Y, which can take the values yes (1) or no (0). Consider the first step through the algorithm, in which the topmost node of the decision tree is created. Which variable should be tested first in the tree? The proposed algorithm determines the information gain of each candidate variable (i.e., A, B, C, D, E, F) and selects the one with the highest information gain. The information gain values for the six variables are

    Gain(S, A) = 0.106    Gain(S, B) = 0.106
    Gain(S, C) = 0.000    Gain(S, D) = 0.000
    Gain(S, E) = 0.106    Gain(S, F) = 0.106

where S denotes the collection of instances from Table 1.
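These root-level gains can be reproduced mechanically. The sketch below is our own; it rebuilds Table 1 from the function Y = AB + EF that the table encodes (as the worked minimization in the sequel confirms):

```python
from itertools import product
from math import log2

def entropy(rows):
    p = sum(y for _, y in rows) / len(rows)
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

names = "ABCDEF"
rows = [(dict(zip(names, bits)), (bits[0] & bits[1]) | (bits[4] & bits[5]))
        for bits in product((0, 1), repeat=6)]

for v in names:
    split = [[r for r in rows if r[0][v] == b] for b in (0, 1)]
    g = entropy(rows) - sum(len(s) / len(rows) * entropy(s) for s in split)
    print(v, round(g, 3))   # A 0.106, B 0.106, C 0.0, D 0.0, E 0.106, F 0.106
```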


Table 1: Truth table for the Boolean function Y. [First half, rows 0-31: A = 0 throughout, while B, C, D, E, F run through all 32 binary combinations in ascending order; Y = 1 exactly on the rows where E = F = 1.]

ii. Further minimization of the SOP_EXPRESSION result


The MINIMAL_SOP_EXPRESSION algorithm is specialized to discard redundant variables. Its pseudo code is as follows:

Function MINIMAL_SOP_EXPRESSION
    1. Call SOP_EXPRESSION(variables, instances).
    2. For each leaf p of the tree:
           if p has label 1 then S = S ∪ {p}
    3. For each product p ∈ S:
           for each variable of the product p:
               if it can be omitted   // i.e., its removal from the product does not
                                      // cause coverage of any instance with output 0
               then omit the variable.
    4. Collect the products after omitting variables.


    5. Check the products for redundancy and obtain the final minimal solution.
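The omission test in step 3 is the core of this pass. Below is a hedged sketch; the helper name can_omit is ours, and it assumes the (assignment, output) row representation used in the earlier sketches:

```python
def can_omit(term, var, rows):
    """True if dropping var from the product term covers no negative row."""
    trial = {k: v for k, v in term.items() if k != var}
    return not any(y == 0 and all(a[k] == v for k, v in trial.items())
                   for a, y in rows)

# For Table 1 (Y = AB + EF): dropping A' from the product A'EF is safe,
# because every row with E = F = 1 has output 1, but B cannot leave AB.
# can_omit({'A': 0, 'E': 1, 'F': 1}, 'A', rows)  -> True, giving EF
# can_omit({'A': 1, 'B': 1}, 'B', rows)          -> False
```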


[Table 1, continued. Rows 32-63: A = 1 throughout, while B, C, D, E, F again run through all 32 binary combinations; Y = 1 on every row where B = 1 and on every row where E = F = 1.]

According to the information gain measure, the variables A, B, E and F provide the best prediction of the target variable Y over the instances. We can therefore select any of A, B, E or F as the decision variable for the root node. In our example, A is selected as the decision variable for the root node, and branches are created below the root for each of its possible values. The process of selecting a new variable and partitioning the instances is then repeated for each non-terminal descendant node, this time using only the instances associated with that node. Variables that have been incorporated higher in the tree are excluded, so that any given variable can appear at most once along any path through the tree. This process continues for each new leaf node until either of two conditions is met: (1) every variable has already been included along this path through the tree, or (2) the instances associated with this leaf node all have the same target value (i.e., their entropy is zero). Figure 3 illustrates the partition of the instances after the first step in growing the decision tree; a numeric check of the re-selection at the A = 1 node follows the figure captions. The final decision tree learned by SOP_EXPRESSION from the 64 instances of Table 1 is shown in Figure 4.

Figure 3: The partial decision tree resulting from the first step of SOP_EXPRESSION. The instances are sorted into the corresponding descendant nodes. These two nodes will be further expanded by selecting the variable with the highest information gain relative to the new subsets of instances.

Figure 4: Final tree for the concept Y.
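As a numeric illustration of this branch-level re-selection (our own computation, continuing the Table 1 sketch above, with entropy() and rows as defined there), the gains recomputed on the 32 instances at the A = 1 node show that B now dominates:

```python
branch = [r for r in rows if r[0]["A"] == 1]   # instances sorted to the A = 1 node
for v in "BCDEF":
    split = [[r for r in branch if r[0][v] == b] for b in (0, 1)]
    g = entropy(branch) - sum(len(s) / len(branch) * entropy(s) for s in split)
    print(v, round(g, 3))   # B 0.549, C 0.0, D 0.0, E 0.049, F 0.049
```

So B is tested next on that branch, and its B = 1 leaf is pure, yielding the product AB of Figure 4.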

From the final tree, the expressions we obtain are S = {AB, A'EF, AB'EF}. We now test each product to see whether any variable can be omitted. In the first product, we cannot omit A because Y is not 1 for all B, and we cannot omit B because Y is not 1 for all A.


In the second product, we can omit A' because Y is 1 for all EF; we cannot omit E because Y is not 1 for all A'F; and we cannot omit F because Y is not 1 for all A'E. In the third product, we can omit A because Y is 1 for all B'EF; we can omit B' because Y is 1 for all AEF; we cannot omit E because Y is not 1 for all AB'F; and we cannot omit F because Y is not 1 for all AB'E. So, after omitting the redundant variables, we get Y = AB + EF + EF. After checking the products for redundancy, the final solution becomes Y = AB + EF.

CONCLUSION

In our experiments we have used a complete truth table as input. However, for n Boolean variables the truth table contains 2^n instances; if n is very large, handling such an enormous amount of data becomes cumbersome and computationally expensive. In that situation, random sampling from the complete truth table may be an option. In our future research we will verify whether this kind of incomplete data still yields the target expression. We have applied our technique to Boolean logic; it will be interesting to see whether it also works for multi-valued logics.

REFERENCES
1. Fayyad, U. M., "On the Induction of Decision Trees for Multiple Concept Learning," Ph.D. dissertation, EECS Department, University of Michigan, 1991.
2. Fayyad, U. M., and Irani, K. B., "On the Handling of Continuous-Valued Attributes in Decision Tree Generation," Machine Learning, 8, 87-102, 1992.
3. Fayyad, U. M., and Irani, K. B., "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning," Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp. 1022-1027, Morgan Kaufmann, 1993.
4. Lopez de Mantaras, R., "A Distance-Based Attribute Selection Measure for Decision Tree Induction," Machine Learning, 6(1), 81-92, 1991.
5. Malerba, D., Esposito, F., and Semeraro, G., "A Further Comparison of Simplification Methods for Decision Tree Induction," in D. Fisher and H. Lenz (Eds.), Learning from Data: AI and Statistics, Springer-Verlag, 1995.
6. Mano, M. Morris, "Digital Logic and Computer Design," 9th Edition, 1991.
7. Mehta, M., Rissanen, J., and Agrawal, R., "MDL-Based Decision Tree Pruning," Proceedings of the First International Conference on Knowledge Discovery and Data Mining, pp. 216-221, Menlo Park, CA: AAAI Press, 1995.
8. Mingers, J., "An Empirical Comparison of Selection Measures for Decision-Tree Induction," Machine Learning, 3(4), 319-342, 1989.
9. Quinlan, J. R., "Induction of Decision Trees," Machine Learning, 1(1), 81-106, 1986.
10. Tocci, Ronald J., and Widmer, Neal S., "Digital Systems: Principles and Applications," 2002.
11. Russell, Stuart, and Norvig, Peter, "Artificial Intelligence: A Modern Approach," 2nd Edition, Pearson Education Series in Artificial Intelligence, 2004.

