
PM World Today June 2010 (Vol XII, Issue VI)

PM WORLD TODAY FEATURED PAPER JUNE 2010

Project Data Mining and Project Estimation Top-down Methodology with TRANSCALE Tool
By Pavel Barseghyan, PhD

Abstract
The purpose of project data mining is to generalize project management experience so that projects can be planned and executed successfully. Because project data are quantitative, the known methods of the theory of experiments can in principle be applied to process them. However, the traditional methods of that theory have limited applicability to project data, since these data are not the results of planned experiments. In fact, they are the result of a random collection of information about projects. From the standpoint of the theory of experiments, this means that the projects, viewed as experiments, were carried out under a variety of conditions. Consequently, it is impossible and inappropriate to compare the results of such experiments with each other. Under these conditions, the successful processing of project data depends mainly on how well the projects can be divided into groups using different project similarity criteria. Such a partitioning has only one goal: to convert projects, viewed as experiments performed under different conditions, into groups of experiments carried out under almost identical conditions. This enables the application of the classical theory of experiments, in particular regression analysis techniques, to project data. This article discusses a new methodology of project data mining in which data are presented as families of curves that reflect the fundamental relationships between project parameters. This makes it possible to radically improve the accuracy of estimates and predictions of project parameters. In this work the processing and analysis of project data are performed using the TRANSCALE technology and tool, which in turn are based on the new mathematical theory of projects.
Key words: Project data mining, critique of data mining statistical methods, generalized theory of projects, analytical relationships between project parameters, top-down methodology of project data mining, TRANSCALE tool for top-down data mining and knowledge extraction.

PM World Today is a free monthly eJournal - Subscriptions available at http://www.pmworldtoday.net


Introduction
The experience of project data mining over the last 40-50 years shows that applying primitive methods of statistical analysis in this area gives equally primitive and, most importantly, wrong results. For unknown reasons, it is assumed that the complex and multifaceted problem of project data mining can be solved by a method as primitive as the direct application of regression analysis to the data. Even prolonged failure in this area has been unable to shake people's faith that their intuition is sufficient to solve complex quantitative problems of project data mining. Sometimes people even claim that applying basic quantitative methods in this area is an end in itself and cannot lead to success. One can only be amazed at the insistence of leading universities and research centers on using statistical methods for this specific purpose, even though in terms of accuracy these methods have never paid off in project management over the past 40-50 years. Usually, if a scientific methodology systematically works poorly, people abandon it and try to replace it with new, more reasonable solutions. Surprisingly, this does not happen in project data mining and the estimation of project parameters. It is time to realize that solving the problems of this area requires a shift from the outdated methods of statistical analysis to more scientifically sound methodologies. To do this it is necessary, following the experience of more developed areas of knowledge, to move beyond the simple empiricism that currently dominates this area. Fortunately, many instructive examples can be cited from other areas of knowledge.
Nearly every serious quantitative science has followed this path, so there is no need to break new ground in project management. Instead we must use the experience of those areas of knowledge that are closest in spirit to the problems of project management. In this sense it is important to use the experience of classical thermodynamics, which has gone all the way from primitive empiricism to the current heights of scientific and practical achievement. Experience in other fields shows that, by overcoming the limitations of the statistical approach, one can proceed to the development of a genuine mathematical theory of projects. On this path we must first get rid of the so-called statistical curse, in which the results of data processing depend directly on the choice of specific data. In a truly scientific approach this cannot happen: the results of data processing should be stable and invariant with respect to the specific data.


An example is Ohm's law, the essence of which is independent of specific data. The same can be said of other laws of a fundamental nature. It is exactly such approaches and theories that must be developed in the field of project management. For example, if we study the functional relationship between total project effort and project duration statistically, each new dataset can lead to a new and uncertain result. Of course, over 15-20 years one can collect data on, say, 10,000 projects and be confident that 7-8 projects from the past month cannot change the statistical trend derived from those 10,000 projects. But that does not mean that these stable results are correct or that this approach to data processing is legitimate. In reality it is simply self-deception, whether conscious or unconscious. Consider again the functional relationship between project effort and duration. The mere fact that the project data were collected over a long period makes joint processing of the whole dataset meaningless, because productivity changes during that period as new methodologies and tools appear. On the other hand, if we try to analyze only the most recent projects, we inevitably face the problem that statistical methods are not applicable to small datasets. The persistent application of statistical methods to such small project datasets borders on caricature and can be justified only by business considerations. Obviously, such a statistical approach to the interpretation of small project data has nothing to do with the scientific method.

Project data mining: State of the art


Let us use a database of 56 projects to analyze contemporary methods of project data mining. The database contains the complexity of the projects W, their total effort E, their duration T, the average team size N_av and the team productivity P. A multi-parametric flat representation of these data, produced with the TRANSCALE tool [1], is shown in Fig.1. Following the sequence of the coordinate axes, let us denote this representation as [N_av, T, E, P]. There are numerous other multi-parametric plane representations of these data, and the TRANSCALE tool enables smooth transitions between them. According to contemporary methods of project data mining, these data can be processed by statistical methods [2]. This empirical analysis yields functional relationships between the project parameters (Fig.2.1 - Fig.2.8).


Although all the obtained curves behave correctly from a qualitative point of view, that is, the logic of increases and decreases of project parameters is not violated, this is not enough to ensure the adequacy and practicality of the functional relationships obtained in this way.

Fig.1 [N_av, T, E, P] presentation of project data in the flat multi-dimensional project space

In addition, for one of the curves even qualitative adequacy is not ensured: the functional relationship between team productivity P and average team size N_av falls too fast. The other curves also contain qualitative discrepancies, but these cannot be detected with the naked eye. An overall analysis of statistical methods for processing project data shows that their accuracy is very low. This is easily seen by applying the obtained empirical relationships to the assessment of individual projects. This circumstance indicates that the statistical methodology is a dead end for quantitative project management. A more detailed analysis shows that the statistical approach to data mining and project estimation has two main drawbacks. Let us analyze these shortcomings using the statistical curves presented in Fig.2.1 - Fig.2.8.

1. These curves contain qualitative discrepancies, which simply means that the trends presented in Fig.2 do not reflect the genuine behavior of the functional relationships between project parameters.


[Fig.2.1 - Fig.2.8: panels of Fig.2]


Fig.2 Results of the statistical treatment of project data

2. Even if these curves contained no qualitative discrepancies, they still could not provide highly accurate estimates, because the result of the statistical processing of the entire system of data is a single fitting curve.

According to the most elementary considerations, based on the method of least squares, a single curve is in principle unable to provide adequate accuracy for data mining and project estimation. This problem can be solved only by representing the system of data with a family of curves rather than a single curve. Such a family of curves can be constructed on different principles. The most basic and obvious of these is to construct the approximating curves from the state equation of projects, holding different project parameters constant.

Representation of project data in the form of a family of curves


For the precise experimental investigation of phenomena, people typically proceed as follows. If the phenomenon is described by a large number of parameters, the two parameters whose functional relationship is under investigation remain free, and the other parameters are kept constant. Then the values of one free parameter are varied and the values of the other free parameter are measured. The same procedure is then repeated for other constant values of the remaining parameters. This approach permits the direct application of regression analysis to the data. But it is possible only when the parameters of the object under study can be controlled. Unfortunately, experimentation in this classical manner is simply impossible in project management, because of the huge organizational and financial difficulties involved.


Fig.3 Presentation of the functional relationship between project effort E and its complexity W in the form of the family of curves

Fig.4 Presentation of the functional relationship between team productivity P and the average team size N_av in the form of the family of curves

For a more detailed discussion of the problem we turn to the state equation of projects [3].


Fig.5 Presentation of the functional relationship between team productivity P and project effort E in the form of the family of curves

In the field of project management, where there is no experiment in the classical sense of the word and the project data are the result of a random collection, there are other ways to overcome such difficulties. In particular, the project data can be divided into groups using the condition of the relative constancy of one of the project parameters. At the systemic level, the state equation of projects combines the parameters of the project and of the development team [3]:

$$N_{av} \cdot T \cdot P = W \qquad (1)$$

and

$$E \cdot P = W \qquad (2)$$
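Since every parameter of a completed project is known in data mining, the state equation can also serve as a consistency check on a record. The following is a minimal sketch under stated assumptions: the `ProjectRecord` class, its field names, the units and the 5% tolerance are illustrative choices made here and are not part of TRANSCALE or of [3]. It uses only the fact that (1) and (2) together imply E = N_av * T.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ProjectRecord:
    """One historical project; names and units are illustrative assumptions."""
    complexity: float      # W, in arbitrary size units
    avg_team_size: float   # N_av, in persons
    duration: float        # T, in months

    @property
    def effort(self) -> float:
        # Combining (1) and (2): E * P = N_av * T * P, hence E = N_av * T.
        return self.avg_team_size * self.duration

    @property
    def productivity(self) -> float:
        # Equation (2) solved for P: P = W / E.
        return self.complexity / self.effort


def is_consistent(record, reported_effort, rel_tol=0.05):
    """Compare a separately reported effort with E = N_av * T."""
    return math.isclose(record.effort, reported_effort, rel_tol=rel_tol)


project = ProjectRecord(complexity=1200.0, avg_team_size=8.0, duration=15.0)
print(project.effort)        # 120.0 person-months
print(project.productivity)  # 10.0 size units per person-month
print(is_consistent(project, reported_effort=118.0))
```

A record whose reported effort disagrees strongly with N_av * T would be a candidate for review before it is used in any grouping or regression.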

To divide the project data into groups, we can order the data by increasing team productivity and divide this sequence of projects into groups. As a result we obtain groups of projects with relatively constant values of productivity. Fig.3 shows the functional relationship between project effort and complexity for four groups of projects with relatively constant productivity. This allows us to replace the single approximating curve of Fig.2.2 with a family of straight lines (Fig.3), which approximates the data more accurately. Similarly, Fig.4 presents the functional relationship between team productivity and average team size for relatively constant values of the ratio W/T, which is the throughput.
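The grouping procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TRANSCALE implementation: the data are synthetic (generated from E * P = W, not taken from the 56-project database), the groups are cut to equal size, and each line is fitted through the origin by least squares. All function names are hypothetical.

```python
import random


def fit_slope_through_origin(projects):
    """Least-squares slope k of the line E = k * W (no intercept)."""
    numerator = sum(p["W"] * p["E"] for p in projects)
    denominator = sum(p["W"] ** 2 for p in projects)
    return numerator / denominator


def relative_errors(projects, slope):
    """|E_fitted - E_actual| / E_actual for each project."""
    return [abs(slope * p["W"] - p["E"]) / p["E"] for p in projects]


def split_by_productivity(projects, n_groups):
    """Order projects by P and cut the ordered sequence into consecutive groups."""
    ordered = sorted(projects, key=lambda p: p["P"])
    size = -(-len(ordered) // n_groups)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


# Synthetic data, NOT the paper's database: E is generated from E * P = W.
rng = random.Random(0)
projects = []
for _ in range(56):
    W = rng.uniform(100.0, 2000.0)
    P = rng.uniform(5.0, 25.0)
    projects.append({"W": W, "P": P, "E": W / P})

# Single approximating line for the whole database.
single = relative_errors(projects, fit_slope_through_origin(projects))
single_error = sum(single) / len(single)

# Family of lines: one fitted slope per productivity group.
groups = split_by_productivity(projects, n_groups=4)
family = []
for group in groups:
    family.extend(relative_errors(group, fit_slope_through_origin(group)))
family_error = sum(family) / len(family)

print(f"mean relative error, single line:     {single_error:.3f}")
print(f"mean relative error, family of lines: {family_error:.3f}")
```

Increasing `n_groups` narrows the productivity interval of each group, which is the same lever the text describes for improving the accuracy of approximation.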


Comparing the accuracy of the family of curves shown in this figure with the accuracy of the approximation shown in Fig.2.7, it is easy to see that the accuracy of the family of curves is higher. By decreasing the interval of relative constancy of the ratio W/T, one can achieve greater accuracy of approximation. Fig.5 gives another example: the functional relationship between team productivity and project effort as a family of curves corresponding to zones of constant project complexity.
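The idealized shapes of the three families follow directly from the state equation (1)-(2). The short derivation below is a sketch added for illustration; the constants c_1, c_2 and c_3 are labels introduced here for the quantity held fixed within each group.

```latex
\begin{align*}
E \cdot P = W,\quad P = c_1
  \;&\Rightarrow\; E = \frac{W}{c_1}
  && \text{(Fig.3: straight lines through the origin)}\\
N_{av} \cdot T \cdot P = W,\quad \frac{W}{T} = c_2
  \;&\Rightarrow\; P = \frac{c_2}{N_{av}}
  && \text{(Fig.4: one curve per throughput value)}\\
E \cdot P = W,\quad W = c_3
  \;&\Rightarrow\; P = \frac{c_3}{E}
  && \text{(Fig.5: one curve per complexity value)}
\end{align*}
```

In real data the fixed quantity is only approximately constant within a group, so the points scatter around these curves; narrowing the interval of constancy reduces that scatter.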

Project data mining and project estimation have a common methodological basis
From the methodological point of view, project data mining and project estimation are closely linked, because they share a common conceptual framework. Let us therefore consider the conceptual framework and the common sources of information on which both project data mining and project estimation are based. At the system level, a project can be represented by the following three main components.
1. The model of accumulation of the work performed during the execution of the project, or simply the model of the project.
2. The objectives of the project (development cost, project duration, risk and other program-level or corporate-level goals and objectives).
3. The restrictions imposed on the project.
At the structural level, the presentation of the project with these three components is shown in Fig.6. Such a presentation can be used for different purposes. In particular, it applies both to project data mining and to the planning and execution of projects. In these applications only the inputs and outputs differ and have different meanings. In project planning, the inputs are project complexity W and team productivity P, and it is necessary to find the total effort E required for the project and the distribution of that effort over time, including the planned project duration T and the required number of people N_av.


[Fig.6 block labels: Inputs; Project model (Model of the process of work; Project goals and objectives; Project constraints and restrictions); Outputs]

Fig.6 Quantitative presentation of projects with three components

In the case of project data mining, all parameters of the individual projects are known, and it is necessary to solve the problems associated with the classification of projects and to find the functional dependencies between project parameters.

Information, needed for the reconstruction of projects


For the sake of simplicity, let us first determine which input information is needed to reconstruct the average behavior of a project. Suppose that the input information consists of:
1. Project complexity W, and
2. Team productivity P,
and that these data are sufficiently reliable. On this basis only the total project effort can be estimated. But for the planning or synthesis of a project the total amount of effort is not enough. We also need the distribution of this effort over time, which means that we need the number of working people as a function of time. If finding this function is difficult, we must know at least the average team size N_av. But finding the effort distribution over time from the complexity W and productivity P alone is impossible in principle. The solution of this problem therefore requires additional input. To clarify the essence of this additional information, consider different possible implementations of the same project (Fig.7).
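A small numerical illustration of this point, assuming the state equation E * P = W and its consequence E = N_av * T: once W and P are given, the effort is fixed, but every team size yields a different, equally admissible duration. The function names and the numbers are hypothetical, and productivity is treated as independent of team size, which is only a first approximation.

```python
def total_effort(complexity, productivity):
    """State equation E * P = W solved for E."""
    return complexity / productivity


def staffing_options(effort, team_sizes):
    """Durations T = E / N_av for several candidate team sizes (E = N_av * T)."""
    return [(n, effort / n) for n in team_sizes]


E = total_effort(complexity=1200.0, productivity=10.0)   # 120.0 person-months
for n_av, duration in staffing_options(E, team_sizes=[4, 8, 12]):
    print(f"N_av = {n_av:2d} persons -> T = {duration:5.1f} months")
```

All three rows satisfy the same W and P, so these two inputs alone cannot select one of them.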


Fig.7 Analysis of input information needed for the reconstruction of a project

This picture presents a screenshot of the TRANSCALE tool that contains two implementations of the same project. Each implementation is characterized by its four coordinates [N_av, T, E, P]. The first implementation is described by the magnitudes [W, N_av1, T1, E1, P1] and the second by the magnitudes [W, N_av2, T2, E2, P2]. As we can see, both implementations have the same complexity W, while the other parameters differ in the values of N_av, T, E and P respectively. Besides, the project may have a large number of other implementations, located along the curves shown in the figure, which include the two implementations discussed. Using the methodology developed in [1], let us clarify the main reasons for the change in the values of the parameters N_av, T and E. This analysis will allow us to identify the additional inputs which, combined with the complexity W and productivity P, will help us improve the quality and accuracy of data processing. This, in turn, will help us improve the quality and accuracy of estimates and predictions of project parameters.

Since the complexity W of the project remains constant, it cannot be the cause of the changes in the project parameters N_av, T and E. Similarly, these changes have little to do with team productivity P. In particular, the analysis of the functional relationship between team productivity P and average team size N_av indicates that productivity is a slowly falling function of the number of people. Moreover, for large values of the average team size the change in productivity is so small that, as a first approximation, team productivity can be considered constant. This means that only a small part of the change in N_av is related to productivity; the change is determined mainly by the change in the project duration T. If team productivity P is almost constant for the larger values of N_av, then the total project effort E is also constant in this case. In turn, this means that the distribution of project effort over time is associated only with the values of N_av and T and their changes, and has almost nothing to do with the values of project complexity W and team productivity P.

From this follows the main conclusion: it is fundamentally impossible to obtain the effort distribution over time from project complexity W and team productivity P alone. This means that any project estimation system designed to determine project duration and team size must have at least one more input in addition to the project complexity and team productivity. Otherwise, estimates of the project duration and average team size become an arena of arbitrary decisions (which, incidentally, is what happens now).

Project objectives and effort distribution over time


Analysis shows that to find the distribution of project effort over time we first need information about the goals and objectives of the project and their relative importance in achieving maximum benefit at the level of the whole enterprise. Finding the effort distribution over time means defining the duration of the project and, as a minimum, the average size of the development team. These values are directly linked to the objectives of the project, because within its feasibility range each project can be performed in a short time by a large number of people or over a long time by a small number of people.


On the other hand, it is also known that the development cost of a project increases as its duration is reduced. This means that the problem of effort distribution over time is closely related to the trade-off between the duration and the cost of the project. In turn, this means that the problem can be reduced to the analysis of the priorities of the project objective functions and the quantitative representation of these priorities. A detailed discussion of this problem can be found in [4], where projects are classified in terms of project objectives. As the criterion for the similarity of projects, this classification uses the ratio of project duration to average team size, R = T / N_av. From the standpoint of project data mining, the above analysis means that this criterion must be used to divide the database into groups of similar projects, after which regression techniques can be applied to the separate groups of data. In terms of project estimation, it means that a complete presentation of the essence of a project requires, along with the project complexity W and productivity P, quantitative information about the project objectives and their priorities.
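The role of R as the missing input can be made concrete. The sketch below is an illustration derived here from E * P = W, E = N_av * T and R = T / N_av; it is not a procedure quoted from [4]. Substituting T = R * N_av into E = N_av * T gives R * N_av^2 = E, so a single value of R fixes both the team size and the duration. The function names and the interval boundaries used for grouping are hypothetical.

```python
import math


def duration_and_team_size(complexity, productivity, ratio):
    """Solve E = W / P, N_av * T = E and T / N_av = R for T and N_av."""
    effort = complexity / productivity
    team_size = math.sqrt(effort / ratio)   # from R * N_av**2 = E
    duration = ratio * team_size            # T = R * N_av
    return duration, team_size


def group_by_ratio(projects, boundaries):
    """Bin (T, N_av) pairs by R = T / N_av; boundaries must be ascending."""
    groups = [[] for _ in range(len(boundaries) + 1)]
    for duration, team_size in projects:
        r = duration / team_size
        index = sum(r >= b for b in boundaries)  # number of boundaries reached
        groups[index].append((duration, team_size))
    return groups


T, N_av = duration_and_team_size(complexity=1200.0, productivity=10.0, ratio=1.875)
print(T, N_av)   # 15.0 months, 8.0 persons

history = [(15.0, 8.0), (30.0, 4.0), (10.0, 12.0)]
print(group_by_ratio(history, boundaries=[1.0, 5.0]))
```

The same ratio thus serves both purposes named in the text: as a grouping key for data mining and as the additional input that makes a planning estimate determinate.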

Missing input information in the modern systems of project data mining and project estimation
The main result of the above analysis is that modern systems of project data mining and project estimation lack information on the objectives of projects and their priorities. This makes it impossible to obtain accurate functional dependencies between project parameters by statistical data processing. Using these inaccurate functional relationships to assess and predict new projects results in large errors in the estimated parameters of those projects, and ultimately in their failure. The need to integrate the goals of projects and their priorities into the data mining process follows from the fact that each specific value of a project parameter reflects the entire design process, including the direct impact of goals and priorities on that parameter. Accordingly, the processing of data must take the same considerations into account. In particular, it must take into account the considerable impact that project objectives and their priorities have on project duration. The same applies to the estimation of projects during planning. Input information on project complexity and team productivity alone is not enough for a comprehensive assessment of a project in a planning system. To estimate the duration of a project and the average team size, input information on the project objectives and their priorities is needed.


Conclusions
1. The accuracy of the statistical methods of contemporary quantitative project management is unacceptably low, and therefore these methods are completely unsuitable for meeting the daily needs of industry.
2. The statistical methodology of project data mining and analysis has a number of fundamental weaknesses. It is unsuitable for small datasets, since in this case it can generate highly random results that depend strongly on the specific data. It is also unsuitable for large databases, because the collected data are difficult to process owing to their incompatibility with each other.
3. The Achilles heel of statistical methods of project data mining is the strong instability of their results and the dependence of those results on specific data.
4. Even if the statistical treatment of a large project database yields stable results, they may still be unsuitable for practical applications, since a stable result does not mean a correct result.
5. Very often the stable results of statistical project data mining do not reflect reality adequately; moreover, they may simply be wrong.
6. One of the main reasons for the inaccuracy of statistical methods of project data mining is the replacement of the entire system of data by a single approximating curve.
7. To increase the accuracy of statistical methods of project data mining, it is necessary to cover the system of data points not with a single curve but with families of curves.
8. For that purpose the system of data points must be divided into groups by applying advanced methods of project similarity analysis.
9. These methods of project similarity analysis should be based on top-down analysis of the project objectives and their priorities.
10. The main shortcoming of modern methods of project data mining and project estimation is that they do not take into account the project objectives and their priorities.
11. Methods of project data mining and project estimation should share the same methodological framework, based on taking into account the objectives of projects and their priorities.

References
1. Pavel Barseghyan (2010). Project Nonlinear Scaling and Transformation Methodology and TRANSCALE Tool. PM World Today May 2010 (Vol XII, Issue V). 16 pages.
2. S. Oligny, P. Bourque, A. Abrain, B. Fournier. Exploring the Relation Between Effort and Duration in Software Engineering Projects. http://www.lrgl.uqam.ca/publications/pdf/536.pdf
3. Pavel Barseghyan (2009). Principles of Top-Down Quantitative Analysis of Projects. Part 1: State Equation of Projects and Project Change Analysis. PM World Today May 2009 (Vol XI, Issue V) http://www.pmworldtoday.net/featured_papers/2009/may/Principlesof Top-Down-Quantitative-Analysis-of-Projects.html
4. Pavel Barseghyan (2009). Problems of the Mathematical Theory of Human Work (Principles of mathematical modeling in project management). PM World Today August 2009 (Vol XI, Issue VIII).


About the Author

Pavel Barseghyan, PhD


Author

Dr. Pavel Barseghyan is a consultant in the field of quantitative project management, project data mining and organizational science. He is the founder of Systemic PM, LLC, a project management company. He has over 40 years of experience in academia, the electronics industry, the EDA industry, and project management research and tools development. From 1999 to 2010 he was the Vice President of Research for Numetrics Management Systems. Prior to joining Numetrics, Dr. Barseghyan worked as an R&D manager at Infinite Technology Corp. in Texas. He was also a founder and the president of an EDA start-up company, DAN Technologies, Ltd., that focused on high-level chip design planning and RTL structural floor planning technologies. Before joining ITC, Dr. Barseghyan was head of the Electronic Design and CAD department at the State Engineering University of Armenia, focusing on development of the Theory of Massively Interconnected Systems and its applications to electronic design. From 1975 to 1990 he was also a member of the University Educational Policy Commission for Electronic Design and CAD Direction in the Higher Education Ministry of the former USSR. Earlier in his career he was a senior researcher at the Yerevan Research and Development Institute of Mathematical Machines (Armenia). He is the author of nine monographs and textbooks and more than 100 scientific articles in the areas of quantitative project management, mathematical theory of human work, electronic design and EDA methodologies, and tools development. More than 10 Ph.D. degrees have been awarded under his supervision. Dr. Barseghyan holds an MS in Electrical Engineering (1967), a Ph.D. (1972) and a Doctor of Technical Sciences (1990) in Computer Engineering from Yerevan Polytechnic Institute (Armenia). Pavel can be contacted at pavel@systemicpm.com.
