Master Project

January 14, 2014

Information Fragments

André Meyer, Philipp Nützi, Florian Stucki


supervised by

Prof. Dr. Thomas Fritz

software evolution & architecture lab

Master Project
Authors: André Meyer, Philipp Nützi, Florian Stucki
Project period: 07.07.13 - 07.01.14

Software Evolution & Architecture Lab, Department of Informatics, University of Zurich

Acknowledgements
We would like to thank Prof. Thomas Fritz for his valuable feedback and supervision of our project. Additional thanks go to JetBrains and Microsoft for providing us with free licenses of the ReSharper tool and Windows Azure Online Services.

Abstract
In a typical workday, developers have to answer several questions, such as "Who is working on what?" or "Which is the most popular class?". Today's tool support is limited: the tools we found were tedious and time-consuming to use, required the user to learn a new query language, or had very high license costs. In addition, a developer has to manage enormous amounts of information. A solution that makes answering everyday questions more efficient is needed to help developers keep track of the growing complexity of this information. Fritz and Murphy [FM10] developed a concept, information fragments, which compares and merges data sets from different repositories using an id and text matching algorithm on the connections between these repositories. The resulting nodes and edges are aggregated into a graph, the composed fragment, and presented to the user. We base our work on this approach and developed an extensible web application prototype that lets the user easily manipulate and filter the composed data through an easy-to-understand abstraction of the model. Additionally, the data is represented using five different visualizations, each meaningful in different situations. We evaluated the usefulness of the approach and its implementation using four usage scenarios. Finally, we present and discuss interesting directions for future work.

Contents
1 Introduction
2 Approach
2.1 Original Information Fragments Concept
2.2 Objectives
2.3 Scenario
2.4 Model
2.4.1 Extensibility
2.4.2 Abstraction
2.4.3 Visualizations
3 Implementation
3.1 Project Setup
3.2 Architecture
3.3 Prototype
3.3.1 Add Base Fragments
3.3.2 Model Builder
3.3.3 Smart Filter
3.3.4 Node Details
3.3.5 Visualizations
4 Related Work
5 Discussion
5.1 Evaluation
5.2 Key Assets
5.3 Improvements in the Prototype
5.4 Future Work of the Approach
6 Conclusion
6.1 Summary
6.2 Lessons Learned
A Extensibility
B Used Libraries, Frameworks, Tools
B.1 Web Frameworks
B.2 Programming Languages
C Contents of the CD-ROM

List of Figures
2.1 Snippet of the tool. It shows the visualization of a tree view.
2.2 The hyper-tree shows a tree of base fragments, including persons, source code, change sets and work items.
2.3 Tree map visualizing the composition between persons, change sets and source code for a quick comparison.
2.4 Bar chart summarizing the composition between persons, change sets and source code.
3.1 High-level architecture of the prototype. Points marked with an asterisk (*) are hints for a possible future implementation. Test projects are not visualized.
3.2 Information Fragments prototype overview.
3.3 Snapshot of the Add Base Fragments module in the GUI.
3.4 Snapshot of the model builder with three added base fragments.
3.5 Snapshot of the smart filter dialog currently showing filter group persons.
5.1 Model Builder and Tree Map of the question: Who is working on what?
5.2 Model Builder and Bar Chart of the question: What is the most popular class?
5.3 Model Builder and Hyper Tree of the question: Given a feature finding out who the most relevant engineers are in order to contact them?

Chapter 1

Introduction
While developing a software system, the different stakeholders of the system, such as developers, testers or managers, contribute a lot of information that forms the system. As each stakeholder works on the system, they continuously consult various fragments of information to answer their questions about the system and the process of producing it. Some of these questions are easy to answer, have a well-defined meaning, and require only one kind of information from a single repository [FM10]. An example is the question "Who did the last commit on this file?": it is obvious to the stakeholder to look at the last entry in the version control tool. Other questions are harder to answer and require the stakeholder to integrate multiple kinds of information from various repositories [FM10]. The answers to these harder questions may be interpreted in many different ways. An example is "What have my coworkers been doing?" Answering this question requires the stakeholder to piece together various fragments from multiple sources of information, which creates a cognitive burden. Having to answer many different questions is a typical scenario, as previously described in [FM10, KAP+07, BZ10]. Fritz and Murphy found in [FM10] that existing approaches provide little support for answering these questions. This issue is also discussed in our related work in Chapter 4. They interviewed eleven professional software developers and identified 78 questions that developers frequently need to answer and that require multiple kinds of information. To support developers and other stakeholders in answering their questions, they discuss different approaches. For example, a query language is powerful, but requires the user to learn the language and to express explicitly how the information should be integrated.
For this reason, they introduce a new model that automates the integration of the different kinds of information using the structure of the information. The advantage of this so-called composition of information is that the user only has to specify which information is integrated, rather than how to integrate it. Because the model separates the composition from the visualization of the information, it allows the user to express a wide variety of questions and tailor the composed information to his personal question. They showed the value of their approach and its tool implementation in an evaluation. We base our work on this successful model and build a tool to answer the hard questions, but also the simple ones. We focus on making the approach easily extensible, providing the user with a good abstraction that hides the complexity, and offering meaningful visualizations of the aggregated information, as described in Chapter 2. The approach is implemented as a web application to offer easy access from everywhere with minimal set-up time (see Chapter 3). In Chapter 5, we evaluate our approach by presenting different use cases, explain the key assets of this extended approach compared to the original one, and outline possible future work.

Chapter 2

Approach
The last chapter described the need for an easily deployable approach that supports a developer in answering his questions with data spread over multiple repositories. In this chapter, we briefly explain the information fragments concept (as introduced in [FM10]), define the high-level objectives of our approach and present an example usage scenario. Subsequently, we present the three main pillars of our approach.

2.1 Original Information Fragments Concept

In their paper [FM10], Fritz and Murphy define an information fragment as a subset of development information for the system of interest. This subset of development information is modeled as a graph. Each graph consists of a set of labeled nodes, a set of directed labeled edges and a mapping function, which maps each node to a set of information fragments. They call the combination of different information fragments the composed information fragment. Each node in the composed information fragment is assigned to a so-called base fragment. Fritz and Murphy describe a base fragment as an information fragment of only one domain, such as source code, work items or test cases. Each node is uniquely identifiable through an id and has properties that hold the detailed information of that node. The edges represent the relationships between these nodes. Fritz and Murphy use the id matching composition operator to build these edges: an edge is only created when the identifier property of a node exactly matches a property value of another node. The resulting composed information fragment is used to present the information in their Eclipse plug-in.
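The id matching operator described above can be illustrated with a small Python sketch. It is neither the code of [FM10] nor that of our prototype; the names `Node`, `Edge` and `id_match`, as well as the example property `CommittedBy`, are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A node of a base fragment (hypothetical structure for illustration)."""
    node_id: str
    fragment: str           # name of the base fragment, e.g. "persons"
    properties: tuple = ()  # (name, value) pairs holding the detail information


@dataclass(frozen=True)
class Edge:
    source: str  # id of the node whose property value matched
    target: str  # id of the node whose identifier was matched
    label: str   # name of the matching property


def id_match(left, right):
    """Create an edge whenever a property value of a node in `left`
    exactly equals the identifier of a node in `right`."""
    known_ids = {node.node_id for node in right}
    edges = []
    for node in left:
        for name, value in node.properties:
            if value in known_ids:
                edges.append(Edge(node.node_id, value, name))
    return edges


persons = [Node("alice", "persons"), Node("bob", "persons")]
change_sets = [
    Node("cs1", "change sets", (("CommittedBy", "alice"),)),
    Node("cs2", "change sets", (("CommittedBy", "carol"),)),
]
edges = id_match(change_sets, persons)
```

In this example only `cs1` is linked, because `carol` is not the identifier of any node in the persons fragment.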

2.2 Objectives

The goal of our approach is to develop a model and a flexible representation for integrating and visualizing information of software development projects. First, the implementation of the approach has to be very intuitive and easy to use. It should require minimal or no effort to explain the tool to the user. The tool should be very easily deployable and require almost no set-up time. In the best case, this would allow for immediate usage as soon as the user logs into the tool for the first time and specifies some basic configurations. To answer frequently occurring questions, such as "What has my team been working on?", the approach should support the integration of information from many different repositories. A broad range of initial repositories shall be offered to support immediate usage. Additionally, the approach should offer an interface for easy extensibility by adding further repositories. Since there are nuances in the way different stakeholders want to answer their questions, the approach shall allow for a flexible adaptation of the presentation of the underlying fragments. To achieve this, the stakeholder has to be able to run queries on top of the aggregated data. The process of querying the data shall be abstracted so that it is very easy to use but remains powerful. During this process, new questions on the same data might arise, or the question might be narrowed down further in a stepwise refinement. The approach shall also support this stepwise refinement or adaptation of the visualization as well as of the underlying information fragments. The responses to these queries shall finally be visualized in a meaningful way. At all times, the user shall be able to modify his queries, change the used repositories, select the visualizations that make the most sense and get additional information on the different data sets.

2.3 Scenario

Imagine a developer returning from a three-week holiday. After these restorative holidays, the developer needs to get updated on the project and work status. To achieve this, he wants an answer to the question: "What have my co-workers been working on?" One way to solve this problem is to ask other team members what they have done and gather the appropriate information. However, there is a problem: nobody is able to tell him exactly what they were working on during the last three weeks. Therefore, he gets incomplete and insufficient information. A second way to get updated would be to read the logs and histories of the different repositories. Scanning through lists of change sets and work items will eventually bring him up to date. However, this process is very time-consuming and a rather exhausting and demotivating way to get the needed information. The developer could overcome these problems by using the software tool implemented during this project. He navigates to the website in his web browser and lands on a page where he can select different fragments, configure them and use the composed fragment to analyze and visualize them to answer his question. To narrow the question down, the developer now wants to find out who touched the code he is responsible for. First, he adds the fragment persons to the composed fragment. He can now see all team members. In a second step, he adds the work items of the project. As a result, the developer can see which persons are related to which work item and what the relation is. As a third fragment, he adds the change sets. Here, he sets the filter to the last three weeks. In a fourth step, he adds all the source code files to the composed fragment. The visualization now shows all the source code files which were changed in the last three weeks and by whom. Finally, he again adds the persons fragment and sets the filter to himself to see only the files he owns.
By choosing the tree view visualization, the developer retrieves the result displayed in Figure 2.1. He gets a decent overview of who was working on the source code files he owns. Using the detailed change set and work item information, he can gather what they were working on. As a result, the developer quickly becomes aware of the past activities and can return to productive work in a short time.

2.4 Model

This subchapter presents our approach, which is an extension of the model developed and described by Fritz and Murphy in [FM10]. The three most important additions to the original approach are highlighted in the following subchapters, using examples where they aid understanding.

Figure 2.1: Snippet of the tool. It shows the visualization of a tree view.

2.4.1 Extensibility

As mentioned in the objectives (see Section 2.2), it is crucial for the success of the approach that it is extensible. It is impossible to develop a tool that connects to every possible repository, fetches its data, prepares its representation and provides full functionality, without specifying what the connection looks like. With a predefined interface for the connection, the tool administrator could easily add new repositories to the tool without needing to know all the implementation details of the tool itself. The administrator could just specify some settings, the default behavior of the repository and the data to fetch from a web service. Theoretically, every web service that returns responses in a standardized data format, such as XML or JSON, could then be added to the tool. Appendix A describes in detail how the tool could be extended with a new repository. The administrator might even be able to specify which user gets which repositories to run his queries and analyses on. The advantage of this extensibility is that almost any data set can be compared and visualized with any other data set, as long as there is a connection between the data sets.
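To make the idea of a predefined connection interface more concrete, the following Python sketch shows what such an interface could look like. It is only an illustration under assumptions: the class `RepositoryAdapter`, its attributes and the JSON payload are hypothetical, and the actual extension mechanism of the prototype is the one described in Appendix A.

```python
import json
from abc import ABC, abstractmethod


class RepositoryAdapter(ABC):
    """Hypothetical connection interface, not the prototype's real one."""

    name: str      # display name of the resulting base fragment
    id_field: str  # field of a record that uniquely identifies it

    @abstractmethod
    def fetch(self) -> str:
        """Return the raw response of the web service (JSON in this sketch)."""

    def load_nodes(self):
        """Turn every fetched record into a node with an id and properties."""
        records = json.loads(self.fetch())
        return [
            {
                "id": str(record[self.id_field]),
                "fragment": self.name,
                "properties": {
                    key: value
                    for key, value in record.items()
                    if key != self.id_field
                },
            }
            for record in records
        ]


class StaticWorkItemAdapter(RepositoryAdapter):
    """Stands in for a real web request so that the sketch runs offline."""

    name = "work items"
    id_field = "Id"

    def fetch(self) -> str:
        return '[{"Id": 17, "Title": "Fix login", "AssignedTo": "alice"}]'


nodes = StaticWorkItemAdapter().load_nodes()
```

With such an interface, adding a repository would only require a new subclass that states the fragment name, the identifying field and how to fetch the data.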

2.4.2 Abstraction

Adding new fragments to the composed fragment raises the complexity. Each fragment has relations to other fragments. For example, a person can change and create work items, can own source code and initiate change sets. Work items are normally related to different change sets. Change sets affect various source code files. Consider a simple question: assuming I add persons, then change sets and after that source code files, which relations will be drawn and which relations exist? This question shows how hard it is to imagine and understand the relations between the different fragments. An implementation therefore needs a component in which the user can understand how the visualization is related to the underlying composed fragment. This abstraction makes it possible to hide the complexity and simplify the understanding. However, the abstraction is not only for representation purposes; it also supports the user in changing the model. In contrast to other tools, which are described in the related work (see Chapter 4), the user does not have to learn a new query language or formulate complex queries to find the desired results. An implementation of our approach could offer a lightweight, intuitive and fast way to change and query the composed fragments. It should be easy to use, and the user should be able to see immediately what he changed.

This is why an implementation of the abstraction is one of the key assets of this approach. It allows a user to adjust and configure the model quickly and intuitively without understanding the underlying processes or learning a new query language. Moreover, it helps the user to understand what is shown in the visualizations.

2.4.3 Visualizations

While the model builder abstracts the data to give the user an overview, the visualizations present the data to the user in a valuable way. There are many possibilities to visualize data, such as a table, a graph or a pie chart. In addition, each developer has his own habits of analyzing and working with data, and there are situations where one visualization makes more sense than another. To account for this, an implementation of the approach should support multiple visualizations, each with its own benefits for different questions. Some of these possible visualizations are described in the following paragraphs.

Figure 2.2: The hyper-tree shows a tree of base fragments, including persons, source code, change sets and work items.

The tree view is a very simple but powerful visualization, as it shows the calculated tree in a simple directory structure in which the user can expand and collapse a node's children. On each node, the tree view reveals detailed information about its content and its relationship to its parent node. An example of a tree view can be found in Figure 2.1.

The hyper-tree is a graphical visualization of the composed data that arranges the different nodes in circles around a specified root node. This visualization is not usable in every situation, but can be valuable when the user wants to compare the children of neighboring nodes. As an example, visualized in Figure 2.2, a user wants to know all work items associated with change sets that changed code in files he owns.

Similar to the hyper-tree, the graph represents the filtered composed fragment in the current state of the underlying model. This visualization can be beneficial when the user wants to see the whole underlying model, specifically the connections between all the items. The graph arranges them clearly and can provide the developer with a useful overview. Naturally, this only makes sense if the data set is not too big.

The tree map visualizes data similar to a class blueprint or city map. This visualization is very helpful to get a quick overview of the number of children of sibling nodes. For each node, the tree map draws a rectangle and splits the space equally among the nodes. Figure 2.3 shows what a tree map could look like for a concrete example, in which the user wants to quickly identify the person who has changed the most files in the last two weeks.

Finally, the bar chart visualization can be used to compare related data. The bar chart only makes sense with at least two different fragments to compare, one for the x-axis and one for the y-axis. Our approach includes a third dimension, the z-axis, where additional information about one of the fragments is visualized inside each bar. This visualization has a similar intended usage as the tree map. As an example, a user might want to know the change sets per person which have affected the most files. This situation is shown in Figure 2.4.
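The equal split that the tree map performs for the children of a node can be sketched in a few lines of Python. The function name `split_equally` and the choice to slice along the horizontal axis are assumptions of this sketch; the layout used by an actual implementation may differ.

```python
def split_equally(x, y, width, height, children):
    """Divide a rectangle into equally wide vertical slices, one per child.

    Returns a mapping child -> (x, y, width, height).
    """
    if not children:
        return {}
    slice_width = width / len(children)
    return {
        child: (x + index * slice_width, y, slice_width, height)
        for index, child in enumerate(children)
    }


# Three persons share a 300 x 100 rectangle.
layout = split_equally(0, 0, 300, 100, ["alice", "bob", "carol"])
```

Applying the same function recursively to each slice would lay out the children of every person, so that a person with many change sets shows many small rectangles.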

Figure 2.3: Tree map visualizing the composition between persons, change sets and source code for a quick comparison.

Figure 2.4: Bar chart summarizing the composition between persons, change sets and source code.

Chapter 3

Implementation
The previous chapter described how we defined our extended information fragments approach. This chapter presents one possible implementation in the form of a web-based tool. The set-up of the development, the tool's high-level architecture and the prototype itself are described in the following subchapters.

3.1 Project Setup

At the start of the project, we decided to develop the prototype in an agile manner, because we wanted to have a runnable prototype to present to the customer at any time. Accordingly, the project was set up following the Scrum framework. We defined the default sprint length as four weeks, because the workload was only three days a week. We started each workday with a daily Scrum meeting, in which each developer reflected on his past workday and selected his next items from the Scrum backlog. At the end of each iteration, we held a sprint review meeting, in which we presented our prototype to the customer. At that time, the customer could give direct feedback and we could discuss the priorities for the implementation. With this approach, it was possible to develop very closely to the customer's needs. As our team worked concurrently on the implementation of the prototype, and two developers could end up working on the same module, the prototype was developed using a test-driven approach. This implies that each developer first writes tests for a class or a method he wants to create. Moreover, we decided that the written tests must always succeed before a build can be committed. To support agile software development, continuous integration and test automation, an online instance of the Microsoft Team Foundation Server (TFS) was used. One of the benefits is that the server automatically ran all our tests before committing a build into the version control repository. This reduced the risk of having a non-runnable state of the prototype on the TFS. To test the deployment process and to run the prototype in a server environment, the solution was deployed to a Microsoft Azure server, which is also hosted by Microsoft.

3.2 Architecture

To simplify the understanding of the implementation, a high-level architecture is presented below. The ASP.NET MVC framework requires the implementation to use the Model View Controller (MVC) pattern. Figure 3.1 presents an abstract overview of the architecture, which is separated into these three components:

Model: As in the well-known MVC (or MVVM, or MVP) pattern, the model represents the data. A couple of classes are worth mentioning at this point. The IRepository interface and its various implementations specify the behavior of the different repositories. There are interfaces and implementations for nodes and edges. ComposedFragment is the place where all the nodes and edges are stored. ParameterTypes are important for comparing the values of different parameters of a node when running the join algorithm. DataTransferObjects store the data needed in the different visualizations or partial views.

Controller: The controller handles the communication between the model and the view. The ComponentsController is the application's heart. It handles all AJAX calls from the view and returns the appropriate responses. The JoinOperator and its matchers run the previously described join algorithm and save the results in the ComposedFragments object. The extensibility is defined via the adapters. The IAdapter interface defines the information needed to specify an additional repository. This could, for example, be a request to TFS, Active Directory (AD) or any other web request that returns its data in the XML or JSON format. The extension mechanism is described in more detail in Appendix A. Finally, the controller also handles errors and messages between the different components and performs appropriate actions.

View: The tool is built as a single page. Different modules, such as the model builder, the logging section or the filtering section, are implemented as partial views, which are only updated if needed. Each visualization is implemented in its own partial view to simplify adding new visualizations. Finally, FilterSettings is a pop-up that is used to run queries (i.e. filters) on the data.

Figure 3.1: High-level architecture of the prototype. Points marked with an asterisk (*) are hints for a possible future implementation. Test projects are not visualized.

Figure 3.2: Information Fragments prototype overview.

3.3 Prototype

The concrete implementation of the prototype is presented by explaining the different modules in the user interface and by clarifying some technical aspects where necessary. As previously mentioned, the prototype is developed as a web application. This makes the application platform independent and keeps the set-up and configuration time to a minimum. The prototype is implemented on one single page with multiple modules (see Figure 3.2). The modules adapt their size to the size of their contents and the browser window. The left column lets the user create a new base fragment (see Subchapter 3.3.1), configure the connections between the base fragments in the model (see Subchapter 3.3.2) and query them (see Subchapter 3.3.3). The right column visualizes the aggregated data in a tabbed view (see Subchapter 3.3.5) and displays additional information once a user clicks on a node (see Subchapter 3.3.4).

3.3.1 Add Base Fragments

This module provides basic functionality to add base fragments, where each base fragment corresponds to one of the integrated repositories. In the current implementation, all repositories fetch their data from the TFS. These are persons, work items, builds, change sets, source code files and test cases. As described in the approach (see Subchapter 2.4.1) and in more detail in Appendix A, the prototype can easily be extended with new repositories. When a user clicks on one of these buttons, a pop-up is opened to filter the repository. The lazy loading algorithm automatically fetches the data in the background to reduce the waiting time. Once the data is loaded, the result is immediately reflected in the currently selected visualization. Each time the user adds a base fragment, the join algorithm compares the new base fragment to all previously added ones to find corresponding connections between the base fragments' properties. These sets of nodes and edges build the composed fragment. This composed fragment is abstracted in the model builder, described in the next section.
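The join step described above, in which a newly added base fragment is compared to all fragments added before, could look roughly like the following Python sketch. It is not the prototype's code: the class below only borrows the name ComposedFragment for readability, its methods are hypothetical, and it only uses exact matching of a property value against a node id.

```python
class ComposedFragment:
    """Hypothetical container; the prototype's class of the same name may differ."""

    def __init__(self):
        self.fragments = {}  # fragment name -> list of node dicts
        self.edges = []      # (source id, target id, label) triples

    def add_base_fragment(self, name, nodes):
        """Join the new fragment against every fragment added so far."""
        for other_nodes in self.fragments.values():
            self.edges.extend(self._join(nodes, other_nodes))
            self.edges.extend(self._join(other_nodes, nodes))
        self.fragments[name] = nodes

    @staticmethod
    def _join(sources, targets):
        """Link a source to a target when a property value equals the target's id."""
        target_ids = {target["id"] for target in targets}
        return [
            (source["id"], value, prop)
            for source in sources
            for prop, value in source["properties"].items()
            if value in target_ids
        ]


composed = ComposedFragment()
composed.add_base_fragment("persons", [{"id": "alice", "properties": {}}])
composed.add_base_fragment(
    "change sets", [{"id": "cs1", "properties": {"CommittedBy": "alice"}}]
)
composed.add_base_fragment(
    "source code", [{"id": "Main.cs", "properties": {"LastChange": "cs1"}}]
)
```

Because every new fragment is joined in both directions, the order in which the user adds fragments does not change which edges are found, only the order in which they are stored.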

Figure 3.3: Snapshot of the Add Base Fragments module in the GUI.

3.3.2 Model Builder

As already mentioned in Section 2.4.2, the model builder provides an abstraction of the current state of the underlying data model, namely the composed fragment. The user sees the base fragments added so far and their associated edges. In Figure 3.4, three base fragments are added: persons, work items and builds. Their related edges are shown in different colors. Both the RequestedBy and the RequestedFor edges are grayed out. This happens when the user clicks on them. The click hides or shows the edges in the visualizations by setting a visibility flag on the respective edge objects. In addition, the model builder allows the user to change the order of the added base fragments by simply dragging and dropping a base fragment vertically. Moreover, a user can delete a base fragment from the model builder and the underlying composed fragment by right-clicking on it and selecting the delete option. In summary, a user can perform the following actions: hiding and showing edges, as well as reordering, removing and setting filters on the base fragments. When the user makes changes in the model builder, they are immediately shown in the visualizations. This module is extremely important because it strongly supports the developer's understanding of the model.
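The visibility flag on the edge objects can be illustrated with a short Python sketch. The class and function names are hypothetical, and the label `AssignedTo` is an assumed example next to the RequestedBy and RequestedFor edges mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    source: str
    target: str
    label: str
    visible: bool = True  # the flag that a click in the model builder flips


def toggle_label(edges, label):
    """Flip the visibility of all edges of one type."""
    for edge in edges:
        if edge.label == label:
            edge.visible = not edge.visible


def visible_edges(edges):
    """Edges that the visualizations should draw."""
    return [edge for edge in edges if edge.visible]


edges = [
    Edge("build1", "alice", "RequestedBy"),
    Edge("build1", "bob", "RequestedFor"),
    Edge("wi7", "alice", "AssignedTo"),
]
toggle_label(edges, "RequestedBy")
toggle_label(edges, "RequestedFor")
shown = visible_edges(edges)
```

Keeping hidden edges in the composed fragment and only filtering them when drawing means that a second click can restore them without running the join again.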

3.3.3 Smart Filter

The smart filter extends the model builder by allowing the user to filter the information in the different visualizations. This is achieved through logical queries. Using dropdowns and auto-completion in the textbox, the user can intuitively add as many filters as he wants to each base fragment. The effect of the filter is immediately displayed as plain text. In the example in Figure 3.5, the visualizations should only show the active persons whose name is not Florian. Additionally, there is a checkbox to activate or deactivate the strict filter. If the strict filter is activated, the visualizations only show nodes which have at least one child node or are leaves. Otherwise, the visualizations show all nodes which are not affected by a filter, even if they have no child nodes.
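The two filtering mechanisms can be sketched in Python as follows. This is a minimal illustration, not the prototype's code. It assumes that filters on one base fragment are combined with a logical AND and that "leaves" are the nodes of the last added base fragment; both function names are hypothetical.

```python
def apply_filters(nodes, predicates):
    """Keep the nodes that satisfy every filter (assumed logical AND)."""
    return [node for node in nodes if all(p(node) for p in predicates)]


def strict_prune(node, leaf_level, level=0):
    """Return a copy of the tree without inner nodes that have no children left.

    Nodes on `leaf_level` (the last base fragment) are always kept.
    """
    if level == leaf_level:
        return {"name": node["name"], "children": []}
    kept = []
    for child in node["children"]:
        pruned = strict_prune(child, leaf_level, level + 1)
        if pruned is not None:
            kept.append(pruned)
    if not kept:
        return None
    return {"name": node["name"], "children": kept}


# The filter from Figure 3.5: active persons whose name is not Florian.
persons = [
    {"name": "Alice", "active": True},
    {"name": "Florian", "active": True},
    {"name": "Bob", "active": False},
]
selected = apply_filters(
    persons, [lambda p: p["active"], lambda p: p["name"] != "Florian"]
)

# Anonymous root -> persons -> change sets; Carol has no change set.
tree = {
    "name": None,
    "children": [
        {"name": "Alice", "children": [{"name": "cs1", "children": []}]},
        {"name": "Carol", "children": []},
    ],
}
strict = strict_prune(tree, leaf_level=2)
```

With the strict filter, Carol disappears from the tree because none of her children survive; without it, she would still be shown as a node without children.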

Figure 3.4: Snapshot of the model builder with three added base fragments.

Figure 3.5: Snapshot of the smart filter dialog currently showing filter group persons.

3.3.4 Node Details

The node details view shows the user additional information about a node he selected in one of the visualizations. In some cases, it also offers a direct link to the item in the repository, which the user can open in the browser.

3.3.5 Visualizations

The visualization module offers an implementation of the five visualizations previously discussed in Section 2.4.3. A tab view shows only one visualization at a time and lets the user choose the most appropriate one for each situation. On small screens or for big visualizations, the user can switch the visualization to full screen. This section explains some implementation-specific details that were not discussed in the approach, and how some of the visualizations can be manipulated.

The tree view offers the most information directly inside the visualization itself. It lets the user easily navigate down the tree by expanding or collapsing the children of particular nodes.

The hyper-tree is a tree that orders its nodes in a circular fashion around their parent node. A double click on a node centers it and arranges its children around it with an animation. To go back, the user can click on any other node or reset the hyper-tree with the corresponding button.

The graph visualizes the nodes and edges as they are stored in the composed fragment. A positioning algorithm in the background tries to draw them in the best way by using all the available space. The user can also arrange the nodes manually to get the best possible view of the aggregated data.

The tree map can be manipulated by double-clicking on a node, which sets it as the root and displays only its children. A right click goes one level up to its parent.

Finally, the bar chart compares the connections between two or more base fragments. It automatically tries to find the connections that make the most sense and uses summing and counting for the height of the bars. A third axis, which we call the z-axis, displays additional information about one of the added base fragments inside each bar. An additional column describes the displayed information with a legend and lets the user select the property that is displayed on the z-axis. While the bar chart only offers additional information when the user hovers over an item with the mouse, the other visualizations also display information when the user clicks on a node.

Chapter 4

Related Work
The approach and the implementation we present in this report are based on the information fragments concept, first described in [FM10]. In their approach, Fritz and Murphy implemented a prototype as an Eclipse plug-in. This plug-in should help developers to answer the 78 questions elicited from developer interviews, which might arise during the implementation process. According to them, developers would like a composed information fragment to be visualized in different ways. To do this, they defined a projection function that converts an information fragment into a set of trees, very similar to the transformation in our approach from a composed fragment to a single tree with an anonymous root node. Their plug-in succeeded in helping the 11 developers to answer 94% of the 78 questions in a mean time of 2.3 minutes.

According to Fritz and Murphy, some systems have a mechanism for automatically linking different kinds of software development project information. Examples are Hipikat, an application that uses a fixed schema to mine information from software project repositories, and STePIN, which does the same but provides additional detailed information about the programmers themselves. Fritz and Murphy compare Hipikat with their own approach and conclude that, in contrast to Hipikat, which recommends artifacts related to a provided starting artifact, their approach concentrates on giving the developer the possibility to compose and visualize different kinds of project information. As discussed in more detail in the next chapter, our approach adopts this point and extends their approach. In addition to the "automatic" filtering through the conversion from a graph to a tree with strict filtering, our approach provides the possibility to set more detailed filter queries on each added base fragment.

In [dAM08], De Alwis and Murphy address developers' questions with another approach.
They developed Ferret to connect similar information from the four spheres static, dynamic, software evolution and Eclipse PDE. They define a sphere as a view of a source providing information about a software system. In contrast, our approach does not focus on finding similar objects in different spheres or views; it tries to connect completely different types of objects, such as work items and source code, with the join algorithm. Ferret provides the user with 36 conceptual queries to answer developers' questions. Our approach lets the user build the queries himself. With this feature, the user can search for information that fits his question more closely.

Begel, Phang and Zimmermann [BZ10] implemented the two applications Hoozizat and Deep Intellisense to address the information needs of developers that they identified in a survey. Hoozizat is a web-based search portal that helps engineers to find other people who are responsible for a particular feature, API, product or service that the engineers currently have to deal with. For a given search term, the application returns a list of related people, work items, code and files from the appropriate repositories. In comparison, our approach does not provide a feature to search for people with a keyword. However, in our approach, the user could achieve the same by specifying which repositories he wants to use, combining them and filtering them by property. Moreover, the data is visualized in different meaningful ways. We try to connect repository items of different types to answer software development project questions as they are specified in [FM10].

Deep Intellisense is implemented as a Visual Studio add-in to support information needs related to source code: when the user selects any source code element in the Visual Studio editor, a list of events that happened to this element, such as code changes or related work items, is shown [BZ10]. Our approach shows similar information in the detail view of a source code node, but additionally connects the node with items from other repositories and thereby provides the developer with further information about the source code node. The possibility to filter and visualize this connected data in different ways gives the user a further benefit.
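The conversion from a composed fragment (a graph) to a single tree with an anonymous root node, mentioned at the beginning of this chapter, can be sketched in Python as follows. The function name project_to_tree, the node encoding and the example ids are hypothetical; the sketch assumes that the user has ordered the base fragments and that edges only connect neighbouring fragments in that order.

```python
from typing import Dict, List, Tuple

NodeId = Tuple[str, str]  # (base fragment name, item id); hypothetical encoding


def project_to_tree(order: List[str],
                    nodes: List[NodeId],
                    edges: List[Tuple[NodeId, NodeId]]) -> Dict:
    """Unfold a composed fragment into one tree below an anonymous root."""
    # Undirected adjacency between the items of the composed fragment.
    adjacent: Dict[NodeId, List[NodeId]] = {n: [] for n in nodes}
    for a, b in edges:
        adjacent[a].append(b)
        adjacent[b].append(a)

    def subtree(node: NodeId, level: int) -> Dict:
        # Children are the neighbours that belong to the next base fragment.
        nxt = order[level + 1] if level + 1 < len(order) else None
        kids = [subtree(n, level + 1) for n in adjacent[node] if n[0] == nxt]
        return {"node": node, "children": kids}

    # The anonymous root groups all items of the first base fragment.
    top = [subtree(n, 0) for n in nodes if n[0] == order[0]]
    return {"node": None, "children": top}


nodes = [("person", "andre"), ("person", "philipp"),
         ("changeset", "c1"), ("changeset", "c2"),
         ("file", "FilterHandler.cs")]
edges = [(("person", "andre"), ("changeset", "c1")),
         (("person", "philipp"), ("changeset", "c2")),
         (("changeset", "c1"), ("file", "FilterHandler.cs")),
         (("changeset", "c2"), ("file", "FilterHandler.cs"))]

tree = project_to_tree(["person", "changeset", "file"], nodes, edges)
```

Note that a node reachable over several paths, such as the file in this example, appears once per path in the resulting tree.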

Chapter 5

Discussion
The first section of this chapter presents several questions that different stakeholders might want to answer using the tool. These insights and a comparison with the implementation of the original approach help to define the key assets of the tool. Additionally, future work for the prototype and the approach is listed.

5.1 Evaluation

To evaluate the prototype, we selected four questions that were investigated in the related work and that different stakeholders might want to answer using our tool. Each of the following paragraphs presents one question, with additional comments and assumptions about the most appropriate visualization and the stakeholders that could ask this question.

Who is working on what? As explained in the example scenario (Section 2.3), a possible question is which developer is working on what code. An answer to this question improves the awareness of current activities and could be useful for developers as well as managers. According to Fritz and Murphy [FM10], this is one of the 78 questions to which developers commonly seek an answer. To answer this question, four different fragments are required: team members, work items, change sets and source code files. The objective is to get a quick overview of the work status of a project. As previously described in the scenario, each modification of the model can deliver different results. It is primarily up to the user to define what exactly he wants to see and analyze. The question above normally results in many nodes. This is why the tree view and the tree map might be the most appropriate visualizations to answer it, as they have no problem displaying a lot of data. An example of the tree view is provided in the scenario in Section 2.3. In both visualizations, the user can focus on parts of the graph and hide the rest. In the tree map visualization in Figure 5.1, the user can easily identify the classes where most of the changes were made. For example, the class ComponentsController was affected by many changes, and more than one person is related to these changes. By simply moving the mouse over the different rectangles, the user gets more information about the selected item.
This is just one simple example where the user can benefit from a fast and clear overview of the aggregated data to answer his question.

Figure 5.1: Model Builder and Tree Map of the question: Who is working on what?

What is the most popular class? Another question listed by Fritz and Murphy [FM10] is to find the most popular class, that is, the class with the most changes. Apparently, there is an interest in analyzing which classes have the most changes. The answer to this question can lead to additional questions, such as why certain classes have so many changes. Is it because the requirements change so often? Or is it a bad architecture? Which conclusion the user draws from the visualization is up to his interpretation. It can differ depending on who asks the question: a developer, an architect or a project manager. To answer the question above, the following two fragments are needed: source code and change sets. To aggregate results and compare them with each other, the bar chart might be the most suitable visualization. In Figure 5.2, the user sees four different classes and the number of change sets for each source code file. Each bar is divided into different shades of gray, each of which represents the number of change sets owned by one developer. For example, the FilterHandler class has been affected by 19 change sets: eight are owned by developer A, eight by developer B and three by developer C.
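The counting behind this bar chart can be sketched in a few lines of Python. The data layout and the names change_sets and bar_data are hypothetical, and the data is made up to reproduce the FilterHandler example above; the sketch shows the counting technique, not the prototype's code.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (source code file, owner) for every change set touching the file. Made-up
# data that mirrors the example: 8 + 8 + 3 = 19 change sets for FilterHandler.
change_sets: List[Tuple[str, str]] = (
    [("FilterHandler", "A")] * 8
    + [("FilterHandler", "B")] * 8
    + [("FilterHandler", "C")] * 3
    + [("ComponentsController", "A")] * 2  # invented count for comparison
)


def bar_data(items: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Group change sets per file (x-axis) and count them per owner (z-axis)."""
    bars: Dict[str, Counter] = defaultdict(Counter)
    for file_name, owner in items:
        bars[file_name][owner] += 1
    return bars


bars = bar_data(change_sets)
# The bar height is the total count; the segments are the per-owner counts.
heights = {name: sum(owners.values()) for name, owners in bars.items()}
most_popular = max(heights, key=heights.get)
```

Here the z-axis property is the owner of a change set; selecting another property would only change the second element of each tuple.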

Figure 5.2: Model Builder and Bar Chart of the question: What is the most popular class?

Given a feature, finding out who the most relevant engineers are in order to contact them? Begel et al. [BZ10] found that the most frequently needed coordination information in a department is the developer responsible for a certain feature. A developer might have to know whom to contact in case he needs more information about a certain feature. To answer this question, three repositories are needed: work items, change sets and persons. The model builder visualizes the current model in Figure 5.3. To represent a composed fragment with a smaller number of nodes, the hyper-tree might be a good representation. In this case, each work item might represent a feature. Several change sets, committed by different persons, are linked to these work items. To answer the question, the user could find the two users who might know the most about the feature, because they committed all the changes related to these work items.

How many requirements are covered by test cases? A test manager might want to know which requirements are covered. Considering this problem more closely, the test manager needs three questions to be answered. First, he wants to know which requirements are covered by test cases. Second, he wants to know whether a certain test case was actually run. Last, the result of the test run is of high interest. These questions were investigated by a Swiss company. In the current implementation of our prototype, these questions cannot yet be answered, because the prototype offers no repository for requirements, although test cases are included in the tool. Nevertheless, the missing repository could easily be added to the prototype with the extension possibilities described throughout this report. The bar chart might be the most effective visualization for this problem.

Considering these four use cases, we conclude that many questions and problems are covered by the tool. With six repositories and five visualizations, the user already has a lot of options and freedom to quickly find answers to his questions. Keep in mind that the set of available properties can always be extended. With clicking and drag & drop actions, the user can easily adjust the model to his preferences. Moreover, he receives fast feedback because the visualization is updated immediately. Even if one visualization may be more appropriate in some cases, it might also be useful to compare various visualizations with each other by switching tabs, to gain a deeper understanding.

Figure 5.3: Model Builder and Hyper Tree of the question: Given a feature, finding out who the most relevant engineers are in order to contact them?

5.2 Key Assets

This report discusses and extends the information fragments approach presented by Fritz and Murphy in [FM10]. To emphasize the most important contributions of the presented approach and our implementation, we compare it to the Eclipse plug-in implemented by the authors and discuss how we were able to add value to the approach.

Fritz and Murphy's Eclipse plug-in aggregates information that is available inside the Eclipse IDE. In that implementation, the data sources are limited to four repositories that have Eclipse-specific data sets. Our implementation encourages the extension of the pre-defined data sets by configuring additional repositories. Every repository that supports access via a web request could be added to our prototype. As long as the different repositories have at least one connection to each other, every possible combination could be used to analyze the given data.

Being a web service of its own, our prototype is not limited to just one IDE or operating system. Users could access the prototype on any device with a browser and need little or no set-up time to use it. They would just need to log in to the website with their account and perform the desired actions.

The prototype by Fritz and Murphy offers a very easy and intuitive way to arrange base fragments by specifying the order and the visible connections. The model builder in our prototype offers similar features and retains the intuitive and lightweight manner of the original prototype. Our goal was to come up with a useful abstraction that hides the complexity of the information fragments concept from the user. Creating, modifying and ordering base fragments and the connections, i.e. edges, between them should be very straightforward. The colored edges are helpful as a legend for the visualizations. A double click on a base fragment opens a pop-up where the items can be filtered. Again, the user does not have to learn a new query syntax, but can simply select from drop-down boxes and use auto-completion to generate filters.

The tree view, implemented in both prototypes, is a very useful visualization to understand connections between different fragments and get a fast overview of the composed data.
As the tree view is not always the best choice to answer a developer's questions, our prototype offers other meaningful visualizations for all possible combinations of data sets. As an example, the bar chart visualization tries to automatically find a good representation of the composed data without any customization. The bar chart and the other visualizations are discussed in Sections 2.4.3 and 3.3.5.

5.3 Improvements in the Prototype

The previous section discusses the key assets and advantages of our approach and prototype. Even though the prototype already offers a large set of features and works quite stably, there are some improvements we would like to make in a future version. The most important ones are presented in this section.

In the current implementation, the data is automatically fetched for a hardcoded user. To make the prototype useful for other users, a login system is needed. Additionally, an admin might then specify which repositories a user is granted access to.

The model builder abstraction and its easy filtering technique are the heart of our prototype. It already works well and is very helpful in most cases, but it could offer even more support to the user, for example by showing the base fragments the user specified most recently or by letting him save his favorite ones for quick access.

Currently, the model builder and the smart filter are only loosely coupled. For example, if a user hides a connection between two base fragments in the model builder, the smart filter does not display this configuration. This coupling could be improved in a future version. Changes on either side should be reflected in the other view, where necessary and possible.

The smart filter already offers a very easy way to query the data without learning a new query language. It also offers some predefined filters, dropdowns and auto-completion of text boxes. Some filters, such as the timespan, a team member or a project, are probably used most often. For these cases, it might make sense to provide the user with an even easier filtering technique, where he could just select the time on a slider or select a team member or a project.

Finally, it might be useful to provide the user with a predefined set of the most frequently asked questions. For example, the 78 questions Fritz and Murphy [FM10] discovered could be offered to the user.
Selecting one of these questions would immediately configure the right fragments and select the most appropriate visualization. With all these possible changes, it is still important to preserve the ease of use and lightweight character of the present implementation.

As we used our own tool during the implementation process, we were also able to identify some scalability and performance issues. The current implementation always loads all items of a repository, which might slow down the prototype for a repository with thousands of items. A caching system or an improved lazy-loading system would be necessary to use the tool in a huge software project. There are also some stability issues and bugs we would like to fix.

When we experimented with the prototype, we sometimes felt the need to go back to a previous state of the configuration of the base fragments, filters, orderings, etc. This could be achieved with a backward and forward function. In some cases, the visualizations are quite crowded, and it could be useful to provide the user with a zooming function. Once the user has finished configuring the composed fragment, he might want to export some of the configuration data or visualizations, for example as a PDF file. As the prototype works in every modern browser, the user might want to use it on a touch tablet. This usage could be further optimized.

In a future version, we would also like to implement a textual matcher. For example, this could be useful to link teammates or work items mentioned inside commit comments.

Finally, we could even imagine building a commercial product out of this idea. Assuming that the login mechanism and administration system have been implemented, every interested user could create a new account, select pre-defined repositories or even add his own, configure them and then immediately make use of all the visualizations and features the tool offers. With some optimizations and improvements in terms of speed and scalability, this tool could be offered commercially as software as a service.
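The backward and forward function mentioned in this section could be built on a simple history of configuration snapshots. The following Python sketch is only one possible design; the class ConfigHistory and the shape of the configuration dictionaries are hypothetical and not part of the prototype.

```python
import copy
from typing import Any, Dict, List


class ConfigHistory:
    """Keeps snapshots of the model configuration for backward/forward steps."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self._states: List[Dict[str, Any]] = [copy.deepcopy(initial)]
        self._index = 0

    @property
    def current(self) -> Dict[str, Any]:
        # Return a copy so that callers cannot change the stored history.
        return copy.deepcopy(self._states[self._index])

    def record(self, state: Dict[str, Any]) -> None:
        """Store a new state and discard any states that were undone."""
        del self._states[self._index + 1:]
        self._states.append(copy.deepcopy(state))
        self._index += 1

    def backward(self) -> Dict[str, Any]:
        """Step back one state if possible and return the active state."""
        self._index = max(0, self._index - 1)
        return self.current

    def forward(self) -> Dict[str, Any]:
        """Step forward one state if possible and return the active state."""
        self._index = min(len(self._states) - 1, self._index + 1)
        return self.current


history = ConfigHistory({"fragments": ["persons"], "filters": []})
history.record({"fragments": ["persons", "changesets"], "filters": []})
history.record({"fragments": ["persons", "changesets"], "filters": ["active"]})

undone = history.backward()   # back to the state without filters
redone = history.forward()    # forward to the filtered state again
```

Storing full snapshots keeps the sketch simple; for large configurations, recording only the changes between states would be an alternative.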

5.4 Future Work of the Approach

Fritz and Murphy showed with their information fragments approach [FM10] that it can answer a wide variety of questions that developers want to answer on a regular basis. Eighteen professional software developers successfully used their prototype tool and rated it as very easy and effective. Our approach and implementation build on these very promising results. We tried to improve the approach by experimenting with the abstraction and creation of the model, providing the user with additional visualizations and developing an easily extensible and accessible web version of this approach. The details have been discussed throughout this report and emphasized in Section 5.2.

It would be very interesting to evaluate this new implementation with professional developers in their real-world projects, to find advantages and disadvantages of the approach and to find out whether it can help developers and managers to answer their questions. We leave this evaluation to future work. Nevertheless, we believe that our approach and prototype can also be used to answer most questions developers ask. Four possible questions were presented in the evaluation (Section 5.1).

By using our own prototype, reading related work and discussing new ideas, we found a couple of possible improvements of the approach, which are explained in this section. As the overall goal of the approach is to answer a developer's questions, these improvements aim to make the analysis of the data even easier.

Extending the prototype is critical to its success. By making extension easier and providing more interfaces, the possible use cases and the questions that can be answered could be broadened tremendously.

We believe that we were able to successfully hide the complexity of the approach and offer a tool that can be used without explaining the model behind it. However, this abstraction could probably be further improved and made even easier and more intuitive.
Concerning the visualizations, we could imagine that the tool automatically suggests the most appropriate visualization to the user, based on the fragments that have been composed. It could also be useful to let the user build multiple kinds of visualizations and show them side by side to allow easy comparisons of the data. This could be especially useful for data that changed over time. We could also imagine a new visualization, the timeline. All data that contains time information (i.e. a timestamp) could be placed on the timeline, with its connections visualized by arrows, bars or colors. To improve the understanding of all visualizations, context-specific information could be highlighted, for example the currently logged-in user, items that were changed today or expired items.

Finally, the information and visualizations might be used to identify patterns that could be used to analyze the data automatically. These analyses might help the user to better understand the connections between the repositories, the processes inside the project or even some problems in the current sprint. It could be especially useful to automatically identify the "black sheep", a developer who is performing badly or often checks in code that does not build correctly.

Chapter 6

Conclusion
This chapter summarizes and discusses our achievements and future work. In the second section, we reflect on our experiences and the lessons learned in the project.

6.1 Summary

As seen throughout this report, different stakeholders, such as developers and managers, face many questions during their work. "Who is working on what?" or "What is the most popular class?" are examples of questions arising during a typical workday. Various tools already exist to help answer them. Fritz and Murphy [FM10] developed a new concept based on information fragments. Comparing and merging different data sets from different repositories should result in an understandable answer to a question or problem. An id and text matching algorithm is used to draw edges between nodes, which are then aggregated to a graph, the composed fragment. The more fragments are added to the composed fragment, the higher the complexity gets. Therefore, a software solution is required that enables a developer or manager to manage the information reasonably and intuitively. Visualizations help the user to understand the aggregated information.

The approach and our implementation have three major assets that add value. First, the solution is extensible. Adapters for common repositories make it possible to include different sources of data sets. Additionally, new repositories can easily be integrated by writing custom adapters. Second, complexity is reduced through abstraction. The model builder, one component of the user interface, offers a simplified representation of the underlying model. Moreover, lightweight filtering and ordering actions can be performed in an intuitive way. Default filters and auto-completion additionally give the user the assistance required to arrange the visualizations. Third, depending on the use case, the user can analyze the composed data using different visualizations. For example, a tree view might be suitable to visualize a large number of nodes. "Who is working on what?" often results in a lot of data, where collapsing and expanding information is necessary. "What is the most popular class?" might be best analyzed using a bar chart.

The prototype implements five different visualizations to provide solutions for different scenarios. Six repositories are integrated into the solution, which can be added to the composed fragment by running the join algorithm. The easy-to-use model builder lets the user adjust the model intuitively through an abstraction.

The implementation of our approach, in the form of a tool prototype, was evaluated. Questions commonly asked by developers, found in the related work, were answered using this prototype. Of four chosen questions, three could easily be answered. The fourth question could not be answered in the current implementation because of a missing repository. However, this repository could easily be added using the extension support of our tool. Moreover, the prototype makes it possible to answer many other questions and provides adequate visualizations and data sets for almost any combination of information fragments.

In a second step, the prototype was compared with the implementation of the original approach by Fritz and Murphy. Our prototype is very flexible and platform independent because it is a web-based solution that requires no installation or set-up. It also retains the intuitive and lightweight approach to building a fragment. Another difference is that our prototype lets the user choose from multiple visualizations.

To improve our prototype, we identified several points for future work. One improvement worth mentioning again is scalability. If a large amount of data is loaded into the model, the response times of the application increase. Measures such as caching or more efficient lazy loading could improve the responsiveness of the tool. With a couple of improvements and bug fixes, the prototype would be close to being usable, or at least evaluable, in a real-life software project. We believe that it could help developers to answer their questions and raise their productivity.

6.2 Lessons Learned

From an organizational point of view, we learned how important it is to have a knowledge base where all information and decisions are stored centrally. With such a knowledge base, information is always available, which leads to more efficient work. Moreover, pair programming is a very helpful practice. It is extremely useful if a problem is complex and one person does not feel comfortable enough to solve it on his own. We often resolved problems by sitting together and discussing different ways of approaching them. A third organizational lesson is that developing according to Scrum does not work by itself. The team must be willing to develop and organize itself according to the principles of agile development. There were phases during our project in which we practiced Scrum more strictly than in others.

From a software development perspective, we clearly noticed the importance of test-driven development. When we wrote tests before writing the actual code, the code was often of higher quality and more stable because we had to think precisely about the problem first. All of us made great progress in our software development skills, and we used many new technologies, tools and libraries, as listed in Appendix B.

Interestingly, we could even learn from the tool we developed. For example, we could immediately see that one person was not committing often. When this person committed, it was always a large amount of code, which sometimes caused merge problems. After our own visualizations gave us a hint about these issues, this person took more care about how often he committed. Finally, it is worth noting that we had a lot of fun with this project.

Bibliography
[BZ10] Andrew Begel and Thomas Zimmermann. Codebook: Discovering and Exploiting Relationships in Software Repositories. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, pages 125-134, 2010.

[dAM08] Brian de Alwis and Gail C. Murphy. Answering conceptual queries with Ferret. In Proceedings of the 13th international conference on Software engineering - ICSE '08, pages 21-30, New York, New York, USA, 2008. ACM Press.

[FM10] Thomas Fritz and Gail C. Murphy. Using information fragments to answer the questions developers ask. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1, page 175, New York, New York, USA, 2010. ACM Press.

[KAP+07] Andrew J. Ko, Robert DeLine, and Gina Venolia. Information Needs in Collocated Software Development Teams. In Proceedings of the 29th International Conference on Software Engineering, pages 344-353, 2007.

Appendix A

Extensibility
This section describes in detail how the current implementation of the information fragments tool can be extended and how a new repository can be added. Theoretically, any web request that returns data in a standard format that can be parsed (e.g. JSON, XML) could be used. To run useful queries on joined data, at least one connection between the properties of the new repository and those of another repository is needed.

1. First, a component is needed that runs a web request and fetches the actual data. A wide variety of data that can be fetched is provided via the TfsLink project. If a tool admin wants to add a repository from another source, such as Rational Team Concert, we suggest creating a small new project similar to TfsLink.

2. Next, an adapter has to be implemented that connects the data fetcher with the tool. If the new repository fetches data from Tfs, the new adapter should inherit from the TfsAdapter interface. For XML data, the XmlAdapter interface should be inherited. The GetData method has to be overridden to receive the fetched data from the component created in step 1 and return a new base fragment filled with the data. The CreateBaseFragment method can be used to parse the received data and create nodes for this repository. The corresponding node class is created in step 4.

3. Create a new class in Controllers that inherits from repository and specifies the default filters and some visual options, such as the node color. The easiest way is to copy an existing repository class and modify the constructor and the GetDefaultFilters method.

4. Next, a new node type for this repository has to be created in the folder Models, unless an existing one can be reused. It needs to implement the interface INode. Again, we suggest copying an existing node class and modifying the properties and some methods needed for the visualizations, such as GetLabel, GetLongLabel and GetToolTipLabel.

5. In Model, create at least one new parameter type. This tiny class is needed to allow the join algorithm to find connections between the different data sets from the repositories, and to allow the filter to show or hide certain connections. Depending on the properties, you may want to add other parameter types.

6. Finally, the RepositoryLoader class has to be updated by adding the newly created repository to the LoadAvailableRepositories method. As soon as the project is compiled, the new repository is automatically added and available to all components.
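To illustrate why the parameter types in step 5 matter, the following Python sketch shows how shared parameter types could let a join find edges between two repositories. It is a simplified illustration and not the prototype's join algorithm: the names Item and join, the parameter type changeset_id and the example data are hypothetical, and only id matching (no text matching) is shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Item:
    """Hypothetical repository item with id values per parameter type."""
    repository: str
    key: str
    # parameter type name -> ids this item carries or refers to
    params: Dict[str, Set[str]] = field(default_factory=dict)


def join(left: List[Item], right: List[Item]) -> List[Tuple[str, str, str]]:
    """Create an edge for every pair of items sharing an id of the same type."""
    edges = []
    for a in left:
        for param_type, ids in a.params.items():
            for b in right:
                # Items are connected if they share at least one id
                # under the same parameter type.
                if ids & b.params.get(param_type, set()):
                    edges.append((a.key, b.key, param_type))
    return edges


work_items = [
    Item("workitems", "WI-1", {"changeset_id": {"c1", "c2"}}),
    Item("workitems", "WI-2", {"changeset_id": set()}),
]
change_sets = [
    Item("changesets", "c1", {"changeset_id": {"c1"}}),
    Item("changesets", "c3", {"changeset_id": {"c3"}}),
]

edges = join(work_items, change_sets)
```

A repository whose items expose no parameter type in common with any other repository would produce no edges, which is why at least one shared parameter type is required.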

Appendix B

Used Libraries, Frameworks, Tools


This appendix lists all the libraries, frameworks and tools we used in our development process.

B.1 Web Frameworks

Name          Version   Url
Arbor         0.91      http://arborjs.org
Jit           2.0.1     http://philogb.github.io/jit
jQuery        2.0.3     http://jquery.com
jQuery UI     1.10.3    http://jqueryui.com
jQueryRotate  2.3       http://code.google.com/p/jqueryrotate
KineticJS     4.7.2     http://kineticjs.com
Knockout      2.3.0     http://knockoutjs.com
Modernizr     2.6.2     http://modernizr.com

B.2 Programming Languages

Name        Version
C#          5.0
Razor       2.0.30506.0
HTML        5
CSS         3
JavaScript  1.8.6

Appendix C

Contents of the CD-ROM

Abstract.txt                  English version of the abstract of this report.
InformationFragments.pdf      This report.
InformationFragments.zip      Source code.