
Analysis is another essential feature of all BI solutions.

The difference between reports and analysis:
- Reports are typically static (save for parameters) and are used at the operational level.
- Analysis tends to be a lot more dynamic and is used at the tactical and strategic level.

Typical elements in analytical solutions:
- They allow the user to dynamically explore the data in an ad-hoc manner.
- The data is first presented at a highly aggregated level, and the user can then drill down to a more detailed level.

Pentaho Analysis consists of the BI Server and the following client tools and add-ons:
- Schema Workbench
- Design Studio
- Pentaho Administration Console (Community Edition users) or Pentaho Enterprise Console (Enterprise Edition users)
- The Mondrian ROLAP engine
- Aggregation Designer

Functionality:
- The Pentaho User Console web interface, which enables easy management of reports and analysis views
- A real-time analysis view interface
- A complex scheduling subsystem
- The ability to email a published analysis view to other users
- The ability to create complex analysis schemas
- The ability to improve ROLAP cube performance with Aggregation Designer

1. Pentaho Analysis helps business users make optimal business decisions by enriching their analytic power, insight and understanding. It gives them:
- The freedom to explore business information by drilling into and cross-tabulating data.
- The power to address complex analytical queries with fast response times.
- The ability to view information multi-dimensionally by choosing specific metrics and attributes to analyze.
- The option to deploy it stand-alone or integrated with other products in the Pentaho BI Suite.

Pentaho Analyzer is the front-end interface of the Pentaho Analysis product line. It provides intuitive, interactive analytical reporting that lets non-technical business users understand business information quickly. Using Pentaho Analyzer, you can query the data in a database without having to understand how the database is structured, and explore your data dynamically: you can drill down into the data to discover hidden details that may help you make important business decisions. The Analyzer presents data multi-dimensionally and lets you select which dimensions and measures you want to explore. Its rich drag-and-drop user interface makes it easy to create reports quickly based on your exploration of the data, and Pentaho Analyzer reports can also be displayed in a dashboard.

Analyzer is part of the enhanced functionality in Pentaho Analysis Enterprise Edition. Its features include:
- Web-based drag-and-drop analytical report creation
- Sorting, filtering and drilling into data dynamically
- Chart visualizations
- Export of data to PDF, Excel or CSV
- Customized totals and user-defined calculations
- Advanced sorting and filtering
- Interactive analytical reports in Pentaho dashboards
- Saving and sharing reports

2. Pentaho Design Studio (client tool)
The Pentaho Design Studio (PDS) is a collection of editors and viewers integrated into a single application that provides a graphical environment for building and testing action sequence documents. It is essentially an Eclipse IDE (integrated development environment) with a Pentaho plugin that is used to create and maintain action sequences that work within the Pentaho platform. An action sequence is a predefined set of actions that can be triggered by a user action, a schedule or another action sequence. An action sequence can be as simple as displaying a report or as complex as finding all overdue orders and sending out reminder emails.

Creating Your First Action Sequence
A burst report is a report that is run multiple times for multiple people. It is most useful when there is a parameter that filters the data specifically for each person. The following four components are used:
- SQLLookupRule: generates the list of regions and managers to loop over
- UtilityComponent: generates the email message, subject and attachment name
- JFreeReportComponent: creates the report
- EmailComponent: sends the report

1. General Stuff
Items such as the title, description and icon that appear for this action sequence when browsing your solution. You can also indicate the logging level to use for this action sequence.

2. Define Process
Contains Inputs, Outputs and Resources. When the action sequence runs, each input can come from one of three places:
- request: determined by looking at the URL that caused the action sequence to run
- session: usually the HTTP session within which the action sequence is running
- runtime: the context where the outputs of each action sequence are placed and made available to other action sequences

Actions
The Process Actions section is a list of all the actions to be performed by this action sequence. The order is important here. Whenever you run a SQL query action, make sure you include the column names in the output, so that other actions know what data is available in the rule result. Results are saved in an output called rule-result.

Design Studio
A stand-alone application that hosts a set of Pentaho plug-ins used for creating, testing and administering content for the Pentaho Project. It is an Eclipse (http://eclipse.org) workbench, pre-configured and customized for Pentaho. Currently there is only one plug-in, the Action Sequence Editor; in many cases people use "Design Studio" to refer to this plug-in. Future releases will provide more plug-ins and more capabilities.
The goal is for it to be the client UI for as much of the administration and content creation as possible. It is considered an administrator, developer or content-creator tool, not a typical end-user tool. It is based on the Eclipse application framework, an open source software development project dedicated to providing a robust, full-featured, commercial-quality industry platform for the development of highly integrated tools. Eclipse provides building blocks and a foundation for constructing and running integrated software-development tools. It allows tool builders to independently develop tools that integrate with other people's tools so seamlessly that you can't tell where one tool ends and another starts. Leveraging Eclipse gives us a number of advantages, including an existing, well-known and well-defined framework; the ability to integrate different tools while maintaining a common look and feel; reuse of existing components; and a huge savings in development time.
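The burst report from "Creating Your First Action Sequence" loops over a lookup result and produces one filtered report per row. The Python sketch below mimics that flow under clearly stated assumptions. An in-memory SQLite database with hypothetical `region_managers` and `sales` tables stands in for the SQLLookupRule data source. String formatting stands in for UtilityComponent. The hypothetical `render_report` function stands in for JFreeReportComponent. A callback that collects messages stands in for EmailComponent. None of these functions are Pentaho APIs; a real burst report is defined as an action sequence in Design Studio.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE region_managers (region TEXT, manager TEXT, email TEXT);
        CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER);
        INSERT INTO region_managers VALUES
            ('East', 'Alice', 'alice@example.com'),
            ('West', 'Bob', 'bob@example.com');
        INSERT INTO sales VALUES
            ('East', 'Widgets', 10), ('East', 'Widgets', 5),
            ('East', 'Gadgets', 7), ('West', 'Widgets', 3);
    """)
    return conn


def lookup_regions(conn):
    """Stand-in for SQLLookupRule: the list of regions and managers to loop over."""
    cur = conn.execute(
        "SELECT region, manager, email FROM region_managers ORDER BY region")
    # Keep the column names with each row so later steps know what data is available.
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def render_report(conn, region):
    """Hypothetical stand-in for JFreeReportComponent: a report filtered by region."""
    rows = conn.execute(
        "SELECT product, SUM(amount) FROM sales WHERE region = ? "
        "GROUP BY product ORDER BY product",
        (region,),
    ).fetchall()
    return "\n".join(f"{product}: {total}" for product, total in rows)


def burst(conn, send_email):
    """Run the report once per region/manager row and hand each copy to send_email."""
    for row in lookup_regions(conn):
        # Stand-in for UtilityComponent: subject and attachment name per recipient.
        subject = f"Sales report for {row['region']}"
        attachment = f"sales_{row['region'].lower()}.txt"
        body = render_report(conn, row["region"])
        send_email(row["email"], subject, attachment, body)


def run_demo():
    outbox = []
    # Stand-in for EmailComponent: collect messages instead of sending them.
    burst(build_demo_db(), lambda *message: outbox.append(message))
    return outbox


if __name__ == "__main__":
    for message in run_demo():
        print(message)
```

The per-recipient parameter (`region`) is what makes bursting useful: the same report definition runs once per row, and each run sees only that recipient's data.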

Levels of organization within the Design Studio:
- Workspaces: a directory that contains a collection of projects, along with preferences and other settings related to those projects.
- Projects, folders and files.
- Views: typically used to navigate a hierarchy of information.
- Editors: typically used to edit or browse a resource.
- Perspectives: a group of views and editors in the Design Studio window. One or more perspectives can exist in a single Design Studio window.

3. Pentaho Schema Workbench (client tool)
The Schema Workbench is the primary tool for visually designing, editing and publishing Pentaho Analysis (Mondrian) OLAP schemas.

Mondrian:
- is an open source OLAP server and one of the main components of the Pentaho Business Intelligence platform
- usually acts as a connector between a Java-based OLAP front end and a relational database
- executes queries written in the MDX language
- reads data from a relational database (RDBMS)
- presents the results in a multidimensional format via a Java API
- uses multidimensional schemas, which are created with the Schema Workbench tool

The Mondrian engine processes MDX requests against ROLAP (Relational OLAP) schemas. These schema files are XML metadata models, created in a specific structure used by the Mondrian engine. They can be considered cube-like structures that use the existing FACT and DIMENSION tables found in your RDBMS. An actual physical cube does not need to be built or maintained; only the metadata model has to be created. Mondrian is an OLAP (online analytical processing) engine written in Java. It reads from JDBC data sources, aggregates data in a memory cache, and implements the MDX language and the olap4j and XML/A APIs.

Now that you have a physical data model in place, you must create a logical model that maps to it. A Mondrian schema is essentially an XML file that performs this mapping, thereby defining a multidimensional database structure. Schema Workbench provides the following functionality:

- Schema editor integrated with the underlying data source for validation
- Testing MDX queries against the schema and database
- Browsing the underlying database structure
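A Mondrian schema is an XML file that maps a logical cube onto physical tables, so a minimal one can be generated with Python's standard `xml.etree.ElementTree` module. The element and attribute names used here (`Schema`, `Cube`, `Table`, `Dimension`, `Hierarchy`, `Level`, `Measure`, `foreignKey`, `primaryKey`, `aggregator`) follow the classic Mondrian 3.x schema format; check them against your Mondrian version. The schema, table and column names (`SalesSchema`, `sales_fact`, `time_by_day`, `time_id`, `the_year`, `month_of_year`, `unit_sales`) are hypothetical. Schema Workbench remains the intended way to design and validate such a file.

```python
import xml.etree.ElementTree as ET


def build_schema():
    """Build a minimal, hypothetical Mondrian 3.x-style schema with one cube."""
    schema = ET.Element("Schema", name="SalesSchema")
    cube = ET.SubElement(schema, "Cube", name="Sales")
    # The fact table the cube is built on.
    ET.SubElement(cube, "Table", name="sales_fact")
    # A dimension joined to the fact table through the time_id foreign key.
    dim = ET.SubElement(cube, "Dimension", name="Time", foreignKey="time_id")
    hier = ET.SubElement(dim, "Hierarchy", hasAll="true", primaryKey="time_id")
    ET.SubElement(hier, "Table", name="time_by_day")
    # Levels run from the coarsest (Year) to the finest (Month).
    ET.SubElement(hier, "Level", name="Year", column="the_year",
                  type="Numeric", uniqueMembers="true")
    ET.SubElement(hier, "Level", name="Month", column="month_of_year",
                  type="Numeric", uniqueMembers="false")
    # A measure summed from a fact table column.
    ET.SubElement(cube, "Measure", name="Unit Sales", column="unit_sales",
                  aggregator="sum", formatString="#,###")
    return schema


if __name__ == "__main__":
    print(ET.tostring(build_schema(), encoding="unicode"))
```

The element order inside the cube (fact table, then dimensions, then measures) mirrors how Mondrian 3.x schemas are typically laid out.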

The most important components of a schema are cubes, measures, and dimensions:

A cube is a collection of dimensions and measures in a particular subject area. A measure is a quantity that you are interested in measuring, for example, unit sales of a product, or cost price of inventory items.

Some more definitions:

A member is a point within a dimension determined by a particular set of attribute values. For example, the gender hierarchy has the two members 'M' and 'F', and 'San Francisco', 'California' and 'USA' are all members of the store hierarchy.

A hierarchy is a set of members organized into a structure for convenient analysis. For example, the store hierarchy consists of the store name, city, state and nation. The hierarchy allows you to form intermediate sub-totals: the sub-total for a state is the sum of the sub-totals of all the cities in that state, each of which is the sum of the sub-totals of the stores in that city.

A level is a collection of members that have the same distance from the root of the hierarchy.

A dimension is a collection of hierarchies that discriminate on the same fact table attribute (say, the day that a sale occurred). Put differently, a dimension is an attribute, or set of attributes, by which you can divide measures into sub-categories.
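A few lines of Python can check the sub-total rule for the store hierarchy. The facts below are invented for illustration. Each member is identified by its path from the root (for example `("USA", "California")`), so summing at one level gives the same totals as summing the sub-totals of the level below.

```python
from collections import defaultdict

# Hypothetical store-level facts: (nation, state, city, store, unit_sales)
FACTS = [
    ("USA", "California", "San Francisco", "Store 1", 120),
    ("USA", "California", "San Francisco", "Store 2", 80),
    ("USA", "California", "Los Angeles", "Store 3", 150),
    ("USA", "Washington", "Seattle", "Store 4", 90),
]

# Levels of the store hierarchy, from the root downwards.
LEVELS = ["nation", "state", "city", "store"]


def rollup(facts, depth):
    """Sum unit_sales for every member at the given depth (1 = nation ... 4 = store)."""
    totals = defaultdict(int)
    for *path, sales in facts:
        # A member is identified by its path from the root, e.g. ("USA", "California").
        totals[tuple(path[:depth])] += sales
    return dict(totals)


def children_sum(parent, child_totals):
    """Add up the sub-totals of all children whose path starts with `parent`."""
    return sum(total for member, total in child_totals.items()
               if member[:len(parent)] == parent)


if __name__ == "__main__":
    for depth, level in enumerate(LEVELS, start=1):
        print(level, rollup(FACTS, depth))
    # Each state sub-total equals the sum of its city sub-totals.
    state_totals = rollup(FACTS, 2)
    city_totals = rollup(FACTS, 3)
    for state, total in state_totals.items():
        assert total == children_sum(state, city_totals)
```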

Creating a Pentaho Analysis View Using the Schema Workbench
Follow the five basic steps below when creating a Pentaho Analysis View using the Schema Workbench to create the ROLAP model:
1. Configure your Pentaho BI Server data source using the Pentaho Admin Console.
2. Create your Pentaho Analysis Schema Model using Schema Workbench.
3. Publish your Pentaho Analysis Schema Model to the Pentaho BI Server.
4. Use the Pentaho User Console to create a new Analyzer Report (Enterprise Edition) or Analysis View (Community Edition).
5. Execute the Pentaho Analysis view.

MDX stands for 'multi-dimensional expressions'. It is the main query language implemented by Mondrian. It looks a little like SQL, but don't be deceived! The structure of an MDX query is quite different from SQL. MDX is a language for querying multidimensional databases, in the same way that SQL is used to query relational databases. It was originally defined as part of the OLE DB for OLAP specification, and a similar language, mdXML, is part of the XML for Analysis specification.

Schema Workbench Notes
Before you start using Schema Workbench, you should be aware of the following points:

- You start Schema Workbench by executing the /pentaho/design-tools/schemaworkbench/workbench script. On Linux and OS X, this is a .sh file; on Windows it is a .bat file.
- You must be familiar with your physical data model before you use Schema Workbench. If you don't know which tables are your fact tables and how your dimensions relate to them, you will not be able to make significant progress in developing a Mondrian schema.
- When you change any field in Schema Workbench, the change is not applied until you click out of that field so that it loses the cursor focus.
- Schema Workbench is designed to accommodate multiple sub-windows. By default they are arranged in a cascading fashion. However, you may find a tiled layout more useful, especially if you put the JDBC Explorer window next to your Schema window so that you can see the database structure at a glance. Simply resize and move the sub-windows until they are in agreeable positions.

Pentaho Analysis is built on the Mondrian relational online analytical processing (ROLAP) engine. ROLAP relies on a multidimensional data model that, when queried, returns a dataset that resembles a grid. The rows and columns that describe and bring meaning to the data in that grid are dimensions, and the hard numerical values in each cell are the measures or facts. In Pentaho Analyzer, dimensions are shown in yellow and measures in blue.

Pentaho Aggregation Designer (client tool)
Pentaho Aggregation Designer is a graphical environment used to increase query performance of a Mondrian OLAP schema through the creation of aggregate tables. It is used to improve performance when working with very large data sets:
- fact tables with more than 10 million records
- cubes with a high cardinality of levels and members

Mondrian supports two aggregation techniques:
- "lost" dimension
- "collapsed" dimension

An aggregate table:
- is associated with just one fact table
- coexists with the base fact table
- contains pre-aggregated measures built from the fact table

Aggregation Designer generates:
- DDL for creating the tables
- DML for populating the tables
- a Mondrian schema that references the new aggregate tables
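The following sketch makes the generated DDL and DML concrete. It builds a "lost dimension" aggregate by hand with Python's built-in `sqlite3` module: the customer dimension is dropped, so unit sales are pre-summed per product and day. All table and column names are hypothetical. The name `agg_lc_sales_fact` and the `fact_count` column assume that Mondrian's default recognition rules look for an `agg_..._<fact table>` name and a `fact_count` column. Verify this against your Mondrian version, or declare the aggregate table explicitly in the schema.

```python
import sqlite3

# Hypothetical DDL for an aggregate table that drops ("loses") the customer dimension.
AGG_DDL = """
CREATE TABLE agg_lc_sales_fact (
    product_id INTEGER,
    time_id    INTEGER,
    unit_sales INTEGER,
    fact_count INTEGER
)
"""

# Hypothetical DML that populates it from the base fact table.
AGG_DML = """
INSERT INTO agg_lc_sales_fact (product_id, time_id, unit_sales, fact_count)
SELECT product_id, time_id, SUM(unit_sales), COUNT(*)
FROM sales_fact
GROUP BY product_id, time_id
"""


def build_lost_dimension_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_fact (product_id INTEGER, time_id INTEGER, "
        "customer_id INTEGER, unit_sales INTEGER)")
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(1, 100, 7, 5), (1, 100, 8, 3), (1, 101, 7, 2), (2, 100, 9, 4)])
    conn.execute(AGG_DDL)
    conn.execute(AGG_DML)
    return conn


def product_totals(conn, table):
    """A summary-level query that either table can answer (table is a trusted constant)."""
    return conn.execute(
        f"SELECT product_id, SUM(unit_sales) FROM {table} "
        "GROUP BY product_id ORDER BY product_id").fetchall()


if __name__ == "__main__":
    conn = build_lost_dimension_demo()
    print(product_totals(conn, "sales_fact"))
    print(product_totals(conn, "agg_lc_sales_fact"))
```

Both calls to `product_totals` return the same answer, but the aggregate table has fewer rows to scan. That saving is the point of the technique on fact tables with millions of rows.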

The Pentaho Aggregation Designer simplifies the creation and deployment of aggregate tables that improve the performance of your Pentaho Analysis (Mondrian) OLAP cubes. Pentaho Analysis is a pure relational OLAP engine that works solely with the data stored in your relational database rather than providing its own multidimensional data storage model. This simplifies deployment and data management, but limits performance when working with very large data sets (fact tables with more than 10 million records and/or cubes with a high cardinality of levels and members). To improve performance in these scenarios, Pentaho Analysis supports aggregate tables.

Aggregate tables coexist with the base fact table and contain pre-aggregated measures built from the fact table. This improves performance by enabling the Mondrian engine to fulfill certain summary-level queries from the smaller aggregate table rather than aggregating a large number of individual facts from the base fact table. The Pentaho Aggregation Designer provides a simple interface that allows you to create aggregate tables from levels within the dimensions you specify. Based on these selections, the Aggregation Designer generates the Data Definition Language (DDL) for creating the aggregate tables, the Data Manipulation Language (DML) for populating them, and an updated Mondrian schema that references the new aggregate tables. If you are unfamiliar with aggregate table design concepts, the Aggregation Designer also includes an intelligent adviser that evaluates the structure and cardinality of your OLAP cube and recommends some initial aggregate tables to create for improving performance.

The components of the Pentaho Aggregation Designer workspace are:
- Menu Bar
- Tool Bar
- Cost/Benefit Chart: provides a high-level comparison of the benefit of all currently selected aggregates relative to their estimated cost. The benefit scale represents the relative number of queries that can be fulfilled by an aggregate table instead of being retrieved from the base fact table. The cost scale indicates the impact, in terms of number of tables and disk space, of creating the selected aggregate recommendations.
- Aggregate Table
- Left Pane
- Right Pane
- Server Connection Status
- Status Bar

Mondrian supports two aggregation techniques:
- "lost" dimension: the dimension is completely missing from the aggregate table.
- "collapsed" dimension: the dimension is kept in the aggregate table, but rolled up to a higher level than in the fact table (for example, month instead of day). Recall that the dimension key in the fact table refers (more or less) to the lowest level in the dimension hierarchy.

Every aggregate table is associated with just one fact table. It aggregates the fact table measures over one or more of the dimensions.

Impact summary
The impact summary in the lower right pane provides information on the estimated impact of creating all of the currently selected aggregates. This summary includes the number of aggregate tables that will be created, the estimated number of rows contained in those tables, and the estimated amount of space they will occupy on the hard drive. The impact summary is automatically updated as you select and deselect aggregates from the list of proposed aggregates.
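A "collapsed" dimension can be sketched the same way. In this hypothetical `sqlite3` example, the day-level `time_id` is replaced by the year and month columns of the time dimension, giving one row per product and month. The `row_counts` helper returns exact counts for a toy table. The Aggregation Designer's impact summary estimates this kind of figure for real data rather than computing it exactly.

```python
import sqlite3


def build_collapsed_dimension_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE time_by_day (time_id INTEGER PRIMARY KEY,
                                  the_year INTEGER, month_of_year INTEGER);
        CREATE TABLE sales_fact (product_id INTEGER, time_id INTEGER, unit_sales INTEGER);
        INSERT INTO time_by_day VALUES (1, 2024, 1), (2, 2024, 1), (3, 2024, 2);
        INSERT INTO sales_fact VALUES (1, 1, 5), (1, 2, 3), (1, 3, 2), (2, 1, 4), (2, 3, 6);
    """)
    # Collapsed time dimension: the day-level time_id is replaced by the
    # year and month columns, so there is one row per product and month.
    conn.execute("""
        CREATE TABLE agg_month_sales_fact AS
        SELECT f.product_id, t.the_year, t.month_of_year,
               SUM(f.unit_sales) AS unit_sales, COUNT(*) AS fact_count
        FROM sales_fact AS f JOIN time_by_day AS t ON f.time_id = t.time_id
        GROUP BY f.product_id, t.the_year, t.month_of_year
    """)
    return conn


def row_counts(conn):
    """Exact (fact rows, aggregate rows); the impact summary estimates such figures."""
    fact = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
    agg = conn.execute("SELECT COUNT(*) FROM agg_month_sales_fact").fetchone()[0]
    return fact, agg


if __name__ == "__main__":
    print(row_counts(build_collapsed_dimension_demo()))
```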

Like reporting, analysis is another essential feature of all BI solutions. Reports are typically static (save for parameters) and mainly used to support decisions that affect the business at the operational level. Analysis tends to be a lot more dynamic, and is typically used by managers to support decisions at the tactical and strategic level. One of the typical elements in analytical solutions is that they allow the user to dynamically explore the data in an ad-hoc manner. Typically, the data is first presented at a highly aggregated level, say, total sales per year, and then the user can drill down to a more detailed level. Closely related to typical analytical questions and solutions is the dimensional model.

To prepare data for use with the Pentaho Analysis (and, to a certain extent, Reporting) client tools, you should follow this basic workflow:
1. Design a Star or Snowflake Schema
2. Populate the Star/Snowflake Schema
3. Build a Mondrian Schema
4. Initial Testing
5. Adjust and Repeat Until Satisfied
6. Test for Performance
7. Deploy to Production

Typical workflow for developing a dimensional model (a cube is one fact table plus several dimension tables):
1. Collect user requirements for business logic and processes.
2. Considering the entirety of your data, break it down into subjects.
3. Isolate groups of facts into one or more fact tables.
4. Design dimensional tables that draw relationships between levels (fact groups).
5. Determine which members of each level are useful for each dimensional table.
6. Build and publish a Mondrian (Pentaho Analysis) schema and collect feedback from users.
7. Refine your model based on user feedback, and continue iterating through this list until users are productive.
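The "aggregated first, then drill down" pattern maps directly onto a star schema. This `sqlite3` sketch uses hypothetical `date_dim` and `sales_fact` tables. The first query returns total sales per year, and the second drills into one year to show sales per month. In Pentaho, Mondrian would generate SQL of this general kind from MDX; these hand-written queries only illustrate the idea.

```python
import sqlite3


def build_star_demo():
    """One fact table referencing one date dimension (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE date_dim (date_id INTEGER PRIMARY KEY, the_year INTEGER, the_month INTEGER);
        CREATE TABLE sales_fact (date_id INTEGER REFERENCES date_dim(date_id), amount INTEGER);
        INSERT INTO date_dim VALUES (1, 2023, 11), (2, 2023, 12), (3, 2024, 1), (4, 2024, 2);
        INSERT INTO sales_fact VALUES (1, 100), (2, 50), (2, 25), (3, 40), (4, 60);
    """)
    return conn


def sales_per_year(conn):
    """Highly aggregated view: total sales per year."""
    return conn.execute("""
        SELECT d.the_year, SUM(f.amount)
        FROM sales_fact AS f JOIN date_dim AS d ON f.date_id = d.date_id
        GROUP BY d.the_year ORDER BY d.the_year
    """).fetchall()


def drill_down(conn, year):
    """Drill down one level: sales per month within the chosen year."""
    return conn.execute("""
        SELECT d.the_month, SUM(f.amount)
        FROM sales_fact AS f JOIN date_dim AS d ON f.date_id = d.date_id
        WHERE d.the_year = ?
        GROUP BY d.the_month ORDER BY d.the_month
    """, (year,)).fetchall()


if __name__ == "__main__":
    conn = build_star_demo()
    print(sales_per_year(conn))
    print(drill_down(conn, 2023))
```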

If you need to analyze data across two or more cubes, or need to combine information from two fact tables on the same subject but with different granularity, then you must create a virtual cube. Virtual cubes cannot presently be created through Pentaho Data Integration's model perspective; you must use Schema Workbench instead. Virtual cubes are useful when there are fact tables of different granularities (for instance, one Time fact table might be configured at the Day level, another at the Month level) or fact tables of different dimensionalities (for instance, one on Products, Time and Customer, another on Products, Time and Warehouse), and you need to present the results to users who don't know how the data is structured. Any common dimensions (shared dimensions that are used by both constituent cubes) are automatically synchronized. Dimensions that belong to only one cube are called non-conforming dimensions.

Pentaho BI Suite Official Documentation | Dimensional Modeling
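The idea behind a virtual cube can be illustrated in plain Python: two fact tables of different granularity are aligned on their shared dimensions. The data and names below are hypothetical. Day-grain sales are collapsed to months so they line up with month-grain inventory. The warehouse attribute exists only in the inventory facts (a non-conforming dimension), so it is summed away. This illustrates the combining logic only; in Mondrian a virtual cube is declared in the schema with Schema Workbench, not written as application code.

```python
from collections import defaultdict

# Hypothetical day-grain sales facts: (product, "YYYY-MM-DD", units_sold)
SALES = [
    ("Widget", "2024-01-03", 5),
    ("Widget", "2024-01-20", 7),
    ("Gadget", "2024-01-11", 2),
    ("Widget", "2024-02-02", 4),
]

# Hypothetical month-grain warehouse facts: (product, "YYYY-MM", warehouse, units_on_hand)
INVENTORY = [
    ("Widget", "2024-01", "North", 30),
    ("Widget", "2024-01", "South", 10),
    ("Gadget", "2024-01", "North", 8),
    ("Widget", "2024-02", "North", 25),
]


def sales_by_month(sales):
    """Collapse day-grain sales to the month level shared with the inventory facts."""
    totals = defaultdict(int)
    for product, day, units in sales:
        totals[(product, day[:7])] += units  # "2024-01-03" -> "2024-01"
    return dict(totals)


def inventory_by_month(inventory):
    """Warehouse only exists in the inventory facts, so it is summed away here."""
    totals = defaultdict(int)
    for product, month, _warehouse, on_hand in inventory:
        totals[(product, month)] += on_hand
    return dict(totals)


def virtual_view(sales, inventory):
    """Line up both measures on the shared Product and Month dimensions."""
    sold = sales_by_month(sales)
    stock = inventory_by_month(inventory)
    rows = []
    for product, month in sorted(set(sold) | set(stock)):
        # None marks a combination that has no facts in one of the two sources.
        rows.append((product, month, sold.get((product, month)), stock.get((product, month))))
    return rows


if __name__ == "__main__":
    for row in virtual_view(SALES, INVENTORY):
        print(row)
```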
The following Mondrian features are not yet functional in Pentaho Analyzer, but are scheduled to be added in the future:
- Parent-child hierarchies render the entire set of members instead of showing them as parents and children.
- Ragged hierarchies currently throw an exception when used in Analyzer.
- Additional Mondrian features not yet available: cube captions, drill-through, parameterization.

A schema is the structure of the relational database and includes tables, fields, views, and more. A cube is a data structure that allows information in a database to be analyzed quickly and from multiple perspectives.

Analyzer visualizations: maps, charts and grids.
