Release 0.7.0
Stefan Urbanek
CONTENTS

1 Introduction
2 Installation
    2.1 From sources
3 Logical Model
4 Logical Model description
    4.1 Load a model
    4.2 Model components
    4.3 Dimensions
    4.4 Attributes
5 Physical Mapping
    5.1 Attribute Mappings
    5.2 Joins
6 Model validation
    6.1 Errors
    6.2 Warnings
7 Aggregations and Aggregation Browsing
8 Creating Cubes
    8.1 Relational Database (SQL)
    8.2 Mongo Backend
    8.3 API
9 Localization
    9.1 Metadata Localization
    9.2 Data Localization
    9.3 Localized Reporting
10 OLAP Web Service
    10.1 API
    10.2 Reports
    10.3 Running and Deployment
11 slicer - Command Line Tool
    11.1 serve
12 Cubes API
    12.1 OLAP Cubes
    12.2 Aggregation Browsers
    12.3 Utility functions
13 Development Notes
    13.1 Fact Table
14 Contact and Getting Help
15 Indices and tables
Python Module Index
Index
Cubes is a framework for Online Analytical Processing (OLAP), multidimensional analysis and aggregated cube computation. It is part of Data Brewery.
CHAPTER
ONE
INTRODUCTION
Focus on data analysis, not on physical data structure.

Cubes is a framework for:

* Online Analytical Processing (OLAP), mostly relational database based (ROLAP)
* multidimensional analysis
* star and snowflake schema denormalisation
* cube computation (see Creating Cubes)

Features:

* logical model (see Logical Model) - description of how data are being analysed and reported, independent of physical data implementation
* hierarchical dimensions (attributes that have hierarchical dependencies, such as category-subcategory or country-region)
* localizable metadata and data (see Localization)

The framework is modular and supports multiple database backends, different ways of cube computation and different ways of browsing aggregated data:

* relational databases with SQL through SQLAlchemy
* document based database in MongoDB
CHAPTER
TWO
INSTALLATION
Optional requirements:

* SQLAlchemy for SQL backend
* Werkzeug for Slicer server

To install cubes, you can use easy_install (from setuptools):
easy_install cubes
or pip:
pip install cubes
2.1 From sources

Main project repository at GitHub: https://github.com/Stiivi/cubes

Bitbucket copy for Mercurial users: https://bitbucket.org/Stiivi/cubes (might lag a little behind GitHub).

Install:

    cd cubes
    python setup.py install
CHAPTER
THREE
CHAPTER
FOUR
LOGICAL MODEL DESCRIPTION
cubes=None,
dimen-
Attributes:

* name - model name
* label - human readable name - can be used in an application
* description - longer human-readable description of the model
* cubes - dictionary of cube descriptions (see below)
* dimensions - dictionary of dimension descriptions (see below)
* locale - locale code of the model

When initializing the Model object, cubes and dimensions might be dictionaries with descriptions. See Cube and Dimension for more information.

add_cube(cube)
    Adds cube to the model and also assigns the model to the cube. If the cube has a model assigned and it is not this model, then an error is raised.

    The cube's dimensions are collected into the model. If the cube has a dimension with the same name as one of the model's existing dimensions, but with a different structure, an exception is raised. Dimensions in the cube should be the same as in the model.

add_dimension(dimension)
    Add dimension to model. Replaces an existing dimension with the same name.

cube(cube)
    Get a cube with name name or coalesce object to a cube.

dimension(obj)
    Get dimension by name or by object.

is_valid(strict=False)
    Check whether the model is valid. The model is considered valid if there are no validation errors. If you want to be sure that there are no warnings as well, set strict to True.

    Args:
        strict: If False only errors are considered fatal, if True also warnings will make the model invalid.

    Returns: boolean flag whether the model is valid or not.

localizable_dictionary()
    Get model locale dictionary - localizable parts of the model.

localize(translation)
    Return localized version of the model.

remove_cube(cube)
    Removes cube from the model.

remove_dimension(dimension)
    Remove a dimension from receiver.

to_dict(**options)
    Return dictionary representation of the model. All object references within the dictionary are name based.

    Options:

    * expand_dimensions - if set to True then fully expand dimension information in cubes
    * full_attribute_names - if set to True then attribute names will be written as dimension_name.attribute_name

validate()
    Validate the model, check for model consistency. Validation result is an array of tuples in form: (validation_result, message) where validation_result can be warning or error.

    Returns: array of tuples

class cubes.model.Cube(name=None, model=None, label=None, measures=None, details=None, dimensions=None, mappings=None, joins=None, fact=None, key=None, description=None, **kwargs)
    Create a new cube.

    Args:
        name (str): cube name
        desc (dict): dict object containing keys label, description, dimensions, ...

    add_dimension(dimension)
        Add dimension to cube. Replaces an existing dimension with the same name.

    dimension(obj)
        Get dimension object. If obj is a string, then the dimension with the given name is returned, otherwise the dimension object is returned if it belongs to the cube.

    remove_dimension(dimension)
        Remove a dimension from receiver. dimension can be either a dimension name or a dimension object.

    to_dict(expand_dimensions=False, with_mappings=True, **options)
        Convert to dictionary.

        Options:

        * expand_dimensions - if set to True then fully expand dimension information

    validate()
        Validate cube. See Model.validate() for more information.
4.3 Dimensions
Dimension descriptions are stored in the model dictionary under the key dimensions. The dimension description contains keys:

Key          Description
name         dimension name
label        human readable name - can be used in an application
levels       dictionary of hierarchy levels
attributes   dictionary of dimension attributes
hierarchies  dictionary of dimension hierarchies
hierarchy    if the dimension has only one hierarchy, you can specify it here

Example:

    {
        "name": "date",
        "label": "Dátum",
        "levels": { ... },
        "attributes": [ ... ],
        "hierarchies": { ... }
    }
Use either hierarchies or hierarchy; using both results in an error.

Hierarchy levels are described:

Key              Description
label            human readable name - can be used in an application
key              key field of the level (customer number for customer level, region code for region level, year-month for month level). key will be used as a grouping field for aggregations. Key should be unique within level.
label_attribute  name of attribute containing label to be displayed (customer name for customer level, region name for region level, month name for month level)
attributes       list of other additional attributes that are related to the level. The attributes are not being used for aggregations, they provide additional useful information.

Example of month level of date dimension:
"month": { "label": "Mesiac", "key": "month",
Hierarchies are described:

Key     Description
label   human readable name - can be used in an application
levels  ordered list of level names from top to bottom - from least detailed to most detailed (for example: from year to day, from country to city)
Example:

    "hierarchies": {
        "default": {
            "levels": ["year", "month"]
        },
        "ymd": {
            "levels": ["year", "month", "day"]
        },
        "yqmd": {
            "levels": ["year", "quarter", "month", "day"]
        }
    }
4.4 Attributes
Measures and dimension level attributes can be specified either as rich metadata or simply as strings. If only a string is specified, then all attribute metadata will have default values and the label will be equal to the attribute name.

Key      Description
name     attribute name, used in reports
label    human readable name - can be used in an application, localizable
order    natural order of the attribute (optional), can be asc or desc
locales  list of locales in which the attribute values are available (optional)

The optional order is used in aggregation browsing and reporting. If specified, then all queries will have results sorted by this field in the specified direction. Level hierarchy is used to order ordered attributes. Only one ordered attribute should be specified per dimension level, otherwise the behaviour is unpredictable. This natural (or default) order can later be overridden in reports by explicitly specifying another ordering direction or attribute. Explicit order takes precedence over natural order.

For example, you might want to specify that all dates should be ordered by default:

    "attributes": [
        {"name": "year", "order": "asc"}
    ]
Locales is a list of locale names. Say we have a CPV dimension (common procurement vocabulary - EU procurement subject hierarchy) and we are reporting in Slovak, English and Hungarian. The attributes will therefore be specified as:

    "attributes": [
        {"name": "group_code"},
        {"name": "group_name", "order": "asc", "locales": ["sk", "en", "hu"]}
    ]

group name is localized, but group code is not. Also you can see that the result will always be sorted by group name alphabetically in ascending order. See Attribute Mappings for more information about how logical attributes are mapped to the physical sources.

In reports you do not specify a locale for each localized attribute; you specify a locale for the whole report or browsing session. Report queries remain the same for all languages.
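The defaulting rules for attributes given as plain strings can be pictured with a small sketch in plain Python. It is not part of the Cubes API: the helper name expand_attribute and the returned dictionary layout are assumptions made for illustration, built only from the keys described in this section.

```python
def expand_attribute(attribute):
    """Expand an attribute given as a string or a dict into full metadata.

    Hypothetical helper, not part of Cubes.
    """
    if isinstance(attribute, str):
        attribute = {"name": attribute}
    name = attribute["name"]
    order = attribute.get("order")
    if order not in (None, "asc", "desc"):
        raise ValueError("order should be 'asc' or 'desc', got %r" % order)
    return {
        "name": name,
        # The label defaults to the attribute name
        "label": attribute.get("label", name),
        # None means the attribute has no natural order
        "order": order,
        # An empty list means the attribute is not localized
        "locales": list(attribute.get("locales", [])),
    }


attributes = [
    "group_code",
    {"name": "group_name", "order": "asc", "locales": ["sk", "en", "hu"]},
]
expanded = [expand_attribute(attribute) for attribute in attributes]

# Names of attributes that carry a natural order; the text above
# recommends at most one per dimension level
ordered = [attribute["name"] for attribute in expanded if attribute["order"]]
print(ordered)
```

Checking the length of ordered is one simple way to catch a level with more than one ordered attribute before the model is used.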
CHAPTER
FIVE
PHYSICAL MAPPING
In addition to the logical model definition, the model description might contain a physical mapping. The mapping is optional and can be used when backend defaults are not sufficient. It serves mostly for better customisation of the logical-to-physical mapping.

Key       Description
fact      name of a fact table (or collection or dataset, depending on backend)
mappings  dictionary of mapping of logical attribute to physical attribute
joins     list of join specifications

Note: The current implementation of the Cubes framework requires a star or snowflake schema that can be joined into a fully denormalized form. Therefore all localized attributes have to be stored in their own columns. You have to denormalize the data before using them in Cubes.
5.2 Joins
If you are using a star or snowflake schema in a relational database, Cubes requires information on how to join the tables into the star/snowflake. Tables are joined by matching single-column keys. Say we have a fact table named fact_contracts and a dimension table with categories named dm_categories. To join them we define a join specification that names the master key in the fact table and the detail key in the dimension table.

There might be situations when you need to join one detail table more than once. An example of such a situation is a dimension with a list of organisations where the fact table has two organisational references, such as receiver and donor. In this case you specify an alias for the detail table:

    "joins": [
        {
            "master": "fact_contracts.receiver_id",
            "detail": "dm_organisation.id",
            "alias": "dm_receiver"
        },
        {
            "master": "fact_contracts.donor_id",
            "detail": "dm_organisation.id",
            "alias": "dm_donor"
        }
    ]

Note that the order of joins matters: if you have a snowflake and would like to join a deeper detail, then all required tables have to be joined (and properly aliased, if necessary) already. In mappings you refer to table aliases, if you joined with an alias.
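The ordering rule for joins can be checked before a model is handed to a backend. The following is a minimal sketch in plain Python, not part of the Cubes API: the helper check_join_order is hypothetical, and the dm_country table with its country_id key is invented here only to show a deeper snowflake join.

```python
def check_join_order(fact_table, joins):
    """Check that each join's master table is already available.

    Returns the table names (or aliases) in the order in which they
    become available. Raises ValueError for a join whose master table
    has not been joined yet. Hypothetical helper, not part of Cubes.
    """
    available = [fact_table]
    for join in joins:
        master_table = join["master"].split(".")[0]
        detail_table = join["detail"].split(".")[0]
        if master_table not in available:
            raise ValueError("table '%s' is used before it is joined"
                             % master_table)
        # Later joins and mappings refer to the alias, if there is one
        available.append(join.get("alias", detail_table))
    return available


joins = [
    {"master": "fact_contracts.receiver_id",
     "detail": "dm_organisation.id",
     "alias": "dm_receiver"},
    # Hypothetical deeper detail: joined through the alias defined above
    {"master": "dm_receiver.country_id",
     "detail": "dm_country.id",
     "alias": "dm_receiver_country"},
]

print(check_join_order("fact_contracts", joins))
```

Reversing the two joins makes the check fail, because dm_receiver would be referenced before it exists.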
CHAPTER
SIX
MODEL VALIDATION
To validate a model do:
results = model.validate()
This will return a list of tuples (result, message) where result is either warning or error. If the validation result contains errors, the model cannot be used without failures. If there are only warnings, some functionality might fail or might not work as expected.

You can validate a model from the command line:
slicer model validate /path/to/model
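A short sketch of how the (result, message) tuples might be processed follows. The helper names split_results and is_usable are hypothetical, and the sample list is hand-written in the documented form; in real use it would come from model.validate().

```python
def split_results(results):
    """Split (result, message) tuples into error and warning messages."""
    errors = [message for result, message in results if result == "error"]
    warnings = [message for result, message in results if result == "warning"]
    return errors, warnings


def is_usable(results, strict=False):
    """Return True if there are no errors (and no warnings when strict)."""
    errors, warnings = split_results(results)
    if strict:
        return not errors and not warnings
    return not errors


# Hand-written sample in the documented (result, message) form
results = [
    ("warning", "No cubes defined"),
    ("error", "No levels in dimension date"),
]

errors, warnings = split_results(results)
for message in errors:
    print("ERROR: %s" % message)
for message in warnings:
    print("WARNING: %s" % message)
```

The strict flag here mirrors the behaviour described for Model.is_valid(strict=False): warnings are tolerated unless strict is set.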
6.1 Errors
Error: No mappings for cube <cube>
Resolution: Provide mappings dictionary for cube.

Error: No mapping for measure <measure> in cube <cube>
Resolution: Add mapping for the measure into the mappings dictionary.

Error: No levels in dimension <dimension>
Resolution: Define at least one dimension level.

Error: No hierarchies in dimension <dimension>
Resolution: Define at least one hierarchy.

Error: No default hierarchy specified, there is more than one hierarchy in dimension <dimension>
Resolution: Specify a default hierarchy name or name one hierarchy as default.

Error: Level <level> in dimension <dimension> has no attributes
Resolution: Provide level attributes. At least one - the level key.

Error: Key <key> in level <level> in dimension <dimension> is not in attribute list
Resolution: Add key attribute into attribute list or check the key name.

Error: Dimension <dimension> is not a subclass of Dimension class
Resolution: This might happen when the model was constructed programmatically. Check your model construction code.
6.2 Warnings
Warning: No fact specified for cube <cube> (factless cubes are not yet supported, using fact as default dataset/table name)
Resolution: Specify a fact table/dataset, otherwise a table with name fact will be used. View builder will fail if such a table does not exist.

Warning: No mapping for dimension <dimension> attribute <attribute> in cube <cube> (using default mapping)
Resolution: Provide mapping for dimension, otherwise identity mapping will be used (dimension.attribute).

Warning: No default hierarchy name specified in dimension <dimension>, using some autodetect default name
Resolution: Provide default_hierarchy_name. If there is only one hierarchy for the dimension, that one will be used. If there are more hierarchies, the one with name default will be used.

Warning: Default hierarchy <hierarchy> does not exist in dimension <dimension>
Resolution: Check that default_hierarchy refers to an existing hierarchy within that dimension.

Warning: Level <level> in dimension <dimension> has no key attribute specified, first attribute will be used: <first attribute name>
Resolution: Specify key attribute in the dimension level.

Warning: No cubes defined
CHAPTER
SEVEN
AGGREGATIONS AND AGGREGATION BROWSING
To browse localized data, just pass locale to the browser and all results will contain localized values for localizable attributes:
browser = cubes.backends.SQLBrowser(cube, connection, "mft_contracts", locale = "sk")
Following aggregation code is backend-independent. Aggregate all data for year 2009:
    # Start from the whole cube and cut it by year
    full_cube = browser.full_cube()
    cuboid = full_cube.slice("date", [2009])
    results = cuboid.aggregate()
Results will contain one aggregated record. Drill down through a dimension:
    results_cofog = cuboid.aggregate(drill_down = "cofog")
    results_date = cuboid.aggregate(drill_down = "date")
results_cofog will contain all aggregations for the cofog dimension at level 1 within year 2009. results_date will contain all aggregations for month within year 2009.

To drill down and aggregate through a single dimension, the following function prints aggregations at each level of the given dimension:

    def expand_drill_down(dimension_name, path=None):
        # Avoid a shared mutable default argument
        path = path or []
        dimension = cube.dimension(dimension_name)
        hierarchy = dimension.default_hierarchy

        # We are at last level, nothing to drill-down
        if hierarchy.path_is_base(path):
            return

        # Construct cuboid of our interest
        full_cube = browser.full_cube()
        cuboid = full_cube.slice("date", [2009])
        cuboid = cuboid.slice(dimension_name, path)

        # Perform aggregation
        cells = cuboid.aggregate(drill_down = dimension_name)

        # Print results, indented by the depth of the current path
        prefix = " " * len(path)
        for cell in cells:
            cell_path = cell["_cell"][dimension_name]
            current = cell_path[-1]
            print("%s%s: %.1f %d" % (prefix, current, cell["amount_sum"], cell["record_count"]))
            # Recurse into the next level below this cell
            expand_drill_down(dimension_name, cell_path)
The internal key _cell contains a dictionary with the aggregated cell reference in form {dimension: path}, like {"date": [2010, 1]}.

Note: The output record from aggregations will change into an object instead of a dictionary in the future. The equivalent of the _cell key will be provided as an object attribute.

Assume we have two levels of date hierarchy: year, month. To get all time-based drill down:
expand_drill_down("date")
    2008: 1200.0 60
     1: 100.0 10
     2: 200.0 5
     3: 50.0 1
    ...
    2009: 2000.0 10
     1: 20.0 10
    ...
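To make the shape of drilled-down cells and of the _cell reference more concrete, here is a toy in-memory sketch. It is not how any Cubes backend computes aggregates: the FACTS list, the drill_down helper and the single amount measure are assumptions made for illustration, with result keys named after those used in the function above.

```python
from collections import defaultdict

# Invented sample facts; each fact carries a full [year, month] date path
FACTS = [
    {"date": [2008, 1], "amount": 100.0},
    {"date": [2008, 2], "amount": 200.0},
    {"date": [2009, 1], "amount": 20.0},
]


def drill_down(facts, dimension, path):
    """Aggregate facts under path, grouped by the next level of dimension."""
    depth = len(path)
    groups = defaultdict(lambda: {"amount_sum": 0.0, "record_count": 0})
    for fact in facts:
        fact_path = fact[dimension]
        # Skip facts outside of the cut or without a deeper level
        if fact_path[:depth] != list(path) or len(fact_path) <= depth:
            continue
        cell = groups[tuple(fact_path[:depth + 1])]
        cell["amount_sum"] += fact["amount"]
        cell["record_count"] += 1
    # Attach the cell reference in the {dimension: path} form
    return [dict(cell, _cell={dimension: list(key)})
            for key, cell in sorted(groups.items())]


by_year = drill_down(FACTS, "date", [])
by_month_2008 = drill_down(FACTS, "date", [2008])

for cell in by_year:
    print(cell["_cell"]["date"], cell["amount_sum"], cell["record_count"])
```

Each returned cell can be fed back as the next path, which is the same recursion the expand_drill_down function performs against a real browser.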
CHAPTER
EIGHT
CREATING CUBES
The Cubes framework provides functionality for denormalisation and for cube pre-computation. Currently the SQL backend supports denormalisation only and the Mongo backend supports cube pre-computation.
    import pymongo
    import cubes

    # Connect to the database
    connection = pymongo.Connection()
    database = connection["wdmmg_dev"]

    # Load model and get cube
    model_path = "wdmmg_model.json"
    model = cubes.model_from_path(model_path)
    cube = model.cubes["wdmmg"]

    # Create cube builder: facts are read from collection named "entry",
    # aggregations are inserted into collection named "cube"
    builder = cubes.builders.MongoSimpleCubeBuilder(cube, database,
                                                    fact_collection = "entry",
                                                    cube_collection = "cube")

    # Compute the cube!
    builder.compute()
8.3 API
See Also:

Module cubes.backends
    More information about cube builders in different database environments.

Module cubes
    Logical model description - required for preaggregated cube computation.
CHAPTER
NINE
LOCALIZATION
Having its origin in multi-lingual Europe, one of the main features of the Cubes framework is the ability to provide localizable results. There are three levels of localization in each analytical application:

1. Application level - such as buttons or menus
2. Metadata level - such as table header labels
3. Data level - table contents, such as names of categories or procurement types

Figure 9.1: Localization levels.

The application level is out of scope of this framework and is covered by internationalization (i18n) libraries, such as gettext. What is covered in Cubes is the metadata and data level.

Localization in cubes is very simple:

1. Create master model definition and specify locale the model is in
2. Specify attributes that are localized (see Attribute Mappings)
3. Create model translations for each required language
4. Make cubes function or a tool create translated versions of the master model

To create a localized report, just specify the locale to the browser and create reports as if the model was not localized. See Localized Reporting.
If a translation of a metadata attribute is missing, then the one in the master model description is used. In our case we have the following files:

    procurements.json
    procurements_en.json
    procurements_hu.json

Figure 9.2: Localization master model and translation files.

To load a model:
Or you can get translated version of the model by directly passing translation dictionary:
    import json

    # Read the translation dictionary from the translation file
    with open("procurements_en.json") as handle:
        trans = json.load(handle)

    model_en = model.translate("en", trans)
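The fallback rule for missing translations (the master model value is used) can be pictured with plain dictionaries. This is only a sketch of the rule, not the Cubes implementation: the helper apply_translation is hypothetical, and the flat dictionaries below are an assumed simplification, since the structure of real translation files is not shown here.

```python
def apply_translation(master, translation):
    """Return master metadata with translated values where available.

    Keys missing from the translation keep their master values.
    Hypothetical helper working on flat dictionaries.
    """
    localized = dict(master)
    localized.update(translation)
    return localized


# Assumed, simplified metadata of a single level in the master locale
master = {"label": "Mesiac", "description": "Mesiac v roku"}
# The English translation provides only the label
translation_en = {"label": "Month"}

localized = apply_translation(master, translation_en)
print(localized["label"])
print(localized["description"])
```

The description stays in the master language because the translation does not provide it, which is the behaviour described for missing metadata translations.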
CHAPTER
TEN
OLAP WEB SERVICE
10.1 API
10.1.1 Model
GET /model
    Get model metadata as JSON.

GET /model/dimension/<name>
    Get dimension metadata as JSON.

GET /model/dimension/<name>/levels
    Get list of level metadata from the default hierarchy of the requested dimension.
10.1.2 Cube
Cube API calls have format: /cube/<cube_name>/<browser_action> where the browser action might be aggregate, facts, fact, dimension and report.

GET /cube/<cube>/aggregate
    Return aggregation result as JSON. The result will contain keys: summary and drilldown. The summary contains one row and represents aggregation of the whole cuboid specified in the cut. The drilldown contains rows for each value of the drilled-down dimension. If no arguments are given, then the whole cube is aggregated.

    Parameters:

    * cut - specification of cuboid, for example: cut=date:2004,1|category:2|entity:12345
    * drilldown - dimension to be drilled down. For example drilldown=date will give rows for each value of the next level of dimension date. You can explicitly specify the level to drill down to in form dimension:level, such as: drilldown=date:month
    * page - page number for paginated results
    * pagesize - size of a page for paginated results
    * order - list of attributes to be ordered by
    * limit - limit number of results in form limit[,measure[,order_direction]], for example: limit=5:received_amount_sum:asc

    Reply:

    * summary - dictionary of fields/values for summary aggregation
    * drilldown - list of drilled-down cells
    * remainder - summary of remaining cells (not in drilldown), if limit is specified. Not implemented yet
    * total_cell_count - number of total cells in drilldown (after limit, before pagination)

    If pagination is used, then drilldown will not contain more than pagesize cells. Note that not all backends might implement total_cell_count, and providing this information can be configurable and therefore might be disabled (for example for performance reasons).

GET /cube/<cube>/facts
    Return all facts (details) within cuboid.

    Parameters:

    * cut - see /aggregate
    * page, pagesize - paginate results
    * order - order results
    * format - result format: json (default; see note below), csv
    * fields - comma separated list of fact fields, by default all fields are returned

    Note: Number of facts in JSON is limited to the configuration value of json_record_limit, which is 1000 by default. To get more records, either use pages with size less than the record limit or use an alternate result format, such as csv.

GET /cube/<cube>/fact/<id>
    Get single fact with specified id. For example: /fact/1024

GET /cube/<cube>/dimension/<dimension>
    Get values for attributes of a dimension.

    Parameters:

    * depth - specify depth (number of levels) to retrieve. If not specified, then all levels are returned
    * cut - see /aggregate
    * page, pagesize - paginate results
    * order - order results

POST /cube/<cube>/report
    Process multiple requests within one API call. The POST data should be a JSON containing a report specification where keys are names of queries and values are dictionaries describing the queries. report expects the Content-type header to be set to application/json. See Reports for more information.

GET /cube/<cube>/search/dimension/<dimension>/<query>
    Search values of dimensions for query. If dimension is _all then all dimensions are searched.

    Returns search results as a list of dictionaries with attributes:

    * dimension - dimension name
    * level - level name
    * depth - level depth
    * level_key - value of key attribute for level
    * attribute - dimension attribute name where searched value was found
    * value - value of dimension attribute that matches search query
    * path - dimension hierarchy path to the found value
    * level_label - label for dimension level (value of label_attribute for level)

    Warning: Not yet fully implemented, just a proposal.

GET /cube/<cube>/drilldown/<dimension>/<path>
    Aggregate next level of dimension. This is similar to /aggregate with the drilldown=<dimension> parameter. It does not result in an error when path has the largest possible length; it returns empty results and result count 0 instead.

    If <path> is specified, it replaces any path specified in the cut= parameter for the given dimension. If <path> is not specified, it is taken from cut, where it should be represented as a point (not range nor set).

    In addition to the /aggregate result, the following is returned:

    * is_leaf - flag determining whether path refers to a leaf or not. For example, this flag can be used to determine whether to create links (is not last) or not (is last)
    * dimension - name of drilled dimension
    * path - path passed to drilldown

    In addition to this, each returned cell contains additional attributes:

    * _path - path to the cell - can be used for constructing further browsable links

    Note: Not yet implemented

Parameters that can be used in any request:

* prettyprint - if set to true, formatting spaces are added to JSON output
Dimension name is followed by colon :, each dimension cut is separated by |, and path for dimension levels is separated by a comma ,. Or in a more formal way, here is the BNF for the cut:

    <list>      ::= <cut> | <cut> "|" <list>
    <cut>       ::= <dimension> : <path>
    <dimension> ::= <identifier>
    <path>      ::= <value> | <value> , <path>
Why are dimension names not URL parameters? This prevents conflicts with other possible frequent URL parameters that might modify page content/API result, such as type, form, source.
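The cut syntax is small enough to parse by hand. The following sketch is written directly from the grammar and is not taken from the Cubes server code; the function names parse_cut and format_cut are hypothetical, and path values are left as strings because the grammar does not say how values are typed.

```python
def parse_cut(cut_string):
    """Parse 'date:2004,1|category:2' into a list of (dimension, path)."""
    cuts = []
    for cut in cut_string.split("|"):
        dimension, separator, path = cut.partition(":")
        if not separator or not dimension or not path:
            raise ValueError("invalid cut: %r" % cut)
        # Path values are kept as strings
        cuts.append((dimension, path.split(",")))
    return cuts


def format_cut(cuts):
    """Inverse of parse_cut: build the cut string from (dimension, path)."""
    return "|".join("%s:%s" % (dimension, ",".join(str(v) for v in path))
                    for dimension, path in cuts)


cuts = parse_cut("date:2004,1|category:2|entity:12345")
print(cuts)
print(format_cut(cuts))
```

Keeping a formatter next to the parser makes it easy for an application to modify one dimension's path and emit a new cut for the next link.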
Following image contains examples of cuts in URLs and how they change by browsing cube aggregates:
Figure 10.1: Example of how cuts in URL work and how they should be used in application view templates.
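A client usually composes these URLs programmatically. The sketch below builds a GET URL for /cube/<cube>/aggregate from the documented parameters; the helper aggregate_url is hypothetical, the host and cube name are taken from the examples in this chapter, and no request is actually sent. Note that urlencode percent-encodes : and , in parameter values, which is standard URL escaping.

```python
try:
    from urllib.parse import urlencode      # Python 3
except ImportError:
    from urllib import urlencode            # Python 2


def aggregate_url(base_url, cube, cut=None, drilldown=None,
                  page=None, pagesize=None):
    """Compose a GET URL for /cube/<cube>/aggregate (hypothetical helper)."""
    params = []
    if cut:
        params.append(("cut", cut))
    if drilldown:
        params.append(("drilldown", drilldown))
    if page is not None:
        params.append(("page", page))
    if pagesize is not None:
        params.append(("pagesize", pagesize))

    url = "%s/cube/%s/aggregate" % (base_url.rstrip("/"), cube)
    if params:
        url += "?" + urlencode(params)
    return url


url = aggregate_url("http://localhost:5000", "contracts",
                    cut="date:2004,1", drilldown="date:month", pagesize=10)
print(url)
```

Building the parameter list explicitly keeps the parameter order stable, which helps when URLs are compared or cached.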
10.2 Reports
Report queries are done either by specifying a report name in the request URL or by using an HTTP POST request where the posted data are JSON with a report specification. If a report name is specified in a GET request instead, then the server should have a repository of named report specifications.

Keys:

* queries - dictionary of named queries

Query specification:

* query - query type: aggregate, details (list of facts), values for dimension values, facts or fact for multiple or single fact respectively

Note that you have to set the content type to application/json.

The result is a dictionary where keys are the query names specified in the report specification and values are result values from each query call.

Example: report.json:

    {
        "summary": {
            "query": "aggregate"
        },
        "by_year": {
            "query": "aggregate",
            "drilldown": ["date"],
            "rollup": "date"
        }
    }
Request:
    curl -H "Content-Type: application/json" --data-binary "@report.json" \
        "http://localhost:5000/cube/contracts/report?prettyprint=true&cut=date:2004"
Reply:

    {
        "by_year": {
            "total_cell_count": 6,
            "drilldown": [
                {
                    "record_count": 4390,
                    "requested_amount_sum": 2394804837.56,
                    "received_amount_sum": 399136450.0,
                    "date.year": "2004"
                },
                ...
                {
                    "record_count": 265,
                    "requested_amount_sum": 17963333.75,
                    "received_amount_sum": 6901530.0,
                    "date.year": "2010"
                }
            ],
            "remainder": {},
            "summary": {
                "record_count": 33038,
                "requested_amount_sum": 2412768171.31,
                "received_amount_sum": 2166280591.0
            }
        },
        "summary": {
            "total_cell_count": null,
            "drilldown": {},
            "remainder": {},
10.2.1 Roll-up
Report queries might contain a rollup specification which will result in rolling-up one or more dimensions to the desired level. This functionality is provided for cases when you would like to report at a higher level of aggregation than the cell you provided is in. It works in a similar way as drill down in /aggregate but in the opposite direction (it is like cd .. in a UNIX shell).

Example: You are reporting for year 2010, but you want to have a bar chart with all years. You specify rollup:
... "rollup": "date", ...
Roll-up can be:

* a string - single dimension to be rolled up one level
* an array - list of dimension names to be rolled-up one level
* a dictionary where keys are dimension names and values are levels to be rolled up-to
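The three accepted forms of rollup can be brought to a single shape before processing. The following is a sketch under assumptions, not the server's implementation: the helpers normalize_rollup and roll_up_one_level are hypothetical, None is used here to mean "one level up", and rolling up one level is modelled as dropping the last element of a dimension path, in the spirit of the cd .. analogy.

```python
def normalize_rollup(rollup):
    """Return a dict mapping dimension name to target level.

    A level of None stands for 'roll up one level' (an assumption
    of this sketch).
    """
    if rollup is None:
        return {}
    if isinstance(rollup, str):
        return {rollup: None}
    if isinstance(rollup, dict):
        return dict(rollup)
    # Anything else is treated as a list of dimension names
    return dict((dimension, None) for dimension in rollup)


def roll_up_one_level(path):
    """Drop the last element of a path, like 'cd ..' in a shell."""
    return list(path[:-1])


print(normalize_rollup("date"))
print(normalize_rollup(["date", "cofog"]))
print(normalize_rollup({"date": "year"}))

# Reporting for year 2010 and rolling up one level gives the empty path,
# so drilling down from there yields all years
print(roll_up_one_level([2010]))
```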
Run the server using the Slicer tool (see slicer - Command Line Tool):
slicer serve grants_config.json
Place the file in the same directory as the following WSGI script (for convenience). Create a WSGI script /var/www/wsgi/olap/procurements.wsgi:

    import sys
    import os.path
    import ConfigParser

    # The configuration file is expected next to this script
    CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
    CONFIG_PATH = os.path.join(CURRENT_DIR, "procurements.ini")

    try:
        config = ConfigParser.SafeConfigParser()
        config.read(CONFIG_PATH)
    except Exception as e:
        raise Exception("Unable to load configuration: %s" % e)

    import cubes.server

    # WSGI entry point
    application = cubes.server.Slicer(config)
Reply:

    {
        "drilldown": {},
        "remainder": {},
        "summary": {
            "date.year": "2004",
            "received_amount_sum": 399136450.0,
            "requested_amount_sum": 2394804837.56,
            "record_count": 4390
        }
    }
10.3.4 Configuration

Server configuration is stored in .ini files with sections:

[server] - server related configuration, such as host, port

* host - host where the server runs, defaults to localhost
* port - port on which the server listens, defaults to 5000
* log - path to a log file
* log_level - level of log details, from least to most: error, warn, info, debug
* json_record_limit - number of rows to limit when generating JSON output with iterable objects, such as facts. Default is 1000. It is recommended to use an alternate response format, such as CSV, to get more records.

[model] - model and cube configuration

* path - path to model .json file
* locales - comma separated list of locales the model is provided in. Currently this variable is optional and it is used only by the experimental sphinx search backend.

[db] - relational database configuration

* url - database URL in form: adapter://user:password@host:port/database
* schema - schema containing denormalized views for relational DB cubes
* view_prefix, view_suffix - prefix and suffix for view or table containing cube facts, name is constructed by concatenating prefix + cube name + suffix

[translations] - model translation files, option keys in this section are locale names and values are paths to model translation files. See Localization for more information.

Example configuration file:

    [server]
    host: localhost
    port: 5001
    reload: yes
    log: /var/log/cubes.log
    log_level: info

    [db]
    url: postgresql://localhost/data
    view: contracts
    schema: cubes

    [model]
    path: ~/models/contracts_model.json
    cube: contracts
    locales: en,sk

    [translations]
    sk: ~/models/contracts_model-sk.json
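The documented defaults can be applied when reading such a file. The sketch below assumes Python 3, where the module is named configparser (the WSGI script above uses the Python 2 name ConfigParser); the helper load_server_settings and the shape of the returned dictionary are hypothetical and only mirror the defaults listed in this section.

```python
import configparser

# A shortened copy of the example configuration, kept inline so the
# sketch does not depend on a file
EXAMPLE = """
[server]
host: localhost
port: 5001
log_level: info

[model]
path: ~/models/contracts_model.json
locales: en,sk
"""


def load_server_settings(text):
    """Read selected options, falling back to the documented defaults."""
    config = configparser.ConfigParser()
    config.read_string(text)
    settings = {
        "host": config.get("server", "host", fallback="localhost"),
        "port": config.getint("server", "port", fallback=5000),
        "json_record_limit": config.getint("server", "json_record_limit",
                                           fallback=1000),
        "locales": [],
    }
    if config.has_option("model", "locales"):
        # locales is a comma separated list
        locales = config.get("model", "locales")
        settings["locales"] = [name.strip() for name in locales.split(",")
                               if name.strip()]
    return settings


settings = load_server_settings(EXAMPLE)
print(settings)
```

Options that are absent, such as json_record_limit in this sample, come back with the default stated in the option list.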
CHAPTER
ELEVEN
SLICER - COMMAND LINE TOOL
or:
slicer command sub_command [sub_command_options]
Commands are:

Command         Description
serve           Start OLAP server
model validate  Validates logical model for OLAP cubes
model json      Create JSON representation of a model (can be used when model is a directory)
build           Build OLAP cube from source data using model
11.1 serve
Run Cubes OLAP HTTP server.

Example server configuration file config.json:

    {
        "port": 5000,
        "model": "contracts.json",
        "cube": "contracts",
        "view": "ft_contracts",
        "connection": "postgres://localhost/contracts"
    }
Note: Currently the connection can be only a SQL database connection. Access to other existing backends from this tool will be added in the future. To run local server:
For more information about the OLAP HTTP server see OLAP Web Service.
CHAPTER
TWELVE
CUBES API
Contents:
Attributes: name - model name label - human readable name - can be used in an application description - longer human-readable description of the model cubes - dictionary of cube descriptions (see below) dimensions - dictionary of dimension descriptions (see below) locale - locale code of the model When initializing the Model object, cubes and dimensions might be dictionaries with descriptions. See Cube and Dimension for more information.
add_cube(cube)
    Adds cube to the model and also assigns the model to the cube. If the cube has a model assigned and it is not this model, then an error is raised. The cube's dimensions are collected to the model. If the cube has a dimension with the same name as one of the model's existing dimensions, but with a different structure, an exception is raised. Dimensions in the cube should be the same as in the model.
add_dimension(dimension)
    Add dimension to the model. Replaces an existing dimension with the same name.
cube(cube)
    Get a cube by name or coalesce object to a cube.
dimension(obj)
    Get dimension by name or by object.
is_valid(strict=False)
    Check whether model is valid. Model is considered valid if there are no validation errors. If you want to be sure that there are no warnings as well, set strict to True.
    Args:
        strict: If False only errors are considered fatal, if True also warnings will make model invalid.
    Returns: boolean flag whether model is valid or not.
localizable_dictionary()
    Get model locale dictionary - localizable parts of the model
localize(translation)
    Return localized version of model
remove_cube(cube)
    Removes cube from the model
remove_dimension(dimension)
    Remove a dimension from receiver
to_dict(**options)
    Return dictionary representation of the model. All object references within the dictionary are name based.
    Options:
        expand_dimensions - if set to True then fully expand dimension information in cubes
        full_attribute_names - if set to True then attribute names will be written as dimension_name.attribute_name
validate()
    Validate the model, check for model consistency. Validation result is an array of tuples in the form (validation_result, message), where validation_result can be "warning" or "error".
    Returns: array of tuples

class cubes.Dimension(name=None, label=None, levels=None, attributes=None, hierarchy=None, description=None, **desc)
    Create a new dimension

    default_hierarchy
        Get default hierarchy specified by default_hierarchy_name. If the variable is not set, then get a hierarchy with name "default".
    flat_hierarchy(level)
        Return the only hierarchy for the only level.
    has_details
        Returns True when each level has only one attribute, usually key.
    is_flat
        Return True if dimension has only one level
    level(obj)
        Get level by name.
    levels
        Get list of all dimension levels. Order is undefined.
    to_dict(**options)
        Return dictionary representation of the dimension
    validate()
        Validate dimension. See Model.validate() for more information.

class cubes.Hierarchy(name=None, levels=None, label=None, dimension=None)
    Dimension hierarchy

    Attributes:
        name: hierarchy name
        label: human readable name
        levels: ordered list of levels from dimension

    levels_for_path(path, drilldown=False)
        Returns levels for given path. If path is longer than hierarchy levels, an exception is raised.
    next_level(level)
        Returns next level in hierarchy after level. If level is the last level, returns None.
    path_is_base(path)
        Returns True if path is base path for the hierarchy. Base path is a path where there are no more levels to be added - no drill down possible.
    previous_level(level)
        Returns previous level in hierarchy before level. If level is the first level, returns None.
    rollup(path, level=None)
        Rolls up the path to the level. If level is None then path is rolled up only one level. If level is deeper than the last level of path, an exception is raised. If level is the same as the path level, nothing happens.
    to_dict(**options)
        Convert to dictionary

class cubes.Level(name=None, key=None, attributes=None, null_value=None, label=None, label_attribute=None, dimension=None)
    Hierarchy level

    Attributes:
        name: level name
        label: human readable label
        key: key field of the level (customer number for customer level, region code for region level, year-month for month level). key will be used as a grouping field for aggregations. Key should be unique within level.
        label_attribute: name of attribute containing label to be displayed (customer_name for customer level, region_name for region level, month_name for month level)
        attributes: list of other additional attributes that are related to the level. The attributes are not used for aggregations, they provide additional useful information.

    to_dict(full_attribute_names=False, **options)
        Convert to dictionary

class cubes.Attribute(name, label=None, locales=None, order=None, description=None, **kwargs)
    Create an attribute.

    Attributes
        name - attribute name, used as identifier
        label - attribute label displayed to a user
        locales - list of locales that the attribute is localized to
        order - default order of this attribute. If not specified, then the order is undefined. Possible values are: "asc"/"ascending" or "desc"/"descending". It is recommended and safe to use Attribute.ASC and Attribute.DESC

    full_name(dimension, locale=None)
        Return full name of an attribute as if it was part of dimension. Append locale if it is one of the attribute's locales, otherwise raise an error. If no locale is specified and the attribute is localized, then the first locale from the list of locales is used.

cubes.attribute_list(attributes)
    Create a list of attributes from a list of strings or dictionaries.
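The path operations described for Hierarchy earlier in this chapter (levels_for_path, rollup, path_is_base, next_level) can be illustrated with plain lists. This is a hedged sketch, not the library's implementation: levels are represented by their names, the functions are free-standing stand-ins for the methods, and the assumption that drilldown=True adds one level below the path is not stated in the text above.

```python
def levels_for_path(levels, path, drilldown=False):
    """Levels covered by path; with drilldown=True also the next level
    (assumed meaning of the flag)."""
    depth = len(path) + (1 if drilldown else 0)
    if depth > len(levels):
        raise ValueError("path is longer than hierarchy levels")
    return levels[:depth]


def next_level(levels, level):
    """Level after `level`, or None if `level` is the last one."""
    index = levels.index(level)
    return levels[index + 1] if index + 1 < len(levels) else None


def path_is_base(levels, path):
    """True if no more levels can be added to path (no drill down possible)."""
    return len(path) == len(levels)


def rollup(levels, path, level=None):
    """Roll path up to `level`, or by one level if `level` is None."""
    if level is None:
        return path[:-1]
    depth = levels.index(level) + 1
    if depth > len(path):
        raise ValueError("level is deeper than the last level of path")
    return path[:depth]


date_levels = ["year", "month", "day"]
path = [2010, 10, 5]

one_up = rollup(date_levels, path)            # rolled up by one level
to_year = rollup(date_levels, path, "year")   # rolled up to the year level
```

With the three-level date hierarchy, the path [2010, 10, 5] is a base path, rolling it up once gives [2010, 10], and rolling it up to "year" gives [2010].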
slice(dimension, path)
    Create another cell by slicing the receiving cell through dimension at path. The receiving object is not modified. If a cut with dimension exists, it is replaced with the new one. If path is an empty list or None, then the cut for the given dimension is removed.

    Example:
        from cubes import Cell

        # cube is assumed to be a cube object taken from a loaded model
        full_cube = Cell(cube)
        contracts_2010 = full_cube.slice("date", [2010])
    Returns: new derived cell object.

class cubes.PointCut(dimension, path)
    Object describing way of slicing a cube (cell) through a point in a dimension

class cubes.AggregationBrowser(cube)
    Class for browsing data cube aggregations

    Attributes
        cube - cube for browsing

    aggregate(cell, measures=None, drilldown=None, **options)
        Return aggregate of a cell. Subclasses of aggregation browser should implement this method.

        Attributes
            drilldown - dimensions and levels through which to drill down, default None
            measures - list of measures to be aggregated. By default all measures are aggregated.

        Drill down can be specified in two ways: as a list of dimensions or as a dictionary. If it is specified as a list of dimensions, then the cell is going to be drilled down on the next level of the specified dimension. Say you have a cell for year 2010 and you want to drill down by months, then you specify drilldown = ["date"].

        If drilldown is a dictionary, then the key is a dimension or dimension name and the value is the last level to be drilled down by. If the cell is at year level and drill down is: { "date": "day" } then both month and day levels are added.

        If there are no more levels to be drilled down, an exception is raised. Say your model has three levels of the date dimension: year, month, day. If the cell is already at the day level and you try to drill down by date, then ValueError will be raised.

        Returns an AggregationResult object.

    dimension_object(dimension)
        Helper function to return proper dimension object as a subclass of Dimension.

        Warning: Deprecated. Use cubes.Cube.dimension()

        Arguments
            dimension - a dimension object or a string, if it is a string, then dimension object is retrieved from cube

    fact(key)
        Returns a single fact from cube specified by fact key key
    facts(cell, **options)
        Return an iterable object with all facts within cell
    report(cell, report)
        Creates multiple outputs specified in the report.
        report is a dictionary with multiple aggregation browser queries. Keys are custom names of queries which the requestor can later use to retrieve the respective query result. Values are dictionaries specifying single query arguments. Each query should contain at least one required value, query, which contains the name of the query function: aggregate, facts, fact or values. The rest of the values are function specific, please refer to the respective function documentation for more information.

        Result is a dictionary where keys will be the query names specified in the report specification and values will be result values from each query call.

        This method provides a convenient way to perform multiple common queries at once, for example you might want to always have on a page: total transaction count, total transaction amount, drill-down by year and drill-down by transaction type.

        Roll-up

        Report queries might contain a rollup specification which will result in rolling up one or more dimensions to the desired level. This functionality is provided for cases when you would like to report at a higher level of aggregation than the cell you provided is in. It works in a similar way as drill down in AggregationBrowser.aggregate() but in the opposite direction (it is like cd .. in a UNIX shell).

        Example: You are reporting for year 2010, but you want to have a bar chart with all years. You specify rollup:
... "rollup": "date", ...
        Roll-up can be:
            a string - single dimension to be rolled up one level
            an array - list of dimension names to be rolled up one level
            a dictionary where keys are dimension names and values are levels to be rolled up to

        Future

        In the future there might be optimisations added to this method, so it might become faster than subsequent separate requests. Also, when used with the Slicer OLAP service server, the overhead of multiple HTTP calls is reduced.

    values(cell, dimension, depth=None, paths=None, **options)
        Return values for dimension with level depth depth. If depth is None, all levels are returned.

        Note: Currently only default hierarchy is used.
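As an illustration of the report structure and of the three accepted roll-up forms described for report() above, here is a small sketch. The query names "summary" and "by_year" and the helper normalize_rollup are hypothetical; the helper only shows one possible way to bring a string, a list or a dictionary into a single dictionary form (with None standing for "one level up") and is not taken from the library.

```python
def normalize_rollup(rollup):
    """Hypothetical helper: map every roll-up form to {dimension: level}.

    A level of None stands for "roll up by one level".
    """
    if rollup is None:
        return {}
    if isinstance(rollup, str):
        # a string - single dimension to be rolled up one level
        return {rollup: None}
    if isinstance(rollup, dict):
        # a dictionary - dimension names mapped to target levels
        return dict(rollup)
    # an array - list of dimension names to be rolled up one level
    return {name: None for name in rollup}


# A report specification: keys are custom query names, each value holds the
# required "query" key plus arguments specific to that query function.
report = {
    "summary": {"query": "aggregate"},
    "by_year": {
        "query": "aggregate",
        "drilldown": ["date"],
        "rollup": "date",
    },
}

rollups = {name: normalize_rollup(spec.get("rollup"))
           for name, spec in report.items()}
```

Here rollups["by_year"] is {"date": None}, meaning the date dimension is rolled up by one level before drilling down, while "summary" has no roll-up at all.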
        connection - database connection, default None if you want only to create a SELECT statement
        dimension_table_prefix - default prefix for dimension tables - used if there is no mapping for a dimension attribute. Say you have dimension supplier and field name and dimension table prefix dm_, then the default physical mapping for that field would be: dm_supplier.name

    create_view(view_name, schema=None, index=False, materialize=True)
        Creates a view.

        Arguments
            view_name - name of a view or a table to be created
            schema - target database schema
            index - create indexes on level key columns if True. Default False
            materialize - create materialized view (currently as table) if True (default)

    denormalized_view()
        Returns SQLAlchemy expression representing select from denormalized view.
    split_field(field)
        Split field into table and field name: before the first . is the table name, everything else is the field name. If there is no ., then the table name is None.
    table(table_name)
        Get a table with name table_name. If the table was not yet collected (while collecting joins) then raise an exception. If alias is specified, then the table will be registered as known under that alias.

class cubes.backends.SQLBrowser(cube, connection=None, view_name=None, schema=None, view=None, locale=None)
    Create a browser.

    Attributes
        cube - cube object to be browsed
        connection - SQLAlchemy database connection object
        view_name - name of denormalized view (might be VIEW or TABLE)
        view - SQLAlchemy view/table object
        locale - locale to be used for localized attributes

    To initialize the SQL browser you should provide either a connection, view_name and optionally schema, or a view.

    aggregate(cell, measures=None, drilldown=None, order=None, **options)
        See cubes.browsers.cell.aggregate().
    fact(key)
        Fetch single row based on fact key
    facts(cell, order=None, **options)
        Returns iterable object with facts
    values(cell, dimension, depth=None, order=None, **options)
        Get values for dimension at given path within cell

class cubes.backends.MongoSimpleCubeBrowser(cube, collection, database=None, aggregate_flag_field='_is_aggregate')
    Create a browser.

    Attributes
        cube - cube object to be browsed
        collection - MongoDB collection object or name of a collection
        database - MongoDB database. Has to be specified if collection is a name
        aggregate_flag_field - field to identify aggregated records. By default it is _is_aggregate. The collection is generated by cubes.build.MongoSimpleCubeBuilder

    aggregate(cell, measures=None, drill_down=None)
        See cubes.browsers.cell.aggregate().
    selector_object(cell, drill_dimension=None)
        Return a dictionary object for finding the specified cell. If drill_dimension is set, then a selector for all descendants of the cell through the drill dimension is returned.

class cubes.backends.MongoSimpleCubeBuilder(cube, database, fact_collection, cube_collection=None, measures=None, aggregate_flag_field='_is_aggregate', required_dimensions=['date'])
    Creates simple cube builder in mongo. See MongoSimpleCubeBuilder.compute() for more information about the computation algorithm.

    Attributes
        cube - description of a cube from logical model
        fact_collection - either name or mongo collection containing facts (should correspond to cube definition)
        cube_collection - name or mongo collection where computed cell aggregates will be stored. By default it is the same collection as the fact collection. Make sure to properly set aggregate_flag_field.
        measures - list of attributes that are going to be aggregated. By default it is ['amount']
        aggregate_flag_field - name of field (key) that distinguishes fact records from aggregated records. Should be used when fact collection and cube collection are the same. By default it is _is_aggregate.
        required_dimensions - dimensions that are required for all cells. By default: ['date']

    compute()
        Compute a multidimensional cube. Computed aggregations for cells can be stored either in a separate collection or in the same source - fact collection. Attribute aggregate_flag_field is used to distinguish between facts and aggregated cells.

        Algorithm:
        1. Compute all dimension combinations (for all levels if there are any hierarchies). Each combination is called selector and is represented by a list of tuples: (dimension, levels). For more information see: cubes.util.compute_dimension_cell_selectors().
        2. Compute aggregations for each point within dimension selector. Use MongoDB group function (alternative to map-reduce).
        3. Each record for an aggregated cell is stored in the target collection (see above).

        This is a naive non-optimized method of cube computation: no aggregations are reused for computation.

    compute_cell(selector)
        Compute aggregation for cell specified by selector. The cell is computed using MongoDB aggregate function. Computed records are inserted into cube_collection and they contain:

            key fields used for grouping
            aggregated measures suffixed with _sum, for example: amount_sum
            record count in record_count
            cell selector as _selector (configurable) with dimension names as keys and current dimension levels as values, for example: {"date": ["year", "month"]}
            cell reference as _cell (configurable) with dimension names as keys and level keys forming dimension paths as values, for example: {"date": [2010, 10]}

        Arguments
            selector is a list of tuples: (dimension, level_names)

        Note: Only sum aggregation is being computed. Other aggregations might be implemented in the future, such as average, min, max, rank, ...

class cubes.backends.SlicerBrowser(url, cube)
    Create a browser.

    Attributes
        cube - name of a cube
        url - base url of Cubes Slicer OLAP server

    aggregate(cell, measures=None, drilldown=None)
        See cubes.browsers.Cell.aggregate().
    fact(key)
        Fetch single row based on fact key
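The record layout listed for MongoSimpleCubeBuilder.compute_cell() above (measures suffixed with _sum, record_count, _selector, _cell) can be illustrated without a database. The following is only an in-memory sketch under stated assumptions: facts are nested dictionaries such as {"date": {"year": 2010, "month": 10}, "amount": 100}, the function name compute_cell_records is hypothetical, and the flattened grouping key fields are omitted for brevity. It does not use MongoDB and is not the builder's actual code.

```python
def compute_cell_records(facts, selector, measures=("amount",)):
    """In-memory sketch of the aggregated records for one selector.

    selector is a list of tuples: (dimension, level_names).
    Only sum aggregation is computed, as in the note above.
    """
    groups = {}
    for fact in facts:
        # Grouping key: the value of every selected level, in selector order.
        key = tuple(fact[dim][level]
                    for dim, levels in selector
                    for level in levels)
        record = groups.get(key)
        if record is None:
            record = {
                "_selector": {dim: list(levels) for dim, levels in selector},
                "_cell": {dim: [fact[dim][level] for level in levels]
                          for dim, levels in selector},
                "record_count": 0,
            }
            for measure in measures:
                record[measure + "_sum"] = 0
            groups[key] = record
        record["record_count"] += 1
        for measure in measures:
            record[measure + "_sum"] += fact[measure]
    return list(groups.values())


facts = [
    {"date": {"year": 2010, "month": 10}, "amount": 100},
    {"date": {"year": 2010, "month": 10}, "amount": 50},
    {"date": {"year": 2010, "month": 11}, "amount": 7},
]
records = compute_cell_records(facts, [("date", ["year", "month"])])
```

For the three sample facts and the selector [("date", ["year", "month"])], two records are produced, one per distinct year-month path.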
Example 2: Take dimensions from example 1 and add a requirement for dimension A (might be date usually). Then the output will contain dimension A in each returned tuple. Tuples without dimension A will be omitted. Output:
    (A, (a))
    (A, (a)), (B, (b))
    (A, (a)), (C, (c))
    (A, (a)), (B, (b)), (C, (c))
Example 3: If there are multiple hierarchies, then all levels are combined. Say we have D with d1, d2, B with b1, b2, and C with c. D (as date) is required: Output:
    (D, (d1))
    (D, (d1, d2))
    (D, (d1)), (B, (b1))
    (D, (d1, d2)), (B, (b1))
    (D, (d1)), (B, (b1, b2))
    (D, (d1, d2)), (B, (b1, b2))
    (D, (d1)), (B, (b1)), (C, (c))
    (D, (d1, d2)), (B, (b1)), (C, (c))
    (D, (d1)), (B, (b1, b2)), (C, (c))
    (D, (d1, d2)), (B, (b1, b2)), (C, (c))
cubes.util.combine_node_levels(nodes)
    Get all possible combinations between each level from each node. It is a cartesian product of the first node's levels and all combinations of the rest of the levels.
cubes.util.combine_nodes(all_nodes, required_nodes=[])
    Create all combinations of nodes. If required_nodes are specified, make them present in each combination.
cubes.util.expand_dictionary(record, separator='.')
    Return expanded dictionary: treat keys as paths separated by separator, create sub-dictionaries as necessary.
cubes.util.get_localizable_attributes(obj)
    Returns a dictionary with localizable attributes of obj.
cubes.util.localize_attributes(attribs, translations)
    Localize list of attributes. translations should be a dictionary with keys as attribute names, values are dictionaries with localizable attribute metadata, such as label or description.
cubes.util.localize_common(obj, trans)
    Localize common attributes: label and description
cubes.util.node_level_points(node)
    Get all level points within given node. Node is described as tuple: (object, levels) where levels is a list or a tuple.
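Two of the helpers described above, expand_dictionary and node_level_points, are simple enough to sketch in plain Python. The functions below are hypothetical re-implementations written from the descriptions only (hence the _sketch suffix); the real functions may differ in return types and edge cases, for example in whether level points are returned as tuples or lists.

```python
def expand_dictionary_sketch(record, separator="."):
    """Treat keys as paths split by separator and build sub-dictionaries."""
    result = {}
    for key, value in record.items():
        parts = key.split(separator)
        current = result
        # Walk or create the intermediate dictionaries for the path.
        for part in parts[:-1]:
            current = current.setdefault(part, {})
        current[parts[-1]] = value
    return result


def node_level_points_sketch(node):
    """All level points of a node given as a tuple (object, levels).

    A level point is the node cut at a given depth, for example
    (D, (d1,)) and (D, (d1, d2)) for a node with levels d1, d2.
    """
    obj, levels = node
    return [(obj, tuple(levels[:depth]))
            for depth in range(1, len(levels) + 1)]


flat = {"date.year": 2010, "date.month": 10, "amount": 5}
nested = expand_dictionary_sketch(flat)
points = node_level_points_sketch(("D", ["d1", "d2"]))
```

The level points of D computed here correspond to the two forms in which D appears in Example 3: once with d1 only and once with d1, d2.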
CHAPTER
THIRTEEN
DEVELOPMENT NOTES
This chapter contains notes related to Cubes development, such as:
    unresolved design decisions
    suggestions
    proposals for changes
    explanation for certain design decisions

I've included this document as part of the documentation to get more feedback and to help explain why certain things are done in a certain way at the time being.
CHAPTER
FOURTEEN
CHAPTER
FIFTEEN