(PDAM)
Submission1
Figure 1 – SQL to automatically populate date dimension table with last 10 years
Figure 2 - Date dimension output of INSERT
For the second schema (Figure 8), the fact subject is the total number of drugs and their cost. The fact table (Figure 3) is structured based on the following reasoning:
Figure 9 - Query 1
Figure 11 shows the output for a small sample data set with 2 counties, 2 cities, 1 treatment per year and only a
few records, which makes the output calculations easy to verify.
Figure 12 shows the same query run against a larger sample of 3 counties, 10 cities and tens of treatments per year,
with over 10,000 records. The sample uses one unit per city, but because unit_key is unique, adding more
than one unit per city would not affect the result (Figure 10).
Figure 13 - Query 2
Query no 3
The following query (Figure 16) lists the top 10 units from two counties with the highest income over the
year. The output is sorted in descending (DESC) order by average treatment cost. The query also returns the highest and
lowest cost of an individual treatment. Figure 16 shows the treatment table data, where the lowest cost is 100 and the
highest is 500.
Query no 4
The following query returns the total number of patients (Figure 18), broken down by
patient type (Inpatient/Outpatient). For each patient it calculates the number of times the patient was
admitted to the hospital, the total number of drugs prescribed and the total price of
those drugs, followed by a grand total.
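The per-patient counts and the grand total row can be tried out on a tiny data set. The sketch below is only an illustration in Python: an in-memory SQLite database stands in for MySQL, the sample rows are invented, and patient_type is assumed to live in dim_prescription. Because SQLite has no WITH ROLLUP, the grand total is emulated with UNION ALL, which is a substitute technique and not the syntax used in Query 4.

```python
import sqlite3

# SQLite in memory stands in for MySQL (assumption); the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY);
    CREATE TABLE dim_prescription (prescription_key INTEGER PRIMARY KEY,
                                   patient_type TEXT);  -- column placement assumed
    CREATE TABLE fact_drugs (patient_key INTEGER, prescription_key INTEGER,
                             total_no_drug INTEGER, total_cost_drug REAL);
    INSERT INTO dim_patient VALUES (1), (2);
    INSERT INTO dim_prescription VALUES
        (10, 'Inpatient'), (11, 'Outpatient'), (12, 'Inpatient');
    INSERT INTO fact_drugs VALUES
        (1, 10, 2, 20.0), (1, 11, 1, 5.0), (2, 12, 4, 40.0);
""")

# Aggregate columns and joins shared by the per-patient rows and the total row.
AGGREGATES = """
    SUM(CASE WHEN pr.patient_type = 'Inpatient'  THEN 1 ELSE 0 END),
    SUM(CASE WHEN pr.patient_type = 'Outpatient' THEN 1 ELSE 0 END),
    SUM(fd.total_no_drug),
    SUM(fd.total_cost_drug)
    FROM fact_drugs fd
    INNER JOIN dim_patient pa ON fd.patient_key = pa.patient_key
    INNER JOIN dim_prescription pr ON fd.prescription_key = pr.prescription_key
"""

# One row per patient, then one 'Grand Total' row over all fact rows.
query = (
    "SELECT CAST(pa.patient_key AS TEXT)," + AGGREGATES + "GROUP BY pa.patient_key "
    "UNION ALL "
    "SELECT 'Grand Total'," + AGGREGATES
)

# Map each label to (inpatient count, outpatient count, drugs, cost).
totals = {row[0]: row[1:] for row in conn.execute(query)}
for label, values in totals.items():
    print(label, values)
```

With this sample, patient 1 has one inpatient and one outpatient record, and the grand total row sums the drug counts and costs of both patients.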
Figure 19 - Query 4
Figure 21 - Query 5
The following query (Figure 23) calculates the percentage change between two given years. It groups by
drug type (category), sums the number of issued drugs and the total cost per type and year, and calculates
the increase or decrease in both totals. The percentage change is calculated with the following
formula: C% = (2ndValue - 1stValue) * 100 ÷ 1stValue
(e.g. the total number of Calcium drugs is 5 in 2009 and 17 in 2010; applying the formula
gives (17 - 5) * 100 / 5 = 240, which means a 240% increase. A positive number is an increase, a negative number is
a decrease). The correct calculations can be observed in the output screenshot (Figure 24).
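As a quick check of the arithmetic, the formula can be sketched in Python. The function name percent_change is illustrative and not part of the original queries; returning 0 for a missing value or a zero baseline mirrors the CASE expression in Query 5.

```python
def percent_change(first_value, second_value):
    """Percentage change from first_value to second_value.

    Returns 0.0 when either value is missing or the baseline is zero,
    mirroring the guard in Query 5 (carried over from the SQL as an assumption).
    """
    if first_value is None or second_value is None or first_value == 0:
        return 0.0
    # Subtract first, then scale: the parentheses matter.
    return round((second_value - first_value) * 100 / first_value, 2)


# Calcium example from the text: 5 drugs in 2009, 17 in 2010.
print(percent_change(5, 17))   # 240.0
# Reversing the two years gives a decrease.
print(percent_change(17, 5))   # -70.59
```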
Note: This query could be simplified by using the common table expression (CTE) syntax WITH, but
this functionality is only available from MySQL 8.0 onwards (MySQL.com, 2018), and MySQL 5.6 was installed on the
development/testing environment.
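To give an idea of what the CTE form might look like, the sketch below writes the yearly aggregation once and joins it to itself. It is an illustration only: an in-memory SQLite database accessed from Python stands in for MySQL 8.0, the table and column names follow Query 5, the sample rows are invented, and COALESCE/NULLIF replace the CASE guard of the original query.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0 (assumption); sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_drug (drug_key INTEGER PRIMARY KEY, drug_type TEXT);
    CREATE TABLE fact_drugs (date_key INTEGER, drug_key INTEGER,
                             total_no_drug INTEGER, total_cost_drug REAL);
    INSERT INTO dim_date VALUES (1, 2009), (2, 2010);
    INSERT INTO dim_drug VALUES (1, 'Calcium');
    INSERT INTO fact_drugs VALUES (1, 1, 5, 50.0), (2, 1, 17, 125.0);
""")

CTE_QUERY = """
WITH yearly AS (              -- the aggregation is written once ...
    SELECT dd.year, drug.drug_type,
           SUM(fd.total_no_drug)   AS total_no_drug,
           SUM(fd.total_cost_drug) AS total_cost_drug
    FROM fact_drugs fd
    INNER JOIN dim_date dd   ON fd.date_key = dd.date_key
    INNER JOIN dim_drug drug ON fd.drug_key = drug.drug_key
    WHERE dd.year BETWEEN 2009 AND 2010
    GROUP BY dd.year, drug.drug_type
)
SELECT a.year, a.drug_type,   -- ... and joined to itself, offset by one year
       COALESCE((a.total_no_drug - b.total_no_drug) * 100.0
                / NULLIF(b.total_no_drug, 0), 0)   AS drug_no_diff,
       COALESCE((a.total_cost_drug - b.total_cost_drug) * 100.0
                / NULLIF(b.total_cost_drug, 0), 0) AS total_cost_diff
FROM yearly a
LEFT JOIN yearly b
       ON a.year = b.year + 1 AND a.drug_type = b.drug_type
ORDER BY a.year
"""

rows = conn.execute(CTE_QUERY).fetchall()
for row in rows:
    print(row)
```

The 2009 row has no previous year to compare against, so both differences fall back to 0; the 2010 row reproduces the 240% increase from the Calcium example.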
Bibliography
Ballard, C., Farrell, D. M., Gupta, A., Mazuela, C., & Vohnik, S. (2006). Dimensional Modeling: In a Business Intelligence
Environment. NY: International Business Machines Corporation IBM Corp.
MySQL.com. (2018). Aggregate (GROUP BY) Function Descriptions. Retrieved Nov 30, 2018, from MySQL Oracle
Corporation: https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
MySQL.com. (2018). WITH Syntax (Common Table Expressions). Retrieved Dec 07, 2018, from MySQL Oracle
Corporation: https://dev.mysql.com/doc/refman/8.0/en/with.html
Nguyen, T., Tjoa, A., & Trujillo, J. (2005). Data Warehousing and Knowledge Discovery: A Chronological View of
Research Challenges. DaWaK 2005, LNCS, 530-535.
Appendices
Query 1
Query 2
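SELECT
    dim_date.year AS 'Year',
    dim_patient.patient_gender AS 'Gender'
    -- any further columns of the original SELECT list are missing from this copy
FROM fact_total_treatment_cost
-- join targets restored from the ON clauses; INNER JOIN is assumed, as in Query 4
INNER JOIN dim_date
    ON fact_total_treatment_cost.date_key = dim_date.date_key
INNER JOIN dim_location
    ON fact_total_treatment_cost.unit_key = dim_location.unit_key
INNER JOIN dim_patient
    ON fact_total_treatment_cost.patient_key = dim_patient.patient_key
INNER JOIN dim_admission
    ON fact_total_treatment_cost.admission_key = dim_admission.admission_key
-- GROUP BY columns inferred from the SELECT list
GROUP BY dim_date.year, dim_patient.patient_gender
WITH ROLLUP;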
Query 3
Query 4
SELECT
    COALESCE(dim_patient.patient_key, 'Grand Total') AS 'Patient ID',
    SUM(CASE WHEN patient_type = 'Inpatient' THEN 1 ELSE 0 END) AS 'Inpatient',
    SUM(CASE WHEN patient_type = 'Outpatient' THEN 1 ELSE 0 END) AS 'Outpatient',
    SUM(fact_drugs.total_no_drug) AS 'Total No of Drugs',
    CONCAT('£ ', SUM(fact_drugs.total_cost_drug)) AS 'Total Drugs Cost'
FROM fact_drugs
INNER JOIN dim_patient
    ON fact_drugs.patient_key = dim_patient.patient_key
INNER JOIN dim_prescription
    ON fact_drugs.prescription_key = dim_prescription.prescription_key
GROUP BY dim_patient.patient_key WITH ROLLUP;
Query 5
SELECT
    A.*,
    -- Percentage change in the number of drugs versus the previous year
    CONCAT(ROUND((CASE WHEN (A.total_no_drug IS NULL OR
                             B.total_no_drug IS NULL OR
                             B.total_no_drug = 0) THEN 0
                       ELSE (A.total_no_drug - B.total_no_drug) * 100 /
                            B.total_no_drug END), 2), ' %') AS 'Drug No Diff',
    -- Percentage change in the total cost versus the previous year
    CONCAT(ROUND((CASE WHEN (A.total_cost_drug IS NULL OR
                             B.total_cost_drug IS NULL OR
                             B.total_cost_drug = 0) THEN 0
                       ELSE (A.total_cost_drug - B.total_cost_drug) * 100 /
                            B.total_cost_drug END), 2), ' %') AS 'Total Cost Diff'
-- A: totals per year and drug type
FROM (SELECT
          SUBSTR(year, 1, 4) year, drug_type,
          SUM(total_no_drug) total_no_drug,
          SUM(total_cost_drug) total_cost_drug
      FROM fact_drugs fd
      INNER JOIN dim_date dd
          ON fd.date_key = dd.date_key
      INNER JOIN dim_drug drug
          ON fd.drug_key = drug.drug_key
      WHERE dd.year BETWEEN 2009 AND 2010
      GROUP BY dd.year, drug.drug_type) A
-- B: the same totals, matched to A with a one-year offset
LEFT JOIN (SELECT
               SUBSTR(year, 1, 4) year, dim_drug.drug_type,
               SUM(total_no_drug) total_no_drug,
               SUM(total_cost_drug) total_cost_drug
           FROM fact_drugs fd
           INNER JOIN dim_date
               ON fd.date_key = dim_date.date_key
           INNER JOIN dim_drug
               ON fd.drug_key = dim_drug.drug_key
           WHERE dim_date.year BETWEEN 2009 AND 2010
           GROUP BY dim_date.year,
                    dim_drug.drug_type) B
    ON A.YEAR = (B.YEAR + 1)
    AND A.drug_type = B.drug_type;