Dimension Model Implementation
Example: 01
I will discuss Star Schema and how to create one using credit-debit transaction dataset in Snowflake data warehouse. This post is more suitable to those who are new to Star Schema or Dimensional Modeling. But if you are a seasoned data modeler, you are more than welcome to read this post and please leave your feedback in the comment section.
So how do we build a Star Schema? As proposed by Kimball, there are 4 steps in designing of a dimensional model.
-
Select the business process. The first step is to identify the business process that you want to model. Model the processes that are most significant or relevant to the business first.
-
Declare the grain. Grain refers to the level of detail of the information that you will store in the fact table. The grain should be at the most atomic or lowest level possible. For example, A line item on a grocery receipt. The grocery owner might want to ask questions such as “what are the items that sold the best during the day in our grocery store?”, and to answer this question we need to dig into line-item level instead of the order-level.
-
Identify the dimensions. You can identify the dimensions by looking at the descriptive information or attributes that exist in your business process and provide context to your measurable events. For example: payment method, customers, locations, etc.
-
Identify the facts. Facts are the quantitative measures in your business process that are always in numeric. For example: price, minutes, speed, etc. You should identify/select the measures that are true to your selected grain.
The dataset has 23 columns but for simplicity, I will exclude 4 irrelevant columns. Here are the descriptions of the columns.
Columns | Descriptions |
---|---|
TRANSACTION_REFERENCE | The transaction identifier for each transaction made by consumer |
USER_REFERENCE | The user identifier of the consumer |
AGE_BAND | The consumer age range |
SALARY_BAND | The consumer salary range. |
POSTCODE | The postcode of where the consumer lives. |
LSOA | Geographical hierarchy: small areas that has similar population size (average of approximately 1,500 residents or 650 households). |
MSOA | Geographical hierarchy: medium areas where the minimum population size is 5000 (average of 7200). |
DERIVED_GENDER | The consumer gender identity. |
TRANSACTION_DATE | The transaction date made by the consumer. |
ACCOUNT_REFERENCE | The consumer bank account identifier. |
PROVIDER_GROUP_NAME | The consumer's bank for executing his/her transactions. |
ACCOUNT_TYPE | The account type: current, savings, etc. |
CREDIT_DEBIT | Type of transaction made by consumer: debit or credit. |
AMOUNT | The amount of transaction. |
AUTO_PURPOSE_TAG_NAME | The transaction purpose. |
MERCHANT_NAME | The merchant's name. |
MERCHANT_BUSINESS_LINE | The merchant's business category. |
ACCOUNT_CREATED_DATE | The date of when the account first created. |
ACCOUNT_LAST_REFRESHED | The date of when the account last updated. |
DIM_USERS
DIM_USERS will store users’ demographic information such as user id, age, salary, gender and address.
CREATE OR REPLACE TABLE analytics.marts_credit_debit_transaction.dim_users AS (
SELECT
DISTINCT user_reference AS user_id,
age_band,
salary_band,
postcode,
LSOA,
MSOA,
derived_gender AS gender
FROM raw_credit_debit_transaction.public_listing.transactions
);
DIM_ACCOUNTS
DIM_ACCOUNTS stores account level attributes such as account id, bank name and account type.
CREATE OR REPLACE TABLE analytics.marts_credit_debit_transaction.dim_accounts AS (
SELECT
DISTINCT account_reference AS account_id,
provider_group_name AS bank_name,
account_type,
account_created_date,
account_last_refreshed
FROM raw_credit_debit_transaction.public_listing.transactions
);
DIM_MERCHANTS
All information about the merchants such as merchant’s name and business category will be stored in DIM_MERCHANTS. The dataset does not provide merchant identifier, so in this case I have decided to create a surrogate key for merchant’s key identifier.
CREATE OR REPLACE TABLE analytics.marts_credit_debit_transaction.dim_merchants AS (
SELECT
DISTINCT HASH(merchant_name, merchant_business_line)::VARCHAR AS merchant_id,
merchant_name,
merchant_business_line
FROM raw_credit_debit_transaction.public_listing.transactions
);
DIM_TRANSACTIONS
DIM_TRANSACTIONS stores information on transaction attributes such as transaction type and purpose.
CREATE OR REPLACE TABLE analytics.marts_credit_debit_transaction.dim_transactions AS (
SELECT
transaction_reference AS transaction_id,
credit_debit AS transaction_type,
auto_purpose_tag_name AS transaction_purpose
FROM raw_credit_debit_transaction.public_listing.transactions
);
DIM_DATES
I decided to create a data dimension that stores all date related parsed values such as day of the month, day name, month of the year, month name, etc. This will be handy when we need to generate time based reports.
CREATE OR REPLACE TABLE analytics.marts_credit_debit_transaction.dim_dates AS (
SELECT
DISTINCT transaction_date,
DAY(transaction_date)::VARCHAR as day_of_month,
DAYNAME(transaction_date) as day_name,
MONTH(transaction_date)::VARCHAR as month_of_year,
MONTHNAME(transaction_date) as month_name,
YEAR(transaction_date)::VARCHAR as year
FROM raw_credit_debit_transaction.public_listing.transactions
);
FCT_TRANSACTIONS
FCT_TRANSACTIONS will store the numeric information (transaction amount) and foreign keys that connect it to the dimension tables. To note, I also add transaction date column as a way to connect the FCT_TRANSACTIONS to DIM_DATES table. A better approach is to use surrogate key to generate date identifier but that is outside the scope of this post.
CREATE OR REPLACE TABLE analytics.marts_credit_debit_transaction.fct_transactions AS (
SELECT
transaction_date AS transaction_date,
transaction_reference AS transaction_id,
user_reference AS user_id,
account_reference AS account_id,
HASH(merchant_name, merchant_business_line)::VARCHAR AS merchant_id,
amount::NUMBER as amount
FROM raw_credit_debit_transaction.public_listing.transactions
);
References
- https://danischnider.wordpress.com/2022/11/10/star-schema-design-in-oracle-fundamentals/
- https://blog.devgenius.io/implementing-star-schema-in-snowflake-data-warehouse-1f890cdda952