
Power BI

Concepts

The five major building blocks of Power BI are:

  • Dashboards
  • Reports
  • Workbooks
  • Datasets
  • Dataflows

They're all organized into workspaces, and they're created on capacities. It's important to understand capacities and workspaces before we dig into the five building blocks, so let's start there.

Capacity

Capacities are a core Power BI concept representing a set of resources (storage, processor, and memory) used to host and deliver your Power BI content. Capacities are either shared or reserved.

A shared capacity is shared with other Microsoft customers, while a reserved capacity is reserved for a single customer.

Workspaces

How to share Power BI content?

A workspace is like a shared folder for a team of users: a place to share Power BI content.

There are two types of licenses for Power BI: capacity-based and user-based licensing. Capacity-based licensing is usually better for organizations with hundreds of users or more, while user-based licensing is good for small to medium-sized businesses.

There are two types of workspaces:

  • My workspace (user-based)
  • Workspaces (capacity-based)

My workspace

My workspace is the personal workspace where any Power BI customer works with their own content. Only you have access to your My workspace. You can still share dashboards and reports from your My workspace.

Workspaces

Workspaces are used to collaborate and share content with colleagues. You can add colleagues to your workspaces and collaborate on dashboards, reports, workbooks, and datasets.

Warning

Each workspace member needs a Power BI Pro or Premium Per User (PPU) license.

Dashboard

Tldr

A Power BI dashboard is a single page, often called a canvas, that uses visualizations to tell a story. Because it is limited to one page, a well-designed dashboard contains only the most-important elements of that story.

  • It is a single screen with tiles of interactive visuals, text, and graphics. A dashboard collects your most important metrics, on one screen, to tell a story or answer a question.

  • If a business user is given permission to the report, they can build their own dashboards too.

  • As shared capacity must share resources across customers, limitations are imposed to ensure "fair play", such as a maximum model size (1 GB) and a maximum daily refresh frequency (8 times per day).

The visualizations on a dashboard come from reports and each report is based on one dataset. In fact, one way to think of a dashboard is as an entryway into the underlying reports and datasets. Selecting a visualization takes you to the report that was used to create it.

Visualization

It is a type of chart built by Power BI designers. The visuals display the data from reports and datasets.

How many datasets can a report use?

All of the visualizations in a report come from a single dataset. Power BI Desktop can combine more than one data source into a single dataset in a report, and that report can be imported into Power BI.

Report

A report is one or more pages of interactive visuals, text, and graphics that together make up a single view of your data. Power BI bases a report on a single dataset.

Tile

A tile is a snapshot of your data, pinned to a dashboard by a designer. Designers can create tiles from a report, dataset, dashboard, the Q&A question box, Excel, SQL Server Reporting Services (SSRS), and more.

DAX

Data Analysis Expressions (DAX) is a programming language that is used throughout Microsoft Power BI for creating calculated columns, measures, and custom tables. It is a collection of functions, operators, and constants that can be used in a formula, or expression, to calculate and return one or more values.

Using DAX to create calculated columns

DAX allows you to augment the data you bring in from different data sources by creating a calculated column that didn't originally exist in the source. This feature should be used sparingly.
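
As a minimal sketch, assuming a hypothetical Sales table with Quantity and UnitPrice columns, a calculated column could look like this:

    -- DAX calculated column on a hypothetical Sales table:
    -- evaluated once per row, and the results are stored in the model
    LineTotal = Sales[Quantity] * Sales[UnitPrice]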

DAX and M script

Power BI has two mighty hands, DAX (powered by SQL Server Analysis Services) and M script (powered by Power Query), to perform ETL jobs and visualization-related calculations. Both engines are efficient in their own way of performing calculations.

Where not to use DAX and M script?

When it comes to performing calculations in an optimized way, both underperform compared to database engines: they are in-memory calculation engines best suited to playing with data. They lack the indexing of data that a database builds while storing it. A database engine also reuses the query results of the last few queries by keeping track of changing data, which is complex for both DAX and Power Query. As the data size grows, the performance difference becomes quite noticeable.

Calculated column and measure

The fundamental difference between a calculated column and a measure is that a calculated column creates a value for each row in a table. For example, if the table has 1,000 rows, it will have 1,000 values in the calculated column. Calculated column values are stored in the Power BI .pbix file. Each calculated column will increase the space that is used in that file and potentially increase the refresh time.

Measures are calculated on demand. Power BI calculates the correct value when the user requests it. Measures do not add to the overall disk space of the Power BI .pbix file.
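
To contrast the two, here is a sketch against the same hypothetical Sales table: the equivalent measure stores only the formula and is evaluated when a visual requests it.

    -- DAX measure: computed on demand at query time,
    -- so it adds no stored values to the .pbix file
    Total Sales := SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )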

Dataflow

Tldr

Dataflow is a collection of Power Query queries that are scheduled and executed together.

It's up to you how you organize the staged data in dataflows. For example, if you need to stage some tables from Dynamics 365, you can create one dataflow that has a query for each table you want to stage. Dataflows thus allow you to logically group related Power Query queries.

Query folding

Query folding is the ability for a Power Query query to generate a single query statement that retrieves and transforms source data.

Query folding may occur for an entire Power Query query, or for a subset of its steps. When query folding cannot be achieved—either partially or fully—the Power Query mashup engine must compensate by processing data transformations itself. This process can involve retrieving source query results, which for large datasets is very resource intensive and slow.
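
As an illustrative sketch (the server, database, table, and column names are assumptions), the following Power Query (M) query folds: the filter and column-selection steps are translated into a single SQL statement that runs at the source.

    let
        // Connect to a hypothetical SQL Server database
        Source = Sql.Database("myserver", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // These steps fold: they become part of the source SQL query
        Filtered = Table.SelectRows(Orders, each [Status] = "Shipped"),
        Selected = Table.SelectColumns(Filtered, {"OrderID", "OrderDate", "Amount"})
    in
        Selected

Relational sources like SQL Server typically support folding; a step such as adding an index column generally breaks it, forcing the mashup engine to take over from that step onward.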

Power BI Service

There are two hosting options for Power BI reports

  1. Cloud-based hosting (called the Power BI service): In the Power BI service, organizations are separated using tenants, which can be managed through Azure Active Directory or Office 365. Under each tenant there are users, and these users are Azure Active Directory users.

  2. On-premises hosting (called Power BI Report Server)


Storage mode

Today Power BI offers four different storage modes for tables:

Import mode

In this mode, Power BI connects to the underlying data source and downloads the entire dataset. The data is stored in the Power BI model (an in-memory cache), and a fresh copy can be downloaded by pressing the Refresh button. The .pbix file internally stores model data in a compressed format. When the dataset model is published to the Power BI service, it is internally stored on the Common Data Model, which is in effect a managed Azure SQL Server instance in the backend.

Which is the fastest method?

The fastest method is Import mode. Essentially, this mode allows you to load the data once into Power BI, where it is then stored. Power BI uses the in-memory VertiPaq/xVelocity engine, which is exceptionally fast and delivers results almost immediately. It offers full DAX and Power Query support. A quick rule of thumb is that you can typically expect about 10x compression when importing data into Power BI.

The drawback is that, on shared capacity, it only permits you to refresh data 8 times per day.

DirectQuery mode

With DirectQuery datasets, no data is imported into Power BI. Instead, your Power BI dataset is simply metadata (e.g., tables, relationships) of how your model is structured to query the data source. Data is only brought into Power BI reports and dashboards at query-time (e.g., a user runs a report, or changes filters).

Pros:

  • Dataset size limits do not apply, as all data still resides in the underlying database.
  • Datasets do not require a scheduled refresh, as data is only retrieved at query time.

Cons:

  • Typically, query-time performance will suffer, even when querying a cloud data warehouse like Snowflake.
  • Concurrency (e.g., multiple users running reports) against DirectQuery datasets can cause performance issues in the underlying database.

Dual mode

Data is imported into the in-memory cache, but can also be served directly from the data source at query time.

Hybrid tables / Composite mode

Tldr

  • Cold data in Import mode, Hot data in DirectQuery.
  • Hybrid tables can only be applied to a table that has incremental refresh configured on it.
  • Hybrid tables are partitioned, so their most recent partition is a DirectQuery from the data source, and their historical data is imported into other partitions.
  • If your data source doesn't support query folding (for example, a CSV file), then Power BI reads the entire data anyway when connected to it.
  • Incremental Refresh doesn’t need a Premium or PPU license. You can even set it up using a Power BI Pro license. However, Hybrid tables require a Power BI Premium capacity or PPU.

Composite models (aka hybrid tables) attempt to combine the best aspects of Import and DirectQuery modes into a single dataset. With composite models, data modelers can configure the storage mode for each table in the model.


Power BI Architecture

The overall technical architecture consists of two clusters:

  • a Web Front End (WFE) cluster
  • a Back End cluster

Front-end cluster

AAD

The WFE cluster manages connectivity and authentication. Power BI relies on Azure Active Directory (AAD) to manage account authentication and management.

ATM

Power BI uses the Azure Traffic Manager (ATM) to direct user traffic to the nearest data center. Which data center is used is determined by the DNS record of the client attempting to connect. The DNS service can communicate with the Azure Traffic Manager to find the nearest data center with a Power BI deployment.

CDN

Power BI uses the Azure Content Delivery Network (CDN) to deliver the necessary static content and files to end users based on their geographical locale. The WFE cluster nearest to the user manages the user login and authentication, and provides an access token to the user once authentication is successful. The ASP.NET component within the WFE cluster parses the request to determine which organization the user belongs to, and then consults the Power BI Global Service.

Global Service

The Global Service is implemented as a single Azure Table that is shared among all worldwide WFE and Back End clusters. This service maps users and customer organizations to the datacenter that hosts their Power BI tenant. The WFE specifies to the browser which backend cluster houses the organization's tenant.

Once a user is authenticated, subsequent client interactions occur with the backend cluster directly and the WFE cluster is not used.

Back-end cluster

The backend cluster manages all actions the user performs in the Power BI service, including visualizations, dashboards, datasets, reports, data storage, data connections, data refresh, and others.

Gateway and APIM

The Gateway Role acts as a gateway between user requests and the Power BI service. Only the Gateway Role and Azure API Management (APIM) services are accessible from the public Internet.

When an authenticated user connects to the Power BI Service, the connection and any request by the client is accepted and managed by the Gateway Role, which then interacts on the user's behalf with the rest of the Power BI Service. For example, when a client attempts to view a dashboard, the Gateway Role accepts that request, and then sends a request to the Presentation Role to retrieve the data needed by the browser to render the dashboard.

Data Storage

As far as data storage in the cloud goes, Power BI uses two primary repositories for storing and managing data. Data that is uploaded from users or generated by dataflows is stored in Azure BLOB storage, but all the metadata definitions (dashboards, reports, recent data sources, workspaces, organizational information, tenant information) are stored in Azure SQL Database.


Performance Optimization of Power BI

Check this ZerbaBI report.

Analyze the results

  • Power BI Performance Analyzer: The natural starting point is Power BI Performance Analyzer, a built-in feature of Power BI Desktop. Select the View ribbon, and then select Performance Analyzer to display the Performance Analyzer pane.

  • DAX Studio: You can also analyze performance in another tool called DAX Studio. It is a great tool for performance analysis.

Using DirectQuery?

There's another situation that can slow down your report. You might be creating tables that use DirectQuery as the connection mode. This means the data is not loaded into Power BI; instead, it is fetched by a query that runs on the database server, which adds time to retrieve the data.

This shows up as another item in the report, since Power BI first needs to fetch the data, process it using Power Query, and pass it on to a DAX command.

Avoid Joins

People essentially just load all the tables into Power BI and start creating relationships, and then they need to write complex DAX formulas to tie everything together. To make everything run faster, however, you should combine and merge tables before you even load them into Power BI, as in the sketch below.
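
A minimal Power Query (M) sketch of merging before load, assuming hypothetical Sales and Products queries joined on a ProductID column:

    let
        Source = Sql.Database("myserver", "SalesDb"),
        Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
        Products = Source{[Schema = "dbo", Item = "Products"]}[Data],
        // Merge (left outer join) product attributes into the fact rows
        // before load, instead of relating the two tables in the model
        Merged = Table.NestedJoin(Sales, {"ProductID"}, Products, {"ProductID"}, "Product", JoinKind.LeftOuter),
        Expanded = Table.ExpandTableColumn(Merged, "Product", {"ProductName", "Category"})
    in
        Expanded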

De-normalize data model

Combine or append tables that are similar in structure and used for the same purpose. For example, fact tables like sales, actuals, plan, and different forecasts can be combined or appended into a single table, as in the sketch below.
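
A sketch of appending, assuming hypothetical Actuals, Plan, and Forecast queries that share the same column structure:

    let
        Source = Sql.Database("myserver", "FinanceDb"),
        Actuals = Source{[Schema = "dbo", Item = "Actuals"]}[Data],
        Plan = Source{[Schema = "dbo", Item = "Plan"]}[Data],
        Forecast = Source{[Schema = "dbo", Item = "Forecast"]}[Data],
        // Append the three fact queries into a single table before load
        Combined = Table.Combine({Actuals, Plan, Forecast})
    in
        Combined

In practice you would usually also tag each source with a scenario column before combining, so the rows remain distinguishable.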

Incremental Refresh

When you load data from the source into the destination (Power BI), there are two methods: Full Load or Incremental Refresh. Full Load means fetching the entire dataset each time and wiping out the previous data.

Incremental refresh is an important feature to consider when working with large tables that you would like to import into memory. With Incremental Refresh, partitions are automatically created on your Power BI table based on the amount of history to retain, as well as the partition size you would like to set. Since data modelers can define the size of each partition, refreshes will be faster, more reliable, and able to build history over time.

When to use incremental refresh?

Consider a large dataset spanning 20 years. The data from 20 years ago probably won't change anymore, often not even the data from 5 years ago, or sometimes from a year ago. So why reprocess it? Why reload data that doesn't update? Incremental refresh is the process of loading only the part of the data that might change and adding it to the previously loaded data, which is no longer changing.

When setting up Incremental Refresh, keep in mind the following:

  • You must create RangeStart and RangeEnd date/time parameters. These parameters must be set to a date/time data type and must be named RangeStart and RangeEnd; they are used to filter the source query, as in the sketch after this list.
  • The initial refresh in the Power BI service will take the longest due to the creation of partitions and loading of historical data. Subsequent refreshes will only process the latest partition(s) based on how the feature was configured.
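
A minimal sketch of the source-query filter, with hypothetical server, database, table, and column names; only the RangeStart and RangeEnd parameter names are required by the feature:

    let
        Source = Sql.Database("myserver", "SalesDb"),
        Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
        // Filter on RangeStart/RangeEnd so each partition loads only its
        // own slice; use >= on one boundary and < on the other so rows
        // exactly on a boundary are not loaded twice
        Filtered = Table.SelectRows(Orders, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
    in
        Filtered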

Partitioning

Incremental load splits the table into partitions. The number of partitions is based on the settings applied when incremental refresh is configured. For example, if you only want the last year's data refreshed, a yearly partition will likely be created for every year, and the one for the current year will be refreshed on a scheduled basis.

Hybrid tables

Are you using hybrid tables?

Query folding

Are you using query folding, and is it folding completely?

Avoid row-level security

Last but not least, avoid row-level security, which can be a performance killer.


Dimensional modelling

Dimensional modelling is a set of guidelines for designing database table structures for easier and faster data retrieval.

Remember

Data is stored in de-normalized form here.

  • Dimensions: descriptive entities.

  • Facts: quantitative values, i.e. the measures.

Dimensional model vs normal form models

| Aspect       | Normal form models         | Dimensional models               |
|--------------|----------------------------|----------------------------------|
| Purpose      | Transactional systems      | Reporting                        |
| Structure    | Complex                    | Less complex (denormalized data) |
| Operations   | Insert, update, and delete | Mostly select (read-only)        |
| Bitmap index | Not used                   | Heavily used                     |

The various kinds of dimensional modelling are:

Star Schema

In this data model, you use a fact table, which is a table like sales that contains your facts, meaning your measures. The fact table is related to your dimensions, which are things like your salespeople, products, business units, customers, and so on. Each dimension table then has a one-to-many relationship to the fact table.

Snowflake schema

In a snowflake schema, each property of entries in a dimension table is assigned to a new table, creating what seems like a snowflake.

Snowflake and Power BI

We need to use a data warehouse in front of Power BI.

Data storage mode

For larger datasets, DirectQuery and hybrid tables are good options. Snowflake will handle the heavy lifting.

Best Practices when using DW with PowerBI

  • Make sure the data visualizations are use-case driven; this includes the timing of the data, the number of visualizations, and other components.
  • Data Model Design: How you design your data model will have a major impact on query performance in Power BI.

  • Ideally, use a star schema design for your data in Power BI, with facts and dimensions. It is also important to have a more relational, normalized Data Vault or ODS type of model in Snowflake to get the best performance in Power BI.

  • Use Import mode for dimension tables and DirectQuery mode for fact tables.

  • Use aggregations in Power BI for pre-aggregated data to get better query performance.

  • Keep the dashboard simple by limiting the number of data points, visuals, and queries on a page to a minimum. This also helps query performance, as fewer queries run while refreshing the visualization.

  • Use Query reduction to limit the number of queries generated. This is especially helpful when using slicers in your visualization, where you only want the filters applied when the “Apply” button is used.

  • To make modeling easier, use the Assume referential integrity property on relationships. While a relationship in Power BI generates a left outer join by default, the Assume referential integrity property forces an inner join, which can make the queries faster. This property is available only in DirectQuery mode.

  • Use bi-directional filters on relationships with discretion. More bi-directional filters mean more generated SQL queries, which can increase the complexity of the model and increase compute costs in Snowflake.

  • On large data sets in Snowflake, pruning micro-partitions and creating additional tables with alternate sets of keys, driven by end users' ad-hoc query needs, can result in better performance.

Auth flow

  1. The user logs into Power BI service using Microsoft Azure Active Directory (Azure AD).
  2. (Optional) If the identity provider is not Azure AD, then Azure AD verifies the user through SAML authentication before logging the user into the Power BI service.
  3. When the user connects to Snowflake, the Power BI service asks Azure AD to give it a token for Snowflake.
  4. The Power BI service uses the embedded Snowflake driver to send the Azure AD token to Snowflake as part of the connection string.
  5. Snowflake validates the token, extracts the username from the token, maps it to the Snowflake user, and creates a Snowflake session for the Power BI service using the user’s default role.
