How to make a reproducible data sample in PowerBI using Python?

This is a self-answered post. Why? Because many Power BI questions go unanswered for lack of data samples. Also, many seem to wonder how to edit data tables in Power BI using Python. And, of course, the world needs more widespread usage of Python in Power BI. Some think that you have to apply a Python snippet to an existing table loaded elsewhere. My answer to this post will show you how to build a (fairly big) data sample with a few lines of code in an otherwise empty Power BI file.

So, how can you build a data sample and make changes to it using Python in Power BI?

Here are the solutions:

Solution 1

I’ll show you how to build a dataset of 10000 rows that contains both categorical and numerical values. I’m using the Python libraries numpy and pandas for the data generation and table operations, respectively. The snippet below draws a random element from each of two lists 10000 times to build two columns of street and city names, and adds a list of random numbers into the mix. Then I’m using pandas to organize the data in a dataframe. When you use Python in the Power BI Power Query Editor, your input has to be a table, and your output has to be a pandas dataframe.

Python snippet:

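import numpy as np
import pandas as pd

# Fix the seed so that everyone who runs the snippet gets the same sample
np.random.seed(123)
streets = ['Broadway', 'Bowery', 'Houston Street']
cities = ['New York', 'Chicago', 'Baltimore']

rows = 10000  # number of rows in the sample

# Draw a random city, street, and number for each row
lst_cities = np.random.choice(cities, rows).tolist()
lst_streets = np.random.choice(streets, rows).tolist()
lst_numbers = np.random.randint(low=0, high=100, size=rows).tolist()

# Organize the columns in a dataframe
df_dataset = pd.DataFrame({'City': lst_cities,
                           'Street': lst_streets,
                           'ID': lst_numbers})

# One-row description of the sample: (number of rows, number of columns)
df_metadata = pd.DataFrame([df_dataset.shape])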
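
The call to np.random.seed is what makes the sample reproducible. Here is a minimal sketch of how you could check that outside Power BI. It assumes a hypothetical helper, build_sample, that is not part of the snippet above and only wraps the same steps:

```python
import numpy as np
import pandas as pd


def build_sample(rows, seed=123):
    """Hypothetical helper: rebuild the sample table from a fixed seed."""
    np.random.seed(seed)  # same seed -> same sequence of random draws
    streets = ['Broadway', 'Bowery', 'Houston Street']
    cities = ['New York', 'Chicago', 'Baltimore']
    return pd.DataFrame({
        'City': np.random.choice(cities, rows).tolist(),
        'Street': np.random.choice(streets, rows).tolist(),
        'ID': np.random.randint(low=0, high=100, size=rows).tolist(),
    })


first = build_sample(rows=500)
second = build_sample(rows=500)

# Two runs with the same seed should give identical tables
print(first.equals(second))
print(first.shape)
```

If the two tables were not equal, anyone copying the snippet from a question would end up looking at different data than the person who posted it.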

Power BI:

In Power BI Desktop, click Enter Data to go to the Power Query Editor. In the dialog window that follows, do nothing but click OK. The result is an empty table and two steps under Applied steps.

Now, use Transform > Run Python Script, insert the snippet above, and click OK.

You now have a preliminary table with 2 columns and 3 rows, and this is a pretty neat detail of how Python is implemented in Power BI: each row is a separate dataset that is made available to you after running your snippet.

The first one, dataset, is constructed by default, but it is empty here since we started out with an empty table. Had we started out with some other data, it would hold that data in the form of a pandas dataframe; the first line of the Run Python Script dialog explains its purpose: # 'dataset' holds the input data for this script.

The last table, df_metadata, is only a brief description of the dataset we’re really interested in, df_dataset, but I’ve added it to the mix to illustrate that every dataframe you create in your snippet will be available to you. You choose which table to continue working on by clicking Table next to its name.
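
To make the idea concrete, here is a small sketch of what "every dataframe in your snippet becomes available" means. It does not show how Power BI does this internally; list_dataframes and script_namespace are hypothetical names that only mimic the outcome by picking the pandas dataframes out of a set of variables:

```python
import pandas as pd


def list_dataframes(namespace):
    """Return the sorted names of all pandas DataFrames in a namespace dict."""
    return sorted(name for name, value in namespace.items()
                  if isinstance(value, pd.DataFrame))


# Stand-in for the variables that exist after the script has run
script_namespace = {
    'dataset': pd.DataFrame(),  # empty input table
    'df_dataset': pd.DataFrame({'City': ['New York'],
                                'Street': ['Bowery'],
                                'ID': [1]}),
    'df_metadata': pd.DataFrame([(1, 3)]),
    'rows': 1,  # not a dataframe, so it is not listed
}

available = list_dataframes(script_namespace)
print(available)  # ['dataset', 'df_dataset', 'df_metadata']
```

Plain variables such as rows do not show up as tables; only dataframes do.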

And that’s it! You now have a table of mixed datatypes that you can keep working on using either Python or Power BI itself.

From here you can:

  1. Keep working on your table using any menu option
  2. Insert another Python script
  3. Duplicate your original dataframe and keep working on another version by right-clicking Table under Queries and creating a Reference.
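
Option 2 could look something like the sketch below. Inside Power BI, the table from the previous step arrives in the script as dataset; the stand-in dataframe here only exists so that the snippet runs on its own, and df_summary is a hypothetical name for the output table:

```python
import numpy as np
import pandas as pd

# Stand-in for the table that Power BI would pass in as `dataset`
np.random.seed(123)
rows = 1000
dataset = pd.DataFrame({
    'City': np.random.choice(['New York', 'Chicago', 'Baltimore'], rows),
    'Street': np.random.choice(['Broadway', 'Bowery', 'Houston Street'], rows),
    'ID': np.random.randint(low=0, high=100, size=rows),
})

# The follow-up step itself: count rows and average ID per city and street
df_summary = (dataset
              .groupby(['City', 'Street'], as_index=False)
              .agg(Rows=('ID', 'size'), MeanID=('ID', 'mean')))

print(df_summary)
```

In Power BI you would paste only the follow-up step and then pick df_summary from the list of resulting tables.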

Solution 2

Great explanation, vestland. I’m adding another way without Python.


1. Open Power Query and select "Enter data"


2. Enter or paste data

You can enter your data manually or – way easier and faster – copy it from Excel or any other structured table into the GUI.


3. Load data and copy M-Code

When you have loaded the data, open the advanced editor and copy the full code:

And copy the code into your question, like:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTICYmOlWJ1oJRMgyxSIzcA8cyDLAogtlWJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Col1 = _t, Col2 = _t, Col3 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Col1", Int64.Type}, {"Col2", Int64.Type}, {"Col3", Int64.Type}})
in
    #"Changed Type"

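The Base64-encoded binary string at the beginning of the code contains the table's data in compressed form, and the type table expression that follows it lists the columns. This makes it very easy for anyone who wants to help to recreate your table.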
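
If you want to peek inside such a string outside Power BI, you can reverse the steps named in the M code (Base64 text, Deflate decompression, JSON). The sketch below does a round trip in Python on made-up rows. It assumes that Compression.Deflate corresponds to a raw DEFLATE stream without a zlib header and that the payload is a JSON array of rows; encode_rows and decode_rows are hypothetical helper names:

```python
import base64
import json
import zlib


def encode_rows(rows):
    """Rows -> JSON -> raw DEFLATE -> Base64 text."""
    raw = json.dumps(rows).encode('utf-8')
    compressor = zlib.compressobj(wbits=-15)  # negative wbits = no zlib header
    compressed = compressor.compress(raw) + compressor.flush()
    return base64.b64encode(compressed).decode('ascii')


def decode_rows(text):
    """Base64 text -> raw DEFLATE -> JSON -> rows."""
    compressed = base64.b64decode(text)
    return json.loads(zlib.decompress(compressed, wbits=-15))


# Made-up sample rows, stored as text like the columns in the M code
sample_rows = [['1', '2', '3'], ['4', '5', '6']]

encoded = encode_rows(sample_rows)
decoded = decode_rows(encoded)

print(encoded)
print(decoded == sample_rows)
```

If the assumptions hold, decode_rows should also work on a string copied from the Advanced Editor, but check the result against the table in Power BI before relying on it.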


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
