Sergey Botyan

BI Developer, Vizuators

Boost your Tableau calculations with Python

Why do we need calculations in Tableau? They help when the source does not contain the data necessary for visualization, but it can be obtained using calculations.

Python is more versatile in data processing and calculations but worse for visualization. So there are several tasks for which it makes sense to do pre-processing in Python and then visualize in a Tableau.


Tableau is also limited in table calculations. For one of the projects, we had to eventually calculate the ranks between the columns in Python, clinging it to the Prep. Besides, many advanced types of analysis in the scoreboard are not possible, for example, log-linear.


Why did we choose Python
to interact with Tableau?

To solve problems in the field of Machine learning, which is a trend in the development of BI systems.
Use existing Python developments (your own or available in numerous communities) in the Tableau environment.
Python has more options for working with dates, text, and data structures. Tableau complements all this by creating interactive dashboards.

The advantages of Python for BI cases

Python has a low entry threshold, and the most straightforward scripts can be written almost from scratch.
Concise and close to natural language (understandable). It's interactive (interpreter).
There are a large number of ready-made tools which can adapt to meet your needs.
A considerable amount of literature, reference materials, as well as communities where you can get tips and tricks. And another great advantage: libraries!

Top Python libraries
for data science

NumPy
https://numpy.org/
Responsible for representing and processing data in the form of arrays.
Pandas
https://pandas.pydata.org/
Responsible for the presentation and processing of one-dimensional vectors (Series) and two-dimensional data frames (Data frame).
Scikit-learn
https://scikit-learn.org/
Built on top of the NumPy, SciPy, and Matplotlib libraries.
Free and available for commercial use.

TabPy installation & setup

TabPy (the Tableau Python Server) is an Analytics Extension implementation that expands Tableau's capabilities by allowing users to execute Python scripts and saved functions via Tableau's table calculations.
The TabPy documentation strongly recommends running the server in a virtual environment.

We use the venv module for this. About is included in the number of standard modules and does not require installation.

To perform further operations, you must:
• get access to the command line (execute - cmd);
• have Python installed.
To create a virtual environment, use the command:
python –m venv ‘name of the virtual environment’

For example: python –m venv myenv

Activation: myenv\Scripts\activate
Deactivation: deactivate
Watch our two setup videos:

TabPy module connection

Python virtual environment

Writing a Python script
Using TabPy, Tableau can run a Python script on the fly and display the results as a Visualization. Users can control data being sent to TabPy by interacting in their Tableau worksheet, dashboard, or stories using parameters.
Profit prediction
To predict the profit, we calculate the expected profit values ​​using the predict() method, which takes the independent variable(s) as an argument and returns the predicted values.
x_full = pd.DataFrame(_arg1, _arg2)
p = regr.predict(x)
Functions for calculated in Python fields
SCRIPT_INT        for integer data (1, 2, 3, 5, 8);
SCRIPT_REAL       for floating-point data (3,14, 2,72, 6,07);
SCRIPT_STR        for categorical data (men/women);
SCRIPT_BOOL       for boolean data (True/False)

Let's build a regression Model!

How do we train a regression model in Python?
The fit() method of the LinearRegression() class takes an independent variable (or variables in the form of a DataFrame) and a dependent variable as arguments.
x = pd.DataFrame(_arg4, _arg5)
y = _arg3
regr = LinearRegression()
regr.fit(x,y)

Python function argument
We define that the result of the predicted values ​​will be a floating point number. use the function SCRIPT_REAL(), which will look like:

SCRIPT_REAL(“PYTHON CODE…”,
SUM([Sales]) # forecasting
SUM([Discount]) # forecasting
SUM([filtered Profit]) # training
SUM([filtered Sales]) # training
SUM([filtered Discount])) # training

Passing calculation
results to Tableau

The result of the predict() method is a vector, and Tableau only understands returned lists (list).
So before returning
results in Tableau, we convert them
to list format with pandas tolist() method
p = p.tolist()
return p

How to use the results
in Tableau?

When using a calculated field in a visualization,
it is essential to remember:
• the detail of this field (as well as the data sent and received from TabPy) is equal to the detail of the visualization;

SCRIPT functions are tabular calculations and require correction of the used field when changing the visualization.

Otherwise, the fields received from TabPy can be used on visualizations as usual and used in other calculated fields (given that they are already aggregated).

How do we find out
about errors in scripts?

We use error messages to correct our calculations. And here, it is necessary to distinguish between errors caused by Tableau and errors in Tableau-Python communication and script execution.

Script execution error

Tableau error
And we did not touch yet Python Scripts and Tableau Prep
Python works both directly with Tableau and Tableau Prep. There are a few more possibilities for using Python in the Prep. It's especially true for ETL processes.
The connection mechanism is precisely the same. In both cases, the interaction is carried out through the TabPy module. But the scripts are already written a little differently. Prep works purely with Python scripts, while Tableau requires you to "wrap" them in Tableau's formula.

Explore our possibilities

Use proved Tableau expertise
Choose consultation type

By clicking the button, you agree to the processing of personal data. We use it to stay connected. Read more