# Pandas Interface
# Why?
Today, Python is one of the most popular languages for data analysis and data science. One of the reasons for Python's success in these areas is Pandas, the leading package for data manipulation with Python. It has quickly become a standard tool and is used by a very large number of people around the world.
It is therefore essential to be able to create visualizations directly from a Pandas dataframe. This can be done for example with Seaborn, a famous Python package for data visualization. Thanks to the interface presented in this section, it is also possible to do the same thing with ipychart.
# Usage
This interface allows you to quickly create charts from a pandas dataframe, without having to use the low-level syntax of Chart.js. In the rest of this section, we will use a slightly processed version of the famous Titanic dataset, in which the title has been extracted from the name column into a Title column. Let's start by loading this dataset with Pandas:
import pandas as pd
titanic = pd.read_csv('titanic.csv')
titanic.head()
| | PassengerId | Survived | Pclass | Title | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Mr | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Mrs | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Miss | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Mrs | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Mr | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Concretely, to use ipychart's Pandas interface, you call one of the plotting functions exposed directly by the ipychart package. For example, to draw a bar chart using the titanic dataset, you need to execute:
import ipychart as ipc
ipc.barplot(data=titanic, x='Embarked', y='Age', hue='Survived')
The dataset is always passed through the `data` parameter. The `x` and `y` parameters are the columns to use for the x and y axes. The `hue` parameter is used to color the bars.
TIP
The hue argument allows you to display a third (categorical) column of your dataframe on the chart.
# Charts
This section lists all the functions of the ipychart package that work with a pandas dataframe, each one corresponding to a type of chart. Each function returns a Chart object, i.e. an instance of the Chart class of the ipychart package.
All functions have two parameters in common: `dataset_options` and `options`. The `dataset_options` parameter allows you to set the options for each dataset, as with the Chart class. If you don't use the `hue` parameter, the chart will have only one dataset and you must pass a dictionary. Otherwise, the chart will have N datasets (one per distinct value of the column selected with the `hue` parameter) and you must pass a list of dictionaries. In the same way, you can use the `options` parameter to customize the chart, as you would with the Chart class.
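The dictionary-versus-list rule above can be sketched in plain Python. The helper `options_per_hue` is hypothetical (it is not part of ipychart), and the assumption that datasets follow the sorted order of the distinct hue values is only a guess, so check the rendered legend before relying on it. The keys `fill` and `borderWidth` are ordinary Chart.js dataset options.

```python
def options_per_hue(hue_values, shared, overrides=None):
    """Build one options dict per distinct hue value (hypothetical helper).

    Assumption: datasets are ordered by sorted distinct hue value.
    """
    overrides = overrides or {}
    distinct = sorted(set(hue_values))
    # Start from the shared options, then apply any per-value overrides.
    return [{**shared, **overrides.get(value, {})} for value in distinct]


# Toy stand-in for titanic["Sex"].
sex = ["male", "female", "female", "female", "male"]

# Without hue: a single dataset, so a single dictionary.
single = {"fill": False}

# With hue="Sex": two datasets, so a list of two dictionaries.
opts = options_per_hue(sex, {"fill": False}, {"female": {"borderWidth": 3}})
print(opts)  # [{'fill': False, 'borderWidth': 3}, {'fill': False}]
```

The resulting list could then be passed as `dataset_options=opts` to a call that also sets `hue="Sex"`.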
# Count
TIP
This chart can only be created from a single column of a Pandas dataframe.
The count chart shows the count of observations in each categorical bin using bars. To draw it, call the `countplot` function:
ipc.countplot(data: pd.DataFrame,
x: str,
hue: str = None,
horizontal: bool = False,
dataset_options: dict = {},
options: dict = None,
colorscheme: str = None,
zoom: bool = True) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- hue (optional): str
  Grouping variable that will produce bars with different colors.
- horizontal (optional): bool
  Draw the bar chart horizontally. Defaults to False.
- dataset_options (optional): dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
- zoom (optional): bool
  Allow the user to zoom on the Chart once it is created. Defaults to True.
Example:
ipc.countplot(data=titanic, x='Embarked')
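As a rough illustration of what a count chart displays, the sketch below counts the occurrences of each category in plain Python. It is not ipychart's implementation, and the `embarked` list is made-up toy data standing in for the Embarked column. With pandas, `titanic['Embarked'].value_counts()` gives the same kind of result.

```python
from collections import Counter

# Toy stand-in for titanic["Embarked"] (made-up values).
embarked = ["S", "C", "S", "S", "Q", "C", "S"]

counts = Counter(embarked)

# One bar per category: labels on the x axis, counts as bar heights.
labels = sorted(counts)
values = [counts[label] for label in labels]

print(labels, values)  # ['C', 'Q', 'S'] [2, 1, 4]
```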
# Dist
TIP
This chart can only be created from a single column of a Pandas dataframe.
The dist chart fits a univariate kernel density estimate and plots it on a line chart. It is useful for representing the distribution of the data. To draw it, call the `distplot` function:
ipc.distplot(data: pd.DataFrame,
x: str,
bandwidth: Union[float, str] = 'auto',
gridsize: int = 1000,
dataset_options: dict = {},
options: dict = None,
colorscheme: str = None,
zoom: bool = True,
**kwargs) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- bandwidth (optional): float, str
  Parameter that controls how smooth the resulting curve is. If set to 'auto', the optimal bandwidth is found using a grid search.
- gridsize (optional): int
  Number of discrete points in the evaluation grid.
- dataset_options (optional): dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
- zoom (optional): bool
  Allow the user to zoom on the Chart once it is created. Defaults to True.
- kwargs (optional): dict
  Other keyword arguments are passed down to scikit-learn's KernelDensity class.
Example:
ipc.distplot(data=titanic, x='Age')
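To make the roles of `bandwidth` and `gridsize` more concrete, here is a minimal sketch of a Gaussian kernel density estimate written with the standard library only. It is not how ipychart computes the curve (the documentation above says it relies on scikit-learn's KernelDensity). The function name `gaussian_kde`, the choice of a Gaussian kernel and the three-bandwidth padding of the grid are all assumptions made for illustration. The ages are the five values shown in the table above.

```python
import math


def gaussian_kde(samples, bandwidth, gridsize=1000):
    """Evaluate a Gaussian KDE on an evenly spaced grid (illustrative sketch)."""
    # Pad the grid by three bandwidths so the tails of the curve are visible.
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    step = (hi - lo) / (gridsize - 1)
    grid = [lo + i * step for i in range(gridsize)]
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]
    return grid, density


ages = [22.0, 38.0, 26.0, 35.0, 35.0]

# A small bandwidth gives a spiky curve, a large one a smooth, flat curve.
grid_narrow, narrow = gaussian_kde(ages, bandwidth=1.0, gridsize=200)
grid_wide, wide = gaussian_kde(ages, bandwidth=8.0, gridsize=200)

# The area under a density curve should be close to 1.
area = sum(wide) * (grid_wide[1] - grid_wide[0])
print(round(max(narrow), 3), round(max(wide), 3), round(area, 2))
```

A larger `gridsize` only adds evaluation points, which makes the drawn line smoother without changing the estimate itself.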
# Line
A line chart is a way of plotting data points on a line. It is often used to show a trend in the data or to compare two data sets. To draw it, call the `lineplot` function:
ipc.lineplot(data: pd.DataFrame,
x: str,
y: str,
hue: str = None,
agg: str = 'mean',
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None,
zoom: bool = True) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- y : str
  Column of the dataframe used as datapoints for the y axis.
- hue (optional): str
  Grouping variable that will produce points with different colors.
- agg (optional): str
  The aggregator used to gather data (e.g. 'median' or 'mean').
- dataset_options (optional): dict or list of dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
- zoom (optional): bool
  Allow the user to zoom on the Chart once it is created. Defaults to True.
Example:
datalabels_arguments = {
"display": True,
"borderWidth": 1,
"anchor": "end",
"align": "end",
"borderRadius": 5,
"color": "#fff",
}
ipc.lineplot(
data=titanic,
x="Pclass",
y="Age",
hue="Sex",
dataset_options={"fill": False, "datalabels": datalabels_arguments},
colorscheme="office.Parallax6",
)
# Bar
A bar chart shows data values as vertical bars (or horizontal bars when `horizontal` is set to True). It is sometimes used to show a trend in the data or to compare multiple data sets side by side. To draw it, call the `barplot` function:
ipc.barplot(data: pd.DataFrame,
x: str,
y: str,
hue: str = None,
horizontal: bool = False,
agg: str = 'mean',
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None,
zoom: bool = True) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- y : str
  Column of the dataframe used as datapoints for the y axis.
- hue (optional): str
  Grouping variable that will produce bars with different colors.
- horizontal (optional): bool
  Draw the bar chart horizontally. Defaults to False.
- agg (optional): str
  The aggregator used to gather data (e.g. 'median' or 'mean').
- dataset_options (optional): dict or list of dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
- zoom (optional): bool
  Allow the user to zoom on the Chart once it is created. Defaults to True.
Example:
ipc.barplot(
data=titanic,
x="Pclass",
y="Fare",
hue="Sex",
colorscheme="office.Parallax6",
)
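The `agg` parameter reduces all rows that share the same `x` value (and the same `hue` value, if any) to a single number. The sketch below shows that idea in plain Python. The `aggregate` helper and the `AGGREGATORS` table are hypothetical and only cover 'mean' and 'median'; they are not ipychart's implementation. The rows reuse the Pclass, Sex and Fare values from the table above. With pandas, `titanic.groupby(['Sex', 'Pclass'])['Fare'].mean()` performs a comparable aggregation.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical lookup table: aggregator name -> reducing function.
AGGREGATORS = {"mean": mean, "median": median}


def aggregate(rows, x, y, hue=None, agg="mean"):
    """Reduce y to one value per (hue value, x value) pair."""
    groups = defaultdict(list)
    for row in rows:
        key = (row[hue] if hue else None, row[x])
        groups[key].append(row[y])
    reducer = AGGREGATORS[agg]
    return {key: reducer(values) for key, values in groups.items()}


rows = [
    {"Pclass": 3, "Sex": "male", "Fare": 7.25},
    {"Pclass": 1, "Sex": "female", "Fare": 71.2833},
    {"Pclass": 3, "Sex": "female", "Fare": 7.925},
    {"Pclass": 1, "Sex": "female", "Fare": 53.1},
    {"Pclass": 3, "Sex": "male", "Fare": 8.05},
]

# One value per (Sex, Pclass) pair: each pair would become one bar.
result = aggregate(rows, x="Pclass", y="Fare", hue="Sex", agg="mean")

# Without hue, rows are grouped by Pclass only.
by_class = aggregate(rows, x="Pclass", y="Fare", agg="median")
print(result)
print(by_class)
```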
# Radar
A radar chart is a way of showing multiple data points and the variation between them. It is often useful for comparing the points of two or more different data sets. To draw it, call the `radarplot` function:
ipc.radarplot(data: pd.DataFrame,
x: str,
y: str,
hue: str = None,
agg: str = 'mean',
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- y : str
  Column of the dataframe used as datapoints for the y axis.
- hue (optional): str
  Grouping variable that will produce points with different colors.
- agg (optional): str
  The aggregator used to gather data (e.g. 'median' or 'mean').
- dataset_options (optional): dict or list of dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
Example:
ipc.radarplot(
data=titanic,
x='Title',
y='Fare',
colorscheme='office.Yellow6'
)
# Doughnut, Pie & Polar Area
Doughnut and pie charts are excellent at showing the relational proportions between data. Polar area charts are similar to doughnut and pie charts, but each segment has the same angle and the radius of the segment differs depending on the value. To draw one of these charts, call the `doughnutplot`, `pieplot` or `polarplot` function:
ipc.doughnutplot(data: pd.DataFrame,
x: str,
y: str,
agg: str = 'mean',
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None) -> ipc.Chart
ipc.pieplot(data: pd.DataFrame,
x: str,
y: str,
agg: str = 'mean',
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None) -> ipc.Chart
ipc.polarplot(data: pd.DataFrame,
x: str,
y: str,
agg: str = 'mean',
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- y : str
  Column of the dataframe used as datapoints for the y axis.
- agg (optional): str
  The aggregator used to gather data (e.g. 'median' or 'mean').
- dataset_options (optional): dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
Example:
ipc.polarplot(
data=titanic,
x="Title",
y="Fare",
colorscheme="brewer.SetThree5"
)
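The difference between pie-style and polar area charts can be illustrated with a few lines of arithmetic. In the sketch below, the `values` dictionary holds the mean Fare per Title computed by hand from the five rows shown in the table above. The angle computations only illustrate the geometry described in this section and are not ipychart or Chart.js code.

```python
# Mean Fare per Title, computed by hand from the five rows shown earlier.
values = {"Mr": 7.65, "Mrs": 62.19165, "Miss": 7.925}
total = sum(values.values())

# Pie and doughnut: the angle of each segment is proportional to its value.
pie_angles = {title: 360 * value / total for title, value in values.items()}

# Polar area: every segment gets the same angle, only the radius varies.
polar_angle = 360 / len(values)

print({title: round(angle, 1) for title, angle in pie_angles.items()})
print(polar_angle)  # 120.0
```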
# Scatter
Scatter charts are based on basic line charts with the x axis changed to a linear axis. To draw one, call the `scatterplot` function:
ipc.scatterplot(data: pd.DataFrame,
x: str,
y: str,
hue: str = None,
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None,
zoom: bool = True) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- y : str
  Column of the dataframe used as datapoints for the y axis.
- hue (optional): str
  Grouping variable that will produce points with different colors.
- dataset_options (optional): dict or list of dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
- zoom (optional): bool
  Allow the user to zoom on the Chart once it is created. Defaults to True.
Example:
ipc.scatterplot(
data=titanic,
x="Age",
y="Fare",
hue="Survived",
colorscheme="tableau.ColorBlind10",
)
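Chart.js expects scatter data as a list of points of the form `{x: ..., y: ...}`, grouped into one dataset per series. The sketch below shows how rows could be split into such datasets by a `hue` column. The `scatter_datasets` helper is hypothetical and is not ipychart's internal code. The rows reuse the Age, Fare and Survived values from the table above.

```python
def scatter_datasets(rows, x, y, hue=None):
    """Group rows into Chart.js-style scatter datasets (hypothetical helper)."""
    datasets = {}
    for row in rows:
        # One dataset per distinct hue value, or a single one named after y.
        label = str(row[hue]) if hue else y
        datasets.setdefault(label, []).append({"x": row[x], "y": row[y]})
    return [{"label": label, "data": points} for label, points in datasets.items()]


rows = [
    {"Age": 22.0, "Fare": 7.25, "Survived": 0},
    {"Age": 38.0, "Fare": 71.2833, "Survived": 1},
    {"Age": 26.0, "Fare": 7.925, "Survived": 1},
    {"Age": 35.0, "Fare": 53.1, "Survived": 1},
    {"Age": 35.0, "Fare": 8.05, "Survived": 0},
]

grouped = scatter_datasets(rows, x="Age", y="Fare", hue="Survived")
ungrouped = scatter_datasets(rows, x="Age", y="Fare")

for dataset in grouped:
    print(dataset["label"], len(dataset["data"]))
```

Each dataset is then drawn with its own color, which is how a third column becomes visible on a two-axis chart.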
# Bubble
A bubble chart is used to display three-dimensional data. The location of the bubble is determined by the first two dimensions and the corresponding horizontal and vertical axes. The third dimension is represented by the radius of the individual bubbles. To draw it, call the `bubbleplot` function:
ipc.bubbleplot(data: pd.DataFrame,
x: str,
y: str,
r: str,
hue: str = None,
dataset_options: Union[dict, list] = {},
options: dict = None,
colorscheme: str = None,
zoom: bool = True) -> ipc.Chart
- data : pd.DataFrame
  Data used to draw the chart.
- x : str
  Column of the dataframe used as datapoints for the x axis.
- y : str
  Column of the dataframe used as datapoints for the y axis.
- r : str
  Column of the dataframe used as radius for bubbles.
- hue (optional): str
  Grouping variable that will produce points with different colors.
- dataset_options (optional): dict or list of dict
  These are options directly related to the dataset object (i.e. options concerning your data).
- options (optional): dict
  All options to configure the chart. This dictionary corresponds to the "options" argument of Chart.js.
- colorscheme (optional): str
  Colorscheme to use when drawing the chart. List of available colorschemes: link.
- zoom (optional): bool
  Allow the user to zoom on the Chart once it is created. Defaults to True.
Example:
ipc.bubbleplot(
data=titanic,
x="Age",
y="Fare",
r="Pclass",
hue="Survived",
colorscheme="office.Headlines6",
)