Creating Statistical Plots with the Seaborn Python Library

choubertsprojects


The Seaborn Python library is a data visualization library that makes it easy to create statistical plots of your datasets. This article walks through how to create graphs of your data with Seaborn, including scatter plots, box-and-whisker plots, histograms, and more.



Creating statistical graphs in Python can be time-consuming, particularly if you build them by hand. With the Seaborn data visualization library, you can simplify your work and produce attractive charts quickly, with fewer lines of code.

Creating stunning statistical graphs for your data is a breeze using Seaborn. Using real-life examples, this article will teach you how to utilize this powerful library.

Prerequisites

This tutorial is a hands-on demonstration. If you want to follow along, make sure you have the following:

  • A Windows or Linux PC with Python and Anaconda installed. This lesson uses Anaconda 2021.11 with Python 3.9 on a Windows 10 PC.


What is the Seaborn Python Library, and what does it do?

The Seaborn Python library is a data visualization toolkit built on Matplotlib. It provides a comprehensive collection of high-level functions for constructing statistical charts and graphs. Because Seaborn works directly with pandas DataFrame objects, you can visualize your data easily.

A DataFrame is a container for tabular data, such as the data found in a table, a spreadsheet, or a CSV (comma-separated values) file.

Seaborn works with pandas DataFrames and uses Matplotlib under the hood to draw the plots.

While there are numerous high-quality plots to choose from, this lesson covers the three most common families of built-in Seaborn plots to get you started:

  • Relational plots.
  • Distribution plots.
  • Categorical plots.

There are many more plots in Seaborn, and this guide will not be able to cover them all. The Seaborn API documentation and tutorial are fantastic places to start learning about the many types of Seaborn plots.

Getting Started with JupyterLab and Seaborn Python

Before you begin your Seaborn adventure, you'll need to set up a JupyterLab environment. You'll also work with a specific dataset throughout this lesson so that your results stay consistent with the examples.

JupyterLab is a web-based application that combines code, rich text, graphs, and other material in a single document called a notebook. Notebooks can also be shared with others online or used as executable documents.

Follow these steps to get started setting up your environment.

1. On your PC, launch Anaconda Navigator.

a. On a Windows computer, click Start > Anaconda3 > Anaconda Navigator.

Getting Anaconda Navigator to Run on Windows

b. On a Linux computer, run the anaconda-navigator command in a terminal.

2. In Anaconda Navigator, locate and launch JupyterLab. This opens JupyterLab in a web browser.

Launching JupyterLab

3. After JupyterLab opens, open the File Browser sidebar and create a new folder named ATA Seaborn under your profile or home directory. This new folder is your project directory.

Creating a new project directory in JupyterLab

4. Now, open a new tab in your browser and download the Pokemon dataset. Make sure the ata_pokemon.csv file is saved to the project directory you created, in this case ATA Seaborn.

5. Double-click the ATA Seaborn folder in the JupyterLab File Browser. The ata_pokemon.csv file should now be in that folder.

Getting started with the project directory

6. To create a new notebook, click the Python 3 button in the Notebook section of the Launcher tab.

Creating a new notebook

7. Rename the new notebook, Untitled.ipynb, by selecting it and pressing F2, then change its name to ata_pokemon.ipynb.

Changing the notebook's name

8. Finally, give your notebook a title. This step is optional, but it's recommended because it helps your project stand out.

Select Markdown from the cell type dropdown (set to Code by default) on your notebook's toolbar.

Changing the cell type to Markdown

9. In the Markdown cell, type “# Pokemon Data Visualization” and press Shift + Enter.

Adding a heading

The cell type option switches back to Code automatically, and the title Pokemon Data Visualization appears at the top of the notebook.

The notebook has a title and is ready for commands.

Finally, use the Ctrl + S keys to save your work.

Save your work often so that you don't lose anything if your internet connection fails. Press Ctrl + S whenever you make a change, or use the Save button on the toolbar.

Importing the Pandas and Seaborn Python Libraries

Importing the appropriate libraries is a common first step in Python coding. You’ll be using the Pandas and Seaborn Python libraries in this project.

Copy the code below and paste it into the command cell on your notebook to import Pandas and Seaborn.

Remember to hit Shift + Enter to execute the code or instructions in the command cell.

import pandas as pd  # import the pandas library
import seaborn as sns  # import the Seaborn library as sns

Then, to apply the Seaborn default theme to the plots you'll be creating, run the command below.

sns.set_theme()

Seaborn comes with five built-in styles: darkgrid (the default), whitegrid, dark, white, and ticks.

The default theme is applied after importing the libraries.
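To see what distinguishes the five styles, you can inspect the Matplotlib settings each one applies. sns.axes_style() returns those settings as a dictionary without changing anything globally; the loop below prints each style's background color and whether it draws grid lines, then activates one style with set_theme(style=...). Choosing whitegrid here is just an example.

```python
import matplotlib as mpl
import seaborn as sns

styles = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
for style in styles:
    params = sns.axes_style(style)  # Matplotlib rc settings for this style
    print(f"{style:10} facecolor={params['axes.facecolor']} grid={params['axes.grid']}")

# Apply a non-default style globally for all later plots
sns.set_theme(style="whitegrid")
print("grid enabled:", mpl.rcParams["axes.grid"])
```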

Importing the Sample Dataset

Now that you've set up your JupyterLab environment, let's import the data from the dataset.

To import the data, use the pd.read_csv() function in a cell. Pass the dataset filename, enclosed in double quotes, inside the parentheses.

1. The command below imports the ata_pokemon.csv file and saves the dataset to the pokemon variable.

pokemon = pd.read_csv("ata_pokemon.csv")

2. To see the first five rows of the imported dataset, run pokemon.head().

The output will be as follows.

The dataset is being imported and previewed.

3. To inspect every individual row, double-click the ata_pokemon.csv file in the File Browser on the left. The output will be as follows.

As you can see, this dataset is easy to work with, since each observation is listed in a single row and all numerical data is separated into different columns.

Analyzing the data
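If you don't have the CSV file handy, you can try read_csv() on inline text instead. This is a stand-in under stated assumptions: the column names (Name, Type, HP, Attack) and the six rows below are illustrative and may not match ata_pokemon.csv. io.StringIO lets read_csv() treat a string as if it were a file.

```python
import io
import pandas as pd

# Illustrative stand-in for ata_pokemon.csv (assumed columns, sample rows)
csv_text = """Name,Type,HP,Attack
Bulbasaur,Grass,45,49
Charmander,Fire,39,52
Squirtle,Water,44,48
Pikachu,Electric,35,55
Geodude,Rock,40,80
Abra,Psychic,25,20
"""

pokemon = pd.read_csv(io.StringIO(csv_text))
print(pokemon.head())  # first five rows, like pokemon.head() in the notebook
print(pokemon.shape)   # (number of rows, number of columns)
print(pokemon.dtypes)  # HP and Attack are parsed as integer columns
```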

Now, to aid in the analysis, let’s ask some questions regarding the dataset.

  • What's the relationship between Attack and HP?
  • What is the Attack distribution?
  • What's the relationship between Type and Attack?
  • What is the Attack distribution for each Type?
  • What is each Type's average, or mean, Attack?
  • What is the total number of Pokemon of each Type?

Many of these questions concern relationships between numerical and categorical data. Categorical data is non-numerical data, such as the Type of Pokemon in this example dataset.

Seaborn makes it easy to examine data that mixes categorical and numerical variables, which takes more manual work in plain Matplotlib.


Creating Relational Plots

You've imported a dataset. What comes next? Now you'll use the imported data to create statistical charts. To identify the relationship between the HP and Attack statistics, start by creating relational plots.

Relational plots are useful for spotting probable correlations between variables in your dataset. Seaborn offers two types of relational plots for this: scatter plots and line plots.

Line Plots

To create a line plot, call the Seaborn lineplot() function. This function takes three parameters: data=<data source>, x='<x-axis column>', and y='<y-axis column>'.

Copy and paste the command below into a Jupyter command cell. This command uses the pokemon object you imported earlier as the data source, the HP column for the x-axis, and the Attack column for the y-axis.

sns.lineplot(data=pokemon, x='HP', y='Attack')

As you can see below, the line plot does not present the information in a form that is easy to analyze. A line plot works best when the x-axis follows a continuous, ordered variable such as time.

In this example, you're graphing HP, a discrete stat rather than a time-like sequence. As a result, the line jumps all over the place, and it's harder to detect a pattern.

Line plot of HP vs. Attack
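Part of the jumpiness comes from how lineplot() handles repeated x values: by default it draws the mean y value at each unique x, plus a shaded confidence band. The pandas sketch below computes those plotted points for a few made-up rows in which several Pokemon share an HP value, showing how the line can rise and fall sharply between neighboring HP values.

```python
import pandas as pd

# Made-up rows where several Pokemon share the same HP value
df = pd.DataFrame({
    "HP": [50, 50, 60, 60, 70],
    "Attack": [40, 90, 55, 65, 100],
})

# lineplot() draws one point per unique x: the mean y at that x (its default estimator)
line_points = df.groupby("HP", as_index=False)["Attack"].mean()
print(line_points)
```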

Scatter Plots

Trying out new things to discover what works is an element of exploratory data analysis. And you’ll discover that certain plots might provide you with more insights than others.

So, which relational plot works better here than a line plot? The scatter plot.

To create a scatter plot, call the sns.scatterplot() function with three parameters: data=pokemon, x='HP', and y='Attack'.

To construct a scatterplot for the pokemon dataset, use the following command.

sns.scatterplot(data=pokemon, x='HP', y='Attack')

The scatter plot reveals that there may be a general positive association between HP (x-axis) and Attack (y-axis), with one outlier, as seen in the following result.

In general, as a Pokemon's HP rises, so does its Attack: Pokemon with more health points tend to be more powerful.

Scatter plot of HP vs. Attack
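To put a number on that visual impression, you can compute the Pearson correlation coefficient between the two columns with pandas. The HP/Attack pairs below are made up for illustration; on the real dataset, the equivalent call would be pokemon['HP'].corr(pokemon['Attack']). A value near +1 indicates a strong positive linear relationship, and a single outlier can pull the value noticeably.

```python
import pandas as pd

# Illustrative HP/Attack pairs (not the tutorial's dataset)
df = pd.DataFrame({
    "HP": [45, 39, 44, 35, 40, 25, 80, 78, 79],
    "Attack": [49, 52, 48, 55, 80, 20, 82, 84, 83],
})

# Pearson correlation: +1 perfect positive, 0 no linear relationship, -1 perfect negative
r = df["HP"].corr(df["Attack"])
print(f"Pearson r = {r:.2f}")
```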

Scatter Plots with Legends

While the scatter plot gives a more comprehensible view of the data, you can improve the graph further by adding a legend that breaks out the Type distribution.

In the next example, use the sns.scatterplot() function again, but this time add the hue='Type' keyword argument, which colors each point by Pokemon Type and generates a legend. Run the command below in your Jupyter notebook tab.

sns.scatterplot(data=pokemon, x='HP', y='Attack', hue='Type')

The scatter plot in the result below now has distinct colors. Because of the visual differences provided by the legend, analyzing the categorical features of your data is now much easier.

Scatter plot with hue='Type'

Even better, you can use the sns.relplot() function with the col='Type' and col_wrap keyword arguments to split the plot further.

To produce one plot per Pokemon Type in a multi-plot grid, run the code below in Jupyter.

sns.relplot(data=pokemon, x='HP', y='Attack', hue='Type', col='Type', col_wrap=3)

The graphs below show that HP and Attack are typically positively correlated: Pokemon with greater HP are usually more powerful.

Scatter plots split by Type with col and col_wrap
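With col_wrap set, relplot() lays out one subplot per category and wraps to a new row after col_wrap columns, so four Types with col_wrap=3 give two rows. The sketch below uses made-up data covering four assumed Types and checks the layout through the returned FacetGrid's axes and axes_dict attributes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in JupyterLab
import pandas as pd
import seaborn as sns

# Made-up rows covering four Types
df = pd.DataFrame({
    "Type": ["Water", "Water", "Grass", "Grass", "Fire", "Fire", "Rock", "Rock"],
    "HP": [44, 79, 45, 80, 39, 78, 40, 80],
    "Attack": [48, 83, 49, 82, 52, 84, 80, 120],
})

g = sns.relplot(data=df, x="HP", y="Attack", hue="Type", col="Type", col_wrap=3)

# With col_wrap, g.axes is a flat array holding one Axes per facet
print(len(g.axes))          # number of subplots, one per Type
print(sorted(g.axes_dict))  # facets keyed by their Type value
```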

Do you think adding colors and legends to a plot makes it more interesting?

Creating Distribution Plots

You built scatter plots in the previous section. This time, you'll use distribution plots to learn more about the Attack and HP distributions for each Pokemon Type.

Plotting Histograms

A histogram displays the distribution of a single variable. In this example dataset, that variable is the Pokemon's Attack.

Use the sns.histplot() function to make a histogram. Its arguments here are data=pokemon and x='Attack'. Copy and run the command below in Jupyter.

sns.histplot(data=pokemon, x='Attack')

Histogram of Pokemon Attacks

When constructing a histogram, Seaborn chooses the bins automatically. You may want to experiment with different bin settings to see how the data is distributed in differently sized groups.

Add the bins parameter to override the automatic choice. An integer value such as bins=10 sets the number of bins (to set the width of each bin instead, use the binwidth parameter). To make a histogram with 10 bins, use the command below.

sns.histplot(data=pokemon, x='Attack', bins=10)

In the previous histogram you created, Pokemon Attack appears to have a bimodal distribution (two big humps).

With 10 bins, though, the data is grouped differently, and the distribution looks more unimodal, skewed to the right.

Histogram of Pokemon Attacks with 10 bins
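Because bins=10 asks for ten equal-width bins, the width of each bin depends on the range of the data. The NumPy sketch below, on illustrative Attack values, contrasts ten bins spanning the data with bins that are exactly 10 points wide, which is roughly what histplot()'s binwidth=10 would produce.

```python
import numpy as np

attack = np.array([20, 48, 49, 52, 55, 80, 82, 83, 84, 110])  # illustrative values

# bins=10: ten equal-width bins spanning min..max, so the width is (110 - 20) / 10
edges_by_count = np.histogram_bin_edges(attack, bins=10)
print(len(edges_by_count) - 1, "bins, each", edges_by_count[1] - edges_by_count[0], "wide")

# binwidth-style: bins that are 10 units wide, starting at the minimum
edges_by_width = np.arange(attack.min(), attack.max() + 10, 10)
print(len(edges_by_width) - 1, "bins, each 10 wide")
```

In the notebook, the width-based version would be sns.histplot(data=pokemon, x='Attack', binwidth=10).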

Kernel Density Estimation (KDE) Plots

Kernel density estimation (KDE) plotting is another way to show a distribution. A KDE plot is similar to a histogram, except that it draws a smooth curve instead of bars.

The benefit of a KDE plot is that its smooth probability density curve, which shows traits such as central tendency, modality, and skew, lets you draw conclusions about how the data is distributed more quickly.
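Under the hood, a KDE places a small smooth bump (a kernel) on every observation and averages them. The NumPy sketch below implements a Gaussian KDE with the bandwidth chosen by Scott's rule and a grid extending three bandwidths past the data. These choices are intended to approximate Seaborn's defaults, but this is an illustration of the technique, not Seaborn's actual code, and the Attack values are made up.

```python
import numpy as np

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Average of Gaussian kernels, one centered on each sample."""
    # Distance from every grid point to every sample, in bandwidth units
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

attack = np.array([20, 48, 49, 52, 55, 80, 82, 83, 84, 110], dtype=float)  # illustrative

# Scott's rule of thumb for 1-D data: sample std * n ** (-1/5)
bandwidth = attack.std(ddof=1) * len(attack) ** (-1 / 5)

# Evaluate on a grid extending 3 bandwidths past the extreme data points
grid = np.linspace(attack.min() - 3 * bandwidth, attack.max() + 3 * bandwidth, 500)
density = gaussian_kde_sketch(attack, grid, bandwidth)

# A density curve should enclose an area of roughly 1
area = density.sum() * (grid[1] - grid[0])
print(f"bandwidth={bandwidth:.1f}, area={area:.3f}, peak near Attack={grid[density.argmax()]:.0f}")
```

A larger bandwidth gives a smoother curve that can hide a second hump; a smaller one follows individual points more closely.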

To produce a KDE plot, call the sns.kdeplot() function with the parameters data=pokemon and x='Attack'. To see the KDE plot in action, run the code below in Jupyter.

sns.kdeplot(data=pokemon, x='Attack')

The skew of the KDE plot is comparable to that of the histogram with 10 bins.

KDE Plot

Why not combine the histogram and KDE, since they are so similar? Seaborn lets you overlay the KDE on a histogram by adding the kde=True parameter to the preceding command, as shown below.

sns.histplot(data=pokemon, x='Attack', bins=10, kde=True)

The output will be as follows. According to the histogram below, most Pokemon have Attack values between 50 and 120. That's a nice spread!

Histogram of Pokemon Attacks with KDE overlay

To break down the Attack distribution by Type, use the col keyword argument of the sns.displot() function to build a multi-grid plot with one panel per Type.

sns.displot(data=pokemon, x='Attack', col='Type', bins=10, col_wrap=3)

The output will be as follows.

Histograms in a multi-plot grid

Plotting Categorical Data

Separate histograms for each Type are helpful, but histograms may still not give you a clear picture. So, let's look at some of Seaborn's categorical plots to see how they can help you dig further into the Attack data by Pokemon Type.

Strip Plots

In the preceding scatter plots and histograms, you depicted the Attack data according to a categorical variable (Type). This time, you'll create a strip plot, which is essentially a scatter plot organized by category.

To make your categorical strip plot, call the sns.stripplot() function with three arguments: data=pokemon, x='Type', and y='Attack'. Run the code below in Jupyter.

sns.stripplot(data=pokemon, x='Type', y='Attack')

You now have a strip plot with all of the observations organized by Type. However, notice how the x-axis labels are all squashed together? That's not very helpful.

Strip plot

To fix the x-axis labels, you'll need a different function: catplot().

Run the sns.catplot() function in your Jupyter notebook command cell, passing in five arguments: kind='strip', data=pokemon, x='Type', y='Attack', and aspect=2, as shown below.

sns.catplot(kind='strip', data=pokemon, x='Type', y='Attack', aspect=2)

The resulting plot now displays the x-axis labels across its full width, making your analysis easier.

Strip plot drawn with the catplot() function
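Because catplot() is a figure-level function, its size is controlled by height (inches per facet, 5 by default) and aspect (width divided by height). For a single facet with no legend, aspect=2 should therefore give a figure about 10 by 5 inches. The sketch below checks this on made-up data; the Type names and values are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in JupyterLab
import pandas as pd
import seaborn as sns

# Made-up rows; long Type names are what crowd a narrow x-axis
df = pd.DataFrame({
    "Type": ["Water", "Grass", "Electric", "Psychic", "Normal", "Ground"],
    "Attack": [48, 49, 55, 20, 56, 75],
})

g = sns.catplot(kind="strip", data=df, x="Type", y="Attack", height=5, aspect=2)

# Figure width = height * aspect for a single facet; height stays at 5 inches
width, height = g.figure.get_size_inches()
print(width, height)
```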

Box Plots

The catplot() function can also draw other kinds of categorical plots that show how data is distributed across a categorical variable. The box plot is one of them.

To produce a box plot, run the sns.catplot() function with the following arguments: data=pokemon, kind='box', x='Type', y='Attack', and aspect=2.

The aspect argument sets the ratio of each plot's width to its height, so a larger value gives a wider plot with more room between the x-axis labels.

sns.catplot(data=pokemon, kind='box', x='Type', y='Attack', aspect=2)

This output summarizes the data distribution. With the catplot() function, you get the distribution for every Pokemon Type in a single plot.

Outliers are shown as black diamond markers. A single horizontal line instead of a box indicates that there is only one observation for that Pokemon Type.

Each box-and-whisker graph represents a five-number summary. The line in the center of the box shows the median, a measure of the central tendency of Attack points.

The bottom and top edges of the box mark the first and third quartiles. The whiskers extend to the lowest and highest values that lie within 1.5 times the interquartile range (IQR) of the box; points beyond the whiskers are drawn as outliers.

Box plot
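To see where each element of a box plot comes from, the sketch below computes the numbers with pandas for two made-up groups: quartiles by linear interpolation (pandas' default, which should match the Matplotlib routine Seaborn relies on), whiskers at the most extreme points within 1.5 times the IQR of the box, and everything beyond those fences counted as an outlier.

```python
import pandas as pd

def box_stats(values):
    """Five-number summary as drawn by a default box plot (whiskers at 1.5 * IQR)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return pd.Series({
        "whisker_low": inside.min(),   # lowest point within the fences
        "q1": q1,                      # bottom edge of the box
        "median": median,              # line inside the box
        "q3": q3,                      # top edge of the box
        "whisker_high": inside.max(),  # highest point within the fences
        "n_outliers": ((values < low_fence) | (values > high_fence)).sum(),
    })

# Made-up Attack values for two Types
df = pd.DataFrame({
    "Type": ["Normal"] * 6 + ["Water"] * 4,
    "Attack": [45, 55, 70, 80, 100, 190, 48, 65, 83, 105],
})

summary = pd.DataFrame({t: box_stats(a) for t, a in df.groupby("Type")["Attack"]}).T
print(summary)
```

In this made-up data, the Normal value of 190 falls above the upper fence, so a box plot would draw it as an outlier marker and end the upper whisker at 100.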

Violin Plots

The violin plot is another way of showing a distribution. A violin plot combines a KDE with a box plot: each category's density curve is mirrored to form the violin shape, with summary statistics drawn inside it.

To create a violin plot, change the kind value to 'violin' and keep the other arguments the same as in the box plot command. Run the code below.

sns.catplot(kind='violin', data=pokemon, x='Type', y='Attack', aspect=2)

Like the box plot, the violin plot includes the median and the first and third quartiles, so it summarizes the data in a similar way while also showing the shape of each distribution.

Violin plot

Returning to the original question: what is the Attack distribution for each Pokemon Type?

The box plot reveals that the lowest Attack values are between 0 and 10, and the highest is 110.

Normal Type Pokemon seem to have a median attack point of about 75. The first and third quartiles seem to be approximately 55 and 105, respectively.

Bar Plots

The bar plot belongs to Seaborn's categorical estimate plots and displays the mean (average) value for each data category.

To make a bar plot, run the sns.catplot() function in Jupyter with the following arguments: kind='bar', data=pokemon, x='Type', y='Attack', and aspect=2.

sns.catplot(kind='bar', data=pokemon, x='Type', y='Attack', aspect=2)

The black lines on each bar are error bars that show the uncertainty of each mean estimate (by default, a 95% confidence interval). As you can see in the plot below, the average values are approximately:

  • Water: about 90.
  • Grass: about 60.
  • Electric: about 75.
  • Rock: about 70.
  • Ground: about 75.
  • And so on.

Bar plot
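The bar heights are group means, which you can reproduce with groupby(). By default, Seaborn's error bars are 95% confidence intervals estimated by bootstrapping: resampling the data with replacement many times (1000 by default) and taking percentiles of the resampled means. The sketch below approximates that procedure on made-up data; Seaborn's own implementation differs in its details (random state, exact percentile method), so the endpoints won't match a real plot exactly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def bootstrap_ci(values, n_boot=1000, level=95):
    """Percentile bootstrap confidence interval for the mean."""
    values = np.asarray(values, dtype=float)
    # Resample with replacement n_boot times and take each resample's mean
    samples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    boot_means = samples.mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(boot_means, [tail, 100 - tail])

# Made-up Attack values for two Types
df = pd.DataFrame({
    "Type": ["Water"] * 4 + ["Grass"] * 4,
    "Attack": [48, 65, 83, 105, 49, 55, 60, 70],
})

results = {}
for type_name, attack in df.groupby("Type")["Attack"]:
    low, high = bootstrap_ci(attack)
    results[type_name] = (attack.mean(), low, high)
    print(f"{type_name}: mean={attack.mean():.2f}, 95% CI=({low:.1f}, {high:.1f})")
```

With only four values per group, the intervals come out wide, which is why short error bars are a good sign that a mean is well determined.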

Count Plots

What if you want to plot the number of Pokemon rather than the mean values? With the Seaborn Python package, you can do that with a count plot.

To get a count plot, change the kind value to 'count', as in the code below. Unlike the bar plot, the count plot needs only one data axis: specify either x or y, depending on the plot orientation you want.

The following command generates a count plot with the Type variable on the x-axis.

sns.catplot(kind='count', data=pokemon, x='Type', aspect=2)

A count plot similar to the one below will appear. As you can see, the most common Pokemon Types are:

  • Normal (6).
  • Psychic (5).
  • Water (4).
  • Grass (4).
  • And so on.

Count plot
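A count plot draws the same numbers that pandas' value_counts() returns, one bar per category. On the real data, the equivalent would be pokemon['Type'].value_counts(); the sketch below uses a made-up Type column.

```python
import pandas as pd

# Made-up Type column
types = pd.Series(
    ["Normal", "Psychic", "Normal", "Water", "Grass", "Psychic", "Normal", "Water"],
    name="Type",
)

counts = types.value_counts()  # one count per Type, most common first
print(counts)
```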

Conclusion

You’ve learned how to use the Seaborn Python module to build statistical graphs programmatically in this lesson. Which charting approach do you believe will be best for your dataset?

Now that you've gone through the examples and tried creating plots with Seaborn, why not start building new plots on your own? You could start with the Iris dataset, or collect your own sample data first.

Try out some of Seaborn's other built-in themes and color palettes while you're at it! Thank you for reading, and have fun!

Frequently Asked Questions

How do you plot in Python Seaborn?

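A: Load your data into a pandas DataFrame, then call a Seaborn plotting function such as sns.scatterplot(), sns.histplot(), or sns.catplot(), passing the DataFrame as data and the column names as x and y.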

What is Seaborn library in Python used for?

A: Seaborn is used to create statistical data visualizations, such as relational, distribution, and categorical plots.

What libraries are required to draw plots with Seaborn?

A: Seaborn is built on Matplotlib and works with pandas DataFrames, so it needs Matplotlib, pandas, and NumPy installed. Anaconda includes all of them, along with Seaborn itself.
