Creating animation graph with matplotlib FuncAnimation in Jupyter Notebook

Photo by Firmbee.com on Unsplash

When fitting values to a line using Linear Regression, it can be very helpful to illustrate how the line fits the data as more data are added. In this article, you’ll learn how to create a Matplotlib animation that fits values to a line using Linear Regression. It extends the previous article, “animating a simple sine wave in Jupyter Notebook”.
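As a rough illustration of the idea, here is a minimal sketch that refits a line each time one more point is revealed. The synthetic data, the variable names, and the use of np.polyfit for the least-squares fit are assumptions made for this example, not necessarily what the article itself uses.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=x.size)

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(y.min() - 1, y.max() + 1)
points = ax.scatter([], [])                   # data revealed so far
(fit_line,) = ax.plot([], [], color="red")    # current regression line

def update(frame):
    n = frame + 2  # a line needs at least two points
    slope, intercept = np.polyfit(x[:n], y[:n], deg=1)
    points.set_offsets(np.column_stack([x[:n], y[:n]]))
    fit_line.set_data(x, slope * x + intercept)
    return points, fit_line

# One frame per added point; the last frame uses all 50 points
anim = FuncAnimation(fig, update, frames=len(x) - 1, interval=100, blit=True)
```

In a Jupyter Notebook, one way to display the result is HTML(anim.to_jshtml()) with HTML imported from IPython.display.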

Matplotlib is one of the most popular plotting libraries for exploratory data analysis. Plotting a static graph should work well in most cases, but when you are running simulations or doing time-series data analysis, basic plots…


Tips and tricks to transform numerical data into categorical data

Photo by v2osk on Unsplash

Numerical data is common in data analysis. Often you have numerical data that is continuous, spans a very large scale, or is highly skewed. Sometimes, it can be easier to bin those data into discrete intervals. Dividing values into meaningful categories is very helpful for descriptive statistics. For example, you might use age group instead of the exact age, weight class instead of the exact weight, or grade level instead of the exact score.

Pandas has two built-in functions, cut() and qcut(), for transforming numerical data into categorical data.

  • cut() bins data into discrete intervals based on bin edges
  • qcut() bins data into…
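A small sketch of the two functions side by side may help. The ages, bin edges, and labels below are made up for illustration. Per the pandas documentation, qcut() is the quantile-based counterpart of cut().

```python
import pandas as pd

# Hypothetical ages, used only for illustration
ages = pd.Series([5, 17, 25, 40, 63, 80])

# cut(): explicit bin edges, here (0, 18], (18, 65], (65, 120]
age_group = pd.cut(ages, bins=[0, 18, 65, 120],
                   labels=["child", "adult", "senior"])

# qcut(): quantile-based bins, here two bins split at the median
half = pd.qcut(ages, q=2, labels=["lower half", "upper half"])

print(pd.DataFrame({"age": ages, "age_group": age_group, "half": half}))
```

With cut() you choose the edges, so bin sizes can be very uneven. With qcut() pandas chooses the edges so that each bin holds roughly the same number of values.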

Creating animation graph with matplotlib FuncAnimation in Jupyter Notebook

Photo by Isaac Smith on Unsplash

Matplotlib is one of the most popular plotting libraries for exploratory data analysis. It’s the default plotting backend in Pandas, and other popular plotting libraries, such as seaborn, are built on it. Plotting a static graph should work well in most cases, but when you are running simulations or doing time-series data analysis, basic plots may not always be enough. You may want to show an animation that helps you understand how the state changes over time.

In this article, you’ll learn how to create animations using matplotlib in Jupyter Notebook. This article is structured as follows:

  1. Interactive Plot in…

Pandas tips and tricks to help you get started with Data Analysis

Photo by Victoriano Izquierdo on Unsplash

Pandas is one of the most important libraries for Data Manipulation and Analysis. It not only offers data structures and operations for manipulating data but also prints the result in a pretty tabular form with labeled rows and columns.

In most cases, the default settings of the Pandas display should work well, but you may want your data to be displayed in a format other than the default one. Pandas has an options system that allows you to customize display-related options. Display options can be configured using either methods or attributes as follows:

# Use methods
import pandas as pd
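To make the two styles concrete, here is a minimal sketch. The option names display.max_rows and display.max_columns are real pandas options, but the values chosen are arbitrary and only for illustration.

```python
import pandas as pd

# Method style: set an option by its dotted name
pd.set_option("display.max_rows", 20)

# Attribute style: set an option through pd.options
pd.options.display.max_columns = 10

# Either style can be read back with get_option
print(pd.get_option("display.max_rows"))
print(pd.get_option("display.max_columns"))
```

Calling pd.reset_option("display.max_rows") restores the default for that option.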

Tips and tricks to improve your Google Colab Experience

Photo by Ehimetalor Akhere Unuabona on Unsplash

Colab (short for Colaboratory) is a free platform from Google that allows users to code in Python. Colab is essentially the Google version of a Jupyter Notebook. Some of the advantages of Colab over Jupyter include zero configuration, free access to GPUs & CPUs, and seamless sharing of code.

More and more people are using Colab to take advantage of high-end computing resources without being restricted by their price. Loading data is the first step in any data science project. Often, loading data into Colab requires some extra setup or coding. In this article, you’ll learn the 7…


Pandas tips and tricks to help you get started with data analysis

Photo by Anastasiia Chepinska on Unsplash

A MultiIndex (also known as a hierarchical index) DataFrame allows you to have multiple columns acting as a row identifier and multiple rows acting as a header identifier. With MultiIndex, you can do some sophisticated data analysis, especially when working with higher dimensional data. Accessing data is the first step when working on a MultiIndex DataFrame.

In this article, you’ll learn how to access data in a MultiIndex DataFrame. This article is structured as follows:

  1. Selecting data via the first level index
  2. Selecting data via multi-level index
  3. Selecting a range of data using slice
  4. Selecting all content using slice(None)
  5. Using…
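The first four selection styles in the list can be sketched roughly as follows. The cities, days, and temperatures are hypothetical, and the example uses a MultiIndex on the rows only.

```python
import pandas as pd

# Hypothetical two-level row index: city, then day
index = pd.MultiIndex.from_product(
    [["London", "Oxford"], ["Day 1", "Day 2"]], names=["city", "day"]
)
df = pd.DataFrame(
    {"max_temp": [20, 22, 18, 19], "min_temp": [10, 12, 8, 9]}, index=index
)

# 1. First level only: every row for London
london = df.loc["London"]

# 2. Multi-level: one row, addressed by a tuple of labels
oxford_day2 = df.loc[("Oxford", "Day 2")]

# 3. A range of rows (label slices include both end points)
day_range = df.loc[("London", "Day 1"):("Oxford", "Day 1")]

# 4. slice(None) selects everything on a level: Day 1 for all cities
all_day1 = df.loc[(slice(None), "Day 1"), :]
```

Range slicing on a MultiIndex expects the index to be sorted. The index built here already is; for other data, calling df.sort_index() first is a safe default.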

Pandas tips and tricks to help you get started with Data Analysis

Photo by Sanah Suvarna on Unsplash

When doing data analysis, it is important to ensure correct data types. Otherwise, you may get unexpected results or errors. Datetime is a common data type in data science projects and the data is often saved as numbers or strings. During data analysis, you will likely need to explicitly convert them to a datetime type.

This article will discuss how to convert numbers and strings to a datetime type. More specifically, you will learn how to use the Pandas built-in methods to_datetime() and astype() to deal with the following common problems:

  1. Converting numbers to datetime
  2. Converting strings to datetime
  3. Handling…
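As a brief sketch of the first two items, the example below converts Unix timestamps and date strings. The column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: the same two dates stored as numbers and as strings
df = pd.DataFrame({
    "epoch_s": [1609459200, 1609545600],     # seconds since 1970-01-01
    "date_str": ["2021-01-01", "2021-01-02"],
})

# Numbers to datetime: tell pandas which unit the numbers are in
df["from_number"] = pd.to_datetime(df["epoch_s"], unit="s")

# Strings to datetime: an explicit format avoids ambiguous parsing
df["from_string"] = pd.to_datetime(df["date_str"], format="%Y-%m-%d")

# astype() also works when the strings are in a standard format
df["via_astype"] = df["date_str"].astype("datetime64[ns]")

print(df.dtypes)
```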

Pandas tips and tricks to help you get started with Data Analysis

Photo by Ross Findon on Unsplash

When doing data analysis, it is important to ensure correct data types. Otherwise, you may get unexpected results or errors. Pandas correctly infers data types in many cases, so you can move on with your analysis without any further thought on the topic.

Despite how well pandas works, at some point in your data analysis process you will likely need to explicitly convert data from one type to another. This article will discuss how to change data to a numeric type. …
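A minimal sketch of such a conversion is shown below. It assumes the pandas functions pd.to_numeric() and astype(); the teaser does not say which methods the article covers, and the sample values are made up.

```python
import pandas as pd

# Hypothetical column where numbers were read in as strings
raw = pd.Series(["1", "2.5", "abc"])

# to_numeric with errors="coerce" turns unparseable values into NaN
numbers = pd.to_numeric(raw, errors="coerce")

# astype works when every value is known to be convertible
ints = pd.Series(["1", "2", "3"]).astype(int)

print(numbers)
print(ints)
```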


Pandas tips and tricks to help you get started with data analysis

Photo by Susan Q Yin on Unsplash

In data preprocessing and analysis, you will often need to figure out whether you have duplicate data and how to deal with it.

In this article, you’ll learn the two methods, duplicated() and drop_duplicates(), for finding and removing duplicate rows, as well as how to modify their behavior to suit your specific needs. This article is structured as follows:

  1. Finding duplicate rows
  2. Counting duplicate and non-duplicate rows
  3. Extracting duplicate rows with loc
  4. Determining which duplicates to mark with keep
  5. Dropping duplicate rows

For demonstration, we will use a subset from the Titanic dataset available on Kaggle.

import pandas as pd
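The topics listed above can be sketched on a tiny made-up DataFrame. This is a stand-in for illustration, not the Titanic subset the article uses.

```python
import pandas as pd

# Hypothetical data: the third row repeats the first
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "age": [30, 25, 30, 41],
})

# Finding duplicate rows: True marks a repeat of an earlier row
mask = df.duplicated()

# Counting duplicate rows
n_duplicates = mask.sum()

# Extracting duplicates with loc; keep=False marks every copy
all_copies = df.loc[df.duplicated(keep=False)]

# Dropping duplicate rows (the first occurrence is kept by default)
deduped = df.drop_duplicates()
```

Both methods also accept keep="last" to treat the final occurrence as the original.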

Pandas tips and tricks to help you get started with data analysis

Photo by Clay Banks on Unsplash

When it comes to selecting data from a DataFrame, Pandas loc and iloc are two top favorites. They are fast, easy to read, and sometimes interchangeable.

In this article, we’ll explore the differences between loc and iloc, take a look at their similarities, and check how to perform data selection with them. We will go over the following topics:

  1. Differences between loc and iloc
  2. Selecting via a single value
  3. Selecting via a list of values
  4. Selecting a range of data via slice
  5. Selecting via conditions and callable
  6. loc and iloc are interchangeable when labels are 0-based integers
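A short sketch of the main difference: loc selects by label and iloc selects by integer position. The weekday index and the values below are hypothetical.

```python
import pandas as pd

# Hypothetical data with string labels as the row index
df = pd.DataFrame(
    {"temp": [12, 15, 9, 11], "wind": [3, 7, 5, 2]},
    index=["Mon", "Tue", "Wed", "Thu"],
)

# Single value: same cell, addressed by label and by position
by_label = df.loc["Tue", "temp"]
by_position = df.iloc[1, 0]

# List of values
subset = df.loc[["Mon", "Thu"], ["wind"]]

# Slices: loc includes the end label, iloc excludes the end position
label_slice = df.loc["Mon":"Wed"]   # 3 rows
position_slice = df.iloc[0:2]       # 2 rows

# Conditions and callables
warm = df.loc[df["temp"] > 10]
windy = df.loc[lambda d: d["wind"] > 4]
```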

Please check…

B. Chen

Machine Learning practitioner | Formerly health informatics at University of Oxford | Ph.D. | https://www.linkedin.com/in/bindi-chen-aa55571a/
