FuncAnimation in Jupyter Notebook
When fitting values to a line using Linear Regression, it can be very helpful to illustrate how the line fits the data as more data are added. In this article, you’ll learn how to create a Matplotlib animation that does so. It extends the previous article, “animating a simple sine wave in Jupyter Notebook”, to fitting values to a line using Linear Regression.
Matplotlib is one of the most popular plotting libraries for exploratory data analysis. Plotting a static graph should work well in most cases, but when you are running simulations or doing time-series data analysis, basic plots…
Numerical data is common in data analysis. Often you have numerical data that is continuous, on a very large scale, or highly skewed. Sometimes, it can be easier to bin those data into discrete intervals. Dividing values into meaningful categories is very helpful for descriptive statistics. For example, you might use age group instead of exact age, weight class instead of exact weight, or grade level instead of exact score.
Pandas has 2 built-in functions, cut() and qcut(), for transforming numerical data into categorical data.
cut() bins data into discrete intervals based on bin edges
qcut() bins data into…
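As a rough illustration of the difference, here is a minimal sketch on a made-up series of ages. The bin edges and labels are assumptions chosen for the example, not values from the article. cut() is given explicit edges, while qcut() is asked for three quantile-based bins:

```python
import pandas as pd

# Made-up sample data, purely for illustration
ages = pd.Series([5, 17, 25, 40, 63, 81])

# cut(): bins follow the explicit edges (0, 18], (18, 65], (65, 120],
# so the number of values per bin can differ
age_group = pd.cut(ages, bins=[0, 18, 65, 120], labels=["child", "adult", "senior"])

# qcut(): bins follow quantiles of the data,
# so each bin holds roughly the same number of values
age_tercile = pd.qcut(ages, q=3, labels=["low", "mid", "high"])

print(age_group.tolist())
print(age_tercile.tolist())
```

With this data, cut() puts three of the six values in the "adult" bin, whereas qcut() places two values in each bin.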
FuncAnimation in Jupyter Notebook
Matplotlib is one of the most popular plotting libraries for exploratory data analysis. It’s the default plotting backend in Pandas and other popular plotting libraries are based on it, for instance, seaborn. Plotting a static graph should work well in most cases, but when you are running simulations or doing time-series data analysis, basic plots may not always be enough. You may want to show an animation that helps you understand how the state changes over time.
In this article, you’ll learn how to create animations using matplotlib in Jupyter Notebook. This article is structured as follows:
Pandas is one of the most important libraries for Data Manipulation and Analysis. It not only offers data structures and operations for manipulating data but also prints the result in a pretty tabular form with labeled rows and columns.
In most cases, the default settings of Pandas display should work well, but you may want your data to be displayed in a format other than the default one. Pandas has an options system that allows you to customize display-related options. Display options can be configured using either methods or attributes as follows:
# Use methods
import pandas as pd…
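The snippet above is cut off in this listing. As a separate, minimal sketch of the two styles, the following sets one option through the method interface and another through the attribute interface. The specific options and values are assumptions chosen for illustration:

```python
import pandas as pd

# Method style: the option name is passed as a string
pd.set_option("display.max_rows", 20)
rows_via_method = pd.get_option("display.max_rows")

# Attribute style: the same options are exposed under pd.options
pd.options.display.max_columns = 10
cols_via_attribute = pd.options.display.max_columns

print(rows_via_method, cols_via_attribute)

# Restore the defaults so later output is unaffected
pd.reset_option("display.max_rows")
pd.reset_option("display.max_columns")
```

Both styles read and write the same underlying setting, so choosing between them is mostly a matter of taste.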
Colab (short for Colaboratory) is a free platform from Google that allows users to code in Python. Colab is essentially the Google version of a Jupyter Notebook. Some of the advantages of Colab over Jupyter include zero configuration, free access to GPUs & CPUs, and seamless sharing of code.
More and more people are using Colab to take advantage of high-end computing resources without being restricted by their price. Loading data is the first step in any data science project. Often, loading data into Colab requires some extra setup or coding. In this article, you’ll learn the 7…
A MultiIndex (also known as a hierarchical index) DataFrame allows you to have multiple columns acting as a row identifier and multiple rows acting as a header identifier. With MultiIndex, you can do some sophisticated data analysis, especially for working with higher dimensional data. Accessing data is the first step when working on a MultiIndex DataFrame.
In this article, you’ll learn how to access data in a MultiIndex DataFrame. This article is structured as follows:
When doing data analysis, it is important to ensure correct data types. Otherwise, you may get unexpected results or errors. Datetime is a common data type in data science projects and the data is often saved as numbers or strings. During data analysis, you will likely need to explicitly convert them to a datetime type.
This article will discuss how to convert numbers and strings to a datetime type. More specifically, you will learn how to use the Pandas built-in methods to_datetime() and astype() to deal with the following common problems:
When doing data analysis, it is important to ensure correct data types. Otherwise, you may get unexpected results or errors. In the case of Pandas, it will correctly infer data types in many cases and you can move on with your analysis without any further thought on the topic.
Despite how well pandas works, at some point in your data analysis process you will likely need to explicitly convert data from one type to another. This article will discuss how to change data to a numeric type. …
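The teaser does not say which methods the article uses. As a minimal sketch of explicit numeric conversion, assuming pd.to_numeric() and astype() are the tools of choice and using made-up strings:

```python
import pandas as pd

# Made-up strings; the last one cannot be parsed as a number
raw = pd.Series(["1", "2.5", "n/a"])

# errors="coerce" turns unparseable values into NaN instead of raising
as_numbers = pd.to_numeric(raw, errors="coerce")

# astype() works when every value is already cleanly convertible
clean_ints = pd.Series(["1", "2", "3"]).astype("int64")

print(as_numbers.tolist())
print(clean_ints.tolist())
```

Calling astype() on the first series would raise an error because of the "n/a" entry, which is why the coercing variant is shown alongside it.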
In data preprocessing and analysis, you will often need to figure out whether you have duplicate data and how to deal with it.
In this article, you’ll learn the two methods, duplicated() and drop_duplicates(), for finding and removing duplicate rows, as well as how to modify their behavior to suit your specific needs. This article is structured as follows:
For demonstration, we will use a subset from the Titanic dataset available on Kaggle.
import pandas as pd…
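The article's own snippet is cut off in this listing. As a separate minimal sketch, the following uses a small made-up frame rather than the Titanic subset to show how duplicates can be flagged and then dropped. The column names and values are assumptions for illustration only:

```python
import pandas as pd

# Made-up data: row 2 repeats row 0 exactly
df = pd.DataFrame({"name": ["Ann", "Bob", "Ann", "Cy"], "age": [30, 25, 30, 41]})

# duplicated() marks rows that repeat an earlier row
dup_mask = df.duplicated()

# drop_duplicates() keeps the first occurrence by default
deduped = df.drop_duplicates()

# subset and keep change which columns are compared and which row survives
deduped_last = df.drop_duplicates(subset=["name"], keep="last")

print(dup_mask.tolist())
print(deduped.index.tolist())
print(deduped_last.index.tolist())
```

Here the default call drops row 2, while keep="last" on the name column drops row 0 instead.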
When it comes to selecting data from a DataFrame, Pandas loc and iloc are two top favorites. They are fast, easy to read, and sometimes interchangeable.
In this article, we’ll explore the differences between loc and iloc, take a look at their similarities, and check how to perform data selection with them. We will go over the following topics:
loc and iloc are interchangeable when labels are 0-based integers
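A minimal sketch of that point, using a made-up frame (the column names and values are assumptions for illustration). With the default 0-based integer index, a single label and a single position select the same row. Once the index is relabeled, loc needs the label while iloc still takes the position:

```python
import pandas as pd

# Made-up data with the default 0-based integer index
df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "temp": [4, 19, 31]})

# Label 1 and position 1 refer to the same row here
same_row = df.loc[1].equals(df.iloc[1])

# Slices still differ: loc includes the end label, iloc excludes the end position
rows_by_label_slice = len(df.loc[0:1])
rows_by_position_slice = len(df.iloc[0:1])

# After relabeling, loc takes labels and iloc takes integer positions
relabeled = df.set_index("city")
by_label = relabeled.loc["Lima", "temp"]
by_position = relabeled.iloc[1, 0]

print(same_row, rows_by_label_slice, rows_by_position_slice, by_label, by_position)
```

The slice lengths are a reminder that the two are only sometimes interchangeable, even on a 0-based integer index.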