When doing data analysis, it is important to ensure correct data types; otherwise, you may get unexpected results or errors. Pandas correctly infers data types in many cases, and you can move on with your analysis without further thought on the topic.
However well Pandas infers types, at some point in your data analysis process you will likely need to explicitly convert data from one type to another. This article discusses how to convert data to a numeric type. …
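The excerpt stops before any code. As a minimal sketch of this kind of conversion, assuming a small hypothetical DataFrame whose numbers are stored as strings, astype() and pd.to_numeric() can be used like this:

```python
import pandas as pd

# Hypothetical sample data: numbers stored as strings, with one unparseable entry
df = pd.DataFrame({"price": ["10", "20.5", "n/a"], "qty": ["1", "2", "3"]})

# astype() works when every value in the column can be parsed
df["qty"] = df["qty"].astype("int64")

# to_numeric() with errors="coerce" turns unparseable values into NaN instead of raising
df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df.dtypes)
```

Using errors="coerce" is a judgment call: it silently replaces bad values with NaN, so checking df["price"].isna().sum() afterwards is a sensible habit.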
In data preprocessing and analysis, you will often need to find out whether your data contains duplicates and how to deal with them.
In this article, you’ll learn two methods, duplicated() and drop_duplicates(), for finding and removing duplicate rows, as well as how to modify their behavior to suit your specific needs. This article is structured as follows:
loc
keep
For demonstration, we will use a subset from the Titanic dataset available on Kaggle.
import pandas as pd…
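The excerpt cuts off at the import. A minimal sketch of the two methods, using a small hypothetical DataFrame in place of the Titanic subset (the names and values are made up), might look like this:

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic subset
df = pd.DataFrame({
    "Name": ["Allen", "Braund", "Allen", "Cumings"],
    "Sex": ["female", "male", "female", "female"],
    "Age": [29.0, 22.0, 29.0, 38.0],
})

# duplicated() flags rows that repeat an earlier row (keep="first" is the default)
mask = df.duplicated()                     # [False, False, True, False]

# keep=False flags every member of a duplicate group
all_dups = df.duplicated(keep=False)       # [True, False, True, False]

# subset restricts the comparison to selected columns
sex_dups = df.duplicated(subset=["Sex"])   # [False, False, True, True]

# loc with the boolean mask shows the duplicate rows themselves
print(df.loc[mask])

# drop_duplicates() removes them; keep="last" retains the last occurrence instead
deduped = df.drop_duplicates(keep="last")
```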
When it comes to selecting data from a DataFrame, Pandas loc and iloc are two top favorites. They are fast, easy to read, and sometimes interchangeable.
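To make the label-versus-position distinction concrete, here is a small sketch on a hypothetical DataFrame with string index labels:

```python
import pandas as pd

# Hypothetical DataFrame with string labels on the index
df = pd.DataFrame(
    {"Age": [22, 38, 26], "Fare": [7.25, 71.28, 7.92]},
    index=["a", "b", "c"],
)

# loc selects by label; label slices include the end point
by_label = df.loc["a":"b", "Age"]       # rows "a" and "b"

# iloc selects by integer position; position slices exclude the end point
by_position = df.iloc[0:1, 0]           # row 0 only

# With a default 0-based RangeIndex, labels equal positions,
# so these two single-row lookups return the same row
df2 = df.reset_index(drop=True)
same = df2.loc[1].equals(df2.iloc[1])
```

Even with a 0-based integer index, slices still differ: df2.loc[0:1] includes label 1 (two rows), while df2.iloc[0:1] stops before position 1 (one row).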
In this article, we’ll explore the differences between loc and iloc, take a look at their similarities, and see how to perform data selection with them. We will go over the following topics:
loc
and iloc
loc and iloc are interchangeable when labels are 0-based integers
In exploratory data analysis, we often want to analyze data by category. In SQL, the GROUP BY statement groups rows that have the same values into summary rows. In Pandas, this operation is performed with the similarly named groupby() method, which splits data into separate groups so that computations can be performed on each one.
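A minimal sketch of split-apply-combine with groupby(), assuming hypothetical passenger columns Pclass and Fare:

```python
import pandas as pd

# Hypothetical passenger data
df = pd.DataFrame({
    "Pclass": [1, 3, 1, 2, 3],
    "Fare": [71.3, 7.3, 53.1, 13.0, 8.1],
})

# Split rows by Pclass, apply mean() to Fare in each group, combine into one Series
mean_fare = df.groupby("Pclass")["Fare"].mean()

# Several aggregations at once, one column per statistic
summary = df.groupby("Pclass")["Fare"].agg(["count", "min", "max"])

# Group information: number of groups and the rows belonging to one group
grouped = df.groupby("Pclass")
n_groups = grouped.ngroups
first_class = grouped.get_group(1)
```

This mirrors SELECT Pclass, AVG(Fare) ... GROUP BY Pclass in SQL; the result is indexed by the group keys.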
In this article, you’ll learn the “group by” process (split-apply-combine) and how to use Pandas’ groupby() function to group data and perform operations. This article is structured as follows:
groupby() and how to access group information?
In data analysis, you may work with a dataset that has no column names, or whose column names contain unwanted characters (e.g. spaces), or you may simply want to give columns better names. All of these require renaming columns in a Pandas DataFrame.
In this article, you’ll learn 5 different approaches to do that. This article is structured as follows:
columns attribute
rename() function
columns.str.replace() method
set_axis()
For demonstration, we will use a subset from the Titanic dataset available…
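The excerpt ends before the examples. Assuming a hypothetical DataFrame whose column names contain spaces, the four approaches named in the outline can be sketched as follows:

```python
import pandas as pd

# Hypothetical columns with unwanted spaces
df = pd.DataFrame({"Passenger Id": [1, 2], "Ticket Fare": [7.25, 71.28]})

# columns attribute: assign a full list of new names (length must match)
a = df.copy()
a.columns = ["passenger_id", "ticket_fare"]

# rename(): map old names to new ones; unmentioned columns are left unchanged
b = df.rename(columns={"Passenger Id": "passenger_id"})

# columns.str.replace(): apply a string operation to every column name
c = df.copy()
c.columns = c.columns.str.replace(" ", "_").str.lower()

# set_axis(): return a copy with new labels along the column axis
d = df.set_axis(["passenger_id", "ticket_fare"], axis=1)
```

rename() is the safest choice when only a few columns change, since it does not depend on column order.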
Reading data is the first step in any data science project. As a machine learning practitioner or data scientist, you have surely come across JSON (JavaScript Object Notation) data. JSON is a widely used format for storing and exchanging data. For example, NoSQL databases like MongoDB store data as JSON-like documents, and REST API responses are mostly returned in JSON.
Although this format works well for storing and exchanging data, it needs to be converted into a tabular form for further analysis. You are likely to deal with 2 types of JSON structure, a JSON object or…
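The excerpt is truncated before the two structures are described. As a sketch only, assuming a hypothetical list of JSON objects with one nested field, pd.json_normalize() can flatten it into a table:

```python
import json

import pandas as pd

# Hypothetical JSON text, e.g. the body of a REST API response
text = """[
  {"id": 1, "name": "Alice", "address": {"city": "Oxford", "zip": "OX1"}},
  {"id": 2, "name": "Bob", "address": {"city": "London", "zip": "SW1"}}
]"""
records = json.loads(text)

# The DataFrame constructor accepts a list of objects,
# but nested objects stay as dicts inside a single column
raw = pd.DataFrame(records)

# json_normalize() flattens nested objects into dotted column names
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```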
Numerical data is common in data analysis. Often the data is continuous, spans very large scales, or is highly skewed. In such cases, it can be easier to bin values into discrete intervals. This helps with descriptive statistics, because values are divided into meaningful categories. For example, we can divide ages into Toddler, Child, Adult, and Elder.
Pandas’ built-in cut() function is a great way to transform numerical data into categorical data. In this article, you’ll learn how to use it for the following common tasks.
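A minimal sketch of the age example with cut(); the bin edges below are hypothetical choices for illustration, not standard age boundaries:

```python
import pandas as pd

ages = pd.Series([2, 9, 25, 47, 70])

# Hypothetical age boundaries; bins are right-closed by default: (0, 3], (3, 12], ...
bins = [0, 3, 12, 60, 120]
labels = ["Toddler", "Child", "Adult", "Elder"]

groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.value_counts(sort=False))

# Passing an integer instead splits the value range into that many equal-width bins
equal_width = pd.cut(ages, bins=3)
```

The result is a categorical Series, so the labels keep their order for later sorting and grouping.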
DataFrame and Series are two core data structures in Pandas. A DataFrame is a 2-dimensional labeled data structure with rows and columns, much like a spreadsheet or SQL table. A Series is a 1-dimensional labeled array, sort of like a more powerful version of the Python list. Understanding Series is very important, not only because it is one of the core data structures, but also because it is the building block of a DataFrame.
In this article, you’ll learn the most commonly used data operations with Pandas Series, which should help you get started with Pandas. …
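The excerpt does not list the operations. A short sketch of a few common ones, using a hypothetical Series of scores:

```python
import pandas as pd

# A Series is a 1-D labeled array: values plus an index
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

by_label = s["b"]          # label access
by_position = s.iloc[0]    # positional access
doubled = s * 2            # vectorized arithmetic, no explicit loop
filtered = s[s > 15]       # boolean filtering keeps labels "b" and "c"

# Arithmetic between Series aligns on index labels; unmatched labels give NaN
other = pd.Series([1, 2], index=["b", "d"])
aligned = s + other        # index a, b, c, d; only "b" has a value

# A DataFrame column is itself a Series
df = pd.DataFrame({"score": s})
col = df["score"]
```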
Activation functions are at the very core of Deep Learning. They determine the output of a model, its accuracy, and its computational efficiency. In some cases, activation functions have a major effect on the model’s ability to converge and on the convergence speed.
In this article, you’ll learn why ReLU is used in Deep Learning and the best practices for using it with Keras and TensorFlow 2.
In artificial neural networks (ANNs), the activation function is a mathematical “gate” in between the input feeding the current neuron and its output going to the next layer [1].
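In Keras, ReLU is typically selected with activation="relu" on a layer (e.g. tf.keras.layers.Dense(64, activation="relu")) or via tf.keras.activations.relu. The sketch below instead uses plain NumPy, an assumption made only so that it runs without TensorFlow, to show what the “gate” does to inputs and to gradients:

```python
import numpy as np


def relu(x):
    # Passes positive inputs through unchanged and gates negative inputs to 0
    return np.maximum(0.0, x)


def relu_grad(x):
    # Derivative is 1 for x > 0 and 0 for x < 0;
    # the value at exactly 0 is taken as 0 here by convention
    return (x > 0).astype(float)


x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y = relu(x)
g = relu_grad(x)
```

The constant gradient of 1 for positive inputs is one reason ReLU tends to ease the vanishing-gradient problem compared with saturating functions such as sigmoid.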
Activation functions are at the very core of Deep Learning. They determine the output of a model, its accuracy, and its computational efficiency. In some cases, activation functions have a major effect on the model’s ability to converge and on the convergence speed.
In this article, you’ll learn about the most popular activation functions in Deep Learning and how to use them with Keras and TensorFlow 2.
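The list of functions did not survive in this excerpt. As an assumption, the sketch below covers three widely used ones; Keras accepts the same names as strings (activation="sigmoid", "tanh", "softmax"), while NumPy is used here only to keep the example self-contained:

```python
import numpy as np


def sigmoid(x):
    # Squashes inputs into (0, 1); common for binary-classification outputs
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):
    # Squashes inputs into (-1, 1) and is zero-centered
    return np.tanh(x)


def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1;
    # subtracting the max first avoids overflow in exp() without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()


x = np.array([-1.0, 0.0, 1.0])
s_out = sigmoid(x)
t_out = tanh(x)
p = softmax(x)
```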
Machine Learning practitioner | Formerly health informatics at University of Oxford | Ph.D.