When doing data analysis, it is important to ensure that your data has the correct types. Otherwise, you may get unexpected results or errors. Pandas correctly infers data types in many cases, so you can often move on with your analysis without any further thought on the topic.

Despite how well Pandas works, at some point in your data analysis process you will likely need to explicitly convert data from one type to another. This article will discuss how to change data to a numeric type. …
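As a minimal sketch of converting data to a numeric type, the snippet below uses `pd.to_numeric()` and `astype()`. The sample values and variable names are made up for illustration and are not taken from the article.

```python
import pandas as pd

# Hypothetical column: numbers stored as strings, plus one unparseable entry.
prices = pd.Series(["10", "20.5", "n/a"])

# errors="coerce" turns values that cannot be parsed into NaN
# instead of raising an exception.
numeric_prices = pd.to_numeric(prices, errors="coerce")

# astype() is stricter: it raises if any value cannot be converted.
counts = pd.Series(["1", "2", "3"]).astype(int)
```

`to_numeric()` tends to be the safer choice for messy input, while `astype()` is convenient when every value is known to be clean.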

In data preprocessing and analysis, you will often need to figure out whether you have duplicate data and how to deal with it.

In this article, you’ll learn the two methods, `duplicated()` and `drop_duplicates()`, for finding and removing duplicate rows, as well as how to modify their behavior to suit your specific needs. This article is structured as follows:

- Finding duplicate rows
- Counting duplicate and non-duplicate rows
- Extracting duplicate rows with `loc`
- Determining which duplicates to mark with `keep`
- Dropping duplicate rows

For demonstration, we will use a subset from the Titanic dataset available on Kaggle.

import pandas as pd…
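
The topics listed above can be sketched roughly as follows. This uses a small made-up DataFrame rather than the Titanic subset, so the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data (not the Titanic subset): the third row repeats the first.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Ann", "Cid"],
    "age": [30, 25, 30, 41],
})

# Finding duplicates: True marks a row that repeats an earlier one.
mask = df.duplicated()

# Counting duplicate and non-duplicate rows.
n_duplicates = mask.sum()
n_unique = (~mask).sum()

# Extracting duplicate rows with loc.
duplicate_rows = df.loc[mask]

# keep=False marks every member of a duplicated group, not just the repeats.
all_copies = df.loc[df.duplicated(keep=False)]

# Dropping duplicates, keeping the last occurrence instead of the first.
deduplicated = df.drop_duplicates(keep="last")
```

Both methods also accept a `subset` argument when only some columns should be compared.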

When it comes to selecting data from a DataFrame, Pandas `loc` and `iloc` are two top favorites. They are fast, easy to read, and sometimes interchangeable.

In this article, we’ll explore the differences between `loc` and `iloc`, take a look at their similarities, and check how to perform data selection with them. We will go over the following topics:

- Differences between `loc` and `iloc`
- Selecting via a single value
- Selecting via a list of values
- Selecting a range of data via slice
- Selecting via conditions and callable
- `loc` and `iloc` are interchangeable when labels are 0-based integers
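
A short sketch of the selection styles listed above is shown here. The DataFrame, its labels, and the variable names are hypothetical and chosen only so that label-based and position-based access are easy to tell apart.

```python
import pandas as pd

# Hypothetical data with string labels so that loc and iloc differ visibly.
df = pd.DataFrame(
    {"temp": [12, 15, 9, 20], "wind": [3, 7, 5, 2]},
    index=["mon", "tue", "wed", "thu"],
)

# Single value: loc uses labels, iloc uses integer positions.
by_label = df.loc["tue", "temp"]
by_position = df.iloc[1, 0]

# List of values.
some_rows = df.loc[["mon", "wed"], ["temp"]]

# Slices: loc includes the end label, iloc excludes the end position.
label_slice = df.loc["mon":"wed"]
position_slice = df.iloc[0:2]

# Conditions and callables.
warm = df.loc[df["temp"] > 10]
windy = df.loc[lambda d: d["wind"] > 4]

# With the default 0-based integer index, both accessors select the same row.
df0 = df.reset_index(drop=True)
same = df0.loc[1].equals(df0.iloc[1])
```

The slice behavior is the difference most likely to cause surprises: a `loc` slice is inclusive of its end label, while an `iloc` slice follows ordinary Python slicing.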

In exploratory data analysis, we often would like to analyze data by some categories. In SQL, the `GROUP BY` statement groups rows that have the same category values into summary rows. In Pandas, SQL’s `GROUP BY` operation is performed using the similarly named `groupby()` method. Pandas’ `groupby()` allows us to split data into separate groups to perform computations for better analysis.

In this article, you’ll learn the “group by” process (split-apply-combine) and how to use Pandas’ `groupby()` method to group data and perform operations. This article is structured as follows:

- What is Pandas `groupby()` and how to access groups information?
- …
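
The split-apply-combine idea can be sketched as below. The data and names are invented for illustration; only `groupby()` itself comes from the text.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [10, 20, 30, 40, 80],
})

# Split: rows are grouped by the values in "team".
grouped = df.groupby("team")

# Accessing group information.
n_groups = grouped.ngroups           # number of groups
team_a = grouped.get_group("a")      # rows belonging to one group

# Apply and combine: one aggregated value per group.
mean_scores = grouped["score"].mean()
```

The result of the aggregation is a Series indexed by the group keys, which plays the role of the summary rows that SQL’s `GROUP BY` produces.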

In data analysis, we may work on a dataset that has no column names, or whose column names contain unwanted characters (e.g. spaces), or maybe we just want to give the columns better names. All of these require us to rename columns in a Pandas DataFrame.

In this article, you’ll learn 5 different approaches to do that. This article is structured as follows:

- Passing a list of names to `columns` attribute
- Using `rename()` function
- Renaming columns while reading a CSV file
- Using `columns.str.replace()` method
- Renaming columns via `set_axis()`

For demonstration, we will use a subset from the Titanic dataset available…
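
The five approaches listed above might look roughly like this. The sketch uses a tiny made-up DataFrame and an in-memory CSV string instead of the Titanic subset, so all column names here are assumptions.

```python
import io
import pandas as pd

# Hypothetical data, not the Titanic subset.
df = pd.DataFrame({"first name": ["Ann"], "AGE": [30]})

# 1. Assign a list of names to the columns attribute (modifies df1 in place).
df1 = df.copy()
df1.columns = ["name", "age"]

# 2. rename() with a mapping from old names to new names.
df2 = df.rename(columns={"first name": "name", "AGE": "age"})

# 3. Supply names while reading a CSV; header=0 discards the file's own header.
csv_text = "first name,AGE\nAnn,30\n"
df3 = pd.read_csv(io.StringIO(csv_text), names=["name", "age"], header=0)

# 4. String methods on the column index, here replacing spaces.
df4 = df.copy()
df4.columns = df4.columns.str.replace(" ", "_")

# 5. set_axis() returns a new DataFrame with the given column labels.
df5 = df.set_axis(["name", "age"], axis=1)
```

`rename()` is usually the most convenient when only a few columns change, since columns missing from the mapping are left untouched.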

Reading data is the first step in any data science project. As a machine learning practitioner or a data scientist, you have surely come across JSON (JavaScript Object Notation) data. JSON is a widely used format for storing and exchanging data. For example, NoSQL databases like MongoDB store data in JSON format, and REST API responses are mostly available in JSON.

Although this format works well for storing and exchanging data, it needs to be converted into a tabular form for further analysis. You are likely to deal with 2 types of JSON structure, a JSON object or…
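
One way to turn JSON-like data into a table is `pd.json_normalize()`. The sketch below is only an illustration under assumed input: the records and their field names are hypothetical.

```python
import pandas as pd

# Hypothetical records shaped like a typical REST API response.
records = [
    {"id": 1, "name": "Ann", "address": {"city": "Oxford", "zip": "OX1"}},
    {"id": 2, "name": "Bob", "address": {"city": "Leeds", "zip": "LS1"}},
]

# json_normalize() flattens nested objects into dotted column names,
# for example "address.city".
df = pd.json_normalize(records)
```

Each record becomes one row, and nested keys become separate columns.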

Numerical data is common in data analysis. Often you have numerical data that is continuous, spans a very large scale, or is highly skewed. Sometimes, it can be easier to bin values into discrete intervals. Dividing values into meaningful categories makes descriptive statistics easier to compute and interpret. For example, we can divide ages into Toddler, Child, Adult, and Elder.

Pandas’ built-in `cut()` function is a great way to transform numerical data into categorical data. In this article, you’ll learn how to use it to deal with the following common tasks.

- Discretizing into equal-sized bins
- Adding custom bins
- Adding…
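
The first two tasks can be sketched as follows, using the age categories mentioned earlier. The sample ages and the bin edges are assumptions made for illustration; the article may use different boundaries.

```python
import pandas as pd

ages = pd.Series([2, 10, 30, 70])

# Equal-width bins: an integer asks cut() to split the range
# into that many intervals of the same width.
equal_bins = pd.cut(ages, bins=4)

# Custom bins: the edges below are illustrative assumptions.
# Intervals are right-inclusive by default: (0, 4], (4, 17], (17, 64], (64, 120].
age_groups = pd.cut(
    ages,
    bins=[0, 4, 17, 64, 120],
    labels=["Toddler", "Child", "Adult", "Elder"],
)
```

The result has a categorical dtype, so it works directly with `value_counts()` and `groupby()`.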

**DataFrame** and **Series** are two core data structures in Pandas. **DataFrame** is a 2-dimensional labeled data structure with rows and columns. It is like a spreadsheet or SQL table. **Series** is a 1-dimensional labeled array. It is sort of like a more powerful version of the Python **list**. Understanding Series is very important, not only because it is one of the core data structures, but also because it is the building block of a DataFrame.

In this article, you’ll learn the most commonly used data operations with Pandas **Series**, which should help you get started with Pandas. …
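
A few basic Series operations are sketched below to make the description concrete. The values and labels are made up, and the selection of operations is an assumption about what "commonly used" covers.

```python
import pandas as pd

# A Series pairs values with an index of labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

by_label = s["b"]            # label-based access
doubled = s * 2              # vectorized arithmetic keeps the labels
large = s[s > 15]            # boolean filtering

# Each DataFrame column is itself a Series.
df = pd.DataFrame({"x": s, "y": doubled})
column_is_series = isinstance(df["x"], pd.Series)
```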

The activation functions are at the very core of Deep Learning. They determine the output of a model, its accuracy, and computational efficiency. In some cases, activation functions have a major effect on the model’s ability to converge and the convergence speed.

In this article, you’ll learn why ReLU is used in Deep Learning and the best practices for using it with Keras and TensorFlow 2.

- Problems with Sigmoid and Tanh activation functions
- What is Rectified Linear Unit (ReLU)
- Training a deep neural network using ReLU
- Best practice to use ReLU with He initialization
- Comparing to models with Sigmoid and…
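
To make ReLU and He initialization concrete, here is a minimal NumPy sketch rather than Keras code. The function names `relu` and `he_normal` and the layer sizes are hypothetical. In Keras, the corresponding settings on a `Dense` layer are `activation="relu"` and `kernel_initializer="he_normal"`.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0, x)

def he_normal(fan_in, fan_out, rng):
    """He initialization sketch: zero-mean normal with std sqrt(2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Illustrative layer sizes; any positive integers would do.
rng = np.random.default_rng(0)
weights = he_normal(fan_in=256, fan_out=128, rng=rng)

x = np.array([-2.0, 0.0, 3.0])
activated = relu(x)   # negative inputs become 0, positive inputs pass through
```

Scaling the weight variance by `2 / fan_in` is intended to keep the signal magnitude roughly stable across layers when ReLU zeroes out about half of the activations.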

In artificial neural networks (ANNs), the **activation function** is a mathematical “gate” in between the input feeding the current neuron and its output going to the next layer [1].

The activation functions are at the very core of Deep Learning. They determine the output of a model, its accuracy, and computational efficiency. In some cases, activation functions have a major effect on the model’s ability to converge and the convergence speed.

In this article, you’ll learn about the following popular activation functions in Deep Learning and how to use them with Keras and TensorFlow 2.

- Sigmoid (Logistic)
- Hyperbolic Tangent (Tanh)
- …
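
The first two functions in the list can be written out directly. This is a NumPy sketch for illustration, not the Keras implementation, and the function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes inputs into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes inputs into the range (-1, 1)."""
    return np.tanh(x)

x = np.array([-5.0, 0.0, 5.0])
sig = sigmoid(x)
th = tanh(x)
```

The two are closely related: tanh(x) equals 2 * sigmoid(2x) - 1, so tanh is a rescaled sigmoid centered on zero.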

Machine Learning practitioner | Formerly health informatics at University of Oxford | Ph.D.