How to remove rows in python

9 tricks to master Pandas drop() and speed up your data analysis

Photo by Bernard Hermant on Unsplash

Data manipulation refers to the process of adjusting data to make it organised and easier to read. Frequently, there is data that is unusable and can interfere with what matters. Unnecessary or inaccurate data should be cleaned and deleted.

Source: solvexia.com [1]

Deleting one or more rows or columns from a Pandas DataFrame can be done in several ways. The most common is the drop() method. It looks straightforward, but there are a few tricks worth knowing that can speed up your data analysis.

In this article, you’ll learn Pandas drop() tricks to deal with the following use cases:

  1. Delete a single row
  2. Delete multiple rows
  3. Delete rows based on row position and custom range
  4. Delete a single column
  5. Delete multiple columns
  6. Delete columns based on column position and custom range
  7. Working with MultiIndex DataFrame
  8. Perform the operation in place with inplace=True
  9. Suppress errors with errors='ignore'

Please check out the Notebook for source code. More tutorials are available from Github Repo.

1. Delete a single row

By default, Pandas drop() removes rows based on their index labels. Most often, the index is a 0-based integer per row. Passing an index label deletes the matching row. For example, to delete the row with the index label 1:

df.drop(1)
# It's equivalent to
df.drop(labels=1)

delete a single row using Pandas drop() (Image by author)

Note that the axis argument must be 0 to delete rows. Because axis defaults to 0 in Pandas drop(), it can be omitted. If axis=1 is specified, drop() deletes columns instead.

Alternatively, a more intuitive way to delete a row from a DataFrame is to use the index argument.

# A more intuitive way
df.drop(index=1)

delete a single row using Pandas drop() (Image by author)
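
The snippets above assume a DataFrame named df already exists. As a minimal sketch, here is a small made-up DataFrame (the name, math and physics columns and their values are assumptions for illustration, not the article's dataset) showing that drop() removes the row by its label and leaves the original untouched:

```python
import pandas as pd

# Hypothetical sample data; column names and values are made up.
df = pd.DataFrame({
    "name": ["Tom", "Amy", "Lily", "Ben"],
    "math": [66, 57, 75, 44],
    "physics": [78, 93, 69, 51],
})

dropped = df.drop(index=1)      # remove the row whose index label is 1

print(dropped.index.tolist())   # [0, 2, 3]
print(len(df))                  # 4, the original DataFrame is unchanged
```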

2. Delete multiple rows

Pandas drop() can take a list to delete multiple rows:

df.drop([1, 2])
# It's equivalent to
df.drop(labels=[1, 2])

delete multiple rows using Pandas drop() (Image by author)

Similarly, a more intuitive way to delete multiple rows is to pass a list to the index argument:

# A more intuitive way
df.drop(index=[1,2])

delete multiple rows using Pandas drop() (Image by author)
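
One point worth stressing is that the list holds index labels, not row positions. A short sketch with a string index makes this visible (the student names and scores are hypothetical):

```python
import pandas as pd

# Hypothetical sample data with string labels as the index.
df = pd.DataFrame(
    {"math": [66, 57, 75, 44], "physics": [78, 93, 69, 51]},
    index=["Tom", "Amy", "Lily", "Ben"],
)

# The list contains labels, so the rows labelled 'Amy' and 'Lily' are removed.
result = df.drop(index=["Amy", "Lily"])

print(result.index.tolist())    # ['Tom', 'Ben']
```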

3. Delete rows based on row position and custom range

The DataFrame index is not always a sequence of ascending integers. It can hold other values, such as datetime or string labels. In these cases, we can delete rows based on their position. For instance, to delete the 2nd row, we can look up its label with df.index[1] and pass it to the index argument:

df.drop(index=df.index[1])

delete rows based on row position (Image by author)

To delete the last row, we can use the negative position -1, which refers to the last entry of the index:

df.drop(index=df.index[-1])

delete rows based on row position (Image by author)

We can also use slicing to select a range of rows, for instance:

  • Delete the last 2 rows: df.drop(index=df.index[-2:])
  • Delete every other row: df.drop(index=df.index[::2])

delete rows based on row position (Image by author)
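
The positional variants above can be tried on a small example. This is only a sketch on made-up data with a string index (the labels 'a' to 'e' are assumptions):

```python
import pandas as pd

# Hypothetical data with a non-integer index.
df = pd.DataFrame(
    {"math": [66, 57, 75, 44, 80]},
    index=["a", "b", "c", "d", "e"],
)

second_gone = df.drop(index=df.index[1])       # 2nd row, label 'b'
last_two_gone = df.drop(index=df.index[-2:])   # labels 'd' and 'e'
every_other = df.drop(index=df.index[::2])     # positions 0, 2, 4

print(second_gone.index.tolist())    # ['a', 'c', 'd', 'e']
print(last_two_gone.index.tolist())  # ['a', 'b', 'c']
print(every_other.index.tolist())    # ['b', 'd']
```

Keep in mind that df.index[...] only translates positions into labels. drop() still works by label, so if the index contains duplicate labels, every row carrying a selected label is removed.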

4. Delete a single column

Similar to deleting rows, Pandas drop() can be used to delete columns by setting the axis argument to 1:

df.drop('math', axis=1)
# It's equivalent to
df.drop(labels='math', axis=1)

delete a single column using Pandas drop() (Image by author)

A more intuitive way to delete a column from a DataFrame is to use the columns argument.

# A more intuitive way
df.drop(columns='math')

delete a single column using Pandas drop() (Image by author)

5. Delete multiple columns

Similarly, we can pass a list to delete multiple columns:

df.drop(['math', 'physics'], axis=1)
# It's equivalent to
df.drop(labels=['math', 'physics'], axis=1)

delete multiple columns using Pandas drop() (Image by author)

A more intuitive way to delete multiple columns is to pass a list to the columns argument:

# A more intuitive way
df.drop(columns=['math', 'physics'])

delete multiple columns using Pandas drop() (Image by author)
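
To see that the axis=1 form and the columns form give the same result, here is a small sketch. The chemistry column and all values are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Tom", "Amy"],
    "math": [66, 57],
    "physics": [78, 93],
    "chemistry": [70, 88],
})

one_gone = df.drop(columns="math")
two_gone = df.drop(columns=["math", "physics"])
via_axis = df.drop(["math", "physics"], axis=1)

print(list(one_gone.columns))     # ['name', 'physics', 'chemistry']
print(list(two_gone.columns))     # ['name', 'chemistry']
print(two_gone.equals(via_axis))  # True
```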

6. Delete columns based on column position and custom range

We can also delete a column based on its position. For instance, to delete the 2nd column, we can look up its name with df.columns[1] and pass it to the columns argument:

df.drop(columns=df.columns[1])

delete columns based on column position (Image by author)

To delete the last column, we can use the negative position -1, which refers to the last entry of df.columns:

df.drop(columns=df.columns[-1])

delete columns based on column position (Image by author)

Similarly, we can use slicing to select a range of columns, for instance:

  • Delete the last 2 columns: df.drop(columns=df.columns[-2:])
  • Delete every other column: df.drop(columns=df.columns[::2])

delete columns based on column position (Image by author)
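
A custom range works the same way with a start and stop position. The sketch below, on made-up data, drops the 2nd and 3rd columns with df.columns[1:3] (the column names are assumptions):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["Tom", "Amy"],
    "math": [66, 57],
    "physics": [78, 93],
    "chemistry": [70, 88],
})

# Positions 1 and 2, i.e. 'math' and 'physics' (the stop position is exclusive).
middle_gone = df.drop(columns=df.columns[1:3])
last_gone = df.drop(columns=df.columns[-1])

print(list(middle_gone.columns))  # ['name', 'chemistry']
print(list(last_gone.columns))    # ['name', 'math', 'physics']
```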

7. Working with MultiIndex

A MultiIndex (also known as a hierarchical index) DataFrame allows us to have multiple columns acting as a row identifier and multiple rows acting as a header identifier:

(image by author)

When calling Pandas drop() on a MultiIndex DataFrame, a single label is matched against level 0 of the index or columns by default, and every row or column under that label is removed.

# Delete all Oxford rows
df.drop(index='Oxford')
# Delete all Day columns
df.drop(columns='Day')

Pandas drop() in MultiIndex (image by author)

To specify a level to be removed, we can set the level argument:

# Drop all 2019-07-04 rows at level 1
df.drop(index='2019-07-04', level=1)
# Drop all Weather columns at level 1
df.drop(columns='Weather', level=1)

Pandas drop() in MultiIndex (image by author)

In some cases, we would like to delete a specific index or column combination. To do that, we can pass a tuple to the index or columns argument:

# drop the index combination 'Oxford' and '2019-07-04'
df.drop(index=('Oxford', '2019-07-04'))
# drop the column combination 'Day' and 'Weather'
df.drop(columns=('Day', 'Weather'))

Pandas drop() in MultiIndex (image by author)
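
The MultiIndex DataFrame itself is only shown as an image, so here is a rough sketch that builds one with a similar shape and runs the three kinds of drop. The Cambridge city, the Night and Wind labels, the level names and the numeric values are all assumptions made for illustration:

```python
import pandas as pd

# Hypothetical (city, date) rows and (period, measurement) columns.
index = pd.MultiIndex.from_product(
    [["Cambridge", "Oxford"], ["2019-07-03", "2019-07-04"]],
    names=["city", "date"],
)
columns = pd.MultiIndex.from_product([["Day", "Night"], ["Weather", "Wind"]])
values = [[r * 4 + c for c in range(4)] for r in range(4)]  # placeholder numbers
df = pd.DataFrame(values, index=index, columns=columns)

no_oxford = df.drop(index="Oxford")                      # level 0 by default
no_july_4 = df.drop(index="2019-07-04", level=1)         # explicit level
no_pair = df.drop(index=("Oxford", "2019-07-04"))        # one exact combination
no_day_weather = df.drop(columns=("Day", "Weather"))     # one exact column

print(no_oxford.shape)       # (2, 4)
print(no_july_4.shape)       # (2, 4)
print(no_pair.shape)         # (3, 4)
print(no_day_weather.shape)  # (4, 3)
```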

8. Perform the operation in place with inplace=True

By default, Pandas drop() returns a new DataFrame and leaves the original unchanged. We can set inplace=True to modify the DataFrame directly, which avoids the extra reassignment. In that case drop() returns None. Treat this as a convenience: it is not guaranteed to reduce memory usage.
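
A quick sketch of the difference, on made-up data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"math": [66, 57, 75], "physics": [78, 93, 69]})

# Default behaviour: a new DataFrame comes back, df keeps both columns.
copy_result = df.drop(columns="physics")
print(list(df.columns))           # ['math', 'physics']

# In place: df itself is modified and the call returns None.
returned = df.drop(columns="physics", inplace=True)
print(returned)                   # None
print(list(df.columns))           # ['math']
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace df with None, which is a common slip.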

9. Suppress errors with errors='ignore'

You may notice that Pandas drop() raises a KeyError when any of the given rows or columns doesn't exist. We can set the argument errors='ignore' to suppress the error, so that only the labels that do exist are dropped.
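
Note that the keyword is spelled errors (plural). A minimal sketch on made-up data, where the history column deliberately does not exist:

```python
import pandas as pd

# Hypothetical sample data; there is no 'history' column.
df = pd.DataFrame({"math": [66, 57], "physics": [78, 93]})

try:
    df.drop(columns="history")
except KeyError as exc:
    print("KeyError raised:", exc)

# Existing labels are dropped, missing ones are silently skipped.
safe = df.drop(columns=["physics", "history"], errors="ignore")
print(list(safe.columns))         # ['math']
```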

Conclusion

In this article, we have covered 9 use cases for deleting rows and columns with Pandas drop(). The method itself is straightforward to use, and it's one of the most popular methods for manipulating data during data preprocessing.

Thanks for reading. Please check out the Notebook for the source code and stay tuned if you are interested in the practical aspect of machine learning. More tutorials are available from the Github Repo.

References

[1] 5 Tips for Data manipulation

How do I remove rows from a DataFrame in Python?

To drop a row or column from a DataFrame, use the DataFrame's drop() method. You can read more about drop() in the Pandas documentation. By default, rows are labelled with integer index values starting at 0, and columns are labelled by name.

How do you remove unwanted rows in Python?

To delete a row from a DataFrame, use the drop() method and set the index label as the parameter.

How do you delete columns and rows in Python?

The drop function allows the removal of rows and columns from your DataFrame, and once you've used it a few times, you'll have no issues. The Pandas “drop” function is used to delete columns or rows from a Pandas DataFrame.

How do I delete multiple rows in a DataFrame in Python?

To delete rows and columns from DataFrames, Pandas uses the drop() function. To delete a column, or multiple columns, pass the name of the column(s) and set axis to 1. Alternatively, use the columns parameter, which removes the need for axis.