How do I remove rows from a DataFrame in Python?

9 tricks to master Pandas drop() and speed up your data analysis


Data manipulation refers to the process of adjusting data to make it organised and easier to read. Frequently, there is data that is unusable and can interfere with what matters. Unnecessary or inaccurate data should be cleaned and deleted.

Source: solvexia.com [1]

Deleting one or more rows or columns from a Pandas DataFrame can be achieved in multiple ways. The most common one is the drop() method. The method seems fairly straightforward to use, but there are still some tricks you should know to speed up your data analysis.

In this article, you’ll learn Pandas drop() tricks to deal with the following use cases:

  1. Delete a single row
  2. Delete multiple rows
  3. Delete rows based on row position and custom range
  4. Delete a single column
  5. Delete multiple columns
  6. Delete columns based on column position and custom range
  7. Working with MultiIndex DataFrame
  8. Do operation in place with inplace=True
  9. Suppress error with errors='ignore'

Please check out the Notebook for the source code. More tutorials are available from the GitHub repo.

1. Delete a single row

By default, Pandas drop() removes rows based on their index labels. Most often, the index is a 0-based integer per row. Passing an index label deletes the matching row; for example, to delete the row with the index label 1:

df.drop(1)
# It's equivalent to
df.drop(labels=1)
delete a single row using Pandas drop() (Image by author)

Note that the axis argument must be 0 to delete rows. Because axis defaults to 0 in Pandas drop(), it can be omitted. If axis=1 is specified, drop() deletes columns instead.

Alternatively, a more intuitive way to delete a row from a DataFrame is to use the index argument.

# A more intuitive way
df.drop(index=1)
delete a single row using Pandas drop() (Image by author)
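
As a minimal sketch of the calls above, the following builds a small hypothetical DataFrame (the names and scores are invented for illustration, not taken from the article's notebook) and checks that the three spellings give the same result:

```python
import pandas as pd

# Hypothetical sample data; names and values are for illustration only.
df = pd.DataFrame({
    "name": ["Tom", "Nick", "Julia", "Anna"],
    "math": [66, 57, 75, 44],
    "physics": [80, 70, 65, 90],
})

# Drop the row whose index label is 1. drop() returns a new DataFrame.
result = df.drop(index=1)

# The positional and labels= spellings are equivalent for rows.
same_positional = result.equals(df.drop(1))
same_labels = result.equals(df.drop(labels=1))

print(result.index.tolist())  # [0, 2, 3]
print(len(df))                # 4, the original is unchanged
```

Because drop() returns a new object, the result has to be assigned to a name (as with result here) for the deletion to be kept.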

2. Delete multiple rows

Pandas drop() can take a list to delete multiple rows:

df.drop([1,2])
# It's equivalent to
df.drop(labels=[1,2])
delete multiple rows using Pandas drop() (Image by author)

Similarly, a more intuitive way to delete multiple rows is to pass a list to the index argument:

# A more intuitive way
df.drop(index=[1,2])
delete multiple rows using Pandas drop() (Image by author)
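
A short sketch of the list form, again on hypothetical data, might look like this. It also shows that the surviving rows keep their original index labels:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "name": ["Tom", "Nick", "Julia", "Anna"],
    "math": [66, 57, 75, 44],
})

# Drop the rows labelled 1 and 2 in one call.
trimmed = df.drop(index=[1, 2])

print(trimmed["name"].tolist())  # ['Tom', 'Anna']
print(trimmed.index.tolist())    # [0, 3], labels are not renumbered
```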

3. Delete rows based on row position and custom range

The DataFrame index is not always a sequence of ascending integers. It can hold other values, for example datetime or string labels. In these cases, we can delete rows based on their position. For instance, to delete the 2nd row, we can look up its label with df.index[1] and pass it to the index argument:

df.drop(index=df.index[1])
delete rows based on row position (Image by author)

To delete the last row, we can use the negative position -1, which refers to the last entry of the index:

df.drop(index=df.index[-1])
delete rows based on row position (Image by author)

We can also use slicing to select a range of rows, for instance:

  • Delete the last 2 rows: df.drop(index=df.index[-2:])
  • Delete every other row, starting with the first: df.drop(index=df.index[::2])
delete rows based on row position (Image by author)
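
The position-based calls above can be sketched on a DataFrame with string labels (hypothetical data). One caveat worth keeping in mind: drop() still works by label, so this approach assumes the index labels are unique; with duplicated labels, every row carrying the looked-up label would be removed.

```python
import pandas as pd

# Hypothetical data with a non-integer, unique index.
df = pd.DataFrame(
    {"math": [66, 57, 75, 44, 91]},
    index=["a", "b", "c", "d", "e"],
)

second_removed = df.drop(index=df.index[1])          # removes label 'b'
last_removed = df.drop(index=df.index[-1])           # removes label 'e'
last_two_removed = df.drop(index=df.index[-2:])      # removes 'd' and 'e'
every_other_removed = df.drop(index=df.index[::2])   # removes 'a', 'c', 'e'

print(second_removed.index.tolist())       # ['a', 'c', 'd', 'e']
print(every_other_removed.index.tolist())  # ['b', 'd']
```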


4. Delete a single column

Similar to deleting rows, Pandas drop() can delete columns when the axis argument is set to 1:

df.drop('math', axis=1)
# It's equivalent to
df.drop(labels='math', axis=1)
delete a single column using Pandas drop() (Image by author)

A more intuitive way to delete a column from a DataFrame is to use the columns argument.

# A more intuitive way
df.drop(columns='math')
delete a single column using Pandas drop() (Image by author)

5. Delete multiple columns

Similarly, we can pass a list to delete multiple columns:

df.drop(['math', 'physics'], axis=1)
# It's equivalent to
df.drop(labels=['math', 'physics'], axis=1)
delete multiple columns using Pandas drop() (Image by author)

A more intuitive way to delete multiple columns is to pass a list to the columns argument:

# A more intuitive way
df.drop(columns=['math', 'physics'])
delete multiple columns using Pandas drop() (Image by author)
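
The column variants from sections 4 and 5 can be sketched together. The DataFrame below is hypothetical; only the column names 'math' and 'physics' come from the article's examples:

```python
import pandas as pd

# Hypothetical sample data; 'chemistry' is an extra column for illustration.
df = pd.DataFrame({
    "name": ["Tom", "Nick", "Julia"],
    "math": [66, 57, 75],
    "physics": [80, 70, 65],
    "chemistry": [71, 62, 88],
})

# Single column: the axis=1 form and the columns= form are equivalent.
no_math = df.drop(columns="math")
same_single = no_math.equals(df.drop("math", axis=1))

# Multiple columns: pass a list.
no_math_physics = df.drop(columns=["math", "physics"])
same_multi = no_math_physics.equals(df.drop(["math", "physics"], axis=1))

print(no_math.columns.tolist())          # ['name', 'physics', 'chemistry']
print(no_math_physics.columns.tolist())  # ['name', 'chemistry']
```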

6. Delete columns based on column position and custom range

We can delete a column based on its position. For instance, to delete the 2nd column, we can look up its name with df.columns[1] and pass it to the columns argument:

df.drop(columns=df.columns[1])
delete columns based on column position (Image by author)

To delete the last column, we can use the negative position -1, which refers to the last entry of df.columns:

df.drop(columns=df.columns[-1])
delete columns based on column position (Image by author)

Similarly, we can also use slicing to select a range of columns, for instance:

  • Delete the last 2 columns: df.drop(columns=df.columns[-2:])
  • Delete every other column, starting with the first: df.drop(columns=df.columns[::2])
delete columns based on column position (Image by author)
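
A minimal sketch of the position-based column calls, using a hypothetical four-column DataFrame so that the effect of each slice is easy to see:

```python
import pandas as pd

# Hypothetical sample data; column order matters for this example.
df = pd.DataFrame({
    "name": ["Tom", "Nick"],
    "math": [66, 57],
    "physics": [80, 70],
    "chemistry": [71, 62],
})

second_removed = df.drop(columns=df.columns[1])          # removes 'math'
last_removed = df.drop(columns=df.columns[-1])           # removes 'chemistry'
last_two_removed = df.drop(columns=df.columns[-2:])      # removes 'physics', 'chemistry'
every_other_removed = df.drop(columns=df.columns[::2])   # removes 'name', 'physics'

print(second_removed.columns.tolist())       # ['name', 'physics', 'chemistry']
print(every_other_removed.columns.tolist())  # ['math', 'chemistry']
```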

7. Working with MultiIndex

A MultiIndex (also known as a hierarchical index) DataFrame allows us to have multiple columns acting as a row identifier and multiple rows acting as a header identifier:

(image by author)

When calling Pandas drop() on a MultiIndex DataFrame with a single label, the label is matched against level 0 of the index or columns by default, and every row or column under that label is removed.

# Delete all Oxford rows
df.drop(index='Oxford')
# Delete all Day columns
df.drop(columns='Day')
Pandas drop() in MultiIndex (image by author)

To specify a level to be removed, we can set the level argument:

# Remove all 2019-07-04 rows at level 1
df.drop(index='2019-07-04', level=1)
# Drop all Weather columns at level 1
df.drop(columns='Weather', level=1)
Pandas drop() in MultiIndex (image by author)

In some cases, we would like to delete a specific index or column combination. To do that, we can pass a tuple to the index or columns argument:

# drop the index combination 'Oxford' and '2019-07-04'
df.drop(index=('Oxford', '2019-07-04'))
# drop the column combination 'Day' and 'Weather'
df.drop(columns=('Day', 'Weather'))
Pandas drop() in MultiIndex (image by author)
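
To try the MultiIndex calls without the article's dataset, here is a sketch that builds a small stand-in DataFrame. The labels 'Oxford', '2019-07-04', 'Day' and 'Weather' come from the examples above; 'Cambridge', '2019-07-05', 'Night', 'Wind', the level names and the integer values are assumptions made for illustration.

```python
import pandas as pd

# Stand-in MultiIndex for rows (city, date) and columns (period, measure).
index = pd.MultiIndex.from_product(
    [["Cambridge", "Oxford"], ["2019-07-04", "2019-07-05"]],
    names=["city", "date"],
)
columns = pd.MultiIndex.from_product([["Day", "Night"], ["Weather", "Wind"]])
data = [[r * 4 + c for c in range(4)] for r in range(4)]  # placeholder values
df = pd.DataFrame(data, index=index, columns=columns)

# A single label is matched against level 0 by default.
no_oxford = df.drop(index="Oxford")
no_day = df.drop(columns="Day")

# level=1 matches the label against the second level instead.
no_july4 = df.drop(index="2019-07-04", level=1)
no_weather = df.drop(columns="Weather", level=1)

# A tuple names one specific row or column combination.
one_row_removed = df.drop(index=("Oxford", "2019-07-04"))
one_col_removed = df.drop(columns=("Day", "Weather"))

print(no_oxford.shape)        # (2, 4)
print(one_row_removed.shape)  # (3, 4)
print(one_col_removed.shape)  # (4, 3)
```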


8. Do operation in place with inplace=True

By default, Pandas drop() returns a new DataFrame and leaves the original unchanged. Setting inplace=True modifies the DataFrame in place, which avoids the extra reassignment; in that case drop() returns None. Note that this does not necessarily reduce memory usage, since pandas may still build an intermediate copy internally.
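
A minimal sketch of the difference, on hypothetical data. The point to notice is that with inplace=True the return value is None, so the result of the call should not be assigned back to the variable:

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"math": [66, 57, 75]})

# Default behaviour: a new DataFrame is returned, df is untouched.
copy_result = df.drop(index=1)
rows_before = len(df)

# In-place behaviour: df itself is modified and None is returned.
returned = df.drop(index=1, inplace=True)

print(rows_before)         # 3
print(returned)            # None
print(df.index.tolist())   # [0, 2]
```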

9. Suppress error with errors='ignore'

You may notice that Pandas drop() raises a KeyError when the given rows or columns don’t exist. We can set the argument errors='ignore' (note the plural) to suppress the error, in which case only the labels that do exist are dropped.
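
The behaviour can be sketched as follows. Note that the pandas keyword is spelled errors (plural). The DataFrame and the missing column name 'history' are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; there is no 'history' column.
df = pd.DataFrame({"math": [66, 57], "physics": [80, 70]})

# Default (errors='raise'): a missing label raises KeyError.
raised = False
try:
    df.drop(columns="history")
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# errors='ignore': missing labels are skipped, existing ones are dropped.
unchanged = df.drop(columns="history", errors="ignore")
partial = df.drop(columns=["math", "history"], errors="ignore")

print(unchanged.columns.tolist())  # ['math', 'physics']
print(partial.columns.tolist())    # ['physics']
```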

Conclusion

In this article, we have covered 9 use cases for deleting rows and columns using Pandas drop(). The method is straightforward to use, and it is one of the most popular methods for manipulating data during data preprocessing.

Thanks for reading. Please check out the Notebook for the source code and stay tuned if you are interested in the practical aspect of machine learning. More tutorials are available from the GitHub repo.

References

[1] 5 Tips for Data manipulation

How do I remove rows from a DataFrame?

To drop a row or column from a DataFrame, use the DataFrame's drop() method, which is described in the pandas documentation. By default, rows are labelled with integer index values starting at 0, and columns are labelled with names.

How do I delete multiple rows in a DataFrame in Python?

To delete rows and columns from DataFrames, Pandas uses the drop() method. To delete one or more rows, pass their index labels. To delete one or more columns, pass the column name(s) and set axis to 1. Alternatively, the columns parameter removes the need for axis.

How do I delete 10 rows in pandas?

The drop() method can delete the top N rows of a DataFrame when it is given their index labels, and it can also delete rows selected by a condition on column values. Use the axis parameter to specify which axis to delete from. By default, axis=0, which deletes rows. Use axis=1 or the columns parameter to delete columns.

How do I drop multiple rows in a DataFrame?

To delete multiple rows by index position, keep in mind that df.drop() accepts index labels, not positions. To delete rows by position, first build a list of index labels from the positions and then pass it to drop(). Because the default value of inplace is False, the contents of the original DataFrame (dfObj in the source example) are not modified.
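
The patterns mentioned in the answers above (top N rows, rows matching a condition, and rows at given positions) can be sketched like this. The DataFrame, the column name and the threshold of 50 are hypothetical choices for illustration:

```python
import pandas as pd

# Hypothetical data: 12 rows with scores 40, 45, ..., 95.
df = pd.DataFrame({"math": list(range(40, 100, 5))})

# Delete the top 10 rows: look up the first 10 labels, then drop them.
top_removed = df.drop(df.index[:10])

# Delete rows by condition: select the matching rows, take their labels.
low_removed = df.drop(df[df["math"] < 50].index)

# Delete rows at arbitrary positions: convert positions to labels first.
positions = [0, 2, 4]
by_position = df.drop(df.index[positions])

print(top_removed.index.tolist())   # [10, 11]
print(len(low_removed))             # 10
print(by_position.index.tolist())   # [1, 3, 5, 6, 7, 8, 9, 10, 11]
```

All three calls return new DataFrames; df itself still has 12 rows afterwards because inplace defaults to False.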