In this article, we will see how to delete duplicate rows from a table and keep only one using various ways, mainly: We will start by preparing the data. We will be creating a table sales_team_emails and inserting the data into it. Through the table, let us see by doing select * and keeping the rows ordered by sales_person_email. Output:- The output shows that the table sales_team_emails contains duplicate values in the column sales_person_email. Advertisements Know more about How to Find Duplicate Rows in MySQL. Delete the duplicate rows but keep latest : using GROUP BY and MAXOne way to delete the duplicate rows but retaining the latest ones is by using MAX() function and GROUP BY clause. Observe the below query and output. DELETE FROM sales_team_emails WHERE sales_person_id NOT IN ( SELECT * FROM ( SELECT MAX(sales_person_id) FROM sales_team_emails GROUP BY sales_person_email ) AS s_alias ); Action Output Message : DELETE FROM sales_team_emails WHERE sales_person_id NOT IN ( SELECT * FROM ( SELECT MAX(sales_person_id) FROM sales_team_emails GROUP BY sales_person_email ) AS s_alias ) 4 row(s) affected 0.0061 sec. Select * on sales_team_emailstable to view the output. Output : Explanation:- As we can see in this output, we have successfully deleted the duplicate rows. Here we are using the inner query, MAX() function, and GROUP BY clause.
Delete the duplicate rows but keep latest : using JOINSAnother way to achieve the goal is to use joins to delete the old entries from the table and preserve the latest entry in the table sales_team_emails comparing the sales_person_id column. Observe the below query : DELETE s1 FROM sales_team_emails s1, sales_team_emails s2 WHERE s1.sales_person_id < s2.sales_person_id AND s1.sales_person_email = s2.sales_person_email; Action Output Message : DELETE s1 FROM sales_team_emails s1, sales_team_emails s2 WHERE s1.sales_person_id < s2.sales_person_id AND s1.sales_person_email = s2.sales_person_email 4 row(s) affected 0.0053 sec Select * on sales_team_emails table to view the output. Output : Explanation:- As we can see in this output, we have successfully deleted the duplicate rows, and the ones with higher sales_person_id ( sales_person_id is the primary key) are retained. Here we are doing a self join on the same table sales_team_emails, which is deleting duplicate records by keeping one copy, the one with a higher value of sales_person_id. Delete the duplicate row but keep oldest : using JOINSJOINS can be used to keep the oldest entry of duplicate rows but delete the ones who entered late into the table. The solution is similar to the one in the above section with a little change in the WHERE clause. Observe the below query and its output. DELETE s1 FROM sales_team_emails s1, sales_team_emails s2 WHERE s1.sales_person_id > s2.sales_person_id AND s1.sales_person_email = s2.sales_person_email; Action Output Message: DELETE s1 FROM sales_team_emails s1, sales_team_emails s2 WHERE s1.sales_person_id > s2.sales_person_id AND s1.sales_person_email = s2.sales_person_email 4 row(s) affected 0.0025 sec Select * on sales_team_emails table to view the output. Output:- Explanation:- The output shows the oldest rows preserved but the latest duplicate rows deleted. The only change is in the WHERE condition “WHERE s1.sales_person_id > s2.sales_person_id” Delete the duplicate row but keep oldest : using ROW_NUMBER()Another approach to delete the duplicate rows retaining the oldest entries in the table is using ROW_NUMBER () function and PARTITION BY clause. Observe the below query, its output, and explanation. DELETE FROM sales_team_emails WHERE sales_person_id IN ( SELECT sales_person_id FROM ( SELECT sales_person_id, ROW_NUMBER() OVER ( PARTITION BY sales_person_email ORDER BY sales_person_email) AS row_num FROM sales_team_emails ) s_alias WHERE row_num > 1 ); Action Output Message : DELETE FROM sales_team_emails WHERE sales_person_id IN ( SELECT sales_person_id FROM ( SELECT sales_person_id, ROW_NUMBER() OVER ( PARTITION BY sales_person_email ORDER BY sales_person_email) AS row_num FROM sales_team_emails ) s_alias WHERE row_num > 1 ) 4 row(s) affected 0.0017 sec Output :- figure 1.5Explanation:- As we can see in the output, the latest duplicate rows got deleted. Here we are using inner queries and ROW_NUMBER() function.
We hope this article helped you with deleting the duplicate rows. Good Luck !!.
How find and delete duplicate rows in MySQL?How to Remove Duplicate Rows in MySQL. Setting Up Test Database. Create Test Database. Add Table and Data. Display the Contents of the Dates Table.. Display Duplicate Rows.. Removing Duplicate Rows. Option 1: Remove Duplicate Rows Using INNER JOIN. Option 2: Remove Duplicate Rows Using an Intermediate Table.. How do I find duplicates and deletes in SQL?SQL Delete Duplicate Rows using Group By and Having Clause
According to Delete Duplicate Rows in SQL, for finding duplicate rows, you need to use the SQL GROUP BY clause. The COUNT function can be used to verify the occurrence of a row using the Group by clause, which groups data according to the given columns.
How do I remove duplicate results in MySQL?The go to solution for removing duplicate rows from your result sets is to include the distinct keyword in your select statement. It tells the query engine to remove duplicates to produce a result set in which every row is unique.
How do I find and delete duplicate rows?Remove duplicate values. Select the range of cells that has duplicate values you want to remove. Tip: Remove any outlines or subtotals from your data before trying to remove duplicates.. Click Data > Remove Duplicates, and then Under Columns, check or uncheck the columns where you want to remove the duplicates. ... . Click OK.. |