利用临时表清除数据库重复数据
本文内容遵从CC版权协议, 可以随意转载, 但必须以超链接形式标明文章原始出处和作者信息及版权声明网址: http://www.penglixun.com/tech/database/use_tmp_table_clear_duplicate_data.html
上周遇到一个问题,开发忘了告诉DBA需要唯一索引,导致线上一个库出现了大量重复数据,需要立即清除,重复数据只保留一条,于是采用了临时表的方案。
首先查看重复数据的数量:
SELECT c1,c2 FROM tbl GROUP BY c1,c2 HAVING count(ID)> 1 |
然后创建一张临时表,把上述结果存下来,这就是存在重复的数据各选出一条:
CREATE TABLE tbl_tmp_1 SELECT * FROM tbl GROUP BY c1,c2 HAVING count(ID)> 1 |
然后通过上述临时表与原表关联,获取全部存在重复的数据:
CREATE TABLE tbl_tmp_2 SELECT b.* FROM tbl_tmp_1 a, tbl b WHERE a.c1 = b.c1 AND a.c2D = b.c2; |
验证一下有没有选错,即有没有不重复的数据被选出来了:
SELECT * FROM tbl_tmp_2 GROUP BY c1, c2 HAVING count(*) = 1; SELECT b.* FROM tbl_tmp_2 a, rbl b WHERE a.c1= b.c2 AND a.c2= b.c2 AND a.id=b.id; |
全量删除所有重复数据:
DELETE FROM tbl, tbl_tmp_2 USING tbl INNER JOIN tbl_tmp_2 ON tbl.id = tbl_tmp_2.id; |
将原重复数据中的一条都插入数据库中:
INERT INTO tbl SELECT * FROM tbl_tmp_1; |
都做完就可以加上唯一索引了:
ALTER TABLE tbl ADD UNIQUE uk_tbl_c1_c2 (`c1`,`c2`) ; |
过来做个记号,同类博客
[回复]
Identify and remove duplicate records
Identify duplicate records:
select COL1, COL2, COL#, COUNT(*)
from .
group by COL1, COL2, COL#
having count(*) > 1;
Remove duplicate records:
delete from . a
where rowid < (
select max(rowid)
from . b
where b.COL1 = a.COL1
and b.COL2 = a.COL2
and b.COL# = a.COL#);
[回复]
P.Linux 回复:
七月 29th, 2010 at 09:33
This is MySQL, not Oracle. So we haven’t RowID
[回复]
hi,
如果有主键ID的话,这样怎么样呢?
create table tb2 …. //same as tb1
INSERT INTO tb2 //primary key order
SELECT * FROM tb1
INNER JOIN (
SELECT MAX(id) AS rowID
FROM tb1
GROUP BY c1,c2
) AS uniques
ON tb1.id = uniques.rowID;
drop tb1
renmae tb2
[回复]