利用临时表清除数据库重复数据

7月 25th, 2010 | Posted by | Filed under 数据库

本文内容遵从CC版权协议, 可以随意转载, 但必须以超链接形式标明文章原始出处和作者信息及版权声明
网址: http://www.penglixun.com/tech/database/use_tmp_table_clear_duplicate_data.html

上周遇到一个问题,开发忘了告诉DBA需要唯一索引,导致线上一个库出现了大量重复数据,需要立即清除,重复数据只保留一条,于是采用了临时表的方案。

首先查看重复数据的数量:

SELECT  c1,c2 FROM tbl GROUP BY c1,c2 HAVING count(ID)> 1

然后创建一张临时表,把上述结果存下来,这就是存在重复的数据各选出一条:

CREATE TABLE tbl_tmp_1 
SELECT * FROM tbl GROUP BY c1,c2 HAVING count(ID)> 1

然后通过上述临时表与原表关联,获取全部存在重复的数据:

CREATE TABLE tbl_tmp_2
SELECT b.* 
FROM tbl_tmp_1 a, tbl b
WHERE a.c1 = b.c1
AND a.c2D = b.c2;

验证一下有没有选错,即有没有不重复的数据被选出来了:

SELECT *
FROM tbl_tmp_2 
GROUP BY c1, c2
HAVING count(*) = 1;
SELECT b.* 
FROM tbl_tmp_2 a, rbl b
WHERE a.c1= b.c2
AND a.c2= b.c2
AND a.id=b.id;

全量删除所有重复数据:

DELETE FROM 
tbl, tbl_tmp_2 USING tbl
INNER JOIN tbl_tmp_2
ON tbl.id = tbl_tmp_2.id;

将原重复数据中的一条都插入数据库中:

INERT INTO tbl
SELECT * FROM tbl_tmp_1;

都做完就可以加上唯一索引了:

ALTER TABLE tbl
ADD UNIQUE uk_tbl_c1_c2 (`c1`,`c2`) ;
标签:
  1. blinux
    7月 28th, 201009:39

    过来做个记号,同类博客

    [回复]

  2. ops805
    7月 28th, 201022:01

    Identify and remove duplicate records

    Identify duplicate records:

    select COL1, COL2, COL#, COUNT(*)
    from .
    group by COL1, COL2, COL#
    having count(*) > 1;

    Remove duplicate records:
    delete from . a
    where rowid < (
    select max(rowid)
    from . b
    where b.COL1 = a.COL1
    and b.COL2 = a.COL2
    and b.COL# = a.COL#);

    [回复]

    P.Linux 回复:

    This is MySQL, not Oracle. So we haven’t RowID

    [回复]

  3. lyz
    2月 23rd, 201116:22

    hi,
    如果有主键ID的话,这样怎么样呢?

    create table tb2 …. //same as tb1

    INSERT INTO tb2 //primary key order
    SELECT * FROM tb1
    INNER JOIN (
    SELECT MAX(id) AS rowID
    FROM tb1
    GROUP BY c1,c2
    ) AS uniques
    ON tb1.id = uniques.rowID;

    drop tb1
    renmae tb2

    [回复]