题目描述

编写一个 SQL 查询,来删除 Person 表中所有重复的电子邮箱,重复的邮箱里只保留 Id 最小 的那个。

Person 表:

Id Email
1 john@example.com
2 bob@example.com
3 john@example.com

例如,在运行你的查询语句之后,上面的 Person 表应返回以下几行:

Id Email
1 john@example.com
2 bob@example.com
  • 执行 SQL 之后,输出是整个 Person 表。
  • 使用 delete 语句。

相关知识

  1. DELETE FROM table_name WHERE column_name = value

解题过程

  1. 分析题意

    • 统计每个Email出现的次数和第一次出现的Id
    • 删除Email的Id不等于第一次出现的Id的记录
  2. 编写sql

    DELETE 
    FROM 
        Person
    WHERE
        (Email,Id) NOT IN (SELECT Email ,MIN(Id) FROM Person GROUP BY Email)
    
  3. 试运行,报错You can’t specify target table ‘Person’ for update in FROM clause,意思是不能够在同一语句中先SELECT同一表中的某些值,再UPDATE这个表,即不能通过一张表的某个字段的值更新这张表的某些字段,需要使用中间表进行解决

  4. 修改sql,但是结果一行都没有删掉

    DELETE 
    FROM 
        Person p
    WHERE
        (p.Email,p.Id) 
        NOT IN 
        (
            SELECT 
                t.Email,t.MinId 
            FROM 
                (SELECT Email ,MIN(Id) AS MinId FROM Person GROUP BY Email) t
        )
    
  5. 将表中Email和MIN(Id)互换,可以正确删除,提交,运行通过

    DELETE 
    FROM 
        Person p
    WHERE
        (p.Id,p.Email) 
        NOT IN 
        (
            SELECT 
                t.MinId,t.Email
            FROM 
                (SELECT MIN(Id) AS MinId,Email FROM Person GROUP BY Email) t
        )
    

    这里没弄清楚为什么字符串类型在前面就不可以

  6. 将NOT IN改成只有Id的NOT IN,可以正确删除,提交,运行通过

    DELETE 
    FROM 
        Person p
    WHERE
        p.Id
        NOT IN 
        (
            SELECT 
                t.MinId 
            FROM 
                (SELECT MIN(Id) AS MinId FROM Person GROUP BY Email) t
        )
    
  7. 改用NOT EXISTS编写sql

    WITH CTE AS(SELECT Email,MIN(Id) AS MinId FROM Person GROUP BY Email)
    DELETE
    FROM
        Person p
    WHERE
        NOT EXISTS(
            SELECT
                1
            FROM
                CTE c
            WHERE
                (c.MinId = p.Id) AND (c.Email =p.Email)
        )
    
  8. 提交,运行通过

学习总结

此题没有学到任何内容