题目描述

Trips 表中存所有出租车的行程信息。每段行程有唯一键 Id,Client_Id 和 Driver_Id 是 Users 表中 Users_Id 的外键。Status 是枚举类型,枚举成员为 (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’)。

Trips表 (Primary Key: Id Foreign Key: Client_Id Driver_Id)

Id Client_Id Driver_Id City_Id Status Request_at
1 1 10 1 completed 2013-10-01
2 2 11 1 cancelled_by_driver 2013-10-01
3 3 12 6 completed 2013-10-01
4 4 13 6 cancelled_by_client 2013-10-01
5 1 10 1 completed 2013-10-02
6 2 11 6 completed 2013-10-02
7 3 12 6 completed 2013-10-02
8 2 12 12 completed 2013-10-03
9 3 10 12 completed 2013-10-03
10 4 13 12 cancelled_by_driver 2013-10-03

Users 表存所有用户。每个用户有唯一键 Users_Id。Banned 表示这个用户是否被禁止,Role 则是一个表示(‘client’, ‘driver’, ‘partner’)的枚举类型。

Users_Id Banned Role
1 No client
2 Yes client
3 No client
4 No client
10 No driver
11 No driver
12 No driver
13 No driver

写一段 SQL 语句查出 2013年10月1日 至 2013年10月3日 期间非禁止用户的取消率。基于上表,你的 SQL 语句应返回如下结果,取消率(Cancellation Rate)保留两位小数。

取消率的计算方式如下:(被司机或乘客取消的非禁止用户生成的订单数量) / (非禁止用户生成的订单总数)

Day Cancellation Rate
2013-10-01 0.33
2013-10-02 0.00
2013-10-03 0.50

相关知识

  1. 枚举类型

解题过程

  1. 分析题意

    • 去除被ban用户生成的订单
    • 使用聚合函数,统计被取消的订单数和所有未被ban的订单数,其比值作为Cancellation Rate
    • 即1-COUNT(completed)/COUNT(ALL),难点就在于一个COUNT部分,一个COUNT整体
  2. 编写sql

    WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No')
    SELECT
        c.Request_at AS `Day`,
        CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2))AS `Cancellation Rate`
    FROM
        (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c,
        (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a
    WHERE
        c.Request_at=a.Request_at
    
    • 基本样例运行通过

    • 提交,有一个测试样例没通过

    • input

      {
          "headers":{
              "Trips":[
                  "Id",
                  "Client_Id",
                  "Driver_Id",
                  "City_Id",
                  "Status",
                  "Request_at"
              ],
              "Users":[
                  "Users_Id",
                  "Banned",
                  "Role"
              ]
          },
          "rows":{
              "Trips":[
                  [
                      "1",
                      "1",
                      "10",
                      "1",
                      "cancelled_by_client",
                      "2013-10-01"
                  ]
              ],
              "Users":[
                  [
                      "1",
                      "No",
                      "client"
                  ],
                  [
                      "10",
                      "No",
                      "driver"
                  ]
              ]
          }
      }
      
    • my output

      {
          "headers":[
              "Day",
              "Cancellation Rate"
          ],
          "values":[
           
          ]
      }
      
    • standard output

      {
          "headers":[
              "Day",
              "Cancellation Rate"
          ],
          "values":[
              [
                  "2013-10-01",
                  1
              ]
          ]
      }
      
    • 分析发现,这是因为这个表中没有complete的元组,导致内连接中的c表一行数据也没有,从而连接后一行数据也没有

      • 改用右连接
      • 因为completednum可能被补充成了NULL,所以需要进行判空处理
  3. 改写SQL

    WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No')
    SELECT
        a.Request_at AS `Day`,
        IFNULL(CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2)),1.00) AS `Cancellation Rate`
    FROM
        (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c 
        RIGHT JOIN
        (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a
        ON c.Request_at=a.Request_at
    
    • 2中的样例可以正常运行了,但是有新的样例没有通过,因为日期不是2013年10月1日 至 2013年10月3日
  4. 改写SQL

    WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No')
    SELECT
        a.Request_at AS `Day`,
        IFNULL(CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2)),1.00) AS `Cancellation Rate`
    FROM
        (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c 
        RIGHT JOIN
        (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a
        ON c.Request_at=a.Request_at
    WHERE
        a.Request_at IN ("2013-10-01","2013-10-02","2013-10-03")
    
  5. 提交,运行通过

  6. 但是运行效率不高,修改sql,将最后一句的IN修改为BETWEEN

    WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No')
    SELECT
        a.Request_at AS `Day`,
        IFNULL(CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2)),1.00) AS `Cancellation Rate`
    FROM
        (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c 
        RIGHT JOIN
        (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a
        ON c.Request_at=a.Request_at
    WHERE
        a.Request_at between '2013-10-01' and '2013-10-03'
    

学习总结

  1. 当限定值在一个连续的区间内,能用BETWEEN就不要用IN,IN的效率比BETWEEN低