题目描述
Trips 表中存所有出租车的行程信息。每段行程有唯一键 Id,Client_Id 和 Driver_Id 是 Users 表中 Users_Id 的外键。Status 是枚举类型,枚举成员为 (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’)。
Trips
表 (Primary Key: Id Foreign Key: Client_Id Driver_Id)
Id | Client_Id | Driver_Id | City_Id | Status | Request_at |
---|---|---|---|---|---|
1 | 1 | 10 | 1 | completed | 2013-10-01 |
2 | 2 | 11 | 1 | cancelled_by_driver | 2013-10-01 |
3 | 3 | 12 | 6 | completed | 2013-10-01 |
4 | 4 | 13 | 6 | cancelled_by_client | 2013-10-01 |
5 | 1 | 10 | 1 | completed | 2013-10-02 |
6 | 2 | 11 | 6 | completed | 2013-10-02 |
7 | 3 | 12 | 6 | completed | 2013-10-02 |
8 | 2 | 12 | 12 | completed | 2013-10-03 |
9 | 3 | 10 | 12 | completed | 2013-10-03 |
10 | 4 | 13 | 12 | cancelled_by_driver | 2013-10-03 |
Users 表存所有用户。每个用户有唯一键 Users_Id。Banned 表示这个用户是否被禁止,Role 则是一个表示(‘client’, ‘driver’, ‘partner’)的枚举类型。
Users_Id | Banned | Role |
---|---|---|
1 | No | client |
2 | Yes | client |
3 | No | client |
4 | No | client |
10 | No | driver |
11 | No | driver |
12 | No | driver |
13 | No | driver |
写一段 SQL 语句查出 2013年10月1日 至 2013年10月3日 期间非禁止用户的取消率。基于上表,你的 SQL 语句应返回如下结果,取消率(Cancellation Rate)保留两位小数。
取消率的计算方式如下:(被司机或乘客取消的非禁止用户生成的订单数量) / (非禁止用户生成的订单总数)
Day | Cancellation Rate |
---|---|
2013-10-01 | 0.33 |
2013-10-02 | 0.00 |
2013-10-03 | 0.50 |
相关知识
- 枚举类型
解题过程
-
分析题意
- 去除被ban用户生成的订单
- 使用聚合函数,统计被取消的订单数和所有未被ban的订单数,其比值作为Cancellation Rate
- 即1-COUNT(completed)/COUNT(ALL),难点就在于一个COUNT部分,一个COUNT整体
-
编写sql
WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No') SELECT c.Request_at AS `Day`, CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2))AS `Cancellation Rate` FROM (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c, (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a WHERE c.Request_at=a.Request_at
-
基本样例运行通过
-
提交,有一个测试样例没通过
-
input
{ "headers":{ "Trips":[ "Id", "Client_Id", "Driver_Id", "City_Id", "Status", "Request_at" ], "Users":[ "Users_Id", "Banned", "Role" ] }, "rows":{ "Trips":[ [ "1", "1", "10", "1", "cancelled_by_client", "2013-10-01" ] ], "Users":[ [ "1", "No", "client" ], [ "10", "No", "driver" ] ] } }
-
my output
{ "headers":[ "Day", "Cancellation Rate" ], "values":[ ] }
-
standard output
{ "headers":[ "Day", "Cancellation Rate" ], "values":[ [ "2013-10-01", 1 ] ] }
-
分析发现,这是因为这个表中没有complete的元组,导致内连接中的c表一行数据也没有,从而连接后一行数据也没有
- 改用右连接
- 因为completednum可能被补充成了NULL,所以需要进行判空处理
-
-
改写SQL
WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No') SELECT a.Request_at AS `Day`, IFNULL(CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2)),1.00) AS `Cancellation Rate` FROM (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c RIGHT JOIN (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a ON c.Request_at=a.Request_at
- 2中的样例可以正常运行了,但是有新的样例没有通过,因为日期不是2013年10月1日 至 2013年10月3日
-
改写SQL
WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No') SELECT a.Request_at AS `Day`, IFNULL(CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2)),1.00) AS `Cancellation Rate` FROM (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c RIGHT JOIN (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a ON c.Request_at=a.Request_at WHERE a.Request_at IN ("2013-10-01","2013-10-02","2013-10-03")
-
提交,运行通过
-
但是运行效率不高,修改sql,将最后一句的IN修改为BETWEEN
WITH CTE AS (SELECT * FROM Trips JOIN Users ON Trips.Client_Id=Users.Users_Id WHERE Users.Banned='No') SELECT a.Request_at AS `Day`, IFNULL(CAST((1-(c.completednum)/(a.allnum)) AS DECIMAL(4,2)),1.00) AS `Cancellation Rate` FROM (SELECT Request_at, COUNT(1) AS completednum FROM CTE WHERE `Status`='completed' GROUP BY Request_at ) c RIGHT JOIN (SELECT Request_at, COUNT(1) AS allnum FROM CTE GROUP BY Request_at) a ON c.Request_at=a.Request_at WHERE a.Request_at between '2013-10-01' and '2013-10-03'
学习总结
- 当限定值在一个连续的区间内,能用BETWEEN就不要用IN,IN的效率比BETWEEN低