大数据SQL题43 间断连续登录用户问题
现要求统计各用户最长的连续登录天数,间断一天也算作连续,例如:一个用户在1,3,5,6登录,则视为连续6天登录。现有各用户的登录记录表(login_events)如下,表中每行数据表达的信息是一个用户何时登录了平台。
·
原题链接:http://practice.atguigu.cn/#/question/43/desc?qType=SQL
题目需求
现有各用户的登录记录表(login_events)如下,表中每行数据表达的信息是一个用户何时登录了平台。
| user_id | login_datetime |
|---|---|
| 100 | 2021-12-01 19:00:00 |
| 100 | 2021-12-01 19:30:00 |
| 100 | 2021-12-02 21:01:00 |
现要求统计各用户最长的连续登录天数,间断一天也算作连续,例如:一个用户在1,3,5,6登录,则视为连续6天登录。期望结果如下:
| user_id(用户id) | max_day_count(最大连续天数) |
|---|---|
| 100 | 3 |
| 101 | 6 |
| 102 | 3 |
| 104 | 3 |
| 105 | 1 |
解题思路
- 使用类似于42题的区间划分方法
SELECT user_id,
MAX(day_count) AS max_day_count
FROM
(
SELECT user_id,
DATEDIFF(MAX(login_date),MIN(login_date)) + 1 AS day_count
FROM
(
SELECT user_id,
login_date,
SUM(is_first) OVER (PARTITION BY user_id ORDER BY login_date) AS interval_id
FROM
(
SELECT user_id,
login_date,
IF(DATEDIFF(login_date,last_login_date) > 2,1,0) AS is_first
FROM
(
SELECT user_id,
login_date,
LAG(login_date,1,'1970-01-01') OVER (PARTITION BY user_id ORDER BY login_date) AS last_login_date
FROM
(
SELECT user_id,
date(login_datetime) AS login_date
FROM login_events
GROUP BY user_id,
date(login_datetime)
) t1
) t2
) t3
) t4
GROUP BY user_id,
interval_id
) t5
GROUP BY user_id
- 将符合条件的间断天数补齐
-- 将符合条件的间断天数补齐
WITH new_login AS
(
SELECT user_id,
new_login_date AS login_date
FROM
(
SELECT user_id,
login_date,
lead(login_date,1,'9999-12-31') OVER (PARTITION BY user_id ORDER BY login_date) AS next_login_date
FROM
(
SELECT user_id,
date_format(login_datetime,'yyyy-MM-dd') login_date
FROM login_events
GROUP BY user_id,
date_format(login_datetime,'yyyy-MM-dd') --按照用户和日期去重
) t1
) t2 LATERAL VIEW explode(IF(DATEDIFF(next_login_date, login_date) = 2, array(login_date, date_add(login_date, 1)), array(login_date))) tmp AS new_login_date
)
-- 随后再使用其他连续天数划分的处理方法
SELECT user_id,
MAX(day_count) AS max_day_count
FROM
(
SELECT user_id,
DATEDIFF(MAX(login_date),MIN(login_date)) + 1 AS day_count
FROM
(
SELECT user_id,
login_date,
SUM(is_first) OVER (PARTITION BY user_id ORDER BY login_date) AS interval_id
FROM
(
SELECT user_id,
login_date,
IF(DATEDIFF(login_date,last_login_date) > 1,1,0) AS is_first
FROM
(
SELECT user_id,
login_date,
LAG(login_date,1,'1970-01-01') OVER (PARTITION BY user_id ORDER BY login_date) AS last_login_date
FROM new_login
) t1
) t2
) t3
GROUP BY user_id,
interval_id
) t4
GROUP BY user_id
更多推荐


所有评论(0)