原题链接:http://practice.atguigu.cn/#/question/43/desc?qType=SQL

题目需求

现有各用户的登录记录表(login_events)如下,表中每行数据表达的信息是一个用户何时登录了平台。

user_id login_datetime
100 2021-12-01 19:00:00
100 2021-12-01 19:30:00
100 2021-12-02 21:01:00

现要求统计各用户最长的连续登录天数,间断一天也算作连续,例如:一个用户在1,3,5,6登录,则视为连续6天登录。期望结果如下:

user_id(用户id) max_day_count(最大连续天数)
100 3
101 6
102 3
104 3
105 1

解题思路

  1. 使用类似于42题的区间划分方法
SELECT  user_id,
        MAX(day_count) AS max_day_count
FROM
(
	SELECT  user_id,
	        DATEDIFF(MAX(login_date),MIN(login_date)) + 1 AS day_count
	FROM
	(
		SELECT  user_id,
		        login_date,
		        SUM(is_first) OVER (PARTITION BY user_id ORDER BY  login_date) AS interval_id
		FROM
		(
			SELECT  user_id,
			        login_date,
			        IF(DATEDIFF(login_date,last_login_date) > 2,1,0) AS is_first
			FROM
			(
				SELECT  user_id,
				        login_date,
				        LAG(login_date,1,'1970-01-01') OVER (PARTITION BY user_id ORDER BY  login_date) AS last_login_date
				FROM
				(
					SELECT  user_id,
					        date(login_datetime) AS login_date
					FROM login_events
					GROUP BY  user_id,
					          date(login_datetime)
				) t1
			) t2
		) t3
	) t4
	GROUP BY  user_id,
	          interval_id
) t5
GROUP BY  user_id
  1. 将符合条件的间断天数补齐
-- 将符合条件的间断天数补齐
WITH new_login AS
(
	SELECT  user_id,
	        new_login_date AS login_date
	FROM
	(
		SELECT  user_id,
		        login_date,
		        lead(login_date,1,'9999-12-31') OVER (PARTITION BY user_id ORDER BY  login_date) AS next_login_date
		FROM
		(
			SELECT  user_id,
			        date_format(login_datetime,'yyyy-MM-dd') login_date
			FROM login_events
			GROUP BY  user_id,
			          date_format(login_datetime,'yyyy-MM-dd') --按照用户和日期去重
		) t1
	) t2 LATERAL VIEW explode(IF(DATEDIFF(next_login_date, login_date) = 2, array(login_date, date_add(login_date, 1)), array(login_date))) tmp AS new_login_date
) 
-- 随后再使用其他连续天数划分的处理方法
SELECT  user_id,
        MAX(day_count) AS max_day_count
FROM
(
	SELECT  user_id,
	        DATEDIFF(MAX(login_date),MIN(login_date)) + 1 AS day_count
	FROM
	(
		SELECT  user_id,
		        login_date,
		        SUM(is_first) OVER (PARTITION BY user_id ORDER BY  login_date) AS interval_id
		FROM
		(
			SELECT  user_id,
			        login_date,
			        IF(DATEDIFF(login_date,last_login_date) > 1,1,0) AS is_first
			FROM
			(
				SELECT  user_id,
				        login_date,
				        LAG(login_date,1,'1970-01-01') OVER (PARTITION BY user_id ORDER BY  login_date) AS last_login_date
				FROM new_login
			) t1
		) t2
	) t3
	GROUP BY  user_id,
	          interval_id
) t4
GROUP BY  user_id
Logo

永洪科技,致力于打造全球领先的数据技术厂商,具备从数据应用方案咨询、BI、AIGC智能分析、数字孪生、数据资产、数据治理、数据实施的端到端大数据价值服务能力。

更多推荐