DWS层建设实战-2
Day08_DWS层建设实战-商品主题统计宽表
知识点01:课程内容大纲与学习目标
#课程内容大纲
1、DWS层构建
商品主题统计宽表的实现
核心:表关系梳理、字段抽取、结果合并
用户主题统计宽表的实现思路
#学习目标
掌握主题需求的分析
掌握表关系梳理与字段抽取
知识点02:DWS层搭建--商品主题宽表
- 主题需求
- 指标
下单次数、下单件数、下单金额
被支付次数、被支付件数、被支付金额
被退款次数、被退款件数、被退款金额
被加入购物车次数、被加入购物车件数
被收藏次数
好评数、中评数、差评数
--总共15个指标
- 维度
日期(day)+商品
- 本主题建表操作
注意:建表操作需要在hive中执行,presto不支持hive的建表语法。
知识点03:DWS层搭建--商品主题宽表--需求分析
- 确定指标字段与表的关系
当需求提出指标和维度之后,我们需要做的就是确定通过哪些表能够提供支撑。
思考:是不是意味着数仓分层之后,上一层的表只能查询下一层的,能不能跨层?
答案是不一定的。
#维度
dwb_order_detail
时间、商品
#下单次数、下单件数、下单金额
dwb_order_detail
order_id: 下单次数相当于下了多少订单(有多少个包含这个商品的订单) ----> 下单次数
buy_num : 下单件数相当于下了多少商品 ----> 下单件数
total_price(无运费)/goods_amount(有运费) 每个商品下单金额指的是订单金额还是商品金额?应该是商品金额(订单中可能会包含其他商品) ----> 下单金额
#被支付次数、被支付件数、被支付金额
dwb_order_detail
#支付状态的判断
order_state: 只要不是1和7 就是已经支付状态的订单
is_pay: 这个字段也可以 0表示未支付,1表示已支付。#推荐使用这个字段来判断
#次数 件数 金额
order_id ----> 被支付次数
buy_num ----> 被支付件数
total_price ----> 被支付金额
#被退款次数、被退款件数、被退款金额
dwb_order_detail
#退款的判断
refund_id: 退款单号 is not null的就表明有退款
#次数 件数 金额
order_id ----> 被退款次数
buy_num ----> 被退款件数
total_price ----> 被退款金额
#被加入购物车次数、被加入购物车件数
yp_dwd.fact_shop_cart(能够提供购物车相关信息的只有这张表)
id: 次数
buy_num: 件数
#被收藏次数
yp_dwd.fact_goods_collect
id: 次数
#好评数、中评数、差评数
yp_dwd.fact_goods_evaluation_detail
geval_scores_goods:商品评分0-10分
#如何判断 好 中 差 (完全业务指定)
得分: >= 9 好
得分: >6 <9 中
得分:<= 6 差
概况起来,计算商品主题宽表,需要参与计算的表有:
yp_dwb.dwb_order_detail 订单明细宽表
yp_dwd.fact_shop_cart 购物车表
yp_dwd.fact_goods_collect 商品收藏表
yp_dwd.fact_goods_evaluation_detail 商品评价表
知识点04:DWS层搭建--商品主题宽表--step1--下单、支付、退款统计
- 大前提:使用row_number对数据进行去重
基于dwb_order_detail表根据商品(goods_id)进行统计各个指标的时候;
为了避免同一笔订单下有多个重复的商品出现(正常来说重复的应该合并在一起了);
应该使用row_number对order_id和goods_id进行去重。
- 下单次数、件数、金额统计
基于上述的order_base进行查询
- 支付次数、件数、金额统计
计算支付相关指标之前,可以先使用is_pay进行订单状态过滤
然后基于过滤后的数据进行统计
可以继续使用CTE引导
- 退款次数、件数、金额统计
可以先使用refund_id is not null查询出退款订单,然后进行统计
知识点05:DWS层搭建--商品主题宽表--step2--购物车、收藏统计
思考:为什么下面这两个查询需要考虑拉链的状态?上面的下单、支付等统计为什么不需要?
- 购物车次数、件数统计
-- 购物车次数、件数
cart_count as (
select substring(create_time, 1, 10) dt, goods_id sku_id,
count(id) cart_count,
sum(buy_num) cart_num
from yp_dwd.fact_shop_cart
where end_date = '9999-99-99'
group by substring(create_time, 1, 10), goods_id
),
- 收藏次数统计
-- 收藏次数
favor_count as (
select substring(c.create_time, 1, 10) dt, goods_id sku_id,
count(c.id) favor_count
from yp_dwd.fact_goods_collect c
where end_date='9999-99-99'
group by substring(c.create_time, 1, 10), goods_id
),
知识点06:DWS层搭建--商品主题宽表--step3--好评、中评、差评
- 好评、中评、差评次数
根据评分来判断好评、中评、差评
老师上课SQL
insert into yp_dws.dws_sku_daycount
select * from (
with t1 as (
select
-- 维度字段
dt,
goods_id ,
goods_name ,
--指标字段
order_id ,
buy_num ,
total_price ,
-- 判断字段:
order_state ,
is_pay ,
refund_id,
refund_state,
--去重处理
row_number() over(partition by order_id,goods_id) as rn1
from yp_dwb.dwb_order_detail
),
t2 as(
select
dt,
goods_id,
goods_name,
count(distinct order_id) as order_count, -- 下单次数
sum(buy_num) as order_num, -- 下单件数,
sum(total_price) as order_amount -- 下单金额
from t1 where rn1 = 1
group by dt,goods_id ,goods_name
),
t3 as (
select
dt,
goods_id,
goods_name,
count(distinct order_id) as payment_count, -- 被支付次数
sum(buy_num) as payment_num, -- 被支付件数,
sum(total_price) as payment_amount -- 被支付金额
from t1 where rn1 = 1 and order_state not in(1,7) and is_pay = 1
group by dt,goods_id ,goods_name
),
t4 as (
select
dt,
goods_id,
goods_name,
count(distinct order_id) as refund_count , -- 被退款次数
sum(buy_num) as refund_num , -- 被退款件数,
sum(total_price) as refund_amount -- 被退款金额
from t1 where rn1 = 1 and refund_id is not null
group by dt,goods_id ,goods_name
),
t5 as (
select
substr(sc.create_time,1,10)as dt,
od.goods_id,
goods_name,
count(id) as cart_count , -- 被加入购物车次数
sum(sc.buy_num) as cart_num -- 被加入购物车件数,
from yp_dwd.fact_shop_cart sc left join yp_dwb.dwb_order_detail od on sc.goods_id = od.goods_id
group by substr(sc.create_time,1,10), od.goods_id ,goods_name
),
t6 as (
select
substr(gc.create_time,1,10) as dt,
od.goods_id,
goods_name,
count(id) as favor_count -- 收藏次数
from yp_dwd.fact_goods_collect gc left join yp_dwb.dwb_order_detail od on gc.goods_id = od.goods_id
group by substr(gc.create_time,1,10), od.goods_id,goods_name
),
t7 as (
select
substr(ged.create_time,1,10) as dt,
od.goods_id,
goods_name,
count(if(geval_scores_goods >= 9,1,null)) as evaluation_good_count, -- 好评数
count(if(geval_scores_goods >= 6 and geval_scores_goods < 9,1,null)) as evaluation_mid_count, -- 好评数
count(if(geval_scores_goods < 6,1,null)) as evaluation_bad_count -- 好评数
from yp_dwd.fact_goods_evaluation_detail ged left join yp_dwb.dwb_order_detail od on ged.goods_id = od.goods_id
group by substr(ged.create_time,1,10), od.goods_id,goods_name
)
,
t8 as (
select
coalesce(t2.dt, t3.dt, t4.dt, t5.dt, t6.dt, t7.dt) as dt,
coalesce(t2.goods_id, t3.goods_id, t4.goods_id, t5.goods_id, t6.goods_id,
t7.goods_id) as sku_id,
coalesce(t2.goods_name, t3.goods_name, t4.goods_name, t5.goods_name, t6.goods_name,
t7.goods_name) as sku_name,
coalesce(t2.order_count, 0) as order_count,
coalesce(t2.order_num, 0) as order_num,
coalesce(t2.order_amount, 0) as order_amount,
coalesce(t3.payment_count, 0) as payment_count,
coalesce(t3.payment_num, 0) as payment_num,
coalesce(t3.payment_amount, 0) as payment_amount,
coalesce(t4.refund_count, 0) as refund_count,
coalesce(t4.refund_num, 0) as refund_num,
coalesce(t4.refund_amount, 0) as refund_amount,
coalesce(t5.cart_count, 0) as cart_count,
coalesce(t5.cart_num, 0) as cart_num,
coalesce(t6.favor_count, 0) as favor_count,
coalesce(t7.evaluation_good_count, 0) as evaluation_good_count,
coalesce(t7.evaluation_mid_count, 0) as evaluation_mid_count,
coalesce(t7.evaluation_bad_count, 0) as evaluation_bad_count
from t2
full join t3 on t2.dt = t3.dt and t2.goods_id = t3.goods_id
full join t4 on t3.dt = t4.dt and t3.goods_id = t4.goods_id
full join t5 on t4.dt = t5.dt and t4.goods_id = t5.goods_id
full join t6 on t5.dt = t6.dt and t5.goods_id = t6.goods_id
full join t7 on t6.dt = t7.dt and t6.goods_id = t7.goods_id
)
select
dt,
sku_id,
sku_name,
sum(order_count),
sum(order_num),
sum(order_amount),
sum(payment_count),
sum(payment_num),
sum(payment_amount),
sum(refund_count),
sum(refund_num),
sum(refund_amount),
sum(cart_count),
sum(cart_num),
sum(favor_count),
sum(evaluation_good_count),
sum(evaluation_mid_count),
sum(evaluation_bad_count)
from t8
group by dt,sku_id,sku_name
)
参考代码
insert into hive.yp_dws.dws_sku_daycount
select * from (
with order_t1 as (
select
-- 维度字段
dt,
goods_id ,
goods_name ,
--指标字段
order_id ,
buy_num ,
total_price ,
-- 判断字段:
order_state ,
is_pay ,
refund_id,
refund_state,
--去重处理
row_number() over(partition by order_id,goods_id) as rn1
from hive.yp_dwb.dwb_order_detail
),
-- 下单次数、下单件数、下单金额
order_t2 as(
select
dt,
goods_id,
goods_name,
count(order_id) as order_count,
sum(buy_num) as order_num,
sum(total_price) as order_amount
from order_t1 where rn1 = 1
group by dt,goods_id,goods_name
),
-- 被支付次数、被支付件数、被支付金额
payment_t3 as (
select
dt,
goods_id,
goods_name,
count(order_id) as payment_count,
sum(buy_num) as payment_num,
sum(total_price) as payment_amount
-- 过滤出已经支付的订单数据: order_state 不能为 1和7 并且 is_pay =1
from order_t1 where rn1 = 1 and order_state not in(1,7) and is_pay = 1
group by dt,goods_id,goods_name
),
-- 被退款次数、被退款件数、被退款金额
refund_t4 as (
select
dt,
goods_id,
goods_name,
count(order_id) as refund_count,
sum(buy_num) as refund_num,
sum(total_price) as refund_amount
-- 过滤出已经退款的订单数据: refund_id is not null 并且 refund_state = 5
from order_t1 where rn1 = 1 and refund_id is not null and refund_state = 5
group by dt,goods_id,goods_name
),
-- 被加入购物车次数、被加入购物车件数
cart_t5 as (
select
substr(sc.create_time ,1,10) as dt,
sc.goods_id ,
g.goods_name ,
count(sc.id) as cart_count,
sum(sc.buy_num) as cart_num
from hive.yp_dwd.fact_shop_cart sc
left join hive.yp_dwb.dwb_goods_detail g
on sc.goods_id = g.id
group by substr(sc.create_time ,1,10) ,sc.goods_id ,g.goods_name
),
-- 被收藏次数
favor_t6 as(
select
substr(gc.create_time,1,10) as dt,
gc.goods_id ,
g.goods_name ,
count(gc.id) as favor_count
from hive.yp_dwd.fact_goods_collect gc
left join hive.yp_dwb.dwb_goods_detail g
on gc.goods_id = g.id
group by substr(gc.create_time,1,10), gc.goods_id ,g.goods_name
),
-- 好评, 中评 和 差评
evaluation_t7 as (
select
substr(ed.create_time,1,10) as dt,
ed.goods_id ,
g.goods_name,
count(
if(ed.geval_scores_goods is null or ed.geval_scores_goods >8 , ed.id, null)
) as evaluation_good_count,
count(
if(ed.geval_scores_goods is not null and ed.geval_scores_goods between 6 and 8, ed.id, null)
) as evaluation_mid_count,
count(
if(ed.geval_scores_goods is not null and ed.geval_scores_goods < 6, ed.id, null)
) as evaluation_bad_count
from hive.yp_dwd.fact_goods_evaluation_detail ed
left join hive.yp_dwb.dwb_goods_detail g
on ed.goods_id = g.id
group by substr(ed.create_time,1,10), ed.goods_id ,g.goods_name
),
t8 as(
select
coalesce(t2.dt,t3.dt,t4.dt,t5.dt,t6.dt,t7.dt) as dt,
coalesce(t2.goods_id,t3.goods_id,t4.goods_id,t5.goods_id,t6.goods_id,t7.goods_id) as sku_id,
coalesce(t2.goods_name,t3.goods_name,t4.goods_name,t5.goods_name,t6.goods_name,t7.goods_name) as sku_name,
coalesce(t2.order_count,0) as order_count,
coalesce(t2.order_num,0) as order_num,
coalesce(t2.order_amount,0) as order_amount,
coalesce(t3.payment_count,0) as payment_count,
coalesce(t3.payment_num,0) as payment_num,
coalesce(t3.payment_amount,0) as payment_amount,
coalesce(t4.refund_count,0) as refund_count,
coalesce(t4.refund_num,0) as refund_num,
coalesce(t4.refund_amount,0) as refund_amount,
coalesce(t5.cart_count,0) as cart_count,
coalesce(t5.cart_num,0) as cart_num,
coalesce(t6.favor_count,0) as favor_count,
coalesce(t7.evaluation_good_count,0) as evaluation_good_count,
coalesce(t7.evaluation_mid_count,0) as evaluation_mid_count,
coalesce(t7.evaluation_bad_count,0) as evaluation_bad_count
from order_t2 t2
full join payment_t3 t3 on t2.dt = t3.dt and t2.goods_id = t3.goods_id
full join refund_t4 t4 on t2.dt = t4.dt and t2.goods_id = t4.goods_id
full join cart_t5 t5 on t2.dt = t5.dt and t2.goods_id = t5.goods_id
full join favor_t6 t6 on t2.dt = t6.dt and t2.goods_id = t6.goods_id
full join evaluation_t7 t7 on t2.dt = t7.dt and t2.goods_id = t7.goods_id
)
select
dt,
sku_id,
sku_name,
sum(order_count) as order_count,
sum(order_num) as order_num,
sum(order_amount) as order_amount,
sum(payment_count) as payment_count,
sum(payment_num) as payment_num,
sum(payment_amount) as payment_amount,
sum(refund_count) as refund_count,
sum(refund_num) as refund_num,
sum(refund_amount) as refund_amount,
sum(cart_count) as cart_count,
sum(cart_num) as cart_num,
sum(favor_count) as favor_count,
sum(evaluation_good_count) as evaluation_good_count,
sum(evaluation_mid_count) as evaluation_mid_count,
sum(evaluation_bad_count) as evaluation_bad_count
from t8 group by dt,sku_id,sku_name
)
DWS层搭建--用户主题宽表--需求分析
- 主题需求
- 指标
登录次数、收藏店铺数、收藏商品数、加入购物车次数、加入购物车金额、下单次数、下单金额、支付次数、支付金额
- 维度
用户、日期
- 建表操作
注意:建表操作需要在hive中执行,presto不支持hive的建表语法。
- 确定字段与表关系
#登录次数
yp_dwd.fact_user_login
id
login_user
#收藏店铺数、收藏商品数
yp_dwd.fact_store_collect
id
user_id
#加入购物车次数、加入购物车金额
yp_dwd.fact_shop_cart
yp_dwb.dwb_goods_detail
#因为购物车中没有金额,因此需要和商品详情表进行关联
goods_promotion_price: 商品促销价格(售价)
#下单次数、下单金额
yp_dwd.fact_shop_order
yp_dwd.fact_shop_order_address_detail
#通过订单主副表可以提供
#支付次数、支付金额
yp_dwd.fact_trade_record
- 最终实现合并的方式
- 使用full join合并
用户主题的统计宽表,作为项目练习由大家自己完成。
所用到的技术点之前已经完成铺垫完毕。
----------------------------------课堂代码----------------------------------
insert into yp_dws.dws_user_daycount
select * from
(
-- 登录次数
with login_count as (
select
login_user as user_id,
count(id) as login_count,
dt
from yp_dwd.fact_user_login
group by login_user, dt
),
-- 店铺收藏数
store_collect_count as (
select
user_id,
count(id) as store_collect_count,
substring(create_time, 1, 10) as dt
from yp_dwd.fact_store_collect
where end_date='9999-99-99'
group by user_id, substring(create_time, 1, 10)
),
-- 商品收藏数
goods_collect_count as (
select
user_id,
count(id) as goods_collect_count,
substring(create_time, 1, 10) as dt
from yp_dwd.fact_goods_collect
where end_date='9999-99-99'
group by user_id, substring(create_time, 1, 10)
),
-- 加入购物车次数和金额
cart_count_amount as (
select
count(cart.id) as cart_count,
sum(coalesce(g.goods_promotion_price,0)) as cart_amount,
buyer_id as user_id,
substring(cart.create_time, 1, 10) as dt
from yp_dwd.fact_shop_cart cart, yp_dwb.dwb_goods_detail g
where cart.end_date='9999-99-99' and cart.goods_id=g.id
group by buyer_id, substring(cart.create_time, 1, 10)
),
-- 订单宽表,为后面的下单次数和金额准备数据
order_base as (
select
buyer_id,
create_date, order_id,
order_amount,
row_number() over(partition by orderd.order_id) rn
from yp_dwb.dwb_order_detail orderd
where orderd.is_valid=1
),
-- 下单次数和金额
order_count_amount as (
select
o.buyer_id user_id,
substring(o.create_date,1,10) dt,
count(o.order_id) order_count,
sum(coalesce(o.order_amount,0)) order_amount
from order_base o
where o.rn=1
group by o.buyer_id, substring(o.create_date,1,10)
),
-- 支付次数和金额
payment_count_amount as (
select
count(id) as payment_count,
sum(coalesce(order_amount,0)) as payment_amount,
create_user user_id,
substring(create_time, 1, 10) as dt
from yp_dwd.fact_shop_order_address_detail
where is_valid = 1 and pay_time is not null and end_date='9999-99-99'
group by create_user, substring(create_time, 1, 10)
),
fulljoin as (
select
coalesce(lc.dt, scc.dt, gcc.dt, cc.dt, oc.dt, pc.dt) dt ,
coalesce(lc.user_id, scc.user_id, gcc.user_id, cc.user_id, oc.user_id, pc.user_id) user_id ,
coalesce(login_count,0) login_count ,
coalesce(store_collect_count,0) store_collect_count ,
coalesce(goods_collect_count,0) goods_collect_count ,
coalesce(cart_count,0) cart_count ,
coalesce(cart_amount,0) cart_amount ,
coalesce(order_count,0) order_count ,
coalesce(order_amount,0) order_amount ,
coalesce(payment_count,0) payment_count ,
coalesce(payment_amount,0) payment_amount
from login_count lc
full join store_collect_count scc on lc.dt=scc.dt and lc.user_id=scc.user_id
full join goods_collect_count gcc on lc.dt=gcc.dt and lc.user_id=gcc.user_id
full join cart_count_amount cc on lc.dt=cc.dt and lc.user_id=cc.user_id
full join order_count_amount oc on lc.dt=oc.dt and lc.user_id=oc.user_id
full join payment_count_amount pc on lc.dt=pc.dt and lc.user_id=pc.user_id
)
select
dt,
user_id,
-- 登录次数
sum(coalesce(login_count,0)) login_count,
-- 店铺收藏数
sum(coalesce(store_collect_count,0)) store_collect_count,
-- 商品收藏数
sum(coalesce(goods_collect_count,0)) goods_collect_count,
-- 加入购物车次数和金额
sum(coalesce(cart_count,0)) cart_count,
sum(coalesce(cart_amount,0)) cart_amount,
-- 下单次数和金额
sum(coalesce(order_count,0)) order_count,
sum(coalesce(order_amount,0)) order_amount,
-- 支付次数和金额
sum(coalesce(payment_count,0)) payment_count,
sum(coalesce(payment_amount,0)) payment_amount
from fulljoin
group by dt, user_id
)
----------------------------------参考代码----------------------------------
insert into yp_dws.dws_user_daycount
-- 登录次数
with login_count as (
select
login_user as user_id,
count(id) as login_count,
dt
from yp_dwd.fact_user_login
group by login_user, dt
),
-- 店铺收藏数
store_collect_count as (
select
user_id,
count(id) as store_collect_count,
substring(create_time, 1, 10) as dt
from yp_dwd.fact_store_collect
where end_date='9999-99-99'
group by user_id, substring(create_time, 1, 10)
),
-- 商品收藏数
goods_collect_count as (
select
user_id,
count(id) as goods_collect_count,
substring(create_time, 1, 10) as dt
from yp_dwd.fact_goods_collect
where end_date='9999-99-99'
group by user_id, substring(create_time, 1, 10)
),
-- 加入购物车次数和金额
cart_count_amount as (
select
count(cart.id) as cart_count,
sum(coalesce(g.goods_promotion_price,0)) as cart_amount,
buyer_id as user_id,
substring(cart.create_time, 1, 10) as dt
from yp_dwd.fact_shop_cart cart, yp_dwb.dwb_goods_detail g
where cart.end_date='9999-99-99' and cart.goods_id=g.id
group by buyer_id, substring(cart.create_time, 1, 10)
),
-- 订单宽表,为后面的下单次数和金额准备数据
order_base as (
select
buyer_id,
create_date, order_id,
order_amount,
row_number() over(partition by orderd.order_id) rn
from yp_dwb.dwb_order_detail orderd
where orderd.is_valid=1
),
-- 下单次数和金额
order_count_amount as (
select
o.buyer_id user_id,
substring(o.create_date,1,10) dt,
count(o.order_id) order_count,
sum(coalesce(o.order_amount,0)) order_amount
from order_base o
where o.rn=1
group by o.buyer_id, substring(o.create_date,1,10)
),
-- 支付次数和金额
payment_count_amount as (
select
count(id) as payment_count,
sum(coalesce(order_amount,0)) as payment_amount,
create_user user_id,
substring(create_time, 1, 10) as dt
from yp_dwd.fact_shop_order_address_detail
where is_valid = 1 and pay_time is not null and end_date='9999-99-99'
group by create_user, substring(create_time, 1, 10)
),
fulljoin as (
select
coalesce(lc.dt, scc.dt, gcc.dt, cc.dt, oc.dt, pc.dt) dt ,
coalesce(lc.user_id, scc.user_id, gcc.user_id, cc.user_id, oc.user_id, pc.user_id) user_id ,
coalesce(login_count,0) login_count ,
coalesce(store_collect_count,0) store_collect_count ,
coalesce(goods_collect_count,0) goods_collect_count ,
coalesce(cart_count,0) cart_count ,
coalesce(cart_amount,0) cart_amount ,
coalesce(order_count,0) order_count ,
coalesce(order_amount,0) order_amount ,
coalesce(payment_count,0) payment_count ,
coalesce(payment_amount,0) payment_amount
from login_count lc
full join store_collect_count scc on lc.dt=scc.dt and lc.user_id=scc.user_id
full join goods_collect_count gcc on lc.dt=gcc.dt and lc.user_id=gcc.user_id
full join cart_count_amount cc on lc.dt=cc.dt and lc.user_id=cc.user_id
full join order_count_amount oc on lc.dt=oc.dt and lc.user_id=oc.user_id
full join payment_count_amount pc on lc.dt=pc.dt and lc.user_id=pc.user_id
)
select
dt,
user_id,
-- 登录次数
sum(coalesce(login_count,0)) login_count,
-- 店铺收藏数
sum(coalesce(store_collect_count,0)) store_collect_count,
-- 商品收藏数
sum(coalesce(goods_collect_count,0)) goods_collect_count,
-- 加入购物车次数和金额
sum(coalesce(cart_count,0)) cart_count,
sum(coalesce(cart_amount,0)) cart_amount,
-- 下单次数和金额
sum(coalesce(order_count,0)) order_count,
sum(coalesce(order_amount,0)) order_amount,
-- 支付次数和金额
sum(coalesce(payment_count,0)) payment_count,
sum(coalesce(payment_amount,0)) payment_amount
from fulljoin
group by dt, user_id ; -- order by dt, user_id ;