然而,面对稍稍复杂的查询计算需求,SQL 就会显得力不从心,经常写出几百行有多层嵌套的语句。这种 SQL,不要说非技术人员难以完成,即使对于专业程序员也不是件容易的事,常常成为很多软件企业应聘考试的重头戏。三行五行的 SQL 仅存在教科书和培训班,现实中用于报表查询的 SQL 通常是以“K”计的。
sales_amount | 销售业绩表 |
sales | 销售员姓名,假定无重名 |
product | 销售的产品 |
amount | 该销售员在该产品上的销售额 |
现在我们想知道出空调和电视销售额都在前 10 名的销售员名单。
1. 按空调销售额排序,找出前 10 名;
2. 按电视销售额排序,找出前 10 名;
select * from
( select top 10 sales from sales_amount where product='AC' order by amount desc )
intersect
( select top 10 sales from sales_amount where product='TV' order by amount desc )
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
1. 列出所有产品;
2.将每种产品的前 10 名取出,分别保存;
1.将数据按产品分组,将每组排序,取出前 10 名;
select sales
from ( select sales,
from ( select sales,
rank() over (partition by product order by amount desc ) ranking
from sales_amount)
where ranking <=10 )
group by sales
having count(*)=(select count(distinct product) from sales_amount)
这是能写出来,但这样复杂的 SQL,有多少人会写呢?
前两种简单的思路无法用 SQL 实现,只能采用第三种迂回的思路。这里的原因在于 SQL 的一个重要缺点:集合化不彻底。
虽然 SQL 有集合概念,但并未把集合作为一种基础数据类型提供,不能让变量或字段的取值是个集合,除了表之外也没有其它集合形式的数据类型,这使得大量集合运算在思维和书写时都需要绕路。
我们在上面的计算中使用了关键字 top,事实上关系代数理论中没有这个东西(它可以被别的计算组合出来),这不是 SQL 的标准写法。
我们来看一下没有 top 时找前 10 名会有多困难?
大体思路是这样:找出比自己大的成员个数作为是名次,然后取出名次不超过 10 的成员,写出的 SQL 如下:
select sales
from ( select A.sales sales, A.product product,
(select count(*)+1 from sales_amount
where A.product=product AND A.amount<=amount) ranking
from sales_amount A )
where product='AC' AND ranking<=10
select sales
from ( select A.sales sales, A.product product, count(*)+1 ranking
from sales_amount A, sales_amount B
where A.sales=B.sales and A.product=B.product AND A.amount<=B.amount
group by A.sales,A.product )
where product='AC' AND ranking<=10
这样的 SQL 语句,专业程序员写出来也未必容易吧!而仅仅是计算了一个前 10 名。
employee | 员工表 |
name | 员工姓名,假定无重名 |
gender | 员工性别 |
我们已经计算出“好”销售员的名单,比较自然的想法,是用名单到花名册时找出其性别,再计一下数。但在 SQL 中要跨表获得信息需要用表间连接,这样,接着最初的结果,SQL 就会写成:
select employee.gender,count(*)
from employee,
( ( select top 10 sales from sales_amount where product='AC' order by amount desc )
intersect
( select top 10 sales from sales_amount where product='TV' order by amount desc ) ) A
where A.sales=employee.name
group by employee.gender
select sales.gender,count(*)
from (…) // …是前面计算“好”销售员的SQL
group by sales.gender
显然,这个句子不仅更清晰,同时计算效率也会更高(没有连接计算)。
mov ax,3
mov bx,5
mul bx,7
add ax,bx
这样的代码无论书写还是阅读都远不如 3+5*7 了(要是碰到小数就更要命了)。虽然对于熟练的程序员也算不了太大的麻烦,但对于大多数人而言,这种写法还是过于晦涩难懂了,从这个意义上讲,FORTRAN 确实是个伟大的发明。
这些问题本身应该也算不上很复杂,都是在日常数据分析中经常会出现的,但已经很难为 SQL 了。
集合无序
select name, birthday
from (select name, birthday, row_number() over (order by birthday) ranking
from employee )
where ranking=(select floor((count(*)+1)/2) from employee)
中位数是个常见的计算,本来只要很简单地在排序后的集合中取出位置居中的成员。但 SQL 的无序集合机制不提供直接用位置访问成员的机制,必须人为造出一个序号字段,再用条件查询方法将其选出,导致必须采用子查询才能完成。
select max (consecutive_day)
from (select count(*) (consecutive_day
from (select sum(rise_mark) over(order by trade_date) days_no_gain
from (select trade_date,
case when
closing_price>lag(closing_price) over(order by trade_date)
then 0 else 1 END rise_mark
from stock_price) )
group by days_no_gain)
无序的集合也会导致思路变形。
集合化不彻底
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
0
分组的本意是将源集合分拆成的多个子集合,其返回值也应当是这些子集。但 SQL 无法表示这种“由集合构成的集合”,因而强迫进行下一步针对这些子集的汇总计算而形成常规的结果集。
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
1
缺乏对象引用
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
2
用子查询
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
3
如果员工表中的部门字段是指向部门表中的记录,而部门表中的经理字段是指向员工表的记录,那么这个查询条件只要简单地写成这种直观高效的形式:
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
4
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
5
用子查询
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
6
A | |
1 | =employee.sort(birthday) |
2 | =A1((A1.len()+1)/2) |
对于以有序集合为基础的 SPL 来说,按位置取值是个很简单的任务。
任务 2
A | |
1 | =stock_price.sort(trade_date) |
2 | =0 |
3 | =A1.max(A2=if(close_price>close_price[-1],A2+1,0)) |
SPL 按自然的思路过程编写计算代码即可。
任务 3
A | |
1 | =employee.group(month(birthday),day(birthday)) |
2 | =A1.select(~.len()>1).conj() |
SPL 可以保存分组结果集,继续处理就和常规集合一样。
任务 4
A | |
1 | =score_table.group(subject) |
2 | =A1.(~.rank(score).pselect@a(~<=10)) |
3 | =A1.(~(A2(#)).(name)).isect() |
使用 SPL 只要按思路过程写出计算代码即可。
任务 5
A | |
1 | =employee.select(gender=="male" && department.manager.gender=="female") |
支持对象引用的 SPL 可以简单地将外键指向记录的字段当作自己的属性访问。
任务 6
A | |
1 | =employee.new(name,resume.minp(start_date).company:first_company) |
SPL 支持将子表集合作为主表字段,就如同访问其它字段一样,子表无需重复计算。
with A as
select top 10 sales from sales_amount where product='AC' order by amount desc
B as
select top 10 sales from sales_amount where product='TV' order by amount desc
select * from A intersect B
7
GitHub 地址:
https://github.com/SPLWare/esProc
原文来自「逛逛GitHub」|侵删
●
●
●
●
●
推荐站内搜索:最好用的开发软件、免费开源系统、渗透测试工具云盘下载、最新渗透测试资料、最新黑客工具下载……
还没有评论,来说两句吧...