Mysql何时会使用内部临时表

Mysql除了我们显式的创建临时表外，在某些情况下会用到内部临时表，在开始之前可以先用下面脚本创建一个测试库

create table t1(id int primary key, a int, b int, index(a));
delimiter ;;
create procedure idata()
begin
  declare i int;

  set i=1;
  while(i<=1000)do
    insert into t1 values(i, i, i);
    set i=i+1;
  end while;
end;;
delimiter ;
call idata();

Union

如下面的语句，取俩条语句结果的并集

1	(select 1000 as f) union (select id from t1 order by id desc limit 2);

执行计划如下：

+----+--------------+------------+------------+-------+---------------+---------+---------+------+------+----------+-----------------+
| id | select_type  | table      | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra           |
+----+--------------+------------+------------+-------+---------------+---------+---------+------+------+----------+-----------------+
|  1 | PRIMARY      | NULL       | NULL       | NULL  | NULL          | NULL    | NULL    | NULL | NULL |     NULL | No tables used  |
|  2 | UNION        | t1         | NULL       | index | NULL          | PRIMARY | 4       | NULL |    2 |   100.00 | Using index     |
| NULL | UNION RESULT | <union1,2> | NULL       | ALL   | NULL          | NULL    | NULL    | NULL | NULL |     NULL | Using temporary |
+----+--------------+------------+------------+-------+---------------+---------+---------+------+------+----------+-----------------+

最后的Extra字段Using temporary表示使用了临时表

该语句流程如下：

创建一个内存临时表，只有一个字段是f，f是主键;
将第一条查询语句结果插入到临时表中
执行第2个子查询
1. 将符合条件的插入到临时表中，如果是union取并集，违反唯一约束写入失败；
2. 继续插入直到不符合规则
返回结果，删除临时表
注:如果是union all因为没有计算并集的需求也就没有唯一约束就不用临时表了

GROUP BY

例如下面的语句

1	select id%10 as m, count(*) as c from t1 group by m;

执行计划如下：

+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                        |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY,a     | a    | 5       | NULL | 1000 |   100.00 | Using index; Using temporary; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------------------+

通过Extra字段可以看到3个信息

Using index 因为需要遍历索引树，这里没选PRIMARY原因是因为a索引够用（只取count），并且相对PK更小
Using temporary 用了临时表
Using filesort 排序

流程如下：

创建内存临时表，创建俩个字段（m,c），且m为主键
将id%10，然后得到m，写入到临时表
1. 如果m不存在，insert m,1
2. 如果m存在，update c字段+1；
然后临时表通过filesort排序，内存临时表rowid排序，见前面的文章
返回结果，移除临时表。

上面提到里都是创建一个内存临时表，这是由参数tmp_table_size值控制的默认是16M，如果超过这个值就采用磁盘临时表了

group by优化方案

临时表会消耗内存，甚至是磁盘。所以在生产环境上我们哟啊尽量避免这种内部临时表的产生，针对上面说的group by有如下几点优化建议

通过索引

上面的例子，我们可以创建一个id%10的字段,然后根据这个字段group by

alter table t1 add column z int generated always as(id % 100), add index(z);

select z, count(*) as c from t1 group by z;

执行计划会告诉这你这条语句走的是索引z

直接排序

原理是，因为mysql会默认走临时表，但是我们可以通过SQL_BIG_RESULT来提示mysql这个结果集很大，让mysql直接选择磁盘临时表，mysql发现这个临时表是b+tree结构，不如数组方式排序效率高，就不会采用临时表而是至用filesortset了

sql语句如下

1	select SQL_BIG_RESULT id%100 as m, count(*) as c from t1 group by m;

执行计划如下

+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------+
| id | select_type | table | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra                       |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------+
|  1 | SIMPLE      | t1    | NULL       | index | PRIMARY,a     | a    | 5       | NULL | 1000 |   100.00 | Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------------+

语句执行流程是：

初始化sort_buffer，去顶放入整形字段m
扫秒表t1索引树a,将id%100值放入sort_buffer中
在sort_buffer对m排序
返回数组

总结mysql何时会用临时表呢

如果mysql一边读数据，一边直接获取结果不需要额外的内存保存结果，否则会需要额外内存；
join_buffer是无序数据，sort_buffer是有序数组，临时表是二维表结构
如果执行逻辑需要二维表特性，就优先使用临时表，比如:Union需要唯一约束，groupby需要额外保存计数