SELECT col1, col2, ..., aggregate_function(col_name) FROM table_name WHERE condition GROUP BY col1, col2, ...;其中,col1, col2, ...是要分组的列名,aggregate_function是用于聚合数据的函数,如SUM,AVG, MAX, MIN等。table_name是要从中检索数据的表的名称,condition是可选的查询条件。
SELECT column1, column2, COUNT(*) FROM table_name WHERE condition GROUP BY column1, column2 ORDER BY column1, column2;在这个示例中,选择了column1和column2两列,并对它们进行了分组。使用COUNT(*)函数来计算每个组中的行数。使用ORDER BY子句按column1和column2升序排序结果集。
# 堆代码 duidaima.com mysql> select version(); +-----------+ | version() | +-----------+ | 8.0.18 | +-----------+ 1 row in set (0.06 sec) mysql> select @@global.sql_mode; +-----------------------------------------------------------------------------------------------------------------------+ | @@global.sql_mode | +-----------------------------------------------------------------------------------------------------------------------+ | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE, +-----------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.18 sec)在这种模式下执行 SQL 会报下面的错误
mysql> select * from user group by age; 1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column xxx which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by可以通过下面两种方式解决:
mysql> select age, any_value(id) from `user` GROUP BY age; +-----+---------------+ | age | any_value(id) | +-----+---------------+ | 3 | 0 | | 6 | 3 | | 7 | 5 | | 12 | 1 | | 14 | 2 | | 19 | 7 | +-----+---------------+ 6 rows in set (0.15 sec) mysql> select age, group_concat(id) from `user` GROUP BY age; +-----+------------------+ | age | group_concat(id) | +-----+------------------+ | 3 | 0,4 | | 6 | 3 | | 7 | 5 | | 12 | 1 | | 14 | 2 | | 19 | 7 | +-----+------------------+ 6 rows in set (0.05 sec)不同版本的排序
mysql> show create table user; +-------+---------------------------------+ CREATE TABLE `user` ( `id` int(11) NOT NULL, `name` varchar(255) DEFAULT NULL , `age` int(255) DEFAULT NULL, PRIMARY KEY (`id`), KEY `nameIndex` (`name`) USING BTREE ) ENGINE=InnoDB DEFAULT CHARSET=utf8 +-------+---------------------------------+ mysql> select * from user; +----+------+-----+ | id | name | age | +----+------+-----+ | 0 | 陈 | 3 | | 1 | 李 | 12 | | 2 | 张 | 14 | | 3 | 陈 | 6 | | 4 | 李 | 3 | | 5 | NULL | 7 | | 7 | 张 | 19 | +----+------+-----+ 7 rows in set (0.06 sec)在MySQL 5.7中
GROUP BY column_name ORDER BY NULL然而,在MySQL 8.0中,GROUP BY默认不会使用排序功能,除非使用了ORDER BY语句。
explain select age,count(age) from user where name ='李' GROUP BY age;
Using filesort: 表示需要排序
4.2就将12这一行的count(age)值加1
"filesort_summary":{ "rows":2, "examined_rows":2, "number_of_tmp_files":0, "sort_buffer_size":320, "sort_mode":"<sort_key, rowid>" }临时表
mysql> show variables like '%tmp_table_size%'; +----------------+---------+ | Variable_name | Value | +----------------+---------+ | tmp_table_size | 2097152 | +----------------+---------+ 1 row in set (0.04 sec)磁盘临时表
mysql> show variables like '%internal_tmp_disk_storage_engine%'; +----------------------------------+--------+ | Variable_name | Value | +----------------------------------+--------+ | internal_tmp_disk_storage_engine | InnoDB | +----------------------------------+--------+ 1 row in set (0.04 sec)为了复现生成磁盘临时表,把 tmp_table_size设置小一点,通过查Created_tmp_disk_tables值,查看对应的磁盘临时表数量
mysql> set tmp_table_size=1; select age,count(age) from user where name ='李' GROUP BY age ORDER BY age ; show status like '%Created_tmp%'; Query OK, 0 rows affected (0.02 sec) +-----+------------+ | age | count(age) | +-----+------------+ | 3 | 1 | | 12 | 1 | +-----+------------+ 2 rows in set (0.03 sec) +-------------------------+-------+ | Variable_name | Value | +-------------------------+-------+ | Created_tmp_disk_tables | 3 | | Created_tmp_files | 60 | | Created_tmp_tables | 6 | +-------------------------+-------+ 3 rows in set (0.04 sec)Created_tmp_tables:在内存中创建内部临时表时或在磁盘,服务器将递增此值。
Created_tmp_disk_tables / Created_tmp_tables * 100% <= 25%分组优化
SELECT SQL_BIG_RESULT col1, col2 FROM my_table GROUP BY col1;需要注意的是,使用SQL_BIG_RESULT会增加服务器的内存和CPU使用量,因此应该仔细评估是否需要使用它。通常情况下,只有在处理大型数据集时才需要使用。