近期在Review项目代码时,发现同事们在查询MySQL行数时存在多样的方式,有的使用COUNT(1), 有的用COUNT(id), 还有人选择了COUNT()。这混杂的选择引发了我的思考。当然这三种count的方式也是众说纷纭,其中最大的分歧点就是COUNT()和COUNT(1)查询性能上,有人觉得COUNT(*)需要转换为COUNT(1),所以COUNT(1)得速度更快。究竟这三种计数方式之间有何区别,它们的背后原理是怎样的呢?
COUNT()含义
在《高性能Mysql》一书第236页中是这么解释COUNT的作用的:
COUNT()是一个特殊的函数,有两种非常不同的作用:它可以统计某个列值的数量,也可以统计行数。在统计列值时要求列值是非空的(不统计NULL)。如果在COUNT()的括号中指定了列或者列的表达式,则统计的就是这个表达式有值的结果数”。因为很多人对 NULL理解有问题,所以这里很容易产生误解。
COUNT()的另一个作用是统计结果集的行数。当MySOL确认括号内的表达式值不可能为空时,实际上就是在统计行数。最简单的就是当我们使用COUNT()的时候,这种情况下通配符并不会像我们猜想的那样扩展成所有的列,实际上,它会忽略所有的列而直接统计所有的行数。
我们发现一个最常见的错误就是,在括号内指定了一个列却希望统计结果集的行数。如果希望知道的是结果集的行数,最好使用COUNT(*),这样写意义清晰,性能也会很好
由此我们也可以大概总结COUNT函数的种方式如下:
区别 1、Mysql5.7
在中是这么介绍COUNT(expr)函数的
COUNT(expr)
Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.
If there are no matching rows, COUNT() returns 0.
mysql> SELECT student.student_name,COUNT(*)
FROM student,course
WHERE student.student_id=course.student_id
GROUP BY student_name;
COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.
For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
Prior to MySQL 5.7.18, InnoDB processes SELECT COUNT() statements by scanning the clustered index. As of MySQL 5.7.18, InnoDB processes SELECT COUNT() statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, the clustered index is scanned.
Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.
InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.
For MyISAM tables, COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.
从官方文档中我们可以看出mysql官方对COUNT函数的解释:
2、Mysql 8.0
在中对COUNT(expr)的解释是这样
COUNT(expr) [over_clause]
Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.
If there are no matching rows, COUNT() returns 0.COUNT(NULL)returns 0.
This function executes as a window function if over_clause is present. over_clause is as described in Section 12.20.2, “Window Function Concepts and Syntax”.
mysql> SELECT student.student_name,COUNT(*) FROM student,course WHERE student.student_id=course.student_id GROUP BY student_name;
COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.
For transactional storage engines such as InnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
As of MySQL 8.0.13, SELECT COUNT(*) FROM tbl_name query performance for InnoDB tables is optimized for single-threaded workloads if there are no extra clauses such as WHERE or GROUP BY.
InnoDB processes SELECT COUNT() statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT() statements by scanning the clustered index.
Processing SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, use SHOW TABLE STATUS.
InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.
For MyISAM tables, COUNT(*)is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.
从mysql8.0的文档中我们可以看出mysql8.0对COUNT的解释
处理 SELECT COUNT(*) 语句可能花费一些时间,如果索引记录没有完全在缓冲池中。为了更快的计数,可以创建一个计数器表,让应用程序根据插入和删除操作进行更新。但是,这种方法在数千个并发事务同时对同一计数器表进行更新的情况下可能不会很好地扩展。如果粗略的行数足够,可以使用 SHOW TABLE STATUS。
InnoDB 对待 SELECT COUNT(*) 和 SELECT COUNT(1) 操作的方式相同,没有性能差异。
对于 MyISAM 表,COUNT(*) 在从一个表中检索、没有检索其他列、没有 WHERE 子句的情况下可以快速返回,因为 MyISAM 存储了准确的行数。COUNT(1) 只有在第一列被定义为 NOT NULL 时才能进行相同的优化。
结合Mysql5.7与Mysql8.0的文档我们可以看出两个版本对COUNT的支持的差异:
MySQL 8.0 在性能优化方面对于 InnoDB 表的 SELECT COUNT(*) 查询进行了特别的关注,而且在 MyISAM 表的优化方面进行了详细的说明。其他方面,两个版本在COUNT()函数的解释和使用上基本保持一致。
基于此我们明白,其实COUNT(*)和COUNT(1)其实是一样的,在性能上并没有差异。
那这两种方式与COUNT(列)的差异呢?从以上《高性能Mysql》以及Mysql官方文档中我们知道,COUNT(列)是统计非空列的行数,它也会遍历整张表,然后会对列对应的值做非空判断,非空的字段进行个数累加。当然这是列为主键索引时的操作。如果列不为主键索引时,那么查询时还需要进行回表操作,再根据主键获取数据,此时无疑是增加了一次IO,在性能上其实是不如COUNT()和COUNT(1)的。那么我们就可以知道,按照效率来看,count() = count(1) > count(主键) > count(非主键列)。
使用建议
篇幅有限,深入验证将在后续文章中介绍。
本文已收录于我的个人博客:码农Academy的博客,专注分享Java技术干货,包括Java基础、Spring Boot、Spring Cloud、Mysql、Redis、Elasticsearch、中间件、架构设计、面试题、程序员攻略等