mysql - MySQL pushes CPU to 100% when including subquery in query
问题描述
Thanks in advance for your help.
My server suddenly went to 100% CPU usage. After turning off all scripts and gradually reinstating them, I have traced the problem to one specific type of query that is used in two places. The queries are:
select * from zzproductdata
where amazonproductid <> '' and amazoncategory1 ='' and disabled = 0
and productid in (select productid from zzdropshipstock where quantity >= 10)
And:
select 'ZZ Product Data With Image' as 'Statistic', count(zzproductdataid) as
'Number' from zzproductdata
where imagescollecteddate <> '0000-00-00' and zzproductdata.productid in
(select productid from zzdropshipstock where quantity >= 80)
Both queries contain a sub-query and both deal with a table called zzproductdata.
A couple things checked: 1. Other queries containing sub-queries continue to work. 2. Other queries containing zzproductdata continue to work, as long as they do not contain a sub-query.
When the above two queries run, the CPU gradually increases to 100% over about 90 seconds. It remains at 100% and does not seem to recover - I allowed it to run like this for 3 hours and it did not stop or complete the query.
Table zzproductdata has gradually been collecting data and has increased to 1,500,000 records over the last 12 days. I monitor the amount of time each query takes to run within the database, and these two queries have been gradually getting longer but not excessively so - they have gone from 2.5 seconds to 5.6 seconds.
All of a sudden they have just started going to 100% CPU with no warning. In mysqladmin processlist in terminal I see two records - one showing the process in 'Sleep' status, and one saying 'Sending Data' status.
I suspect that the queries have just hit some kind of sql cache limit, but I cannot work out which cache limit is used by mysql for using sub-queries. I have tried changing some settings in mysql but none of them seem to have an effect. Any suggestions?
EDIT
CREATE TABLE `zzproductdata` (
`zzproductdataid` int(11) NOT NULL AUTO_INCREMENT,
`zzproductdataname` text NOT NULL,
`disabled` int(11) NOT NULL,
`datecreated` date NOT NULL,
`masteronly` int(11) NOT NULL,
`productid` int(11) NOT NULL,
`isbn` text NOT NULL,
`ean` text NOT NULL,
`publishername` text NOT NULL,
`imagethumbnail` text NOT NULL,
`imagefull` text NOT NULL,
`amazonproductid` text NOT NULL,
`productdatasource` text NOT NULL,
`datelastupdatedamazon` date NOT NULL,
`datelastupdatedgoogle` date NOT NULL,
`googleproductid` text NOT NULL,
`publicationdate` date NOT NULL,
`binding` text NOT NULL,
`imagescollecteddate` date NOT NULL,
`amazoncategory1` text NOT NULL,
`amazoncategory2` text NOT NULL,
`amazoncategory3` text NOT NULL,
`datelastcheckedamazoncategory` date NOT NULL,
`datelastupdatedopenlibrary` date NOT NULL,
PRIMARY KEY (`zzproductdataid`)
)
ENGINE=InnoDB
AUTO_INCREMENT=1674296
DEFAULT CHARSET=utf
CREATE TABLE `zzdropshipstock` (
`zzdropshipstockid` int(11) NOT NULL AUTO_INCREMENT,
`zzdropshipstockname` text NOT NULL,
`disabled` int(11) NOT NULL,
`datecreated` date NOT NULL,
`masteronly` int(11) NOT NULL,
`zzdropshipsupplierid` int(11) NOT NULL,
`isbn` varchar(13) NOT NULL,
`quantity` double NOT NULL,
`suppliercode` text NOT NULL,
`supplierprice` double NOT NULL,
`standardshipcost` double NOT NULL,
`weightgram` double NOT NULL,
`rrp` double NOT NULL,
`productid` int(11) NOT NULL,
`initialimportdate` date NOT NULL,
`lastupdateddate` date NOT NULL,
`changed` int(11) NOT NULL,
`lastcheckedproductiddate` date NOT NULL,
`lastcheckedproductinamazondate` date NOT NULL,
`lastcheckedrawstockdata` date NOT NULL,
`lastcheckedproductstockitemlive` date NOT NULL,
`changedquantity` int(11) NOT NULL,
PRIMARY KEY (`zzdropshipstockid`),
KEY `isbn` (`isbn`)
)
ENGINE=InnoDB
AUTO_INCREMENT=7037817
DEFAULT CHARSET=latin1
Using EXPLAIN on the first query - I get this:
This is what I see in Terminal:
I have tried adding DISTINCT but it had no effect. It slowed down the escalation of CPU usage, so it took almost 200 seconds to get to 100% CPU usage, but it did not run the query.
The first part of the query returns 1,300,000 records in 0.23 seconds. The subquery returns 118,000 records in 2.3 seconds. This is when you run the two parts separately. Combining the two into one query, should currently return about 15,000 records.
解决方案
I'm not entirely sure how MySQL handle IN
vs EXISTS
, but broadly speaking you should preference EXISTS
over IN
as the EXISTS
clause returns true on the first match, whereas IN
may have to create the full set before it determines if there is a match - this could lead to high memory usage. I'm not sure though in the case of MySQL. I know this problem existed in MSSQL 2005 and possibly 2008, and I wouldn't be surprised if the same problem exists even in modern versions of MySQL, given it's not as polished as some enterprise grade RDMSs.
Spiel over, try this code to see if it improves things:
select *
from zzproductdata as pd
where amazonproductid <> ''
and amazoncategory1 =''
and disabled = 0
and exists (
select *
from zzdropshipstock as dss
where quantity >= 10
and pd.productid = dss.productid
)
推荐阅读
- powershell - 在外部窗口中操作 Afx OLE 控件
- sql - 如何使 MSSQL 忽略用户正在给予的价值中的“'”?
- javascript - argMax 将数组作为轴做什么?
- excel - 无法在 vba 中激活变量分配表
- angular - 星云切换在生产模式下不起作用
- bash - 等待 bash 中至少一项工作的结束
- javascript - 为什么 Exchange 身份令牌应该是安全的?
- sql - 从 TVF 执行 select 语句作为变量
- python - 如何在 Matplotlib 上的两个数据点之间绘制一条水平线?
- python - TensorFlow 中特征列的相关性