首页 > 解决方案 > MySQL pushes CPU to 100% when including subquery in query

问题描述

Thanks in advance for your help.

My server suddenly went to 100% CPU usage. After turning off all scripts and gradually reinstating them, I have traced the problem to one specific type of query that is used in two places. The queries are:

select * from zzproductdata
where amazonproductid <> '' and amazoncategory1 ='' and disabled = 0
and productid in (select productid from zzdropshipstock where quantity >= 10)

And:

select 'ZZ Product Data With Image' as 'Statistic', count(zzproductdataid) as 
'Number' from zzproductdata
where imagescollecteddate <> '0000-00-00' and zzproductdata.productid in 
(select productid from zzdropshipstock where quantity >= 80)

Both queries contain a sub-query and both deal with a table called zzproductdata.

A couple things checked: 1. Other queries containing sub-queries continue to work. 2. Other queries containing zzproductdata continue to work, as long as they do not contain a sub-query.

When the above two queries run, the CPU gradually increases to 100% over about 90 seconds. It remains at 100% and does not seem to recover - I allowed it to run like this for 3 hours and it did not stop or complete the query.

Table zzproductdata has gradually been collecting data and has increased to 1,500,000 records over the last 12 days. I monitor the amount of time each query takes to run within the database, and these two queries have been gradually getting longer but not excessively so - they have gone from 2.5 seconds to 5.6 seconds.

All of a sudden they have just started going to 100% CPU with no warning. In mysqladmin processlist in terminal I see two records - one showing the process in 'Sleep' status, and one saying 'Sending Data' status.

I suspect that the queries have just hit some kind of sql cache limit, but I cannot work out which cache limit is used by mysql for using sub-queries. I have tried changing some settings in mysql but none of them seem to have an effect. Any suggestions?

EDIT

CREATE TABLE `zzproductdata` (
 `zzproductdataid` int(11) NOT NULL AUTO_INCREMENT,
 `zzproductdataname` text NOT NULL,
 `disabled` int(11) NOT NULL,
 `datecreated` date NOT NULL,
 `masteronly` int(11) NOT NULL,
 `productid` int(11) NOT NULL,
 `isbn` text NOT NULL,
 `ean` text NOT NULL,
 `publishername` text NOT NULL,
 `imagethumbnail` text NOT NULL,
 `imagefull` text NOT NULL,
 `amazonproductid` text NOT NULL,
 `productdatasource` text NOT NULL,
 `datelastupdatedamazon` date NOT NULL,
 `datelastupdatedgoogle` date NOT NULL,
 `googleproductid` text NOT NULL,
 `publicationdate` date NOT NULL,
 `binding` text NOT NULL,
 `imagescollecteddate` date NOT NULL,
 `amazoncategory1` text NOT NULL,
 `amazoncategory2` text NOT NULL,
 `amazoncategory3` text NOT NULL,
 `datelastcheckedamazoncategory` date NOT NULL,
 `datelastupdatedopenlibrary` date NOT NULL,
 PRIMARY KEY (`zzproductdataid`)
)
ENGINE=InnoDB
AUTO_INCREMENT=1674296
DEFAULT CHARSET=utf

CREATE TABLE `zzdropshipstock` (
 `zzdropshipstockid` int(11) NOT NULL AUTO_INCREMENT,
 `zzdropshipstockname` text NOT NULL,
 `disabled` int(11) NOT NULL,
 `datecreated` date NOT NULL,
 `masteronly` int(11) NOT NULL,
 `zzdropshipsupplierid` int(11) NOT NULL,
 `isbn` varchar(13) NOT NULL,
 `quantity` double NOT NULL,
 `suppliercode` text NOT NULL,
 `supplierprice` double NOT NULL,
 `standardshipcost` double NOT NULL,
 `weightgram` double NOT NULL,
 `rrp` double NOT NULL,
 `productid` int(11) NOT NULL,
 `initialimportdate` date NOT NULL,
 `lastupdateddate` date NOT NULL,
 `changed` int(11) NOT NULL,
 `lastcheckedproductiddate` date NOT NULL,
 `lastcheckedproductinamazondate` date NOT NULL,
 `lastcheckedrawstockdata` date NOT NULL,
 `lastcheckedproductstockitemlive` date NOT NULL,
 `changedquantity` int(11) NOT NULL,
 PRIMARY KEY (`zzdropshipstockid`),
 KEY `isbn` (`isbn`)
)
ENGINE=InnoDB
AUTO_INCREMENT=7037817
DEFAULT CHARSET=latin1

Using EXPLAIN on the first query - I get this: Explain image

This is what I see in Terminal: Terminal image

I have tried adding DISTINCT but it had no effect. It slowed down the escalation of CPU usage, so it took almost 200 seconds to get to 100% CPU usage, but it did not run the query.

The first part of the query returns 1,300,000 records in 0.23 seconds. The subquery returns 118,000 records in 2.3 seconds. This is when you run the two parts separately. Combining the two into one query, should currently return about 15,000 records.

标签: mysql

解决方案


I'm not entirely sure how MySQL handle IN vs EXISTS, but broadly speaking you should preference EXISTS over IN as the EXISTS clause returns true on the first match, whereas IN may have to create the full set before it determines if there is a match - this could lead to high memory usage. I'm not sure though in the case of MySQL. I know this problem existed in MSSQL 2005 and possibly 2008, and I wouldn't be surprised if the same problem exists even in modern versions of MySQL, given it's not as polished as some enterprise grade RDMSs.

Spiel over, try this code to see if it improves things:

select *
from zzproductdata as pd
where amazonproductid <> ''
and amazoncategory1 =''
and disabled = 0
and exists (
    select *
    from zzdropshipstock as dss
    where quantity >= 10
    and pd.productid = dss.productid
)

推荐阅读