Lies, damned lies, and statistics - a practical demonstration of how table statistics make or break performance
I recently discovered a process that I didn't recognise which had been running for nearly an hour. With my current employer, the fact that it was a job I didn't recognise wasn't the bit that fazed me (I've got used to that kind of thing happening here); the fact that it had been running for so long was what worried me.
I checked with the dev team responsible for it, and they told me that it "ran in 15m in development". The development environment is a vastly underpowered VM compared to the physical production server, so the fact that this run had already taken an hour was cause for concern.
I checked that the process was actually running and doing work, that it wasn't blocking anything else, and that it looked like it would (probably, eventually) finish.
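If you want to run the same sort of sanity check, something along these lines works on Oracle (the :sid bind is a placeholder for the session ID of the long-running job, so treat the specifics as an assumption rather than a prescription):

    -- Is the session actually doing work, and is anything blocking it?
    SELECT sid, serial#, status, event, blocking_session, seconds_in_wait
    FROM   v$session
    WHERE  sid = :sid;

    -- For operations instrumented in v$session_longops, a rough progress estimate
    SELECT opname, sofar, totalwork, time_remaining
    FROM   v$session_longops
    WHERE  sid = :sid
    AND    sofar < totalwork;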
I checked the execution plan (ugh, it was nasty) and compared it to the one on the development server. The two were completely different. This prompted me to take a look at the table statistics, and there were none! It's no surprise really that the process wasn't running efficiently.
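For anyone wanting to repeat those checks, they look roughly like this (the sql_id substitution variable and the table name MY_NEW_TABLE are hypothetical; substitute your own):

    -- Plan actually being used by the running statement
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id'));

    -- Does the table have statistics at all?
    SELECT table_name, num_rows, last_analyzed
    FROM   user_tables
    WHERE  table_name = 'MY_NEW_TABLE';

A NULL last_analyzed is the giveaway: the optimiser has never had any statistics for the table to work with.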
So there we have the root of the problem. We implemented a new process, loading new data into a new table, and didn't gather any statistics on that data, leaving the optimiser blind and forced to make its default assumptions about the table's contents, assumptions which were clearly wrong.
After some discussion with the dev team, I was able to kill off that problem session, gather stats on that table with DBMS_STATS.GATHER_TABLE_STATS(), and then re-run the process.
It finished in minutes.
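For completeness, the stats gather itself was nothing exotic, just a call along these lines (the schema and table names here are made up for illustration):

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname => 'APP_OWNER',     -- owning schema (hypothetical)
        tabname => 'MY_NEW_TABLE',  -- the newly loaded table (hypothetical)
        cascade => TRUE             -- gather index statistics as well
      );
    END;
    /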
The take-home here should be that if you create a new table or load a new data set into a table, you should gather statistics on the table. It won't take long (probably), and it will help prevent poor performance.