Beruflich Dokumente
Kultur Dokumente
Most customers use UFS buffered filesystems and ZFS already performs better than UFS buffered!. Since want to test performance, and we want ZFS to be super-fast, we decided to compare ZFS with UFS direction. We noticed that UFS Direction performs better than what we get with with ZFS out-of-the-box. With ZFS, not only was the throughput much lower, but we used more twice the amount of CPU per transaction, and we are doing 2x times the IO. The disks are also more heavily utilized. We noticed that we were not only reading in more data, but we were also doing more IO operations that what is needed. A little bit of dtracing quickly revealed that these reads were originating from the write code path! More dtracing showed that these are level 0 blocks, and are being read-in for the read-modify-write cycle. This lead us to the FIRST recommendation
Match the database block size (db_block_size) with ZFS record size (recordsize).
A look at the DBMS statistics showed that "log file sync" was one of the biggest wait events. Since the log files were in the same filesystem as the data, we noticed higher latency for log file writes. We then created a different filesystem (in the same pool), but set the record size to 128K as log writes are typically large. We noticed a slight improvement in our numbers, but not the dramatic improvement we we wanted to achieve. We then created a separate pool and used that pool for the database log files. We got quite a big boost in performance. This performance boost could be attributed to the decrease in the write latency. Latency of database log writes is critical for OLTP performance. When we used one pool, the extra IOs to the disks increased the latency of the database log writes, and thus impacted performance. Moving the logs to a dedicated pool improved the latency of the writes, giving a performance boost. This leads us to our SECOND recommendation
If you have a write heavy workload, you are better off by separating the log files on a separate pool
Looking at the extra IO being generated by ZFS, we noticed that the reads from disk were 64K in size. This was puzzling as the ZFS recordsize is 8K. More dtracing, and we figured out that the vdev_cache (or software track buffer) reads in quite a bit more than what we request. The default size of the read is 64k (8x more than what we request). Not surprisingly, the ZFS team is aware of this, and there are quite a few change requests (CR) on this issue 4933977: vdev_cache could be smarter about prefetching 6437054: vdev_cache: wise up or die 6457709: vdev_knob values should be determined dynamically Tuning the vdev_cache to read in only 8K at a time decreased the amount of extra IO by a big factor, and more importantly improved the latency of the reads too. This leads to our THIRD recommendation
Let ZFS auto-tune. It knows best. In cases were tuning helps, expect ZFS to incorporate that fix in future releases of ZFS
In conclusion, you can improve out-of-the-box performance of databases with ZFS by doing simple things. We have demonstrated that it is possible to run high-throughput workloads with current release of ZFS. We have also shown that it is quite possible to get huge improvements in performance for databases in future versions of ZFS. Given the fact that ZFS is around a year old, this is amazing!!