Thursday, October 31, 2019

Enable High Performance, Block Multi Queue-IO Scheduler under Ubuntu 16.04 Desktop

Compared to single-queue IO schedulers (e.g. cfq, deadline), multi-queue IO schedulers (e.g. bfq, kyber) can give a significant IO performance improvement. These newer IO schedulers have been built into the Linux kernel since 4.12, but are not enabled by default until 5.0. They use multiple IO queues (leveraging CPU cores) to provide a highly responsive system, and bfq in particular is designed with desktop responsiveness in mind. The architecture has been detailed here and here.

The steps to enable them on Ubuntu 16.04 are given below:

1. Add mq-deadline, bfq and kyber-iosched to /etc/modules, one per line, so that the modules are loaded at startup

2. Append scsi_mod.use_blk_mq=1 to the GRUB_CMDLINE_LINUX parameter in the /etc/default/grub file

3. Create /etc/udev/rules.d/60-scheduler.rules to assign bfq and kyber to block devices

e.g.:

# set bfq scheduler for non-rotating disks | SSD
ACTION=="add|change", KERNEL=="sd[a-z]", TEST!="queue/rotational", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"

# set kyber scheduler for rotating disks | HDD
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="kyber"

4. Update grub using “sudo update-grub2” and reboot

5. Your SSDs should now be using the bfq scheduler and your HDDs the kyber scheduler. Enjoy your highly responsive system!
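
One way to confirm the result after the reboot is to read the settings back from the running system. The sketch below is not part of the original steps; the helper name active_scheduler is made up for illustration. It relies on the kernel's convention of marking the active scheduler in square brackets in /sys/block/<dev>/queue/scheduler.

```shell
#!/bin/sh
# Hypothetical helper: extract the bracketed (active) entry from a
# scheduler line such as "mq-deadline kyber [bfq] none".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Step 2 check: was the blk-mq option passed on the kernel command line?
grep -o 'scsi_mod\.use_blk_mq=[^ ]*' /proc/cmdline 2>/dev/null \
    || echo "scsi_mod.use_blk_mq not found on the kernel command line"

# Step 3 check: print the active scheduler for every sd* disk present.
for f in /sys/block/sd*/queue/scheduler; do
    [ -r "$f" ] || continue              # glob matched nothing, skip
    dev=${f#/sys/block/}; dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

If a disk still reports a different scheduler, checking that the modules from step 1 are actually loaded (lsmod) is a reasonable first thing to look at.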


References:

https://www.omgubuntu.co.uk/2017/07/linux-kernel-4-12-released-bfq

https://www.thomas-krenn.com/en/wiki/Linux_Multi-Queue_Block_IO_Queueing_Mechanism_(blk-mq)

https://lwn.net/Articles/767987/

https://lwn.net/Articles/784267/

https://wiki.ubuntu.com/Kernel/Reference/IOSchedulers

https://www.stephenrlang.com/2018/01/io-scheduler-tuning/

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/7.2_release_notes/storage

https://www.hecticgeek.com/2016/09/supercharge-ubuntu-16-04-lts-xanmod-kernel/

https://www.cnx-software.com/2019/08/14/bfq-budget-fair-queuing-i-o-scheduler-improves-linux-systems-responsiveness-video/

https://unix.stackexchange.com/questions/375600/how-to-enable-and-use-the-bfq-scheduler

https://wiki.debian.org/SSDOptimization#Low-Latency_IO-Scheduler

Virtualization to outperform Bare Metal performance? Simulating a RAM Cache, RAID Controller in a Desktop System

In general, you can get near-native performance on Qemu-KVM by following the instructions specified in this article.

Now what if you could get performance levels that can outperform the host system?

Qemu-KVM has options that enable exactly this: through aggressive caching, they can give you better performance than the bare-metal system.

Note: This inherently carries the risk of data loss in case of a host failure or crash, and is hence not recommended for production workloads.

These tweaks can be applied to a POC/development VM, letting you complete your tasks much faster than before. To guard against data loss or OS corruption, it's always recommended that you snapshot your virtual disks before enabling this mode, so that in case of any issues you can revert to your previous working image.
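
For a qcow2 disk, one way to take and restore such a snapshot is qemu-img snapshot. This is a sketch only: it assumes the VM is shut down and the image is qcow2, and the disk path and snapshot name are placeholders that do not come from this post. The helper just prints the commands, so nothing is modified until you run them yourself.

```shell
#!/bin/sh
# Placeholder values, not from the original post.
DISK=/var/lib/libvirt/images/win7.qcow2
SNAP=before-io-tweaks

# Print the qemu-img commands for creating, listing and reverting to an
# internal snapshot. Run them only while the VM is powered off.
snapshot_commands() {
    disk=$1; snap=$2
    printf 'qemu-img snapshot -c %s %s\n' "$snap" "$disk"   # create
    printf 'qemu-img snapshot -l %s\n' "$disk"              # list
    printf 'qemu-img snapshot -a %s %s\n' "$snap" "$disk"   # revert (apply)
}

snapshot_commands "$DISK" "$SNAP"
```

A plain file copy of the disk image taken while the VM is off works as well, and is the simpler fallback if the image format does not support internal snapshots.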

Below are the timings we’ve measured with and without the virtual disk performance options. We’ve seen a 60-70% performance gain on average once the system is up and running.

Windows7 VM                   Normal (Seconds)   With IO Tweaks (Seconds)
System Reboots                57                 13
Opening Visual Studio 2013    8                  3
System Shutdown               21                 9
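
The gain for each row can be recomputed from the timings above as (normal - tweaked) / normal. A small sketch using integer shell arithmetic, so results are truncated to whole percent; the function name is only illustrative.

```shell
#!/bin/sh
# Percent reduction in elapsed time, truncated to an integer.
# $1 = normal time in seconds, $2 = time with IO tweaks in seconds.
reduction_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

reduction_pct 57 13   # System reboot
reduction_pct 8 3     # Opening Visual Studio 2013
reduction_pct 21 9    # System shutdown
```

This prints 77, 62 and 57, which averages to roughly 65% and sits inside the 60-70% range quoted above.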

 

The tests were completed on a Windows7 VM with the below configuration:

This is a great option for development VMs and POC virtual machines, a typical use case on desktops and laptops. The settings relate to the IO mode of the virtual disks and make extensive use of caching: most writes are kept in cache until enough data has accumulated, and are then written to disk in one go. This is analogous to a RAID controller with a RAM cache, and KVM brings this feature to a normal desktop/laptop without any major effort.
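
The exact settings are not named in the text here. In QEMU, the behaviour described matches the cache=writeback and cache=unsafe drive cache modes, so the sketch below assumes one of those. The disk path, memory size and helper name are placeholders. cache=unsafe additionally ignores flush requests from the guest, which is faster but increases the risk of data loss on a host crash.

```shell
#!/bin/sh
# Build a QEMU -drive argument with a host-side write cache.
# Arguments: image path, image format (qcow2|raw), cache mode.
# aio=threads is used because aio=native requires direct IO (cache=none).
drive_arg() {
    case $3 in
        writeback|unsafe) ;;
        *) echo "unsupported cache mode for this sketch: $3" >&2; return 1 ;;
    esac
    printf 'file=%s,format=%s,cache=%s,aio=threads\n' "$1" "$2" "$3"
}

# Placeholder path; print the resulting command line instead of running it.
echo qemu-system-x86_64 -enable-kvm -m 4096 \
    -drive "$(drive_arg /var/lib/libvirt/images/win7.qcow2 qcow2 unsafe)"
```

With libvirt, the same choice is expressed through the cache and io attributes of the disk's <driver> element in the domain XML.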

 


During initial startup, performance remains on par; however, it steadily improves once the system is up and running and more data has been cached on the host over time.

This level of performance may not be achievable on a Windows desktop running on bare metal, as it periodically synchronizes IO with the underlying disks, trading performance for higher data reliability.

You can get an additional performance boost by using the “raw” storage format instead of “qcow2”.
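
Converting an existing image is a single qemu-img convert call. A sketch with placeholder file names, meant to be run with the VM shut down; it prints the command rather than executing it. Note that raw images have no internal snapshot support, so keep the original qcow2 file or a plain copy as the fallback.

```shell
#!/bin/sh
# Print the qemu-img command that converts a qcow2 image to raw.
# -p shows progress; source and target names are placeholders.
convert_command() {
    printf 'qemu-img convert -p -f qcow2 -O raw %s %s\n' "$1" "$2"
}

convert_command win7.qcow2 win7.raw
```

After the conversion, the VM definition has to point at the new file and declare its format as raw.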

To know more, read through the below posts:

QEMU Disk IO performance comparison: Native or threads?

VIRTUALIZATION TUNING AND OPTIMIZATION GUIDE

Virtual IO CACHING

How to improve Windows performance when running inside KVM

What is the memory module on a RAID card needed for?