So, I got persuaded to switch from a “server that is going to do everything” to a “compute server + storage server” setup.
The two are connected via a DAC on Intel X520 network cards.
Compute is 10.0.0.1, storage is 10.255.255.254, and I left the usable hosts in the middle for future expansion.
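As a quick sanity check on that addressing, here is a minimal sketch using Python's standard ipaddress module. It assumes both hosts sit in a single 10.0.0.0/8 subnet; the netmask is not stated in the post, so that part is a guess.

```python
import ipaddress

# Assumption: one 10.0.0.0/8 subnet (the post does not state the netmask).
net = ipaddress.ip_network("10.0.0.0/8")

first_host = net.network_address + 1    # first usable address
last_host = net.broadcast_address - 1   # last usable address
usable = net.num_addresses - 2          # minus network and broadcast addresses

print(first_host, last_host, usable)
```

Under that assumption the two servers sit on the first and last usable addresses of the range, with everything in between free.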
Before I start to use it, I’m wondering if I chose the right protocols to share data between them.
I set up NFS and iSCSI.
With iSCSI I create an image, share that image with the compute server, format it as btrfs, and use it as a native drive. The files are not accessible anywhere else.
With NFS I just mount the share, and the files can also be accessed from another computer.
Speed:
I tried to time how long it takes to fill a dummy file with zeroes.
/iscsi# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB, 1.9 GiB) copied, 0.88393 s, 2.3 GB/s
real 0m2.796s
user 0m0.051s
sys 0m0.915s
/nfs# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB, 1.9 GiB) copied, 2.41414 s, 848 MB/s
real 0m3.539s
user 0m0.038s
sys 0m1.453s
/sata-smr-wd-green-drive-for-fun# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"
250000+0 records in
250000+0 records out
2048000000 bytes (2.0 GB, 1.9 GiB) copied, 10.1339 s, 202 MB/s
real 0m46.885s
user 0m0.132s
sys 0m2.423s
What I see from these results:
The slow SATA drive goes at 1.6 gigabit/s, but then for some reason the computer needs a long time to acknowledge the operation.
NFS transferred it at 6.8 gigabit/s, which is what I expected from an NVMe array. The same command on the storage server gives a similar speed.
iSCSI transfers at 18.4 gigabit/s, which is not possible with my drives or the 10 Gbit link. It is probably using some native file system trickery to detect “it’s just a file full of zeroes, just tell the user it’s done”.
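One thing worth checking here: dd prints its figure when dd itself exits, before the trailing sync has finished, so on a local filesystem part of the data may still be sitting in the page cache at that point. Dividing the byte count by the "real" time instead gives a rougher but probably more honest number. A small sketch of that arithmetic, using only the timings from the output above (the function name is made up for illustration):

```python
def gbit_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into decimal gigabits per second."""
    return num_bytes * 8 / seconds / 1e9

SIZE = 2_048_000_000  # bytes written by dd in every run above

# (seconds reported by dd, wall-clock "real" seconds including sync)
runs = {
    "iscsi": (0.88393, 2.796),
    "nfs": (2.41414, 3.539),
    "sata": (10.1339, 46.885),
}

for name, (dd_s, real_s) in runs.items():
    print(
        f"{name}: dd alone {gbit_per_s(SIZE, dd_s):.2f} Gbit/s, "
        f"including sync {gbit_per_s(SIZE, real_s):.2f} Gbit/s"
    )
```

Including the sync, the three runs come out at roughly 5.9, 4.6 and 0.35 gigabit/s, which puts iSCSI and NFS much closer together and both under the link speed. The "real" time also includes shell startup and whatever else sync had to flush, so treat these as ballpark figures.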
The biggest advantage of NFS is that I can share a whole directory and get direct access. Also, sharing another disk image via iSCSI requires a service restart, which means I have to take down the compute server.
But with iSCSI I am the owner of the disk, so I can do whatever I want. I don’t need to worry about permissions: I am root and can chown everything.
So… after this long introduction and explanation, what protocol would you use for…:
- /var/lib/mysql - a database. Inside a disk image shared via iSCSI, or via NFS?
- Virtual machine images. Copy them inside another image that’s then shared via iSCSI? Maybe NFS is much better for this case. Otherwise with iSCSI I would have a single giant disk image that contains other disk images…
- Lots of small files, like WordPress. Maybe NFS would add too much overhead? But it would be much easier to back up if it was an NFS share instead of a disk image.
I haven’t ever run an iSCSI setup, but…
I don’t know what your application is, but if you’re planning on running a MySQL database on this, I can imagine that a throughput test isn’t going to be representative of your performance, since latency may matter a lot and throughput not so much. You may want to specifically test that.
ponders
I would guess that iSCSI probably exposes write barriers. That is, btrfs can say “all writes prior to this point must become durable before writes subsequent to this point”, without actually requiring that any data is committed to the disk at the time that the write barrier is issued.
But I believe that the Linux file API has a more-limited set of ways in which it can provide ordering without durability. There’s no fwritebarrier(), just fsync(), and that forces a change to become durable. Depending upon how MySQL works, that might have a significant impact on performance.
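If fsync() cost is the worry, it can be measured directly instead of inferred from a throughput run. Below is a minimal sketch, not a proper database benchmark: it appends small blocks to a scratch file and calls fsync after each one, recording how long every write-plus-fsync takes. The function names, the block size and the write count are all arbitrary choices for illustration.

```python
import os
import statistics
import tempfile
import time
from typing import Dict, List


def fsync_latency(directory: str, writes: int = 200, block_size: int = 8192) -> List[float]:
    """Append `writes` blocks to a scratch file in `directory`, fsync after each.

    Returns the per-write latency in seconds.
    """
    # Random data, so compression or zero-detection cannot flatter the result.
    block = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # block until the write is reported durable
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # remove the scratch file again
    return latencies


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Median and worst-case latency in milliseconds."""
    return {
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }
```

Running something like summarize(fsync_latency("/iscsi")) and the same for "/nfs" (assuming those are the mount points shown in the prompts above) would give a per-commit latency to compare, which is likely closer to what a database sees than a single large sequential write.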
Also, NFSv3, which I assume you are using, has behavior around locking and caching that differs from NFSv4 and I don’t know for sure how it will interact with something like MySQL, which may care a lot about precise write ordering behavior.
Disk images will also rely on write ordering to avoid corruption on power loss.
googles
Yeah.
https://dev.mysql.com/doc/refman/8.2/en/disk-issues.html
That’s kind of hand-wavy, but it does reinforce my concern about sticking a MySQL database on the thing.
I don’t have an answer for you as to which to use – it’s been a while since I’ve worked on network filesystem stuff, and I’m kinda shaking loose rusty bits trying to recall this – but in general I would be a little concerned about data integrity of both disk images and MySQL databases stored over a network. One can build a system that does it correctly, but I would try to do what I can to research potential issues there.
I would also probably test your actual workload if you’re concerned about performance, because it may differ a lot from what a simple throughput test might suggest for those uses.
Yes, after more thought, the database is much better off on iSCSI. I can just create a 10 GB image and share that, and get backups from daily ZFS snapshots.