After replacing my 4x 1000 MBit link aggregation with ASUS XG-C100C network adapters (10 GBit via Cat5e cables :D), I was astonished at how slow data transfers were. I ran into several pitfalls while debugging; here are the most important points:
Before you debug your network, make sure you solve the local problems first!
Linux GPL-only exports and ZFS CDDL module
For license reasons, the kernel FPU functions are no longer exported to modules whose license is not GPL. This led to an extreme performance drop in ZFS encryption, etc. (a factor of 6 slower on some test equipment). There are patches available for private use which violate the kernel's GPL license, which is okay if not distributed and for research, you know 😉 However, I used an older kernel and was not hit by this license quarrel at all, so this was not the cause of my bad transfer speed.
You can check this as follows.
First, see whether the FPU functions are global symbols:
grep fpu_begin /proc/kallsyms
0000000000000000 T __kernel_fpu_begin
0000000000000000 T kernel_fpu_begin
0000000000000000 r __ksymtab___kernel_fpu_begin
0000000000000000 r __ksymtab_kernel_fpu_begin
0000000000000000 r __kstrtab_kernel_fpu_begin
0000000000000000 r __kstrtab___kernel_fpu_begin
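A capital T means the global symbol is there. Then check which implementations ZFS offers for Fletcher-4 checksums and for AES; the entry in brackets is the selected one: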
cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl
[fastest] scalar superscalar superscalar4 sse2 ssse3
cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 aesni
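The two checks above can be wrapped into small helpers. This is only a sketch that mirrors the manual steps; the function names symbol_type and selected_impl are made up for illustration, and the kallsyms check only reports the symbol type, it does not tell you under which license the symbol is exported.

```shell
#!/bin/sh
# Print the type letter of a symbol from a kallsyms-style listing.
# The second argument defaults to /proc/kallsyms.
symbol_type() {
    sym="$1"
    file="${2:-/proc/kallsyms}"
    # kallsyms columns: address, type, name, optional [module]
    awk -v s="$sym" '$3 == s { print $2; exit }' "$file"
}

# Read a ZFS *_impl parameter line on stdin and print the bracketed
# (currently selected) entry.
selected_impl() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

For example, "symbol_type kernel_fpu_begin" prints T if the symbol is a global text symbol, and "selected_impl < /sys/module/icp/parameters/icp_aes_impl" prints the selected AES implementation.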
SSH is slow, don’t use it
If you send an already encrypted stream, what sense does it make to encrypt it again with SSH? None! And SSH is quite slow: I am talking about 90 MB/s on my machine with zfs send -w | ssh target zfs receive … It’s better to use netcat.
netcat is not netcat
There are two versions of netcat: gnu-netcat and openbsd-netcat. The latter supports TCP windowing. Without this feature, I got around 88 MB/s, so no better than with SSH. With openbsd-netcat, however, I got 194 MB/s, which is approximately the read speed of the spinning HDD I read from. (Yes, ZFS on a single disk… I know, I know.)
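A raw send over netcat could look roughly like the sketch below. It assumes openbsd-netcat on both machines; the function names are made up for illustration, and the dataset, snapshot, host and port are placeholders you pass in yourself. The -N flag (close the connection after EOF on input) is an openbsd-netcat option; other builds may need a different flag or none.

```shell
#!/bin/sh
# Receiving side (start this first): listen on a TCP port and feed the
# stream into zfs receive. openbsd-netcat syntax; gnu-netcat wants
# "-l -p PORT" instead of "-l PORT".
receive_snapshot() {
    port="$1"; dataset="$2"
    nc -l "$port" | zfs receive "$dataset"
}

# Sending side: a raw send (-w) keeps an encrypted dataset encrypted on
# the wire, so no SSH layer is needed for confidentiality of the data.
send_snapshot() {
    snapshot="$1"; host="$2"; port="$3"
    zfs send -w "$snapshot" | nc -N "$host" "$port"
}
```

On the target you would run something like "receive_snapshot 4444 tank/backup", then on the source "send_snapshot pool/data@snap target 4444" (both dataset names are hypothetical). Keep in mind that netcat does no authentication at all, so only do this on a network you trust.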
A very handy debugging tool is ‘yes’:
yes | netcat target port
Or, the classic way, with ‘dd’:
netcat -l -p 4444 | dd of=/dev/null
dd if=/dev/zero | netcat localhost 4444
2772497408 bytes (2,8 GB, 2,6 GiB) copied, 5,52469 s, 502 MB/s
So as you can see, I cannot exceed about 4 GBit/s even locally (502 MB/s × 8 ≈ 4 GBit/s), so I will never reach 10 GBit/s over the network. I would have to buy faster hardware to do so. As long as the disks are the limiting factor, I am fine with it.
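To compare the numbers above with the link speed, it helps to convert MB/s into GBit/s. A tiny helper, as a sketch (the function name mbps_to_gbit is made up; it assumes dd's decimal MB, i.e. 10^6 bytes):

```shell
#!/bin/sh
# Convert a throughput in MB/s (10^6 bytes per second, as printed by dd)
# to GBit/s (10^9 bits per second). LC_ALL=C forces a decimal point
# even on systems with a locale that uses a decimal comma.
mbps_to_gbit() {
    LC_ALL=C awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb * 8 / 1000 }'
}

mbps_to_gbit 502   # local dd | netcat loopback test   -> 4.02
mbps_to_gbit 194   # openbsd-netcat from the single HDD -> 1.55
```

A 10 GBit/s link corresponds to 1250 MB/s, so both the local pipe and the disk are far below what the cable could carry.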