Background: While testing how one of my home machines would perform as a server for large files (~100MB), I noticed something strange. When files are created over the local network, the fastest drive in the system is the one with the slowest specifications! This is in stark contrast to local file operations, where the drive with the best specs does indeed perform best.
System specs
Drive information, as reported by hdparm -i:
/dev/hdg:
Model=Maxtor 5T040H4, FwRev=TAH71DP0, SerialNo=T4HED4AC
Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=80043264
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
AdvancedPM=yes: disabled (255)
Drive Supports : ATA/ATAPI-6 T13 1410D revision 0 : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6
/dev/hdh:
Model=IBM-DTTA-351680, FwRev=T51OA73A, SerialNo=WJ0WK147853
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=34
BuffType=DualPortCache, BuffSize=462kB, MaxMultSect=16, MultSect=16
CurCHS=16383/16/63, CurSects=-66060037, LBA=yes, LBAsects=33022080
IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2
AdvancedPM=no
Drive Supports : ATA/ATAPI-4 T13 1153D revision 17 : ATA-1 ATA-2 ATA-3 ATA-4
Local benchmarks: To quickly demonstrate that the Maxtor drive outperforms the IBM drive for local file operations, here are some Bonnie++ benchmarks. I'll include some other local benchmarks below.
IBM drive (reiserfs)
--------------------
Version  1.01       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xxxxxxxxx.xxxxxx 1G  8471  48  8471   6  3207   1  9114  46  9641   3  80.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 11214  61 +++++ +++ 22693  99 17564  99 +++++ +++ 18709 100
xxxxxxxxx.xxxxxxxxx.xxx,1G,8471,48,8471,6,3207,1,9114,46,9641,3,80.7,0,16,11214,61,+++++,+++,22693,99,17564,99,+++++,+++,18709,100
Maxtor drive (reiserfs)
-----------------------
Version  1.01       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xxxxxxxxx.xxxxxx 1G 17020  98 32809  26 11831   7 14672  77 32479  12 115.1   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16775 100 +++++ +++ 21890 100 16195  99 +++++ +++ 17937  99
xxxxxxxxx.xxxxxxxxx.xxx,1G,17020,98,32809,26,11831,7,14672,77,32479,12,115.1,0,16,16775,100,+++++,+++,21890,100,16195,99,+++++,+++,17937,99
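To put a number on "outperforms", the two CSV summary lines can be compared directly. The following is a minimal sketch, not part of the original benchmarking; the field names are my own labels for the first 14 CSV columns (matched by position against the table headers), and `parse_bonnie_csv` and `speedup` are hypothetical helpers.

```python
# Labels for the first 14 columns of the Bonnie++ 1.01 CSV line.
# These names are illustrative, chosen to mirror the table headers.
FIELDS = [
    "machine", "size",
    "out_char_kps", "out_char_cpu",
    "out_block_kps", "out_block_cpu",
    "rewrite_kps", "rewrite_cpu",
    "in_char_kps", "in_char_cpu",
    "in_block_kps", "in_block_cpu",
    "seeks_ps", "seeks_cpu",
]

# CSV lines copied verbatim from the Bonnie++ runs.
IBM = ("xxxxxxxxx.xxxxxxxxx.xxx,1G,8471,48,8471,6,3207,1,9114,46,9641,3,"
       "80.7,0,16,11214,61,+++++,+++,22693,99,17564,99,+++++,+++,18709,100")
MAXTOR = ("xxxxxxxxx.xxxxxxxxx.xxx,1G,17020,98,32809,26,11831,7,14672,77,"
          "32479,12,115.1,0,16,16775,100,+++++,+++,21890,100,16195,99,"
          "+++++,+++,17937,99")


def parse_bonnie_csv(line):
    """Parse the throughput half of a CSV line into a dict of numbers."""
    values = line.strip().split(",")[:len(FIELDS)]
    row = dict(zip(FIELDS, values))
    for key in FIELDS[2:]:          # everything after machine and size
        row[key] = float(row[key])
    return row


def speedup(fast, slow, keys=("out_block_kps", "rewrite_kps",
                              "in_block_kps", "seeks_ps")):
    """Ratio fast/slow for each selected metric."""
    return {key: fast[key] / slow[key] for key in keys}


ibm = parse_bonnie_csv(IBM)
maxtor = parse_bonnie_csv(MAXTOR)
for metric, ratio in speedup(maxtor, ibm).items():
    print(f"{metric}: Maxtor is {ratio:.2f}x the IBM drive")
```

On these figures the Maxtor's block writes come out at roughly 3.9 times the IBM's, and block reads at roughly 3.4 times. The file-creation columns are skipped because some of them contain the `+++++` placeholder rather than a number.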
The following benchmarks were obtained by querying /proc/stat at 0.5 second intervals for the duration of the test...
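For anyone wanting to reproduce this kind of sampling, here is a rough sketch of polling /proc/stat every 0.5 seconds. It is not the script used for these tests. It only looks at the aggregate `cpu` line, which is an assumption on my part (the original graphs may well have used other counters, such as disk or interrupt statistics), and the function names are hypothetical.

```python
import time


def read_cpu_jiffies(text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text.

    At most the first eight counters (user through steal) are kept so that
    guest time on newer kernels is not counted twice. Older kernels expose
    only four counters: user, nice, system, idle.
    """
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def busy_percent(before, after):
    """Percent of time not spent idle between two samples.

    The idle counter is the fourth field; everything else (including
    iowait, where present) is treated as busy.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * (total - deltas[3]) / total


def sample(interval=0.5, count=10, path="/proc/stat"):
    """Yield one busy-percentage figure per interval."""
    with open(path) as stat_file:
        previous = read_cpu_jiffies(stat_file.read())
    for _ in range(count):
        time.sleep(interval)
        with open(path) as stat_file:
            current = read_cpu_jiffies(stat_file.read())
        yield busy_percent(previous, current)
        previous = current
```

On a Linux machine, `for pct in sample(): print(pct)` would print ten readings, one every half second, while a transfer runs in another terminal.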
Test 1: scp a 150MB file from another machine on the network to this machine. Cipher was blowfish. For this particular test, the connection was initiated by the remote machine (i.e., similar to an ftp 'put' operation). I repeated the test by initiating the connection from this machine (similar to ftp 'get') with similar results.
[Test 1 graphs: four images, not available]
Test 2: scp a 150MB file from/to localhost. If the test involved scp'ing to the Maxtor drive, the source file was stored on the IBM drive, and vice versa, so that the same drive is never used for both reading and writing. These results suggest that the slowness in Test #1 is somehow due to network activity.
[Test 2 graphs: two images, not available]
Test 3: scp a 150MB file from/to localhost while pingflooding the machine. This test generates lots of NIC interrupts, so if the slowness is due to network activity in general, it should show up here. It doesn't. The pingflood generates about 2.4MB/sec of network traffic and results in a CPU load of about 40% on this machine. From these benchmarks, we see that the scp operation is CPU-bound.
[Test 3 graphs: four images, not available]
Test 4: ftp the 150MB file. If the slowness is somehow related to SSH, then FTP should not have the problem. It turns out that I see the same slowness with every network app I tried (scp, ftp, even Samba). Note the similarity between these graphs and those in Test #1.
[Test 4 graphs: four images, not available]
Conclusions: The tests indicate that the Maxtor drive, while being the fastest drive for local operations, is for some reason very slow for network operations. Why?
Theory #1: A reasonable assumption might be that the problem is related to network activity in general (perhaps IRQ conflicts or somesuch). However, test #3 seems to suggest this isn't the case. In that test, we have very robust network activity yet the Maxtor drive does not experience the same sort of slowdown seen in the other tests.
Theory #2: Another reasonable guess might be that the Maxtor drive is not a good candidate for journalled filesystems. Perhaps it copes poorly with updating the journal and writing to the data file at the same time, whereas the IBM drive handles this well. But if this were the case, wouldn't local benchmarks suffer similarly? Tests #2 and #3 show that they do not.
I think I posted to the L-K mailing list that these drives are each "master" on their own controller. This was incorrect. They share a controller with the Maxtor drive being designated the "master."
Theory #3: The IBM and Maxtor drives are somehow stomping on each other, but only during network operations (??). I repeated the tests with the IBM drive completely disconnected. No performance differences were noted.
I'm open to ideas/explanations. Anybody have any?
Update: It was suggested that I knock the Maxtor drive down to UDMA2 and retry the tests. Bingo. The drive now performs on par with the IBM drive. Now, I'm willing to accept that you can't/shouldn't mix UDMA modes in a master-slave setup. But I'm very curious why I didn't see this kind of performance problem locally and, more importantly, why the problem persisted after I disconnected the IBM drive (leaving only the Maxtor). What's the difference, as far as filesystem activity is concerned, between test #1 and test #2? Could network packet size be playing a role here?
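The update doesn't say how the transfer mode was lowered. One common way is hdparm's `-X` option, whose numeric argument is documented in the hdparm man page as 8 + n for PIO mode n, 32 + n for multiword DMA mode n, and 64 + n for UltraDMA mode n. The sketch below only builds the command line and does not run it; the helper names are hypothetical, and the man page itself warns that `-X` should be used with great care.

```python
# Offsets for hdparm's numeric -X argument, per the hdparm man page.
XFER_BASE = {"pio": 8, "mdma": 32, "udma": 64}

# Nominal interface ceilings in MB/s for the two modes discussed here.
UDMA_MAX_MBPS = {2: 33.3, 5: 100.0}


def hdparm_xfer_value(kind, mode):
    """Numeric -X value for a transfer mode, e.g. ('udma', 2)."""
    return XFER_BASE[kind] + mode


def hdparm_command(device, kind, mode):
    """Build (but do not execute) the hdparm invocation as an argv list."""
    return ["hdparm", "-X%d" % hdparm_xfer_value(kind, mode), device]


# The Maxtor is /dev/hdg in the hdparm -i listing; UDMA2 is the mode the
# IBM drive was already using.
print(" ".join(hdparm_command("/dev/hdg", "udma", 2)))
print("UDMA2 ceiling: %.1f MB/s, UDMA5 ceiling: %.1f MB/s"
      % (UDMA_MAX_MBPS[2], UDMA_MAX_MBPS[5]))
```

Running the resulting command requires root, and `hdparm -i` afterwards shows which mode is active (the one marked with `*`).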