Subject: /sbin/badblocks: badblocks -c <huge-number> will not speed up completion time
Date: Fri, 06 Nov 2009 11:31:25 -0500
Package: e2fsprogs
Version: 1.41.3-1
Severity: wishlist
File: /sbin/badblocks
I use 'badblocks' to test new disks before I store data on them. The
last time I used it, I was setting up a raid1 array on two 500GB disks.
The process took something like 13 hours. (I had an unreasonable
expectation that it would take much less time! ;)
At that time, a couple of years ago, I looked at the man page and
noticed the -c option... and made a mental note to experiment with it
next time to see if it could speed up the process.
This week, I bought two new 1.5TB drives. Being a noob, and making
false assumptions about how 'badblocks' works (and having made no
attempt to look at the source code), I opened up 2 shell windows in
emacs and ran these commands in separate shells:
badblocks -c 524288 -v -w /dev/sdc
badblocks -c 524288 -v -w /dev/sdd
The intention was to let each instance of 'badblocks' write and read
512MB of data in bursts, hopefully taking advantage of economies of
scale to speed up completion time. Since my cheap little "server"
machine here has 2GB of RAM, I thought I might as well put that RAM to
good use!
Everything started out fine. The 'top' command showed that each
instance of 'badblocks' had acquired 512 MB of RAM and was percolating
along, barely using any CPU resources at all. So I went to bed.
When I got up, 'top' said this:
top - 08:55:22 up 8:13, 2 users, load average: 3.00, 2.85, 2.76
Tasks: 84 total, 1 running, 83 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 1.7%sy, 0.0%ni, 50.6%id, 46.5%wa, 0.1%hi, 0.0%si, 0.0%st
Mem: 1927304k total, 1912440k used, 14864k free, 340k buffers
Swap: 1951888k total, 872560k used, 1079328k free, 2620k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2563 root      20   0 1035m 1.0g   56 D   26 54.4 15:16.01 badblocks
 2562 root      20   0 1035m 743m   56 D    2 39.5  8:42.62 badblocks
[...]
I was running out of RAM and hitting the swap file.
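In hindsight, the arithmetic explains it (assuming, as those numbers
suggest, that -w keeps both a write-pattern buffer and a read-back
buffer of (-c blocks) x (-b bytes) each):

    per buffer:    524288 blocks x 1024 bytes = 512 MiB
    per instance:  2 buffers = 1 GiB  (matching the 1035m VIRT above)
    two instances: 2 GiB -- all the RAM in the machine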
I admit disappointment, but I was only testing my idea about
'badblocks -c' to see what would happen. Each instance had finished
with the first test pattern, so I decided that the disks were probably
good enough and I quit running 'badblocks'.
Instead of reporting an (alleged) bug in 'badblocks' -- which I doubt
this is, since I'm probably abusing the program for purposes for which
it was not designed -- I would like to request either:
- a README file discussion of appropriate expectations of run time for
badblocks on modern, very large hard disks; and why abusing "-b"
and/or "-c" is not likely to lessen the pain of waiting.
- or, at least a blurb in the 'badblocks' man page (and "--help" text)
in the "-w" section pointing out that abusing "-c" won't help speed
things up.
I found a couple of archived Debian BTS reports touching on this:
- #7173 was closed without comment
- #232240 has a good discussion of why "-c" cannot be expected to
improve performance, and that IDE drives on the same channel make
running 2 instances of 'badblocks' at once unhelpful. (SATA
overcomes the latter issue, of course.)
It would help the uninitiated (such as myself) if some of that #232240
discussion appeared where an interested user might expect to find it,
instead of buried in archived BTS bug reports.
Thanks,
Dave W.
-- System Information:
Debian Release: 5.0.3
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.31-0git.090921.fileserver.uvesafb (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages e2fsprogs depends on:
ii e2fslibs 1.41.3-1 ext2 filesystem libraries
ii libblkid1 1.41.3-1 block device id library
ii libc6 2.7-18 GNU C Library: Shared libraries
ii libcomerr2 1.41.3-1 common error description library
ii libss2 1.41.3-1 command-line interface parsing lib
ii libuuid1 1.41.3-1 universally unique id library
e2fsprogs recommends no packages.
Versions of packages e2fsprogs suggests:
pn e2fsck-static <none> (no description available)
pn gpart <none> (no description available)
pn parted <none> (no description available)
-- no debconf information
From: Ariel
Date: 2011

# time badblocks -w -t 0x00 /dev/sdb3

real    1m58.691s
user    0m17.671s
sys     0m6.318s

# time badblocks -b 4096 -w -t 0x00 /dev/sdb3

real    1m47.225s
user    0m16.859s
sys     0m3.272s

# time badblocks -c 256 -w -t 0x00 /dev/sdb3

real    1m50.223s
user    0m16.102s
sys     0m3.283s
Using a larger block size or a larger number of blocks speeds up the
test by about 10%. Going larger than that had little effect on time,
but reduced CPU usage a bit.
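An invocation combining both options (hypothetical -- I did not time
this variant) would look like:

# time badblocks -b 4096 -c 1024 -w -t 0x00 /dev/sdb3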
-Ariel
From: Matt Taggart <[email protected]>
Date: Mon, 10 Jul 2017 15:19:28 -0700
#554794 concerns the time it takes to run badblocks for any particular
value of the -c option (count of blocks to do at once).
At the time (2009) it wasn't clear whether larger values of -c improved
runtime, although one user (in 2011) reported a roughly 10% improvement.
The current -c (block count) default is 64 blocks.
The current -b (block size) default is 1024 bytes.
AFAICT the last time they were increased was 2004 (#232240, -c from 16 to
64).
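In other words (spelling out those defaults; /dev/sdX is a placeholder):

    badblocks -sv /dev/sdX
    # behaves the same as
    badblocks -b 1024 -c 64 -sv /dev/sdX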
A related bug (#764636), hit when running badblocks on a 14 TB device,
was solved by switching to -b 4096 (apparently because badblocks block
numbers must fit in 32 bits: with the default 1024-byte blocks it can't
address anything past 4 TiB, and a larger block size keeps the block
count in range).
Given the device size increases and RAM/CPU improvements since all these
things occurred, is there any value to increasing the defaults? Does anyone
have any data? If not then what tests would be valuable?
I often run many badblocks instances at once on separate SATA devices
(so no bus contention); what are the optimal settings for that case?
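For concreteness, the sort of parallel run I mean (device names and
option values here are only an example, not a recommendation):

    # -w destroys all data on the devices!
    for dev in sdb sdc sdd sde; do
        badblocks -b 4096 -c 16384 -sv -w /dev/$dev > badblocks.$dev.log 2>&1 &
    done
    wait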
Thanks,
--
Matt Taggart
[email protected]
From: Theodore Ts'o <[email protected]>
Date: Tue, 11 Jul 2017 14:27:03 GMT
On Mon, Jul 10, 2017 at 03:19:28PM -0700, Matt Taggart wrote:
> Given the device size increases and RAM/CPU improvements since all these
> things occurred, is there any value to increasing the defaults? Does anyone
> have any data? If not then what tests would be valuable?
>
> I often run many badblocks instances at once on separate SATA devices (so
> no bus contention), what are the optimal settings for that case?
I have to ask --- ***why*** are you (and other people) running
badblocks in 2017? Badblocks as a thing started becoming irrelevant
for e2fsprogs's purpose sometime around 2003-2005, when SATA was
introduced, and drive electronics were smart enough that they could be
largely counted upon to do automatic bad block redirection in the
drive.
Also, drives have gotten large enough that no matter what kind of
optimizations we could make to badblocks, the cost benefit ratio of
using badblocks went negative a long, long, ***LONG*** time ago.
As a result, I personally don't do much maintenance on badblocks,
since, in what free time I have, there are far more important things to
worry about than trying to optimize (or even provide I18N support, or
other crazy things people have asked for) in this program. As always,
however, patches will be gratefully accepted for review....
- Ted
P.S. Yes, I have considered removing badblocks from e2fsprogs
altogether. The main reason why I haven't is that it's a small (28k),
mostly harmless, and inoffensive binary. Given the average increase in
bloat of, say, most of the other binaries in Debian, it hasn't even
been worth the effort to deprecate it....
From: Matt Taggart <[email protected]>
Date: Tue, 11 Jul 2017 17:39:03 GMT
Theodore Ts'o writes:
> I have to ask --- ***why*** are you (and other people) running
> badblocks in 2017? Badblocks as a thing started becoming irrelevant
> for e2fsprogs's purpose sometime around 2003-2005, when SATA was
> introduced, and drive electronics were smart enough that they could be
> largely counted upon to do automatic bad block redirection in the
> drive.
Personally I use it for a few things:
1) as a way of forcing the drive to test every block and to internally
reallocate any sectors that are marginal _before_ the drive is in
production. The SMART tests are supposed to do this (see the smartctl
sketch below), but they are opaque and up to the vendor to implement
correctly. If I use badblocks -w I know each (O/S-visible) block gets
tested four times, once per default pattern.
2) as a way of exercising/burning-in the mechanism to avoid deploying a
drive that is likely to fail. I time the badblocks run, and if the time
diverges significantly from other drives of the same model, I know
something is wrong. As a side benefit it exercises the other components
in the path: I/O controller, RAM, CPU. The SMART tests should also work
for this, but again it's hard to measure.
(side note, I remember ~2000 someone (VA Linux Systems? joeyh?
cerberus?) having a tool that did burn-in on their servers by running
cpuburn in parallel with a bunch of copies of badblocks running on the
(then SCSI) drives.)
3) as a cheap and easy way to wipe data from drives. Using -w with its
four write/read patterns is a good way of ensuring the data is
unrecoverable before reusing/recycling the drive.
If you know of better options for these tasks I'm happy to switch to
something other than badblocks.
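(For reference, the "opaque" SMART route mentioned in 1) and 2) is the
smartmontools one -- something like:

    smartctl -t long /dev/sdX   # start the drive's extended self-test
    smartctl -a /dev/sdX        # later, read results and SMART counters

though as noted, how thoroughly the self-test scans is up to the vendor.)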
Thanks,
--
Matt Taggart
[email protected]