Debian Bug report logs - #726731
dump: Huge RAM usage on restore

Package: dump; Maintainer for dump is Alexander Zangerl <[email protected]>; Source for dump is src:dump.

Reported by: John Goerzen <[email protected]>

Date: Fri, 18 Oct 2013 13:45:01 UTC

Severity: important

Found in version dump/0.4b44-1


Report forwarded to [email protected], Debian QA Group <[email protected]>:
Bug#726731; Package dump. (Fri, 18 Oct 2013 13:45:05 GMT)


Acknowledgement sent to John Goerzen <[email protected]>:
New Bug report received and forwarded. Copy sent to Debian QA Group <[email protected]>. (Fri, 18 Oct 2013 13:45:06 GMT)


Message #5 received at [email protected]:

From: John Goerzen <[email protected]>
To: Debian Bug Tracking System <[email protected]>
Subject: dump: Huge RAM usage on restore
Date: Fri, 18 Oct 2013 08:08:10 -0500
Package: dump
Version: 0.4b44-1
Severity: important

Hello,

I am running a restore, and here is output from top:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND                                                                      
11070 root      20   0 7549m 4.2g  284 D   6.3 53.7 831:32.08 restore

The only reason the RSS is so much lower than the VSS is that part of
it has swapped out.

Here's what happens.

On restore, it reads the first 2.7GB or so of the backup, by that
point allocating around 800MB of RAM.  It then spends a long time
creating all the directories in the backup set, and as it does so the
RAM usage gradually increases to many GBs.  Once it starts creating
files, the RAM is up north of 5GB.  As it extracts the dump, the RAM
continues to climb.  I had to move the restore process to a different
machine than the server from which it was made, because that system
had only (!) 4GB RAM.  Extracting over NFS is working, so far, but
this system has 8GB RAM and the restore is only about 2/3 done at this
point.

The dump in question was made from a filesystem containing 1.8TB of
BackupPC data across about 24 million inodes.  BackupPC works with a
hardlink farm, and every backup has a directory skeleton created
(though only the "full" backups have files hardlinked into the storage
pool).

I didn't specifically watch while dump was running, but I would have
noticed if it tried to allocate 8GB of RAM.

In addition, restore filled up /tmp because it tried to put 2.8GB
there, causing issues with other programs running on the system.
(Worked around with -T)
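
For example (with a hypothetical dump file path; -r, -T and -f are
standard restore options):

restore -r -T /var/tmp -f /backup/host.dump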

I think some people may face a situation where a backup is
unrestorable because the restore process demands a far beefier system
than the backup process does!


-- System Information:
Debian Release: 7.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages dump depends on:
ii  e2fslibs      1.42.5-1.1
ii  libblkid1     2.20.1-5.3
ii  libc6         2.13-38
ii  libcomerr2    1.42.5-1.1
ii  libncurses5   5.9-10
ii  libreadline6  6.2+dfsg-0.1
ii  libselinux1   2.1.9-5
ii  libuuid1      2.20.1-5.3
ii  tar           1.26+dfsg-0.1

dump recommends no packages.

dump suggests no packages.

-- no debconf information



Information forwarded to [email protected], Alexander Zangerl <[email protected]>:
Bug#726731; Package dump. (Fri, 30 Jun 2017 03:15:02 GMT)


Acknowledgement sent to Elliott Mitchell <[email protected]>:
Extra info received and forwarded to list. Copy sent to Alexander Zangerl <[email protected]>. (Fri, 30 Jun 2017 03:15:02 GMT)


Message #10 received at [email protected]:

From: Elliott Mitchell <[email protected]>
To: [email protected]
Subject: #726731: dump: Huge RAM usage on restore
Date: Thu, 29 Jun 2017 19:52:20 -0700
Perhaps there should be a caution about filesystems with large numbers
of i-nodes.  Notice that the numbers provided work out to just under
330 bytes for every i-node.

The current `restore` program no longer acts as the traditional 4.4BSD
`restore` did.  Instead of restoring a near-exact image of the filesystem
to a clean filesystem, it has to remap each file to a new i-node.  I
think this behavior is a *vast* improvement, but it means large numbers
of i-nodes result in large memory consumption during restore.

In order to perform this task, `restore` has to generate a huge table
to map old i-node numbers to new filenames.  330 bytes per i-node isn't
too bad as far as this goes.  Perhaps some optimization can be done,
but with this many i-nodes you're simply bumping into the problem of
how small a hash table or tree can be while still performing the needed
function.
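
To make the scale concrete, a minimal sketch (hypothetical C; the entry
layout here is invented, not restore's actual structures) of why a few
hundred bytes per i-node is plausible and what it adds up to:

/* Hypothetical sketch of a restore-style symbol-table entry.  The
 * real restore(8) structures differ, but per-entry pointers, numbers
 * and a heap-allocated name land in the same ballpark. */
#include <stdio.h>

struct symtab_entry {
    unsigned long old_ino;        /* i-node number in the dump */
    unsigned long new_ino;        /* i-node number as restored */
    char *name;                   /* restored file name (heap) */
    struct symtab_entry *next;    /* hash-chain link */
};

int main(void)
{
    /* 24 million i-nodes at roughly 330 bytes each, per the report */
    double bytes = 24e6 * 330.0;
    printf("approx. table size: %.1f GiB\n",
           bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}

That works out to about 7.4GiB, consistent with the ~7.5GB of virtual
memory in the original top output.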

(geeze, memory and processor power are so cheap nowadays...)


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         [email protected]  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Information forwarded to [email protected], Alexander Zangerl <[email protected]>:
Bug#726731; Package dump. (Tue, 18 Jul 2017 04:00:03 GMT)


Acknowledgement sent to Elliott Mitchell <[email protected]>:
Extra info received and forwarded to list. Copy sent to Alexander Zangerl <[email protected]>. (Tue, 18 Jul 2017 04:00:03 GMT)


Message #15 received at [email protected]:

From: Elliott Mitchell <[email protected]>
To: [email protected]
Subject: #726731: dump: Huge RAM usage on restore
Date: Mon, 17 Jul 2017 20:55:04 -0700
I should mention a reasonable alternative method.

I /think/ it should be reasonable to replace restoresymtable with a
small directory tree: the first level of directories would correspond
to the first digit of a restored i-node number, and inside each
directory would be a hard link to the new file.

Say the dump contains i-nodes 0001, 0002, 1000, 1001, and 2134; the files might be:
0/001
0/002
1/000
1/001
2/134

When the file replacing i-node 0001 is created, a link to the newly
created file is added to the tree.  Depending upon the number of
i-nodes, a few levels of directories might be needed.  The point is to
replace the gigantic symbol-table hash, which needs to fit in memory,
with a directory tree which the new filesystem can hopefully handle
reasonably efficiently.  Directories would need symbolic links instead
of hard links, since hard links to directories aren't permitted.
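
A minimal sketch of the bookkeeping (hypothetical C; the bucket scheme
follows the example above, and record_inode is an invented name, not
anything in restore):

/* Hypothetical sketch: record "old i-node N was restored as this
 * file" by hard-linking the new file into a bucket directory,
 * e.g. i-node 2134 -> <mapdir>/2/134, instead of keeping the whole
 * map in RAM. */
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int record_inode(const char *mapdir, unsigned long old_ino,
                        const char *newfile)
{
    char bucket[4096], entry[4096];

    snprintf(bucket, sizeof bucket, "%s/%lu", mapdir, old_ino / 1000);
    snprintf(entry, sizeof entry, "%s/%03lu", bucket, old_ino % 1000);
    if (mkdir(bucket, 0700) != 0 && errno != EEXIST)
        return -1;                /* couldn't create bucket dir */
    return link(newfile, entry);  /* symlink(2) needed for dirs */
}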


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         [email protected]  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445





Information forwarded to [email protected], Alexander Zangerl <[email protected]>:
Bug#726731; Package dump. (Thu, 24 Aug 2017 14:09:03 GMT)


Acknowledgement sent to Alexander Zangerl <[email protected]>:
Extra info received and forwarded to list. Copy sent to Alexander Zangerl <[email protected]>. (Thu, 24 Aug 2017 14:09:03 GMT)


Message #20 received at [email protected]:

From: Alexander Zangerl <[email protected]>
To: Elliott Mitchell <[email protected]>, [email protected]
Subject: Re: Bug#726731: #726731: dump: Huge RAM usage on restore
Date: Thu, 24 Aug 2017 23:52:26 +1000
On Mon, 17 Jul 2017 20:55:04 -0700, Elliott Mitchell writes:
>I should mention a reasonable alternative method.

i'm not sure i'd want to go to a dir tree for this; if
one just wants to handle level 0 backups then my understanding
is that the restoresymtable could be skipped completely.

there's no code for such a skip at this time, but i suspect that
that would be a cheap remedy - at least for full backups.

>I /think/ it should be reasonable to replace restoresymtable with a
>small directory tree.

right now i don't see why the symtable cannot be dumped incrementally
instead of being held in memory. 
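
for illustration, a minimal sketch of what dumping it incrementally
might look like (hypothetical C; the record layout and names are
invented, not the actual restoresymtable format):

/* hypothetical sketch: append each old->new mapping to the on-disk
 * symbol table as soon as it's established, so the whole table never
 * has to sit in memory at once. */
#include <stdio.h>

struct sym_record {
    unsigned long old_ino;   /* i-node number in the dump */
    unsigned long new_ino;   /* i-node number as restored */
};

static int append_record(FILE *symtab, unsigned long old_ino,
                         unsigned long new_ino)
{
    struct sym_record r = { old_ino, new_ino };
    return fwrite(&r, sizeof r, 1, symtab) == 1 ? 0 : -1;
}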

regards
az


-- 
Alexander Zangerl + GPG Key 2FCCF66BB963BD5F + http://snafu.priv.at/
"It's a pity that punched card equipment is now almost all gone. There's
nothing better for grabbing a tie and breaking the wearer's neck."  
 -- Mike Andrews 

Information forwarded to [email protected], Alexander Zangerl <[email protected]>:
Bug#726731; Package dump. (Thu, 24 Aug 2017 19:30:03 GMT)


Acknowledgement sent to Elliott Mitchell <[email protected]>:
Extra info received and forwarded to list. Copy sent to Alexander Zangerl <[email protected]>. (Thu, 24 Aug 2017 19:30:03 GMT)


Message #25 received at [email protected]:

From: Elliott Mitchell <[email protected]>
To: Alexander Zangerl <[email protected]>
Cc: [email protected]
Subject: Re: Bug#726731: #726731: dump: Huge RAM usage on restore
Date: Thu, 24 Aug 2017 12:10:55 -0700
On Thu, Aug 24, 2017 at 11:52:26PM +1000, Alexander Zangerl wrote:
> On Mon, 17 Jul 2017 20:55:04 -0700, Elliott Mitchell writes:
> >I should mention a reasonable alternative method.
> 
> i'm not sure i'd want to go to a dir tree for this; if
> one just wants to handle level 0 backups then my understanding
> is that the restoresymtable could be skipped completely.
> 
> there's no code for such a skip at this time, but i suspect that
> that would be a cheap remedy - at least for full backups.

That would be mode -x or -X, which extracts files rather than
attempting to do a full restore.  That certainly has benefits if you're
merely trying to retrieve copies of files from a dump.  Since the
original bug specifically mentioned "restore", implying mode -r, that
doesn't sound like an acceptable resolution (though it may be
acceptable to the original reporter).


> >I /think/ it should be reasonable to replace restoresymtable with a
> >small directory tree.
> 
> right now i don't see why the symtable cannot be dumped incrementally
> instead of being held in memory. 

If the i-nodes in a dump can be *guaranteed* to be in order, then such
a scheme should be possible.  I'm guessing right now either hash
table(s) or a tree is being used, since an old i-node number needs to
be mapped to a new i-node number/file.  If they're guaranteed in-order,
then that can be optimized to a list-type structure with a provision
for skipping ahead quickly.
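
A sketch of the kind of structure meant (hypothetical C; it assumes
entries were appended in ascending old-i-node order):

/* Hypothetical sketch: with i-nodes guaranteed in ascending order,
 * the old->new map degenerates to a sorted array, appended to during
 * the pass and binary-searched for lookups. */
#include <stddef.h>

struct ino_map { unsigned long old_ino, new_ino; };

static unsigned long lookup(const struct ino_map *m, size_t n,
                            unsigned long old_ino)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {                    /* classic binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (m[mid].old_ino < old_ino)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* 0 = not found; i-node 0 is never a valid i-node number */
    return (lo < n && m[lo].old_ino == old_ino) ? m[lo].new_ino : 0;
}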

I'm pretty sure e2dumpfs upholds this guarantee, but do *all* dump
implementations? (Ideally `restore` would be able to handle foreign
dumps.)

Speaking of which, I'm inclined to suggest `dump` should have a -t
option similar to the -t option of `mount`.  Alas, -t for `restore` has
already been allocated for behavior like `tar`'s -t.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         [email protected]  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




