Dar Documentation

DAR's Frequently Asked Questions


I restore/save all files but dar reported some files have been ignored, what are those ignored files?

When restoring/saving, all files are considered by default. But if you specify some files to restore or save, all other files are "ignored"; this is the case when using the -P, -X, -I, -g, -[ or -] options.

Dar hangs when using it with pipes, why?

Dar can produce a backup on its standard output if you give '-' as basename. But it cannot read a backup from its standard input in direct access mode. To feed a backup to dar through pipes, you either need dar_slave and two pipes, or the sequential mode (--sequential-read option, which makes restoration of a few files slow compared to the default direct access mode). To use dar with dar_slave over pipes in direct access mode (which is the most efficient way to proceed), see the detailed notes, more precisely the dar and ssh note.
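
As an illustration, here is a minimal sketch of the dar/dar_slave dialog over two named pipes on the local host (the backup basename my_backup is an assumption; over ssh, each pipe end would go through an ssh session as described in the dar and ssh note):

mkfifo /tmp/todar /tmp/toslave
dar_slave -o /tmp/todar -i /tmp/toslave my_backup &
dar -x - -i /tmp/todar -o /tmp/toslave

dar_slave writes the backup's bytes to /tmp/todar and receives dar's requests for given portions of the backup from /tmp/toslave, which gives dar random access to the backup through a simple pair of pipes.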

Why, when I restore a single file, does dar report that 3 files have been restored?

If you restore for example the file usr/bin/emacs, dar will first restore usr (if the directory already exists, its date and ownership get restored, while all existing files in that directory stay preserved), then usr/bin will be restored the same way, and last usr/bin/emacs will be restored. Thus 3 inodes have been restored or modified, while only one file was asked for restoration.

While compiling dar I get the following message: g++: /lib/libattr.a: No such file or directory, what can I do?

The problem comes from an inconsistency in your distro (Redhat and Slackware seem(ed) concerned at least): dar (libtool) finds the /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../libattr.la file to link with. This file defines where the static and dynamic libattr libraries are located, and it expects both under /lib. While the dynamic libattr is there, the static version has been moved to /usr/lib. A workaround is to make a symbolic link:

ln -s /usr/lib/libattr.a /lib/libattr.a
I cannot find the binary package for my distro, where should I look?

For any binary package, ask your distro maintainer to include dar (if not already done), and check the web site of your preferred distro for a dar package.

Can I use different filters between a full backup and a differential backup? Wouldn't dar consider files not included in the filter as deleted?

Yes, you can. No, there is no risk that dar marks as deleted the files that were not selected for the differential backup. Here is the way dar works:

During a backup process, when a file is ignored due to filter exclusion, an "ignored" entry is added to the catalogue. At the end of the backup, dar compares both catalogues, the one of reference and the new one built during the backup process, and adds a "detruit" entry (which means "destroyed" in French) when an entry of the reference is not present in the new catalogue. Thus, if an "ignored" entry is present, no "detruit" entry will be added for that name. Then all "ignored" entries are removed and the catalogue is written at the end of the backup.
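
As an illustration, here is a minimal sketch (paths are assumptions): the differential backup below excludes the tmp subdirectory that was saved by the full backup; its entries are recorded as "ignored" and thus never marked "detruit":

dar -c full_backup -R /home/me
dar -c diff_backup -A full_backup -R /home/me -P tmp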

Once in action, dar makes the whole system slower and slower, then it stops with the message "killed"! How to overcome this problem?

Dar needs virtual memory to work. Virtual memory is RAM + swap space. Dar's memory requirement grows with the number of files saved, not with the amount of data saved. If you have a few huge files, you have little chance to meet any memory limitation problem. Conversely, saving a plethora of files (either big or small) will make dar request an increasing amount of virtual memory. Dar needs this memory to build the catalogue (the table of contents) of the backup it creates. The same holds for differential backups, except that dar also needs to load in memory the catalogue of the backup of reference, which most of the time makes dar use twice as much memory for a differential backup as for a full one.

Anyway, the solution is:

  1. Read the limitations file to understand the problem and be aware of the limitations you will bring at step 3, below.
  2. If you can, add swap space to your system (under Linux, you can either add a swap partition or a swap file, which is less constraining but also a bit less efficient). Bob Barry provided a script that can give you a rough estimate of the required virtual memory (doc/samples/dar_rqck.bash). It was working well with dar 2.2.x, but with the features added since then, the amount of metadata per file is variable: the memory requirement per file also depends on the presence and amount of Extended Attributes and Filesystem Specific Attributes, which change from file to file.
  3. If this is not enough, or if you don't want to or cannot add swap space, recompile dar giving the --enable-mode=64 argument to the configure script. Note that since release 2.6.x this is the default compilation mode, so you should be fine already.
  4. If this is not enough and you have some money, you can add some RAM to your system.
  5. If all that fails, ask for support on the dar-support mailing-list.

Last, there is always the workaround of making several smaller backups of the files to save, for example one backup for all that is in /usr/local, another one for all that is in /var, and so on. These backups can be full or differential. The drawback is not big, as you can store these backups side by side and use them at will. Moreover, you can feed a unique dar_manager database with all these different backups, which will hide the fact that there are several full and several differential backups covering different sets of files.
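
As a sketch (backup and database names are assumptions), feeding such a dar_manager database looks like:

dar_manager -C my_base.dmd
dar_manager -B my_base.dmd -A full_var
dar_manager -B my_base.dmd -A full_usr_local
dar_manager -B my_base.dmd -A diff_usr_local
# then restore the latest version of a file, whatever backup it lies in:
dar_manager -B my_base.dmd -r usr/local/bin/some_file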

I have a backup and I want to change the size of its slices, how to proceed?

dar_xform is your friend!

dar_xform -s <size> original_backup new_backup

dar_xform will create a new backup with slices of the requested size (you can also make use of the -S option for the first slice). Note that you neither need to decrypt the backup nor will dar uncompress it; this is thus a very fast processing. See dar_xform man page for more.

I have a backup in one slice, how can I split it in several slices?

dar_xform is your friend!

see just above.

I have a backup in several slices, how can I stick them all into a single file?

dar_xform is your friend!

dar_xform original_backup new_backup

dar_xform without the -s option creates a single-slice backup. See dar_xform man page for more.

I have a backup, how can I change its encryption scheme?

The merging feature lets you do that. Merging has two roles: putting in one backup the contents of two different backups, and at the same time filtering out some files you decided not to include in the resulting backup. The merging feature can take two, but also only one backup as input. This is what we will use here, without any filter, to keep all saved files.

dar -+ new_backup -A original_backup -K "<new_algo>:new pass" -ak

If you don't want to have the password in clear on the command-line (command-lines can be seen by other users, for example with top or ps), simply provide "<algo>:", then dar will ask you for the password on the fly. If using blowfish you can just provide ":" for the keys. Note that before release 2.5.0, the -J option was needed to provide the password of the source backup. Since then, without -J option, dar asks interactively for the password of the backup to read. You can still use the -J option to provide the password from a DCF file and this way avoid dar interactively asking for it.

Note that you can also change slicing of the backup at the same time thanks to -s and -S options:

dar -+ new_backup -A original_backup -K ":" -ak -s 1G
I have a backup, how can I change its compression algorithm?

Same thing as above: we will use the merging feature:

to use bzip2 compression:

dar -+ new_backup -A original_backup -zbzip2

to use gzip compression

dar -+ new_backup -A original_backup -zgzip

to use lzo compression, use -zlzo, for LZ4 use -zlz4, for zstd use -zzstd, and so on.

To use no compression at all, do not add any -z option, or exclude all files from compression (-Z "*"):

dar -+ new_backup -A original_backup

Note that you can also change encryption scheme and slicing at the same time you change compression:

dar -+ new_backup -A original_backup -zbzip2 -K ":" -J ":" -s 1G
Which options can I use with which commands?

DAR provides eight commands:

-c to create a new backup
-x to extract files from a given backup
-l to list the contents of a given backup
-d to compare the contents of a backup with filesystem
-t to test the internal coherence of a given backup
-C to isolate a backup (extract its contents to a usually small file) or make a snapshot of the current filesystem
-+ to merge two backups in one or create a sub backup from one or two other ones
-y to repair a backup

For each command listed above, here follow the available options (those marked OK):

short option long option -c -x -l -d -t -C -+ -y
-v --verbose OK OK OK OK OK OK OK OK
-vs --verbose=skipped OK OK -- OK OK -- OK OK
-b --beep OK OK OK OK OK OK OK OK
-n --no-overwrite OK OK -- -- -- OK OK OK
-w --no-warn OK OK -- -- -- OK OK OK
-wa --no-warn=all -- OK -- -- -- -- -- --
-A --ref OK OK -- OK OK OK OK OK
-R --fs-root OK OK -- OK -- -- -- --
-X --exclude OK OK OK OK OK -- OK --
-I --include OK OK OK OK OK -- OK --
-P --prune OK OK OK OK OK -- OK --
-g --go-into OK OK OK OK OK -- OK --
-] --exclude-from-file OK OK OK OK OK -- OK --
-[ --include-from-file OK OK OK OK OK -- OK --
-u --exclude-ea OK OK -- -- -- -- OK --
-U --include-ea OK OK -- -- -- -- OK --
-i --input OK OK OK OK OK OK OK --
-o --output OK OK OK OK OK OK OK --
-O --comparison-field OK OK -- OK -- -- -- --
-H --hour OK OK -- -- -- -- -- --
-E --execute OK OK OK OK OK OK OK OK
-F --ref-execute OK -- -- -- -- OK OK OK
-K --key OK OK OK OK OK OK OK OK
-J --ref-key OK -- -- -- -- OK OK OK
-# --crypto-block OK OK OK OK OK OK OK OK
-* --ref-crypto-block OK -- -- -- -- OK OK OK
-B --batch OK OK OK OK OK OK OK OK
-N --noconf OK OK OK OK OK OK OK OK
-e --empty OK -- -- -- -- OK OK OK
-aSI --alter=SI OK OK OK OK OK OK OK OK
-abinary --alter=binary OK OK OK OK OK OK OK OK
-Q OK OK OK OK OK OK OK OK
-aa --alter=atime OK -- -- OK -- -- -- --
-ac --alter=ctime OK -- -- OK -- -- -- --
-am --alter=mask OK OK OK OK OK OK OK --
-an --alter=no-case OK OK OK OK OK OK OK --
-acase --alter=case OK OK OK OK OK OK OK --
-ar --alter=regex OK OK OK OK OK OK OK --
-ag --alter=glob OK OK OK OK OK OK OK --
-z --compression OK -- -- -- -- OK OK --
-s --slice OK -- -- -- -- OK OK OK
-S --first-slice OK -- -- -- -- OK OK OK
-p --pause OK -- -- -- -- OK OK OK
-@ --aux OK -- -- -- -- -- OK --
-$ --aux-key -- -- -- -- -- -- OK --
-~ --aux-execute -- -- -- -- -- -- OK --
-% --aux-crypto-block -- -- -- -- -- -- OK --
-D --empty-dir OK OK -- -- -- -- OK --
-Z --exclude-compression OK -- -- -- -- -- OK --
-Y --include-compression OK -- -- -- -- -- OK --
-m --mincompr OK -- -- -- -- -- OK --
-ak --alter=keep-compressed -- -- -- -- -- -- OK --
-af --alter=fixed-date OK -- -- -- -- -- -- --
--nodump OK -- -- -- -- -- -- --
-M --no-mount-points OK -- -- -- -- -- -- --
-, --cache-directory-tagging OK -- -- -- -- -- -- --
-k --deleted -- OK -- -- -- -- -- --
-r --recent -- OK -- -- -- -- -- --
-f --flat -- OK -- -- -- -- -- --
-ae --alter=erase_ea -- OK -- -- -- -- -- --
-T --list-format -- -- OK -- -- -- -- --
-as --alter=saved -- -- OK -- -- -- -- --
-ad --alter=decremental -- -- -- -- -- -- OK --
-q --quiet OK OK OK OK OK OK OK OK
-/ --overwriting-policy -- OK -- -- -- -- OK --
-< --backup-hook-include OK -- -- -- -- -- -- --
-> --backup-hook-exclude OK -- -- -- -- -- -- --
-= --backup-hook-execute OK -- -- -- -- -- -- --
-ai --alter=ignore-unknown-inode-type OK -- -- -- -- -- -- --
-at --alter=tape-marks OK -- -- -- -- -- OK --
-0 --sequential-read OK OK OK OK OK OK -- --
-; --min-digits OK OK OK OK OK OK OK OK
-1 --sparse-file-min-size OK -- -- -- -- -- OK --
-ah --alter=hole-recheck -- -- -- -- -- -- OK --
-^ --slice-mode OK -- -- -- -- OK OK OK
-_ --retry-on-change OK -- -- -- -- -- -- --
-asecu --alter=secu OK -- -- -- -- -- -- --
-. --user-comment OK -- -- -- -- OK OK --
-3 --hash OK -- -- -- -- OK OK OK
-2 --dirty-behavior -- OK -- -- -- -- -- --
-al --alter=lax -- OK -- -- -- -- -- OK
-alist-ea --alter=list-ea -- -- OK -- -- -- -- --
-4 --fsa-scope OK OK -- OK -- -- OK --
-5 --exclude-by-ra OK -- -- -- -- -- -- --
-7 --sign OK -- -- -- -- OK OK OK
-' --modified-data-detection OK -- -- -- -- -- -- --
-{ --include-delta-sig OK -- OK -- -- OK -- --
-} --exclude-delta-sig OK -- OK -- -- OK -- --
-8 --delta OK -- OK -- -- OK -- --
-6 --delta-sig-min-size OK -- OK -- -- OK -- --
-az --alter=zeroing-negative-dates OK -- -- -- -- -- -- --
-\ --ignored-as-symlink OK -- -- -- -- -- -- --
-T --kdf-param OK -- OK -- -- OK -- --
-aduc --alter=duc OK OK OK OK OK OK OK OK
-G --multi-thread OK OK OK OK OK OK OK OK
-j --network-retry-delay OK OK OK OK OK OK OK --
-afile-auth --alter=file-authentication OK OK OK OK OK OK OK --
-ab --alter=blind-to-signatures OK OK OK OK OK OK OK --
-aheader --alter=header -- -- OK -- -- -- -- --

Why does dar report corruption of the backup I have transferred with FTP?

Dar backups are binary files; they must be transferred in binary mode when using FTP. This is done the following way with the command-line ftp client:

ftp <somewhere>
<login>
<password>
bin
put <file>
get <file>
bye

If you transfer a backup (or any other binary file) in ascii mode (the opposite of binary mode), the 8th bit of each byte will be lost and the backup will become impossible to recover (due to the destruction of this information). Be very careful to test your backup after transferring it back to your host, to be sure you can delete the original file.

Why does DAR save UID/GID instead of plain user names and group names?

A file's properties do not contain the name of its owner nor the name of its group; instead they contain two numbers, the user ID and the group ID (UID & GID for short). The /etc/passwd file associates these numbers with a name and some other properties (like the login shell, the home directory, the password; see also /etc/shadow). Thus, when you list a directory (with the 'ls' command or any GUI program, for example), the listing application opens each directory, where it finds a list of names with an associated inode number, then fetches the inode attributes of each file and reads, among other information, the UID and the GID. To be able to display the real user name and group name, the listing application uses a well-defined standard C library call that does the lookup in /etc/passwd, possibly in NIS if configured, and in any other additional system [this way applications need not bother with the many possible system configurations, the same API is used whatever the system is]. The lookup returns the name if it exists, and the listing application displays, for each file found in a directory, the attributes and the user and group names as returned by the system, instead of the UID and GID.

As you can see, the user name and group name are not part of any file attribute, but UID and GID *are*. Dar is mainly a backup tool: it preserves as much as possible the file properties, to be able to restore them as close as possible to their original state. Thus a file saved with UID=3 will be restored with UID=3. The name corresponding to UID 3 may exist or not, may exist and be the same or may exist and be different; the file will anyway be restored with UID 3.

Scenario with dar's way of restoring

Thus, when doing backup and restoration of a crashed system, you can be confident that the restoration will not interfere with the bootable system you have used to launch dar to restore your disk. Assuming you have UID 1 labeled 'bin' in your real crashed system, but this UID 1 is labeled 'admin' in the boot system, while UID 2 is labeled 'bin' in this boot system, files owned by bin in the system to restore will be restored under UID 1, not UID 2 which is used by the temporary boot system. At that time, after restoration, still running from the boot system, if you do an 'ls' you will see that the original files owned by 'bin' are now owned by user 'admin'.

This is really an illusion: in your restoration you have also restored the /etc/passwd file and the other system configuration files (like NIS configuration files if they were used). So at reboot time, on the newly restored real system, UID 1 will be associated back to user 'bin' as expected, and files originally owned by user bin will be listed as owned by bin, as expected.

Scenario with plain name way of restoring

Had dar done otherwise, restoring the files owned by 'bin' to the UID corresponding to 'bin', these files would have been given UID 2 (the one used by the temporary bootable system used to launch dar). But once the real restored system had been launched, this UID 2 would have corresponded to some other user, not to 'bin', which is mapped to UID 1 in the restored /etc/passwd.

Now, if you want to change some UID/GID when moving a set of files from one live system to another, there is no problem as long as you are not running dar under the 'root' account. Accounts other than 'root' are usually not allowed to modify UID/GID, thus files restored by dar will get the user and group ownership of the dar process, i.e. of the user that launched dar.

But if you really need to move a directory tree containing a set of files with different ownerships, and you want to preserve these ownerships from one live system to another while the corresponding UID/GID do not match between the two systems, dar can still help you:

Example on how to globally modify ownership of a directory tree user by user

For example, you have on the source system three users: Pierre (UID 100), Paul (UID 101), Jacques (UID 102) but on the destination system, these same users are mapped to different UID: Pierre has UID 101, Paul has UID 102 and Jacques has UID 100.

We temporarily need an unused UID on the destination system; we will assume UID 680 is not used. Then, after the backup restoration into the directory /tmp/A, we will do the following:

find /tmp/A -uid 100 -print -exec chown 680 {} \;
find /tmp/A -uid 102 -print -exec chown jacques {} \;
find /tmp/A -uid 101 -print -exec chown paul {} \;
find /tmp/A -uid 680 -print -exec chown pierre {} \;

which is:

  1. park Pierre's files (UID 100 from the source system) on the unused UID 680, freeing UID 100,
  2. give Jacques' files (UID 102) to 'jacques', that is UID 100 on the destination system,
  3. give Paul's files (UID 101) to 'paul', that is UID 102 on the destination system,
  4. last, give Pierre's parked files (UID 680) to 'pierre', that is UID 101 on the destination system.

The order matters: each step only fills a UID that a previous step has freed, so files of different users never get mixed.

You can then move the modified files to their appropriate destination, or make a new dar backup of them to be restored in the appropriate place, if you want to use some of dar's features, like only restoring files that are more recent than those present on the filesystem.

Dar_Manager does not accept encrypted backups, how to workaround this?

Yes, that's true, dar_manager does not accept encrypted backups. The first reason is that a dar_manager database cannot be encrypted, so it would not be very consistent to add encrypted backups to it. The second reason is that the dar_manager database would have to hold the key of each encrypted backup, making this database the weakest point of your data security: breaking the database encryption would then provide access to every encryption key and, with access to the original backups, to the data of any backup added to the database.

To work around this, you can proceed as follows: create an unencrypted isolated catalogue of each encrypted backup (simply do not use the -K option at isolation time), and feed the dar_manager database with these isolated catalogues instead of the backups themselves.

Note that, as the database is not encrypted, this will expose the file listing (not the files' contents) of your encrypted backups to anyone able to read the database; it is thus recommended to set restrictive permissions on this database file.
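
A minimal sketch of this workaround (backup and database names are assumptions):

# build an unencrypted isolated catalogue from the encrypted backup
# (dar asks for the backup's password; no -K is given for the output,
# so the resulting catalogue is written unencrypted)
dar -C clear_catalogue -A my_encrypted_backup
# create the database and add the isolated catalogue to it
dar_manager -C my_base.dmd
dar_manager -B my_base.dmd -A clear_catalogue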

When the time comes to use dar_manager to restore some files, you will have to make dar_manager pass the key to dar, for it to be able to restore the needed files from the backup. This can be done in several ways: on dar_manager's command-line, in the dar_manager database, or in a DCF file.

  1. dar_manager's command-line: simply pass -e "-K <key>" to dar_manager. Note that this will expose the key twice: on dar_manager's command-line and on dar's command-line.
  2. dar_manager database: the database can store a constant set of options to be passed to dar. This is done using the -o option or the -i option. The -o option exposes the arguments you want passed to dar, because they show on dar_manager's command-line, while the -i option lets you do the same thing in an interactive manner, which is a better choice.
  3. A better way is to use a DCF file with restrictive permissions. This file will contain the '-K <key>' option for dar to be able to read the encrypted backups, and dar_manager will ask dar to read this file thanks to the '-B <filename>' option, which you will have given either on dar_manager's command-line (-e -B <filename> ...) or from the options stored in the database (-o -B <filename>).
  4. The best way is to let dar_manager pass the -K option to dar, but without a password: simply pass the -e "-K :" option to dar_manager. When dar gets the -K option with the ":" argument, it dynamically asks for the password and stores it in secure memory.
How to overcome the lack of static linking on MacOS X?

The answer comes from Dave Vasilevsky in an email to the dar-support mailing-list. I let him explain how to do:

Pure-static executables aren't used on OS X. However, Mac OS X does have other ways to build portable binaries. How to build portable binaries on OS X?

First, you have to make sure that dar only uses operating-system libraries that exist on the oldest version of OS X that you care about. You do this by specifying one of Apple's SDKs, for example:

export CPPFLAGS="-isysroot /Developer/SDKs/MacOSX10.2.8.sdk"
export LDFLAGS="-Wl,-syslibroot,/Developer/SDKs/MacOSX10.2.8.sdk"

Second, you have to make sure that any non-system libraries that dar links to are linked in statically. To do this edit dar/src/dar_suite/Makefile, changing LDADD to '../libdar/.libs/libdar.a'. If any other non-system libs are used (such as gettext), change the makefiles so they are also linked in statically. Apple should really give us a way to force the linker to do this automatically!

Some caveats:

Why can't I test, extract files from, or list the contents of a single given slice of a backup?

Well, this is due to dar's design. However, you can list a whole backup and see in which slice(s) a file is located:

# dar -l test -Tslice -g etc/passwd
Slice(s)|[Data ][D][ EA ][FSA][Compr][S]|Permission| Filename
--------+--------------------------------+----------+-----------------------------
   1    |[Saved][-]       [-L-][ 69%][ ]|drwxr-xr-x| etc
   2    |[Saved][ ]       [-L-][ 63%][ ]|-rw-r--r--| etc/passwd
-----
All displayed files have their data in slice range [1-2]
-----
#
Why cannot I merge two isolated catalogues?

Since version 2.4.0, an isolated catalogue can also be used to rescue a corrupted internal catalogue of the backup it has been isolated from. For that feature to be possible, a mechanism lets dar know whether a given isolated catalogue and a given backup correspond to the same contents. Merging two isolated catalogues would break this feature, as the resulting catalogue would not match any real backup and could only be used as reference for a differential backup.

How to use the full power of my multi-processor computer?

Since release 2.7.0 it is possible to have dar efficiently using many threads at two independent levels:

encryption
You can specify the number of threads used to cipher/decipher a backup. Note however that, during the tests done for the 2.7.0 validation, it was observed that more than two encryption threads do not give better results than two when compression is used, because most of the time compression is more CPU intensive than encryption (it all depends on the chosen algorithms, of course).
compression
Before release 2.7.0, compression was done per file in streaming mode. In this mode, compressing a piece of data requires knowing the result of the compression of the data located before it; this brings a good compression ratio but is impossible to parallelize. To be able to compress in parallel, one needs to split the data in blocks and compress the blocks independently. There you can use a lot of threads, up to the point where disk I/O becomes the bottleneck: adding more compression threads then changes nothing. The drawback of compressing per block is not so much the compression ratio (slightly worse than in streaming mode) as the memory requirement: one block of clear data plus the resulting compressed data, times the number of threads. To avoid having any thread waiting for disk I/O, libdar even keeps a few more memory blocks than the number of threads.

To activate multi-threading with dar, use the -G option; read the dar man page for all details about the way to define the number of encryption threads and the number of compression threads, as well as the compression block size to use.

Is libdar thread-safe, which way do you mean it is?

libdar is the part of dar's source code that has been rewritten to be usable by external programs (like kdar). It has been modified to be used in a multi-threaded environment, thus, *yes*, libdar is thread-safe. However, thread-safe does not mean that you do not have to take some precautions in your programs while using libdar (or any other library).

Care must thus be taken that two different threads do not act on the same variables/objects at the same time. This is achieved with POSIX mutexes, which define a portion of code (known as a critical section) that cannot be entered by more than one thread at a time.

A few objects provided by the libdar API support concurrent access from several threads; read the API documentation for more.

How to solve configure: error: Cannot find size_t type?

This error shows up when you lack support for C++ compilation. Check that the gcc compiler has been compiled with C++ support activated, or, if you are using a gcc binary from a distro, double check that you have installed the C++ support for gcc.

Why did dar become much slower since release 2.4.0?

This is the drawback of two new features: the escape sequences (aka tape marks) added to allow sequential reading of a backup, and the sparse file detection mechanism.

You can disable both of these features, using respectively the -at option, which suppresses "tape marks" (just another name for escape sequences) but does not allow the generated backup to be used in sequential read mode, and the -1 0 option, which completely disables sparse file detection. The execution time then gets back to the level of the dar 2.3.x releases.
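
For example (backup name and path are assumptions), to create a backup without tape marks nor sparse file detection:

dar -c my_backup -R /home/me -at -1 0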

Why did dar become yet slower since release 2.5.0?

This is again the drawback of two new features: the use of the fadvise() system call, and the support for Filesystem Specific Attributes (FSA).

You can disable both of these features. The first can be disabled at compilation time by giving --disable-fadvise to the ./configure script. The second can be disabled at any time by adding the --fsa-scope=none option to dar. The execution time then gets back to the level of the dar 2.4.x releases.
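
As a sketch (backup name and path are assumptions):

# at compilation time, for the fadvise part
./configure --disable-fadvise
# at execution time, for the FSA part
dar -c my_backup -R /home/me --fsa-scope=none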

How to search for questions (and their answers) about known problems similar to mine?

Have a look at the dar-support mailing-list archive, and if you cannot find any answer to your problem, feel free to send an email to this mailing-list describing your problem/need.

Why does dar tell me that it failed to open a directory, while I have excluded this directory?

Reading the contents of a directory is done using the usual system calls (opendir/readdir/closedir). The first call (opendir) lets dar designate which directory to inspect, then dar calls readdir to get the next entry of the opened directory. Once nothing remains to be read, closedir is called. The problem here is that dar cannot start reading a directory, do some processing, and start reading another directory in between: the opendir/readdir/closedir calls are not re-entrant.

This is particularly critical for dar, as it does a depth-first lookup of the directory tree. In other words, from the root, if we have two directories A and B, dar reads A's contents, then the contents of its subdirectories, and once finished it reads the next entry of the root directory (which is B), then reads the contents of B and of each of its subdirectories; once finished with B, it must go back to the root again and read the next entry. In the meanwhile, dar had to open many directories to get their contents.

For this reason dar caches directory contents: when it first meets a directory, it reads its whole contents and stores it in RAM. Only afterward does dar decide whether or not to include a given directory. But at that point its contents has already been read, thus you may get the message that dar failed to read a given directory's contents even though you explicitly asked not to include that particular directory in the backup.

Dar reports a SECURITY WARNING! SUSPICIOUS FILE, what does that mean!?

When dar reports the following message:

SECURITY WARNING! SUSPICIOUS FILE <filepath>: ctime changed since backup of reference was done, while no inode or data changed

You should look for an explanation of the root cause that triggered dar to ring this alarm. As you probably know, a unix file has three (sometimes four) dates:

  1. atime is changed anytime you read the file's contents or write to it (this is the last access time)
  2. mtime is changed anytime you write to the file's data (this is the last modification time)
  3. ctime is changed anytime you modify the file's attributes (this is the last change time)
  4. btime is never changed once a file has been created (this is the birth time or creation time); not all filesystems provide it.

In other words, the kernel maintains these dates itself, but on most (if not all) unix systems user programs can also manually set the atime and mtime to any arbitrary value (see the "touch" command for example). To my knowledge, however, no system provides a means to manually set the ctime of a file: this value cannot be faked.

However, some rootkits and other nasty programs that try to hide themselves from the system administrator use this trick and modify the mtime to become more difficult to detect. Thus, the ctime keeps track of the date and time of their infamy. That said, ctime may also change while neither mtime nor atime do, in several rare but normal situations. Thus, if you face this message, you should first verify the following points before concluding your system has been infected by a rootkit:

How to know atime/mtime/ctime of a file?
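
The stat command displays all of these dates at once, and ls can show them one at a time (these are standard commands, shown on an arbitrary example file):

stat /etc/passwd
ls -l /etc/passwd    # mtime
ls -lu /etc/passwd   # atime
ls -lc /etc/passwd   # ctime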

Note:
With dar versions older than 2.4.0 (by default, unless the -aa option is used), once a file had been read for backup, dar set the atime back to the value it had before dar read it. This trick was used to accommodate programs like leafnode (an NNTP caching program) that base their cache purging scheme on the atime of files. When you do a backup using dar 2.3.11 for example, files that had their mtime modified are saved as expected and their atime is set back to its original value (the value it had just before dar read them), which has the side effect of modifying the ctime. If you then upgrade to dar 2.4.0 or more recent and do a differential backup, and such a file has not been modified since, dar will see that the ctime has changed while no other metadata did (user, group, permissions, mtime), thus this alarm will show for all files saved in the last 2.3.11-made backup. From the next differential backup made using dar 2.4.0 (or more recent), the problem will not show anymore.

Well, if you cannot find a valid explanation among the ones presented above, you'd better consider that your system has been infected by a rootkit or a virus, and use all the necessary tools (see below for examples) to find some evidence of it.

Last point: if you can explain the cause of the alarm and are annoyed by it (you have hundreds of files concerned, for example), you can disable this feature by adding the -asecu switch to the command-line.

1 atime may also not be updated at all if the filesystem is mounted with the relatime or noatime option.

Can dar help copy a large directory tree?

The answer is "yes" and even for more than one reason:

  1. Many backup/copy tools do not take care of hard linked inodes (hard linked plain files, named pipes, char devices, block devices, symlinks)... dar does,
  2. Many backup/copy tools do not take care of sparse files... dar does,
  3. Many backup/copy tools do not take care of Extended Attributes... dar does,
  4. Many backup/copy tools do not take care of Posix ACL (Linux)... dar does,
  5. Many backup/copy tools do not take care of file forks (MacOS X)... dar does,
  6. Many backup/copy tools do not take any precautions while working on a live system... dar does.

Using the following command will do the trick without relying on temporary files or backups:

dar -c - -R <srcdir> --retry-on-change 3 -N | dar -x - --sequential-read -N -R <dstdir>

<srcdir>'s contents will be copied to <dstdir>; both directories must exist before running this command, and <dstdir> should be empty.

Here is an example: we will copy the contents of /home/my to /home2/my. First we create the destination directory, then we run dar:

mkdir /home2/my
dar -c - -R /home/my --retry-on-change 3 | dar -x - --sequential-read -R /home2/my

The --retry-on-change option lets dar retry the copy of a file up to three times if that file changed while dar was reading it. You can increase this number at will. If a file fails to be copied correctly after the allowed number of retries, a warning is issued about that file and it is flagged as dirty in the data flow; the second dar command will then ask you whether you want it to be restored (here, copied) or not.

"piping" ('|' shell syntax) the first dar's output to the second dar's input makes the operation not requiering any temporary storage, only virtual memory is used to perform this copy. Compression is thus not requested as it would only slow down the whole process.

Last point: you should compare the copied data to the original before removing the original, as no backup file has been dropped down to the filesystem. This can simply be done using: diff -r <srcdir> <dstdir>

But no, diff will not check Extended Attributes, file forks, POSIX ACL, hard linked inodes, etc. If you want a more controllable way of copying a large directory, simply use dar with a real backup file: compare the backup against the original filesystem, restore the backup contents to their new place, and compare the restored filesystem against the original backup.

Any better idea? Feel free to contact dar's author for an update of this documentation!

Does dar compress per file or the whole backup?

Dar uses compression (gzip, lzo, bzip2, xz/lzma, zstd, lz4, ...) with different compression levels (1 for quick but low compression, up to 9 for best compression but slower execution) on a file by file basis. In other words, the compression engine is reset for each new file added into the backup. When a corruption occurs in a file like a compressed tar archive, it is not possible to decompress the data past that corruption: with tar you lose all files stored after such a data corruption.

With compression per file, such a corruption instead impacts only one file inside the backup, and all files stored before or after it can still be restored from the corrupted backup. Compressing per file also opens the possibility of not compressing all files of the backup, in particular already compressed files (like *.jpeg, *.mpeg, some *.avi files and of course *.gz, *.bz2 or *.lzo files). Avoiding compressing already compressed files saves CPU cycles (in other words, it speeds up the backup process). And while compressing an already compressed file takes time for nothing, it usually also requires more storage space than if that same file had not been compressed a second time.

The drawback is that the overall compression ratio is slightly less good.

How to activate compression with dar? Use the --compression option (or -z for short), telling the algorithm to use and the compression level (--compression=bzip2:9 or -zgzip:7 for example). You may omit the compression level (which defaults to 9) and even the compression algorithm (which defaults to gzip); thus -z or -zlzo are correct.

To select which files to compress or not, several options are available: --exclude-compression (or -Z for short, the uppercase Z here) and --include-compression (or -Y for short). Both take as argument a mask that, based on file names, defines which files have to be compressed or not. For example, -Z "*.avi" -Z "*.mp?" -Z "*.mpeg" will avoid compressing MPEG, MP3, MP2 and AVI files. Note that dar provides, in its /etc/darrc default configuration file, a long list of -Z options to avoid compressing most common compressed file types, which you can activate by simply adding compress-exclusion on dar's command-line.

In addition to excluding/including files from compression based on their name, you can also exclude small files (for which the compression ratio is usually poor) using the --mincompr option, which takes a size as argument: --mincompr 1k will avoid compressing files whose size is less than or equal to 1024 bytes. You will find all details about these options in the dar man page. Check also the -am and -ar options to understand how --exclude-compression and --include-compression interact with each other, or how to use regular expressions in place of glob expressions in masks.
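
Putting these options together, a minimal sketch (backup name and path are assumptions):

dar -c my_backup -R /home/me -z -Z "*.avi" -Z "*.mp?" -Z "*.mpeg" --mincompr 1k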

What slice size can I use with dar?

The minimum slice size is around 20 bytes, but then you would only be able to store 3 to 4 bytes of information per slice, due to the slice header that needs around 15 bytes in each slice (this varies depending on the options used and may increase in future backup format versions). But there is no maximum slice size! In other words, you can give the -s and -S options an arbitrarily large positive integer: thanks to its own internal integer type named "infinint", dar is able to handle arbitrarily large integers (file offsets, file sizes, etc.).

You can make use of suffixes like 'k' for kilo, 'M' for mega, 'G' for giga, etc. (all suffixes are listed here) to simplify your work. See also the -aSI and -abinary options to swap the meaning between k (= 1000 bytes) and ki (= 1024 bytes).
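
For example (backup name and path are assumptions), to create a backup whose first slice is 650M and whose following slices are 2G each:

dar -c my_backup -R /home/me -S 650M -s 2G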

Last point: dar/libdar can be compiled giving the --enable-mode=64 option to ./configure while building dar (this is the default since release 2.6.0). This replaces the "infinint" type by 64 bit integers, for better performance and reduced memory usage. However, this has some drawbacks on backup size and dates; see the limitations for more details.
Since release 2.6.0 the default is the 64 bit mode, so to have dar/libdar using infinint one needs to run ./configure --enable-mode=infinint.

Is there a dar fuse filesystem?

You can find several applications relying on dar or directly on libdar to manage dar backups; these are referred to here as external software because they are neither maintained nor were created by the author of dar and libdar. AVFS is one such external software: it provides a virtual filesystem layer for transparently accessing the contents of backups and remote directories just like local files.

How does dar compare to tar or rsync?

It all depends on the use case you want to address. A benchmark has been set up to compare the performance, features and behaviors of dar, rsync and tar against a set of common use cases. Hopefully this will help you answer this question.

Why, when comparing a backup with the filesystem, does dar not report new files found on the filesystem?

Backup comparison (-d option) is to be seen as a step further than backup testing (-t option), where dar checks the backup's internal structure and usability. The step further is not only checking that each part of the backup is readable and has a correct associated CRC, but also that it matches what is present on the filesystem. So yes, if new files are present on the filesystem, nothing has to be reported. If a file changed, dar reports that the file does not match what's in the backup; if a file is missing, dar cannot compare it with the filesystem and reports an error too.

So you want to know what has changed on your filesystem? No problem, do a differential backup! OK, you don't want a new backup or do not have the space for it: just send the backup to /dev/null and request on-fly isolation, as follows:

dar -c - -A <ref backup> -@ <isolated> ... other options ... > /dev/null

<ref backup>
is the backup of reference or an isolated catalogue
<isolated>
is the name of the isolated catalogue to produce.

Once the operation has completed, you can list the isolated catalogue using the following command:

dar -l <isolated> -as

It will give you the exact difference between your current filesystem and the filesystem at the time <ref backup> was made: modified and new files are reported with [InRef] for data, EA, or both, while deleted files are reported by a [--- REMOVED ENTRY ----] tag, followed by the estimated removal date and the type of the removed file ([-] for a plain file, [d] for a directory, and so on; more details in the dar man page, listing command).

Why does dar not automatically perform delta differences (aka rsync increments)?

Because delta difference is in theory subject to checksum collisions (though very improbable), which could lead to a new version of a file being seen as identical to an older one while some changes took place in it. A second reason is to respect the preference of users who do not want this feature activated by default. Anyway, activating delta difference with dar is quite simple and flexible; see this note.

Why does dar report truncated filenames under Windows, especially with cyrillic filenames?

Dar/libdar has first been developed for Linux. It has later been ported to many other operating systems. For Unix-like systems (FreeBSD, Solaris, ...), it can run as a native program by just being recompiled for the target OS and processor. For Windows systems it cannot, because Unix and Windows do not provide the same system calls at all. The easiest way to have dar running under Windows was to rely on Cygwin, which translates Unix system calls to Windows system calls. However, Cygwin brings some limitations. One of them is that it cannot provide filenames longer than 256 bytes, while today's Windows can have much longer filenames.

What is the point with cyrillic filenames? Cyrillic characters, unlike most latin ones, are not stored as a single byte; they usually use several bytes per character, thus this maximum filename size is reached much more quickly than with latin filenames, but the problem also exists with the latter.

The consequence is that when dar reads a directory that contains such a long filename, the Cygwin layer is not able to provide it entirely: the filename is truncated. When dar then wants to read information about that filename, most of the time the truncated filename does not exist, and dar relays the message from the system that this file does not exist (which may sound strange from the user's point of view). Since release 2.5.4, dar reports instead that the filename has been truncated and that the file will be ignored.

I have a 32 bit windows system, which binary package can I use?

Up to release 2.4.15 (included), the dar/libdar binaries for windows were built on a 32 bit windows (XP) system. After that release, binaries for windows have been built using a 64 bit windows system (7, now 8 and probably 10 soon). Unfortunately, the filenames of the binary packages for windows do not reflect that change and have still been labeled "i386", while the included binaries no longer support the i386 CPU family (which are 32 bit CPUs). This is an oversight that went unnoticed until Adrian Buciuman's remark on the dar-support mailing-list on September 23rd, 2016. In consequence, after that date, binary packages for windows receive an additional field corresponding to the windows flavor they have been built for.

Some may still need 32 bit windows binaries of dar; unfortunately I no longer have access to such a system, but if you have such a windows ISO image and a valid license to give me, I could install it in a virtual machine and provide binary packages for 32 bits too.

Until then, you can build the binary for windows yourself. Here follows the recipe:

Install Cygwin on windows, including at least the following packages:

Then get the dar source code and extract its contents (either using windows native tools or tar under cygwin). For clarity, let's assume you have extracted the dar source package for version x.y.z into the C:\Temp directory, so you now have the directory C:\Temp\dar-x.y.z.

Run a cygwin terminal and "cd" into that directory:

cd /cygdrive/c/Temp/dar-x.y.z

In the previous command, note that from within a cygwin shell the path uses slashes, not windows backslashes; note also that the 'c' is lowercase, while windows shows an uppercase letter for drives...

But don't worry, we are almost finished, run the following script:

misc/batch_cygwin x.y.z

Starting with release 2.5.7, the syntax has changed:

misc/batch_cygwin x.y.z win32

the new "win32" or "win64" field will be used to label the zip package containing the dar/libdar binary for windows, that's up to you to choose the value corresponding to your OS 32/64 bits flavor.

At the end of the process you will get a dar zip file for windows in the C:\Temp\dar-x.y.z directory.

Feel free to ask for support on the dar-support mailing-list if you encounter any problem building the dar binary for windows; this FAQ will be updated accordingly.

Path slash and back-slash consideration under Windows

The paths given to dar's arguments and options must respect the UNIX way (use slashes "/", not backslashes "\" as would be usual under Windows); thus, for example, you have to use /temp in place of \temp.

Moreover, drive letters cannot be used the usual way, like c:\windows\system32. Instead you will have to give the path as /cygdrive/c/windows/system32. As you see, the /cygdrive directory is a virtual directory that has all the drives as child directories:

Here is a more global example:

c:\dar_win-1.2.1\dar -c /cygdrive/f/tmp/toto -s 2G -z1 -R "/cygdrive/c/My Documents"

The path pointing to the dar command itself (c:\dar_win-1.2.1\dar) uses backslashes, as usual when typed at a windows prompt, while all the paths given in arguments to dar use slashes.

Under Windows, which directory corresponds to /?

When running dar from a windows command-line (thus not from the cygwin environment), dar's root directory is the parent directory of the one holding the dar.exe file. This does not mean that you cannot have dar back up anything outside this directory (you can, thanks to the /cygdrive/... path alias seen above), but when dar looks for darrc it uses this parent directory as the "/" root.

Since release 2.6.14, the published dar packages for Windows are configured and built in such a way that dar.exe uses the provided darrc file located in the etc sub-directory. So darrc is now usable out of the box. However, if you rename the directory where dar.exe is located (whose name is something like dar64-x.y.z-win64), the dar.exe binary will still look for a darrc at /dar64-x.y.z-win64/etc/darrc, taking as root directory the parent of the directory where it resides. You can then still explicitly rely on it by means of a -B option pointing to the modified path where the darrc file is located.

lzo compression is slower with dar than with lzop, why?

when using the "lzo" compression algorithm, dar/libdar always uses the algorithm lzo1x_999 with the compression level requested (from 1 to 9) as argument. Dar thus provides 9 different compression/speed levels with lzo.

On the other hand, as of today (2017), lzop, the command-line tool, uses the very degraded lzo algorithm known as lzo1x_1_15 for level 1, and the intermediate lzo1x_1 algorithm for levels 2 to 6, which makes levels 2 to 6 totally equivalent from the lzop program's point of view. Last, compression levels 7 to 9 of lzop use the same lzo1x_999 algorithm as dar/libdar, which is the only algorithm of the lzo family that makes use of a compression level. In total, lzop provides only 5 different compression levels/algorithms.

So now you know why dar is slower than lzop when using lzo compression at levels 1 to 6. To get a feature equivalent to what lzop provides at levels 1 and 2-6, dar/libdar provides two additional lzo-based compression algorithms: lzop-1 and lzop-3. As you guess, lzop-1 uses the lzo1x_1_15 algorithm as lzop does at its compression level 1, and lzop-3 uses the lzo1x_1 algorithm as lzop does at its compression levels 2 to 6. For both lzop-1 and lzop-3, the compression level is not used: you can keep the default or change its value, this will not change dar's behavior.

lzop compression level   dar algorithm   dar compression level   lzo algorithm used
1                        lzop-1          -                       lzo1x_1_15
2                        lzop-3          -                       lzo1x_1
3                        lzop-3          -                       lzo1x_1
4                        lzop-3          -                       lzo1x_1
5                        lzop-3          -                       lzo1x_1
6                        lzop-3          -                       lzo1x_1
-                        lzo             1                       lzo1x_999
-                        lzo             2                       lzo1x_999
-                        lzo             3                       lzo1x_999
-                        lzo             4                       lzo1x_999
-                        lzo             5                       lzo1x_999
-                        lzo             6                       lzo1x_999
7                        lzo             7                       lzo1x_999
8                        lzo             8                       lzo1x_999
9                        lzo             9                       lzo1x_999

What is libthreadar and why libdar relies on it?

libthreadar is a wrapping library around POSIX C threads. It was originally part of webdar, a libdar-based web server project, but as this code became necessary inside libdar too, all these thread-related classes have been put into a separate library called libthreadar, upon which both webdar and libdar rely today.

dar/libdar relies on libthreadar to manage several threads inside libdar, which is necessary to efficiently implement the remote repository feature based on libcurl (available starting with release 2.6.0).

Why not use the boost library or the thread support brought by C++11?

Because, first, no compiler implemented C++11 at the time webdar was started, and, second, boost threads were not found adapted to the need, for the following reasons:

libthreadar does all this and is a completely independent piece of software from both webdar and dar/libdar, so you can use it freely (LGPLv3 licensing) if you want. As with all the projects I have published, it is documented as much as possible; feedback is always welcome if something is not clear, wrong or missing.

libthreadar's source code can be found here; documentation is available in the source package as well as online here.

I have public key authentication working with ssh/sftp, how to have dar use this public key authentication for sftp too?

The answer is as simple as adding the following option when calling dar: -afile-auth

Why not do public key authentication by default and fall back to password authentication?

First, this is by choice, because -afile-auth also uses ~/.netrc even when using sftp. Second, it could be possible to first try public key authentication and fall back to password authentication, but it would require libdar to first connect, possibly fail if the public key was not provisioned or wrong, then connect again asking the user for a password on the command-line. It seems more efficient to do otherwise: file authentication when the user asks for it, password authentication else. The counterpart is not huge for the user (you can add -afile-auth to your ~/.darrc and forget about it).
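
For example, a minimal ~/.darrc making this the default for all commands (a sketch using the "all:" conditional target of dar's configuration file syntax):

all:
 -afile-auth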

I cannot get dar to connect to a remote server using SFTP, it fails with: SSL peer certificate or SSH remote key was not OK

This may be due to several well known reasons:

  1. the known_hosts file is not at its expected location (or is absent),
  2. the public key file of the user is not at its expected location,
  3. the private key file of the user is not at its expected location,
  4. the known_hosts file contains a host key of a type not supported by the libssh2 library dar relies on (see further below).

How to work around this?

For the first three cases, you can make use of environment variables to change the default behavior:

DAR_SFTP_KNOWNHOSTS_FILE
DAR_SFTP_PUBLIC_KEYFILE
DAR_SFTP_PRIVATE_KEYFILE

They respectively default to:

$HOME/.ssh/known_hosts
$HOME/.ssh/id_rsa.pub
$HOME/.ssh/id_rsa

Changing them according to your needs is done from the shell, before running dar; for example, if you use sh or bash:

export DAR_SFTP_KNOWNHOSTS_FILE=~/.ssh/known_hosts_alternative
# then use dar as expected
dar -c sftp://...
dar -t sftp://...

if you use csh or tcsh:

setenv DAR_SFTP_KNOWNHOSTS_FILE ~/.ssh/known_hosts_alternative
# then use dar as expected
dar -c sftp://...
dar -t sftp://...

For the fourth and last case, things are trickier:

First, if you don't already know what the known_hosts file is used for:

It is used by ssh/sftp to validate that the host you connect to is not a pirate host trying to put itself between you and the real sftp/ssh server you intend to connect to. Usually, the first time you connect to an sftp/ssh server, you need to validate the fingerprint of the key received from the server (checking by another means, like a phone call to the server's admin, https web browsing to the server's page, and so on). When you validate the host key this first time, a new line is added to the known_hosts file, so that the ssh/sftp client can automatically check, the next time you connect, that the host is still the correct one.

The known_hosts file is usually located in your home directory at ~/.ssh/known_hosts and looks like this:

asteroide.lan ecdsa-sha2-nistp256 AAAAE2V...
esxi,192.168.5.20 ssh-rsa AAAAB3N...
192.168.6.253 ssh-rsa AAAAB3N...

Each line concerns a different sftp/ssh server and contains three fields:

<hostname or IP>
this is the server we have already connected to
<host-key type>
this is the type of key
<key>
this is the public key the server has sent the first time we connected

We will focus on the second field.

dar/libdar relies on libcurl for networking protocol interactions, which in turn relies on libssh2. Before libssh2 1.9.0, only rsa host keys were supported, leading to this message as soon as the known_hosts file contained a non-rsa host key (even for another host listed in the known_hosts file than the one we intend to connect to). As of December 2020, while 1.9.0 has added support for additional host key types (ecdsa and ed25519), libcurl does not yet leverage this support and the problem persists. I'm confident that things will be updated for this problem to be solved in a few months.

In the meantime, several options are available to work around this limitation:

  1. Disable known_hosts checking by setting the environment variable DAR_SFTP_KNOWNHOSTS_FILE to an empty string. Libdar will then not ask libcurl/libssh2 to check hosts against the known_hosts file, but this is not a recommended option! It opens the door to man-in-the-middle attacks.
  2. Copy the known_hosts file to ~/.ssh/known_host_for_libssh2, remove from this copy all the lines corresponding to host keys that are not supported by libssh2, then set the DAR_SFTP_KNOWNHOSTS_FILE variable to point to that new file. This workaround is OK only if the unsupported host keys are not those of the servers you intend to have dar communicate with...
  3. Replace the host key of the ssh/sftp server by an ssh-rsa one. OK, this will most probably require root permission on the remote ssh/sftp server... which is not possible when using a public cloud service over the Internet.
Cannot open catalogue: Cannot handle such a too large integer. What to do?

Unless using dar/libdar built in 32 bit mode, you should not meet this error message unless exceeding the 64 bit integer limits. To know which integer type dar relies on (infinint, 32 bits or 64 bits), run dar -V and check the "Integer size used" line:

# src/dar_suite/dar -V

dar version 2.7.0_dev, Copyright (C) 2002-2020 Denis Corbin
Long options support         : YES

Using libdar 6.3.0 built with compilation time options:
   gzip compression (libz)      : YES
   bzip2 compression (libbzip2) : YES
   lzo compression (liblzo2)    : YES
   xz compression (liblzma)     : YES
   zstd compression (libzstd)   : YES
   lz4 compression (liblz4)     : YES
   Strong encryption (libgcrypt): YES
   Public key ciphers (gpgme)   : YES
   Extended Attributes support  : YES
   Large files support (> 2GB)  : YES
   ext2fs NODUMP flag support   : YES
   Integer size used            : 64 bits
   Thread safe support          : YES
   Furtive read mode support    : YES
   Linux ext2/3/4 FSA support   : YES
   Mac OS X HFS+ FSA support    : NO
   Linux statx() support        : YES
   Detected system/CPU endian   : little
   Posix fadvise support        : YES
   Large dir. speed optimi.     : YES
   Timestamp read accuracy      : 1 nanosecond
   Timestamp write accuracy     : 1 nanosecond
   Restores dates of symlinks   : YES
   Multiple threads (libthreads): YES (1.3.1)
   Delta compression (librsync) : YES
   Remote repository (libcurl)  : YES
   argon2 hashing (libargon2)   : YES

compiled the Jan 7 2021 with GNUC version 8.3.0
dar is part of the Disk Backup suite (Release 2.7.0_dev)
dar comes with ABSOLUTELY NO WARRANTY; for details type `dar -W'.
This is free software, and you are welcome to redistribute it under certain conditions; type `dar -L | more' for details.

If you read "infinint" and see the above error message from dar, thanks to report a bug this should never occur. Else the problem appear when using dar before release 2.5.13 either at backup creation time when dar met a file with a negative date, or at backup reading time, reading a backup generated by dar 2.4.x or older and containing a file with a very distant date in the future thing dar 2.4.x and below recorded when the system returned a negative date for a file to save.

What is a negative date? Dates of files are recorded in "unix" time, that is to say the number of seconds elapsed since the beginning of year 1970. A negative date thus means a date before 1970, which should normally not be met today, because the few computers that existed at that time had neither this way of storing dates, nor the same files and filesystems.

However, for some reasons, such negative dates can be returned by several operating systems (Linux based ones among others) and dar today does not have the ability to record them (if you need dar to store negative dates for a good reason, please file a feature request explaining why you need this feature).

Since release 2.5.13, when the system reports a negative date for a file to save, dar asks the user whether to consider the date as zero; this requires user interaction and may not fit all needs. For that reason, the -az option has been added to automatically assume negative dates read from the filesystem to be equal to zero (January 1st 1970, 00:00 GMT) without user interaction.
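For example (backup name and filesystem root are illustrative):

dar -c my_backup -R / -az [other options]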

I have a diff/incremental backup and I want to convert it to a full backup, how to do that?

It is possible to convert a differential backup if you also have the full backup it has been based on, in other words: the backup of reference. This is pretty simple to do:

dar -+ new_full_backup -A backup_of_reference -@ differential_backup full-from-diff [other options]
new_full_backup
is the backup that will be created, according to the other provided options (compression, encryption, slicing, hashing and so on, as specified in arguments).
backup_of_reference
is the full backup that was used as reference for the differential backup.
differential_backup
is the differential backup you want to convert into a full backup.

The important point is the last argument, "full-from-diff", which is defined in /etc/darrc and makes the merging operation used here (-+ option) work as expected: the resulting backup is the same as if a full backup had been made, instead of a differential one, at the time "differential_backup" was created.

For incremental backups (backups whose reference is not a full backup), you can also use this method, but you first need to create the full backup corresponding to the incremental/differential backup that served as reference for this incremental backup. The process should thus follow the same order that was used to create the backups, as illustrated below.
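For example, assuming incr_1 was based on full_0 and incr_2 was based on incr_1 (illustrative names), you would run:

dar -+ full_1 -A full_0 -@ incr_1 full-from-diff [other options]
dar -+ full_2 -A full_1 -@ incr_2 full-from-diff [other options]

full_2 then holds the same content as if a full backup had been made at the time incr_2 was created.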

How to use dar with tapes (like LTO tapes)?

dar (Disk Archive) was designed to replace tar (Tape archive) and leverage the direct access brought by disks, something tar was not able to use. A tape, by nature, does not allow jumping to a given position (or at least, skipping back and forth is so inefficient that it is barely used). That said, dar has also evolved to replace tar when it comes to using tapes (like LTO tapes) as backup media. The advantages of dar here are integrated ciphering, efficient compression (no need to compress already compressed files), resiliency, redundancy and CRC data protection, to name the most interesting features.

Backup operation

dar can produce a backup on its stdout, which can be piped or redirected to a tape device. That's easy:

dar -c - (other options)... > /dev/tape
dar -c - (other options) | some_command > /dev/tape

Things get more complicated when the backup exceeds the size of a single tape. For that reason, dar_split has been added to the suite of dar programs. Its purpose is to receive on its standard input the backup produced by dar, and write it to a given file until a write fails due to lack of space. At that time, it records what was written and what still remains to be written, closes the descriptor of the target file, displays a message and waits for the user to hit enter. Then it reopens the file and continues writing the pending data to it. In the meantime, the user is expected to have done what is necessary for further writes to this same file (or special device) to succeed, for example by replacing the tape with a new one rewound at its beginning, a tape that will receive the continuation of the dar backup:

dar -c - (other options)... | dar_split split_output /dev/tape

Testing operation

Assuming you have completed your backup over three tapes, you should now be concerned with testing that backup:

dar_split split_input /dev/tape | dar -t - --sequential-read

Before running the previous command, you should have rewound all your tapes to the offset they had when you used them to write the dar backup (their beginning, most of the time). The first tape should be inserted in the drive, ready for reading. Neither dar nor dar_split knows about the location of the data on tape; they will not seek the tape forward or backward, they will just read (or write, depending on the requested operation) sequentially.
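Rewinding is usually done with the standard mt utility (device name is illustrative):

mt -f /dev/tape rewind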

When dar_split reaches the end of the tape while reading, the process pauses and lets you swap the tape for the following one. You can also take the time to rewind the tape before swapping it, if you want. Once the next tape is ready in the drive and set at the proper offset, just hit enter in the dar_split terminal for the process to continue.

At the end of the testing, dar will report the backup status (hopefully the backup test will succeed), but dar_split does not know anything about that and keeps trying to provide data to dar, so you will have to hit CTRL-C to stop it.

To avoid stopping dar_split by hand, you can indicate to dar_split the number of tapes used for the backup, by means of its -c option. If, at backup time, you wrote an EOF tape mark after the last tape (mt -f /dev/tape weof), then dar_split will stop by itself after that number of tapes. In our example, the backup expanded over three tapes, hence the -c 3 option:

dar_split -c 3 split_input /dev/tape | dar -t - --sequential-read

Listing operation

Listing operation can be done the same way as the testing operation seen above, just replacing -t by -l:

dar_split split_input /dev/tape | dar -l - --sequential-read

But what a pity not to use the isolated catalogue feature! Catalogue isolation lets you keep on disk (not on tape) a small file containing the table of contents of the backup. Such a small backup can be used as a rescue for the internal catalogue of the backup (which resides on tape), to recover from a corruption of that part of the backup (this gives an additional level of protection for the backup metadata). It can also be used for listing the backup contents, it can be provided to dar_manager, and, most interestingly, it can be used as reference for incremental or differential backups, in place of reading the reference backup contents from tapes.

Assuming you did not create an isolated catalogue at the time of the backup, let's do it once the backup has been written to tape:

dar_split split_input /dev/tape | dar -A - --sequential-read -C isolated -z

This will lead dar to read the whole backup. It is thus more efficient to create the isolated catalogue "on-fly", that is, during the backup creation process, in order to avoid this additional reading operation:

dar -c - --on-fly-isolate isolated (other options)... | dar_split split_output /dev/tape

You will get a small isolated.1.dar file (you can of course replace isolated, after the -C or --on-fly-isolate options, by a more meaningful name), located in the current directory by default, while your backup is sent to tapes as already seen earlier.

The isolated catalogue can now be used in place of the backup on tapes; listing the backup contents becomes much faster:

dar -l isolated (other options like filters)...

Restoration operation

You can perform a restoration the same way we did the backup testing above, just replacing -t by -x:

dar_split split_input /dev/tape | dar -x - --sequential-read (other options like --fs-root and so on)

But it is better to leverage an isolated catalogue, in particular if you only plan to restore a few files. Without an isolated catalogue, dar has to read the whole backup up to its end (as tar does, though for other reasons) to reach the internal catalogue, which contains additional information (like the files that have been removed since the backup of reference was made). Using an isolated catalogue avoids that and lets dar stop reading earlier, that is to say, once the last file to restore has been read from the backup. So if this file is located near the beginning of the backup, you can save a lot of time using an isolated catalogue!

dar_split split_input /dev/tape | dar -x - -A isolated --sequential-read (other options, like --fs-root and so on)
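For instance, to restore a single file into a scratch directory (paths and names are illustrative):

dar_split split_input /dev/tape | dar -x - -A isolated --sequential-read -R /tmp/restore -g home/joe/report.txt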

Rate limiting

It is sometimes necessary to rate limit the output from and to tapes. dar_split has a -r option for that purpose:

dar_split -r 10240000 split_input /dev/tape | dar ...
dar ... | dar_split -r 20480000 split_output /dev/tape

The argument to the -r option is expected in bytes per second (10240000 corresponds to roughly 10 MiB/s).

Block size

Some tape devices do not behave well when the data requested from or sent to them uses too large blocks at once. Usually the operating system knows about that and splits application-provided data into smaller blocks if necessary. When this is not the case, use the -b option, which receives the maximum block size in bytes that dar_split will use. The block size used when writing may differ from the one used at reading time; neither must exceed the block size supported by the tape device:

dar_split -b 1048576 split_input /dev/tape | dar ...
dar ... | dar_split -b 1048576 split_output /dev/tape

Differential and incremental backups

Differential and incremental backups are built the same way: by providing the backup of reference at backup creation time, by means of dar's -A option. One could use dar_split twice for that: once to read the backup of reference from a set of tapes, an operation that precedes the backup itself, then a second dar_split command to send the new backup to tapes... The problem is that the second command would open the tape device for writing, while the device first has to be open for reading by the first dar_split command, in order to fetch the backup of reference.

Thus, in this context, we have no choice (unless we have two tape drives): we must rely on an isolated catalogue of the backup of reference:

dar -c - -A isolated_cat_of_ref_backup (other options)... | dar_split split_output /dev/tape
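To prepare the next backup of the chain, this can be combined with the on-fly isolation seen above (catalogue names are illustrative):

dar -c - -A isolated_cat_of_ref_backup --on-fly-isolate isolated_cat_of_new_backup (other options)... | dar_split split_output /dev/tape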

dar_split and tar

dar_split is, by design, a command separate from dar. You can thus use it with commands other than dar; in particular, yes, you can use it with tar, if you don't want to rely on the additional features and resiliency dar provides.
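For example, with tar over the same tape workflow (path is illustrative):

tar -cf - /some/dir | dar_split split_output /dev/tape
dar_split split_input /dev/tape | tar -tvf -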

Why dar does not compress small files together for better compression ratio?

Since around year 2010, this is a question/suggestion/remark/review that has haunted the dar-support mailing-list and the feature requests, resurrecting from time to time: why does dar not compress small files together in the dar archive, for better compression, like tar (its grand and venerable brother) does?

First point to note: tar does not compress at all. It is gzip, bzip2, xz or other similar programs that take as unstructured input what tar outputs, in order to produce an unstructured compressed data stream redirected into a file.

It would be tempting to answer: "You can do the same with dar!", but there are better things to do, read below.

But before, let's recall dar's design and objectives: dar compresses each file separately and records the archive contents in a dedicated catalogue, instead of compressing one unstructured data stream.

Doing so has several advantages: robustness to data corruption, efficient directory seeking, and fast access to any file's data.

dar works that way because tar's way was not addressing these major concerns in the backup area. Yes, this has the drawback of degrading the compression ratio, but this is a design choice.

Now, looking for the best of both approaches, some proposed to gather small files and compress them together. This would not only break the three advantages exposed above, but also break another feature, which is the order in which files are stored: dar does not inspect the same directory twice, neither at backup time nor at restoration time. Doing so avoids saving the full path of each directory and file (and at two places: in-line metadata and in the catalogue). This also leads to better performance, as it better leverages the disk cache for metadata (directory contents). OK, one could say that today, with SSD and NVMe, this is negligible, but that would ignore that direct RAM access from cache is still much faster than any NVMe disk access.

So, if you can't afford keeping small files uncompressed (see dar's --mincompr, -X and -I options for example), or if compressing them with dar versus what tar does makes so big a difference that compressing them together is worth considering, you have three options:

  1. use tar in dar

    • make a tar archive of the many small files you have, just a plain tar file, without compression. Note: you can automate this when entering particular directory trees of your choice by means of the -<, -> and -= options, and remove those temporary tar files when dar exits those directories at backup time. You would also have to exclude the files that were used to build the tar file you created dynamically (see the -g/-P/-X/-I/-[/-] options). A minimal sketch of this approach is given after this list.
    • Then let dar perform the backup, compressing those tar files along with the other files, if they satisfy the --mincompr size or any other filtering of your choice (see the -Z and -Y options). Doing so lets you leverage the parallel compression and reduced execution time brought by dar, something you cannot have with tar alone.
    • Of course, you also benefit from all other dar features (slicing, ciphering, on-fly slice hashing, isolated catalogues, differential/incremental/decremental backups... and even delta binary!)

    But yes, you will lose the three dar advantages seen above, though just for those small files you have gathered in a tar-in-dar file, not for the rest of what's under backup.

  2. use tar alone

    If dar does not match your needs and/or if you do not need to leverage any of the three dar advantages seen above, tar is probably a better choice for you. That's a pity, but no single tool matches all needs...

  3. describe in detail a new implementation/enhancement

    The proposal should take into account dar's design objectives (robustness to data corruption, efficient directory seeking, fast access to any file's data) in one way or another.

    But please, do not make an imprecise proposal that assumes it will just "magically" work: I only like magic when I go to a magic show ;)

    Thanks for detailing both the backup and restoration processes. Oftentimes, pulling out the missing details one after the other results in something unfeasible, or with unexpected complexity and/or much less gain than expected. Also look at the Dar Archive Structure to see how your proposal could fit, or if not, what part should be redesigned and how.
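As announced in option 1 above, here is a minimal sketch of the tar-in-dar approach (all paths and names are illustrative, and the tar file is built by hand rather than through the -<, -> and -= automation):

tar -cf /data/thumbnails.tar -C /data thumbnails
dar -c my_backup -R /data -P thumbnails [other options]

The thumbnails directory (full of small files) ends up as a single uncompressed tar file that dar will compress as a whole, while -P excludes the original directory from the backup.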