Why Worry About It?

Backups are essential, and so is the reducing the time needed to perform those backups. Many’s the time I have sat waiting for a backup to complete only to remember that I had a link to a large set of files, or a bunch of ISO files in the ./download directory, and had to migrate those over to somewhere else and restart the backup.

If you are like me, data = “files that contain data of original or irreplaceable content”. I don’t want to backup ISO files, large sets of files that can be gotten from an install DVD or things that are easy to download from a site somewhere.

I use a simple (yeah, it really is), script before every backup to find all the files over a particular size, which I then can so anything I want with. If I find anything that’s too large and expendable, I either use an -exclude statement (usually in the case of all ISO’s) or even move the files elsewhere quickly by re-running the script and tacking on a -exec statement.

The Script

Here is the script I use, it’s from a bunch of different sources, and uses a couple of useful tools to do it’s work:


echo "Enter the fully-qualified start path"
read start_path
echo "Enter the lower size limit in Megabytes"
read lower_size
find $start_path \( -size +"$lower_size"M -fprintf ~/Desktop/bigfiles.txt '%kk %p\n' \)

Fables of the Deconstruction


The first line is where you declare what shell you want to run this script with. This string is known as the “shebang”, not absolutely necessary since it defaults to the bash shell anyway, but it’s certainly good form.

echo "Enter the fully-qualified start path"
read start_path

Lines 3 and 4 work together, prompting you to enter the fully-qualified start path and then storing what you enter in the newly-created variable named start_path. This is expanded in Line 7 by referring to it’s name $start_path.

echo "Enter the lower size limit in Megabytes"
read lower_size

The same arrangement occurs with lines 5 and 6, you’re prompted to enter the smallest size in Megabytes you want to report on, which stores that in the newly-created variable named lower_size. This too is expanded in Line 7 with the name $lower_size.

All Together Now

Line 7 is where all the fun stuff happens. First you are using the find command, not the easiest thing for newcomers, but well worth, ahem, “finding” out more about. Find requires several things, shown below:

find (path) (-option) (expression)

We’re using the start_path variable as the (path), then we include a function (sort of a macro) that looks for files of a size that is at least the value of the lower_size variable we set and populated earlier. Then when it finds each file over that size, it will print out the file size in 1K blocks, followed by the LETTER k, so it’s obvious, and then the full path and name of the file that has been found. This will all then be output to a file named bigfiles.txt in the current user’s Desktop folder.

Note: The use of the tilde (technical name: squiggle 8-> ) character in a command means to expand the current user’s $HOME variable from the executing shell, so the full path of the bigfiles.txt file if rossb is running the script is:


Running the Script

Executing scripts that aren’t in your path (the variable, not the physical directory) is different on Linux/Unix, either you’ll use this script as a parameter to the bash shell:

# /bin/bash findbigfiles.sh

Or you’ll use the following command to set the script to be executable:

# chmod +x findbigfiles.sh

Then when it’s set to executable, you’ll either need to put it in your path, (try /usr/local/bin) or execute it by preceding it with the characters “./”, which is necessary to execute something in the local directory if it’s not in the path:

# ./findbigfiles.sh


There are so many other things you can do with find, such as tack on a -exec statement and execute a command on each and every file found, or find and act on files that meet a particular permission set, the possibilities are nearly endless.

Of particular help in my work over the years with the find command has been the find man page, with it’s useful examples and Chapter 14 “Finding Files with Find” of the Unix Power Tools 2nd Edition from O’Reilly and Associates. (I know the 2nd Edition is out of print, but I don’t care much for the 3rd Edition’s updates).

Let us know in the comments what cool find scripts you have come up with, the randomly drawn winner will get a very cool Novell-Candy-Apple-Red 9 LED flashlight. (Sorry, Continental U.S. only).