Often I find myself poring over data files, usually logs, large program output, or data sets, trying to find any differences, if they exist. Years ago, the way to find differences between similar files was to open them in a text editor and scroll them on the screen side by side, if possible. It was not a very accurate method, and it was seriously hard on the eyes.

I’ll start with the simplest and most prosaic of comparison tools, the diff command. Comparing two files with diff is pretty easy. The command would be:

diff file1 file2

Many people find the output of diff confusing, as any differences are shown rather cryptically with < and > markers, sets of numbers and letters, and so on. Lines marked with < come from the first file and lines marked with > come from the second. The output is not necessarily designed to be human-friendly. It is designed for patching files: it is really a set of instructions that the patch command follows when the patch is applied. Explaining this in a posting of this length without putting people to sleep is not really possible, so for a more detailed view of these instructions, visit the GNU help pages for comparing and merging files.
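To make that output a little less cryptic, here is a minimal sketch, assuming GNU diff and two throwaway files created in a temporary directory. The file contents and the workdir and changes variable names are made up for illustration.

```shell
# Create two small sample files in a scratch directory (contents are invented).
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/file1"
printf 'apple\nblueberry\ncherry\n' > "$workdir/file2"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a strict shell (set -e) from stopping here.
changes=$(diff "$workdir/file1" "$workdir/file2") || true
echo "$changes"
# 2c2          <- line 2 of file1 must be (c)hanged to match line 2 of file2
# < banana     <- the line as it appears in file1
# ---
# > blueberry  <- the line as it appears in file2
```

Saving that output to a file, say changes.diff, and running patch file1 < changes.diff should then turn file1 into a copy of file2, which is the job the format was built for.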

My favorite method for comparing files is vimdiff, which starts Vim (Vi IMproved) in its diff mode. You run the vimdiff command with two or more filenames as arguments. For example, if I had file1 and file2 to compare, I would execute the following command:

vimdiff file1 file2

This opens Vim with two windows, vertically separated, making it easy to visually compare the two files. If you scroll the first file, the second scrolls in lockstep, so you can see the changes side by side. You can switch between file1 and file2 by pressing Ctrl-w and then w, and the easiest way to quit all the files is to hit ESC and then type :qall.

GUI tools abound as well, the most common of which seems to be Meld, which you can read more about in this Linux.com article. Other options include Diffuse, a graphical tool that does similar things to Meld, and Directory Synchronise, another tool of the same type and style.

Of course, this doesn’t include tools like uniq, which goes over a file and discards all but a single instance of any line that repeats. It only compares adjacent lines, so the file needs to be sorted first to group any exact matches together. The resulting output is sent to standard out, typically the console. This doesn’t tell you the differences between files, but it’s seriously useful.
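Here is a quick sketch of how sort and uniq work together, assuming a small made-up log file. The file name log.txt and the workdir and deduped variable names are only for illustration.

```shell
# Build a tiny sample file with repeated, non-adjacent lines.
workdir=$(mktemp -d)
printf 'error\nok\nerror\nok\n' > "$workdir/log.txt"

# uniq on its own only collapses repeats that sit next to each other,
# so all four lines come back unchanged here.
uniq "$workdir/log.txt"

# Sorting first groups the duplicates, and uniq then keeps one of each.
deduped=$(sort "$workdir/log.txt" | uniq)
echo "$deduped"
# error
# ok

# The -c option prefixes each surviving line with its occurrence count.
sort "$workdir/log.txt" | uniq -c
```

If you only need the unique lines and not the counts, sort -u does the same job in one step.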

Got a fave tool that you use to compare files or directories? Post a comment and if we update the story with it, you’ll get a shout-out/mention and some good karma, probably.

RossB

P.S. thehoagie commented that the Trac tool is a great way to see differences in code; see a demo here.
