Delian's Tech blog: May 2017

Monday, May 29, 2017

File deduplication written in bash

Someone once asked me whether I could write a file deduplication script in shell.

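It is not very hard, but it is a curious problem, so I am publishing my code here. The script takes a directory as its only argument, looks for files of the same size with identical contents, and replaces each duplicate with a hard link to one copy: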

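#!/bin/bash
# Replace duplicate files under a directory with hard links to a single copy.
# Assumes GNU find (-printf) and file names without newlines.
[ -d "$1" ] || { echo "$1 is not a directory! exit" >&2; exit 1; }
cd "$1" || exit 1
oldsize=""; oldname=""
# Print "size:path" for every regular file on this filesystem (hard links cannot
# cross filesystems), largest first, so files of equal size end up adjacent.
find . -xdev -type f -printf '%s:%p\n' | sort -t: -k1,1nr | while IFS= read -r line; do
    size=${line%%:*}    # text before the first ':'
    name=${line#*:}     # text after the first ':' (the path itself may contain ':')
    if [ "$oldsize" = "$size" ] && [ -f "$name" ] && [ -f "$oldname" ] && cmp -s "$oldname" "$name"; then
        # same size and identical bytes: replace the duplicate with a hard link
        rm -f "$name"
        ln "$oldname" "$name"
        continue
    fi
    oldsize=$size
    oldname=$name
done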
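The loop splits each `size:path` record with bash parameter expansion. The snippet below, using a made-up sample record, shows what each expansion yields. Note that `${line##:*}` leaves the line unchanged, because the pattern `:*` must match at the very start of the string, which begins with a digit.

```shell
#!/bin/bash
# Illustration only: split a hypothetical "size:path" record.
line='1048576:./photos/2017:05/img 01.jpg'
size=${line%%:*}   # drop the longest suffix matching ":*" -> text before the first ':'
name=${line#*:}    # drop the shortest prefix matching "*:" -> text after the first ':'
wrong=${line##:*}  # ":*" cannot match at the start, so nothing is removed
echo "$size"       # 1048576
echo "$name"       # ./photos/2017:05/img 01.jpg
echo "$wrong"      # 1048576:./photos/2017:05/img 01.jpg
```

Using `#*:` (shortest match) for the path keeps any later colons, such as the one in `2017:05`, inside the file name.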

I wonder whether it could be made even simpler...
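One possible direction is to group files by content hash instead of by size. This is a sketch under stated assumptions: GNU `find`, `xargs` and `sha256sum`, and file names without backslashes or newlines (`sha256sum` escapes those, which would shift the fixed offsets used below). The function name `dedup_by_hash` is hypothetical. Because sorting the hash list puts all identical files next to each other, it also catches a case that comparing each file only with its predecessor misses: three equal-size files A, B, C where only A and C match.

```shell
#!/bin/bash
# Hypothetical helper: hard-link files whose SHA-256 digests (and bytes) match.
# Assumes GNU find/xargs/sha256sum and names without backslashes or newlines.
dedup_by_hash() {
    local dir=$1 prev_hash="" keep="" line hash path
    [ -d "$dir" ] || { echo "$dir is not a directory" >&2; return 1; }
    while IFS= read -r line; do
        hash=${line:0:64}   # sha256sum prints 64 hex digits,
        path=${line:66}     # then two separator characters, then the path
        # Same digest, not already the same inode, and byte-for-byte equal
        # (cmp guards against a hash collision): link it to the kept copy.
        if [ "$hash" = "$prev_hash" ] && ! [ "$keep" -ef "$path" ] && cmp -s "$keep" "$path"; then
            ln -f "$keep" "$path"
        else
            prev_hash=$hash
            keep=$path
        fi
    done < <(find "$dir" -xdev -type f -print0 | xargs -0 -r sha256sum | sort)
}
```

Usage would be `dedup_by_hash /some/dir`. The trade-off is that every file gets read in full to compute its hash, whereas the size-based loop only reads files whose size matches a neighbour.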