Someone once asked me whether I could write a file deduplication script in shell.
It is not very hard, and it is a curious problem, so I am publishing my code below. I wonder whether it could be made even simpler...
#!/bin/bash
# Quote "$1" so that directory names containing spaces work; stop if cd fails.
[ ! -d "$1" ] && echo "$1 is not a directory! exit" >&2 && exit 1
cd "$1" || exit 1
oldsize="yyyyy";oldname="xxxxx"
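# List every regular file as "size:path", largest first, so that files of equal size are adjacent.
# Note: -printf is a GNU find extension; unlike parsing -ls with awk, it keeps spaces in names intact.
find . -type f -printf '%s:%p\n' | sort -t: -k1,1nr | while IFS= read -r line; do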
# Split "size:name" at the first colon.
size=${line%%:*}
name=${line#*:}
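# Same size as the previous file and byte-for-byte identical: replace the copy with a hard link.
# cmp -s compares silently; diff -s would print a report (or a full diff) for every pair.
if [ "$oldsize" = "$size" ] && [ -f "$name" ] && [ -f "$oldname" ] && cmp -s "$oldname" "$name"; then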
rm -f "$name"
ln "$oldname" "$name"
continue
fi
oldsize="$size"
oldname="$name"
done
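
To check what the script did, one can test whether two paths now point at the same inode. Here is a rough sketch, assuming bash and its `-ef` test operator. The helper name `same_inode` and the file names are made up for illustration and are not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when both paths refer to the same inode,
# i.e. they are hard links to one file.
same_inode() {
    [ "$1" -ef "$2" ]
}

# Demo in a throwaway directory (file names are illustrative only).
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/a"
printf 'hello\n' > "$tmp/b"

# Two separate files with identical content.
if same_inode "$tmp/a" "$tmp/b"; then before=linked; else before=separate; fi

# The same replacement the script performs for a duplicate.
rm -f "$tmp/b"
ln "$tmp/a" "$tmp/b"

if same_inode "$tmp/a" "$tmp/b"; then after=linked; else after=separate; fi

echo "before: $before, after: $after"
rm -rf "$tmp"
```

On a filesystem that supports hard links, this should report the two files as separate before the replacement and linked afterwards.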