Monday, May 29, 2017

File deduplication written in bash

Once a guy asked me whether I could write a file deduplication script in shell.

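It is not very hard, but it is an interesting problem, so I am publishing my code here. The script lists all files sorted by size, compares each file with the previous one of the same size, and replaces identical copies with hard links to a single file: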

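#!/bin/bash
# Replace duplicate files under directory $1 with hard links to a single copy.
dir=$1
[ -d "$dir" ] || { echo "$dir is not a directory! exit" >&2; exit 1; }
cd "$dir" || exit 1
oldsize=""; oldname=""
# List "size:name" for each regular file (GNU find -printf), largest first, so
# files of equal size end up next to each other. -xdev stays on one file system,
# because hard links cannot cross file systems.
find . -xdev -type f -printf '%s:%p\n' | sort -t: -k1,1nr | while IFS= read -r line; do
  size=${line%%:*}   # part before the first colon
  name=${line#*:}    # part after the first colon (names may contain colons)
  # Same size as the previous file, not already linked, and byte-for-byte identical.
  if [ "$oldsize" = "$size" ] && [ -f "$name" ] && [ -f "$oldname" ] &&
     ! [ "$name" -ef "$oldname" ] && cmp -s "$oldname" "$name"; then
    rm -f "$name"
    ln "$oldname" "$name"
    continue         # keep $oldname as the copy the next duplicate links to
  fi
  oldsize=$size
  oldname=$name
done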
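Each size:name line is split at the colon with bash parameter expansion, and the prefix forms are easy to mix up. A quick standalone check on a made-up line (the path is hypothetical) shows what each expansion returns; ${line##:*} removes nothing, because its pattern has to match starting at the first character, which here is a digit.

```bash
#!/bin/bash
# Hypothetical sample line in the "size:name" form the script reads.
line='4096:./docs/a:b.txt'
size=${line%%:*}        # drop the longest suffix matching ":*"  -> 4096
name=${line#*:}         # drop the shortest prefix matching "*:" -> ./docs/a:b.txt
unchanged=${line##:*}   # ":*" must match from the first character; it is a digit, so nothing is removed
printf '%s\n' "$size" "$name" "$unchanged"
```

Splitting at the first colon keeps file names that contain colons intact, since the size field never contains one.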
I wonder whether it could be made even simpler...
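One possible direction, sketched under several assumptions: instead of comparing each file only with the previous file of the same size (which misses two identical files when a different file of that size sorts between them), group files by their SHA-256 digest. The helper name dedup_by_hash is hypothetical, and the sketch assumes bash 4 or newer (associative arrays), sha256sum from GNU coreutils, and file names without newlines or backslashes, which sha256sum escapes.

```bash
#!/bin/bash
# Sketch only: link every copy of the same content to the first file seen
# with that SHA-256 digest. Assumes bash 4+, GNU sha256sum, and file names
# without newlines or backslashes.
dedup_by_hash() {              # hypothetical helper name
  local dir=$1 line sum name
  local -A keep                # digest -> first file seen with that digest
  [ -d "$dir" ] || { echo "$dir is not a directory! exit" >&2; return 1; }
  while IFS= read -r line; do
    sum=${line%% *}            # sha256sum prints "<digest>  <name>"
    name=${line#*  }           # everything after the two separating spaces
    if [ -z "${keep[$sum]}" ]; then
      keep[$sum]=$name         # first file with this content: keep it
    elif ! [ "$name" -ef "${keep[$sum]}" ] && cmp -s "${keep[$sum]}" "$name"; then
      ln -f "${keep[$sum]}" "$name"   # replace the duplicate with a hard link
    fi
  done < <(find "$dir" -xdev -type f -exec sha256sum {} +)
}
```

Call it as dedup_by_hash /some/dir. It is not much shorter than the original, so whether it counts as simpler is debatable: it reads every file in full, while the size-based script only compares files whose sizes match. The cmp -s call guards against the practically impossible case of two different files sharing a digest.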