How to Do this List Comparison with Find?

I run the following and get a list of files, among which are many duplicate backup files that I would like to delete:

find . -type f -name '._*' 

I would like to find those files that have a corresponding filename without the ._ prefix in the same directory, for example:

  • /home/masi/._test.tex matches /home/masi/test.tex
  • /home/masi/math/lorem.png matches /home/masi/math/._lorem.png

Pseudocode for the files I want to keep: every filename that has a corresponding ._filename, and also every filename that has no ._filename

find . -type f -name '._*' -exec \ 
   find filenameWithoutDotUnderscore, if yes, print the filename

Pseudocode 2, clarifying which files I want to remove: ._filename, if there is a corresponding filename

  • If there is a filename and a ._filename in the same directory, print ._filename so that I can remove it as the duplicate.
  • Exclude filenamePart1_.filenamePart2, bok_3A.pdf, … in ._filename.
  • Do not remove ._filename if there is no corresponding filename in the same directory.

Reviewing Wildcard’s command

I ran

find . -type f -name '._*' -exec sh -c 'for a; do f="${a%/*}/${a##*/._}"; [ -e "$f" ] && printf "rm -- %s\n" "$a"; done' find-sh {} +

but it returns too many files. I think I need more && conditions besides the existence check ([ -e "$f" ]). It would be great to add some content comparison here and, as a last step, a diff when the two files are suspected to differ substantially.

Systems: Ubuntu 16.04 and Debian 8.25
Bash: 4.3.42
Find: 4.7.0

Solution 1

You can do this with find, but to do it robustly you will need to embed a shell one-liner as well. The proper way to do this is one of the following:

Stuff the looping into the spawned shell:

find . -type f -name '._*' -exec sh -c 'for a in "$@"; do f="${a%/*}/${a##*/._}"; [ -e "$f" ] && printf %s\\n "$f"; done' find-sh {} +

Or, spawn a separate shell for each file to be tested (less efficient, potentially more readable):

find . -type f -name '._*' -exec sh -c 'f="${1%/*}/${1##*/._}"; [ -e "$f" ] && printf %s\\n "$f"' find-sh {} \;
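The core of both commands is the pair of parameter expansions that turns the path of a ._ file into the path of its counterpart. A minimal sketch of that step in isolation, using a hypothetical path (the variable names dir and base are illustrative and do not appear in the commands above):

```shell
# Hypothetical path, used only to show the two expansions.
a='./math/._lorem.png'

dir=${a%/*}       # remove the shortest suffix matching "/*"   -> ./math
base=${a##*/._}   # remove the longest prefix matching "*/._"  -> lorem.png

f="$dir/$base"    # same value as "${a%/*}/${a##*/._}"
printf '%s\n' "$f"   # prints ./math/lorem.png
```

Because find . always produces paths that contain at least one slash, the ${a%/*} expansion always has a directory part to keep.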

To remove the backup files directly, first change the command to the following for a dry run, which prints an rm command for each ._ file instead of running it:

find . -type f -name '._*' -exec sh -c 'for a; do f="${a%/*}/${a##*/._}"; [ -e "$f" ] && printf "rm -- %s\n" "$a"; done' find-sh {} +

Then once you’re satisfied with the list of commands that gets printed, use:

find . -type f -name '._*' -exec sh -c 'for a; do f="${a%/*}/${a##*/._}"; [ -e "$f" ] && rm -- "$a"; done' find-sh {} +
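Before running either command on real data, the dry run can be tried in a scratch directory. The sketch below assumes mktemp -d is available (it is on Ubuntu and Debian, but it is not required by POSIX), and the three file names are made up for the test:

```shell
# Scratch directory with one ._ file that has a counterpart and one that does not.
tmp=$(mktemp -d)
touch "$tmp/test.tex" "$tmp/._test.tex" "$tmp/._orphan.txt"

# Dry run from the answer. find can exit non-zero when the last [ -e ] test
# in a batch fails, so that status is ignored here.
out=$(cd "$tmp" && find . -type f -name '._*' -exec sh -c 'for a; do f="${a%/*}/${a##*/._}"; [ -e "$f" ] && printf "rm -- %s\n" "$a"; done' find-sh {} +) || true

printf '%s\n' "$out"   # prints: rm -- ./._test.tex

# Clean up the scratch directory.
rm -rf -- "$tmp"
```

Only ._test.tex is listed, because ._orphan.txt has no orphan.txt next to it.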

Notes:

In all of these, the find-sh argument is an arbitrary string; you could put anything there. It gets set as $0 within the spawned shell and is used for error reporting.

for a in "$@"; do is exactly equivalent to for a; do.

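printf is better than echo: depending on the shell, echo may treat some arguments as options or interpret backslash escapes, whereas printf with a %s format prints its argument verbatim.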

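Quoting is important: without the double quotes around "$a" and "$f", filenames containing spaces or glob characters would be split or expanded by the shell.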
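The notes about $0 and about the two loop forms can be checked without find. This is a small sketch in which the arguments after find-sh are hypothetical file names passed by hand rather than by find:

```shell
# The first word after the script string becomes $0; the rest become $1, $2, ...
zero=$(sh -c 'printf "%s" "$0"' find-sh ./._a ./._b)

# Both loop forms walk the positional parameters one by one,
# keeping an argument that contains a space intact.
explicit=$(sh -c 'for a in "$@"; do printf "[%s]" "$a"; done' find-sh './._a b' ./._c)
implicit=$(sh -c 'for a; do printf "[%s]" "$a"; done' find-sh './._a b' ./._c)

printf '%s\n' "$zero" "$explicit" "$implicit"
# prints:
# find-sh
# [./._a b][./._c]
# [./._a b][./._c]
```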

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
