How to remove all whitespace between brackets [] using Unix tools?

Replace text between brackets

Input

testing on Linux [Remove white space] testing on Linux

Output

testing on Linux [Removewhitespace] testing on Linux

So, how can we just remove all the white space between the brackets and achieve output as given?

Here are the solutions:


Solution 1

If the [, ] are balanced and not nested, you could use GNU awk as in:

gawk -v RS='[][]' '
   NR % 2 == 0 {gsub(/\s/,"")}
   {printf "%s", $0 RT}'

That is, use [ and ] as the record separators instead of the newline character, and remove blanks only in every other record (the even-numbered ones, which are the text inside the brackets).
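To make the record splitting visible, here is a minimal sketch that prints each record together with the separator that ended it. The function name show_records is only an illustration, and the sketch assumes GNU awk is installed, because it relies on the RT variable.

```shell
# show_records is a hypothetical helper, not part of the answer above.
# It assumes gawk: RT (the text that matched RS) is a GNU awk extension.
show_records() {
  gawk -v RS='[][]' '{ printf "%d:%s|%s\n", NR, $0, RT }'
}
```

Running printf 'a [b c] d' | show_records should list three records: "a " ended by [, "b c" ended by ], and " d" ended by nothing. Only record 2 lies inside the brackets, which is why the command above edits the even-numbered records.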

With sed, with the additional requirement that there be no newline character inside [...]:

sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
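The loop in that sed command may be easier to follow with a small run on the sample input from the question. This is only a sketch: strip_flat is a made-up wrapper name, and it assumes brackets that are balanced, not nested, and closed on the same line.

```shell
# strip_flat is a hypothetical wrapper around the sed command above.
# Each pass of the s command removes one whitespace character per [...]
# group (the last one, because [^]]* is greedy); "t1" jumps back to
# label 1 until a pass changes nothing.
strip_flat() {
  sed -e :1 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t1'
}

printf '%s\n' 'testing on Linux [Remove white space] testing on Linux' | strip_flat
# prints: testing on Linux [Removewhitespace] testing on Linux
```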

If they are balanced but may be nested, as in blah [blih [1] bluh] asd, then you could use perl's recursive regexp operators like:

perl -0777 -pe 's{(\[((?:(?>[^][]+)|(?1))*)\])}{$&=~s/\s//rsg}gse'

Another approach, which would scale to very large files, would be to use the (?{...}) perl regexp operator to keep track of the bracket depth, as in:

perl -pe 'BEGIN{$/=\8192}s{((?:\[(?{$l++})|\](?{$l--})|[^][\s]+)*)(\s+)}
  {"$1".($l>0?"":$2)}gse'

Actually, you can also process the input one character at a time like:

perl -pe 'BEGIN{$/=\1}if($l>0&&/\s/){$_=""}elsif($_ eq"["){$l++}elsif($_ eq"]"){$l--}'

That approach can be implemented with POSIX tools:

od -A n -vt u1 |
  tr -cs 0-9 '[\n*]' |
  awk 'BEGIN{b[32]=""; b[10]=""; b[12]=""} # add more for every blank
       !NF{next}; l>0 && $0 in b {next}
       $0 == "91" {l++}; $0 == "93" {l--}
       {printf "%c", $0}'
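As a usage sketch, the same pipeline can be wrapped in a function and tried on the nested example. The name strip_nested_posix is hypothetical, the list of whitespace byte values (9 to 13 and 32) is an assumption about what should count as blank, and the sketch has only been reasoned through for ASCII input.

```shell
# strip_nested_posix is a hypothetical wrapper around the od | tr | awk pipeline.
strip_nested_posix() {
  od -A n -vt u1 |
    tr -cs 0-9 '[\n*]' |
    awk 'BEGIN { b[9]=""; b[10]=""; b[11]=""; b[12]=""; b[13]=""; b[32]="" }
         !NF { next }                      # skip the empty line tr may emit first
         l > 0 && $0 in b { next }         # drop whitespace while inside brackets
         $0 == "91" { l++ }                # [ opens a level
         $0 == "93" { l-- }                # ] closes a level
         { printf "%c", $0 + 0 }'          # turn the byte value back into a character
}

printf '%s\n' 'blah [blih [1] bluh] asd' | strip_nested_posix
```

Because the input is handled byte by byte rather than line by line, a newline inside the brackets is removed as well.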

With sed, handling nested brackets as well (assuming no newline inside the [...]):

sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
    -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
    -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
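The escaping trick in that command is easier to see on a concrete line. In this sketch, strip_nested is a made-up wrapper name; the comments describe what each stage does to the nested example used earlier.

```shell
# strip_nested is a hypothetical wrapper around the sed command above.
# 1. s/_/_u/g         escapes literal underscores so _o and _c are free to use
# 2. loop at label 1  rewrites each inner [ ... ] pair as _o ... _c
# 3. loop at label 2  strips whitespace from the now flat [ ... ] groups
# 4. last expression  restores ], [ and _ from _c, _o and _u
strip_nested() {
  sed -e 's/_/_u/g;:1' -e 's/\(\[[^][]*\)\[\([^][]*\)]/\1_o\2_c/g;t1' \
      -e :2 -e 's/\(\[[^]]*\)[[:space:]]/\1/g;t2' \
      -e 's/_c/]/g;s/_o/[/g;s/_u/_/g'
}

printf '%s\n' 'blah [blih [1] bluh] asd' | strip_nested
# prints: blah [blih[1]bluh] asd
```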

In the commands above, white space means any horizontal (SPC, TAB) or vertical (NL, CR, VT, FF…) spacing character in the ASCII charset. Depending on your locale, others might be included.

Solution 2

Perl 5.14 solution, which is shorter and IMO easier to read, especially if you format it over multiple lines in a file instead of as a one-liner:

perl -pE 's{(\[ .*? \])}{$1 =~ y/ //dr}gex'

That works because in 5.14, the regular expression engine is re-entrant; the r flag used with y/// was also added in 5.14. Here it is, expanded out and commented:

s{
    (\[ .*? \])         # search for [ ... ] block, capture (as $1)
}{
    $1 =~ y/ //dr       # delete spaces. you could add in other whitespace here, too
                        # d = delete; r = return result instead of modifying $1
}gex; # g = global (all [ ... ] blocks), e = replacement is perl code, x = allow extended regex
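As the comment on the y/// line suggests, other whitespace characters can be added to the list. A minimal sketch that also deletes tabs, assuming perl 5.14 or later is available (strip_blanks_perl is a hypothetical name):

```shell
# strip_blanks_perl is a hypothetical wrapper; it assumes perl >= 5.14
# because of the r flag on y///.
strip_blanks_perl() {
  # y/ \t//dr deletes spaces and tabs from the captured [ ... ] block
  perl -pE 's{(\[ .*? \])}{$1 =~ y/ \t//dr}gex'
}
```

For example, printf 'x [a \t b] y\n' | strip_blanks_perl should print x [ab] y. Like the one-liner it is based on, this works line by line and does not handle nested brackets.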

Solution 3

Perl solution:

perl -pe 's/(\[[^]]*?)\s([^][]*\])/$1$2/ while /\[[^]]*?\s[^][]*\]/'


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
