Why does cut fail with bash and not zsh?

I create a file with tab-delimited fields.

echo foo$'\t'bar$'\t'baz$'\n'foo$'\t'bar$'\t'baz > input

I have the following script named zsh.sh

#!/usr/bin/env zsh
while read line; do
    <<<$line cut -f 2
done < "$1"

I test it.

$ ./zsh.sh input
bar
bar

This works fine. However, when I change the first line to invoke bash instead, it fails.

$ ./bash.sh input
foo bar baz
foo bar baz

Why does this fail with bash and work with zsh?

Additional troubleshooting

  • Using direct paths in the shebang instead of env produces the same behaviour.
  • Piping with echo instead of using the here-string <<<$line also produces the same behaviour. i.e. echo $line | cut -f 2.
  • Using awk instead of cut works for both shells. i.e. <<<$line awk '{print $2}'.

Solutions:

Solution 1

That’s because bash versions prior to 4.4 performed word splitting (though not globbing) on an unquoted $line in <<< $line, and then joined the resulting words with a space character. The result, followed by a newline character, is written to a temporary file that becomes the stdin of cut.

$ a=a,b,,c bash-4.3 -c 'IFS=","; sed -n l <<< $a'
a b  c$

Tab happens to be in the default value of $IFS:

$ a=$'a\tb'  bash-4.3 -c 'sed -n l <<< $a'
a b$

The solution with bash is to quote the variable.

$ a=$'a\tb' bash -c 'sed -n l <<< "$a"'
a\tb$

Note that bash is the only shell that does that. zsh (where <<< comes from, inspired by Byron Rakitzis’s implementation of rc), ksh93, mksh and yash, which also support <<<, don’t do it.

When it comes to arrays, mksh, yash and zsh join on the first character of $IFS, bash and ksh93 on space.

$ mksh -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1:2$
$ yash -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1:2$
$ ksh -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1 2$
$ zsh -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1:2$
$ bash -c 'a=(1 2); IFS=:; sed -n l <<< "${a[@]}"'
1 2$

There’s a difference between zsh/yash and mksh (version R52 at least) when $IFS is empty:

$ mksh -c 'a=(1 2); IFS=; sed -n l <<< "${a[@]}"'
1 2$
$ zsh -c 'a=(1 2); IFS=; sed -n l <<< "${a[@]}"'
12$

The behaviour is more consistent across shells when you use "${a[*]}" (except that mksh still has a bug when $IFS is empty).
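
One way to sidestep these differences, at least in bash, is to do the joining yourself before the value reaches <<<. The following is a minimal sketch and not part of the original answer; join_by is a hypothetical helper name. It relies on "$*" joining the positional parameters on the first character of $IFS.

```shell
#!/usr/bin/env bash
# join_by is a hypothetical helper (not from the original answer).
# It joins its remaining arguments on the first character of its first argument.
join_by() {
    local IFS="$1"   # local to the function, so the caller's IFS is untouched
    shift
    # "$*" expands to all arguments joined on the first character of IFS,
    # or with no separator at all when IFS is empty.
    printf '%s\n' "$*"
}
```

For example, with a=(1 2), the command sed -n l <<<"$(join_by : "${a[@]}")" hands an already joined string to the here-string, so the shell's own joining rules for "${a[@]}" no longer come into play.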

In echo $line | ..., the unquoted $line is subject to the usual split+glob operator in all Bourne-like shells except zsh (on top of the usual problems associated with echo).
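
If a pipe is preferred over a here-string, a common way to avoid both problems is to quote the variable and use printf instead of echo. A minimal sketch, where field2_via_pipe is a hypothetical name chosen for illustration:

```shell
#!/usr/bin/env bash
# field2_via_pipe is a hypothetical helper (not from the original answer).
# It prints the second tab-separated field of the string passed as $1.
field2_via_pipe() {
    # "$1" is quoted, so no split+glob happens; printf '%s\n' prints the
    # value verbatim, even if it starts with a dash or contains backslashes.
    printf '%s\n' "$1" | cut -f 2
}
```

Inside the question's loop this would be called as field2_via_pipe "$line".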

Solution 2

What happens is that bash replaces the tabs with spaces. You can avoid this problem by writing "$line" instead, or by explicitly cutting on spaces (cut -d ' ' -f 2).

Solution 3

The problem is that you’re not quoting $line. To investigate, change the two scripts so they simply print $line:

#!/usr/bin/env bash
while read line; do
    echo $line
done < "$1"

and

#!/usr/bin/env zsh
while read line; do
    echo $line
done < "$1"

Now, compare their output:

$ bash.sh input 
foo bar baz
foo bar baz
$ zsh.sh input 
foo    bar    baz
foo    bar    baz

As you can see, because $line is not quoted, bash splits it on the tabs and echo prints the resulting words separated by single spaces. zsh does not word-split unquoted parameter expansions by default, so the tabs survive there. Now, cut uses \t as the field delimiter by default. Therefore, since your bash script is eating the tabs (because of the split+glob operator), cut finds no delimiter, sees only one field, and prints the whole line. What you are really running is:

$ echo "foo bar baz" | cut -f 2
foo bar baz

So, to get your script to work as expected in both shells, quote your variable:

while read line; do
    <<<"$line" cut -f 2
done < "$1"

Then, both produce the same output:

$ bash.sh input 
bar
bar
$ zsh.sh input 
bar
bar
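
A slightly more defensive variant of the same loop, offered as a sketch and written for bash: IFS= and -r are additions that are not in the original scripts, and second_field is a hypothetical function name.

```shell
#!/usr/bin/env bash
# second_field is a hypothetical helper (not from the original answer).
# It prints the second tab-separated field of every line of the file "$1".
second_field() {
    local line
    # IFS= keeps leading and trailing whitespace, -r keeps backslashes literal.
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Quoting "$line" prevents word splitting, so the tabs reach cut.
        cut -f 2 <<<"$line"
    done < "$1"
}
```

Calling second_field input on the file from the question is meant to behave like the fixed scripts above. If the loop does nothing else, cut -f 2 input reads the file directly and avoids the loop altogether.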

Solution 4

As has already been answered, a more portable way to use a variable is to quote it:

$ printf '%s\t%s\t%s\n' foo bar baz
foo    bar    baz
$ l="$(printf '%s\t%s\t%s\n' foo bar baz)"
$ <<<$l     sed -n l
foo bar baz$

$ <<<"$l"   sed -n l
foo\tbar\tbaz$

Bash's implementation differs from that of the other shells. With the line:

l="$(printf '%s\t%s\t%s\n' foo bar baz)"; <<<$l  sed -n l

these are the results across a range of shells:

/bin/sh         : foo bar baz$
/bin/b43sh      : foo bar baz$
/bin/bash       : foo bar baz$
/bin/b44sh      : foo\tbar\tbaz$
/bin/y2sh       : foo\tbar\tbaz$
/bin/ksh        : foo\tbar\tbaz$
/bin/ksh93      : foo\tbar\tbaz$
/bin/lksh       : foo\tbar\tbaz$
/bin/mksh       : foo\tbar\tbaz$
/bin/mksh-static: foo\tbar\tbaz$
/usr/bin/ksh    : foo\tbar\tbaz$
/bin/zsh        : foo\tbar\tbaz$
/bin/zsh4       : foo\tbar\tbaz$

Only bash splits an unquoted variable on the right of <<<.
That means that, in those versions, the value of $IFS affects the result of <<<.
However, this has been corrected in bash version 4.4.
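
If a script has to run on both older and newer bash versions, the behaviour can be probed at run time instead of being inferred from a version number. A rough sketch, where herestring_keeps_tabs is a hypothetical name; simply quoting the variable remains the more straightforward fix.

```shell
#!/usr/bin/env bash
# herestring_keeps_tabs is a hypothetical helper (not from the original answer).
# It succeeds if this shell passes an unquoted here-string through unchanged.
herestring_keeps_tabs() {
    local probe=$'a\tb' seen
    # Deliberately unquoted: this expansion is the behaviour under test.
    seen=$(cat <<<$probe)
    [ "$seen" = "$probe" ]
}
```

A caller could then write something like: herestring_keeps_tabs || echo "unquoted here-strings are split in this shell" >&2.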


With the line:

l=(1 2 3); IFS=:; sed -n l <<<"${l[*]}"

All shells use the first character of IFS to join values.

/bin/y2sh       : 1:2:3$
/bin/sh         : 1:2:3$
/bin/b43sh      : 1:2:3$
/bin/b44sh      : 1:2:3$
/bin/bash       : 1:2:3$
/bin/ksh        : 1:2:3$
/bin/ksh93      : 1:2:3$
/bin/lksh       : 1:2:3$
/bin/mksh       : 1:2:3$
/bin/zsh        : 1:2:3$
/bin/zsh4       : 1:2:3$

With "${l[@]}", a space is needed to separate the different arguments, but some shells choose to use the first character of IFS instead (is that correct?).

/bin/y2sh       : 1:2:3$
/bin/sh         : 1 2 3$
/bin/b43sh      : 1 2 3$
/bin/b44sh      : 1 2 3$
/bin/bash       : 1 2 3$
/bin/ksh        : 1 2 3$
/bin/ksh93      : 1 2 3$
/bin/lksh       : 1:2:3$
/bin/mksh       : 1:2:3$
/bin/zsh        : 1:2:3$
/bin/zsh4       : 1:2:3$

With a null IFS, the values should become joined, as with this line:

a=(1 2 3); IFS=''; sed -n l <<<"${a[*]}"

/bin/y2sh       : 123$
/bin/sh         : 123$
/bin/b43sh      : 123$
/bin/b44sh      : 123$
/bin/bash       : 123$
/bin/ksh        : 123$
/bin/ksh93      : 123$
/bin/lksh       : 1 2 3$
/bin/mksh       : 1 2 3$
/bin/zsh        : 123$
/bin/zsh4       : 123$

But both lksh and mksh fail to do so.

If we change to a list of arguments:

l=(1 2 3); IFS=''; sed -n l <<<"${l[@]}"

/bin/y2sh       : 123$
/bin/sh         : 1 2 3$
/bin/b43sh      : 1 2 3$
/bin/b44sh      : 1 2 3$
/bin/bash       : 1 2 3$
/bin/ksh        : 1 2 3$
/bin/ksh93      : 1 2 3$
/bin/lksh       : 1 2 3$
/bin/mksh       : 1 2 3$
/bin/zsh        : 123$
/bin/zsh4       : 123$

Both yash and zsh fail to keep arguments separated. Is that a bug?


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
