What command(s) will feed a tab-delimited text file and cut each line to 80 characters?

I’ve got multiple-line text files of (sometimes) tab-delimited data. I’d like to output the file so I can glance over it – so I’d like to only see the first 80 characters of each line (I designed the text file to put the important stuff first on each line).

I’d thought I could use cat to read each line of the file, and send each line to the next command in a pipe:

cat tabfile | cut -c -80

But that seemed broken. I tried monkeying around, and grep appeared to work – but then I found out it didn’t: not every line in the file had 80+ characters. It turns out cut counts each tab as a single character, no matter how many columns it occupies on screen.
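That counting behavior is easy to confirm with a quick sketch: cut -c treats each tab as one character, even though it renders as several columns:

```shell
# cut -c counts the tab as one character, not as its display width:
printf 'a\tb\tc\n' | cut -c -3
# prints "a", a tab, then "b" (three characters, but wider on screen)
```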

I tried:

cat tabfile | tr \t \040 | cut -c -80

That would mangle my data a bit by flattening the whitespace that makes it readable, but even so, it didn’t work. Neither did:

cat tabfile | tr \011 \040 | cut -c -80

Maybe I’m using tr wrong? I’ve had trouble with tr before, e.g. when trying to remove multiple spaces (the version of tr I have access to on this machine has an -s option for squeezing runs of repeated characters down to one; I may need to play with it more).
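For what it’s worth, the tr attempts above likely failed because of shell quoting rather than tr itself: unquoted, the shell strips the backslash before tr ever sees it, so tr \t \040 actually asks tr to translate the letter t. Quoted, it behaves:

```shell
# Unquoted, the shell eats the backslash and tr sees a literal "t".
# Quoted, tr really does turn each tab into a single space:
printf 'a\tb\n' | tr '\t' ' '
# prints: a b
```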

I’m sure if I messed around I could use perl, awk or sed, or something to do this.

However, I’d like a solution that uses (POSIX?) regular commands, so that it’s as portable as possible. If I do end up using tr, I’d probably eventually try turning tabs into characters, maybe do a calculation, cut on the calculation, and then turn those characters back into tabs for the output.

It doesn’t need to be a single line / entered directly on the command line – a script is fine.

More info on tab-files:

I use tab to break fields, because someday I may want to import data into some other program. So I tend to have only one tab between pieces of content. But I also use tabs to align things with vertical columns, to aid in readability when looking at the plain text file. Which means for some pieces of text I pad the end of the content with spaces until I get to where the tab will work in lining up the next field with the ones above and below it.

DarkTurquoise           #00CED1         Seas, Skies, Rowboats   Nature
MediumSpringGreen       #00FA9A         Useful for trees        Magic  
Lime                    #00FF00         Only for use on spring chickens and fru$

Solutions

There are several solutions to this problem. The first is the recommended one, as it has been tested and is the most likely to work for you.

Solution 1

I think you’re looking for expand and/or unexpand. It seems you want a tab to count as up to 8 columns rather than as a single character. fold is also width-aware, but it wraps its input onto the next line rather than truncating it. I think you want:

expand < input | cut -c -80
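On a line like the sample data above, the effect looks like this (a sketch; the exact spacing depends on the default 8-column tab stops):

```shell
# expand pads each tab with spaces out to the next 8-column tab stop,
# so cut -c then truncates by display column instead of by raw character:
printf 'DarkTurquoise\t#00CED1\tSeas, Skies, Rowboats\n' | expand | cut -c -30
```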

expand and unexpand are both POSIX specified:

  • The expand utility shall write files or the standard input to the standard output with \tab characters replaced with one or more space characters needed to pad to the next tab stop. Any backspace characters shall be copied to the output and cause the column position count for tab stop calculations to be decremented; the column position count shall not be decremented below zero.
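POSIX expand also accepts a -t tablist option to change the tab stops. For instance, with 4-column stops, a tab after a single character pads out to column 5:

```shell
# With -t 4, tab stops fall every 4 columns, so the tab after "a"
# becomes 3 spaces:
printf 'a\tb\n' | expand -t 4
# prints: a   b
```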

Pretty simple. So, here’s a look at what this does:

unset c i; set --
until [ "$((i+=1))" -gt 10 ]; do set -- "$@" "$i" "$i"; done
for c in 'tr \\t \ ' expand; do eval '
    { printf "%*s\t" "$@"; echo; } |
      tee /dev/fd/2 |'"$c"'| {
      tee /dev/fd/3 | wc -c >&2; } 3>&1 |
      tee /dev/fd/2 | cut -c -80'
done

The until loop at top gets a set of data like…

1 1 2 2 3 3 ...

It printfs these in pairs using the %*s padding flag: the first number of each pair is consumed as a field width, so each argument is right-aligned in a field as wide as its own value. To each one it appends a \tab character.
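In isolation, the %*s trick looks like this (printf reuses the format string for each pair of arguments):

```shell
# Pairs of arguments: width 1 / string "1", width 2 / string "2",
# width 3 / string "3". Each number is right-aligned in a field as
# wide as itself, with a tab appended:
printf '%*s\t' 1 1 2 2 3 3; echo
```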

All of the tees are used to show the effects of each filter as it is applied.

And the effects are these:

1        2        3        4        5        6        7        8                9               10
1  2   3    4     5      6       7        8         9         10 
1  2   3    4     5      6       7        8         9         10 
1        2        3        4        5        6        7        8                9               10
1        2        3        4        5        6        7        8                9               10 
1        2        3        4        5        6        7        8                

Those rows are lined up in two sets like…

  1. output of printf ...; echo
  2. output of tr ... or expand
  3. output of cut
  4. output of wc

The top four rows are the results of the tr filter – in which each \tab is converted to a single space.

And the bottom four are the results of the expand chain.

Solution 2

Since the tabs are more for alignment than delimitation, one way could be to use column and then cut:

column -s '\t' -t <some-file | cut -c -80

column is not POSIX-specified, but it is part of the BSD utils shipped on Ubuntu, so I assume it is fairly cross-platform.

Solution 3

Don’s suggestion in comments was a good start.

This is what I needed to make it (mostly) work:

pr +1 -1 -t -m -l1000 -w 80 tabfile

The -m was needed to make the -w flag take effect with a single column. The man page could use some rewriting to make that clear.

When trying a workaround, I found that pr outputs \t characters, so feeding its results to cut resulted in the same problem.

-1 (the column flag) specifically says in the man page:

This option should not be used with -m.

However, without this option pr truncates lines willy-nilly, at much less than the specified length.

pr also inserts a space before (or after?) every word in a field, i.e. every place I had a single space has two after processing. If there are too many words, the inserted spaces ignore the -w restriction (causing wrap-around). But, curiously, columns that are aligned with spaces rather than tabs stay lined up.

Solution 4

Using awk:

awk '{ $0 = substr($0, 1, 80) }1' file

Based on Chris Down’s answer here.
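Note that awk’s substr also counts a tab as a single character, so this has the same display-width caveat as cut; pairing it with expand sidesteps that. A quick sketch of the truncation itself (shortened to 10 characters so the effect is visible):

```shell
# substr keeps the first 10 characters of each line; the trailing "1"
# pattern makes awk print the modified record:
printf 'one line that is pretty long\n' | awk '{ $0 = substr($0, 1, 10) }1'
# prints: one line t
```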

Solution 5

One utility that should be truly display-width aware is fold: unfortunately, it doesn’t seem to have an option to discard instead of wrap. Although it’s probably horribly inefficient, you could however do something like

while read -r line; do fold -w80 <<< "$line" | head -n1; done < file
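One caveat: the <<< here-string is a bash/ksh/zsh feature, not POSIX sh. A single-line pipeline sketch of the same idea, since fold counts display columns (a tab advances the count to the next tab stop):

```shell
# fold breaks the line at display column 30; head keeps only the first
# (i.e. truncated) piece:
printf 'DarkTurquoise\t#00CED1\tSeas, Skies, Rowboats\tNature\n' |
  fold -w 30 | head -n1
```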


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0, or cc by-sa 4.0.
