Unix: Truncate a Column in a CSV File

How do I truncate column “test10” to 5 characters from the Unix command line?

From this

test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 eeeeeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a2,3,u
hyt,bb,mb,khj,R ee3ee eeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a,5,r
mbn,htr,ccc,fdf,F1ee eeeeEeee ee$eeee e.eeeee2eeeee5eeeeeeee,a,e,r

To this

test1,test2,test3,test4,test10,test11,test12,test17
rh,mbn,ccc,khj,ee3 e,a2,3,u
hyt,bb,mb,khj,R ee3,a,5,r
mbn,htr,ccc,fdf,F1ee ,a,e,r

Here are the solutions:


Solution 1

If your file really is as simple as your example (no quoted fields that contain commas), you can use any of the following:

  • awk

    $ awk -F, -vOFS=, 'NR>1{$5=substr($5,1,5)}1' file 
    test1,test2,test3,test4,test10,test11,test12,test17
    rh,mbn,ccc,khj,ee3 e,a2,3,u
    hyt,bb,mb,khj,R ee3,a,5,r
    mbn,htr,ccc,fdf,F1ee ,a,e,r
    

    Explanation

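    The -F, sets the input field separator to a comma, and -vOFS=, sets the variable OFS (the output field separator) to a comma as well. This matters because changing a field makes awk rebuild the line with OFS between the fields. NR is the current line number, so NR>1 skips the header line, and on every other line the 5th field is replaced by its first 5 characters (or the whole field, if it is shorter) via substr($5,1,5). The lone 1 is awk shorthand for “print this line”.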

  • perl

    $ perl -F, -lane '$F[4]=~s/(.{5}).*/$1/ if $.>1; print join ",", @F' file 
    test1,test2,test3,test4,test10,test11,test12,test17
    rh,mbn,ccc,khj,ee3 e,a2,3,u
    hyt,bb,mb,khj,R ee3,a,5,r
    mbn,htr,ccc,fdf,F1ee ,a,e,r
    

    Explanation

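    The -a makes perl act like awk: it splits each input line on the pattern given by -F and stores the fields in the array @F. The -n wraps the code in a loop over the input lines, and -l strips the newline from each line and adds one back when printing. On every line after the header ($.>1), we remove all but the first 5 characters of the 5th field, which is $F[4] because Perl arrays are indexed from 0, and then print the resulting @F array joined with commas.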

  • sed

    $ sed -E '1!s/^(([^,]+,){4}[^,]{5,5})[^,]*,/\1,/' file
    test1,test2,test3,test4,test10,test11,test12,test17
    rh,mbn,ccc,khj,ee3 e,a2,3,u
    hyt,bb,mb,khj,R ee3,a,5,r
    mbn,htr,ccc,fdf,F1ee ,a,e,r
    

    Explanation

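    This is the substitution operator, whose general format is s/original/replacement/. The 1! means “don’t do this for the 1st line”. The regular expression matches a run of non-comma characters followed by a comma, 4 times (([^,]+,){4}), which covers the first four fields; then exactly 5 non-comma characters ([^,]{5,5}), which are the first 5 characters of the 5th field; and then the rest of that field up to and including the comma that ends it ([^,]*,). The whole match is replaced with the captured text plus a comma (\1,), effectively truncating the field. Note that the pattern only matches when the 5th field is followed by a comma, i.e. when it is not the last field on the line.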
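The sed substitution above hard-codes the column and requires every earlier field to be non-empty ([^,]+). Below is a minimal sketch of a more general version, assuming a sed that supports -E (GNU or BSD) and plain CSV with no quoted commas; the function name truncate_col_sed is hypothetical. Anchoring the pattern with ^ makes the field count start at the first field: without it, sed keeps searching from later positions, so on a line whose 5th field is shorter than 5 characters it can end up truncating the 6th field instead.

```shell
# Hypothetical helper: truncate column $1 (1-based) to $2 characters,
# leaving the header line alone. Assumes plain CSV with no quoted commas.
truncate_col_sed() {
    skip=$(($1 - 1))   # number of fields before the target column
    width=$2
    shift 2
    # ^ anchors the count at the first field; [^,]* allows empty fields
    # and lets the target column be the last field on the line.
    # The script is spliced from single-quoted pieces so that an
    # interactive bash does not treat "!" as history expansion.
    sed -E '1!s/^(([^,]*,){'"$skip"'}[^,]{'"$width"'})[^,]*/\1/' "$@"
}

# Usage: truncate column 5 of "file" to 5 characters
# truncate_col_sed 5 5 file
```

Lines whose target field is already 5 characters or shorter do not match and are printed unchanged.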

Solution 2

Using awk:

awk -F , 'BEGIN { OFS = FS } NR > 1 { $5 = substr($5,1,5) }; 1' data.csv

The -F flag sets the input field separator, and the BEGIN block sets the output field separator to whatever the input field separator is (a comma).
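Setting the output separator is not optional here: assigning to any field makes awk rebuild the whole line from its fields joined with OFS, which defaults to a single space. A small demonstration on made-up input:

```shell
# Made-up input; the second field is overwritten in both commands.
without_ofs=$(echo 'a,b,c' | awk -F, '{ $2 = "x" } 1')                   # OFS is " "
with_ofs=$(echo 'a,b,c' | awk -F, 'BEGIN { OFS = FS } { $2 = "x" } 1')   # OFS is ","
echo "$without_ofs"   # a x c
echo "$with_ofs"      # a,x,c
```
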

If the ordinal number of the current record (NR) is greater than one (i.e. we’ve passed the header line), then the substr() function will truncate the fifth field (column) to at most five characters. This avoids modifying the first line of the input data.

The lone 1 will cause awk to print the (possibly) modified record (line) to standard output.
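All of the commands above hard-code the column as field 5. Since the question names the column (“test10”), here is a minimal sketch that looks the column up by its header instead. It assumes a plain CSV file (no quoted commas) whose first line holds the headers, with the name occurring once; the function name truncate_named_col is hypothetical. If the name is not found, every line is printed unchanged.

```shell
# Hypothetical helper: truncate the column whose header is $1 to $2 chars.
truncate_named_col() {
    name=$1
    width=$2
    shift 2
    awk -F, -v OFS=, -v name="$name" -v width="$width" '
        NR == 1 {
            col = 0
            for (i = 1; i <= NF; i++) {    # look for the header equal to name
                if ($i == name) col = i
            }
        }
        NR > 1 && col { $col = substr($col, 1, width) }
        1                                   # print every line, header included
    ' "$@"
}

# Usage: truncate_named_col test10 5 data.csv
```
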


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
