How can I extract/change lines in a text file whose data are separated into fields?

How can I manipulate field-based data from the commandline? For example:

  • How can I print only lines whose Nth field is foo?
  • How can I print only lines whose Nth field isn’t foo?
  • How can I print only lines whose Nth field matches foo?
  • How can I change field N to foo?

Is there a standard approach or toolset that facilitates manipulating field-based data on *nix systems?

Solution 1

There are two basic approaches one can use when dealing with fields: i) use a tool that understands fields; ii) use a regular expression. Of the two, the former is usually both more robust and simpler.

Many of the commonly available tools on *nix are either explicitly designed to deal with fields or have nifty tricks to facilitate it.

1. Use a tool that understands fields

1.1 awk

The classic tool here is awk. It will automatically split each input line into fields (the field separator is whitespace by default but can be changed using the -F flag) and the fields are then available to the awk script as $n where n is the field number. The 1st field is $1, the second $2 etc.

  • Print lines whose 3rd field is foo.

    awk '$3=="foo"' file
    

    Changing the delimiter to :

    awk -F":" '$3=="foo"' file
    

    The default action of awk is to print. Therefore the commands above will print all lines whose 3rd field is foo. When using -F, you can set arbitrary field separators, and even use regular expressions.

  • How can I print only lines whose 3rd field isn’t foo?

    awk '$3!="foo"' file
    
  • How can I print only lines whose 3rd field matches foo?

    If you’re just looking for fields that match a pattern (for example, foo matches foobar), use ~ instead of ==:

    awk '$3~/foo/' file
    
  • How can I print only lines whose 3rd field doesn’t match foo?

    awk '$3!~/foo/' file
    
  • How can I change the 3rd field to foo?

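    awk '{ $3="foo"; print }' file

    Assigning to a field makes awk rebuild the line with the output field separator, which is a single space by default. With another delimiter, set OFS as well (for example -F: -v OFS=:), otherwise the output will be space-separated.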
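
To see the awk commands side by side, here is a small sketch run against made-up sample data. The records and the variable names (sample, exact, match, changed, regex_sep) are assumptions for illustration only. It also shows a regular expression used as the separator, and OFS being set so that a modified line keeps its delimiter.

```shell
# Hypothetical sample data: three colon-separated records.
sample='alice:10:foo
bob:20:foobar
carol:30:bar'

# Lines whose 3rd field is exactly foo.
exact=$(printf '%s\n' "$sample" | awk -F: '$3=="foo"')

# Lines whose 3rd field matches the regex foo (this also catches foobar).
match=$(printf '%s\n' "$sample" | awk -F: '$3~/foo/')

# Change the 3rd field; -v OFS=: keeps ":" as the separator when awk
# rebuilds the modified line.
changed=$(printf '%s\n' "$sample" | awk -F: -v OFS=: '{ $3="foo"; print }')

# A regex separator: split on a comma or semicolon plus optional spaces.
regex_sep=$(printf 'a, b;foo\n' | awk -F'[,;] *' '{ print $3 }')

printf '%s\n' "$exact" "$match" "$changed" "$regex_sep"
```

Capturing each result in a variable is only a convenience for comparing the outputs; in everyday use the awk commands would read the file directly.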
    

1.2 Perl

Another choice is perl one-liners. Like awk, Perl is a full-featured scripting language but can also be run as a commandline program taking a script as input. Its behavior is modified by commandline switches, the most relevant of which for this question are:

  • -e : the script that perl should run;
  • -n : read the input file line by line;
  • -p : print each input line after applying the script given by -e;
  • -l : remove trailing newlines from each input line and add a newline to each print call;
  • -a : awk-mode, split each input line into the array @F;
  • -F : the field separator for -a.

An important difference with awk is that perl’s -a switch splits each input line into an array, and in Perl arrays start at 0, not 1. This means that the 2nd field is actually $F[1] and not $F[2], and the 3rd field is $F[2]. With all this in mind, the perl equivalents of the above are:

  • Print lines whose 3rd field is foo.

    perl -ane 'print if $F[2] eq "foo"' file
    

    Changing the delimiter to :

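    perl -F":" -lane 'print if $F[2] eq "foo"' file
    

    As with awk, the argument to -F is treated as a pattern, so it can be a regular expression as well as a literal character or string. The -l matters here: without it, the last field on each line keeps its trailing newline, so a 3rd field that ends the line would not compare equal to foo.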

  • How can I print only lines whose 3rd field isn’t foo?

    perl -ane 'print unless $F[2] eq "foo"' file
    
  • How can I print only lines whose 3rd field matches foo?

    perl -ane 'print if $F[2]=~/foo/' file
    
  • How can I print only lines whose 3rd field doesn’t match foo?

    perl -lane 'print unless $F[2]=~/foo/' file
    
  • How can I change the 3rd field to foo?

    This one is a bit more cumbersome in Perl. The usual approach is to change the value in the @F array and then print the array. With simple space-separated files, this is easy:

    perl -lane '$F[2]="foo"; print "@F"' file
    

    With a different delimiter, you will need to join the array. Otherwise, it will be printed space-separated:

    perl -F: -lane '$F[2]="foo"; print join ":",@F' file
    

2. Use regular expressions

The idea here is to use a regular expression (“regex” for short) that defines the position of the target string in the line. For example, in a file whose fields are separated by :, we can find the 2nd field by matching everything up to the 1st : (the 1st field) and then everything up to the next : (the 2nd field):

^[^:]*:[^:]*:

This regex means:

  • ^ : the beginning of the line;
  • [^] : a negated character class. [^:] means “anything but :”;
  • * : 0 or more of the previous pattern;
  • : : a literal :.

Taken together, this means that the first [^:]* is the first field and the second is the second field. Obviously, this is not very practical if you’re looking for the 14th field but it can be useful for simpler things. So, how do we implement this to manipulate our data? There are various tools that can do this; in these examples I will use sed but you could do very similar things with awk, perl or python.
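
For a field further along the line, the repeated [^:]*: groups can be written with a repetition count instead of being spelled out. The sketch below assumes a sed that supports -E (extended regular expressions, as GNU and BSD sed do). The function name nth_field_is is a hypothetical helper, and the value is pasted into the regex unescaped, so it must not contain regex metacharacters or a slash.

```shell
# Hypothetical helper: print the lines of stdin whose Nth ":"-separated
# field is exactly VALUE.  Usage: nth_field_is N VALUE
nth_field_is() {
    n=$1
    value=$2
    # ([^:]*:){N-1} skips the first N-1 fields; (:|$) requires the
    # target field to end at the next colon or at the end of the line.
    sed -n -E "/^([^:]*:){$((n - 1))}${value}(:|\$)/p"
}

# Example: keep only the lines whose 4th field is foo.
printf 'a:b:c:foo\na:b:c:foobar\na:foo:c:d\n' | nth_field_is 4 foo
```

This keeps the regex approach usable for larger field numbers, although a field-aware tool such as awk is still likely to be easier to read.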

  • How can I print only lines whose 2nd field is foo?

    sed -n '/^[^:]*:foo:/p' file
    

    The -n suppresses normal output and the /regex/p means “print any lines that the regex matched”.

  • How can I print only lines whose 2nd field isn’t foo?

    sed '/^[^:]*:foo:/d' file
    

    The logical inverse of the above. Here, the /regex/d means “delete any lines that the regex matches”.

  • How can I print only lines whose 2nd field matches foo?

    sed -n '/^[^:]*:[^:]*foo/p' file
    
  • How can I print only lines whose 2nd field doesn’t match foo?

    sed '/^[^:]*:[^:]*foo/d' file
    
  • How can I change the 2nd field to foo?

    sed 's/\([^:]*:\)[^:]*/\1foo/' file 
    

    Or, since sed’s substitution can directly address the Nth occurrence of a pattern with a simple numeric flag:

    sed 's/[^:]*/foo/2' file
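
A quick way to check the sed commands is to run them on sample data. The sketch below uses made-up records and variable names (sample, s_exact, s_exact_eol, s_changed). It includes a line where foo is the last field, which the :foo: pattern does not match because it requires a trailing colon. The -E variant assumes a sed with extended regular expression support.

```shell
# Hypothetical sample data; the last record has foo as its final field.
sample='alice:foo:10
bob:foobar:20
carol:bar:30
dave:foo'

# This pattern needs a ":" after foo, so dave:foo is not printed.
s_exact=$(printf '%s\n' "$sample" | sed -n '/^[^:]*:foo:/p')

# Allowing end-of-line as an alternative (via -E) also catches dave:foo.
s_exact_eol=$(printf '%s\n' "$sample" | sed -n -E '/^[^:]*:foo(:|$)/p')

# Replace the 2nd field on every line.
s_changed=$(printf '%s\n' "$sample" | sed 's/^\([^:]*:\)[^:]*/\1foo/')

printf '%s\n' "$s_exact" "$s_exact_eol" "$s_changed"
```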
    

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.