Can a single pipeline entry view the whole pipeline?

I have a function that can generate either tabular output or JSON output. I know that the conventional way to handle this would be to use an option like this:

$ func -j | jq .firstField
$ func -t | awk '{print $1}'

I would like to “look ahead” and infer the output type based on the next stage in the pipeline. The ideal usage would be simply:

$ func | jq .firstField
$ func | awk '{print $1}'

I imagine that somewhere within func I would check argument 0 (the command name) of the next stage in the pipeline and see whether it contains a j. I’d produce JSON if so and tables otherwise.

Does bash allow such voodoo? If so, how?

Solution 1

There is no formal metadata or query API associated with a pipeline,
beyond what may be salvaged from the process tree via process tools or
from digging around in /proc-type filesystems, should those exist. The
parent shell will (probably) have the complete pipeline somewhere in
memory and will know the various child processes involved, though again
there is no API by which an arbitrary cat in the (pointless, except as
an example) pipeline cat | cat | cat | ... could learn which cat it
is in that pipeline and therefore who its peers are.

% cat | cat -b | cat -e | cat -n

is more useful, since with unique flags a human can more easily
tell which is which; pstree(1) run in another terminal, for
example, may show

 |     \-+= 35276 jhqdoe -zsh (zsh)
 |       |--- 44661 jhqdoe cat -n
 |       |--- 03968 jhqdoe cat -b
 |       |--- 96165 jhqdoe cat -e
 |       \--= 26975 jhqdoe cat

but this would not tell us that cat -e pipes to cat -n, only
that the bag of cats are all children of the parent shell
35276.

% ps ao ppid,pid,command | grep '[ ]cat'
35276 44661 cat -n
35276 96165 cat -e
35276  3968 cat -b
35276 26975 cat
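A minimal sketch of that "bag of peers" lookup follows, assuming bash and a ps that accepts -A, -p and the pid=/pgid=/args= output keywords (procps on Linux and the macOS ps do); pgroup_peers is a hypothetical name. It lists the processes sharing a process group, which under an interactive shell with job control is normally one whole pipeline, but like pstree it cannot say who pipes to whom.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "PID ARGS" for every process that shares a
# process group with the PID given as $1. It shows who the peers are,
# but (like pstree) not which of them pipes to which.
pgroup_peers() {
  local pgid
  # Look up the process group of the target PID; strip ps's padding.
  pgid=$(ps -o pgid= -p "$1" | tr -d ' ')
  [ -n "$pgid" ] || return 1
  # List all processes and keep those whose PGID matches, dropping the
  # PGID column so each line reads "PID ARGS".
  ps -A -o pid= -o pgid= -o args= |
    awk -v g="$pgid" '$2 == g { pid = $1; $1 = $2 = ""; sub(/^ +/, ""); print pid, $0 }'
}
```

Called as pgroup_peers "$$" from a script sitting in the middle of a pipeline, it would list the script, its pipeline neighbours, and the ps and awk it spawns itself.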

If the system you are on has /proc, or commands to inspect what the pipes
or descriptors of a PID are wired up to, you may be able to figure
out what is connected to what in the process group a process belongs
to. For example, over on Linux with lsof and a similar cat pipeline
running, the cat -e and cat -n commands can be linked because they both
share the pipe 14301040:

-bash-4.2$ lsof -p 23591 | grep pipe
cat     23591 jhqdoe    0r  FIFO    0,9       0t0 14301039 pipe
cat     23591 jhqdoe    1w  FIFO    0,9       0t0 14301040 pipe
-bash-4.2$ lsof -p 23592 | grep pipe
cat     23592 jhqdoe    0r  FIFO    0,9       0t0 14301040 pipe

so while this information may be available, it may take a lot of
digging around and reconstruction with unportable tools to figure out.
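On Linux the same linkage can be reconstructed without lsof by comparing the pipe:[INODE] targets of the /proc/<pid>/fd/ symlinks. A minimal sketch, assuming bash and a Linux /proc; next_stage_pids is a hypothetical name, and it can only see processes whose /proc/<pid>/fd you are permitted to read (normally your own).

```bash
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print the PID of every process whose
# stdin (fd 0) is the same pipe as the stdout (fd 1) of the PID in $1.
next_stage_pids() {
  local writer=$1 out fd0 pid
  # For a pipe, the symlink target looks like "pipe:[14301040]".
  out=$(readlink "/proc/$writer/fd/1" 2>/dev/null) || return 1
  [[ $out == pipe:* ]] || return 1          # stdout is not a pipe
  for fd0 in /proc/[0-9]*/fd/0; do
    # Quoted right-hand side: compare literally, no glob on the brackets.
    [[ $(readlink "$fd0" 2>/dev/null) == "$out" ]] || continue
    pid=${fd0#/proc/}; pid=${pid%%/*}       # "/proc/123/fd/0" -> "123"
    [[ $pid != "$writer" ]] && printf '%s\n' "$pid"
  done
  return 0
}
```

Inside a bash subshell or pipeline stage, $$ still expands to the original shell's PID, so a function should pass $BASHPID instead. Capture the result with $(...) rather than redirecting the call's output: a redirection on a function call temporarily replaces the very fd 1 being inspected. If the reader is itself a shell or subshell, its children that inherited the pipe are listed too.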

A parent shell could perhaps offer a means to rewrite the pipeline after
it has been input, though the ZSH hook function preexec does not
appear to offer any means of rewriting the command to be run. (Such a
feature might be similar to how LISP macros let a programmer rework
code.) A parent shell might also offer some sort of API that child
processes could use to inspect the pipeline, but these sorts of additions
would need to be written into the shell.

However, one could construct a complex pipeline:

func | ( cd ... && ... | ( ... | awk ... ) )

in which case your func would either fail to find awk and react
(maybe) wrongly, or your pipeline search feature would need to
recurse through all the commands of the next pipeline element, and
then the awk it finds might be unrelated to func and not need
modification on the fly. Or you could forget that you set up this
behaviour and the awk could be incorrectly modified, which may lead to
hard-to-find bugs…
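To make that recursion concrete, here is a sketch that takes one ps snapshot and prints every descendant of a PID (for example, the subshell found as func's next stage). It assumes bash and a ps accepting -A and the pid=/ppid=/comm= keywords; descendants is a hypothetical name. Finding an awk in its output still does not prove that this awk is the one reading func's data, which is exactly the ambiguity described.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "PID COMMAND" for every descendant of the PID
# in $1 (children, grandchildren, ...), from a single snapshot of ps.
descendants() {
  ps -A -o pid= -o ppid= -o comm= |
    awk -v root="$1" '
      { pid[NR] = $1; ppid[NR] = $2; comm[NR] = $3 }
      END {
        want[root] = 1
        do {                       # keep sweeping until no new PID is added
          added = 0
          for (i = 1; i <= NR; i++)
            if ((ppid[i] in want) && !(pid[i] in want)) {
              want[pid[i]] = 1
              added = 1
              print pid[i], comm[i]
            }
        } while (added)
      }'
}
```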

Solution 2

I was able to do this, at least on Linux. Here is a script that demonstrates it: https://gist.github.com/MatrixManAtYrService/790a4a058bc841b0ceb2eb0263fb5d88

Example usage:

❯ cat -b | ./luigi | jq .
[
  {
    "pid": "20832",
    "name": "cat -b",
    "node": {
      "write": "5157339",
      "read": null
    }
  },
  {
    "pid": "20833",
    "name": "bash ./luigi",
    "node": {
      "write": "5157341",
      "read": "5157339"
    }
  },
  {
    "pid": "20834",
    "name": "jq .",
    "node": {
      "write": null,
      "read": "5157341"
    }
  }
]
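The gist's code is not reproduced here. As an independent minimal sketch of the same idea applied to the original question, the hypothetical func below emits JSON when the process reading its stdout is named jq, and a table otherwise. It assumes bash and Linux /proc, inspects only the immediate reader (so func | ( ... | jq . ) is not detected), and uses a short sleep as a crude guard against the race in which the next stage has not yet exec'd its program.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (Linux only): choose the output format by looking
# at the name of the process that reads this process's stdout.
func() {
  local me=$BASHPID out fd0 pid json=0
  sleep 0.2   # crude: give the next stage time to exec; not a guarantee
  out=$(readlink "/proc/$me/fd/1" 2>/dev/null)
  if [[ $out == pipe:* ]]; then
    for fd0 in /proc/[0-9]*/fd/0; do
      [[ $(readlink "$fd0" 2>/dev/null) == "$out" ]] || continue
      pid=${fd0#/proc/}; pid=${pid%%/*}
      [[ $pid == "$me" ]] && continue
      # /proc/<pid>/comm holds the executable name, e.g. "jq" or "awk".
      [[ $(cat "/proc/$pid/comm" 2>/dev/null) == jq ]] && json=1
    done
  fi
  if (( json )); then
    printf '{"firstField": "a", "secondField": "b"}\n'
  else
    printf 'a b\n'
  fi
}
```

With jq installed, func | jq .firstField would then print "a", while func | awk '{print $1}' prints a. Matching the exact name jq, rather than any name containing a j, avoids false positives such as join.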

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
