So, what can you do with a process ID?

For the longest time, the only thing I knew how to do with the process ID (PID) for a running process was kill it. It turns out, if you have a PID, you can gather a ton of info about what that process is up to!

I'm sure these commands will seem blindingly obvious to someone with more of a sysadmin background. This is all coming from the perspective of someone who occasionally hops onto a machine and has a lot of trouble navigating around and figuring out what the machine is up to and why. If that describes you too, I hope this is useful!

Let's start a process and grab its PID:

> python3 -m http.server &
[1] 41186
> PID=$!
> echo "PID is ${PID}"
PID is 41186

ps -f $PID

ps -f $PID gives an amazing high-level overview of what a process is doing. Being able to see the parent's PID, the start time, the CPU time used so far, and the actual command being run can really help when debugging.

> ps -f 41186
501 41186 35891   0 12:33PM ttys001    0:00.16 /usr/local/Cellar/python@3.10/3.10.6_1/Frameworks/Python.framework/Versions/3.10/Resources/ -m http.server
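If the full listing is more than you need, ps can print only the columns you ask for with -o. Here's a minimal sketch, assuming a ps that understands the pid, ppid, etime, and command keywords (the macOS and Linux procps versions both do). describe_pid is a name I made up for illustration, not a standard tool, and the sleep is a stand-in for whatever process you care about.

```shell
# describe_pid is a hypothetical helper, not a standard command.
describe_pid() {
    # A trailing "=" after each keyword suppresses that column's header.
    # etime is the elapsed wall-clock time since the process started.
    ps -o pid= -o ppid= -o etime= -o command= -p "$1"
}

sleep 30 &            # stand-in for a real long-running process
describe_pid "$!"     # prints: PID, parent PID, elapsed time, command
kill "$!"             # clean up the stand-in
```

Dropping the headers makes the output easy to feed into read or awk in a script.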

lsof -p $PID -P

lsof means "list open files." If you provide a PID, you can see all of the files, pipes, network sockets, and devices that the process is using. If you add -P, it will show port numbers rather than port names.

> lsof -p $PID -P
COMMAND   PID USER   FD    TYPE            DEVICE SIZE/OFF                NODE NAME
Python  41186 will  cwd     DIR               1,4      192            51860960 /Users/will/tmp
Python  41186 will  txt     REG               1,4    49400            78478046 /usr/local/Cellar/python@3.10/3.10.6_1/Frameworks/Python.framework/Versions/3.10/Resources/
# ...omitting a long list of text files
Python  41186 will  txt     REG               1,4  2177216 1152921500312782996 /usr/lib/dyld
Python  41186 will    0u    CHR              16,1  0t12464                1095 /dev/ttys001
Python  41186 will    1u    CHR              16,1  0t12464                1095 /dev/ttys001
Python  41186 will    2u    CHR              16,1  0t12464                1095 /dev/ttys001
Python  41186 will    3u  systm 0x60e5bd939a75061      0t0                     [ctl id 6 unit 50]
Python  41186 will    4u   IPv6 0x60e5bd936095681      0t0                 TCP *:8000 (LISTEN)
Python  41186 will    6u   unix 0x60e5bde017a4a71      0t0                     ->0x60e5bde017a1d91

lsof supports a ton of amazing options. Do you have a port and want to get the PID of what's using it? Try lsof -i :8000! Or you can combine -i with -r to re-list network connections every two seconds and filter down to one process: lsof -i -r 2 | awk '$2 == 41186'

Extra trick: if you're on Linux, pwdx can report the current working directory of a process.
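On Linux, the same information is also exposed through the /proc filesystem, where /proc/<pid>/cwd is a symlink to the process's working directory. A minimal sketch, assuming a Linux machine with /proc mounted (this won't work on macOS); cwd_of is a hypothetical helper name.

```shell
# cwd_of is a hypothetical helper; it assumes Linux with /proc mounted.
cwd_of() {
    # /proc/<pid>/cwd is a symlink pointing at the working directory
    readlink "/proc/$1/cwd"
}

cwd_of "$$"   # working directory of the current shell
```

Reading another user's process this way usually needs the same permissions pwdx does.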

top -pid $PID

Want to keep an eye on the CPU and memory usage of a process? top can narrow in on a single process if you provide a PID. (-pid is the macOS spelling; on Linux, the flag is -p.)

> top -pid 41186
41186  Python       0.0  00:00.34 2    0    19   10M  0B   0B
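top is interactive, so it's awkward to use from a script. For a one-off snapshot, ps can report similar numbers. This is a rough sketch, assuming a ps that supports the pcpu and rss keywords (macOS and Linux procps both do); sample_usage is a hypothetical helper name.

```shell
# sample_usage is a hypothetical helper, not a standard command.
sample_usage() {
    # pcpu is percent CPU; rss is resident memory in kilobytes.
    # The trailing "=" drops the header line.
    ps -o pcpu= -o rss= -p "$1"
}

sample_usage "$$"   # CPU % and resident memory of the current shell
```

Calling it in a loop with a sleep gives you a crude log of usage over time.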

dtruss -p $PID or strace -p $PID

I'm a complete novice when it comes to tracing, but it's something I'm trying to learn more about! On Macs, you can use dtruss to follow the system calls of a running process. It's cool to see what a process is up to in detail! (strace is the Linux equivalent.)

> dtruss -p 41186
stat64("/Users/will/tmp/index.html\0", 0x70000A3F8678, 0x0)        = -1 2
stat64("/Users/will/tmp/index.htm\0", 0x70000A3F8678, 0x0)         = -1 2
open_nocancel("/Users/will/tmp/\0", 0x1100004, 0x0)              = 7 0
fstatfs64(0x7, 0x70000A3F7E80, 0x0)              = 0 0
getdirentries64(0x7, 0x7F95E7058E00, 0x2000)             = 240 0
close_nocancel(0x7)              = 0 0

kill -HUP $PID

If you want to communicate with a running process, you can send that process signals! Deciding what to do with most signals is up to the process (SIGKILL and SIGSTOP are the exceptions, since they can't be caught), so you can add code to respond to a signal however you'd like and enable some simple inter-process communication.

const interval = setInterval(() => null, 60_000); // keeps process from exiting

process.on("SIGHUP", () => {
  console.log("received SIGHUP :)");
});

process.on("SIGTERM", () => {
  console.log("received SIGTERM. Shutting down");
  clearInterval(interval); // clearing interval means we're done with work so Node can exit
});

> node ./listen_to_signals.js &
[1] 51019
> kill -HUP 51019
received SIGHUP :)
> kill -TERM 51019
received SIGTERM. Shutting down
[1]  + 51019 done       node ./listen_to_signals.js
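The same idea works in a plain shell script using trap. This is only a sketch of one way to do it: worker is a hypothetical function standing in for a long-running script, and the messages just mirror the Node example.

```shell
# worker is a hypothetical stand-in for a long-running script.
worker() {
    trap 'echo "received SIGHUP :)"' HUP
    trap 'echo "received SIGTERM. Shutting down"; exit 0' TERM
    while true; do
        # Sleep in the background and wait on it: wait is interrupted
        # by signals, so the traps run right away instead of after sleep.
        sleep 1 &
        wait "$!"
    done
}

worker &
pid=$!
sleep 1              # give the worker a moment to install its traps
kill -HUP "$pid"
sleep 1
kill -TERM "$pid"
wait "$pid"          # reap the worker once it exits
```

The background-sleep-plus-wait pattern matters here because a shell only runs a trap after the current foreground command finishes.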

wait $PID

If you're writing a script that starts a process, you can wait for that process to complete using wait. If you provide the PID, wait will return with the exit code from that process. This can enable some easy parallelization in shell scripts.

pids=()
sleep 1 & pids+=($!)
false & pids+=($!) # if we have a failing command like this one, we want this whole script to fail!
true & pids+=($!)

status=0
for pid in "${pids[@]}"; do
    wait "$pid" || status=$?
done
exit "${status}"
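Since wait reports the exit code of the specific PID you give it, you can also track each job separately instead of collapsing everything into a single status. A small sketch, where run_job is a hypothetical stand-in for real work that exits with whatever code it is given:

```shell
# run_job is a hypothetical stand-in for real work; it exits with code $1.
run_job() { sleep 1; return "$1"; }

run_job 0 & ok_pid=$!
run_job 3 & bad_pid=$!

# wait returns the exit code of the PID it was given
ok_code=0;  wait "$ok_pid"  || ok_code=$?
bad_code=0; wait "$bad_pid" || bad_code=$?

echo "first job exited with ${ok_code}, second with ${bad_code}"
```

Both jobs run at the same time, so the whole thing takes about one second rather than two.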