Samia-Hb/Pipex

🔀 pipex

A faithful recreation of Unix shell piping in C

Understand pipes, forks, and file descriptors by building them from scratch.


📑 Table of Contents

  1. Overview
  2. Demo
  3. Features
  4. How It Works
  5. Project Structure
  6. Build
  7. Usage
  8. Error Handling
  9. OS Concepts Deep-Dive
  10. Allowed Functions (42 Subject)
  11. Testing
  12. Key Concepts Mastered
  13. Author

🔍 Overview

pipex is a C program that replicates the behavior of the Unix shell pipeline:

< infile cmd1 | cmd2 > outfile

Built as part of the 42 School curriculum, this project dives deep into inter-process communication (IPC) by wiring together four fundamental Unix system calls:

System Call   Purpose
pipe()        Creates a unidirectional kernel buffer connecting two file descriptors
fork()        Spawns a child process that inherits the parent's file descriptor table
dup2()        Redirects stdin/stdout by replacing a file descriptor with another
execve()      Replaces the current process image with a new program

Together, these four primitives are the engine behind every Unix shell pipeline you have ever typed.


🎬 Demo

pipex demo

✨ Features

  • Shell-faithful piping: reproduces < infile cmd1 | cmd2 > outfile exactly
  • Two child processes connected through a kernel pipe buffer
  • PATH resolution: searches every directory in PATH to locate the command binary
  • Empty command handling: gracefully manages blank or whitespace-only commands
  • Permission & access checks: validates infile/outfile permissions before executing
  • Descriptive error messages: zsh-style output for "command not found" and missing files
  • Clean memory management: all allocated arrays and strings are freed before exit
  • Embedded libraries: ships with its own libft, ft_printf, and get_next_line
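
The PATH resolution feature can be sketched roughly as follows. This is a minimal illustration, not the repository's parsing.c: the helper name resolve_in_path is hypothetical, and it uses standard libc calls (strcspn, snprintf) where the project would use its own libft. It walks a colon-separated PATH value and returns the first candidate that access() reports as executable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the repository): returns a malloc'd
 * "<dir>/<cmd>" for the first directory in path_value that contains an
 * executable named cmd, or NULL if none does. Caller frees the result. */
char *resolve_in_path(const char *cmd, const char *path_value)
{
    const char *dir = path_value;

    if (cmd == NULL || *cmd == '\0' || path_value == NULL)
        return NULL;
    while (*dir != '\0')
    {
        size_t dir_len = strcspn(dir, ":");   /* length of this PATH entry */

        if (dir_len > 0)                      /* empty entries are skipped here */
        {
            size_t total = dir_len + 1 + strlen(cmd) + 1;
            char *candidate = malloc(total);

            if (candidate == NULL)
                return NULL;
            /* "%.*s" copies exactly dir_len bytes of the directory name. */
            snprintf(candidate, total, "%.*s/%s", (int)dir_len, dir, cmd);
            if (access(candidate, X_OK) == 0)
                return candidate;
            free(candidate);
        }
        dir += dir_len;
        if (*dir == ':')
            dir++;
    }
    return NULL;
}
```

A caller would typically pass the value of the PATH variable found in envp, and skip the search entirely when the command already contains a slash.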

⚙️ How It Works

Execution Flow

argv[1]      argv[2]        argv[3]      argv[4]
 infile  ──►  cmd1    ──►   cmd2   ──►  outfile
         (child 1)        (child 2)
              │                ▲
              └──── pipe ──────┘
                 pipe_fd[1]  pipe_fd[0]
                  (write)     (read)
  1. The parent process calls pipe() to obtain two file descriptors: pipe_fd[0] (read end) and pipe_fd[1] (write end).
  2. fork() is called twice to create two child processes.
  3. Child 1 (cmd1):
    • Closes pipe_fd[0] (it will only write).
    • Redirects stdout → pipe_fd[1] with dup2().
    • Opens infile and redirects stdin → infile_fd with dup2().
    • Calls execve() to run cmd1.
  4. Child 2 (cmd2):
    • Closes pipe_fd[1] (it will only read).
    • Redirects stdin → pipe_fd[0] with dup2().
    • Opens outfile and redirects stdout → outfile_fd with dup2().
    • Calls execve() to run cmd2.
  5. The parent closes both ends of the pipe and calls waitpid() on both children.

The fork / pipe / dup2 / execve Dance

Parent process
│
├─ pipe(pipe_fd)          # kernel allocates read/write FD pair
│
├─ fork() ──────────────► Child 1
│                         │  close(pipe_fd[0])
│                         │  dup2(pipe_fd[1], STDOUT)   # write → pipe
│                         │  dup2(infile_fd,  STDIN)    # read  ← infile
│                         │  execve(cmd1, ...)          # become cmd1
│
├─ fork() ──────────────► Child 2
│                         │  close(pipe_fd[1])
│                         │  dup2(pipe_fd[0], STDIN)    # read  ← pipe
│                         │  dup2(outfile_fd, STDOUT)   # write → outfile
│                         │  execve(cmd2, ...)          # become cmd2
│
├─ close(pipe_fd[0])
├─ close(pipe_fd[1])
├─ waitpid(child1, ...)
└─ waitpid(child2, ...)
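
The sequence above can be sketched in C along these lines. This is an illustrative sketch under stated assumptions, not the repository's pipex.c: the names run_pipeline, run_first, and run_second are hypothetical, commands are passed as ready-made argv arrays whose first element is an absolute path (so no PATH lookup happens here), and the environment is taken from the global environ.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Child 1: stdin <- infile, stdout -> pipe write end, then become cmd1. */
static void run_first(const char *infile, int pipe_fd[2], char *const argv[])
{
    int in_fd = open(infile, O_RDONLY);

    close(pipe_fd[0]);                       /* this child only writes */
    if (in_fd < 0 || dup2(in_fd, STDIN_FILENO) < 0
        || dup2(pipe_fd[1], STDOUT_FILENO) < 0)
        _exit(1);
    close(in_fd);                            /* originals no longer needed */
    close(pipe_fd[1]);
    execve(argv[0], argv, environ);
    _exit(127);                              /* reached only if execve failed */
}

/* Child 2: stdin <- pipe read end, stdout -> outfile, then become cmd2. */
static void run_second(const char *outfile, int pipe_fd[2], char *const argv[])
{
    int out_fd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    close(pipe_fd[1]);                       /* this child only reads */
    if (out_fd < 0 || dup2(pipe_fd[0], STDIN_FILENO) < 0
        || dup2(out_fd, STDOUT_FILENO) < 0)
        _exit(1);
    close(out_fd);
    close(pipe_fd[0]);
    execve(argv[0], argv, environ);
    _exit(127);
}

/* Returns the exit status of cmd2, or -1 if the pipeline could not be set up. */
int run_pipeline(const char *infile, char *const cmd1[],
                 char *const cmd2[], const char *outfile)
{
    int pipe_fd[2];
    int status = 0;
    pid_t first;
    pid_t second;

    if (pipe(pipe_fd) < 0)
        return -1;
    first = fork();
    if (first == 0)
        run_first(infile, pipe_fd, cmd1);
    second = (first > 0) ? fork() : -1;
    if (second == 0)
        run_second(outfile, pipe_fd, cmd2);
    /* Parent: close both ends so cmd2 can see EOF once cmd1 finishes. */
    close(pipe_fd[0]);
    close(pipe_fd[1]);
    if (first > 0)
        waitpid(first, NULL, 0);
    if (second < 0)
        return -1;
    waitpid(second, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Returning the status of the last command mirrors what a shell reports for a pipeline; whether pipex itself does so is not stated in this README.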

Why close pipe ends in the parent?
A read() on a pipe returns EOF only once every write end has been closed. If the parent keeps pipe_fd[1] open, cmd2 never sees EOF on its stdin, even after cmd1 exits, and hangs forever waiting for more data.


📁 Project Structure

pipex/
├── README.md
├── assests/
│   ├── baseImage.png
│   └── DemoImage.png
├── subject/                    # 42 project subject PDF
└── Project/
    ├── Makefile
    ├── pipex.h                 # header: includes & prototypes
    ├── main.c                  # argument validation + entrypoint
    ├── pipex.c                 # core pipe/fork/dup2/execve logic
    ├── pipex_utils.c           # execute_commands + norm helpers
    ├── parsing.c               # PATH resolution & command lookup
    ├── helperFunctions.c       # small utility functions
    └── Include/
        ├── libft/              # custom C standard library
        ├── ft_printf/          # custom printf implementation
        └── get_next_line/      # custom line reader

🔨 Build

cd Project
make

This compiles everything (including the embedded libraries) and produces:

./pipex

Clean targets

Command       Effect
make clean    Remove object files
make fclean   Remove object files and the binary
make re       Full rebuild from scratch

🚀 Usage

./pipex <infile> <cmd1> <cmd2> <outfile>

Argument   Description
infile     Source file: read as stdin for cmd1
cmd1       First command (with optional arguments)
cmd2       Second command (with optional arguments)
outfile    Destination file: receives stdout of cmd2 (created/truncated)

Examples

# Count lines containing 'error' in a log file
./pipex server.log "grep error" "wc -l" result.txt

# Sort the lines of a text file and drop duplicates
./pipex words.txt "cat" "sort -u" sorted.txt

# Extract fields from a CSV
./pipex data.csv "cut -d, -f2" "sort" output.txt

Equivalent shell command:

< infile cmd1 | cmd2 > outfile

Notes on commands & quoting

  • Each command is passed as a single quoted argument on the shell (e.g. "grep -i error").
  • Internally, the program splits on spaces (ft_split(argv[i], ' ')), so complex shell quoting (nested quotes, glob expansions, variable substitutions) is not supported, consistent with the 42 subject specification.
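
To make the quoting limitation concrete, here is a rough stand-in for a space-only splitter. It is not libft's ft_split; split_spaces and free_words are hypothetical names, and the behaviour shown (runs of spaces produce no empty words, quotes are treated as ordinary characters) is an assumption about how such a splitter typically behaves.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ft_split(s, ' '): returns a NULL-terminated,
 * malloc'd array of malloc'd words. Quotes get no special treatment. */
char **split_spaces(const char *s)
{
    size_t cap = strlen(s) / 2 + 2;          /* max word count plus NULL slot */
    char **words = malloc(cap * sizeof *words);
    size_t count = 0;

    if (words == NULL)
        return NULL;
    while (*s != '\0')
    {
        size_t len;

        while (*s == ' ')                    /* skip separators */
            s++;
        len = strcspn(s, " ");               /* length of the next word */
        if (len == 0)
            break;
        words[count] = malloc(len + 1);
        if (words[count] == NULL)
            break;                           /* simplified: return what we have */
        memcpy(words[count], s, len);
        words[count][len] = '\0';
        count++;
        s += len;
    }
    words[count] = NULL;
    return words;
}

/* Frees an array returned by split_spaces. */
void free_words(char **words)
{
    size_t i = 0;

    if (words == NULL)
        return;
    while (words[i] != NULL)
        free(words[i++]);
    free(words);
}
```

With this approach, "grep -i error" becomes three arguments as expected, but "grep 'a b'" also becomes three ("grep", "'a", "b'"), which is why quoted arguments containing spaces do not behave as they would in a shell.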

🚨 Error Handling

Situation                       Behaviour
infile does not exist           Prints zsh: no such file or directory: <infile> and stops
infile not readable             Prints permission error and stops
outfile not writable            Prints permission error and stops
Command not found in PATH       Prints zsh: command not found: <cmd> for each missing command
PATH missing from environment   Reports invalid environment and exits cleanly
Empty command string            Handled gracefully: passes data through unchanged
Blank infile/outfile argument   Detected at startup; reports no such file or directory
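
As a rough illustration of the infile rows in this table, the check could look like the sketch below. The function name check_infile is hypothetical, the "no such file or directory" wording follows the table, and the exact text of the permission message is an assumption (the README only says a permission error is printed). It uses access() and errno from the allowed-functions list, plus snprintf for brevity.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if path is readable. Otherwise writes a
 * zsh-style message into msg (at most size bytes) and returns -1. */
int check_infile(const char *path, char *msg, size_t size)
{
    if (access(path, R_OK) == 0)
        return 0;
    if (errno == ENOENT)
        snprintf(msg, size, "zsh: no such file or directory: %s", path);
    else if (errno == EACCES)
        snprintf(msg, size, "zsh: permission denied: %s", path); /* assumed wording */
    else
        snprintf(msg, size, "zsh: %s: %s", strerror(errno), path);
    return -1;
}
```

An empty path also fails with ENOENT, which matches the "blank argument" row. Calling open() directly and inspecting errno afterwards is an equally valid design and avoids checking the file twice.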

📚 OS Concepts Deep-Dive

File Descriptors & Standard Streams

In Unix/Linux, everything is a file: regular files, devices, sockets, and pipes all share the same unified I/O interface. Each open resource is represented by an integer called a file descriptor (FD).

By convention, every process starts with three FDs already open, inherited from its parent:

FD   Name     Default connection (interactive shell)
0    stdin    keyboard
1    stdout   terminal
2    stderr   terminal

dup2(oldfd, newfd) atomically replaces newfd with a copy of oldfd, which is exactly how pipex rewires the standard streams of child processes before calling execve().
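
A small sketch of that rewiring, outside of any fork or exec: the hypothetical helper write_via_stdout below points descriptor 1 at a file, writes through it, and then restores the original stdout from a saved copy. It uses write() rather than printf so that no stdio buffering is involved.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes of text to path by temporarily
 * redirecting STDOUT_FILENO to that file. Returns 0 on success, -1 on error. */
int write_via_stdout(const char *path, const char *text, size_t len)
{
    int file_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int saved = dup(STDOUT_FILENO);          /* remember the real stdout */
    int ok = -1;

    if (file_fd >= 0 && saved >= 0 && dup2(file_fd, STDOUT_FILENO) >= 0)
    {
        /* Descriptor 1 now refers to the file, so this lands in path. */
        if (write(STDOUT_FILENO, text, len) == (ssize_t)len)
            ok = 0;
        dup2(saved, STDOUT_FILENO);          /* put the original stdout back */
    }
    if (file_fd >= 0)
        close(file_fd);
    if (saved >= 0)
        close(saved);
    return ok;
}
```

In pipex the restore step is unnecessary, because the redirection happens in a child that is about to call execve() and never needs its old stdout again.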


Processes & the PCB

When the OS creates a process via fork(), it allocates a Process Control Block (PCB) in the kernel:

Field             Description
PID               Unique process identifier
PPID              Parent's PID
Process state     Running / Ready / Blocked / Zombie
Program counter   Address of the next instruction
CPU registers     Saved context for context switching
Memory maps       Code, stack, heap regions
Open file table   List of open FDs (including pipe FDs!)
Scheduling info   Priority, CPU time used

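fork() creates an almost-identical copy of the parent PCB: the child gets its own PID but a copy of the same open file table. After execve(), the process image is fully replaced, but the file descriptor table survives (except for descriptors marked close-on-exec), which is the key mechanism that makes pipe redirection work.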
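
The survival of descriptors across execve() can be observed with a sketch like this one. The helper name capture_output is hypothetical, and it assumes /bin/sh exists: the child points its stdout at a pipe and then replaces itself with the shell, and whatever the new program prints still arrives at the parent through that inherited descriptor.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: runs `/bin/sh -c script` and stores up to size - 1
 * bytes of its stdout in buf (NUL-terminated). Returns bytes read or -1. */
ssize_t capture_output(const char *script, char *buf, size_t size)
{
    int pipe_fd[2];
    ssize_t total = 0;
    ssize_t n;
    pid_t pid;

    if (size == 0 || pipe(pipe_fd) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
    {
        close(pipe_fd[0]);
        close(pipe_fd[1]);
        return -1;
    }
    if (pid == 0)
    {
        char *const argv[] = {"sh", "-c", (char *)script, NULL};

        close(pipe_fd[0]);
        if (dup2(pipe_fd[1], STDOUT_FILENO) < 0)
            _exit(1);
        close(pipe_fd[1]);
        execve("/bin/sh", argv, environ);    /* FD 1 still points at the pipe */
        _exit(127);
    }
    close(pipe_fd[1]);                       /* parent keeps only the read end */
    while ((size_t)total < size - 1
        && (n = read(pipe_fd[0], buf + total, size - 1 - (size_t)total)) > 0)
        total += n;
    buf[total] = '\0';
    close(pipe_fd[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```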

Scheduling queues

The kernel manages processes in logical queues:

  • Ready queue: waiting for CPU time
  • Wait/blocked queue: waiting on I/O or an event
  • Priority / multilevel feedback queues: advanced schedulers

Common algorithms: FCFS · SJF · Round Robin · Priority · Multilevel Feedback Queue


Pipes: Named vs Unnamed

What is a pipe?

A pipe is a kernel-managed, in-memory, circular buffer that lets two processes communicate without any disk I/O. It exposes two file descriptors:

[write end fd[1]] ──────────────────► [read end fd[0]]
   (producer writes)    kernel buffer    (consumer reads)

Key properties:

  • Unidirectional: data flows in one direction only
  • FIFO: bytes are read in the exact order they were written
  • Blocking: write() blocks when the buffer is full; read() blocks when it is empty
  • Auto-EOF: the read end receives EOF when all write ends are closed (this is why unused pipe ends must always be closed!)
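
The FIFO and auto-EOF properties can be checked inside a single process with a sketch like the following (pipe_roundtrip is a hypothetical name). Two small writes go in, the only write end is closed, and the reader drains the pipe until read() returns 0.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: writes "first " then "second" into a fresh pipe, closes
 * the write end, and reads until EOF. Stores the NUL-terminated result in buf
 * and returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(char *buf, size_t size)
{
    int fd[2];
    ssize_t total = 0;
    ssize_t n;

    if (size == 0 || pipe(fd) < 0)
        return -1;
    if (write(fd[1], "first ", 6) != 6 || write(fd[1], "second", 6) != 6)
    {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    close(fd[1]);        /* last write end closed: the reader can now hit EOF */
    while ((size_t)total < size - 1
        && (n = read(fd[0], buf + total, size - 1 - (size_t)total)) > 0)
        total += n;      /* the loop ends when read() returns 0 (EOF) */
    buf[total] = '\0';
    close(fd[0]);
    return total;
}
```

If the close(fd[1]) line were removed, the final read() would block indefinitely, which is the single-process version of the hang described for the parent.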

Unnamed vs Named pipes

                                     Unnamed pipe (pipe())          Named pipe / FIFO (mkfifo())
Created by                           pipe(int fd[2])                mkfifo(path, mode)
Visible in filesystem                ❌ No                           ✅ Yes (appears as a special file)
Usable between unrelated processes   ❌ Requires a common ancestor   ✅ Any process can open by path
Lifetime                             Until all FDs are closed       Until explicitly deleted
Typical use                          Parent ↔ child IPC             Long-lived inter-process channels

pipex uses unnamed pipes, the classic approach for short-lived communication between related processes (here, the two children forked by the same parent).
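
For contrast, a named pipe can be exercised with a sketch like the one below. It is not part of pipex (mkfifo is outside the allowed-functions list); fifo_roundtrip is a hypothetical name, and fork() is used only to get a second process for the demo, since any unrelated process could open the same path instead.

```c
#include <fcntl.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: creates a FIFO at path, lets a child write msg into it,
 * reads it back in the parent, then removes the FIFO. Returns bytes read. */
ssize_t fifo_roundtrip(const char *path, const char *msg, size_t len,
                       char *buf, size_t size)
{
    ssize_t total = 0;
    ssize_t n;
    pid_t pid;
    int fd;

    if (size == 0 || mkfifo(path, 0600) < 0)
        return -1;
    pid = fork();
    if (pid < 0)
    {
        unlink(path);
        return -1;
    }
    if (pid == 0)
    {
        fd = open(path, O_WRONLY);           /* blocks until a reader opens */
        if (fd < 0 || write(fd, msg, len) != (ssize_t)len)
            _exit(1);
        close(fd);
        _exit(0);
    }
    fd = open(path, O_RDONLY);               /* blocks until the writer opens */
    if (fd < 0)
    {
        kill(pid, SIGKILL);                  /* do not leave the writer stuck */
        total = -1;
    }
    else
    {
        while ((size_t)total < size - 1
            && (n = read(fd, buf + total, size - 1 - (size_t)total)) > 0)
            total += n;
        buf[total] = '\0';
        close(fd);
    }
    waitpid(pid, NULL, 0);
    unlink(path);                            /* the name persists until removed */
    return total;
}
```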


📋 Allowed Functions (42 Subject)

The 42 subject restricts which library functions may be used. The key functions and system calls exercised in this project are:

open  close  read  write        # file I/O
malloc  free                    # memory management
perror  strerror                # error reporting
access                          # permission checking
dup  dup2                       # FD duplication / redirection
pipe                            # IPC pipe creation
fork  wait  waitpid             # process creation & synchronization
execve                          # process image replacement

🧪 Testing

Basic correctness check

Run pipex and compare its output directly against the shell:

echo "hello world" > infile

./pipex infile "cat" "wc -w" out_pipex
< infile cat | wc -w > out_shell

diff out_pipex out_shell   # should produce no output (files are identical)

More test cases

# grep + wc
./pipex /etc/passwd "grep root" "wc -l" result.txt
< /etc/passwd grep root | wc -l

# cat + head
./pipex /etc/hosts "cat" "head -3" result.txt
< /etc/hosts cat | head -3

# Command not found (verify error message)
./pipex infile "nonexistentcmd" "wc -l" result.txt

# Missing infile (verify error message)
./pipex no_such_file "cat" "wc -l" result.txt

# Permission denied on outfile
touch locked && chmod 000 locked
./pipex infile "cat" "wc -l" locked

Automated diff loop

for cmd1 in "cat" "grep a" "sort"; do
  for cmd2 in "wc -l" "wc -w" "head -1"; do
    ./pipex infile "$cmd1" "$cmd2" /tmp/out_pipex
    eval "< infile $cmd1 | $cmd2 > /tmp/out_shell"
    diff /tmp/out_pipex /tmp/out_shell \
      && echo "PASS: $cmd1 | $cmd2" \
      || echo "FAIL: $cmd1 | $cmd2"
  done
done

🎓 Key Concepts Mastered

Completing this project meant putting the following low-level Unix concepts into practice:

  • Inter-process communication (IPC) via anonymous pipes
  • File descriptor manipulation with dup2() for I/O redirection
  • Process forking and the parent/child relationship model
  • PATH environment variable parsing for dynamic binary resolution
  • execve() semantics: the process image replacement model
  • Zombie prevention with waitpid()
  • Deadlock avoidance: closing all unused pipe ends so EOF propagates correctly
  • Resource cleanup: freeing every heap allocation before process exit
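
The zombie-prevention point can be illustrated with a minimal sketch (spawn_and_reap is a hypothetical name). A finished child stays in the process table as a zombie until its parent collects the exit status; waitpid() does that collection, and the WIFEXITED / WEXITSTATUS macros decode the result.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks a child that exits immediately with `code`, then
 * reaps it. Returns the exit code observed by the parent, or -1 on error. */
int spawn_and_reap(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                         /* child: terminate right away */
    if (waitpid(pid, &status, 0) != pid)     /* parent: reap, no zombie left */
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                               /* killed by a signal, for example */
}
```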

👀 Author

Samia-Hb
42 Network / 1337 School student
