Regular Expressions

Splitting a Mail header using Perl


Here is an example of a mail header:
From wew@bearnet.com Fri Jun 27 20:31:48 1997
Return-Path: <wew@bearnet.com>
Received: looloo.bearnet.com (207.55.144.29)
 by luna.bearnet.com with SMTP;
 27 Jun 1997 20:31:47 -0000
Date: Fri, 27 Jun 1997 13:31:42 -0700 (PDT)
From: wew@bearnet.com
To: You There <you@overthere.com>
Message-Id:
 <199706272031.NAA08953@luna.bearnet.com>
Subject: Welcome to the world of email!


The body of the message goes here, after one blank line.
All Internet email is formatted like this, according to RFC-821 (SMTP) and RFC-822 (Internet Email).
Here, then, is an easy way to split a mail header.
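while (<>) {
 chomp;
 last unless $_;                      # stop at the first blank line
 next unless /^[\w-]+:/;              # skip lines that do not begin with "Name:"
 ($lhs, $rhs) = split /:\s*/, $_, 2;  # split at the first colon only
 $headers{uc $lhs} = $rhs;
}

  1. The line last unless $_; ends the loop at the first blank line.
    The last statement exits the enclosing loop immediately: the rest of the current iteration is skipped and no further iterations run. We will be looking at last in more detail in Module 5.
  2. The line next unless /^[\w-]+:/; skips any line that does not begin with a header name followed by a colon, such as the old Unix-style From line at the top and the indented continuation lines. The character class [\w-] allows hyphenated names such as Return-Path and Message-Id, which \w alone would not match.
    The next statement skips the remaining statements in the current iteration of the loop and starts the next iteration.
    We will be looking at next in more detail in Module 5.
  3. The limit of 2 in split /:\s*/, $_, 2 splits each line at the first colon only, so values that contain colons, such as the time in the Date header, are kept whole.
  4. The %headers hash receives one value per header name, keyed by the upper-cased name. If a header appears more than once, the last occurrence replaces the earlier ones, and the text on continuation lines is not captured.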
Now you can easily do something like this:
print "on $headers{DATE}, $headers{FROM} said: . . . \n";
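
The loop above ignores indented continuation lines and keeps only one value per header name. The following is a minimal sketch of one way to handle both cases, assuming the header lines have already been read into a list and chomped. The helper name parse_headers and the choice to store each header's values in an array are illustrative assumptions, not part of the lesson's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_headers is a hypothetical helper. It takes a list of chomped lines and
# returns a hash reference that maps each upper-cased header name to an array
# reference holding every value seen for that name.
sub parse_headers {
    my @lines = @_;
    my (%headers, $current);
    for my $line (@lines) {
        last unless length $line;              # a blank line ends the header
        if (defined $current && $line =~ /^\s+(.*)/) {
            # Continuation line: append its text to the most recent value.
            $$current .= ($$current eq '' ? '' : ' ') . $1;
            next;
        }
        # Skip anything that is not "Name: value", such as the "From " line.
        next unless $line =~ /^([\w-]+):\s*(.*)/;
        my ($name, $value) = (uc($1), $2);
        push @{ $headers{$name} }, $value;
        $current = \$headers{$name}[-1];       # remember where to append
    }
    return \%headers;
}

# Sample lines taken from the mail header shown earlier.
my $h = parse_headers(
    'From wew@bearnet.com Fri Jun 27 20:31:48 1997',
    'Received: looloo.bearnet.com (207.55.144.29)',
    ' by luna.bearnet.com with SMTP;',
    ' 27 Jun 1997 20:31:47 -0000',
    'Date: Fri, 27 Jun 1997 13:31:42 -0700 (PDT)',
    'Message-Id:',
    ' <199706272031.NAA08953@luna.bearnet.com>',
    '',
    'Subject: this line is body text, not a header',
);

print "$_: @{ $h->{$_} }\n" for sort keys %$h;
```

Because each value is an array reference, a header that occurs several times (Received is the usual case) keeps all of its occurrences in order. The last line after the blank line is never examined, since the loop has already exited.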

The following simple program prints the name, home directory, and login shell of all the users on a Unix system:
 
#!/usr/bin/perl -w

my ($login, $passwd, $uid, $gid,
  $gcos, $home, $shell);
open(PASSWD, '</etc/passwd')
  or die "Cannot open /etc/passwd: $!\n";
while (<PASSWD>) {
 chomp;
 ($login, $passwd, $uid, $gid, $gcos,
     $home, $shell) = split /:/;
 print "$login ($gcos): UID: $uid, ",
     "HOME: $home, SHELL: $shell\n";
}
close(PASSWD);
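The /etc/passwd file is a colon-delimited list of the account information for each user on a Unix system. The password is only one field of that information, and it is stored one-way encoded, so it cannot be read back anyway. (On many modern systems the field holds only a placeholder, and the encoded password is kept in a separate shadow file.)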
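
As a variation on the same split, the seven fields can be stored in a hash keyed by field name instead of seven separate scalars. This is a minimal sketch under stated assumptions: the helper name parse_passwd_line and the sample account line are hypothetical, and the sketch parses a string rather than opening /etc/passwd so that it runs on any system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names in the order they appear on each /etc/passwd line.
my @fields = qw(login passwd uid gid gcos home shell);

# parse_passwd_line is a hypothetical helper that turns one line into a hash
# reference keyed by the field names above.
sub parse_passwd_line {
    my ($line) = @_;
    chomp $line;
    my %user;
    # A limit of -1 keeps trailing empty fields, such as an empty shell.
    @user{@fields} = split /:/, $line, -1;
    return \%user;
}

# Made-up sample line, used here only for illustration.
my $u = parse_passwd_line("wew:x:1001:100:Sample User:/home/wew:/bin/sh\n");
print "$u->{login} ($u->{gcos}): UID: $u->{uid}, ",
      "HOME: $u->{home}, SHELL: $u->{shell}\n";
```

The hash slice @user{@fields} assigns all seven fields in one statement, and looking a field up by name tends to be easier to read than remembering its position in the list.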