Splitting a Mail header using Perl
Here is an example of a mail header:
From wew@bearnet.com Fri Jun 27 20:31:48 1997
Return-Path: <wew@bearnet.com>
Received: looloo.bearnet.com (207.55.144.29)
    by luna.bearnet.com with SMTP;
    27 Jun 1997 20:31:47 -0000
Date: Fri, 27 Jun 1997 13:31:42 -0700 (PDT)
From: wew@bearnet.com
To: You There <you@overthere.com>
Message-Id:
    <199706272031.NAA08953@luna.bearnet.com>
Subject: Welcome to the world of email!

The body of the message goes here, after one blank line.
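All Internet email is formatted like this, according to RFC 821 (SMTP) and RFC 822 (Internet Email); both have since been superseded by RFC 5321 and RFC 5322. The first line, which begins with "From " and has no colon, is not an RFC 822 header: it is the separator line that Unix mailbox files add in front of each message.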
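Because a single blank line separates the header from the body, one simple way to pull the header away from the body is to split the whole message at the first empty line. The following is a minimal sketch, assuming the complete message is already in a string with Unix newlines (messages read straight off the network use CR LF instead); the sample text and the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A short sample message (hypothetical): header lines, one blank line, then the body.
my $message = "From: wew\@bearnet.com\n"
            . "Subject: Welcome to the world of email!\n"
            . "\n"
            . "The body of the message goes here.\n"
            . "\n"
            . "A second paragraph of the body.\n";

# Split at the first empty line only; the limit of 2 leaves later blank lines in the body.
my ($head, $body) = split /\n\n/, $message, 2;

print "HEADER:\n$head\n";
print "BODY:\n$body";
```

Without the limit, the blank line between the two body paragraphs would also be treated as a split point and the body would be broken into pieces.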
Here, then, is an easy way to split a mail header.
Regular Expressions
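my %headers;
while (<>) {
    chomp;
    last unless $_;                         # a blank line ends the header
    next unless /^[\w-]+:/;                 # skip lines with no "Name:" prefix
    my ($lhs, $rhs) = split /:\s*/, $_, 2;  # split at the first colon only
    $headers{uc $lhs} = $rhs;               # a repeated header overwrites the earlier one
}
- The line
last unless $_;
ends the loop at the first blank line, which separates the header from the body. Because chomp has already removed the newline, a blank line leaves $_ empty, and an empty string is false.
The last statement exits the loop immediately: the rest of the current cycle is skipped and no more lines are read. We will be looking at last in more detail in Module 5.
- The line
next unless /^[\w-]+:/;
skips any line that does not start with a header name followed by a colon, such as the old Unix-style "From " line at the top of the message. The character class [\w-] accepts letters, digits, underscores, and hyphens, so hyphenated names such as Return-Path and Message-Id are kept. Folded continuation lines, which begin with whitespace, are skipped as well, so only the first line of a multi-line header such as Received is stored.
The next statement skips the remaining steps in the current cycle of the loop and goes on to the next one. We will be looking at next in more detail in Module 5.
- The third argument to split, 2, limits the result to two fields, so each line is split only at its first colon. Without it, a value that contains colons of its own, such as the time in the Date header, would be cut short.
- The %headers hash will get the mail headers, keyed by the header name in uppercase. If a header appears more than once (Received, for example), only the last occurrence is kept.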
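To see the difference between next and last in isolation, here is a small sketch that runs the same two tests over a hypothetical list of lines. next drops a single line and carries on, while last leaves the loop at once, so the final line, which looks like a header but sits in the body, is never examined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: a Unix "From " line, two headers, a blank line, then body text.
my @lines = (
    'From wew@bearnet.com Fri Jun 27 20:31:48 1997',
    'Subject: Welcome',
    'To: You There <you@overthere.com>',
    '',
    'Body: this line looks like a header but is in the body',
);

my @kept;
for (@lines) {
    last unless $_;            # leave the loop entirely at the first blank line
    next unless /^[\w-]+:/;    # skip only this line and go on to the next one
    push @kept, $_;
}

print "$_\n" for @kept;
```

Only the Subject and To lines end up in @kept: the "From " line is skipped by next, and the body line is never reached because last has already ended the loop.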
Once the loop has filled %headers, you can easily do something like this:
print "On $headers{DATE}, $headers{FROM} said: . . .\n";
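The header-splitting loop skips continuation lines. Under RFC 822, a long header value may continue on following lines that begin with a space or tab, as the Received and Message-Id headers do in the example message. The sketch below is one possible extension, not part of the original lesson: it remembers the most recent header name in a variable called $current (our own name) and appends each continuation line to that header's value. The input is an abridged copy of the example held in an array so that the sketch runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Abridged sample header; continuation lines start with whitespace.
my @input = (
    "Received: looloo.bearnet.com (207.55.144.29)\n",
    "    by luna.bearnet.com with SMTP;\n",
    "    27 Jun 1997 20:31:47 -0000\n",
    "Message-Id:\n",
    "    <199706272031.NAA08953\@luna.bearnet.com>\n",
    "Subject: Welcome to the world of email!\n",
    "\n",
    "The body of the message goes here.\n",
);

my %headers;
my $current;    # name of the header most recently seen (hypothetical helper variable)
for (@input) {
    chomp;
    last unless length;                      # blank line: end of header
    if (/^\s+(.*)/) {                        # folded continuation line
        next unless defined $current;        # ignore a continuation with no header before it
        # Join with a single space, unless the value so far is empty.
        $headers{$current} .= length $headers{$current} ? " $1" : $1;
        next;
    }
    next unless /^[\w-]+:/;                  # skip lines without a header name
    my ($lhs, $rhs) = split /:\s*/, $_, 2;   # split at the first colon only
    $current = uc $lhs;
    $headers{$current} = $rhs;
}

print "$_: $headers{$_}\n" for sort keys %headers;
```

Hyphenated keys must be quoted when you look them up: $headers{'MESSAGE-ID'} works, but $headers{MESSAGE-ID} is not treated as a string, because Perl reads it as the expression MESSAGE - ID.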
The following simple program uses the same split technique to print the login name, full name, UID, home directory, and login shell of every user listed in /etc/passwd on a Unix system:
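#!/usr/bin/perl -w
use strict;

# The seven colon-separated fields of each /etc/passwd line
my ($login, $passwd, $uid, $gid, $gcos, $home, $shell);

open(my $passwd_fh, '<', '/etc/passwd')
    or die "Cannot open /etc/passwd: $!\n";
while (<$passwd_fh>) {
    chomp;
    # A limit of -1 keeps a trailing empty field (for example, an empty shell).
    ($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split /:/, $_, -1;
    print "$login ($gcos): UID: $uid, HOME: $home, SHELL: $shell\n";
}
close($passwd_fh);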
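The /etc/passwd file is a colon-delimited list of the account information for each user on a Unix system, one user per line. The password is only one of the seven fields, and it is stored one-way encoded, so it cannot be read back anyway; on most modern systems the encoded password is kept in /etc/shadow, and the password field in /etc/passwd holds only a placeholder such as x.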
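Since every line of /etc/passwd has the same seven fields in the same order, the split can also fill a hash, which keeps each value together with its field name. The following sketch uses a hash slice on two hypothetical sample lines instead of the real file, so that it runs anywhere; the field names in @fields are our own labels. A limit of -1 keeps a trailing empty field, such as a missing shell, which split would otherwise drop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Our own labels for the fields, in /etc/passwd order.
my @fields = qw(login passwd uid gid gcos home shell);

# Hypothetical sample lines; the second one has an empty shell field.
my @lines = (
    "alice:x:1001:100:Alice Example:/home/alice:/bin/sh\n",
    "nobody:x:65534:65534:Nobody:/nonexistent:\n",
);

my @users;
for my $line (@lines) {
    chomp $line;
    my %user;
    # Hash slice: assign the seven split values to the seven named keys at once.
    @user{@fields} = split /:/, $line, -1;
    push @users, \%user;
}

for my $u (@users) {
    my $shell = length $u->{shell} ? $u->{shell} : '(none)';
    print "$u->{login} ($u->{gcos}): UID: $u->{uid}, HOME: $u->{home}, SHELL: $shell\n";
}
```

For real lookups, Perl's built-in getpwent and getpwnam functions return these fields from the system's account database, so a program does not have to parse the file itself.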