Wednesday, November 9, 2016

Handling messages forwarded as attachment by Outlook with MIME::Parser in Perl

Outlook sends emails that are forwarded as attachment with an .eml extension and the content-type set to application/octet-stream. According to RFC 1341, message/rfc822 should be used. The Perl module, MIME::Parser will automatically parse message/rfc822 attachments, which is useful if you want to do automated processing on an email and its attachments. Outlook's use of application/octet-stream breaks this.

It is possible to fix this. I initially attempted to change the contetn-type and rerun the parser on the file, but that resulted in an empty part. The problem is that according to RFC1341, the Content-Transfer-Encoding field must be 7bit, 8bit or binary for message/rfc822 (Outlook uses base64). Once this is corrected, it works.

A Perl sample: (in this case, the email forwarded as attachment is the second attachment)
#!/usr/bin/perl

use warnings qw(all);
use MIME::Parser;
use strict;

my $fn = '/tmp/input_file.eml';

my $parser = new MIME::Parser;

$parser->output_to_core(1); # Disable the creation of temporary files

my $entity = $parser->parse_open($fn);
$entity->dump_skeleton;   # View initial structure

# Fix the fields
$entity->parts(1)->head->replace('Content-Type','message/rfc822');
$entity->parts(1)->head->replace('Content-Transfer-Encoding','7bit');

# Get encoded message
my $message = $entity->as_string;
#Re-parse
$entity = $parser->parse_data($message);

$entity->dump_skeleton;          # show final structure

\

Here is a general function to handle these. It uses undocumented interfaces, since there does not seem to be a documented method to replace a part with another one.
sub handle_forwarded_messages
{
   my($parser,$entity, undef) = @_;
   return undef unless ($entity && $parser);

   my($part);

   # Recursively process multipart entities, based on number of parts
   if (scalar $entity->parts) # If we have sub-parts
   {
      # Warning, next line uses undocumented interfaces..
      for (my $i = 0; $i <= $#{$entity->{ME_Parts}}; $i++) {
         $part = $entity->{ME_Parts}[$i];
         # Warning, next code line uses undocumented interfaces..
         # Replace part with its expanded version... This seems to be the only way
         $entity->{ME_Parts}[$i] = &handle_forwarded_messages($parser,$part);
      }
   } else { # Once we are at a level that does not have sub-parts...
      # Replace forwarded messages with properly expanded versions...
      if ($entity->effective_type eq 'application/octet-stream' &&
              $entity->head->recommended_filename =~ /\.eml$/) {
          $entity->head->replace('Content-Type','message/rfc822');
          $entity->head->replace('Content-Transfer-Encoding','8bit');
          my $entity_tmp = eval { $parser->parse_data($entity->as_string) };
          $entity = $entity_tmp unless ($@ || $parser->results->errors);
          # And see if they have more levels...
          $entity = &handle_forwarded_messages($parser,$entity);
      }
   }
   # Return the processed result
   return $entity;
}