REBOL [ Title: "Mime Model" Author: "Brett Handley" Date: 19-Jul-2004 Version: 0.9.9 History: [ 0.9.9 [19-Jul-2004 "Change method of line termination discovery." "Brett Handley"] 0.9.8 [6-Feb-2004 "Changed mime-header-parameters? to handle encoded values." "Brett Handley"] 0.9.7 [25-Jul-2002 "Changed mime-boundary? to use mime-header-parameters?" "Brett Handley"] 0.9.6 [18-Jan-2002 "Small tweak to mime-filename?." "Brett Handley"] 0.9.5 [17-Jan-2002 "Added funciton guess-mime-type." "Brett Handley"] 0.9.4 [16-Jan-2002 "Small bug fix in line termination detection." "Brett Handley"] 0.9.3 [15-Jan-2002 "Minor change to loosen strictness of mime-message?" "Brett Handley"] 0.9.2 [6-Jan-2002 "Minor change to bypass any erroneous unix mbox headers." "Brett Handley"] 0.9.1 [31-Dec-2001 "Minor change - due to change in encodings.r" "Brett Handley"] 0.9.0 [29-April-2001 "Partial rewrite of manipulate-emails.r" "Brett Handley"] ] Purpose: "Faciliates manipulation of MIME structures." Acknowledgement: "Phil Bevan for his patient testing and brave adoption of this script. :)" Comment: { This script was developed and tested on Rebol/View 1.1 THE USEFUL BIT MIME-STRUCTURE? is the function that returns an object model to represent the mime structure of a message. The object model stores the headers of the messages/mime entities as fields of an object. The content field represents the body part of an entity. The content field becomes a block of entities for a multipart entity. Decoding of the content according to content-transfer-encoding is done if I have decoder for it (quoted-printable and base64) are handled. See the model-status field to determine the status of the content. MIME-NAMED-ATTACHMENTS? returns the attachments of a mime message. Returns file-name file-data pairs. It obviously makes some assumptions about how attachments are made. PRINT-MIME-STRUCTURE is a handy little function that prints out to the console the structure of the model returned by MIME-STRUCTURE?. It displays the various content-type headers indented appropriately. MIME-EXPORT-FORWARDED-MESSAGES will take an email that was created by forwarding a number of other emails (from Outlook Express - see my web site for why) and writes them out to a directory as *.eml files. MIME-MULTIPART-SUMMARY? takes a multipart email and returns a block that summarises the parts. LEFT TO TO Provide utility function for extracting content-type header parameters. Review line termination issues as described below. Consider a model -> string lineariser. Ie writing the model out again ..prologue and epilogue are discarded at the moment though. Consider a dialect -> model parser. Dialect for constructing models. Create higher level convienience functions for common uses: saving attachments intepreting form-upload data etc. (if there is anything else...) THE DOCUMENTATION/BACKGROUND BIT See RFC 1521. The network form of a mime message uses CRLF to indicate line breaks. Also, mime allows for binary data within the message. By this I am referring to content-transfer-encoding = "binary". This means that the actual binary data is within the message data and has not be encoded using base64 or whatever to make a 7bit stream. An example of this could be file upload by html forms using POST. The post data resembles a multipart mime body with the file data included as straight binary data. Therefore we should acquire mime message using READ/BINARY. But often with Rebol we acquire email messages using READ. This is a convienience and it works because as far as I know emails do not include straight binary data. If an email has an attachment - the attachment binary data is encoded into a form that is not binary and thus can be sent with SMTP. But to reiterate, if a message has binary in it you have to acquire it using a binary read. I would like these routines to handle the message irrespective of whether it has been acquired via READ or READ/BINARY. Acquiring a message using READ though brings into play the conversion of line breaks from CRLF to the rebol internal representation of a line break which is by default NEWLINE. The consequence of this operation in relation to mime is that the normal sequence of encodings steps has been changed. RFC 1521 Appendix G describes a canoncial encoding model. In summary to encode a mime body one transforms data like so Local-form -> canonical-form -> transfer-encoded-form. So to decode you reverse the sequence. For text/plain the canoncial form represents line breaks as CRLFs whereas the local form (Rebol) represents line breaks with NEWLINE (by default). So by acquiring using a READ a transform has already been applied - line breaks are now represented by NEWLINE. But note this has occurred out of step - it should be the last step. In this routine I assume that for text types this out of step transform does not matter. So I detect if it has taken place and if not I transform line breaks at the same step that a non binary READ would. I'm doing this because it has a bearing on my encoding routines too - they assume a rebol based line break representation. I may need to review this. To cope with the line ending situation I have to pass the detected line ending of the message around the various routines. } Rights: {Copyright (c) Brett Handley 2001 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, subrights, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.} ] ; Use-script is my local script manager. ; Define it if we are in another environment. if not value? 'use-script [ either value? 'load-thru [ use-script: func[script][do load-thru join http://www.codeconscious.com/rebsite/rebol-library/ script] ][use-script: :do] ] ; Supporting Code use-script %encodings.r ; ; User level functions ; mime-structure?: function [ "Returns an object model that represents the MIME structure of a message." message [string! object! binary!] ][msg][ msg: mime-msg-as-object message if not mime-message? msg [ RETURN none ] RETURN mime-entity msg mime-line-termination? msg ] print-mime-structure: func [ "Describe the mime model - an example of how to traverse the model" message [object!] /indent indent-string [string!] ][ if not indent-string [indent-string: copy ""] print join indent-string message/content-type if mime-multipart? message [ indent-string: join indent-string " " for i 1 length? message/content 1 [ print-mime-structure/indent message/content/:i indent-string ] ] RETURN none ] mime-named-attachments?: function [ [catch] "Returns attachments as name value pairs - works off name parameter of content-type." message [object!] /naming naming-func [any-function!] "Specify a function to convert the name parameter into a filename." /typing "Includes mime typing information (pairs become triples)." ][result name entity][ if not naming [naming-func: :to-file] if not mime-multipart? message [ throw make error! "Only works on multipart messages." ] result: copy [] for i 1 length? message/content 1 [ entity: message/content/:i name: mime-filename? entity if name [ either typing [ append result reduce [(naming-func name) (form mime-typing? entity) entity/content] ][ append result reduce [(naming-func name) entity/content] ] ] ] RETURN result ] mime-export-forwarded-messages: function [ "Exports messages - assumes they are contained in a multipart mime message." message [file! url! string!] "The forwarded email." target-directory [file! url!] "The target directory to write the messages to." ][msg-data msg-number prefix extension][ msg-data: either string? message [message][read message] msg-number: 0 prefix: "msg-" naming-function: func [name][ msg-number: msg-number + 1 join target-directory [prefix msg-number %.eml] ] foreach [f m ] mime-named-attachments?/naming mime-structure? msg-data :naming-function [ write f m ] ] mime-multipart-summary?: function [ message-model ][message description alt-desc params pad-string] [ if not mime-multipart? message-model [RETURN false] pad-string: insert/dup copy "" " " length? to-string length? message-model/content part-list: copy [] for part-number 1 length? message-model/content 1 [ message: message-model/content/:part-number description: to-string first parse message/content-type ";" either in message 'content-disposition [ params: mime-header-parameters? message/content-disposition ][ params: mime-header-parameters? message/content-type ] if all [ params any [ alt-desc: select params 'filename alt-desc: select params 'name ] ] [append description rejoin [" " alt-desc]] description: join at pad-string (length? to-string part-number) [part-number " " description] append part-list description ] ] ; ; Model Information functions ; mime-message?: function [ "Checks to see if this is mime message." message [string! object! binary!] /strict "Check that mime-version header is present." ][msg version-tests assertions][ msg: mime-msg-as-object message version-tests: [ in msg 'mime-version msg/mime-version ] either strict [ all [ in msg 'content all version-tests ] ][ all [ in msg 'content any [ all version-tests all [ in msg 'content-type msg/content-type ] ] ] ] ] mime-is-rfc822?: func [message [object!]][ RETURN equal? "message/rfc822" form mime-typing? message ] mime-typing?: func [ "Mime type/subtype of mime entity object." entity [object!] ][ either (in entity 'content-type) [ RETURN load pick parse entity/content-type none 1 ][ RETURN none ] ] mime-multipart?: func [ "Returns true if this a multipart entity." entity [object!] ][ RETURN equal? pick mime-typing? entity 1 'multipart ] mime-line-termination?: func [ "Returns the line ending being used." message [object!] ][ ; The method I use here is dependent on the fact that the content field actually ; contains the entire original string. So I look in front of "content" to see the termination. RETURN either find/match back back message/content CRLF [CRLF][to-string newline] ] mime-header-parameters?: function [ "Retrieves the parameters from a header." header [string!] ][header-params parameter result name value][ result: copy [] for i 1 length? (header-params: next parse/all header ";") 1 [ if not empty? header-params/:i [ if not parse/all trim/head/tail copy header-params/:i [ copy name to #"=" skip copy value ["=?" 2 [thru #"?"] thru "?="] (parameter: reduce [name value]) ][ parameter: parse/all trim/head/tail copy header-params/:i "=" ] append result reduce [to-word parameter/1 parameter/2] ] ] RETURN result ] mime-boundary?: function [ "Mime boundary used in a multipart entity." entity [object!] ][content-type-header boundary start][ content-type-header: entity/content-type either string? content-type-header [ boundary: select mime-header-parameters? content-type-header 'boundary ] [ if start: find entity/content "^/--" [ boundary: copy/part next start find next start join mime-line-termination? entity ] ] RETURN boundary ] mime-filename?: function [ "Returns a filename for the entity or none" entity [object!] ][name][ name: any [ all [ in entity 'content-disposition any [ select mime-header-parameters? entity/content-disposition 'filename select mime-header-parameters? entity/content-type 'filename select mime-header-parameters? entity/content-type 'name ] ] select mime-header-parameters? entity/content-type 'name ] RETURN either found? name [to-file name][none] ] ; ; Model construction functions ; mime-model-template: context [ mime-version: content-type: content: content-transfer-encoding: model-status: none ] mime-entity: function [ "Returns an object model that represents the MIME structure of a message." message [object!] line-termination [string!] /default-encoding encoding-default [string!] ][msg body-parts][ msg: make mime-model-template message ; Handle defaults if not default-encoding [encoding-default: "7bit"] if not msg/content-transfer-encoding [msg/content-transfer-encoding: encoding-default] if not msg/content-type [msg/content-type: "text/plain; charset=us-ascii"] either mime-multipart? msg [ body-parts: mime-body-parts? msg/content mime-boundary? msg line-termination for i 1 length? body-parts 1 [ poke body-parts i mime-entity/default-encoding (mime-msg-as-object body-parts/:i) line-termination encoding-default ] msg/content: body-parts ][ msg/content: decode-inmost-mime-entity-body msg/content msg/content-type msg/content-transfer-encoding line-termination ] msg/model-status: 'decoded RETURN msg ] decode-inmost-mime-entity-body: function [ "Decodes the body of the mime entity back into a rebol representation - modifies the entity." body [string!] content-type [string!] content-transfer-encoding [string!] line-termination ][result decode-type][ result: body ; If it is text handle local/network line termination. ; See notes at top of the script regarding line termination. switch/default pick load pick parse content-type none 1 1 [ text [ if equal? line-termination CRLF [replace/all result CRLF NEWLINE] ] ][ ; 31-dec-2001 - disabled next line because of change in encodings.r ; result: to-binary result ] ; Reverse the transfer encoding if all [ content-transfer-encoding not equal? content-transfer-encoding "7bit" not equal? content-transfer-encoding "8bit" not equal? content-transfer-encoding "binary" decode-type: encoding? content-transfer-encoding ] [ result: decode decode-type result ] ; If it is text or a message ensure a string result. ; Will leave image as binary - so that Core can use the script for automatic extraction and save. switch/default pick load pick parse content-type none 1 1 [ message [ result: to-string result ] text [ result: to-string result ] ][result: to-binary result] RETURN result ] msg-as-object: function [ "Just ensures that the message is an object." message [string! object! binary!] ][msg tmp][ either object? message [ msg: message ][ ; Bypass any erroneous unix MBOX from_ header parse/all message ["From " thru newline tmp: (message: :tmp)] ; Parse the headers msg: parse-header none message ] RETURN msg ] mime-msg-as-object: :msg-as-object ; Name change ; ; Low Level Support functions ; mime-body-parts?: function [ [catch] "The separate parts of multipart mime content." content [string! binary!] boundary [string!] line-terminator [string!] "The line terminator - either CRLF or newline." ] [ normal-delimiter close-delimiter prologue epilogue msg-part parse-result message-part-pattern return-block ] [ ; Get the parts return-block: make block! 2 normal-delimiter: rejoin [line-terminator "--" boundary] close-delimiter: rejoin [line-terminator "--" boundary "--"] message-part-pattern: [ line-terminator copy msg-part [to normal-delimiter | to close-delimiter] (append return-block msg-part) ] parse-result: parse/all content [ (epilogue: prologue: none) [ "--" boundary message-part-pattern | copy prologue to normal-delimiter ] some [ close-delimiter copy epilogue to end | normal-delimiter message-part-pattern ] end ] if not parse-result [throw make error! "Error: Assumption failed while parsing parts."] RETURN return-block ] ; Could be smarter. ; Could have a peek at the value for special format markers. guess-mime-type: function [name value][ext][ ext: find/last form name "." if found? ext [ext: next ext ] RETURN copy any [ if find ["html" "htm"] ext ["text/html"] if find ["jpeg" "jpg" "jpe"] ext ["image/jpeg"] if equal? "png" ext ["image/png"] if equal? "gif" ext ["image/gif"] if find ["text" "txt"] ext ["text/plain"] if all [equal? ext "r" find/match trim/head copy/part value 1024 "REBOL"] ["text/plain"] "application/octet-stream" ] ]