REBOL [ Title: "Markup Tools" Date: 23-May-2002 Version: 1.0.4 File: %markup-tools.r Author: "Brett Handley" Email: brett@codeconscious.com Purpose: "Provide some functions to interpret/manipulate markup." History: [ 1.0.0 [10-Oct-2000 { Original code } "Brett Handley"] 1.0.1 [9-Dec-2000 { simple-table-extract should handle TH now. Reorganised script and added comments to make it more readable. } "Brett Handley"] 1.0.2 [18-Mar-2001 { Added a more parse friendly dialect. Showed how the dialect can be parsed. Created a simple interpreter that generates the markup from the dialect - seems much quicker than the original form-markup. I'm having real troubles coming up with a decent naming convention for all these type of things. } "Brett Handley"] 1.0.3 [20-may-2001 { Changed naming from html to markup to reflect the fact that the functions can works with some XML. } "Brett Handley"] 1.0.4 [6-May-2002 {Fixed bug in element-type function.} "Brettt Handley"] 1.0.5 [23-May-2002 {Modified to deal with names as issue!} "Brettt Handley"] ] Comment: { This structure makes querying/changing markup a bit easier. Requires the scripts tag-tool.r and highfun.r This structure makes no attempt at interpreting markup structure it merely encodes it as a stream of more accessible elements than straight tags. Some of the functions below do simplistic interpretation of the markup structure as represented by my encoding in order to achieve some specific tasks. The encoding uses blocks to represent tags, everything else is strings. Replacing tag! with block! types makes the tag attributes easily accessible. Comments are tags therefore they become blocks. But they are not further interpreted so you end up with a block with a single element of type string. While this encoding is simple, you may be suprised at how much you can achieve with it. A by-product of import-tag is that all attributes become quoted. So the following simple line will "automatically" fix all your unquoted markup attributes. write %some-file.markup form-markup load-markup %some-file.markup I've not stressed it, nor tested it with CGI. You are responsible for how you use this script not me! The same goes for any script this one relies on. ----- Need to add entity replace stuff. eg. & --> & ; Example of querying and manipulation of markup pages. ; ; This example generates a new markup page and lists all the ; href type links it can find in the home page of Rebol. ; test-block: [[markup] [body] [p] "List of links" [/p] [/body] [/markup]] ; Set some attributes - could have put them in blocks instead... set-attribute-value test-block/2 'bgcolor "gray" set-attribute-value test-block/3 'align "center" ; Find out where we want to insert new content into our new page. insert-here: find/only test-block [/p] ; Foreach href type tag we find in the home page add it to our new page. foreach link unique tags/attribute load-markup http://www.rebol.com/ 'href [ insert-here: insert insert-here reduce [newline
link (select link 'href) ] ] ; Finally write out the new markup page. write %tf.markup form-markup test-block } Rights: {Copyright © Brett Handley 2001 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.} ] do load-thru http://www.rebol.org/advanced/highfun.r do load-thru http://www.codeconscious.com/rebsite/rebol-library/tag-tool.r do load-thru http://www.codeconscious.com/rebsite/rebol-library/import-rebol-datatypes.r ; ; Helper functions - used for querying and manipulating the structure after it ; has been loaded. ; get-lookup-value: func[ block lookup-name ][ select block lookup-name ] set-lookup-value: func[ block attr-name attr-value ][ if not get-lookup-value block attr-name [ append block reduce [ attr-name none] ] head change/only next attr-pos: find block attr-name attr-value if none? attr-value [ remove/part attr-pos 2 ] block ] get-attribute-value: func[ "Get the value of a tag attribute." tag-block attr-name ][ get-lookup-value tag-block attr-name ] set-attribute-value: function [ "Set the value of a tag attribute." tag-block attr-name attr-value ][attr-pos][ set-lookup-value tag-block attr-name attr-value ] ; ; Calculated types (assumes results of import-tag) ; element-type: function [ "Returns the type of element." element-value ][][ either block? element-value [ either string? element-value/1 [ RETURN 'uninterpreted-tag ] [ either equal? first mold element-value/1 #"/" [ RETURN 'close-tag ][ either any [ equal? to-string last element-value "/" find [meta br img] first element-value ] [ RETURN 'unpaired-tag ][ RETURN 'open-tag ] ] ] ] [ RETURN 'char-data ] ] ; ; Encoding markup into my structure. ; As you can see it is pretty basic because it leverages ; the work of "import-tag". Tags are converted to blocks. ; load-markup: function [ "Loads markup into a simple dialect using blocks to represent tags." markup ] [markup-markup cursor] [ cursor: load/markup markup while [not tail? cursor] [ if tag? cursor/1 [ change/only cursor import-tag first cursor ] cursor: next cursor ] head cursor ] ; ; "Decoding" my structure back to markup. ; form-element: function [ item-value ][result temp-value temp-pos item-type cursor][ item-type: element-type item-value if equal? item-type 'char-data [ RETURN item-value ] if equal? item-type 'uninterpreted-tag [RETURN to-tag item-value/1] parse item-value [ issue! any [ cursor: issue! any-type! (change cursor to-lit-word to-string cursor/1) ] to end ] temp-value: build-tag item-value if equal? item-type 'unpaired-tag [ if find/match temp-pos: back back tail temp-value " /" [ remove temp-pos ] ] RETURN temp-value ] form-markup: function [ "Forms markup from the simple markup block structure." markup-block /add-lines "Add a line break after each element." ] [result cursor item] [ result: make block! length? markup-block cursor: markup-block while [not tail? cursor] [ insert tail result item: form-element cursor/1 if all [add-lines not tag? item] [append result "^/"] cursor: next cursor ] to-string result ] ; ; Searching functions ; tags: function [ [catch] "Returns a block of all tag blocks - by default excluding comments." markup-block /name tag-name "Specific tag type" /attribute attr-name "Match attribute type" /value attr-value "Match attribute value - requires /attribute" /uninterpreted "Includes uninterpreted tags (comment tags)." /index "Return index as result." /position "Return block reference as result." ][x selected result-block][ if all [value not attribute] [ throw make error! "/value requires /attribute for tags function." ] result-block: copy [] for i 1 length? markup-block 1 [ x: pick markup-block i selected: false if block? x [ either uninterpreted [ selected: true ][ if not string? x/1 [ selected: true ] ] ] if all [selected name] [ selected: (equal? x/1 tag-name) ] if all [selected attribute] [ selected: (select x attr-name) ] if all [selected value] [ selected: (find/match (select x attr-name) attr-value) ] if selected [ insert/only tail result-block either position [ result-block at markup-block i ][ either index [i][x] ] ] ] RETURN result-block ] strings: function [ "Returns a block of all string elements." markup-block /index "Return index as result." /position "Return block reference as result." /with-trim "Trims each string found." /no-empty "Exclude empty strings from result." ][x result-block][ result-block: copy [] for i 1 length? markup-block 1 [ x: pick markup-block i if not block? x [ if with-trim [ x: trim copy x ] if any [not no-empty not equal? x ""] [ insert/only tail result-block either position [ result-block at markup-block i ][ either index [i][x] ] ] ] ] RETURN result-block ] find-tag: function [ "Finds the first occurrence of the named tag." markup-block tag-name /tail "Finds the last occurrence instead." ][cursor][ cursor: markup-block found: none while [all [not tail? cursor not found]][ if block? cursor/1 [ if not string? cursor/1 [ if equal? cursor/1/1 tag-name [ found: true ] ] ] if not found [cursor: next cursor] ] either tail? cursor [first reduce [none] ][either tail [next cursor][cursor]] ] ; ; Filtering/changing functions ; strip-lines: func[ "Returns the markup-block but with line breaks removed from between tags." ; If you some of 'em back check out form-markup/add-lines. markup-block ][ filter (func[x][any [block? x equal? x "" not equal? trim copy x ""]]) markup-block ] fill-empty-cells: function [ "Inserts an empty string into empty cells - simplifying later operations." markup-block ][td-a-indexes td-b-indexes insertion-points][ td-a-indexes: map (func[x][add x 1]) ( union (tags/name/index markup-block 'td) (tags/name/index markup-block 'th) ) td-b-indexes: union tags/name/index markup-block to-lit-word "/td" tags/name/index markup-block to-lit-word "/th" insertion-points: head reverse sort intersect td-a-indexes td-b-indexes foreach p insertion-points [ insert at markup-block p copy {} ] markup-block ] simple-table-extract: function [ "Returns the string data table - nested tables not handled." markup-block /number table-index "Specify table number to extract." ][table-start table-end table-block][ if not number [table-index: 1] ; Default is first table. ; Find the start of the table. table-start: markup-block for i 1 table-index 1 [ table-start: find-tag/tail table-start 'table ] table-start: back table-start table-end: find-tag/tail table-start to-word "/table" table-block: copy/deep/part table-start table-end strings fill-empty-cells strip-lines table-block ] simple-table-extract2: function [ "Returns the string data table as an array - nested tables not handled." markup-block /number table-index "Specify table number to extract." ][table-start table-end table-block row-start row-end next-row-start result-block][ if not number [table-index: 1] ; Default is first table. ; Find the start of the table. table-start: markup-block for i 1 table-index 1 [ table-start: find-tag/tail table-start 'table ] table-start: back table-start table-end: find-tag/tail table-start to-word "/table" table-block: copy/deep/part table-start table-end table-block: fill-empty-cells strip-lines table-block result-block: copy [] ; Find the first row row-start: find-tag table-block 'tr if not next-row-start [next-row-start: tail table-block] ; Process each row while [not tail? row-start] [ ; Find the start of the next row next-row-start: find-tag next row-start 'tr if not next-row-start [next-row-start: tail table-block] ; Find the end of the current row row-end: find-tag row-start to-lit-word "/tr" if not row-end [row-end: tail table-block] if lesser? next-row-start row-end [ row-end: next-row-start ] ; Extract the row data append/only result-block strings copy/part row-start row-end ; Advance the cursor row-start: next-row-start ] RETURN result-block ] ; ; Encoding "my structure" to a dialect ; markup-dialect: function [ "Describes markup in a simple dialect." markup-spec [ file! url! string! block!] ][result markup-block][ either equal? type? markup-spec block! [ markup-block: markup-spec ][ markup-block: load-markup markup-spec ] result: copy [] foreach e markup-block [ insert tail result element-type e either equal? element-type e 'close-tag [ insert/only tail result to-word next to-string first e ][ either find [unpaired-tag open-tag] element-type e [ insert/only tail result first e insert/only tail result next e ][ insert tail result e ] ] ] RETURN result ] markup-dialect-parser: context [ _tagname: none _tagattributes: none _chardata: none _unpaired: none markup-dialect-rules: [ some [ ['open-tag (_unpaired: false) | 'unpaired-tag (_unpaired: true)] set _tagname word! set _tagattributes block! | 'close-tag set _tagname word! (_tagname: to-word join "/" _tagname) | 'char-data set _chardata string! | 'uninterpreted-tag set _chardata string! ] ] set 'parse-markup-dialect func [ data ][ parse data markup-dialect-rules ] ] form-markup-from-dialect-interpreter: context [ _output: none open-tag: func [ 'name [word!] attributes [block!] ][ insert tail _output build-tag head insert copy attributes name ] unpaired-tag: func [ 'name [word!] attributes [block!] ][ open-tag name attributes ] close-tag: func [ 'name [word!] ][ insert tail _output build-tag reduce [to-word join "/" name] ] uninterpreted-tag: func [ data [string!] ][ insert tail _output build-tag reduce [data] ] char-data: func [ data [string!] ][ insert tail _output data ] interpret: func [ dialect ][ if not parse-markup-dialect dialect [ make error! "Input doesn't conform to the dialect grammar." ] _output: copy "" do bind/copy dialect 'self RETURN _output ] set 'form-markup-from-dialect :interpret ] ; ; Convenience functions ; html-file?: function [ "Determine if an html file, returns the extension or none." test-file [file!] ][file-ext][ either find [%.html %.htm] file-ext: get in import-file test-file 'extension [ file-ext ][ none ] ] html-files?: func ["Returns the html files in a specified dir-spec" dir-spec [file! url!]][ map func[x] [either equal? dir-spec %./ [x][join dir-spec x]] filter :html-file? read dir-spec: dirize dir-spec ]