codeconscious.com - Learning about Rebol and networking

This page is a collection of bits of information that people may find useful. It reflects information that should be suitable for programmers learning about networking and the Rebol language.

CGI Safety

Reducing or DOing the decoded cgi-query block is not the recommended approach. That method allows the world to assign arbitrary words in your script to strings. Imagine this REBOL cgi program:

#!/path/to/rebol -cs
REBOL []
print "Content-type: text/html^/"

reduce decode-cgi system/options/cgi/query-string

;- Expecting user to be passed in
page: [print reform ["Hello" user]]
do page
quit

See the danger? I can go up to my browser and type in this url:

http://site/cgi-bin/script.r?page=send+me@somewhere+allyourfiles

Of course, you can still be perfectly safe if the script establishes a sensible sandbox with SECURE, but the point is you don't want the outside world to be able to globally define words to strings in your CGI script. If, in the course of your CGI script, those strings could ever get interpreted as code then anyone can make your script DO arbitrary code.

By making the query block into an object, this problem is completely avoided. REBOL CGI script writers should be aware of this issue. REBOL provides powerful tools to make REBOL CGI scripts airtight, but if the script lowers the shields then REBOL can't help you. :-)

Jeff
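As a minimal sketch of the object approach described above: the decoded query is kept in its own object, so a request such as ?page=... only creates cgi/page and cannot overwrite the script's own page word. The word cgi is just an illustrative name, and the sketch assumes the request always carries a query string with a user field (cgi/user raises an error otherwise).

```rebol
#!/path/to/rebol -cs
REBOL [Title: "CGI query as an object (sketch)"]
print "Content-type: text/html^/"

; Keep the decoded query inside its own object instead of the global context.
; "cgi" is an illustrative name, not something REBOL defines for you.
cgi: make object! decode-cgi system/options/cgi/query-string

;- Expecting user to be passed in
page: [print reform ["Hello" cgi/user]]
do page
quit
```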

Using Rebol CGI with Personal Web Server

To use PWS, you need to do the following:

Add a new string value to

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\W3Svc\Parameters\Script Map

with a name of .r (assuming this is the suffix for the rebol script) and a data value of

c:\rebol\rebol.exe -cs %s

(obviously substitute the appropriate path to your executable).

Your rebol cgi scripts should reside in your PWS Scripts directory. In my case, I run from /cgi-bin.

The Registry *definitely* must be handled with care. I've never created a problem that I know of, but the potential is definitely there.

Tim

So I add this disclaimer: Getting a change wrong in your registry could do irreparable harm to your installation. For this reason Microsoft strongly discourages changes to the registry.

Brett Handley

Sending mail

"Can someone tell me why I get this error when trying to send mail through a CGI form? Sending Email to ... ** User Error: No network server for smtp is specified. ** Where: smtp-port: open [scheme: 'smtp] if email?"

Depending on where you have REBOL installed and where your cgi resides, the cgi may not be able to read user.r to know where the SMTP server is. If the webserver has the REBOL_HOME variable defined, then you can have a user.r in the REBOL_HOME directory. Otherwise you need a user.r file containing the network definitions in the same directory as your cgi scripts, so that REBOL can read it and get its network definitions. If all of this is already done, then I would look at the webserver: check with your Admin group to see if SMTP is turned off by default under cgi, or try a simple Perl script that sends mail to see if it works under other scripting languages.

You can also try forcing the SMTP host definition in the REBOL cgi script with:

system/schemes/default/host: "your_smtp_host.your_domain.com"

christmn
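Another way to supply the network definitions from inside the script, when user.r cannot be read, is the standard set-net function. This is only a sketch: both e-mail addresses and the host name are placeholders, and it assumes the web server lets the CGI process reach the SMTP host at all.

```rebol
REBOL [Title: "Set the SMTP server inside a CGI script (sketch)"]

; First item: the sender address. Second item: the SMTP server.
; Both values are placeholders for your own settings.
set-net [webmaster@example.com smtp.example.com]

send someone@example.com "Test message sent from a REBOL CGI script"
```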

File locking

Be careful when saving to a file and reading it later, as many users may run your CGI programs simultaneously. You should use some form of inter-process synchronization to serialize access to your file. AFAIK from REBOL you can reliably synchronize by opening TCP sockets or renaming a file. I use these two functions:

obtain-semaphore: func [] [
    loop 100 [
        if not error? try [rename %lockdir/lock %lock.locked] [
            return
        ]
        wait (random 5) / 10
    ]
    print "Could not obtain semaphore"
    quit
]

release-semaphore: func [] [
    loop 100 [
        if not error? try [rename %lockdir/lock.locked %lock] [
            return
        ]
        wait (random 5) / 10
    ]
    print "Could not release semaphore"
    quit
]

The guarded file is not %lockdir/lock, but some other file. Read-only accesses to the guarded file don't need the lock, and they would fail if the guarded file itself were the one being renamed. You should make a directory %lockdir/ containing an empty file %lock.

Then you should use:

obtain-semaphore
something: read %my-guarded-file.txt
write %my-guarded-file.txt "something new"
release-semaphore

If you want to only read from your file, you can use just

something: read %my-guarded-file.txt
print something

If I remember correctly, you must only guard one situation, "read-modify-write", because two simultaneous reads do not interfere, and two simultaneous writes are a logic error of your program.

The obtain-semaphore function uses a loop and a random wait interval to have a better chance of succeeding if several processes run it simultaneously. The release-semaphore function can be simpler, because the loop is not necessary:

release-semaphore: func [] [
    if not error? try [rename %lockdir/lock.locked %lock] [
        return
    ]
    print "Could not release semaphore"
    quit
]

Michal Kracik

Killing a CGI

"I've just managed to get Rebol running on one of my websites ( using Solaris ). If I were to start up a script that doesn't end - say the rebocron.r, how can I kill it? I don't have telnet access."

One solution is the file marker. It's such a simple, highly effective, cross-anything solution that I can't seem to stop finding uses for it. Not a solution to be proud of--too simple I suppose. Maybe if it was called the "E-file marker" it would be more popular?

if exists? %killme.mkr [ delete %killme.mkr quit ]

Ryan Cole
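For a script that never ends, the check has to sit inside the main loop. A minimal sketch, assuming you can create killme.mkr in the script's directory some other way (for example by FTP upload); the one-minute pause and the placeholder for the real work are illustrative only.

```rebol
REBOL [Title: "Long-running script with a kill marker (sketch)"]

forever [
    ; Create a file named killme.mkr next to the script to stop it.
    if exists? %killme.mkr [
        delete %killme.mkr
        quit
    ]
    ; ... one unit of work goes here ...
    wait 0:01:00    ; pause a minute before checking again
]
```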

Multipart post data

"Does anyone have experience with accepting multi-part form data?"

I haven't tried it, but look at this:

http://www.rebol.org/cgi/POST.html

Graham Chiu

Not sure if you have done this or not, but if uploading a file you have to set enctype="multipart/form-data" in the form tag.

Allen Kamp

FTP

Deleting directories using FTP

In order to delete a directory (and anything under it) you will first have to find out the contents of that specific tree. Then you can delete each (from the bottom up). Unfortunately it's the nature of the FTP beast.

Deryk

FTP using a password with strange chars

"I need to connect to a FTP server but the password contains strange chars like #"@" #"\" #"#" I tried to url-encode the password, but it does not work."

You can avoid the problem by passing the port specification as a block instead of a URL:

read [
    scheme: 'ftp
    user: "user-name"
    pass: "password"
    host: "ftp.securesite.com"
    path: "private/"
    target: "file.dat"
]

Allen Kamp

FTP Modes

FTP can be used in the following ways:

- active mode, no proxy. (This is the default)
- passive mode, no proxy.
- active mode, SOCKS5.
- passive mode, SOCKS5.
- passive mode, SOCKS4.

Note that the SOCKS4 protocol does not support incoming connections, so the combination "active mode, SOCKS4" is not possible.

Up until REBOL/Core 2.2 "passive mode" would only be used if REBOL detected that "active mode" does not work with a user's setup (automatic fallback). This fallback still exists in Core 2.3, but experience shows that not all FTP servers support the automatic fallback (in particular FTP servers on Win-NT often do not support it because they do not detect "connection refused" errors), and sometimes SOCKS servers have problems with the fallback as well. As a workaround for these server bugs Core 2.3 also allows users to manually switch to passive mode by putting

system/schemes/ftp/passive: true

into user.r. That forces FTP to always use passive mode.

One other enhancement in the FTP implementation from Core 2.2 to 2.3 was that Core 2.3 supports "active mode, SOCKS5". Core 2.2 did not support this, i.e. it always fell back to passive mode when SOCKS5 was used for FTP, which caused problems for some users. Core 2.3 now supports all possible combinations of proxy servers and active/passive modes for FTP.

Holger Kruse

Accessing above home directory

"How can I ftp to a server and go to a directory that is not below my home directory?"

Did you try "../../path/path/path/goodfile.txt" ?

Doug Vos

Hey, that worked!

I did:

read ftp://user:password@111.111.111.111/../../var/local/datamover/test1/testfile1

and it worked.

Jamey Cribbs

Using FTP recursively

"I'd like to write a backup script that backs up the files on an FTP site. This because my cheap mass hosting has already dumped everyone's files. :-/

"Anyhow, there's no shortage of recursive directory routines lying around the place. Unfortunately they don't work with web sites. The reason for that being that Rebol appears to keep trying to open fresh FTP sessions every time you reference a URL - even if there's a current session logged into the same site. This seems like stupid behavior given it doesn't logout any of these threads either."

No, REBOL caches and reuses the FTP control connection across subsequent accesses to the same host and directory. A new data connection has to be created for each file being transferred. That is not a limitation of REBOL, but rather a requirement of the FTP protocol. The only situation when REBOL creates a new control connection is when you change directories. That is necessary because FTP provides no standard way to remotely navigate through the directory tree (in particular up the tree) without side effects.

Holger Kruse

"No, REBOL caches and reuses the FTP control connection across subsequent accesses to the same host and directory.

"But doesn't actually close any. Is there a way of forcing them closed?"

I'll double-check with Sterling, but AFAIK current experimental versions do close the control connection first if another connection to the same host has to be opened. At least that is the behavior I get here.

"The only situation when REBOL creates a new control connection is when you change directories. That is necessary because FTP provides no standard way to remotely navigate through the directory tree (in particular up the tree) without side effects.

"What, other than cd .. ? I don't get it."

"cd .." does not work if a component in the path is a softlink. E.g. if you have just read "/foo/" and next want to read "/bar/" then "cd .." does not always get you back to "/", because "/foo" might actually point to "/a/b/c/d/", and then "cd .." would only get you to "/a/b/c/".

"cd ~" works on some servers, but not all, "cd /" works on some servers, but not all :-). There seems to be no way to get back to the initial login directory (or to specify absolute path names relative to the login directory) that works on all FTP servers.

"Then how would you recommend a recursive directory scan be implemented in Rebol?"

Just do a sequence of reads.

Holger Kruse

Yes. FTP will close the old connection to a given host if a new one is made but in a different directory. That is also the same behavior I see here with the latest experimental (2.4.39 on Linux). I also watched my netstat and only one connection stayed active. The only situation where you could end up with more than one connection to a single host is if you log in as two different users.

If you really want to make sure that all connections are closed immediately then you can set the port cache size down to zero:

system/schemes/ftp/cache-size: 0

This will have the effect that the control connection is closed at the end of the request and no ports at all will ever be cached by REBOL FTP (not so efficient). If you set it to 1 then it should cache only one connection. If you are only connecting to a single host as a single user then you should see no difference in how it's all working. All sequential reads within a single directory will reuse the command port but access of a different directory will create a new command port and close the last one.

If you are seeing different results... like multiple command ports open to the same host, please send as much info as you can into feedback as a bug report so we can track down the problem.

Sterling
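The "sequence of reads" suggested in this thread could look roughly like the sketch below: one read per directory URL, descending whenever an entry is a directory. The function name list-tree, the host and the credentials are made up for illustration, and the sketch assumes that directory entries in an FTP listing come back with a trailing slash, as they do for local directories.

```rebol
REBOL [Title: "Recursive FTP listing (sketch)"]

; Hypothetical helper: walks a remote tree with one READ per directory.
; Assumes directory entries are returned with a trailing slash.
list-tree: func [
    "Print the URL of every file below an FTP directory URL."
    dir [url!] "Directory URL, ending in a slash"
][
    foreach name read dir [
        either #"/" = last name [
            list-tree join dir name     ; descend into the subdirectory
        ][
            print join dir name         ; plain file: report it (or read it)
        ]
    ]
]

; Placeholder host, user and password.
list-tree ftp://user:password@ftp.example.com/backup/
```

For a backup script, the print in the file branch would be replaced by a read of the remote file followed by a write to a local file.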

HTTP

Submit a form using GET (Automate a query function)

Usually you can do this by requesting a search URL from the site, like this:

print read join http://www.google.com/search ["?q=" ask "Search for: "]

If you answer "rebol" at the prompt, this returns the same results as following this direct link to Google:

http://www.google.com/search?q=rebol

Andrew Martin

Submit a form using POST

For forms that require POST for submission, use read with the custom refinement.

print read/custom http://babelfish.altavista.com/translate.dyn
    reduce ['POST {text=REBOL+Rules&lp=en_fr}]

Allen Kamp

Getting the redirected URL

Here is one way to get the redirected URL:

>> port: open http://www.bebits.com/bob/4515/3KBnewsreader.zip
connecting to: www.bebits.com
connecting to: www.beosjournal.com
>>
>> probe port/url
"http://www.beosjournal.com/rebol/3KBnewsreader.zip"
== "http://www.beosjournal.com/rebol/3KBnewsreader.zip"
>> close port
>>

Larry Palmiter

Http headers

"Does anybody know how to read the headers after reading a HTTP-request?"

You can find the header info in system/options/cgi. Note that besides providing default header info, system/options/cgi also contains a block called other-headers, where the remaining header info is stored.

Elan

Http headers2

You are looking for the headers of the HTTP response, right?

a: open http://www.yahoo.com
probe a/locals/headers
copy a ; to get the page
close a

Sterling

NNTP

Howdy, Ryan, and others interested in REBOL's nntp.r:

NNTP servers lie. They tell you a range of numbers of articles that MAY be there. They may well not be there, though. It's a funny thing.

The only way to really get the true number of articles in a newsgroup is to get the headers for each article (which over a 22kb modem ain't always a good idea).

NNTP.r, as it is, also provides the optional NNTP "XHDR" command which lets you quickly download just the subject lines (or any other given header field: from, to, keywords, etc..) of all the headers in the group. Having all the subject lines, you can then know for sure (unless some of those articles expire while you are reading) the count of articles in a newsgroup.

One of the things NNTP does when it connects to a news server is determine if it can do XHDR. Interactively you can ask an open news port what it can do by inserting [help] into the port. Here's how you can determine non-interactively if the server you are talking to has XHDR:

np: open news://news.somewhere
found? find np/handler/commands 'xhdr

Using XHDR you can do something like the following:

np: open news://news.somewhere
set [total start end] insert np [count from "alt.test"]
x-mids: rejoin ["Message-ID " start "-" end]
message-ids: insert np [xhdr x-mids from "alt.test" please]

;- please is optional :)

The XHDR command gives you back a big string in a block. Yes, that is a little odd (XHDR was added at the last minute just to help aspiring news bot writers, if you want to know!). The string you will find in the block has the number of each article followed by the message id. It's trivial to parse the string and it'll allow you to ask for individual articles by their message-ids in order. There are examples of getting articles by message id in the NNTP.r howto. Using XHDR, you'll have an efficient way of obtaining true newsgroup ordering with no gaps (for news servers that support the feature ... If they don't, well, you probably have to fall back on getting all the message headers in a group if you want to ensure total ordering... that's what Forte Free Agent does!!).

:-) Hope that info may be useful on your projects.

Jeff

UDP

How to use UDP

In view you can see that udp listen ports are working:

;-- UDP listen and send
x: open udp://:9090
close insert open udp://localhost:9090 "hello udp!"
y: wait reduce [x 1]
copy y
== "hello udp!"

Jeff

Miscellaneous

Escaping URLs

REBOL should not escape URLs automatically because in some situations escaped and unescaped characters have different meanings in URLs, and REBOL has no way of knowing what the user's intention in using the character was (i.e. whether the character is supposed to have its special meaning or whether it is just a component of a path name and needs to be escaped).

For instance semicolon and ampersand (";", "&") both have special functions in URLs, but only if they are not escaped. If they are escaped then they can be used in path names. The ";" in its escaped form may e.g. be needed in path names when accessing files on a VMS system.

As a rule, escaping is not something you do when sending a URL to a web server, but something you do when constructing a URL, because that is the time when you know whether characters have special meanings or whether they are part of a path. For correct results you need to URL-encode each component of a URL separately, and then put the components together, separated by the various special characters (":;&/" etc.) in their unescaped form.

Holger Kruse
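A small sketch of the "encode each component, then join" rule described above. The helper encode-component and the word safe-chars are hypothetical names, not REBOL built-ins, and the set of characters left unescaped is a deliberately conservative assumption.

```rebol
REBOL [Title: "Encode URL components separately (sketch)"]

; Characters that are left as they are; everything else is percent-encoded.
safe-chars: charset [#"a" - #"z" #"A" - #"Z" #"0" - #"9" "-_."]

; Hypothetical helper, not part of REBOL.
encode-component: func [
    "Percent-encode one URL component."
    text [string!]
    /local out
][
    out: copy ""
    foreach ch text [
        either find safe-chars ch [
            append out ch
        ][
            ; ENBASE/BASE ... 16 yields the hex digits of the character
            append out join "%" enbase/base to-string ch 16
        ]
    ]
    out
]

; Encode each piece first, then join with the separators left unescaped.
print rejoin [
    "http://www.example.com/"
    encode-component "my docs" "/"
    encode-component "a;b.txt"
    "?q=" encode-component "rock & roll"
]
```

The "/" and "?q=" separators are added after encoding, so they keep their special meaning, while the ";" and "&" inside the components are escaped.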

Filtering emails using Rebol

[Note: this is a compilation of multiple emails - I hope I got it right - Brett.]

"I have this crazy idea that I can use REBOL to filter my mail to different files as I receive it."

Howdy, Tom:

Say you run a REBOL script from a .forward:

| /path/to/filter.r

then executable script filter.r might look like this:

#!/path/to/rebol -qws
REBOL [
    Title: "Mail filter"
]
message: copy ""
stdin: copy system/ports/input
foreach line stdin [
    append message append line newline
]
message: import-email message

foreach [item file][
    "REBOL" %rebol-box
    "Debian" %deb-box
    "me" %me-box
][
    ;- the args to find were backwards
    if find message/content item [
        write/append file stdin 
        QUIT ;- instead of exit (not in function)
    ]
] 
write/append %default-box stdin
quit

Of course you can filter on message/subject, message/to, message/from, etc...

The principle is the same using pop, but you use a pop port to retrieve the mail and you need to export messages you take from pop back to mailbox format (easy).

Jeff

Serial Ports

Opening

You shouldn't need a special serial-port refinement to open a serial port. You should only need to open a serial:// URL.

The default values for opening serial ports are:

device: port1
speed:  9600
data bits:  8
parity: none
stop bits:  1

URLs are encoded with the different fields separated by slashes. For example,

serial://port1/9600/8/none/1

The order of fields is not significant, as the type of field can be determined by the content.

Above, 'port1' is the name of your serial port. On Windows, this may be something like com1. On Linux, it may be something like ttyC0.

The ports available on your particular machine should be noted in the following path:

system/ports/serial

Bo

Serial ports and Linux

I found out some additional info on the serial ports.

On Linux, for example, serial ports can be named in any of the following formats:

ttyC0
ttyS0
cua0

Also, the 0 can be a higher number if you have more than one serial port.

REBOL cannot determine which of these is correct due to technical reasons which are beyond the scope of this email, so you will have to try the following:

1)  You must be running REBOL as ROOT (or somehow change the permissions
    on the serial ports beforehand) as this is a security feature of
    most multi-user systems.

2)  Try setting system/ports/serial to the different naming conventions
    of your system.  For example:

    system/ports/serial: [ttyC0 ttyC1]
    -or-
    system/ports/serial: [ttyS0 ttyS1]
    -or-
    system/ports/serial: [cua0 cua1]

    You can include as many serial ports as there are on your system in
    this block.

3)  Open the serial port like this:

    serial-port: open serial://port1/9600/8/none/1

    Use the word PORT1 above (don't replace with the serial port name)
    as it refers directly to a location in the system/ports/serial
    block.  You can also use PORT2 or PORT3, etc. (if applicable to your
    system).

Bo

Changing serial port params

"... a mechanism to change the option after an open."

Just change the parameters in the port structure and call 'update on the port.

Holger Kruse
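A hedged sketch of that advice: the speed field of the port object is assumed to be the one holding the baud rate, and 19200 is just an example value. Check the fields of your own port object (for example with probe) before relying on the name.

```rebol
REBOL [Title: "Change serial settings on an open port (sketch)"]

ser: open serial://port1/9600/8/none/1

; Assumption: SPEED is the port field that holds the baud rate.
ser/speed: 19200
update ser          ; apply the changed parameters to the open port

close ser
```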

Ports

Checking for a closed connection

open/lines/direct/no-wait "whatever port scheme"

"The problem is how do I determine when a port that was opened is now closed on the other end. The docs say to copy first port and filter for the "none" response in order to close the port."

Read data from the port with

data: copy port

If data is none then the other end has been closed. Otherwise you get the available data. For /lines ports, data is always returned as a block of lines. If no data is available at the port (but the other end has not been closed yet), then an empty block is returned.

Holger Kruse
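Put together, a read loop along these lines would be one way to act on the none result. It is a sketch only: the host and port number are placeholders, and it assumes that wait also returns when the other end closes the connection.

```rebol
REBOL [Title: "Detect a closed connection (sketch)"]

; Host and port number are placeholders.
port: open/lines/direct/no-wait tcp://localhost:8000

forever [
    wait port                       ; sleep until there is activity on the port
    data: copy port
    if none? data [                 ; none means the other end has closed
        close port
        break
    ]
    foreach line data [print line]  ; an empty block simply prints nothing
]
```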

Behaviour of ports

"If I use open/lines/direct/no-wait I get a block when doing a:

copy port

"I get a string when I do:

copy first port

"My question is how do I use copy port and convert it to get the same output that copy first provides in a string format without generating a past-end error."

Here is the behavior for ports in /lines/direct/no-wait mode:

copy returns a block of all lines waiting to be read, and removes them from the port buffer. Each element of the block is one line. If no lines are waiting then an empty block is returned. If the other end has closed the connection then none is returned.

"first port" is a shortcut for "pick port 1". It returns the next line and removes it from the buffer. The result is none either if no data is pending (this will probably change in the next version [message posted 15-Jan-2001]) or if the end of the stream has been reached. You don't need to "copy" the result.

If you want to be able to distinguish between no data and eof, but still want only a single line returned to you at a time, then use "copy/part port 1". If the returned value is a block then it is either empty or has exactly one item in it, which you can access using "first".

Holger Kruse
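The three cases can be wrapped in a small helper. This is a sketch: the function name next-line and the word no-data used as a marker are inventions for the example, and the port is assumed to be open in /lines/direct/no-wait mode.

```rebol
REBOL [Title: "One line at a time, telling no data from eof (sketch)"]

; Hypothetical helper: returns a string (one line), the word NO-DATA
; when nothing is pending, or NONE when the stream has ended.
next-line: func [
    port [port!] "Port opened with /lines/direct/no-wait"
    /local result
][
    result: copy/part port 1
    if none? result [return none]               ; other end closed
    either empty? result ['no-data] [first result]
]
```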

UDP Example

"I have been using a TCP logging server for some stuff here at work for a couple of months now. Works fine with the latest Core experimental. It just waits for line data from a client and displays it when received (code below).

"Now I want to change to UDP instead of TCP. Unfortunately it doesn't appear to be just a simple change of protocol since I'm getting error messages when trying to use first and pick on the port. And more unfortunately, there is little UDP documentation from RT (at least that I can find).

"Here is the TCP version - what do I need to do for a UDP version?"

REBOL []
sp: open/lines/no-wait/with tcp://:5599 form newline
forever [
    cp: first wait sp
    d: "none"
    forever [
        w: wait [cp .1]
        if all [port? w none? d] [
            ;connection closed - wait for a new one and continue
            cp: first wait sp
            d: "none"
        ]

        if port? w [
            print d: pick cp 1
        ]
    ]
]

Change

sp: open/lines/no-wait/with tcp://:5599 form newline

to

sp: open/direct/no-wait udp://:5599

and do all port i/o directly on the main port (sp). UDP does not create subports for each incoming packet, i.e. you do not need the "cp: first wait sp" any more.

Holger Kruse
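With that change applied, the whole logging server shrinks to something like the sketch below. It assumes each datagram carries text that can be printed as it arrives.

```rebol
REBOL [Title: "UDP logging server (sketch)"]

sp: open/direct/no-wait udp://:5599

forever [
    wait sp          ; block until a datagram arrives
    print copy sp    ; read it directly from the listening port
]
```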

Wait and nowait

"When creating server ports such as:

server: open/lines/direct tcp://:113
client: first server

"How do you keep the script from blocking other port activities within the same script? In other words, this just hangs at client: first server and blocks other port activity until 'server receives data. The no-wait option seems to do the same thing. Any ideas?"

You can insert

p: wait [server otherports...]

to wait for one of several ports to have data. The value returned by 'wait tells you which of the ports has data.

no-wait only has an effect on 'copy. Without no-wait 'copy only returns on end-of-file. With no-wait 'copy returns immediately with whatever data is available.

Holger Kruse
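A sketch of a server loop built around a single wait on all ports. The word ports and the overall structure are illustrative, and treating a none line as "client closed" is an assumption rather than documented behaviour.

```rebol
REBOL [Title: "Serve several ports without blocking (sketch)"]

server: open/lines/direct tcp://:113
ports: reduce [server]          ; every port we want to be woken up for

forever [
    ready: wait ports           ; returns the port that has activity
    either same? ready server [
        append ports first server       ; accept the new client connection
    ][
        either line: pick ready 1 [
            print line
        ][
            ; Assumption: none here means the client has closed.
            close ready
            remove find ports ready
        ]
    ]
]
```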

Miscellaneous

Catching errors

"How does one catch an error when writing a file to an FTP server? Following the docs, this should be: disarm err: try [write ftp://blahblah read blahblah]

"The problem is that write does not return a value on success. So the script itself results in an error, saying that 'err needs a value."

The simple way is to use set/any

if error? set/any 'err try [ ...] [ print mold disarm err ]

Elan
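Filled in for the FTP case from the question, the pattern might look like this sketch. The URL, credentials and file name are placeholders.

```rebol
REBOL [Title: "Report an FTP write error (sketch)"]

; SET/ANY accepts the "no value" result that WRITE gives on success,
; so ERR only holds an error when the TRY block failed.
if error? set/any 'err try [
    write ftp://user:password@ftp.example.com/report.txt read %report.txt
][
    err: disarm err
    print ["Upload failed:" err/type err/id]
]
```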

Determining IP Addresses

"I was wondering if there is a way to use REBOL to determine the ip addresses currently connected to my local machine if it's acting as a proxy. I believe the answer is no, but I was interested in how I might determine that, or in another language that could do that and interface with REBOL by exchanging that data via a tcp port."

Do you mean the IP address of the remote end of a connected TCP port? You can find that in port/remote-ip. The IP address of the local end of the connection is in port/local-ip. Additionally there are port/remote-port and port/local-port for the port numbers.

Or are you trying to find the IP addresses assigned to the interfaces of your machine? With current experimentals try "get-modes udp:// 'interfaces". This returns a block of objects that contain network configuration parameters, including IP addresses. Works with most operating systems (not with BeOS and Elate though).

Holger Kruse

mySQL Database Driver

DocKimbel has written a driver for mySQL to use with Rebol.

More details at his rebsite:

http://rebol.dhs.org/index.r

Brett Handley

Port Timeouts

"How do I set the timeouts for port openings? For example: open ftp://ftp.microsoft.com is fast, but if I spelled it wrong, such as open ftp://ftp.microsofy.com, it would take some time before it times out."

At the moment you can only set read timeouts (system/schemes/default/timeout). There are a lot of other situations in which networking causes delays. In the example above it is the DNS lookup. Depending on the operating system it is not always possible to adjust the delay, i.e. sometimes it is handled within the OS.

For DNS lookups the only way to change timeouts is by using async DNS lookups (dns:///async), but that only works on some operating systems.

Holger Kruse
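For the read timeout mentioned above, a sketch of changing it before an access. The ten-second value and the host are arbitrary examples, and as the answer explains, this does not shorten delays caused by the DNS lookup.

```rebol
REBOL [Title: "Shorter read timeout (sketch)"]

; 10 seconds is an arbitrary example value.
system/schemes/default/timeout: 0:00:10

if error? try [listing: read ftp://ftp.example.com/pub/] [
    print "Read failed or timed out"
]
```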