One of the few remaining proprietary Windows applications that I use is FrontPage to maintain the Suneido web site. Originally, someone else set up the web site with FrontPage and I just continued to use it. There’s nothing particularly wrong with FrontPage, but I’d rather not be dependent on any proprietary tools (and I’d prefer to use open source ones).
FrontPage has lots of features, but the main one I’m dependent on is its “include” facility. I really don’t want to live without this: duplicating the common header, menu, and footer on every page seems like a maintenance nightmare. One option would be to use “server side includes” (SSI). One problem with this is that not all web servers support SSI, and some that do have it disabled for various reasons. I could probably get our ISP to enable SSI, but there’s another problem: when you view the pages locally, you don’t see the includes. That makes it hard to see what the pages will look like without either publishing them or setting up a local web server. FrontPage gets around this by physically duplicating the common content, but marking it with special HTML comments so it can update these sections when the included file changes. It looks like this:
<!--webbot bot="Include" U-Include="dload_inc.htm" TAG="BODY" startspan --> ... <!--webbot bot="Include" i-checksum="7502" endspan -->
I like this approach since it means you can view the files locally and see what the end result will look like, but you can still update the common parts in one place.
If I could reproduce this part of FrontPage, I figure I can live without the rest of its features. It doesn’t seem too hard to write a simple tool to go through a bunch of HTML files and update any included sections. This isn’t all the FrontPage include facility does, but it would be enough for what I need.
Suneido is not necessarily the best option for writing this kind of standalone tool. It has a large exe relative to what’s required (although 1.2 MB isn’t really that big these days). A C++ program could be fast and small. Or you could use Perl, Python, Ruby, Lua, or one of the other “scripting” languages that are designed for this type of application. But, not surprisingly, I’m most familiar with Suneido, and while perhaps not “ideal”, it’s certainly up to the task. And it never hurts to “eat your own dogfood” (i.e. use your own tools).
I chose the following markup:
<!--htminc filename --> ... <!--endinc filename -->
Each tag is an HTML comment (<!-- ... -->), so the markup will be ignored by other tools. Other than that, the specific markup isn’t that critical. I decided the tool should process all the files in a directory.
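To make the markup concrete, here is a rough sketch in Python (one of the scripting languages mentioned above, not the Suneido code this article develops) that lists the include regions in a page. The regular expression and the find_includes name are invented for this illustration; it assumes the file name contains no spaces.

```python
import re

# Matches <!--htminc NAME --> ... <!--endinc NAME --> where both tags carry the same NAME.
INCLUDE_RE = re.compile(
    r"<!--htminc\s+(\S+)\s*-->(.*?)<!--endinc\s+\1\s*-->",
    re.DOTALL,
)

def find_includes(html):
    """Return (filename, current_body) pairs, one per include region."""
    return [(m.group(1), m.group(2)) for m in INCLUDE_RE.finditer(html)]

page = "<p>top</p><!--htminc menu.htm -->old menu<!--endinc menu.htm --><p>end</p>"
found = find_includes(page)
```

Repeating the file name in the closing tag is what lets each region be matched unambiguously.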
I started by writing a test. Here’s the shell:
HtmIncludeTest
Test
{
    Setup()
    {
        CreateDirectory('htminc', NULL)
    }
    Teardown()
    {
        for file in Dir('htminc/*.*')
            DeleteFile('htminc/' $ file)
        RemoveDirectory('htminc')
        super.Teardown()
    }
}
Within Suneido (and other languages) I like to use forward slashes in file pathnames to avoid any problems like “bin\tools” turning into “binools”.
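The same pitfall exists in most languages with C-style string escapes. As a small illustration in Python (used here only because it is easy to run; this is not Suneido code), the backslash version silently contains a tab character:

```python
# "\t" inside an ordinary string literal is a tab escape, not a backslash followed by "t".
backslash_path = "bin\tools"
forward_path = "bin/tools"

has_tab = "\t" in backslash_path      # True: the "\t" became a single tab character
backslash_len = len(backslash_path)   # 8 characters, one fewer than was typed
forward_len = len(forward_path)       # 9 characters, exactly as typed
```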
Hitting F9 to run it results in a reassuring “All Tests Succeeded”. Next, I start a test method for a plain file:
Test_plain()
{
    file = 'htminc/plain'
    plain = ''
    PutFile(file, plain)
    AssertEq(GetFile(file), plain)
}
Run still succeeds. Finally we can get to the point of all this:
Test_plain()
{
    file = 'htminc/plain.htm'
    content = ''
    PutFile(file, content)
    AssertEq(GetFile(file), content)
    HtmInclude('htminc')
    AssertEq(GetFile(file), content)
}
This fails because we don’t have an HtmInclude yet. Here’s the start of it:
HtmInclude
class
{
    CallClass(dir)
    {
        for file in Dir(dir $ '/*.htm')
            .process(dir, dir $ '/' $ file)
    }
    process(dir, file)
    {
    }
}
I made it a class rather than just a function so that I can split it up into separate methods. We use Dir to loop through each of the files and call process on each one. That’s more than enough to make our test pass, since it doesn’t have to do anything yet. Next, we add a test method that requires an include:
Test_single()
{
    file = 'htminc/single.htm'
    content = '<!--htminc inc.htm -->stuff<!--endinc inc.htm -->'
    PutFile(file, content)
    PutFile('htminc/inc.htm', 'things')
    HtmInclude('htminc')
    AssertEq(GetFile(file), content.Replace('stuff', 'things'))
}
Predictably, this fails. Now we have to implement the core of HtmInclude:
process(dir, file)
{
    dst = ""
    s = src = GetFile(file)
    forever
    {
        i = s.Find('<!--htminc ') + 11
        dst $= s.Substr(0, i)
        s = s.Substr(i)
        if s is ''
            break
        s = .process1(dir, s)
    }
    if dst isnt src
        PutFile(file, dst)
}
First, we get the contents of the file. Then we move everything up to and including the next opening '<!--htminc ' (11 characters long, hence the + 11) to dst, our new content. We include the opening tag so that it won't be found on the next loop. Then we call process1 to handle that include and return the rest of the text. Finally, if the content has changed, we write it out.
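For readers more at home in one of the scripting languages mentioned earlier, the outer "read, transform, write only if changed" step can be sketched in Python. This is only an illustration, not the Suneido code: process_directory and its transform parameter are names invented for this sketch, and it assumes the pages are UTF-8 text.

```python
import tempfile
from pathlib import Path

def process_directory(directory, transform):
    """Apply transform to every .htm file; rewrite only the files that change."""
    changed = []
    for path in sorted(Path(directory).glob("*.htm")):
        src = path.read_text(encoding="utf-8")
        dst = transform(src)
        if dst != src:  # leave untouched files alone, as process does
            path.write_text(dst, encoding="utf-8")
            changed.append(path.name)
    return changed

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.htm").write_text("old text", encoding="utf-8")
    Path(d, "b.htm").write_text("nothing to do", encoding="utf-8")
    changed = process_directory(d, lambda s: s.replace("old", "new"))
    a_after = Path(d, "a.htm").read_text(encoding="utf-8")
```

Skipping the write when nothing changed keeps file timestamps stable, which matters if only modified pages get published.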
process1(dir, s) // pre: s starts AFTER '<!--htminc '
{
    file = s.BeforeFirst('-->').Trim()
    if file is ''
        throw "invalid include tag"
    content = GetFile(dir $ '/' $ file)
    if content is false
        throw "can't get " $ dir $ '/' $ file
    endtag = '<!--endinc ' $ file $ ' -->'
    return file $ ' -->' $ content $ endtag $ s.AfterFirst(endtag)
}
(In some languages, building a big string by concatenating it piece by piece is very inefficient. In Suneido it’s fine. When you concatenate strings, Suneido actually just makes a list of the pieces, not bothering to create a new string until it’s actually needed.)
process1 assumes it is called with the content following a '<!--htminc ' so we note this in a precondition comment. We extract the file name and retrieve its contents. We also use the file name to build the ending tag. The error checking could probably be improved, but it'll do for now. Finally, we return the updated content. Sure enough, this is enough to get our test to pass.
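The scan-and-replace logic of process and process1 can also be sketched in Python for comparison. This is a loose translation under stated assumptions, not the Suneido code: expand_includes is an invented name, a missing end tag raises an error (a choice made for this sketch), and the inserted content is not rescanned for further includes.

```python
import os
import tempfile

OPEN = "<!--htminc "

def expand_includes(directory, text):
    """Return text with every include region refreshed from its file."""
    out = []
    rest = text
    while True:
        i = rest.find(OPEN)
        if i == -1:               # no more includes: keep the remainder as-is
            out.append(rest)
            break
        i += len(OPEN)
        out.append(rest[:i])      # everything up to and including the opening tag start
        rest = rest[i:]
        name, sep, after = rest.partition("-->")
        name = name.strip()
        if not sep or not name:
            raise ValueError("invalid include tag")
        end_tag = "<!--endinc " + name + " -->"
        _, found, tail = after.partition(end_tag)
        if not found:
            raise ValueError("missing " + end_tag)
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            content = f.read()
        out.append(name + " -->" + content + end_tag)
        rest = tail
    return "".join(out)

# Mirrors Test_single: 'stuff' should be replaced by the contents of inc.htm.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "inc.htm"), "w", encoding="utf-8") as f:
        f.write("things")
    page = "<!--htminc inc.htm -->stuff<!--endinc inc.htm -->"
    result = expand_includes(d, page)
```

Collecting the pieces in a list and joining once at the end is the usual Python way to avoid repeated string copying.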
If we want to edit and view the include files on their own, we'll want to wrap them in the standard:
<html><head></head><body>...</body></html>
But we don't want to include this wrapping. Not hard to handle, but we'll be good and write a test method first:
Test_wrapped()
{
    file = 'htminc/single.htm'
    content = '<html><!--htminc inc.htm -->stuff<!--endinc inc.htm -->'
    PutFile(file, content)
    PutFile('htminc/inc.htm', '<html><head></head><body>things</body></html>')
    HtmInclude('htminc')
    AssertEq(GetFile(file), content.Replace('stuff', 'things'))
}
With that failing, now we can fix process1:
content = GetFile(dir $ '/' $ file)
if content is false
    throw "can't get " $ dir $ '/' $ file
content = content.AfterFirst('<body>').BeforeLast('</body>')
The AfterFirst and BeforeLast string methods make this easy. (We're also using BeforeFirst, and there's an AfterLast if you need it.)
Oops! Our new test succeeds, but it breaks previous tests. (That's a big reason to have these tests - to act as a "safety net" to ensure we don't break previously working functionality. That way we can keep ratcheting forward instead of backsliding.) We could just require include files to have the wrapping, but the fix is simple, so we might as well allow it:
content = GetFile(dir $ '/' $ file)
if content is false
    throw "can't get " $ dir $ '/' $ file
if content.Has?('<body>')
    content = content.AfterFirst('<body>').BeforeLast('</body>')
We could have used content =~ '<body>' but then we'd have to worry about special regular expression characters (and it would be slower).
There's a potential problem with this code: if the <head> had the string <body> in a comment or in a script, it wouldn't work properly. That doesn't seem too likely, so for now I'll just add a comment to the code and leave it.
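Both the guard and the weakness are easy to see in a small Python sketch. This is an illustration rather than the Suneido code: strip_wrapper is an invented name, and returning everything after <body> when there is no closing </body> is a choice made for this sketch.

```python
def strip_wrapper(content):
    """Return what lies between the first <body> and the last </body>, if wrapped."""
    if "<body>" not in content:
        return content  # plain fragment: use it unchanged
    _, _, after_open = content.partition("<body>")
    before_close, sep, _ = after_open.rpartition("</body>")
    return before_close if sep else after_open

wrapped = strip_wrapper("<html><head></head><body>things</body></html>")
plain = strip_wrapper("things")

# The weakness: a "<body>" inside a comment in the head is found first.
fooled = strip_wrapper("<html><head><!-- <body> --></head><body>x</body></html>")
```

In the last case the result still contains the tail of the head section, which is exactly the failure described above.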
All the tests succeed again, and we're pretty much done. But to be safe, let's add a test for multiple includes in one file to make sure the loop in process is working. While we're at it, we'll also test the case where the content hasn't been included yet.
Test_multiple()
{
    file = 'htminc/single.htm'
    content = ' <!--htminc inc.htm --><!--endinc inc.htm --> more <!--htminc inc.htm --><!--endinc inc.htm --> '
    PutFile(file, content)
    PutFile('htminc/inc.htm', 'things')
    HtmInclude('htminc')
    AssertEq(GetFile(file), content.Replace('-><!', '->things<!'))
}
That didn't uncover any problems, so we're done (for now, anyway).
To make it easy to run, I can create a shortcut or batch file with the following command:
C:\Suneido\suneido.exe -n HtmInclude('/sunweb');Exit()
(Don't forget the Exit(), otherwise Suneido will stay in memory and if you try to start it again you'll get "Can't open suneido.db" and you'll have to kill it.)
There is another potential weakness in this code - it doesn't handle recursive includes (include files that include other include files). This isn't a problem for my use, but it's something to be aware of.
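If nested includes were ever needed, one possible approach is to expand an include file's own includes before inserting it, with a guard against cycles. The following Python sketch shows the idea only; it is not part of the Suneido tool. The expand function, the files dictionary (standing in for the directory), and the active tuple are all invented for this illustration.

```python
import re

# Same-name open and close tags; the stale body between them is discarded.
INCLUDE_RE = re.compile(
    r"<!--htminc\s+(\S+)\s*-->.*?<!--endinc\s+\1\s*-->",
    re.DOTALL,
)

def expand(text, files, active=()):
    """Refresh include regions in text, expanding nested includes first.

    files maps include names to their contents.
    active holds the names currently being expanded, to detect cycles.
    """
    def replace(match):
        name = match.group(1)
        if name in active:
            raise ValueError("circular include: " + name)
        body = expand(files[name], files, active + (name,))
        return "<!--htminc " + name + " -->" + body + "<!--endinc " + name + " -->"
    return INCLUDE_RE.sub(replace, text)

files = {
    "menu.htm": "[menu <!--htminc logo.htm -->old<!--endinc logo.htm -->]",
    "logo.htm": "LOGO",
}
page = "<!--htminc menu.htm -->stale<!--endinc menu.htm -->"
result = expand(page, files)
```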
Now I just have to convert the web site to the new include format! Hmmm... maybe I can write something to do that for me 🙂
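As a starting point, such a converter could be a single regular expression substitution. The Python sketch below is hypothetical: convert is an invented name, and it assumes every FrontPage include looks like the one sample shown earlier (bot="Include", a U-Include attribute, and startspan/endspan markers). Real pages may contain variations this does not handle.

```python
import re

WEBBOT_RE = re.compile(
    r'<!--webbot bot="Include" U-Include="([^"]+)"[^>]*?startspan -->'
    r'(.*?)'
    r'<!--webbot bot="Include"[^>]*?endspan -->',
    re.DOTALL,
)

def convert(html):
    """Rewrite FrontPage include markers as htminc/endinc markers."""
    def replace(m):
        name, body = m.group(1), m.group(2)
        return "<!--htminc " + name + " -->" + body + "<!--endinc " + name + " -->"
    return WEBBOT_RE.sub(replace, html)

old = ('<!--webbot bot="Include" U-Include="dload_inc.htm" TAG="BODY" startspan -->'
       'menu'
       '<!--webbot bot="Include" i-checksum="7502" endspan -->')
new = convert(old)
```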
You can download the source from htminc.zip. It's an Export of the two records, so use LibraryView > Import to load it in.