BIG PLANS,
SMALL FILES

by Mario Huys


Among diplomats there are two types: The trim, well-shaven professional with matching socks, a small attach�-case in hand; and the big girth regular with loose tie and old school bag bulking out with non-descript papers. One imagines the first to be young, the other old. What this implies is that as one grows older, the first type naturally evolves into the second. Unless precautions are taken. The key difference between the two is arguably the amount of paperwork taken along. Keep that under control, and the rest follows.

When it comes to Diplomacy and Play by E-mail (PBEM), the paperwork becomes digital. Files are still called files, but the main contents are orders and maps. In the world of PBEM the main file types employed on the judges (at least those that have been around for a long time, like the Njudge and DPjudge) are PostScript and PDF. There's a close relationship between the two, starting with the fact that they were both developed by the same company. Whereas PostScript primarily knew success in the printing industry, PDF was developed as a replacement for PostScript to take better advantage of the new multi-media options emerging at the end of the last century. Hence its focus on online content, and the heavy investment in its flagship, freely available viewer, Acrobat Reader, leaving PostScript viewers to third-party companies with a product such as GhostScript and GhostView. Why bother with viewing as long as it prints?

That makes PDF the younger, cleaner, more energetic participant. Yet here's an odd fact: The PDF-files on the judges are usually much larger than their PS counterparts. And not by a small factor either. File sizes ten times bigger are not unheard of, especially for the bigger maps or longer games. Bigger files imply longer download times, and if you think that's no longer a concern nowadays, the advent of smartphones should tell you differently.

This is all the more surprising as PDF files are internally compressed, PS files not. The same compression algorithms that are used to zip your Excel or Word documents are automatically applied to all pages, or page streams as they are called, inside a PDF file. That's why zipping PDF files usually doesn't gain you much, so that PDF files are commonly attached to mails and uploaded to web servers as is. Not having to bother unzipping them increases their appeal.

So where does it go wrong?

These page streams, when uncompressed, are very much alike the contents you will find in a typical PostScript file. In fact strip out the tables and other structural markups from a PDF file, put in a suitable preamble, and the PDF file becomes a regular PostScript file. The opposite, turning a PostScript file into PDF, is generally a more involved process.

The reason for this is that PostScript is actually a programming language with typical programming features like procedures, variables, arrays and dicts, and a set of primitive functions that make it suited for defining graphics. PDF dumped all this programming stuff and only kept the primitive functions and added tables to keep track of all the parts of a document, its pages, images, embedded links, etc. This made it a lot more efficient to render a page or extract any secondary information. Want to know the number of pages? Just count the entries in the page table. Want to extract all the images? Or render a random page? All that is readily accessible without the need to interpret the whole file and keep it all in internal memory. Flexibility traded for speed.

The advantage of a programing language is that anyone with a bit of skill (and a manual, the PostScript Language Reference Manual, aka the Red Book) can create a valid PostScript file. That was what made it attractive to the first judge developer, and many followed (for example, see this article on How to Make a Floc.net Map). It's easy to handcraft a PS file with just a text editor, it's far more difficult to write a PDF file from scratch. Luckily there are converters.

Another appeal of PostScript and PDF is that they are vector graphics, as opposed to bitmaps (gif, jpeg, png, etc.). Zoom in as far as you will, lines and curves remain as crisp as ever, while with bitmaps pixels become blobs that quickly blur the details.

If you want to draw an army, you first instruct it to draw a circle, then a few lines for the barrel and the stand, you color the inner part in one color and the outline in a different color, and finally you instruct it to draw a letter, the power initial, inside the wheel. These instructions are the primitive functions, or operators, I spoke of before. They are common to both PostScript and PDF.

Suppose now that you want to draw a second army. In PDF you need to repeat the whole operation one by one. In PostScript however you can encapsulate all those instructions in a procedure, let's call it DrawArmy, give it some parameters so that the position, the color and the power initial can be varied, and call this repeatedly for every army that needs to be drawn. That's a lot less code to write than in the PDF case, and so your file becomes smaller.

An even bigger crunch is the map itself, which consists of a lot of small lines and needs to be drawn on every page. The longer the game, the more pages there are. And on every page, in the case of PDF, all these small lines are repeated. And that while the PostScript file has it all wrapped in a single procedure, DrawMap, which it calls at the start of every page. Define once, use many.

Now that we know the problem, what can we do about it? Clearly it would be swell if there was an equivalent mechanism in PDF that allows to define a complex figure once and render it repeatedly. And there is, in the form of a Form. Not the kind of form that you associate with questionnaires. No, a Form is like a page, with its own stream of instructions, that can be inserted inside pages as many times as you want, or even inside other Forms. Better still, the same concept of a Form also exists in PostScript where it was originally created for (along with Patterns, but that's a different story).

The solution seems thus pretty much straightforward. You turn your PostScript procedure to draw an army into a Form object, then convert that to PDF in the hope that your smart PS to PDF converter converts it into an equivalent PDF Form object. And indeed, if you use Acrobat Distiller, it does what you expect. However Acrobat Distiller (developed by the same company that created PS and PDF) is a commercial product. And the Diplomacy hobby is mostly run on no budget, with free tools. So Distiller is out, GhostScript is in. And that's when hopes get dashed. GhostScript's PS to PDF converter does not preserve Forms as Forms. It expands them into the page stream, leaving you no better of than when everything was still a procedure.

That's how things stood around the turn of the century. I had experimented with Forms while working on the 1900 map for the DPjudge, proven that Distiller produced smaller PDF files, but that GhostScript refused to do the right thing. This is however such an obvious shortcoming that it was impossible that the developers of GhostScript were not aware of it. Furthermore GhostScript is open source, so with time on my hand I could envision myself implementing a patch.

As is customary when grand visions appear, I put this off to work on other, more attainable challenges. Other maps were waiting to be readied for the DPjudge, new rules needed to be implemented and tested, games had to be mastered and played, etc. In due course I became the main DPjudge developer and judgekeeper. Then one day, about a year ago, I became curious again about the topic at hand. Surely GhostScript would have been improved in the meantime? The first thing to check was to check the bug tracker, Bugzilla. I found several related issues, but the main one was Bug 687561: "Smaller PDFs when using execform". This execform is the name of the operator to render a Form. The issue had been reported in 2004, but 9 years later still had not been resolved. There was a partial patch that never made it into any release. More interestingly it also had a comment on a different way to create Forms inside PDF files, one that was entirely supported.

This involved the use of the pdfmark operator, an operator that, as it name implies, lets one insert PDF-specific code inside a PostScript stream. Being PDF-specific any other PostScript interpreter simply skips this code. One solution was thus to create a PostScript file specifically targeted for GhostScript's PS to PDF converter, one that would not use Forms, but its pdfmark equivalent. I decided however to go for a more general solution by writing a prolog that would overrule the execform operator, telling it to operate as normal if the pdfmark operator didn't exist or was a dummy, but to switch to pdfmark in case of a conversion to PDF. That way the same file could be rendered correctly by any type of PostScript interpreter. And if ever execform was going to be properly supported by GhostScript it would be sufficient to simply throw out this prolog.

Once decided upon a course of action I went to work. I wanted to make it as general as possible, so that it could be used for any PostScript file using Forms, not just those developed for the judges. In that way anyone who used my patch could profit from it. That meant testing them against many other test files, like those found in the Bugzilla tracker. Many obstacles needed to be surmounted, but finally I was ready to announce my solution to the world.

I had created a small program, called DPghost, that would inject my prolog into a PostScript file, and put it on GoogleCode from where anyone interested could download it. The best way to make sure that people could find it was to add a comment to Bug 687561. I visited the page only to discover that… the issue had been closed! Finally, after nearly ten years, in September 2013, someone (Ken Sharp) had implemented a proper Forms to Forms conversion. It was not in the official release yet, but that was only a question of time. Indeed, just a week ago (March 24, 2014), with the release of 9.14, it finally made it into the official release. My invention had been made obsolete before it had been out.

This minor annoyance notwithstanding I was able to confirm that the current PDF files are much smaller than the old ones. In fact they are almost exactly the same size of the PostScript files from which they originate. To put this in numbers, an Imperial game, one of the biggest maps in circulation, with 53 pages (or 17 game years, as Winters get their own page) that I used as a test, produced the following savings: The original PDF file being 6.3 MB, after turning the map proper into a Form it was reduced to 2.7 MB; the armies and fleets reduced it further to 1.1 MB, the supply centers, which are in fact just circles (or squares), to 1.0 MB, almost the same as the 950 KB of the PS-file.

Most maps on the DPjudge have by now been adapted to use Forms, lowering storage needs and download times. Njudge servers may profit in the same way by adopting these maps or by contacting me. While for you as a player, if you had obstinately kept to PS files all these years because of download worries, perhaps it's finally time to switch to PDF?



Mario Huys
([email protected])

If you wish to e-mail feedback on this article to the author, and clicking on the envelope above does not work for you, feel free to use the "Dear DP..." mail interface.