In this document I describe how I designed and implemented a minimal 32bit OS/2 application that displays the "I'm really small!" string. We will see that it is possible to create such an application that fits completely in an OS/2 32bit header. The first techniques described are of general interest or, at least, useful in some circumstances, such as limited disk space or low bandwidth. The later techniques, which rely on overlapping, are not recommended for use in applications. Please send comments and suggestions to Martin Lafaix, lafaix@online.fr.
Given that you are reading this document, I assume that you want to know how to create small OS/2 executables. Given that you have that desire, I assume that you already know about OS/2 programming.
My purpose in this document is to present a way to develop a minimal working OS/2 program. I describe how OS/2 executables are structured, how careful thinking can reduce code size, and how it is possible to rework the organization of OS/2 executables to reduce their size.
My intention is not to point out clean or recommended ways to develop small programs. I intend to develop the smallest possible application. In doing so, I will use dirty tricks. I advise you never to use the techniques presented herein.
The ideas leading to this document came from a contest proposed on comp.os.os2.programming.misc by Michal Necasek. The contest was to write the smallest possible application displaying the "I'm really small!" string. I do not think writing such an application is by itself an interesting goal to achieve, but I consider the process leading to it to be rewarding.
I hope you will find reading this document not totally uninteresting.
A 32bit OS/2 application typically consists of a DOS stub, followed by a linear executable module header, followed by a loader section, a fixup section, a non-resident section, and possibly a debug section.
This section only details the structures that are of interest for our little quest. For a full description of the 32bit EXE format, refer to [LXSPEC].
     +----------------------------------------+
 00h | DOS 2 Compatible header (MZ)           |
     .                                        .
     .                                        .
     +----------------------------------------+
     .                                        .
     .                                        .
     +----------------------------------------+
 3ch | Offset to Linear EXE module header     |
     +----------------------------------------+
     .                                        .
     .       Dos stub program and data        .
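The pointer at offset 3ch can be followed with a few lines of Python. This is only a sketch: the function name lx_header_offset and the synthetic demo image are mine, not part of any OS/2 tool, and the 90h stub length is simply the one seen in the dumps later in this document.

```python
import struct

def lx_header_offset(image: bytes) -> int:
    """Return the file offset of the LX header (0 when there is no DOS stub)."""
    if image[:2] == b"LX":
        return 0
    if image[:2] != b"MZ":
        raise ValueError("neither an MZ stub nor an LX header")
    # The 32bit little-endian value at 3ch points to the LX header.
    (offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[offset:offset + 2] != b"LX":
        raise ValueError("offset 3ch does not point to an LX header")
    return offset

# Hypothetical image: a 90h-byte stub followed by the first bytes of an LX header.
stub = bytearray(0x90)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x90)
demo = bytes(stub) + b"LX\x00\x00"
print(hex(lx_header_offset(demo)))
```

An image that begins directly with "LX" is treated as having no stub at all.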
The DOS stub contains code that is executed if we attempt to run the application from DOS (or from a DOS window in OS/2). This stub usually displays a message saying that the application cannot run in DOS mode, but in fact any valid DOS program can be used as a stub. (This is one way to have a single program that runs both in DOS mode and in OS/2, the other being a bound application.)
The DOS stub is not mandatory. If we attempt to run a 32bit OS/2 application that does not contain a DOS stub in DOS mode, we will get a SYS3175 error. This is less user-friendly than displaying an adequate message, but causes no other problems.
Various tools exist that can remove the DOS stub, LxLite being one of them. If we remove the stub by hand, there are some pointers in the linear executable (32bit) header we have to adjust (namely, those that point into the non-resident section).
The linear executable module header (LX header for short) either starts the executable or, if a DOS stub is present, is pointed to by the value at offset 3ch in the DOS MZ header. It consists of a 176-byte structure, followed by twenty padding bytes.
Those twenty padding bytes are apparently of no use, but the OS/2 loader refuses to load linear executables if those padding bytes are missing. This implies that a valid linear executable must be at least 196 bytes long.
     +-----------+-----+-----+-----------------------+
 00h | "L"  "X"  |B-ORD|W_ORD| FORMAT LEVEL          |
     +-----------+-----------+-----------------------+
 08h | CPU TYPE  | OS TYPE   | MODULE VERSION        |
     +-----------+-----------+-----------------------+
 10h | MODULE FLAGS          | MODULE # OF PAGES     |
     +-----------------------+-----------------------+
 18h | EIP OBJECT #          | EIP                   |
     +-----------------------+-----------------------+
 20h | ESP OBJECT #          | ESP                   |
     +-----------------------+-----------------------+
 28h | PAGE SIZE             | PAGE OFFSET SHIFT     |
     +-----------------------+-----------------------+
 30h | FIXUP SECTION SIZE    | FIXUP SECTION CHECKSUM|
     +-----------------------+-----------------------+
 38h | LOADER SECTION SIZE   |LOADER SECTION CHECKSUM|
     +-----------------------+-----------------------+
 40h | OBJECT TABLE OFF      | # OBJECTS IN MODULE   |
     +-----------------------+-----------------------+
 48h | OBJECT PAGE TABLE OFF | OBJECT ITER PAGES OFF |
     +-----------------------+-----------------------+
 50h | RESOURCE TABLE OFFSET |#RESOURCE TABLE ENTRIES|
     +-----------------------+-----------------------+
 58h | RESIDENT NAME TBL OFF | ENTRY TABLE OFFSET    |
     +-----------------------+-----------------------+
 60h | MODULE DIRECTIVES OFF | # MODULE DIRECTIVES   |
     +-----------------------+-----------------------+
 68h | FIXUP PAGE TABLE OFF  | FIXUP RECORD TABLE OFF|
     +-----------------------+-----------------------+
 70h | IMPORT MODULE TBL OFF | # IMPORT MOD ENTRIES  |
     +-----------------------+-----------------------+
 78h | IMPORT PROC TBL OFF   | PER-PAGE CHECKSUM OFF |
     +-----------------------+-----------------------+
 80h | DATA PAGES OFFSET     | #PRELOAD PAGES        |
     +-----------------------+-----------------------+
 88h | NON-RES NAME TBL OFF  | NON-RES NAME TBL LEN  |
     +-----------------------+-----------------------+
 90h | NON-RES NAME TBL CKSM | AUTO DS OBJECT #      |
     +-----------------------+-----------------------+
 98h | DEBUG INFO OFF        | DEBUG INFO LEN        |
     +-----------------------+-----------------------+
 a0h | #INSTANCE PRELOAD     | #INSTANCE DEMAND      |
     +-----------------------+-----------------------+
 a8h | HEAPSIZE              | STACKSIZE             |
     +-----------------------+-----------------------+
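To make the layout above concrete, here is a small Python sketch that reads a few of those fields. The bytes are the 176-byte header of the stub-less, 282-byte executable dumped later in this document. The field names in FIELD_OFFSETS are labels I chose for illustration, and I assume every field read here is a 32bit little-endian value.

```python
import struct

# 176-byte LX header copied from the 282-byte dump later in this document.
HEADER = bytes.fromhex(
    "4C580000 00000000 02000100 00000200"
    "00020000 01000000 01000000 12000000"
    "01000000 24100000 00100000 00000000"
    "12000000 00000000 26000000 00000000"
    "C4000000 01000000 DC000000 00000000"
    "00000000 00000000 E4000000 E9000000"
    "00000000 00000000 EA000000 F2000000"
    "F8000000 01000000 FC000000 00000000"
    "FC000000 00000000 00000000 00000000"
    "00000000 01000000 00000000 00000000"
    "00000000 01000000 00000000 00100000"
)

# Illustrative names; the offsets come from the header layout above.
FIELD_OFFSETS = {
    "eip_object": 0x18,
    "eip": 0x1C,
    "page_size": 0x28,
    "loader_section_size": 0x38,
    "object_table_off": 0x40,
    "objects_in_module": 0x44,
    "data_pages_off": 0x80,
}

def read_fields(header: bytes) -> dict:
    """Read the selected 32bit little-endian fields from an LX header."""
    return {name: struct.unpack_from("<I", header, off)[0]
            for name, off in FIELD_OFFSETS.items()}

fields = read_fields(HEADER)
for name, value in fields.items():
    print(f"{name:20} {value:#x}")
```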
Whenever a 32bit linear executable module is started, the initial values of registers are defined as follows:
Those are values we can rely on when coding an application.
The loader section consists of the object table, the object page table, the (possibly empty) resource table, the resident name table, the entry table, the (optional) module format directives table, the (optional) resident directives data, and the (again optional) per-page checksum table.
In our simple application case, we will only have an object table, an object page table, a resident name table, and an entry table. This is the minimal case.
The object table begins the loader section. It contains an entry for each object in the module. The entries consist of six 32bit words per object.
     +-----------------------+-----------------------+
 00h | VIRTUAL SIZE          | RELOC BASE ADDR       |
     +-----------------------+-----------------------+
 08h | OBJECT FLAGS          | PAGE TABLE INDEX      |
     +-----------------------+-----------------------+
 10h | # PAGE TABLE ENTRIES  | RESERVED              |
     +-----------------------+-----------------------+
The beginning of the object table denotes the beginning of the loader section. No member of the loader section can precede the object table.
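As an illustration, the six words of an object table entry can be decoded as follows. The bytes are the single object entry found at offset c4h in the 282-byte dump later in this document. The class and function names are mine.

```python
import struct
from typing import NamedTuple

class ObjectEntry(NamedTuple):
    virtual_size: int
    reloc_base: int
    flags: int
    page_table_index: int
    page_table_entries: int
    reserved: int

def parse_object_entry(raw: bytes) -> ObjectEntry:
    """Decode one 24-byte object table entry (six little-endian 32bit words)."""
    return ObjectEntry(*struct.unpack("<6I", raw[:24]))

entry = parse_object_entry(bytes.fromhex(
    "24100000 00000100 03200000 01000000 01000000 00000000"))
print(entry)
```

The decoded base address is 0x10000, the value that matters later when we start overlapping things.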
The object page table contains one entry for each valid page data. The page data offset field is relative to the data pages offset in the LX header.
      63                   32 31       16 15        0
     +-----------------------+-----------+-----------+
 00h | PAGE DATA OFFSET      | DATA SIZE | FLAGS     |
     +-----------------------+-----------+-----------+
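A short Python sketch of decoding one such entry follows. The eight bytes are the page table entry found at offset dch in the 282-byte dump later in this document. I assume the fields are stored in the order they appear in that dump: a 32bit offset, then a 16bit size, then 16bit flags. The function name is hypothetical.

```python
import struct

def parse_page_entry(raw: bytes):
    """Decode one 8-byte object page table entry as (offset, size, flags)."""
    return struct.unpack("<IHH", raw[:8])

offset, size, flags = parse_page_entry(bytes.fromhex("00000000 1E00 0000"))
print(offset, size, flags)
```

The size of 30 bytes matches the 18-byte message plus 12 bytes of code stored in the file.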
The resident name table consists of a series of abutted entries. A null entry (i.e., the len field is 0) denotes the end of the table. The len field is an 8bit number, the ASCII string is not null terminated, and the ordinal number is a 16bit number.
     +-----+----------------     ----+-----------+
 00h | LEN | ASCII STRING  . . .     | ORDINAL # |
     +-----+----------------     ----+-----------+
There is at least one entry: the module name, whose ordinal is 0. We should not omit it. Its default value is the same as the EXE file name, but we can redefine it via the NAME directive in a .DEF file.
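Walking such a table is straightforward. The sketch below, with a function name of my own choosing, parses the five bytes found at offset e4h in the 282-byte dump later in this document, where the module name is the single letter "P".

```python
import struct

def parse_name_table(raw: bytes):
    """Return (name, ordinal) pairs up to the terminating null entry."""
    entries = []
    pos = 0
    while raw[pos] != 0:                      # a len of 0 ends the table
        length = raw[pos]
        name = raw[pos + 1:pos + 1 + length].decode("ascii")
        (ordinal,) = struct.unpack_from("<H", raw, pos + 1 + length)
        entries.append((name, ordinal))
        pos += 1 + length + 2                 # len byte, string, 16bit ordinal
    return entries

names = parse_name_table(bytes.fromhex("01 50 0000 00"))
print(names)
```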
The entry table can be null. If not, it points to a (possibly empty) table, with one entry for each exported entry point. A null entry (i.e., the cnt field is 0) denotes the end of the table. The cnt and type fields are 8bit numbers, and the bundle info field length depends on the entry type and count.
     +-----+-----+-----------------+
 00h | CNT |TYPE | BUNDLE INFO . . |
     +-----+-----+-----------------+
The important thing to note is that the object table starts the loader section, and that all the section's tables must lie within loader section size bytes of its start. There can be holes in the loader section, though.
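This constraint is easy to check mechanically. The sketch below uses the values of the 282-byte executable shown later in this document (object table at c4h, loader section size 26h). It is a simplification: it only checks where each table starts, not where it ends, and the function name is hypothetical.

```python
def tables_inside_loader_section(object_table_off, loader_size, table_offsets):
    """Check that every table starts within the loader section."""
    start = object_table_off          # the object table starts the section
    end = start + loader_size
    return all(start <= off < end for off in table_offsets.values())

offsets = {
    "object page table": 0xDC,
    "resident name table": 0xE4,
    "entry table": 0xE9,
}
ok = tables_inside_loader_section(0xC4, 0x26, offsets)
print(ok)
```

In that executable the loader section ends at eah, which is exactly where the fixup page table begins.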
The fixup section consists of the fixup page table, the fixup record table, the import module name table, and the import procedure name table.
The non-resident section consists of the preload pages, the demand load pages, the iterated pages, the non-resident name table, and the (optional) non-resident directives data.
The debug section, if present, contains info for debuggers. It is not used by the OS/2 loader.
Let us remember our intent: we want to create the smallest possible OS/2 application that displays the "I'm really small!" string, followed by a line feed. In the next sections, we will explore some ways to achieve this.
We will use assembly language (ALP), but no real assembly language knowledge is required.
As we are creating an OS/2 application, one of the first ways that comes to mind is using the DosWrite API. It requires a file handle to write to, a source buffer, the number of bytes to write, and somewhere to store the count of bytes written.
Standard input, standard output, and standard error are so commonly used that OS/2 provides predefined handles for them: HF_STDIN, HF_STDOUT, and HF_STDERR. We do not have to open or close those files. The system manages that for us.
Like all standard OS/2 APIs, DosWrite takes its parameters from the stack, in reverse order (i.e., the last argument is pushed first, and the first argument is pushed last).
Our first attempt looks like this:
        title   small.asm
        .386

        extrn   DosWrite:proc
        extrn   DosExit:proc

DGROUP  group   DATA32

DATA32  segment dword use32 public 'CONST'
@msg    db      "I'm really small!",0ah
DATA32  ends

CODE32  segment dword use32 public 'CODE'
        public  main
main    proc
        ; DosWrite(HF_STDOUT, msg, sizeof(msg), &ul);
        sub     esp, 4h         ; storage for ul
        push    esp
        push    012h
        push    offset FLAT:@msg
        push    01h
        call    DosWrite
        ; DosExit(EXIT_PROCESS, 0);
        push    00h
        push    01h
        call    DosExit
main    endp
CODE32  ends
        end     main
This attempt is quite big, though. Two things will help us. First, we will use only one object: we will mix code and data. Second, we will use the fact that, as per the system specifications, the top of the stack contains an address that will call DosExit(EXIT_PROCESS, eax) on invocation.
Here is our slightly revised attempt:
        title   small.asm
        .386

        extrn   DosWrite:proc

DGROUP  group   CODE32

CODE32  segment dword use32 public 'MIXED'
        align   2
@msg    db      "I'm really small!",0ah

        public  main
main    proc
        ; DosWrite(HF_STDOUT, msg, sizeof(msg), &ul);
        sub     esp, 4h         ; storage for ul
        push    esp
        push    012h
        push    offset FLAT:@msg
        push    01h
        call    DosWrite
        add     esp, 14h
        ; DosExit(EXIT_PROCESS, eax);
        ret
main    endp
CODE32  ends
        end     main
This is better, but still not perfect. We do some unnecessary stack manipulation, and we can simplify the execution path. Instead of calling DosWrite, then returning from it, then returning to the DosExit stub, we can simply jump to DosWrite and, if we provide the DosExit return address, return directly to the DosExit part.
Our final DosWrite attempt will then look like:
        title   small.asm
        .386

        extrn   DosWrite:proc

DGROUP  group   CODE32

CODE32  segment dword use32 public 'MIXED'
        align   2
@msg    db      "I'm really small!",0ah

        public  main
main    proc
        ; DosWrite(HF_STDOUT, msg, sizeof(msg), &ul);
        pop     eax
        push    esp
        push    12h
        push    offset FLAT:@msg
        push    01h
        push    eax
        jmp     DosWrite
main    endp
CODE32  ends
        end     main
There are a couple of enhancements we could make to reduce the code size further, but those are either kludgy or dirty. But we would never do kludgy or dirty things, would we? :-)
The first enhancement reduces the size of the push offset FLAT:@msg statement: instead of taking five bytes, using dec ax / inc eax / push eax only takes four (our application is an EXE, and the message begins the object, so its address is 0x10000). As said above, this is kludgy, as it hard-codes the message address. If the message is no longer at the beginning of the object, it will not work.
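The arithmetic behind that trick can be checked with a tiny simulation. This is only a model of the two instructions, and it assumes that EAX holds zero when the sequence runs. The helper names are mine.

```python
def dec_ax(eax: int) -> int:
    """Model of 'dec ax': only the low 16 bits change, so the borrow stops there."""
    low = (eax - 1) & 0xFFFF
    return (eax & 0xFFFF0000) | low

def inc_eax(eax: int) -> int:
    """Model of 'inc eax' on the full 32bit register."""
    return (eax + 1) & 0xFFFFFFFF

eax = 0                 # assumption: the register starts at zero
eax = dec_ax(eax)       # eax is now 0000FFFFh
eax = inc_eax(eax)      # the carry ripples into bit 16
print(hex(eax))
```

Because dec ax needs an operand-size prefix in 32bit code, the three instructions take 2 + 1 + 1 = 4 bytes, one less than the five-byte push of an immediate address.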
The second enhancement reduces the size of the push 01h statement (i.e., push HF_STDOUT). If we note that we could indeed write to the stdin stream (!!!), we could save one byte by pushing zero instead (push ebx). But this one is dirty :-)
Using the DosWrite API was our first idea, but some less obvious choices are available. Let us explore two of them.
The first alternative API we will explore is DosPutMessage. It is a lesser-known API, and it does a bit more than what we want, but that does no harm, does it?
DosPutMessage offers two big advantages. First, it takes one less parameter. Second, and most importantly, it is provided by the MSG library. This may look trivial, but it saves bytes where they are most needed: six of them compared to DOSCALLS. (Five due to the shorter name, plus one because the exported ordinal fits in just one byte.)
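The six bytes can be accounted for with a little arithmetic. In the sketch below, the ordinal 5 for DosPutMessage is the value visible in the fixup record of the dumps later in this document. The ordinal 282 for DosWrite in DOSCALLS is my assumption, and the only thing that matters about it here is that it does not fit in one byte. The cost model (length byte plus name, plus a one-byte or two-byte ordinal) is a simplification.

```python
def import_cost(module_name: str, ordinal: int) -> int:
    """Bytes spent on the import module name entry and on the ordinal."""
    name_bytes = 1 + len(module_name)          # length byte + module name
    ordinal_bytes = 1 if ordinal < 256 else 2  # 8bit or 16bit ordinal
    return name_bytes + ordinal_bytes

doscalls = import_cost("DOSCALLS", 282)  # 282 is an assumed ordinal for DosWrite
msg = import_cost("MSG", 5)              # 5 as seen in the fixup record
saving = doscalls - msg
print(doscalls, msg, saving)
```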
Here is the revised DosPutMessage code:
        title   pico.asm
        .386

        extrn   DosPutMessage:proc

DGROUP  group   CODE32

CODE32  segment dword use32 public 'MIXED'
        align   2
@msg    db      "I'm really small!",0ah

        public  main
main    proc
        ; DosPutMessage(HF_STDOUT, sizeof(msg), msg);
        pop     eax
        push    offset FLAT:@msg
        push    12h
        push    01h
        push    eax
        jmp     DosPutMessage
main    endp
CODE32  ends
        end     main
OS/2 Warp 4 now includes a standard C library as part of its documented libraries. puts is a simple way to achieve our goal.
Now that we have some quite tight code, let us start some EXE packing per se.
In this section we will use the DosPutMessage code as described above as our base.
By compiling our code (using ALP and ILINK), we obtain the following working EXE (it uses 426 bytes here, but your results may vary, if your DOS stub message is in some other language than French):
alp pico.asm
ilink pico.obj pico.def /align:1

OFFSET    +0       +4         +8       +C
00000000  4D5A8C01 01000000 - 04000000 FFFF0900  *MZ?.........??..*
00000010  00020000 00000000 - 40000000 00000000  *........@.......*
00000020  00000000 00000000 - 00000000 00000000  *................*
00000030  00000000 00000000 - 00000000 90000000  *............?...*
00000040  0E1FBA0E 00B409CD - 21B8014C CD214365  *..?..?.?!?.L?!Ce*
00000050  2070726F 6772616D - 6D65206E 65207065  * programme ne pe*
00000060  75742070 61732073 - 27657882 63757465  *ut pas s'ex?cute*
00000070  7220656E 20736573 - 73696F6E 20444F53  *r en session DOS*
00000080  2E0D0A24 00000000 - 00000000 00000000  *...$............*
00000090  4C580000 00000000 - 02000100 00000200  *LX..............*
000000A0  00020000 01000000 - 01000000 12000000  *................*
000000B0  01000000 24100000 - 00100000 00000000  *....$...........*
000000C0  12000000 00000000 - 26000000 00000000  *........&.......*
000000D0  C4000000 01000000 - DC000000 00000000  *?.......?.......*
000000E0  00000000 00000000 - E4000000 E9000000  *........?...?...*
000000F0  00000000 00000000 - EA000000 F2000000  *........?...?...*
00000100  F8000000 01000000 - FC000000 00000000  *?.......?.......*
00000110  8C010000 00000000 - 00000000 00000000  *?...............*
00000120  00000000 01000000 - 00000000 00000000  *................*
00000130  00000000 01000000 - 00000000 00100000  *................*
00000140  00000000 00000000 - 00000000 00000000  *................*
00000150  00000000 24100000 - 00000100 03200000  *....$........ ..*
00000160  01000000 01000000 - 00000000 00000000  *................*
00000170  1E000000 01500000 - 00000000 00000600  *.....P..........*
00000180  00000881 1E000105 - 034D5347 49276D20  *...?.....MSGI'm *
00000190  7265616C 6C792073 - 6D616C6C 210A5868  *really small!.Xh*
000001A0  00000100 6A126A01 - 50E9               *....j.j.P?      *
As we have seen in Section 2.1, we can safely remove the DOS stub. We could do it with LxLite, but this requires learning its arcane command line options, so a simple binary editor may be faster (there is only one non-resident table, so only one entry to adjust: the data pages offset, at offset 80h).
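For those who would rather script the edit than use a binary editor, here is a minimal Python sketch of the same operation. It only fixes the data pages offset, which is all our executable needs. A general tool would have to adjust every other file-relative field as well. The function name and the synthetic test image are mine. The sizes (426 bytes, a stub of 90h bytes, a data pages offset of 18ch) mirror the dump above.

```python
import struct

DATA_PAGES_OFF = 0x80  # LX header field that holds a file-relative offset

def strip_dos_stub(image: bytes) -> bytes:
    """Drop the DOS stub and rebase the data pages offset accordingly."""
    (lx_off,) = struct.unpack_from("<I", image, 0x3C)
    lx = bytearray(image[lx_off:])
    (data_pages,) = struct.unpack_from("<I", lx, DATA_PAGES_OFF)
    struct.pack_into("<I", lx, DATA_PAGES_OFF, data_pages - lx_off)
    return bytes(lx)

# Synthetic stand-in for the 426-byte executable.
stub = bytearray(0x90)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x90)
body = bytearray(426 - 0x90)
body[0:2] = b"LX"
struct.pack_into("<I", body, DATA_PAGES_OFF, 0x18C)

stripped = strip_dos_stub(bytes(stub + body))
new_offset = struct.unpack_from("<I", stripped, DATA_PAGES_OFF)[0]
print(len(stripped), hex(new_offset))
```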
Our EXE is now 282 bytes long:
OFFSET    +0       +4         +8       +C
00000000  4C580000 00000000 - 02000100 00000200  *LX..............*
00000010  00020000 01000000 - 01000000 12000000  *................*
00000020  01000000 24100000 - 00100000 00000000  *....$...........*
00000030  12000000 00000000 - 26000000 00000000  *........&.......*
00000040  C4000000 01000000 - DC000000 00000000  *?.......?.......*
00000050  00000000 00000000 - E4000000 E9000000  *........?...?...*
00000060  00000000 00000000 - EA000000 F2000000  *........?...?...*
00000070  F8000000 01000000 - FC000000 00000000  *?.......?.......*
00000080  FC000000 00000000 - 00000000 00000000  *?...............*
00000090  00000000 01000000 - 00000000 00000000  *................*
000000A0  00000000 01000000 - 00000000 00100000  *................*
000000B0  00000000 00000000 - 00000000 00000000  *................*
000000C0  00000000 24100000 - 00000100 03200000  *....$........ ..*
000000D0  01000000 01000000 - 00000000 00000000  *................*
000000E0  1E000000 01500000 - 00000000 00000600  *.....P..........*
000000F0  00000881 1E000105 - 034D5347 49276D20  *...?.....MSGI'm *
00000100  7265616C 6C792073 - 6D616C6C 210A5868  *really small!.Xh*
00000110  00000100 6A126A01 - 50E9               *....j.j.P?      *
Removing the DOS stub is a safe exercise. With it ends the clean part of this document. The techniques described from here on are not recommended. There is really no reason to apply them.
Let us enter the kludgy realm. We have seen in Section 2.2 that the LX header was followed by twenty padding bytes. We will remove this padding. We will also move the resident name table (after all, our application name is of no interest, so it can be any value). And finally, we will slightly overlap the loader and fixup sections.
As we remove the padding, we have to adjust the pointers to all table offsets in the LX header. (Or at least to the ones we use, namely: the object table offset, the object page table offset, the resident name table offset, the entry table offset, the fixup page table offset, the fixup record table offset, the import module table offset, the import proc table offset, and the data pages offset yet again.)
To move the resident name table, we have to remember that it must remain in the loader section, and that it should not precede the object table. We will do that by overlapping the resident name table over the object table. We also overlap the entry table there, just after the beginning of the resident name table. As we moved those tables, we modified the loader section. We hence have to adjust its size entry.
As the loader section ends with the same three null bytes that start the fixup section, we will overlap those two sections slightly.
Our EXE is now 253 bytes long:
OFFSET    +0       +4         +8       +C
00000000  4C580000 00000000 - 02000100 00000200  *LX..............*
00000010  00020000 01000000 - 01000000 12000000  *................*
00000020  01000000 24100000 - 00100000 00000000  *....$...........*
00000030  12000000 00000000 - 20000000 00000000  *................*
00000040  B0000000 01000000 - C8000000 00000000  *?.......?.......*
00000050  00000000 00000000 - C0000000 C1000000  *........?...?...*
00000060  00000000 00000000 - CD000000 D5000000  *........?...?...*
00000070  DB000000 01000000 - DF000000 00000000  *?.......?.......*
00000080  DF000000 00000000 - 00000000 00000000  *?...............*
00000090  00000000 01000000 - 00000000 00000000  *................*
000000A0  00000000 01000000 - 00000000 00100000  *................*
000000B0  24100000 00000100 - 03200000 01000000  *$........ ......*
000000C0  01000000 00000000 - 00000000 1E000000  *................*
000000D0  00060000 0008811E - 00010503 4D534749  *......?.....MSGI*
000000E0  276D2072 65616C6C - 7920736D 616C6C21  *'m really small!*
000000F0  0A586800 0001006A - 126A0150 E9        *.Xh....j.j.P?   *
The documented LX header is still intact. That will not last long :-)
The LX header above contains plenty of "unused" areas. Putting them to good use is quite tempting. In fact, from offset 7ch up to a8h included, a large free area is imploring us. The data pages offset is not free, but a closer look at the loader section tells us that its second word (the object base) is in fact unused, as, for the first code object of an executable, and as long as the object contains internal fixups, a value of 0x10000 will be assumed. This cannot be a coincidence, can it? Let us put the loader and fixup sections over that free area. (And let us adjust the various usual table offsets accordingly.)
Our EXE is now 206 bytes long:
OFFSET    +0       +4         +8       +C
00000000  4C580000 00000000 - 02000100 00000200  *LX..............*
00000010  00020000 01000000 - 01000000 12000000  *................*
00000020  01000000 24100000 - 00100000 00000000  *....$...........*
00000030  12000000 00000000 - 20000000 00000000  *................*
00000040  7C000000 01000000 - 94000000 00000000  *|.......?.......*
00000050  00000000 00000000 - 8C000000 8D000000  *........?...?...*
00000060  00000000 00000000 - 99000000 A1000000  *........?...?...*
00000070  A7000000 01000000 - AB000000 24100000  *?.......?...$...*
00000080  B0000000 03200000 - 01000000 01000000  *?.... ..........*
00000090  00000000 00000000 - 1E000000 00060000  *................*
000000A0  0008811E 00010503 - 4D534700 00100000  *..?.....MSG.....*
000000B0  49276D20 7265616C - 6C792073 6D616C6C  *I'm really small*
000000C0  210A5868 00000100 - 6A126A01 50E9      *!.Xh....j.j.P?  *
We know that our theoretical limit is 196 bytes. We are not there yet. More work to do...
Our code section is too big to be inserted verbatim in a free area: no free area is big enough anymore. But there are still some free areas. What if we moved parts of our code into them?
Here is one solution. We will break our code into small blocks, and we will jump from one block to the next. We will leave the message intact in one block.
While splitting our code, there is mostly one thing to take care of: the four bytes following the jmp DosPutMessage instruction must be either after the end of our object or zeros.
The empty areas are either four or eight bytes long, so let us see where we could break our code, considering that relative jumps will take two bytes each. Given our current code size:
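The instruction sizes can be read off the opcode bytes at the end of the dumps above. As a sketch, the following Python snippet tabulates them. The four zero bytes after the E9 opcode are a placeholder of mine for the displacement that the loader fills in through the fixup.

```python
# Instruction encodings as they appear at the end of the dumps above.
INSTRUCTIONS = [
    ("pop eax", "58"),
    ("push offset FLAT:@msg", "68 00000100"),
    ("push 12h", "6A 12"),
    ("push 01h", "6A 01"),
    ("push eax", "50"),
    ("jmp DosPutMessage", "E9 00000000"),  # displacement is a placeholder
]
SHORT_JMP_SIZE = 2  # opcode EB plus an 8bit displacement

sizes = {text: len(bytes.fromhex(encoding)) for text, encoding in INSTRUCTIONS}
total = sum(sizes.values())

for text, size in sizes.items():
    print(f"{text:24} {size} byte(s)")
print("total:", total)
```

The 16 bytes are too many for any single remaining hole, but each instruction on its own, followed by a two-byte short jump, fits in an eight-byte hole.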
A simple partition of our code could then be:
block1: pop     eax
        push    offset FLAT:@msg
        jmp     block2

block2: push    12h
        jmp     block3

block3: push    01h
        push    eax
        jmp     DosPutMessage
Block 1 takes eight bytes, block 2 four, and block 3 either four or eight. That suits our needs. By examining the LX header, we see that we have a free area followed by four null bytes at offset 28h: the page size entry is not used, as pages are assumed to be 4096 bytes long anyway. So this is a nice place for us to put block 3. The module directives offset and number of module directives entries at offset 60h aren't used either, so we will put block 1 there. The number of resources entry is available (among others) for block 2.
Applying the above explosion leads us to a 196-byte executable (actually, 194 bytes, but we have to add two bytes of padding, so that OS/2 can load it). As we have modified the code section, we also have to adjust the data pages offset at offset 80h, the EIP initial value at offset 1ch, and the code section data size in the object page table.
OFFSET    +0       +4         +8       +C
00000000  4C580000 00000000 - 02000100 00000200  *LX..............*
00000010  00020000 01000000 - 01000000 38000000  *............8...*
00000020  01000000 24100000 - 6A0150E9 00000000  *....$...j.P?....*
00000030  12000000 00000000 - 20000000 00000000  *................*
00000040  7C000000 01000000 - 94000000 00000000  *|.......?.......*
00000050  00000000 6A12EBD0 - 8C000000 8D000000  *....j.???...?...*
00000060  58688800 0100EBEC - 99000000 A1000000  *Xh?...???...?...*
00000070  A7000000 01000000 - AB000000 24100000  *?.......?...$...*
00000080  28000000 03200000 - 01000000 01000000  *(.... ..........*
00000090  00000000 00000000 - 9A000000 00060000  *........?.......*
000000A0  00088104 00010503 - 4D534700 00100000  *..?.....MSG.....*
000000B0  49276D20 7265616C - 6C792073 6D616C6C  *I'm really small*
000000C0  210A0000          -                    *!...            *
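The adjusted values in this last dump can be recomputed by hand or, as a sketch, with a few lines of Python. The offsets are those of the dump above. The helper rel8 and the variable names are mine. I take 0x10000 as the object base and 194 bytes (c2h) as the file length before padding.

```python
def rel8(jmp_addr: int, target: int) -> int:
    """8bit displacement of a two-byte short jump located at jmp_addr."""
    disp = target - (jmp_addr + 2)     # relative to the next instruction
    if not -128 <= disp <= 127:
        raise ValueError("target out of short jump range")
    return disp & 0xFF

DATA_PAGES = 0x28   # new data pages offset: the object now starts at block 3
BASE = 0x10000      # assumed load address of the object
FILE_END = 0xC2     # 194 bytes before the two padding bytes

jump_1_to_2 = rel8(0x66, 0x54)              # block 1 ends at 66h, block 2 is at 54h
jump_2_to_3 = rel8(0x56, 0x28)              # block 2 ends at 56h, block 3 is at 28h
eip = 0x60 - DATA_PAGES                     # entry point is block 1 at 60h
msg_address = BASE + (0xB0 - DATA_PAGES)    # the message sits at b0h
data_size = FILE_END - DATA_PAGES           # size stored in the object page table

print(hex(jump_1_to_2), hex(jump_2_to_3), hex(eip), hex(msg_address), hex(data_size))
```

These match the bytes EB EC at 66h, EB D0 at 56h, the value 38h at 1ch, the operand 00010088h of the push at 61h, and the size 9ah at 98h.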
Our quest for a minimal 32bit application that displays the "I'm really small!" string has come to an end. There are still "plenty" of unused areas in the LX header, but we have reached the minimum EXE size, so there is no need to go further in this direction.
The code we have obtained in Section 4.5 is working as expected, but its intent is far from being clear. By presenting it differently, we can address this shortcoming:
4C 58 00 00 00 00 00 00 02 00 01 00 00 000200000 2000 0010 0000 00 10 00 00 03 80 00 00 001 00 00 00 24 10 00 00 6A 01 50 E9 00 00 00 00 12 00 00 00 00 00 00 00 20 00 000000 00000 07 C0 00 00 00 10 00 00 09 40 00 00 00 00 00 00 00 00 00 00 06 A1 2E BD 08 C0 00 00 08 D0 00 00 05 86 8880 00100 EB EC 99000 00 0A 10 0000 0A 70 00 00 00 10 00 00 0A B000 000241000 0028 00 00 00 03 20 00 00 01 00 00 00 01 00 00 00 00 00 00 00 00 00 0000 9A 00 00 00000 60 00 00 0 0 88 10 40 00 10 50 34 D5 3 4 7 0 00 01 00 00 04 92 76 D2 07 26 5 6 16C 6C79 20 73 6D 616C6 C2 10 A0 0 0 0
(I used the default font as used by FontEdit to lay out the hexadecimal representation of the EXE file. The single quote shape has been slightly modified to fit the 196 bytes size. You can use a tool such as HexDump to recreate the EXE from this "source code".)
I thank Michal Necasek for initiating this miniaturization contest on comp.os.os2.programming.misc and for emulation. I thank Paul Ratcliffe for having been brave enough to announce his results first, and hence providing a real target to attain (well, OK, he also has beaten me for the "pure C" entry :-). And I finally thank Knut St. Osmundsen for discovering the 196-byte limit, and for reaching it first.
[LXSPEC] Linear eXecutable Module Format (LX). Revision 8. IBM Developer Toolbox.
[ALP] ALP Programming Guide and Reference. March 1997. IBM Developer Toolbox.
[TOOLS] Tools Reference. April 1997. IBM Developer Toolbox.
0.92 Initial public release
0.90 Initial release