ARACHNE TODOLIST - PROPOSED SOFTWARE CHANGE DETAILS
-----------------------------------------------------------------------------------------------------
Item numbers refer to the 2DoList table.

ITEM_17  Table look-up with negative index

TOOLBAR.TB messages are not displayed properly when translated into Russian. I tried Polish messages right away, got the same result, and immediately guessed the reason: characters with codes above 127 (i.e., so-called national characters) are not handled properly in the GUIDRAW.C function 'grabstr'.

The problem is far more general: signed vs. unsigned 'char' type. Arachne is compiled with the default setting of a signed 'char' type (-K- option). In most cases this does not matter, but it does when macros like 'isalnum', 'isdigit', 'isspace', etc. (all defined in CTYPE.H) get arguments 0x80..0xFF, which are treated as -0x80..-1 when 'char' is signed. These macros work by looking up the table '_ctype[]' (stored in CTYPE.OBJ inside C*.LIB) at index (c) + 1. The table defines characters 0x80..0xFF as neither spaces, nor controls, nor digits, etc., but the look-up requires a non-negative index. When the index is negative, other variables immediately preceding '_ctype' in the data segment are read and tested, with obviously undefined results. (As always, the OpenWatcom Disassembler (WDIS) was helpful.)

By the way, there is a documentation bug in CTYPE.H: the line

    #define _IS_HEX 16 /* [0..9] or [A-F] or [a-f] */

should read

    #define _IS_HEX 16 /* [A-F] or [a-f] */

because '_ctype[]' defines characters '0'..'9' as _IS_DIG (= 2), not as _IS_DIG | _IS_HEX; see also '#define isxdigit'.

I changed all 'char' variables in 'grabstr' to 'unsigned char', and it helped: the Polish messages in the toolbar are now displayed correctly! By the way, all this means that a Czech browser would also break on a toolbar with (at least some) Czech messages.
Having fixed that (well, that is saying too much: it is a quick-and-dirty fix that produces compiler warnings about mixing signed and unsigned 'char' pointers), I looked for other potentially dangerous uses of the is...() macros:

    isalnum:  HTKERNEL.C
              PARSECSS.C  (I am not sure whether a character above 127 can occur there, but...)
    isdigit:  LINGLUE.H   (does anyone care?)
    isspace:  GUIDRAW.C   (described above)

(There is also something in CLEMGLUE.H, but we are throwing that file away.)

Also, some 'char' fields of the following structures (and maybe of others) would, I think, be better declared as 'unsigned char':

    struct ib_editor  (IE.H)
    struct Page       (ARACHNE.H)

I would not recommend the easiest solution (i.e., recompiling everything with 'unsigned char' as the default (-K)), as we do not know whether the code relies on 'char' being signed somewhere. Anyway, it is reasonable to (define and) use types like 'byte' (= 'unsigned char'), 'sbyte' (= 'signed char'; 'short' cannot be used for this, as it is a C keyword), 'word' (= 'unsigned int') or similar.

I overlooked some occurrences. Here is the full list (I hope). The line numbers refer to the original sources (do we all have Ray's current cleaned-up ones?):

    isalnum:  CGIQUERY.C  #605        [no problem here, unsigned char used]
              HTKERNEL.C  #267
              IE_KEY.C    #713, #736
              PARSECSS.C  #211, #216
    isalpha:  HTML.C      #652
    isdigit:  LINGLUE.C   #194        [no problem]
              CLEMGLUE.C  #131, #144  [no problems]
    isspace:  GUIDRAW.C   #980, #982, #989
              CLEMGLUE.C  #465

I put an '(unsigned char)' typecast in all potentially weak places (except in CLEMGLUE.C), as this requires few changes and does not produce compiler warnings about mixed types.

Another possibility would be to globally redefine all these macros, e.g.

    #define isalnum(c) \
        (_ctype[(unsigned char) (c) + 1] & (_IS_DIG | _IS_UPP | _IS_LOW))

instead of the default

    #define isalnum(c) (_ctype[(c) + 1] & (_IS_DIG | _IS_UPP | _IS_LOW))

But redefining standard C macros may be asking for trouble.
On the other hand, I just checked that OpenWatcom C defines them 'correctly', like this (H\CTYPE.H):

    #define isalnum(__c) \
        ((unsigned char)(_IsTable[((unsigned char)(__c))+1]) \
        & (_LOWER|_UPPER|_DIGIT))

[all line breaks inserted by me due to e-mail format restrictions]