
 

SecureSqueak Localisation

Last edited by Michael van der Gulik, 4 months ago.

Localisation in SecureSqueak will be implemented by add-on packages. As little localisation as possible will happen in the kernel, because code in the kernel cannot be updated, whereas add-on packages can.
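
As an illustration of this split, here is a minimal sketch in Python (used only as neutral pseudocode; SecureSqueak itself is Smalltalk). It assumes a kernel whose only time facility is a millisecond count since 1970, and two hypothetical add-on packages that turn that count into date strings. The names `kernel_milliseconds_since_epoch`, `install_package`, `format_date`, `IsoDates` and `DayFirstDates` are invented for this example.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Protocol

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def kernel_milliseconds_since_epoch() -> int:
    """Hypothetical kernel primitive: the only time facility the kernel offers."""
    return time.time_ns() // 1_000_000


class DatePackage(Protocol):
    """What a (hypothetical) loadable date package must provide."""

    def format_date(self, millis: int) -> str: ...


class IsoDates:
    """Hypothetical add-on package: ISO 8601 calendar dates (UTC)."""

    def format_date(self, millis: int) -> str:
        return (EPOCH + timedelta(milliseconds=millis)).strftime("%Y-%m-%d")


class DayFirstDates:
    """Hypothetical add-on package: day/month/year dates (UTC)."""

    def format_date(self, millis: int) -> str:
        return (EPOCH + timedelta(milliseconds=millis)).strftime("%d/%m/%Y")


# Packages are installed at run time; the kernel knows nothing about them.
_packages: dict[str, DatePackage] = {}


def install_package(locale_id: str, package: DatePackage) -> None:
    # Installing under an existing id replaces (i.e. updates) that package.
    _packages[locale_id] = package


def format_date(locale_id: str, millis: int) -> str:
    return _packages[locale_id].format_date(millis)


install_package("iso", IsoDates())
install_package("day-first", DayFirstDates())
now = kernel_milliseconds_since_epoch()
print(format_date("iso", now), format_date("day-first", now))
```

Because all formatting lives in the packages, a corrected or new date style can be installed by replacing a package, without touching the kernel.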

 

Issues

  • Character sort order often differs by country and alphabet.
  • String sort order often differs by country and alphabet.
  • Number formatting can differ between and within countries:
    • There are multiple ways of representing negative numbers: scientists and mathematicians use a minus sign, accountants use parentheses, and other formats exist as well.
    • Different countries use commas, dots and other symbols as decimal points and thousands separators.
    • Number>>asWords returns an English description.
    • There are multiple notations for other bases, such as binary and hexadecimal.
  • Dates, times, durations, periods, etc. can be formatted and handled in dozens of ways.
  • Keyboard layouts differ; character events need to arrive in Subcanvas as Unicode characters or some other layout-independent representation.
  • If Subcanvas is to support fonts, there are many complications:
    • right-to-left text;
    • ligatures that are required by some cultures;
    • ligatures that are formed by context;
    • characters missing from the font.
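
To make the number-formatting issues above concrete, the following Python sketch renders one value under three conventions. `NumberStyle`, `format_number` and the three style constants are hypothetical names, and the styles are simplified examples rather than complete locale definitions (real locales also differ in digit grouping sizes, currency placement and more).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberStyle:
    decimal_sep: str   # symbol between the integer and fractional parts
    group_sep: str     # thousands separator
    parentheses: bool  # True: accounting style "(1.00)"; False: minus sign "-1.00"


POINT_DECIMAL = NumberStyle(decimal_sep=".", group_sep=",", parentheses=False)
COMMA_DECIMAL = NumberStyle(decimal_sep=",", group_sep=".", parentheses=False)
ACCOUNTING = NumberStyle(decimal_sep=".", group_sep=",", parentheses=True)


def format_number(value: float, style: NumberStyle, places: int = 2) -> str:
    # Start from a fixed rendering such as "1,234,567.89", then swap symbols.
    text = f"{abs(value):,.{places}f}"
    integer, _, fraction = text.partition(".")
    body = integer.replace(",", style.group_sep)
    if fraction:
        body += style.decimal_sep + fraction
    if value < 0:
        return f"({body})" if style.parentheses else "-" + body
    return body


for style in (POINT_DECIMAL, COMMA_DECIMAL, ACCOUNTING):
    print(format_number(-1234567.891, style))
```

This prints `-1,234,567.89`, `-1.234.567,89` and `(1,234,567.89)`. Each style is plain data, the kind of thing a loadable locale package could supply while the kernel's own #asString stays in Smalltalk format.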

 

Solutions (or ideas)

Ideally, any mechanism that is both complex and subject to localisation would be implemented as remotely loadable code rather than in the kernel.

  • Date, time and duration classes can live in an externally loaded package. The kernel only needs to provide a millisecond count since the epoch.
  • Canvas targets can implement their own keyboard layouts by processing raw keyboard events. Conveniently, the keyboard events from the VM currently do provide Unicode characters. However, these events should perhaps not return a Character instance, in case Character is moved out of the kernel.
  • Character and String classes should (if possible) be in externally loaded packages.
  • If Character and String remain in the kernel, they are ordered strictly by numerical Unicode code point value.
  • Number>>asString returns the number in Smalltalk syntax, independent of locale.
  • Dates and related classes are removed from the SecureSqueak kernel and are to be provided in an external package. The kernel provides a single method (location still to be decided) that returns the number of milliseconds or nanoseconds since 1970.
  • Number>>asWords should be removed and replaced with an external NumberFormatter>>asWords: aNumber.
  • As many English strings as possible are to be removed from the kernel. Error and informative messages should be returned as a code, an Exception subclass, or some other language-neutral value. If this becomes unwieldy, some localisation of the kernel could be investigated.
  • Ideally, the locale is determined per user rather than per VM. This means the user's locale information must be made available to applications, and it is not part of the SecureSqueak kernel.
  • Optionally, the locale could be determined by the operating system (still undecided).
  • Fonts are managed by external packages; SecureSqueak has no fonts in the kernel.
  • Geographic location and locale are separate concerns: travellers, expats and conlangers may use a different locale from the people around them.
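
The ordering rule for a kernel-resident String, and the way a package could layer a different order on top of it, can be sketched in Python as follows. `kernel_compare` and `folded_key` are hypothetical; the folding key (strip accents via Unicode canonical decomposition, then case-fold) is a deliberately simple stand-in, not a real locale collation. Swedish, for example, sorts å, ä and ö after z, which is exactly why collation belongs in replaceable packages rather than in the kernel.

```python
import unicodedata
from functools import cmp_to_key


def kernel_compare(a: str, b: str) -> int:
    """Hypothetical kernel rule: compare strictly by numerical code point value."""
    for ca, cb in zip(a, b):
        if ca != cb:
            return -1 if ord(ca) < ord(cb) else 1
    # One string is a prefix of the other: the shorter one sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


def folded_key(s: str) -> tuple[str, str]:
    """Hypothetical package rule: ignore accents and case, break ties by code point."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)


words = ["zebra", "Äpfel", "apple", "Zoo", "äpfel"]
print(sorted(words, key=cmp_to_key(kernel_compare)))
# ['Zoo', 'apple', 'zebra', 'Äpfel', 'äpfel']
print(sorted(words, key=folded_key))
# ['Äpfel', 'äpfel', 'apple', 'zebra', 'Zoo']
```

The kernel order is predictable and never needs updating; anything friendlier to a particular reader is supplied by a package.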

 

In summary:

  • No English.
  • No fonts.
  • >>asString returns Smalltalk formatted numbers, characters, etc.
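
A minimal sketch of the "No English" rule, in Python: the kernel raises an exception carrying only a code and parameters, and a message catalogue supplied by a locale package turns it into text. `ErrorCode`, `KernelError`, `kernel_at`, `CATALOGUES` and `describe` are invented for illustration, and the Dutch wording is only an example entry.

```python
from enum import Enum


class ErrorCode(Enum):
    INDEX_OUT_OF_BOUNDS = "index-out-of-bounds"
    DIVIDE_BY_ZERO = "divide-by-zero"


class KernelError(Exception):
    """Hypothetical kernel exception: a code plus parameters, never English text."""

    def __init__(self, code: ErrorCode, **params):
        super().__init__(code.value)
        self.code = code
        self.params = params


def kernel_at(seq, index: int):
    # 1-based access, as with Smalltalk's #at:.
    if not 1 <= index <= len(seq):
        raise KernelError(ErrorCode.INDEX_OUT_OF_BOUNDS, index=index, size=len(seq))
    return seq[index - 1]


# Hypothetical catalogues, each of which would ship in a separate locale package.
CATALOGUES = {
    "en": {ErrorCode.INDEX_OUT_OF_BOUNDS: "Index {index} is out of bounds (size {size})."},
    "nl": {ErrorCode.INDEX_OUT_OF_BOUNDS: "Index {index} valt buiten het bereik (grootte {size})."},
}


def describe(error: KernelError, locale_id: str) -> str:
    template = CATALOGUES.get(locale_id, {}).get(error.code)
    if template is None:
        # No catalogue installed for this locale: fall back to the raw code.
        return f"{error.code.value} {error.params}"
    return template.format(**error.params)
```

Falling back to the raw code keeps errors intelligible even when no catalogue is installed, so the kernel still works without any localisation.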

 

Links

http://en.wikipedia.org/wiki/Internationalization_and_localization

http://www-01.ibm.com/software/globalization/index.jsp

http://msdn.microsoft.com/en-us/goglobal/bb688110.aspx

http://www.unicode.org/versions/Unicode5.0.0/ch05.pdf

 
