NetHack in other languages

From NetHackWiki
Revision as of 13:33, 5 February 2013 by Bhaak (talk | contribs) (Character sets: typo)

NetHack's text output is in English. Although the program's structure does not easily lend itself to localization (since morphological features of English are hard-wired into the source code on all levels), several localization projects currently exist.

German

Tony Crawford and Karl Breuer have developed a German localized version called NetzHack (note the 'z'), which runs on Linux, *BSD, and OS X (console and X11), and on Win32 (console and Windows graphics). Source and binaries are available from the project.

A different German translation attempt by Patric Mueller called NetHack-De was released as a playable, although incomplete, alpha release on 11 October 2007. The latest release includes source code, a Debian package and a graphical Windows binary.

Japanese

The Japanese version JNetHack by Issei Numata has been in existence for several years. For those who don't read Japanese, there's some outdated information in English at jnethack.org.

SourceForge.jp also carries JSlash'em, JSporkHack and JUnNetHack, as well as a Japanese NetHack Resources Project.

NetHack brass can be compiled as an English or Japanese version.

Spanish

Ray Chason has published Internationalized NetHack as a work in progress. It presently supports English and Spanish, and will eventually supersede Spanish NetHack.

Incomplete or stalled translations

A Chinese translation called nethack-cn was begun on Google Code on January 28th 2009, but the last update was on June 25th 2009.

A SourceForge project for a French translation called nethack-fr was registered on August 6th 2009. The last update was on October 29th 2009. There is a French translation of the guidebook and some spoilers.

The first commit of the GitHub project for an Italian translation called nethack-it was on December 4th 2009. The last commit so far was on January 27th 2010.

A SourceForge project for a Portuguese translation of NetHack and Slash'Em was registered on May 3rd 2004, but the registration was the only activity on that project.

Localization strategies

Translations of messages

The usual approach is to substitute hard-coded messages in the target language for the hard-coded messages in English. All known translations take this approach, except Internationalized NetHack, which uses Gettext together with a scriptable printf-like system to handle the grammar bits.

Gettext supports plurals, but this is not adequate for NetHack, which also must deal with verb tenses and person inflections, noun cases, and gender. Some languages have mandatory contractions (Spanish "a el" -> "al", "de el" -> "del") or words that change form according to what precedes or follows (English "a" and "an"), or may change word order under certain conditions:

  • "¡Idefix golpea al orco!" (subject and object are both nouns)
  • "¡Idefix lo golpea!" (object is a pronoun, and goes before the verb)
  • "¡Golpeas al orco!" (subject is a pronoun ("tú") and is omitted; verb changes to second person singular)
  • "¡Lo golpeas!" (both modifications apply)

The message generation must also correctly capitalize after such rules are applied.
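
The ordering of these steps can be illustrated with a minimal Python sketch. It is not taken from any of the translations; the function name finish_message and the two-entry contraction table are assumptions made for illustration. It applies the mandatory contractions first and only then capitalizes, skipping leading punctuation such as "¡".

```python
import re

# The two mandatory contractions named in the text.
# The patterns are case-sensitive, so a proper name such as "El ..." is left alone,
# and "él" (the pronoun) does not match because of the accent.
_CONTRACTIONS = [
    (re.compile(r"\ba el\b"), "al"),
    (re.compile(r"\bde el\b"), "del"),
]

def finish_message(text: str) -> str:
    """Hypothetical post-processing step: contract, then capitalize."""
    for pattern, replacement in _CONTRACTIONS:
        text = pattern.sub(replacement, text)
    # Capitalize the first letter, skipping leading punctuation like "¡".
    for i, ch in enumerate(text):
        if ch.isalpha():
            return text[:i] + ch.upper() + text[i + 1:]
    return text

print(finish_message("¡golpeas a el orco!"))
```

Capitalizing last matters because earlier rules (omitting the subject, moving the pronoun) can change which word ends up first in the sentence.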

Spanish NetHack handles these rules with special-purpose routines, much as unpatched NetHack does for English. NetHack-i18n (Internationalized NetHack) encodes such rules in two ways:

  • by extending the printf-like syntax to include formatters such as %3${g/handsome/beautiful}, where the number after the % is a parameter number (this is a POSIX extension to printf) and the part between the braces is interpreted by a Ruby script; and
  • by defining "joining rules" at the start and end of each substitution, to handle mandatory contractions and such rules as "a/an".
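
A rough Python sketch of both ideas follows. NetHack-i18n does this in a Ruby script whose details are not described here, so everything below is an assumption for illustration: the Noun class, the reading of "g" as "choose by grammatical gender of the numbered argument", and the crude a/an rule keyed on the first letter.

```python
import re

class Noun:
    """Hypothetical argument type: a phrase plus its gender ("m" or "f")."""
    def __init__(self, text, gender):
        self.text = text
        self.gender = gender

# Matches %N$s and %N${g/masculine/feminine}; N is a 1-based parameter number.
_SPEC = re.compile(r"%(\d+)\$(?:s|\{g/([^/}]*)/([^/}]*)\})")

def expand(template, *args):
    def repl(m):
        arg = args[int(m.group(1)) - 1]
        if m.group(2) is None:          # plain %N$s: insert the argument text
            return arg.text
        # Assumed meaning of "g": pick a word form by the argument's gender.
        return m.group(2) if arg.gender == "m" else m.group(3)
    return _SPEC.sub(repl, template)

def join_articles(text):
    """Simplified joining rule: "a" becomes "an" before a vowel letter."""
    return re.sub(r"\b([Aa]) (?=[aeiouAEIOU])", r"\1n ", text)

print(expand("%1$s is %1${g/handsome/beautiful}.", Noun("The nymph", "f")))
print(join_articles(expand("You see a %1$s.", Noun("orc", "m"))))
```

Numbered parameters let a translation reorder or reuse arguments without touching the calling code, and running the joining rule after substitution means the article can react to whatever word was actually inserted.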

Monster and object names

The build process invokes a program called makedefs, which (among other things) generates two files, include/pm.h and include/onames.h, bearing preprocessor symbols based on the names of the objects and monsters defined in objects.c and monst.c, respectively. Changing the names of the objects will change the symbols, and almost every other part of NetHack will then fail to build.

Spanish NetHack and NetHack-De replace each string in the monster and object tables with a preprocessor symbol, and provide headers to substitute the names for these symbols, so that they can build distinct versions of objects.o and monst.o with the names in English and in the target language. NetHack-i18n, because it has Gettext available, leaves the monster and object tables in English and converts them at run time. A third approach might be to bite the bullet and replace the preprocessor symbols with their translated versions. No known translation takes this approach.
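
To see why translated names break the build, consider a simplified Python imitation of the name-to-symbol step. The exact transformation makedefs performs is not reproduced here; the function symbol_for and its rule (uppercase ASCII letters and digits, everything else becomes an underscore, with a PM_ prefix for monsters) are assumptions that only approximate it.

```python
def symbol_for(name, prefix="PM_"):
    """Approximate a generated preprocessor symbol for a table entry name."""
    out = []
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch.upper())
        else:
            out.append("_")  # spaces, hyphens and non-ASCII letters
    return prefix + "".join(out)

print(symbol_for("giant ant"))        # symbol the rest of the code refers to
print(symbol_for("hormiga gigante"))  # a translated name yields a different symbol
```

Code elsewhere that refers to the English-derived symbol would no longer find it once the table holds the translated name, which is why the translations either keep the table in English or hide the strings behind their own preprocessor symbols.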

Input parsing

The largest problem here is support for wishes. Any translation will have to rewrite readobjnam in objnam.c to parse an object name according to the rules of the target language.

NetHack-i18n first removes the dungeon feature wishes, replacing them with a new extended command, called "dfeature" in the English locale; and then splits the rest into a parser, which is placed in the Ruby script, and a rule-enforcer, which remains in the core code.

Character sets

ASCII is inadequate for most languages other than English. All translations use a larger character set for messages. Case mappings and fuzzy matches for wishes and other inputs must take the character set into account; if the user wishes for "cota de escamas de dragon gris", he should get a gray dragon scale mail, even though the correct spelling is "dragón".
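
One common way to get this kind of fuzzy match, shown here as a Python sketch rather than as the method any of the translations uses, is to compare a folded form of both strings: case-folded, decomposed, and with combining accents removed. The helper name fold is an assumption.

```python
import unicodedata

def fold(text):
    """Case-fold and strip combining accents for tolerant comparison."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

typed = "cota de escamas de dragon gris"
canonical = "cota de escamas de dragón gris"
print(fold(typed) == fold(canonical))
```

This folding also maps "ñ" to "n", which are distinct letters in Spanish, so a real implementation may want to exempt some characters.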

JNetHack uses EUC-JP, with tests in the code to detect whether the source has been converted to Shift-JIS; EUC-JP suits Unix-like environments, and Shift-JIS suits Microsoft Windows.
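
Such a test is possible because the same character has different byte sequences in the two encodings. The Python sketch below illustrates the idea only; JNetHack's actual check is in C and is not described here, and the probe character and the function name detect are assumptions.

```python
PROBE = "漢"  # any character whose bytes differ between the two encodings

def detect(probe_bytes):
    """Guess the encoding of a file from the bytes of a known literal."""
    if probe_bytes == PROBE.encode("euc_jp"):
        return "EUC-JP"
    if probe_bytes == PROBE.encode("shift_jis"):
        return "Shift-JIS"
    return "unknown"

print(detect(PROBE.encode("euc_jp")))
print(detect(PROBE.encode("shift_jis")))
```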

Spanish NetHack encodes all messages in ISO-8859-1, while leaving the map symbols in code page 437. Reduced IBMgraphics modes are available for users who do not have code page 437 configured. Slight hackery is needed to support the different character sets, because map symbols can appear outside the map in three places.

As NetHack-i18n is meant to be indifferent to language, it uses Unicode throughout. Any user input is encoded in Unicode, and user interfaces are expected to support it. The TTY interface is abandoned in favor of a modified Curses interface, and the Curses library must support wide characters.

NetHack-De encodes all messages in ISO 8859-1. Because of this, IBMgraphics doesn't work (it uses a different character set), although DECgraphics does. User wishes are normalized before being parsed, so the user can enter a wish (for example, armor) in ASCII as "ruestung" (German umlauts entered using the German transcription rules), in ISO 8859-1 as "Rüstung", or in UTF-8, whose bytes read as ISO 8859-1 appear as "RÃ¼stung". The UTF-8 case is part of the preliminary UTF-8 support: a UTF-8 capable terminal would show "Rüstung", but the rest of NetHack-De's messages would have broken umlauts.
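
A normalization of this kind might look like the following Python sketch. It is not NetHack-De's code; the function name normalize_wish, the choice to fold everything to the ASCII transcription, and the try-UTF-8-first heuristic are all assumptions.

```python
# German transcription rules for characters outside ASCII.
_TRANSCRIPTION = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def normalize_wish(raw: bytes) -> str:
    """Reduce ASCII, ISO 8859-1 or UTF-8 input to one comparable form."""
    try:
        text = raw.decode("utf-8")       # plain ASCII also decodes here
    except UnicodeDecodeError:
        text = raw.decode("iso-8859-1")  # every byte sequence is valid Latin-1
    text = text.lower()
    for umlaut, ascii_form in _TRANSCRIPTION.items():
        text = text.replace(umlaut, ascii_form)
    return text

for raw in (b"ruestung", "Rüstung".encode("iso-8859-1"), "Rüstung".encode("utf-8")):
    print(normalize_wish(raw))
```

All three inputs reduce to the same string, so the wish parser only has to recognize one spelling.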