• Re: How to add the second (or other) languages

    From David Brown@21:1/5 to pozz on Mon Feb 17 09:57:35 2025
    On 16/02/2025 22:56, pozz wrote:
    On 12/02/2025 20:50, David Brown wrote:
    On 12/02/2025 18:14, Stefan Reuther wrote:
    On 12.02.2025 at 17:26, pozz wrote:
    #if LANGUAGE_ITALIAN
    #  define STRING123            "Evento %d: accensione"
    #elif LANGUAGE_ENGLISH
    #  define STRING123            "Event %d: power up"
    #endif
    [...]
    Another approach is giving the user the possibility to change the
    language at runtime, maybe with an option on the display. In some cases,
    I have enough memory to store all the strings in all languages.

    Put the strings into a structure.

       struct Strings {
           const char* power_up_message;
       };

    I hate global variables, so I pass a pointer to the structure to every
    function that needs it (but of course you can also make a global
    variable).

    Then, on language change, just point your structure pointer elsewhere,
    or load the strings from secondary storage.

    One disadvantage is that this loses you the compiler warnings for
    mismatching printf specifiers.

    I know there are many possible solutions, but I'd like to know some
    suggestions from you. For example, it could be nice if there was some
    tool that automatically extracts all the strings used in the source code
    and helps managing more languages.

    There's packages like gettext. You tag your strings as
    'printf(_("Event %d"), e)', and the 'xgettext' command will extract them
    all into a .po file. Other tools help you manage these files (e.g.
    'msgmerge'; Emacs 'po-mode'), and gcc knows how to do proper printf
    warnings.

    The .po file is a mapping from English to Whateverish strings. So you
    would convert that into some space-efficient resource file, and
    implement the '_' macro/function to perform the mapping. The
    disadvantage is that this takes a lot of memory because your app needs to
    have both the English and the translated strings in memory. But unless
    you also use a fancy preprocessor that translates your code to
    'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
    you might come up with some compile-time hashing...

    I wouldn't use that on a microcontroller, but it's nice for desktop
    apps.


       Stefan


    You don't need a very fancy pre-processor to handle this yourself, if
    you are happy to make a few changes to the code.  Have your code use
    something like :

    #define DisplayPrintf(id, desc, args...) \
         display_printf(strings[language][string_ ## id], ## x)

    What is the final "## x"?

    It's a gcc extension that skips the extra comma if args is empty
    (combined with a typo in my post - "x" should have been "args").

    If you want to stick to standard C, C23 introduced the __VA_OPT__
    feature to handle this in a less convenient manner.




    Use it like :

         DisplayPrintf(event_type_on, "Event on", ev->idx);


    Other problems that came to my mind.

    There are many functions that accept "translatable" strings, not only DisplayPrintf(). Ok, I can write a macro for each of these functions.

    Yes.

    Or write a single macro for the translation, and use that within those functions:

    DisplayPrintf(trans(event_type_on, "Event on"), ev->idx);



    I could have other C instructions that make the task more complex. For example:

    char mymsg[32];
    sprintf(mymsg, "Ciao mondo");
    DisplayPrintf(hello_msg, mymsg);

    A Python preprocessor isn't able to detect where the string to translate is.


    So don't write your code that way.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to pozz on Mon Feb 17 09:59:50 2025
    On 16/02/2025 23:15, pozz wrote:
    On 12/02/2025 18:14, Stefan Reuther wrote:
    On 12.02.2025 at 17:26, pozz wrote:
    #if LANGUAGE_ITALIAN
    #  define STRING123            "Evento %d: accensione"
    #elif LANGUAGE_ENGLISH
    #  define STRING123            "Event %d: power up"
    #endif
    [...]
    Another approach is giving the user the possibility to change the
    language at runtime, maybe with an option on the display. In some cases,
    I have enough memory to store all the strings in all languages.

    Put the strings into a structure.

       struct Strings {
           const char* power_up_message;
       };

    I hate global variables, so I pass a pointer to the structure to every
    function that needs it (but of course you can also make a global
    variable).

    Then, on language change, just point your structure pointer elsewhere,
    or load the strings from secondary storage.

    One disadvantage is that this loses you the compiler warnings for
    mismatching printf specifiers.

    I know there are many possible solutions, but I'd like to know some
    suggestions from you. For example, it could be nice if there was some
    tool that automatically extracts all the strings used in the source code
    and helps managing more languages.

    There's packages like gettext. You tag your strings as
    'printf(_("Event %d"), e)', and the 'xgettext' command will extract them
    all into a .po file. Other tools help you manage these files (e.g.
    'msgmerge'; Emacs 'po-mode'), and gcc knows how to do proper printf
    warnings.

    The .po file is a mapping from English to Whateverish strings. So you
    would convert that into some space-efficient resource file, and
    implement the '_' macro/function to perform the mapping. The
    disadvantage is that this takes a lot of memory because your app needs to
    have both the English and the translated strings in memory. But unless
    you also use a fancy preprocessor that translates your code to
    'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
    you might come up with some compile-time hashing...

    I wouldn't use that on a microcontroller, but it's nice for desktop apps.

    In some projects keeping all the translated strings is not a problem.

    All the gettext tools seem good (xgettext, marking strings to translate
    in the source code, pot file, msginit, msgmerge, msgfmt, po files, mo
    files, ..) except the final step.

    .mo files have to be installed in a file system, and the gettext library automatically loads the correct .mo file from a suitable path. All of this is impractical on microcontroller systems.

    Is it so difficult to import mo files as C const unsigned char arrays
    and implement the gettext() function to search strings from them?


    You know the answer... a little Python script that reads mo files and generates files with C constant arrays. You'd also probably need to
    make a few changes to the gettext language choice functions. (I've used gettext with big Python programs, but never in embedded C code.)
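
As a rough illustration of such a script, the sketch below reads the GNU .mo layout (a magic number, a string count, and two tables of length/offset pairs) and renders the pairs as a const C array. The names read_mo, c_string, emit_c_table and catalog are invented for this example, and plural forms and message contexts are not handled.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number at the start of every GNU .mo file


def read_mo(data):
    """Return (msgid, msgstr) byte-string pairs from the contents of a .mo file."""
    if struct.unpack_from("<I", data, 0)[0] == MO_MAGIC:
        order = "<"  # little-endian file
    elif struct.unpack_from(">I", data, 0)[0] == MO_MAGIC:
        order = ">"  # big-endian file
    else:
        raise ValueError("not a .mo file")
    # Header: magic, revision, string count, offsets of the two string tables.
    _, _, count, orig_tab, trans_tab = struct.unpack_from(order + "5I", data, 0)
    pairs = []
    for i in range(count):
        o_len, o_off = struct.unpack_from(order + "2I", data, orig_tab + 8 * i)
        t_len, t_off = struct.unpack_from(order + "2I", data, trans_tab + 8 * i)
        pairs.append((data[o_off:o_off + o_len], data[t_off:t_off + t_len]))
    return pairs


def c_string(raw):
    """Quote bytes as a C string literal, with octal escapes for non-ASCII bytes."""
    out = []
    for byte in raw:
        char = chr(byte)
        if char in '"\\':
            out.append("\\" + char)
        elif 32 <= byte < 127:
            out.append(char)
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'


def emit_c_table(pairs, name="catalog"):
    """Render the pairs as a const C array; the empty msgid (metadata) is skipped."""
    lines = ["static const struct { const char *msgid; const char *msgstr; } %s[] = {" % name]
    for msgid, msgstr in pairs:
        if msgid:
            lines.append("    { %s, %s }," % (c_string(msgid), c_string(msgstr)))
    lines.append("};")
    return "\n".join(lines)
```

It would be run on the bytes of a compiled catalogue, for example emit_c_table(read_mo(open("it.mo", "rb").read())), where it.mo is a hypothetical file name. As far as I know msgfmt writes the original strings in sorted order, so the C side could probably use a binary search, but that is worth checking against the gettext manual.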

    Another approach could be to rewrite a custom msgfmt tool that converts
    a .po file into a simpler .mo file (or directly a .c file) that can be
    used by a custom gettext() function.
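
The parsing half of such a custom msgfmt could be quite small. A minimal sketch, assuming entries are separated by blank lines and that only plain msgid/msgstr pairs matter (no plurals, no msgctxt); parse_po and the sample text are made up for illustration, and ast.literal_eval only approximates the PO escape rules:

```python
import ast


def parse_po(text):
    """Return {msgid: msgstr} for plain entries; plurals and msgctxt are ignored."""
    catalog = {}
    for block in text.split("\n\n"):  # assumes entries are separated by blank lines
        fields = {"msgid": "", "msgstr": ""}
        field = None
        for line in block.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # blank line or comment
            keyword, _, rest = line.partition(" ")
            if keyword in fields:
                field, line = keyword, rest
            elif not line.startswith('"'):
                field = None  # unsupported keyword such as msgid_plural
                continue
            if field is not None:
                # Quoted pieces use C-like escapes; literal_eval approximates them.
                fields[field] += ast.literal_eval(line)
        # The header entry (empty msgid) and untranslated entries are dropped.
        if fields["msgid"] and fields["msgstr"]:
            catalog[fields["msgid"]] = fields["msgstr"]
    return catalog


sample = r'''
msgid ""
msgstr ""
"Language: it\n"

#: main.c:42
msgid "Event %d: power up"
msgstr "Evento %d: accensione"
'''

catalog = parse_po(sample)
```

The resulting dictionary can then be written out as a C table in whatever layout the custom gettext() lookup expects.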

  • From David Brown@21:1/5 to pozz on Mon Feb 17 09:51:05 2025
    On 16/02/2025 19:59, pozz wrote:
    On 12/02/2025 20:50, David Brown wrote:


    You don't need a very fancy pre-processor to handle this yourself, if
    you are happy to make a few changes to the code.  Have your code use
    something like :

    #define DisplayPrintf(id, desc, args...) \
         display_printf(strings[language][string_ ## id], ## x)

    Use it like :

         DisplayPrintf(event_type_on, "Event on", ev->idx);


    A little Python preprocessor script can chew through all your C files
    and identify each call to "DisplayPrintf".

    Little... yes, it would be little, but not simple, at least for me. How
    to write a correct C preprocessor in Python?

    You don't write a C preprocessor - that's the point.

    Tools like gettext have to handle any C code. That means they need to
    deal with situations with complicated macros, include files, etc.

    You don't need to do that when you make your own tools. You make the
    rules - /you/ decide what limitations you will accept in order to
    simplify the pre-processing script.

    So you would typically decide you only put these DisplayPrintf calls in
    C files, not headers, that you ignore all normal C preprocessor stuff,
    and that you keep each call entirely on one line, and that you'll never
    use the sequence "DisplayPrintf" for anything else. Then your Python preprocessor becomes :

        for line in open(filename).readlines() :
            if "DisplayPrintf" in line :
                handle(line)

    This is /vastly/ simpler than dealing with more general C code, without significant restrictions to you as the programmer using the system.
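
To make that loop a little more concrete, here is a minimal sketch that also pulls out the id and the description. It assumes the rules above (one call per line, a plain identifier as id, a description literal without embedded quotes); CALL_RE and extract_calls are names invented for this example.

```python
import re

# One call per line; the id is a C identifier and the description is a
# string literal without embedded double quotes (assumed house rules).
CALL_RE = re.compile(r'\bDisplayPrintf\s*\(\s*([A-Za-z_]\w*)\s*,\s*"([^"]*)"')


def extract_calls(lines):
    """Return (id, description) pairs for every DisplayPrintf call found."""
    found = []
    for line in lines:
        match = CALL_RE.search(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


sample = [
    'void report(const event *ev) {',
    '    DisplayPrintf(event_type_on, "Event on", ev->idx);',
    '}',
]
calls = extract_calls(sample)
```

Because of the word boundary in the pattern, a longer identifier that merely ends in DisplayPrintf is not picked up.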

    If you /really/ want to handle include files, conditional compilation
    and all the rest of it, get the C compiler to handle that - use "gcc -E" and
    use the output of that. Trying to duplicate that in your own Python
    code would be insane.


    This preprocessor should ingest a C source file after it is preprocessed
    by the standard C preprocessor for the specific build you are doing.

    For example, you could have a C source file that contains:

    #if BUILD == BUILD_FULL
      DisplayPrintf(msg, "Press (1) for simple process, (2) for advanced process");
      x = wait_keypress();
      if (x == '1') do_simple();
      if (x == '2') do_adv();
    #elif BUILD == BUILD_LIGHT
      do_simple();
    #endif


    The really simple answer is, don't do that.


    If I'm building the project as BUILD_FULL, there's at least one
    additional string to translate.

    The slightly more complex answer is that you end up with an extra string
    in one build or the other. Almost certainly, this is not worth
    bothering about. And if it is - say you have a large number of extra
    strings in a debug test build - then I'm sure you can find convenient
    ways to handle that. At a minimum, you'd probably not bother having
    translated versions but fall back to English.


    Another big problem is the Python preprocessor should understand C
    syntax; it shouldn't simply search for DisplayPrintf occurrences.

    Why not?

    For example:

    /* DisplayPrintf(old_string, "This is an old message"); */
    DisplayPrintf(new_string, "This is a new message");

    Of course, only one string is present in the source file, but it's not
    simple to extract it.


    It's extremely simple to extract it. Remember - /you/ make the rules.
    If you don't want to bother skipping such commented-out lines, /you/
    pick a convenient way to do so. For example, you would decide that the
    opening comment token must be at the start of the white-space stripped
    line :

        if line.strip().startswith("/*") :
            return False

        if line.strip().startswith("//") :
            return False

    (I've been talking about Python here, because that's the language I use
    for such tools, and it's a very common choice. If you are not familiar
    with Python then you can obviously use any other language you like.)


    Or alternatively, have :

    #define XDisplayPrintf(...)

    And now your commenting system becomes :

    XDisplayPrintf(old_string, "This is an old message");
    DisplayPrintf(new_string, "This is a new message");

    The "XDisplayPrintf" can be inside comments or conditionally uncompiled
    code if you like. (You do have to filter out XDisplayPrintf bits from
    the earlier check for DisplayPrintf.)
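
Both conventions fit in one small predicate. A sketch under the same self-imposed rules (a comment token, if any, starts the stripped line); is_live_call and LIVE_CALL_RE are hypothetical names:

```python
import re

# Match DisplayPrintf( only when it is not the tail of a longer identifier,
# so XDisplayPrintf(...) is ignored.
LIVE_CALL_RE = re.compile(r'(?<![A-Za-z0-9_])DisplayPrintf\s*\(')


def is_live_call(line):
    """True if the line holds a DisplayPrintf call that should be extracted."""
    stripped = line.strip()
    # Rule: a commented-out call has its comment token at the start of the line.
    if stripped.startswith("/*") or stripped.startswith("//"):
        return False
    return LIVE_CALL_RE.search(stripped) is not None
```

Note that a line which opens with a comment and then carries a live call is rejected as a whole; that is a consequence of the chosen rule, not something the script tries to be clever about.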


    Thanks for the suggestion, the idea is great. However I'm not able to
    write a Python preprocessor that works well.


    Sure you can. You just have to redefine what you mean by "works well"
    to suit what you can write :-)


    For my own use, I probably wouldn't even bother handling commented-out
    strings. I have used this kind of technique for message translation and
    a variety of other situations.


    For more fun, you could switch to modern C++ and use user-defined
    literals combined with constexpr template variables to put together a
    system that is all within the one source language and is fully checked
    at compile-time. I'm not sure it would be clearer, however!

  • From David Brown@21:1/5 to pozz on Mon Feb 17 19:09:12 2025
    On 17/02/2025 16:05, pozz wrote:
    On 17/02/2025 09:51, David Brown wrote:
    On 16/02/2025 19:59, pozz wrote:
    On 12/02/2025 20:50, David Brown wrote:


    You don't need a very fancy pre-processor to handle this yourself,
    if you are happy to make a few changes to the code.  Have your code
    use something like :

    #define DisplayPrintf(id, desc, args...) \
         display_printf(strings[language][string_ ## id], ## x)

    Use it like :

         DisplayPrintf(event_type_on, "Event on", ev->idx);


    A little Python preprocessor script can chew through all your C
    files and identify each call to "DisplayPrintf".

    Little... yes, it would be little, but not simple, at least for me.
    How to write a correct C preprocessor in Python?

    You don't write a C preprocessor - that's the point.

    Tools like gettext have to handle any C code.  That means they need to
    deal with situations with complicated macros, include files, etc.

    You don't need to do that when you make your own tools.  You make the
    rules - /you/ decide what limitations you will accept in order to
    simplify the pre-processing script.

    So you would typically decide you only put these DisplayPrintf calls
    in C files, not headers, that you ignore all normal C preprocessor
    stuff, and that you keep each call entirely on one line, and that
    you'll never use the sequence "DisplayPrintf" for anything else.  Then
    your Python preprocessor becomes :

         for line in open(filename).readlines() :
             if "DisplayPrintf" in line :
                 handle(line)

    This is /vastly/ simpler than dealing with more general C code,
    without significant restrictions to you as the programmer using the
    system.

    If you /really/ want to handle include files, conditional compilation
    and all the rest of it, get the C compiler to handle that - use "gcc -E"
    and use the output of that.  Trying to duplicate that in your own
    Python code would be insane.

    And this is the reason why it appeared to me a complex task :-)

    You're right, this is my own tool and I decide the rules. Many times I
    try to solve the complete and general problem when, in reality, the
    scope of the problem is much smaller.

    The only drawback is that YOU (and all the developers that work on the
    project now and in the future) have to remember your own rules forever
    for that project.

    This is embedded development. It is not always easy or straightforward.
    When a problem seems difficult, re-arrange it or subdivide it into
    things that you /can/ solve. Here I've given one solution (of many
    possible solutions) - it makes some things easier, but also requires
    other changes. You can use a big, general solution like gettext and
    document how that should work in your development, or you can make a
    much smaller and simpler, but more limited, custom solution and document /that/. There are /always/ pros and cons, tradeoffs and balances in
    this game.



    This preprocessor should ingest a C source file after it is
    preprocessed by the standard C preprocessor for the specific build
    you are doing.

    For example, you could have a C source file that contains:

    #if BUILD == BUILD_FULL
       DisplayPrintf(msg, "Press (1) for simple process, (2) for advanced process");
       x = wait_keypress();
       if (x == '1') do_simple();
       if (x == '2') do_adv();
    #elif BUILD == BUILD_LIGHT
       do_simple();
    #endif


    The really simple answer is, don't do that.


    If I'm building the project as BUILD_FULL, there's at least one
    additional string to translate.

    The slightly more complex answer is that you end up with an extra
    string in one build or the other.  Almost certainly, this is not worth
    bothering about.

    Oh yes, but that was only an example. We can think of other scenarios
    where the preprocessor could change the string depending on the build.


    As the saying goes, you can burn that bridge when you come to it.
    Imagining all the possible ways things can go wrong or be complicated
    can be a lot more effort than getting a solution for the actual
    practical situation.


    I am not guaranteeing that my ideas here will be ideal for your needs.
    But it is roughly in the direction of a system that I have used
    successfully myself, and it's where I would start out in the situation
    you described. Hopefully it gives you a good starting point for your
    own solution - or at least something to compare to other potential
    solutions when judging them.

  • From Stefan Reuther@21:1/5 to All on Mon Feb 17 19:00:43 2025
    On 16.02.2025 at 23:15, pozz wrote:
    Another approach could be to rewrite a custom msgfmt tool that converts
    a .po file into a simpler .mo file (or directly a .c file) that can be
    used by a custom gettext() function.

    That's precisely what I tried to suggest (and personally use).


    Stefan

  • From Stefan Reuther@21:1/5 to All on Wed Feb 12 18:14:26 2025
    On 12.02.2025 at 17:26, pozz wrote:
    #if LANGUAGE_ITALIAN
    #  define STRING123            "Evento %d: accensione"
    #elif LANGUAGE_ENGLISH
    #  define STRING123            "Event %d: power up"
    #endif
    [...]
    Another approach is giving the user the possibility to change the
    language at runtime, maybe with an option on the display. In some cases,
    I have enough memory to store all the strings in all languages.

    Put the strings into a structure.

       struct Strings {
           const char* power_up_message;
       };

    I hate global variables, so I pass a pointer to the structure to every
    function that needs it (but of course you can also make a global variable).

    Then, on language change, just point your structure pointer elsewhere,
    or load the strings from secondary storage.

    One disadvantage is that this loses you the compiler warnings for
    mismatching printf specifiers.
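
Part of that checking can be moved into the build tooling: compare the conversion specifiers of each translation with those of the reference string. A minimal sketch, with invented names (SPEC_RE, conversions, same_conversions); it compares only the length modifier and conversion letter, and does not handle POSIX positional arguments such as %1$d or '*' widths.

```python
import re

# A printf conversion: optional flags, width, precision, length modifier,
# then the conversion letter. "%%" is matched too so it can be skipped.
SPEC_RE = re.compile(
    r"%(?:%|[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*)?)?"
    r"((?:hh|h|ll|l|j|z|t|L)?[diouxXeEfFgGaAcspn]))"
)


def conversions(fmt):
    """Return the length modifier plus conversion letter of each specifier."""
    return [m.group(1) for m in SPEC_RE.finditer(fmt) if m.group(1)]


def same_conversions(reference, translation):
    """True if both strings expect the same argument types in the same order."""
    return conversions(reference) == conversions(translation)
```

A script that builds the string tables could call same_conversions for every translated entry and refuse to generate output on a mismatch.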

    I know there are many possible solutions, but I'd like to know some suggestions from you. For example, it could be nice if there was some
    tool that automatically extracts all the strings used in the source code
    and helps managing more languages.

    There's packages like gettext. You tag your strings as
    'printf(_("Event %d"), e)', and the 'xgettext' command will extract them
    all into a .po file. Other tools help you manage these files (e.g.
    'msgmerge'; Emacs 'po-mode'), and gcc knows how to do proper printf
    warnings.

    The .po file is a mapping from English to Whateverish strings. So you
    would convert that into some space-efficient resource file, and
    implement the '_' macro/function to perform the mapping. The
    disadvantage is that this takes a lot of memory because your app needs to
    have both the English and the translated strings in memory. But unless
    you also use a fancy preprocessor that translates your code to 'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
    you might come up with some compile-time hashing...

    I wouldn't use that on a microcontroller, but it's nice for desktop apps.


    Stefan

  • From David Brown@21:1/5 to Stefan Reuther on Wed Feb 12 20:50:18 2025
    On 12/02/2025 18:14, Stefan Reuther wrote:
    On 12.02.2025 at 17:26, pozz wrote:
    #if LANGUAGE_ITALIAN
    #  define STRING123            "Evento %d: accensione"
    #elif LANGUAGE_ENGLISH
    #  define STRING123            "Event %d: power up"
    #endif
    [...]
    Another approach is giving the user the possibility to change the
    language at runtime, maybe with an option on the display. In some cases,
    I have enough memory to store all the strings in all languages.

    Put the strings into a structure.

       struct Strings {
           const char* power_up_message;
       };

    I hate global variables, so I pass a pointer to the structure to every function that needs it (but of course you can also make a global variable).

    Then, on language change, just point your structure pointer elsewhere,
    or load the strings from secondary storage.

    One disadvantage is that this loses you the compiler warnings for
    mismatching printf specifiers.

    I know there are many possible solutions, but I'd like to know some
    suggestions from you. For example, it could be nice if there was some
    tool that automatically extracts all the strings used in the source code
    and helps managing more languages.

    There's packages like gettext. You tag your strings as
    'printf(_("Event %d"), e)', and the 'xgettext' command will extract them
    all into a .po file. Other tools help you manage these files (e.g. 'msgmerge'; Emacs 'po-mode'), and gcc knows how to do proper printf
    warnings.

    The .po file is a mapping from English to Whateverish strings. So you
    would convert that into some space-efficient resource file, and
    implement the '_' macro/function to perform the mapping. The
    disadvantage is that this takes a lot of memory because your app needs to
    have both the English and the translated strings in memory. But unless
    you also use a fancy preprocessor that translates your code to 'printf(getstring(STR123), e)', I don't see how to avoid that. In C++20,
    you might come up with some compile-time hashing...

    I wouldn't use that on a microcontroller, but it's nice for desktop apps.


    Stefan


    You don't need a very fancy pre-processor to handle this yourself, if
    you are happy to make a few changes to the code. Have your code use
    something like :

    #define DisplayPrintf(id, desc, args...) \
    display_printf(strings[language][string_ ## id], ## x)

    Use it like :

    DisplayPrintf(event_type_on, "Event on", ev->idx);


    A little Python preprocessor script can chew through all your C files
    and identify each call to "DisplayPrintf". It can collect together all
    the id's and generate a header with something like :

    typedef enum {
        string_event_type_on, ...
    } string_index;
    enum { no_of_strings = ... };

    typedef enum {
        lang_English, lang_Italian, ...
    } language_index;
    enum { no_of_languages = ... };

    extern language_index language; // global var :-)
    extern const char* strings[no_of_languages][no_of_strings];

    Then it will have a C file :

    #include "language.h"

    language_index language;
    const char* strings[no_of_languages][no_of_strings] = {
        { // English
            "Event %d: power up", // Event on
            ...
        },
        { // Italian
            "Evento %d: accensione", // Event on
        }
    };
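
The generating side of the script might look roughly like this. It is only a sketch: generate_header, generate_source and c_literal are hypothetical names, the first language in the dictionary is assumed to be the fallback for missing translations, and non-ASCII text is written into the C source unchanged.

```python
def c_literal(text):
    """Quote text as a C string literal (backslashes and quotes escaped)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def generate_header(ids, languages):
    """Build the contents of language.h from the collected ids and languages."""
    out = ["typedef enum {"]
    out += ["    string_%s," % string_id for string_id in ids]
    out += ["} string_index;", "enum { no_of_strings = %d };" % len(ids), ""]
    out += ["typedef enum {"]
    out += ["    lang_%s," % name for name in languages]
    out += ["} language_index;", "enum { no_of_languages = %d };" % len(languages), ""]
    out += ["extern language_index language;",
            "extern const char* strings[no_of_languages][no_of_strings];"]
    return "\n".join(out) + "\n"


def generate_source(ids, tables):
    """Build the C file; tables maps language name -> {id: text}."""
    fallback = next(iter(tables.values()))  # first language doubles as fallback
    out = ['#include "language.h"', "", "language_index language;",
           "const char* strings[no_of_languages][no_of_strings] = {"]
    for name, table in tables.items():
        out.append("    { // %s" % name)
        for string_id in ids:
            text = table.get(string_id, fallback[string_id])
            out.append("        %s, // %s" % (c_literal(text), string_id))
        out.append("    },")
    out.append("};")
    return "\n".join(out) + "\n"


ids = ["event_type_on"]
tables = {
    "English": {"event_type_on": "Event %d: power up"},
    "Italian": {"event_type_on": "Evento %d: accensione"},
}
header = generate_header(ids, list(tables))
source = generate_source(ids, tables)
```

Emitting the ids in the same order in both files is what keeps the enum values and the table rows in step.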

    It would generate the strings based on language files:

    # english.txt
    event_type_on : Event %d: power up
    ...
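
Reading such a language file takes only a few lines. A minimal sketch, assuming '#' starts a comment line and that the id ends at the first colon (the text itself may contain more colons); parse_language_file is an invented name:

```python
def parse_language_file(lines):
    """Return {id: text} from lines of the form 'id : text'."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment such as '# english.txt'
        # Split at the FIRST colon only, since the text may contain colons.
        string_id, sep, text = line.partition(":")
        if not sep:
            raise ValueError("missing ':' in line: " + line)
        table[string_id.strip()] = text.strip()
    return table


english = parse_language_file([
    "# english.txt",
    "event_type_on : Event %d: power up",
])
```

Stripping the text means messages cannot start or end with spaces; if that matters, the file format would need quoting of some kind.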

    If the preprocessor finds a use of DisplayPrintf where the id (which can
    be as long or short as you want, but can't have spaces or awkward
    characters) does not match the description, it should give an error -
    duplicate uses of the same pair are skipped. (You could just use an id
    and no description if you prefer.)
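
That id/description check might be sketched as follows; check_pairs and the tuple layout (id, description, location) are assumptions made for the example.

```python
def check_pairs(calls):
    """calls: iterable of (string_id, description, location) tuples.

    Returns the first description seen for each id plus a list of error
    messages for ids that reappear with a different description.
    """
    first_seen = {}
    errors = []
    for string_id, description, location in calls:
        if string_id not in first_seen:
            first_seen[string_id] = (description, location)
        elif first_seen[string_id][0] != description:
            errors.append('%s: id "%s" has description "%s", but "%s" at %s' % (
                location, string_id, description,
                first_seen[string_id][0], first_seen[string_id][1]))
        # An identical (id, description) pair is a harmless duplicate: skipped.
    return first_seen, errors


calls = [
    ("event_type_on", "Event on", "main.c:10"),
    ("event_type_on", "Event on", "ui.c:7"),
    ("event_type_on", "Event off", "ui.c:9"),
]
first_seen, errors = check_pairs(calls)
```

The build would normally stop if the error list is not empty.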

    Any ids that are not in the language files will be printed out or put in
    a file, ids that are in the language files but not used in the program
    will give warnings, etc.
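
Those two reports are plain set differences. A small sketch, with compare_ids as a made-up name and the language tables represented as {language: {id: text}}:

```python
def compare_ids(used_ids, language_tables):
    """Report, per language, ids missing from the file and ids never used."""
    used = set(used_ids)
    report = {}
    for language, table in language_tables.items():
        report[language] = {
            "missing": sorted(used - set(table)),  # used in code, no translation
            "unused": sorted(set(table) - used),   # translated, never used
        }
    return report


report = compare_ids(
    ["event_type_on", "event_type_off"],
    {"english": {"event_type_on": "Event %d: power up",
                 "old_banner": "Welcome"}},
)
```

Missing ids would typically be written to a file for the translator, while unused ones only deserve a warning.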

    It can all be done in a manner that makes it easy to get right, hard to
    get wrong, and will not cause trouble as strings are added or removed.

    It would be a lot simpler than gettext, and use minimal runtime space
    and time. And it should be straightforward to change if you want to
    have string tables stored externally or something like that. (I've made systems with string tables in an external serial eprom, for example.)

  • From Niocláiſín@21:1/5 to All on Thu Feb 13 22:51:10 2025
    Pozz wrote:
    "Another approach is giving the user the possibility to change the
    language at runtime, maybe with an option on the display."

    Hello!

    This is a good idea.
