• Re: tcl versa python regarding performance

    From aotto1968@21:1/5 to All on Mon Aug 19 14:26:39 2024
    On 19.08.24 00:29, et99 wrote:
    On 8/18/2024 1:42 PM, aotto1968 wrote:

    I added some documentation regarding the performance testing:
    http://thedev.nhi1.de/theLink/main/md_docs_2main_2README__PERFORMANCE.htm


    I recently wrote some C code using Visual Studio 2022 and they have a wonderful performance profiler. I was able to determine
    that 80% of the cost of the module I was developing was caused by calls to some library routines I was using. By writing my own
    versions that didn't need to be so generalized, I got that down to 10%.

    One problem was that once I turned on the compiler optimization, the profiler became pretty much worthless to measure my own
    code's performance, so I couldn't get that 10% any lower.

    But it would be kinda cool to try using those VS tools on the tcl source code, but I don't know of any way to build tcl inside
    VS where one could use those tools.


    With the callgrind tool on Linux you can analyze any kind of executable, even executables without symbols compiled in.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From aotto1968@21:1/5 to All on Mon Aug 26 19:58:09 2024
    Now it's that time again and I've put a small C++ project in between to slowly but surely make better use of the "kernel" via
    the C++ compiler (and some features like templates etc.).

    http://thedev.nhi1.de/NHI1/main/index.htm

    The performance code has been revised again and the unnecessary TCP tests are placed behind the UDS tests. With the C++ "agile"
    kernel, C++ is now on a par with C, while using a much more user-friendly programming interface.

    http://thedev.nhi1.de/theLink/main/md_docs_2main_2README__PERFORMANCE.htm#README_PERFORMANCE

    Also important: the ALC (All-Language Compiler) was written in TCL.

  • From aotto1968@21:1/5 to All on Tue Aug 13 22:40:04 2024
    Hi, some (unproven) statistics from my software comparing the performance of TCL versus PYTHON:

    The --send* tests ... send packages
    The --parent/--child tests ... measure startup time
    The other tests ... build data structures

    Except for --parent (startup), TCL is slower than PYTHON → I think the CORE problem is the OO implementation in TCL

    → the basic technology for both TCL and PYTHON is an OO wrapper around the same C library. This means the BASIC workload
    for TCL & PYTHON is the same and the TIME difference is just the TCL/PYTHON overhead

    TCL
    ===

    setup=release
    > feature=tcl_pipe
    > .../release/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --all @ $TCLSH
    .../release/inst/sbin/x86_64-suse-linux-gnu-perfserver-tcl.tcl
    :PerfClientExec }: start ------------------------ : result [ count / sec ]
    :statistics }: --send : 216004.5 [ 432206 / 2.000912 ]
    :statistics }: --send-string : 224700.4 [ 449614 / 2.000949 ]
    :statistics }: --send-and-callback : 121014.5 [ 242218 / 2.001561 ]
    :statistics }: --send-and-wait : 58694.0 [ 117389 / 2.000015 ]
    :statistics }: --send-persistent : 13358.7 [ 26718 / 2.000046 ]
    :statistics }: --parent : 82.5 [ 165 / 2.000311 ]
    :statistics }: --child : 21321.3 [ 42643 / 2.000022 ]
    :statistics }: --bus : 40133.7 [ 80268 / 2.000017 ]
    :statistics }: --bfl : 41333.4 [ 82667 / 2.000005 ]
    :statistics }: --bin : 264818.7 [ 529687 / 2.000187 ]
    :statistics }: --str : 265383.5 [ 530867 / 2.000377 ]
    :PerfClientExec }: end: ----------------------------------------

    PYTHON
    ======

    setup=release
    > feature=py_pipe
    > .../release/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --all @ $PYTHON
    .../release/inst/sbin/x86_64-suse-linux-gnu-perfserver-py.py
    :PerfClientExec }: start ------------------------ : result [ count / sec ]
    :statistics }: --send : 292415.7 [ 584872 / 2.000139 ]
    :statistics }: --send-string : 294627.4 [ 589494 / 2.000812 ]
    :statistics }: --send-and-callback : 154291.6 [ 308617 / 2.000219 ]
    :statistics }: --send-and-wait : 73577.8 [ 147156 / 2.000005 ]
    :statistics }: --send-persistent : 13796.6 [ 27594 / 2.000058 ]
    :statistics }: --parent : 71.0 [ 142 / 2.000464 ]
    :statistics }: --child : 21959.2 [ 43919 / 2.000024 ]
    :statistics }: --bus : 67991.3 [ 135983 / 2.000005 ]
    :statistics }: --bfl : 65537.4 [ 131075 / 2.000004 ]
    :statistics }: --bin : 326737.6 [ 653632 / 2.000480 ]
    :statistics }: --str : 327746.7 [ 655625 / 2.000402 ]
    :PerfClientExec }: end: ----------------------------------------


    This is the reference in C only
    ===============================

    setup=release
    > feature=c_pipe
    > .../release/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --all @
    .../release/inst/sbin/x86_64-suse-linux-gnu-perfserver-c
    :PerfClientExec }: start ------------------------ : result [ count / sec ]
    :statistics }: --send : 372049.5 [ 744214 / 2.000309 ]
    :statistics }: --send-string : 388248.2 [ 776648 / 2.000390 ]
    :statistics }: --send-and-callback : 221224.1 [ 442541 / 2.000420 ]
    :statistics }: --send-and-wait : 87320.3 [ 174641 / 2.000004 ]
    :statistics }: --send-persistent : 15245.3 [ 30491 / 2.000024 ]
    :statistics }: --parent : 552.4 [ 1105 / 2.000235 ]
    :statistics }: --child : 35888.9 [ 71778 / 2.000004 ]
    :statistics }: --bus : 80124.6 [ 160250 / 2.000011 ]
    :statistics }: --bfl : 86655.9 [ 173312 / 2.000003 ]
    :statistics }: --bin : 400718.2 [ 801590 / 2.000383 ]
    :statistics }: --str : 396122.8 [ 792369 / 2.000312 ]
    :PerfClientExec }: end: ----------------------------------------
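
    The three "--send" rates above can be put side by side by expressing each one as a fraction of the C reference. This is only a sketch of the arithmetic on the posted numbers; the names send_rate and relative_to are made up for illustration and are not part of the benchmark tool.

```python
# Rates (results per second) copied from the "--send" rows of the three runs above.
send_rate = {"C": 372049.5, "Python": 292415.7, "Tcl": 216004.5}


def relative_to(reference, rates):
    """Return each rate as a fraction of the reference rate."""
    ref = rates[reference]
    return {name: rate / ref for name, rate in rates.items()}


ratios = relative_to("C", send_rate)
for name, ratio in ratios.items():
    print(f"{name:6s} {ratio:6.1%} of the C reference")
```

    On these numbers PYTHON reaches roughly 79% and TCL roughly 58% of the C rate for --send. A single 2-second run per language says nothing about run-to-run variance.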

  • From undroidwish@21:1/5 to All on Tue Aug 13 22:51:19 2024
    On 8/13/24 22:40, aotto1968 wrote:

    Hi, some (unproven) statistics from my SW regarding the performance TCL
              ^^^^^^^^

    Exactly. I tend to go even further and add the attribute useless to
    unproven as long as you publish some numbers with some subjective
    analysis without presenting the implementation and the measurement
    method.

  • From aotto1968@21:1/5 to undroidwish on Wed Aug 14 11:02:53 2024
    On 13.08.24 22:51, undroidwish wrote:
    On 8/13/24 22:40, aotto1968 wrote:

    Hi, some (unproven) statistics from my SW regarding the performance TCL
              ^^^^^^^^

    Exactly. I tend to go even further and add the attribute useless to
    unproven as long as you publish some numbers with some subjective
    analysis without presenting the implementation and the measurement
    method.

    Not really. With "aggressive" optimization TCL is doing better, but it still does not come close to PYTHON:

    setup=release
    > feature=cc_pipe
    > .../release/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @
    .../release/inst/sbin/x86_64-suse-linux-gnu-perfserver-cc
    : start ------------------------ : result [ count / sec ]
    : --send : 387331.8 [ 774702 / 2.000099 ]
    : end: ----------------------------------------
    > feature=c_pipe
    > .../release/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @
    .../release/inst/sbin/x86_64-suse-linux-gnu-perfserver-c
    : start ------------------------ : result [ count / sec ]
    : --send : 390295.8 [ 780623 / 2.000081 ]
    : end: ----------------------------------------
    > feature=py_pipe
    > .../release/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @ $PYTHON
    .../release/inst/sbin/x86_64-suse-linux-gnu-perfserver-py.py
    : start ------------------------ : result [ count / sec ]
    : --send : 284371.7 [ 568775 / 2.000111 ]
    : end: ----------------------------------------
    > feature=tcl_pipe
    > .../release/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @ $TCLSH
    .../release/inst/sbin/x86_64-suse-linux-gnu-perfserver-tcl.tcl
    : start ------------------------ : result [ count / sec ]
    : --send : 215990.2 [ 432027 / 2.000216 ]
    : end: ----------------------------------------

    setup=aggressive
    > feature=cc_pipe
    > .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @
    .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfserver-cc
    : start ------------------------ : result [ count / sec ]
    : --send : 398688.2 [ 797433 / 2.000142 ]
    : end: ----------------------------------------
    > feature=c_pipe
    > .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @
    .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfserver-c
    : start ------------------------ : result [ count / sec ]
    : --send : 401113.0 [ 802377 / 2.000376 ]
    : end: ----------------------------------------
    > feature=py_pipe
    > .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @ $PYTHON
    .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfserver-py.py
    : start ------------------------ : result [ count / sec ]
    : --send : 286609.9 [ 573378 / 2.000552 ]
    : end: ----------------------------------------
    > feature=tcl_pipe
    > .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfclient --timeout 2 --send @ $TCLSH
    .../aggressive/inst/sbin/x86_64-suse-linux-gnu-perfserver-tcl.tcl
    : start ------------------------ : result [ count / sec ]
    : --send : 237457.9 [ 475001 / 2.000359 ]
    : end: ----------------------------------------

  • From aotto1968@21:1/5 to All on Wed Aug 14 11:16:03 2024
    The first analysis is quite simple:

    Right now PYTHON does NOT support threads in NHI1 (this will change soon) and TCL does…
    This influences the "release" build, because there NHI1 is built without threads for PYTHON and with
    threads for TCL.

    → the difference is that the thread-local storage is a STATIC REFERENCE in PYTHON and a POINTER in TCL.

    → the "aggressive" build does NOT use threads at all, so the difference between PYTHON and TCL is more comparable,
    but it is still ~20%
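
    The "~20%" can be checked against the --send rates posted earlier in this thread for the release and the aggressive build. A small sketch of that arithmetic; the function name percent_slower is hypothetical, and the gap is measured relative to the PYTHON rate (measuring relative to the TCL rate gives a larger figure).

```python
def percent_slower(fast, slow):
    """How much lower 'slow' is than 'fast', in percent of 'fast'."""
    return (fast - slow) / fast * 100.0


# --send rates (results per second) from the release and aggressive runs.
release = percent_slower(fast=284371.7, slow=215990.2)      # PYTHON vs TCL
aggressive = percent_slower(fast=286609.9, slow=237457.9)   # PYTHON vs TCL

print(f"release:    TCL is {release:.1f}% behind PYTHON")
print(f"aggressive: TCL is {aggressive:.1f}% behind PYTHON")
```

    With these inputs the gap is about 24% for the release build and about 17% for the aggressive build.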

  • From aotto1968@21:1/5 to All on Wed Sep 11 22:05:00 2024
    Quick update: I removed "--enable-symbols" from the TCL configuration in the release build and the performance has improved
    significantly, but it is still ~25% behind PYTHON and JAVA. Apparently "--enable-symbols" has a significant impact on performance
    in TCL, which is good to know because TCL is often shipped with "--enable-symbols" so that errors can be analyzed more easily
    later in production.

    x86_64-suse-linux-gnu     |     send     send     send     send    create   create     data     data
    2024-09-11 21:54:55       |  NOTHING      END CALLBACK     WAIT    PARENT    CHILD      BUS      BFL
    ------------------------- | -------- -------- -------- -------- --------- -------- -------- --------

    pipe:
    R: C                      |   530275   400403   222971    90707      3859    37965    89818    81581
    R: C++                    |   528852   396473   219499    89816      2470    36635    89619    89994
    R: Python                 |   492501   304463   159169    73570       101    22109    68875    66767
    R: Tcl                    |   402439   236730   127712    59048       133    24002    44337    43762
    R: Java                   |   474683   313162   170157    79324        69    19772    72242    72031

    uds_fork:
    R: C                      |   524971   396466   224320    89966     11814    38479    83146    90475
    R: C++                    |   520668   390057   216610    89014      8712    36018    89073    89726
    R: Python                 |   494415   314163   160014    75356       344    22197    69082    67205
    R: Tcl                    |      na.      na.      na.      na.       na.      na.      na.      na.
    R: Java                   |      na.      na.      na.      na.       na.      na.      na.      na.

    uds_thread:
    R: C                      |   504425   375809   212943    88359     32173    37978    87952    88606
    R: C++                    |   494135   365464   205933    88317     31582    35011    88070    73875
    R: Python                 |      na.      na.      na.      na.       na.      na.      na.      na.
    R: Tcl                    |   390334   224660   123795    63402       139    24137    44490    38410
    R: Java                   |   463538   309542   161559    79059     19282    19779    72112    71591

    uds_spawn:
    R: C                      |   530643   399273   228727    90660      3795    37906    89667    90450
    R: C++                    |   522584   389941   218076    89381      2473    36351    88390    89105
    R: Python                 |   494427   312230   137856    70085       101    22259    68911    66719
    R: Tcl                    |   402035   234600   126871    63312       134    22768    41101    39604
    R: Java                   |   475697   306405   161973    79067        68    18112    67439    66953
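
    To see what removing "--enable-symbols" bought, the Tcl "pipe" row of this run can be compared with the Tcl "pipe" row of the 21:27:41 run posted in this thread. The sketch below assumes that the 21:27:41 run was the build with symbols enabled (the posts suggest this but do not state it), and all variable names are made up for illustration.

```python
columns = ["NOTHING", "END", "CALLBACK", "WAIT", "PARENT", "CHILD", "BUS", "BFL"]

# "pipe" rows copied from the two result tables in this thread.
tcl_before = [306202, 144504, 78180, 49443, 81, 18337, 25233, 25242]    # 21:27:41 run (assumed: with symbols)
tcl_after = [402439, 236730, 127712, 59048, 133, 24002, 44337, 43762]   # 21:54:55 run (without symbols)
python_row = [492501, 304463, 159169, 73570, 101, 22109, 68875, 66767]  # identical in both runs

# Speedup of Tcl between the two runs, and the remaining gap to Python.
speedup = {c: after / before for c, before, after in zip(columns, tcl_before, tcl_after)}
behind = {c: 1 - tcl / py for c, tcl, py in zip(columns, tcl_after, python_row)}

for c in columns:
    print(f"{c:8s} speedup x{speedup[c]:.2f}   behind Python {behind[c]:+.1%}")
```

    A negative "behind" value means TCL is ahead of PYTHON in that column, which on these numbers is the case for PARENT and CHILD. For the plain send (NOTHING) the speedup is about 1.31x and the remaining gap to PYTHON is about 18%; for END the gap is about 22%.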

  • From aotto1968@21:1/5 to All on Wed Sep 11 21:40:32 2024
    The really impressive thing about the result of the new JAVA performance test is that PYTHON (a scripting language) is on the
    SAME performance level as JAVA (a compiled language). PYTHON has obviously invested heavily in performance optimization.

    x86_64-suse-linux-gnu     |     send     send     send     send    create   create     data     data
    2024-09-11 21:27:41       |  NOTHING      END CALLBACK     WAIT    PARENT    CHILD      BUS      BFL
    ------------------------- | -------- -------- -------- -------- --------- -------- -------- --------

    pipe:
    R: C                      |   530275   400403   222971    90707      3859    37965    89818    81581
    R: C++                    |   528852   396473   219499    89816      2470    36635    89619    89994
    R: Python                 |   492501   304463   159169    73570       101    22109    68875    66767
    R: Tcl                    |   306202   144504    78180    49443        81    18337    25233    25242
    R: Java                   |   474683   313162   170157    79324        69    19772    72242    72031

    uds_fork:
    R: C                      |   524971   396466   224320    89966     11814    38479    83146    90475
    R: C++                    |   520668   390057   216610    89014      8712    36018    89073    89726
    R: Python                 |   494415   314163   160014    75356       344    22197    69082    67205
    R: Tcl                    |      na.      na.      na.      na.       na.      na.      na.      na.
    R: Java                   |      na.      na.      na.      na.       na.      na.      na.      na.

    uds_thread:
    R: C                      |   504425   375809   212943    88359     32173    37978    87952    88606
    R: C++                    |   494135   365464   205933    88317     31582    35011    88070    73875
    R: Python                 |      na.      na.      na.      na.       na.      na.      na.      na.
    R: Tcl                    |   296177   139328    75300    47983        82    18166    24879    24783
    R: Java                   |   463538   309542   161559    79059     19282    19779    72112    71591

    uds_spawn:
    R: C                      |   530643   399273   228727    90660      3795    37906    89667    90450
    R: C++                    |   522584   389941   218076    89381      2473    36351    88390    89105
    R: Python                 |   494427   312230   137856    70085       101    22259    68911    66719
    R: Tcl                    |   315789   147933    79493    49956        79    17944    23599    25445
    R: Java                   |   475697   306405   161973    79067        68    18112    67439    66953

  • From aotto1968@21:1/5 to Gerald Lester on Thu Aug 15 15:04:10 2024
    On 14.08.24 14:04, Gerald Lester wrote:
    On 8/14/24 04:16, aotto1968 wrote:
    the first analyses is quite simple:

    right now python does NOT support threads in NHI1 (will change soon) and tcl does…
    this has an influence on the "release" build because this is NHI1 without threads in python and with
    threads in tcl.

    → the difference is that the thread-local-storage is an STATIC REFERENCE in python and a POINTER in tcl.

    → the "aggressive" build does NOT use threads at all and the change between python and tcl is more compare-able
    but is still ~20%


    I think the point that androwish was making is that, without seeing the code, we cannot tell whether you did something in a way
    that takes more time than doing it in a slightly different way.


    I use kcachegrind to analyze the performance, but it is a lot of "small" points that add up to the ~20% loss against PYTHON.

    I cannot post a "picture" because the newsgroup does NOT accept pictures …
    Most of the code is in the TCL C API, for example:

    Example: at the end of my "ServiceCall" function I use:

    if (ret == TCL_OK) {
      /* clear the interpreter result before returning to the caller */
      Tcl_ResetResult(interp);
      return MkErrorGetCode_0E();
    }

    and this simple "Tcl_ResetResult" eats 0.8% of the total run time → that is 75% of the cost of my "ServiceCall".

    This is not trivial; it seems that the Python people, with a lot of "manpower", have already MAXIMIZED the optimization of Python.

    If I step into Tcl_ResetResult, the highlights are:

    % of total run time -> function name
    1.09% -> ResetObjectResult
    0.54% -> FreeByteArrayInternalRep (this object has a variable size of around 1000 bytes)

    Best regards, ao

  • From aotto1968@21:1/5 to All on Thu Aug 15 20:43:28 2024
    On 15.08.24 20:27, aotto1968 wrote:

    To be more precise I add an image to show the differences TCL versa PYTHON on an the example wrapper function

    ReadI8

    This is from the debugging environment with tcl/py & extension compiled in debug mode.

    https://i.postimg.cc/NjXccdRC/performance-check-tcl-versa-python.png



    better link → with callgraph https://i.postimg.cc/TYbNKXrn/performance-check-tcl-versa-python.png

  • From aotto1968@21:1/5 to All on Thu Aug 15 21:09:40 2024
    On 15.08.24 20:43, aotto1968 wrote:
    On 15.08.24 20:27, aotto1968 wrote:

    To be more precise I add an image to show the differences TCL versa PYTHON on an the example wrapper function

    ReadI8

    This is from the debugging environment with tcl/py & extension compiled in debug mode.

    https://i.postimg.cc/NjXccdRC/performance-check-tcl-versa-python.png



    better link → with callgraph https://i.postimg.cc/TYbNKXrn/performance-check-tcl-versa-python.png

    even better resolution: https://i.postimg.cc/wvpJV4QC/performance-check-tcl-versa-python.png

  • From aotto1968@21:1/5 to All on Thu Aug 15 20:27:50 2024
    To be more precise, I am adding an image that shows the differences between TCL and PYTHON for the example wrapper function

    ReadI8

    This is from the debugging environment with tcl/py & extension compiled in debug mode.

    https://i.postimg.cc/NjXccdRC/performance-check-tcl-versa-python.png

  • From aotto1968@21:1/5 to All on Thu Aug 15 21:18:29 2024
    On 15.08.24 21:09, aotto1968 wrote:
    On 15.08.24 20:43, aotto1968 wrote:
    On 15.08.24 20:27, aotto1968 wrote:

    To be more precise I add an image to show the differences TCL versa PYTHON on an the example wrapper function

    ReadI8

    This is from the debugging environment with tcl/py & extension compiled in debug mode.

    https://i.postimg.cc/NjXccdRC/performance-check-tcl-versa-python.png



    better link → with callgraph
    https://i.postimg.cc/TYbNKXrn/performance-check-tcl-versa-python.png

    even better resolution: https://i.postimg.cc/wvpJV4QC/performance-check-tcl-versa-python.png

    Too bad that I cannot EDIT a news message that has already been posted… the problem is that "postimage"
    changes the resolution of the image → bad

    I am switching to good old Facebook to post this screenshot and will wait for your comments.

    https://www.facebook.com/share/p/wihmQPR4pBRacLLF/

  • From aotto1968@21:1/5 to All on Thu Aug 15 23:48:37 2024
    a short conclusion from Facebook …

    "If you analyze the C lib wrapper for MqReadI8, the TCL code adds about 200% wrapper load and the PYTHON code adds about 10%
    wrapper load." (ref: https://www.facebook.com/share/p/wihmQPR4pBRacLLF/)

    → I think TCL has a "performance problem".

  • From aotto1968@21:1/5 to saito on Fri Aug 16 07:34:15 2024
    On 16.08.24 01:39, saito wrote:
    On 8/15/2024 3:09 PM, aotto1968 wrote:

    even better resolution: https://i.postimg.cc/wvpJV4QC/performance-check-tcl-versa-python.png

    Very nice screenshots. Is this some sort of a debugger?

    Assuming that you wrote both tcl and python versions and that they both wrap the same core library, wouldn't the call trees look
    the same or at least bear resemblance?


    Yes, both the TCL and PYTHON extensions are wrappers for the same library and the TOOL for writing both wrappers is the NHI1/ALC
    (All-Language-Compiler), that is why both wrappers look similar.

    The tool is a profiler rather than a debugger, and it has two parts:
    1) valgrind --tool=callgrind --quiet ... your sw → creates callgrind.out.*
    2) kcachegrind callgrind.out.* → displays the view

  • From Christian Gollwitzer@21:1/5 to All on Fri Aug 16 09:12:52 2024
    Am 15.08.24 um 23:48 schrieb aotto1968:
    a short conclusion from Facebook …

    "If you analyze the C lib wrapper for MqReadI8, the TCL code adds about
    200% wrapper load and the PYTHON code adds about 10% wrapper load."
    (ref: https://www.facebook.com/share/p/wihmQPR4pBRacLLF/)

    → I think TCL has an "performance-problem".

    I won't solve the problem, just to say: It's impossible to help you with
    this, because you don't explain:
    * who wrote this wrapper
    * where to find the code
    * what benchmark are you running

    It could be, e.g. that your benchmark code introduces shimmering and
    then there's lots of conversion going on. It might be something
    completely different. Or it might be that Tcl is indeed slower than
    Python (in most of my comparisons, it was the opposite - unless you
    offload work to external libraries).

    Regards,

    Christian

  • From undroidwish@21:1/5 to Christian Gollwitzer on Fri Aug 16 10:41:32 2024
    On 8/16/24 09:12, Christian Gollwitzer wrote:
    Am 15.08.24 um 23:48 schrieb aotto1968:

    ...
    → I think TCL has an "performance-problem".

    I won't solve the problem, just to say: It's impossible to help you with this, because you don't explain:
    * who wrote this wrapper
    * where to find the code
    * what benchmark are you running
    ...

    +1

    PS: Philosophically, the perpetual perception of performance problems
    is inherent to human design (and possibly inextricable even).

  • From aotto1968@21:1/5 to Christian Gollwitzer on Fri Aug 16 11:44:26 2024
    On 16.08.24 09:12, Christian Gollwitzer wrote:
    Am 15.08.24 um 23:48 schrieb aotto1968:
    a short conclusion from Facebook …

    "If you analyze the C lib wrapper for MqReadI8, the TCL code adds about 200% wrapper load and the PYTHON code adds about 10%
    wrapper load." (ref: https://www.facebook.com/share/p/wihmQPR4pBRacLLF/)

    → I think TCL has an "performance-problem".

    I won't solve the problem, just to say: It's impossible to help you with this, because you don't explain:
    * who wrote this wrapper
    * where to find the code
    * what benchmark are you running

    It could be, e.g. that your benchmark code introduces shimmering and then there's lots of conversion going on. It might be
    something completely different. Or it might be that Tcl is indeed slower than Python (in most of my comparisons, it was the
    opposite - unless you offload work to external libraries).

    Regards,

          Christian

    1) Just the "stupid" Tcl_ObjectGetMetadata call to retrieve the pointer associated with an OO object costs 1/3 of the wrapper
    run time → the whole header of a TCL OO wrapper costs more than everything else in the wrapper.

    If you look into the code, it is a hash-table lookup!!!
    In PYTHON it is a ZERO-time operation.

    2) Just to create an INT object from an integer, TCL always creates an object from scratch, including malloc etc.
    PYTHON uses a table of pre-allocated objects for small integers, which is a ZERO-time operation.

    3) The set/reset-result has to free all the (stupid) objects, which adds another 1/3 of the wrapper cost.


    Analysis:

    Tcl_ObjectGetMetadata is clearly a design error.
    The missing pre-allocation of small int objects is a lazy-programmer error.
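
    The small-integer point in 2) can be illustrated with a toy model. The sketch below is not how either interpreter is implemented; IntObj, new_int_fresh and new_int_cached are hypothetical names. It only shows the idea of handing out pre-allocated objects for a small range instead of allocating a new one on every call. The range -5..256 is the one CPython pre-allocates; sharing objects like this is only safe because they are never modified after creation.

```python
class IntObj:
    """Stand-in for a heap-allocated, immutable integer object (hypothetical type)."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


SMALL_MIN, SMALL_MAX = -5, 256  # the range CPython keeps pre-allocated

# Built once at start-up; every later request in this range is a table lookup.
_small_ints = [IntObj(v) for v in range(SMALL_MIN, SMALL_MAX + 1)]


def new_int_fresh(value):
    """Always allocate a new object (the behaviour criticised in point 2)."""
    return IntObj(value)


def new_int_cached(value):
    """Return the shared pre-allocated object for small values, else allocate."""
    if SMALL_MIN <= value <= SMALL_MAX:
        return _small_ints[value - SMALL_MIN]
    return IntObj(value)


print(new_int_cached(8) is new_int_cached(8))        # same shared object
print(new_int_cached(1000) is new_int_cached(1000))  # two separate allocations
```

    Whether such a table would pay off inside TCL depends on how often the wrapper returns small integers, which the numbers in this thread do not show.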


    If someone can set up a screen-sharing session, then I can explain the problem in more detail
    (I need to test the screen sharing first because I am not used to it).

  • From aotto1968@21:1/5 to All on Fri Aug 16 22:16:25 2024
    I spent some time on research and further optimization ... but ...

    One thing seems clear: "lang-Python" with AGGRESSIVE optimization is NOT far from "lang-C" speed.
    With aggressive optimization, PYTHON is built with profile-guided optimization (--enable-optimizations), and that WITH
    threads which, unlike in TCL, CANNOT be disabled in PYTHON. Also, the runtime library is FIRMLY integrated into PYTHON.

    TCL with aggressive optimization also uses the static runtime library BUT no threads.

    → for updates check the picture in the comment. https://www.facebook.com/share/p/WYmfnRWybY1Sh42f/

    summary for aggressive …

    .../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfclient --timeout 2 --send --sec 4 @
    .../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfserver
    :PerfClientExec }: start ------------------------ : result [ count / sec ]
    :statistics }: --send : 403779.6 [ 1615234 / 4.000286 ]
    :PerfClientExec }: end: ----------------------------------------

    .../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfclient --timeout 2 --send --sec 4 @ $PYTHON
    .../perf-aggressive/inst/sbin/py/x86_64-suse-linux-gnu-perfserver.py
    :PerfClientExec }: start ------------------------ : result [ count / sec ]
    :statistics }: --send : 311506.7 [ 1246216 / 4.000608 ]
    :PerfClientExec }: end: ----------------------------------------

    .../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfclient --timeout 2 --send --sec 4 @ $TCLSH
    .../perf-aggressive/inst/sbin/tcl/x86_64-suse-linux-gnu-perfserver.tcl
    :PerfClientExec }: start ------------------------ : result [ count / sec ]
    :statistics }: --send : 227151.4 [ 908663 / 4.000253 ]
    :PerfClientExec }: end: ----------------------------------------

  • From aotto1968@21:1/5 to saito on Fri Aug 16 22:19:27 2024
    On 16.08.24 20:29, saito wrote:
    On 8/16/2024 1:34 AM, aotto1968 wrote:
    On 16.08.24 01:39, saito wrote:
    On 8/15/2024 3:09 PM, aotto1968 wrote:

    even better resolution: https://i.postimg.cc/wvpJV4QC/performance-check-tcl-versa-python.png

    Very nice screenshots. Is this some sort a debugger?

    Assuming that you wrote both tcl and python versions and that they both wrap the same core library, wouldn't the call trees
    look the same or at least bear resemblance?


    Yes, both the TCL and PYTHON extensions are wrappers for the same library and the TOOL for writing both wrappers is the
    NHI1/ALC (All-Language-Compiler), that is why both wrappers look similar.


    What I meant was that the two images look very different. I can't make out what the boxes say, but nevertheless one is wide and
    shallow, the other narrow and deep. So this may not be an apples-to-apples comparison. As has been noted already, shimmering may
    play a role here or extra levels of abstraction via extra proc calls may skew the results in one language vs. the other.



    These two pictures are generated by the tool and not by me…
    The TCL picture is so "wide" because TCL has a lot of "overhead";
    the PYTHON picture is so "narrow" because PYTHON has much less "overhead".

  • From undroidwish@21:1/5 to All on Sat Aug 17 06:27:22 2024
    On 8/16/24 22:19, aotto1968 wrote:

    ...
    these two pictures are generated by the tool and not by me…
    the TCL picture is so "wide" because the TCL uses a lot of "overhead"
    the python picture is so "narrow" because PYTHON uses much less
    "overhead".

    Hmm, so here we are:

    a) you complain about Tcl's bad performance
    b) you seem to be unwilling to disclose enough information about the
    Python and Tcl implementations in order to get the big picture and
    see the cause of differences and to try to discuss improvements
    with you
    c) due to b) you continue to complain about Tcl's bad performance

    Not quite a fruitful cycle.

  • From aotto1968@21:1/5 to undroidwish on Sat Aug 17 07:28:59 2024
    On 17.08.24 06:27, undroidwish wrote:
    On 8/16/24 22:19, aotto1968 wrote:

    ...
    these two pictures are generated by the tool and not by me…
    the TCL picture is so "wide" because the TCL uses a lot of "overhead"
    the python picture is so "narrow" because PYTHON uses much less "overhead".

    Hmm, so here we are:

    a) you complain about Tcl's bad performance
    b) you seem to be unwilling to disclose enough information about the
       Python and Tcl implementations in order to get the big picture and
       see the cause of differences and to try to discuss improvements
       with you
    c) due to b) you continue to complain about Tcl's bad performance

    Not quite a fruitful cycle.


    I am "not" complaining about TCL's bad performance; I am just pointing out that PYTHON has done much more work on performance
    than TCL.

    Whether you get 300,000 transactions per second (PYTHON) or 200,000 transactions
    per second (TCL) only matters to someone who needs this difference.
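
    Put as time per transaction, the two round figures above look like this. This is a sketch of the arithmetic only; the function name is made up, and 300,000 and 200,000 are the rounded rates from this post, not measured values.

```python
def microseconds_per_transaction(rate_per_second):
    """Average time one transaction takes at the given throughput."""
    return 1_000_000 / rate_per_second


python_us = microseconds_per_transaction(300_000)
tcl_us = microseconds_per_transaction(200_000)
extra_us = tcl_us - python_us

print(f"PYTHON: {python_us:.2f} us, TCL: {tcl_us:.2f} us, difference: {extra_us:.2f} us")
```

    That is about 3.33 µs against 5 µs, so the difference under discussion is roughly 1.67 µs per transaction.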

  • From undroidwish@21:1/5 to All on Sat Aug 17 09:59:41 2024
    On 8/17/24 07:28, aotto1968 wrote:

    ...
    I'm "not" complain about TCL bad performance I just mention that PYTHON has done much more work on performance than TCL.

    Fine, then please elaborate on this claim. What exactly did Python do
    better, and what more, in terms of performance? Any pointers welcome.

    If you have 300.000 transaction per second (PYTHON) or 200.000 transaction per second (TCL) is just an case for someone who need this difference.

    Indeed, this could be a reason to ask whether there are better ways of using
    the Tcl framework in order to bring the Tcl implementation on par with
    the Python one. As stated many times before, discussing this on c.l.t.
    will require that you provide more implementation details.
