On 8/18/2024 1:42 PM, aotto1968 wrote:
add some documentation regarding the performance testing:
http://thedev.nhi1.de/theLink/main/md_docs_2main_2README__PERFORMANCE.htm
I recently wrote some C code using Visual Studio 2022 and they have a wonderful performance profiler. I was able to determine
that 80% of the cost of the module I was developing was caused by calls to some library routines I was using. By writing my own
versions that didn't need to be so generalized, I got that down to 10%.
One problem was that once I turned on the compiler optimization, the profiler became pretty much worthless to measure my own
code's performance, so I couldn't get that 10% any lower.
But it would be kinda cool to try using those VS tools on the tcl source code, but I don't know of any way to build tcl inside
VS where one could use those tools.
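The workflow described above (profile, find the dominant callee, replace it with a specialized version) is not tied to Visual Studio. Here is a minimal sketch in Python using the standard cProfile and pstats modules; generic_sum and work are hypothetical names invented for illustration, not code from this thread:

```python
import cProfile
import io
import pstats


def generic_sum(values):
    """Hypothetical stand-in for an over-generalized library routine."""
    total = 0
    for v in values:
        total += int(str(v))  # needless round trip through a string
    return total


def work():
    """Hypothetical module under test: nearly all of its cost sits in generic_sum."""
    return sum(generic_sum(range(200)) for _ in range(200))


def profile_report(func):
    """Run func under cProfile; return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


if __name__ == "__main__":
    result, report = profile_report(work)
    print(report)
```

The report lists each function with its call count and cumulative time, which is the kind of evidence that points at a library routine worth rewriting.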
http://thedev.nhi1.de/NHI1/main/index.htm
http://thedev.nhi1.de/theLink/main/md_docs_2main_2README__PERFORMANCE.htm#README_PERFORMANCE
Hi, some (unproven) statistics from my SW regarding the performance TCL
On 8/13/24 22:40, aotto1968 wrote:
Hi, some (unproven) statistics from my SW regarding the performance TCL
          ^^^^^^^^
Exactly. I tend to go even further and add the attribute useless to
unproven as long as you publish some numbers with some subjective
analysis without presenting the implementation and the measurement
method.
On 8/14/24 04:16, aotto1968 wrote:
the first analysis is quite simple:
right now python does NOT support threads in NHI1 (will change soon) and tcl does…
this has an influence on the "release" build because this is NHI1 without threads in python and with
threads in tcl.
→ the difference is that the thread-local-storage is a STATIC REFERENCE in python and a POINTER in tcl.
→ the "aggressive" build does NOT use threads at all and the difference between python and tcl is more comparable
but is still ~20%
I think the point androwish was making is that, without seeing the code, we cannot tell whether you did something in a way that
takes more time than doing it in a slightly different way.
I cannot post a "picture" because the newsgroup does NOT accept pictures …
most of the code is in the TCL-C-Api for example:
not trivial, it seems that the Python people with a lot of “manpower” have already MAXIMIZED the optimization of Python.
To be more precise, I add an image to show the differences, TCL versus PYTHON, on the example wrapper function
ReadI8
This is from the debugging environment with tcl/py & extension compiled in debug mode.
https://i.postimg.cc/NjXccdRC/performance-check-tcl-versa-python.png
On 15.08.24 20:27, aotto1968 wrote:
To be more precise, I add an image to show the differences, TCL versus PYTHON, on the example wrapper function
ReadI8
This is from the debugging environment with tcl/py & extension compiled in debug mode.
https://i.postimg.cc/NjXccdRC/performance-check-tcl-versa-python.png
better link → with callgraph https://i.postimg.cc/TYbNKXrn/performance-check-tcl-versa-python.png
On 15.08.24 20:43, aotto1968 wrote:
better link → with callgraph
https://i.postimg.cc/TYbNKXrn/performance-check-tcl-versa-python.png
even better resolution: https://i.postimg.cc/wvpJV4QC/performance-check-tcl-versa-python.png
https://www.facebook.com/share/p/wihmQPR4pBRacLLF/
On 8/15/2024 3:09 PM, aotto1968 wrote:
even better resolution: https://i.postimg.cc/wvpJV4QC/performance-check-tcl-versa-python.png
Very nice screenshots. Is this some sort of debugger?
Assuming that you wrote both tcl and python versions and that they both wrap the same core library, wouldn't the call trees look
the same or at least bear resemblance?
a short conclusion from Facebook …
"If you analyze the C lib wrapper for MqReadI8, the TCL code adds about
200% wrapper load and the PYTHON code adds about 10% wrapper load."
(ref: https://www.facebook.com/share/p/wihmQPR4pBRacLLF/)
→ I think TCL has a "performance problem".
On 15.08.24 at 23:48, aotto1968 wrote:
a short conclusion from Facebook …
"If you analyze the C lib wrapper for MqReadI8, the TCL code adds about 200% wrapper load and the PYTHON code adds about 10%
wrapper load." (ref: https://www.facebook.com/share/p/wihmQPR4pBRacLLF/)
→ I think TCL has a "performance problem".
I won't solve the problem, just to say: It's impossible to help you with this, because you don't explain:
* who wrote this wrapper
* where to find the code
* what benchmark are you running
It could be, e.g. that your benchmark code introduces shimmering and then there's lots of conversion going on. It might be
something completely different. Or it might be that Tcl is indeed slower than Python (in most of my comparisons, it was the
opposite - unless you offload work to external libraries).
Regards,
Christian
if you look into the code, it is a hash-table lookup !!!
in python it is a ZERO-time operation
python uses a table of already pre-allocated objects for small numbers (integers) as a ZERO-time operation
the Tcl_ObjectGetMetadata is clearly a design error
the missing small-int-object pre-alloc is a programmer-laziness error
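The small-integer table mentioned above can be observed from Python itself. The sketch below relies on a CPython implementation detail (integers from -5 to 256 are preallocated), not on anything the language guarantees; is_preallocated is a hypothetical helper name:

```python
def is_preallocated(n):
    """Return True if two independently built ints with value n are the same object."""
    a = int(str(n))  # built at run time, so no compile-time constant sharing
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    # On CPython, values inside the cached range share one object.
    print(is_preallocated(100))
    # Values outside the range are allocated fresh on each construction.
    print(is_preallocated(257))
```

On CPython the first call reports a shared object and the second does not; other Python implementations may behave differently.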
.../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfclient --timeout 2 --send --sec 4 @.../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfserver
.../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfclient --timeout 2 --send --sec 4 @ $PYTHON.../perf-aggressive/inst/sbin/py/x86_64-suse-linux-gnu-perfserver.py
.../perf-aggressive/inst/sbin/c/x86_64-suse-linux-gnu-perfclient --timeout 2 --send --sec 4 @ $TCLSH.../perf-aggressive/inst/sbin/tcl/x86_64-suse-linux-gnu-perfserver.tcl
On 8/16/2024 1:34 AM, aotto1968 wrote:
On 16.08.24 01:39, saito wrote:
On 8/15/2024 3:09 PM, aotto1968 wrote:
even better resolution: https://i.postimg.cc/wvpJV4QC/performance-check-tcl-versa-python.png
Very nice screenshots. Is this some sort of debugger?
Assuming that you wrote both tcl and python versions and that they both wrap the same core library, wouldn't the call trees
look the same or at least bear resemblance?
Yes, both the TCL and PYTHON extensions are wrappers for the same library and the TOOL for writing both wrappers is the
NHI1/ALC (All-Language-Compiler), that is why both wrappers look similar.
What I meant was that the two images look very different. I can't make out what the boxes say, but nevertheless one is wide and
shallow, the other narrow and deep. So this may not be an apples-to-apples comparison. As has been noted already, shimmering may
play a role here or extra levels of abstraction via extra proc calls may skew the results in one language vs. the other.
...
these two pictures are generated by the tool and not by me…
the TCL picture is so "wide" because the TCL uses a lot of "overhead"
the python picture is so "narrow" because PYTHON uses much less "overhead".
On 8/16/24 22:19, aotto1968 wrote:
...
these two pictures are generated by the tool and not by me…
the TCL picture is so "wide" because the TCL uses a lot of "overhead"
the python picture is so "narrow" because PYTHON uses much less "overhead".
Hmm, so here we are:
a) you complain about Tcl's bad performance
b) you seem to be unwilling to disclose enough information about the
Python and Tcl implementations in order to get the big picture and
see the cause of differences and to try to discuss improvements
with you
c) due to b) you continue to complain about Tcl's bad performance
Not quite a fruitful cycle.
...
I'm "not" complaining about TCL's bad performance; I just mention that PYTHON has done much more work on performance than TCL.
Whether you have 300,000 transactions per second (PYTHON) or 200,000 transactions per second (TCL) only matters to someone who needs this difference.
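To put the two throughput figures quoted above into per-call terms, here is a small worked calculation in Python. The only inputs are the two numbers from the post; the helper name microseconds_per_transaction is made up for illustration:

```python
def microseconds_per_transaction(transactions_per_second):
    """Convert a throughput figure into average time per transaction."""
    return 1_000_000 / transactions_per_second


PYTHON_TPS = 300_000  # figure quoted in the post
TCL_TPS = 200_000     # figure quoted in the post

python_us = microseconds_per_transaction(PYTHON_TPS)  # about 3.33 microseconds
tcl_us = microseconds_per_transaction(TCL_TPS)        # 5.0 microseconds

extra_us = tcl_us - python_us     # about 1.67 microseconds more per transaction
time_ratio = tcl_us / python_us   # TCL needs about 1.5x the time per transaction

if __name__ == "__main__":
    print(f"python: {python_us:.2f} us, tcl: {tcl_us:.2f} us, "
          f"extra: {extra_us:.2f} us, ratio: {time_ratio:.2f}x")
```

Seen this way, the gap is under two microseconds per transaction, which matters only when the per-transaction work is itself that small.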