I need to run two external processes (on Linux):
pdftotext -tsv one.pdf
pdftotext -tsv two.pdf
For each one I need to acquire the output and post-process it.
Both are completely independent.
(However, once I've finished post-processing I then do some work on
both sets of post-processed data together.)
Each external process takes about 3 secs so it takes just over 6 secs
to acquire the data from both processes.
When I've done something similar in Python I've used the multiprocessing module and this has got my runtime close to the 3 secs.
In my experiments with Tcl's threading I've found the threading startup overhead to be rather large.
What is the fastest way to run two independent processes concurrently
and acquire their outputs using Tcl?
On 29.04.2026 at 10:51, Mark Summerfield wrote:
This takes ~1 sec. So it's 2x faster, as expected.
What's the issue?
In my experiments with Tcl's threading I've found the threading
startup overhead to be rather large.
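One thread-free way to overlap the two commands is to open both as pipelines and drain them through the event loop. This is only a minimal sketch, not the code used for any timing in this thread: the `prun` namespace and its procs are names invented here, and the trailing `-` in the usage note (asking pdftotext to write to stdout) is taken from the code posted later in the thread.

```tcl
namespace eval prun {
    variable output    ;# array: command index -> accumulated output
    variable pending 0 ;# number of pipelines still running
}

# Called by the event loop whenever a pipeline has data or hits EOF.
proc prun::onReadable {chan idx} {
    variable output
    variable pending
    append output($idx) [read $chan]
    if {[eof $chan]} {
        # Back to blocking mode so close waits for the child process.
        fconfigure $chan -blocking 1
        catch {close $chan}
        incr pending -1
    }
}

# Start every command (each a Tcl list) at once, then wait for all of them.
# Returns the outputs in the same order as the commands.
proc prun::run {cmds} {
    variable output
    variable pending
    array unset output
    set pending [llength $cmds]
    set idx 0
    foreach cmd $cmds {
        set output($idx) ""
        set chan [open |$cmd r]
        fconfigure $chan -blocking 0
        fileevent $chan readable [list ::prun::onReadable $chan $idx]
        incr idx
    }
    while {$pending > 0} {
        vwait ::prun::pending
    }
    set results {}
    for {set i 0} {$i < $idx} {incr i} {
        lappend results $output($i)
    }
    return $results
}
```

Usage might look like `lassign [prun::run [list {pdftotext -tsv one.pdf -} {pdftotext -tsv two.pdf -}]] tsv1 tsv2`. Reading through `fileevent` rather than two back-to-back blocking `read` calls keeps the second process from stalling on a full pipe while the first is still being read. Errors from the child processes are swallowed by the `catch` here; real code would probably want to report them.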
* Mark Summerfield <m.n.summerfield@gmail.com>
| I created a tiny test program (65 LOC; shown at the end) to
| compare timings. I did multiple timings and here are the averages:
|   serial       (2 LOC)   2.020 sec
|   multiprocess (19 LOC)  1.055 sec
|   threaded     (13 LOC)  1.061 sec
| Since the difference between the multiprocess and threaded
| approaches is so small, and the threaded code is simpler
| and more appealing, I'm going to use the threaded version in
| my programs (which only ever work with two PDFs at a time),
| so thank you "abu"!
I wonder: you stated in your initial message
Message-ID: <10sschf$3nvs2$1@dont-email.me>
In my experiments with Tcl's threading I've found the threading
startup overhead to be rather large.
Can you say what the difference is between those experiments and the
current solution, which evidently has no "startup overhead"?
R'
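For readers following along, a threaded variant can be sketched with the Thread extension roughly as below. This is not the 13-line version timed above (that code is not shown in this part of the thread); `run_in_threads` and the global array `::thread_out` are names invented for this illustration, and error handling is left out.

```tcl
package require Thread

# Hypothetical helper: run each command (a Tcl list) with exec in its own
# worker thread and return the outputs in the same order as the commands.
proc run_in_threads {cmds} {
    set i 0
    foreach cmd $cmds {
        set tid($i) [thread::create]
        unset -nocomplain ::thread_out($i)
        # -async returns at once; the script's result is stored later in
        # ::thread_out($i) through this thread's event loop.
        thread::send -async $tid($i) [list exec {*}$cmd] ::thread_out($i)
        incr i
    }
    set results {}
    for {set j 0} {$j < $i} {incr j} {
        # A result may already have arrived while waiting for an earlier one,
        # so only enter vwait if it is still missing.
        while {![info exists ::thread_out($j)]} {
            vwait ::thread_out($j)
        }
        lappend results $::thread_out($j)
        thread::release $tid($j)
    }
    return $results
}
```

A call such as `lassign [run_in_threads [list {pdftotext -tsv one.pdf -} {pdftotext -tsv two.pdf -}]] tsv1 tsv2` would then collect both outputs. Note that `exec` strips a trailing newline from each result and raises an error if the command exits non-zero or writes to stderr.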
Shameless plug...
Bit late to the topic, but the simplest way to parallelize multiple
processes or threads and wait for completion is promises, if you do not
mind an external package. Bit of a learning curve, however.
package require promise
lappend promises [promise::pexec pdftotext pdf1.pdf pdf1.txt]
lappend promises [promise::pexec pdftotext pdf2.pdf pdf2.txt]
set waiter [promise::all $promises]
# Assumes eventloop not running!
promise::eventloop $waiter
Timing:
% time {demo} <- using promises
2606403 microseconds per iteration
% time {demo2} <- sequential exec's
4762417 microseconds per iteration
https://wiki.tcl-lang.org/page/promise
https://tcl-promise.magicsplat.com/
https://www.magicsplat.com/blog/tags/promises/
* Mark Summerfield <m.n.summerfield@gmail.com>
| > https://wiki.tcl-lang.org/page/promise
| > https://tcl-promise.magicsplat.com/
| > https://www.magicsplat.com/blog/tags/promises/
| I tried it but hit a problem. Here's the code I used:
| proc promised {pdf1 pdf2} {
| set p1 [promise::pexec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
| set p2 [promise::pexec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
| set waiter [promise::all [list $p1 $p2]]
| # Assumes eventloop not running!
| promise::eventloop $waiter
| set tsv1 [$p1 getdata]
| set tsv2 [$p2 getdata]
promise::eventloop already returns the result of the 'waiter' promise
(i.e. those registered in promise::all).
So change those two 'getdata' calls to
lassign [promise::eventloop $waiter] tsv1 tsv2
| puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
HTH
R'
Thanks, I've now done that. Here are the new timings (each is the best
of several):
sec method
2.010 serial
1.052 multiprocess
1.065 thread_pool
1.067 threaded
8.366 promised
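A "best of several" figure like the ones above can be collected with a small helper along these lines. It is a sketch only, not the benchmark code used in this thread; `best_of` is a name made up here.

```tcl
# Hypothetical helper: evaluate script n times in the caller's scope and
# return the shortest wall-clock time in seconds.
proc best_of {n script} {
    if {$n < 1} {
        error "n must be at least 1"
    }
    set best ""
    for {set i 0} {$i < $n} {incr i} {
        set t0 [clock microseconds]
        uplevel 1 $script
        set dt [expr {[clock microseconds] - $t0}]
        if {$best eq "" || $dt < $best} {
            set best $dt
        }
    }
    return [expr {$best / 1e6}]
}
```

For example, `puts [format "%.3f serial" [best_of 5 {serial $pdf1 $pdf2}]]`, where `serial` stands for whichever proc is being measured. Taking the minimum rather than the mean reduces the influence of unrelated system activity on the result.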
Ashok,
since coroutines are already part of Tcl, any chance of getting promises
into the core? It would seem to me a 'natural' addition for async
features in Tcl, and the package looks quite mature...
R'
I'm surprised by the promises result below (not that I doubt it). I'll
have to take a look when I have some time.
In my tests that I posted earlier, the promise version took about the
same time as the multiprocess one.
The difference between my example and yours is that in my example,
pdftotext was writing to a file and not to its stdout. In your example,
it is writing back to the pipe and read directly in Tcl.
I wonder if the difference stems from your code essentially doing a busy
loop reading data while the promise version goes through the event loop,
though I cannot explain why that would make that much difference.
Worth investigating further when I have time...
/Ashok
On 5/13/2026 2:24 PM, Mark Summerfield wrote:
Thanks, I've now done that. Here are the new timings (each is the best
of several):
sec method
2.010 serial
1.052 multiprocess
1.065 thread_pool
1.067 threaded
8.366 promised
In the hope it helps, below is the full source for the example I used.
Thanks, having a benchmark source helped. However I cannot reproduce
I ran it on Tcl/Tk 9.0.3 (64-bit), Debian GNU/Linux 12 (bookworm)
Linux 6.1.0-44-amd64 (x86_64), 12th Gen Intel Core i7-12700 20 cores.
I used two PDF files both of 647 pages and did several runs of each
method to find the best time.