Forum: Too Lazy BBS

Fastest way to run two external processes

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Wed Apr 29 07:38:23 2026

From Newsgroup: comp.lang.tcl

I need to run two external processes (on Linux):

pdftotext -tsv one.pdf
pdftotext -tsv two.pdf

For each one I need to acquire the output and post-process it.
Both are completely independent.
(However, once I've finished post-processing I then do some work on
both sets of post-processed data together.)

Each external process takes about 3 secs, so it takes just over 6 secs
to acquire the data from both processes.

When I've done something similar in Python I've used the multiprocessing
module and this has got my runtime close to the 3 secs.

In my experiments with Tcl's threading I've found the threading startup overhead to be rather large.

What is the fastest way to run two independent processes concurrently
and acquire their outputs using Tcl?
--- Synchronet 3.21f-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Wed Apr 29 08:51:17 2026

From Newsgroup: comp.lang.tcl

On Wed, 29 Apr 2026 07:38:23 -0000 (UTC), Mark Summerfield wrote:

> What is the fastest way to run two independent processes concurrently
> and acquire their outputs using Tcl?

Here's my serial version:

proc app::serial {pdftotext pdf1 pdf2} {
    puts serial
    set pdf1tsv [exec $pdftotext -tsv $pdf1 -]
    set pdf2tsv [exec $pdftotext -tsv $pdf2 -]
    list $pdf1tsv $pdf2tsv
}

This takes ~2 sec for two ~650-page PDFs.

With some help from Gemini (after I got past non-working and slow
solutions) I did a multiprocessing version:

proc app::multiprocess {pdftotext pdf1 pdf2} {
    set p1 [open "|$pdftotext -tsv $pdf1 - 2>@1" r]
    try {
        set p2 [open "|$pdftotext -tsv $pdf2 - 2>@1" r]
        try {
            fconfigure $p1 -blocking 0
            fconfigure $p2 -blocking 0
            set pdf1tsv ""
            set pdf2tsv ""
            while {![eof $p1] || ![eof $p2]} {
                append pdf1tsv [read $p1]
                append pdf2tsv [read $p2]
                after 1
            }
        } finally {
            close $p2
        }
    } finally {
        close $p1
    }
    list $pdf1tsv $pdf2tsv
}

This takes ~1 sec.
--- Synchronet 3.21f-Linux NewsLink 1.2

From meshparts@alexandru.dadalau@meshparts.de to comp.lang.tcl on Wed Apr 29 11:24:34 2026

From Newsgroup: comp.lang.tcl

On 29.04.2026 at 10:51, Mark Summerfield wrote:

> This takes ~1 sec.

So it's 2x faster, as expected.
What's the issue?
--- Synchronet 3.21f-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Wed Apr 29 09:46:56 2026

From Newsgroup: comp.lang.tcl

On Wed, 29 Apr 2026 11:24:34 +0200, meshparts wrote:

> On 29.04.2026 at 10:51, Mark Summerfield wrote:
>
>> This takes ~1 sec.
>
> So it's 2x faster, as expected.
> What's the issue?

When I originally asked I only had the serial approach.
I replied to myself once I had the multiprocessing approach which
solved the problem so that people could see it was solved.
--- Synchronet 3.21f-Linux NewsLink 1.2

From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Wed Apr 29 12:30:12 2026

From Newsgroup: comp.lang.tcl

* Mark Summerfield <m.n.summerfield@gmail.com>
| With some help from Gemini (after I got past non-working and slow
| solutions) I did a multiprocessing version:

| proc app::multiprocess {pdftotext pdf1 pdf2} {
| set p1 [open "|$pdftotext -tsv $pdf1 - 2>@1" r]
| try {
| set p2 [open "|$pdftotext -tsv $pdf2 - 2>@1" r]
| try {
| fconfigure $p1 -blocking 0
| fconfigure $p2 -blocking 0

Depending on the output of $pdftotext, some -encoding option might be necessary, too.

| set pdf1tsv ""
| set pdf2tsv ""
| while {![eof $p1] || ![eof $p2]} {
| append pdf1tsv [read $p1]
| append pdf2tsv [read $p2]
| after 1
| }

I don't like the busy-waiting loop for eof, but a solution using
fileevents would require namespace vars or globals to collect the output
and signalling 'done', so ymmv.
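Ralf's fileevent alternative can be sketched roughly as follows. This is a minimal sketch, not code from the thread: the namespace `collect` and the procs `collect::run_all` / `collect::on_readable` are hypothetical names, and the utf-8 default for `-encoding` is an assumption about pdftotext's output (the encoding point raised above). Each pipeline gets a `chan event readable` handler that appends to a namespace array, and a `vwait` on a pending counter replaces the `after 1` polling loop.

```tcl
# Hypothetical names: namespace "collect", procs run_all and on_readable.
namespace eval collect {
    variable out        ;# array: job index -> accumulated output
    variable pending 0  ;# number of pipelines that have not reached EOF yet
}

proc collect::on_readable {idx chan} {
    variable out
    variable pending
    append out($idx) [chan read $chan]
    if {[chan eof $chan]} {
        # Back to blocking mode so close waits for the child to exit;
        # catch keeps one failing command from aborting the others.
        chan configure $chan -blocking 1
        catch {chan close $chan}
        incr pending -1   ;# this write wakes up the vwait below
    }
}

# cmds is a list of commands, each itself a list such as {pdftotext -tsv a.pdf -}.
# The encoding default is an assumption; pass another one if needed.
proc collect::run_all {cmds {encoding utf-8}} {
    variable out
    variable pending
    array unset out
    set pending 0
    set n 0
    foreach cmd $cmds {
        # "|" followed by a proper list opens a command pipeline for reading.
        set chan [open |$cmd r]
        chan configure $chan -blocking 0 -encoding $encoding
        set out($n) ""
        chan event $chan readable [list [namespace current]::on_readable $n $chan]
        incr pending
        incr n
    }
    # Enter the event loop until every pipeline has hit EOF.
    while {$pending > 0} {
        vwait [namespace current]::pending
    }
    set result {}
    for {set i 0} {$i < $n} {incr i} {
        lappend result $out($i)
    }
    return $result
}
```

For example, `collect::run_all [list [list pdftotext -tsv one.pdf -] [list pdftotext -tsv two.pdf -]]` would return the two TSV outputs in the order given. Unlike the `2>@1` versions in this thread, stderr is not merged into the captured output here.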

R'
--- Synchronet 3.21f-Linux NewsLink 1.2

From abu@user13892@newsgrouper.org.invalid to comp.lang.tcl on Thu Apr 30 00:51:16 2026

From Newsgroup: comp.lang.tcl

I don't understand why threads (in particular thread pools) are not used.

Here's my solution. Please tell me if there's a significant speed penalty.

# ===============================

package require Thread

# run up to 3 parallel workers; extra jobs are queued
set mytpool [tpool::create -minworkers 3 -maxworkers 3]

# ..
# Build each job with [list] so $pdftotext, $pdf1, ... are substituted here;
# inside braces they would reach the worker threads unsubstituted.
set jobs [list \
    [list exec $pdftotext -tsv $pdf1 - 2>@1] \
    [list exec $pdftotext -tsv $pdf2 - 2>@1] \
    [list exec $pdftotext -tsv $pdf3 - 2>@1] \
]

set T0 [clock milliseconds]
set myjobIDs {}
# schedule all jobs
foreach job $jobs {
    lappend myjobIDs [tpool::post -nowait $mytpool $job]
}
unset -nocomplain RESULT
puts "waiting for RESULT..."
while {[llength $myjobIDs] > 0} {
    # get the completed jobs; myjobIDs is updated with the list of the still pending jobs
    set completedJobs [tpool::wait $mytpool $myjobIDs myjobIDs]
    foreach job $completedJobs {
        puts "== Job $job completed at [expr {[clock milliseconds]-$T0}] msec"
        set RESULT($job) [tpool::get $mytpool $job]
    }
}
tpool::release $mytpool

puts "Result saved in the RESULT() array"
puts "Total processing time: [expr {[clock milliseconds]-$T0}] msec"
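The thread-pool approach can also be wrapped as a reusable proc. This is a sketch under stated assumptions: `pool_run` is a hypothetical name, each command is passed as a list (e.g. `{pdftotext -tsv one.pdf -}`), and job scripts are built with `[list exec ...]` so that file names containing spaces survive intact. The pool is sized to the number of commands and released when done.

```tcl
package require Thread

# Hypothetical helper: run each command (a list) in its own pool worker and
# return the outputs in the same order as cmds.
proc pool_run {cmds} {
    set n [llength $cmds]
    set pool [tpool::create -minworkers $n -maxworkers $n]
    try {
        set jobs {}
        foreach cmd $cmds {
            # [list] quotes every argument, so no re-parsing surprises.
            lappend jobs [tpool::post -nowait $pool [list exec {*}$cmd]]
        }
        # tpool::wait returns finished jobs and stores the rest in "pending".
        set pending $jobs
        while {[llength $pending] > 0} {
            tpool::wait $pool $pending pending
        }
        set results {}
        foreach job $jobs {
            lappend results [tpool::get $pool $job]
        }
        return $results
    } finally {
        tpool::release $pool
    }
}
```

Collecting the results by iterating over `jobs` (rather than in completion order) keeps them aligned with the input list regardless of which command finished first.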
--- Synchronet 3.21f-Linux NewsLink 1.2

From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Thu Apr 30 14:23:58 2026

From Newsgroup: comp.lang.tcl

* abu <user13892@newsgrouper.org.invalid>
| I don't understand why Threads are not used (in particular Thread Pools)

Most probably because Mark stated in Message-ID: <10sschf$3nvs2$1@dont-email.me>

> In my experiments with Tcl's threading I've found the threading
> startup overhead to be rather large.

| Here's my solution. Please tell me if there's a significant speed penalty.

Did you compare your version to Mark's solution? This would be the best comparison when running on the same hardware...

R'
--- Synchronet 3.21f-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Fri May 1 07:04:08 2026

From Newsgroup: comp.lang.tcl

I created a tiny test program (65 LOC; shown at the end) to
compare timings. I did multiple timings and here are the averages:

serial        (2 LOC)   2.020 sec
multiprocess  (19 LOC)  1.055 sec
threaded      (13 LOC)  1.061 sec

Since the difference between the multiprocess and threaded
approaches is so small, and the threaded code is simpler
and more appealing, I'm going to use the threaded version in
my programs (which only ever work with two PDFs at a time),
so thank you "abu"!

#!/usr/bin/env tclsh9
# usage: time ./concurrent.tcl <s|m|t> <file1.pdf> <file2.pdf>

package require thread

proc main {} {
    set pdftotext [auto_execok pdftotext]
    set pdf1 [lindex $::argv 1]
    set pdf2 [lindex $::argv 2]
    switch [lindex $::argv 0] {
        s { serial $pdftotext $pdf1 $pdf2 }
        m { multiprocess $pdftotext $pdf1 $pdf2 }
        t { threaded $pdftotext $pdf1 $pdf2 }
    }
}

proc serial {pdftotext pdf1 pdf2} {
    puts -nonewline "serial "
    set tsv1 [exec $pdftotext -tsv $pdf1 - 2>@1]
    set tsv2 [exec $pdftotext -tsv $pdf2 - 2>@1]
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc multiprocess {pdftotext pdf1 pdf2} {
    puts -nonewline multiprocess
    set p1 [open "|$pdftotext -tsv $pdf1 - 2>@1" r]
    try {
        set p2 [open "|$pdftotext -tsv $pdf2 - 2>@1" r]
        try {
            fconfigure $p1 -blocking 0
            fconfigure $p2 -blocking 0
            set tsv1 ""
            set tsv2 ""
            while {![eof $p1] || ![eof $p2]} {
                append tsv1 [read $p1]
                append tsv2 [read $p2]
                after 1
            }
        } finally {
            close $p2
        }
    } finally {
        close $p1
    }
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc threaded {pdftotext pdf1 pdf2} {
    puts -nonewline "threaded "
    set pool [tpool::create -minworkers 2]
    set job1 [tpool::post -nowait $pool "exec $pdftotext -tsv $pdf1 - 2>@1"]
    set job2 [tpool::post -nowait $pool "exec $pdftotext -tsv $pdf2 - 2>@1"]
    set job_ids [list $job1 $job2]
    while {[llength $job_ids] > 0} {
        foreach job_id [tpool::wait $pool $job_ids job_ids] {
            if {$job_id eq $job1} {
                set tsv1 [tpool::get $pool $job_id]
            } else {
                set tsv2 [tpool::get $pool $job_id]
            }
        }
    }
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

main
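Averages like the ones above (and the "best of several" figures later in the thread) can also be collected in-process. A minimal sketch, assuming wall-clock time is what matters: `bench` is a hypothetical name, and it times the script with `clock microseconds`, so the time spent waiting for the child processes is included.

```tcl
# Hypothetical helper: run script $runs times in the caller's scope and
# return {mean best} in seconds (wall-clock).
proc bench {script {runs 5}} {
    set times {}
    for {set i 0} {$i < $runs} {incr i} {
        set t0 [clock microseconds]
        uplevel 1 $script
        lappend times [expr {([clock microseconds] - $t0) / 1e6}]
    }
    set mean [expr {[tcl::mathop::+ {*}$times] / $runs}]
    set best [tcl::mathfunc::min {*}$times]
    return [list $mean $best]
}
```

With the program above, something like `lassign [bench {serial $pdftotext $pdf1 $pdf2} 5] mean best` would report both figures from a single tclsh process, which also removes interpreter start-up from the measurement.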
--- Synchronet 3.21f-Linux NewsLink 1.2

From Olivier@user1108@newsgrouper.org.invalid to comp.lang.tcl on Fri May 1 10:06:35 2026

From Newsgroup: comp.lang.tcl

Mark Summerfield <m.n.summerfield@gmail.com> posted:

> I need to run two external processes (on Linux):
>
> pdftotext -tsv one.pdf
> pdftotext -tsv two.pdf

I am not an expert, but the construction (with Tcl 9.x):

1) launch both processes in the background

2) check their status with ::tcl::process

3) post-process the output of each process as soon as it has ended (*)

seems doable, but no one has mentioned anything similar. Is this a
construction to avoid?

(*) with a monolithic script if it is fast, I mean no threads or
different interpreters
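Olivier's construction might look roughly like this. It is a sketch assuming the `tcl::process` ensemble of Tcl 8.7/9.0 (TIP 462), in particular `tcl::process status -wait pids`, which returns a dict mapping each PID to its status (0 on a clean exit). `run_to_files` is a hypothetical name, and each command's stdout goes to a temporary file because a background `exec ... &` does not return the output.

```tcl
# Hypothetical helper (Tcl 9 assumed): start every command in the background,
# wait for all of them via tcl::process, then read back their output files.
proc run_to_files {cmds} {
    set files {}
    set pids {}
    foreach cmd $cmds {
        # file tempfile creates a unique file and stores its name in "path".
        close [file tempfile path]
        lappend files $path
        # exec ... & returns the PIDs of the (background) pipeline.
        lappend pids {*}[exec {*}$cmd > $path &]
    }
    # Block until every listed child has terminated, then check the statuses.
    dict for {pid status} [tcl::process status -wait $pids] {
        if {[lindex $status 0] != 0} {
            error "process $pid failed: $status"
        }
    }
    set results {}
    foreach path $files {
        set f [open $path r]
        lappend results [read $f]
        close $f
        file delete $path
    }
    return $results
}
```

This waits for all processes at once; to post-process each output "as soon as it has ended", one could instead poll `tcl::process status` without `-wait` (an empty status means still running), at the cost of a polling loop similar to the one criticised above.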
--- Synchronet 3.21f-Linux NewsLink 1.2

From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Fri May 1 22:54:54 2026

From Newsgroup: comp.lang.tcl

* Mark Summerfield <m.n.summerfield@gmail.com>
| I created a tiny test program (65 LOC; shown at the end) to
| compare timings. I did multiple timings and here're the averages:

| serial (2 LOC) 2.020 sec
| multiprocess (19 LOC) 1.055 sec
| threaded (13 LOC) 1.061 sec

| Since the difference between the multiprocess and threaded
| approaches is so small and that the threaded code is simpler
| and more appealing, I'm going to use the threaded version in
| my programs (which only ever work with two PDFs at a time),
| so thank you "abu"!

I wonder: you stated in your initial message

Message-ID: <10sschf$3nvs2$1@dont-email.me>
> In my experiments with Tcl's threading I've found the threading
> startup overhead to be rather large.

Can you tell what is/was the difference to the current solution which
obviously has no "startup overhead"?

R'
--- Synchronet 3.21f-Linux NewsLink 1.2

From Emiliano@emiliano@example.invalid to comp.lang.tcl on Sat May 2 00:34:54 2026

From Newsgroup: comp.lang.tcl

On Wed, 29 Apr 2026 07:38:23 -0000 (UTC)
Mark Summerfield <m.n.summerfield@gmail.com> wrote:

> I need to run two external processes (on Linux):
>
> pdftotext -tsv one.pdf
> pdftotext -tsv two.pdf

You can use pipes and run the processes in the background, collecting output with the event loop. Here's a rough draft

proc runit {var file} {
    # start from an empty buffer: the sequential test below already set out1/out2
    set ::$var ""
    lassign [chan pipe] cr cw
    exec pdftotext -tsv $file - >@ $cw &
    chan close $cw
    chan configure $cr -blocking 0
    chan event $cr readable [list handle $var $cr]
}
proc handle {var fd} {
    global $var
    append $var [chan read $fd]
    if {[chan eof $fd]} {
        chan close $fd
        set ::done 1
    }
}
puts "sequential: [time {
    set out1 [exec pdftotext -tsv one.pdf -]
    set out2 [exec pdftotext -tsv two.pdf -]
    puts "one.pdf [string length $out1]"
    puts "two.pdf [string length $out2]"
}]"
puts "parallel: [time {
    runit out1 one.pdf
    runit out2 two.pdf
    vwait done
    vwait done
    puts "one.pdf [string length $out1]"
    puts "two.pdf [string length $out2]"
}]"

Regards
--
Emiliano
--- Synchronet 3.21f-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Sat May 2 06:57:06 2026

From Newsgroup: comp.lang.tcl

On Fri, 01 May 2026 22:54:54 +0200, Ralf Fassel wrote:

> Can you tell what is/was the difference to the current solution which
> obviously has no "startup overhead"?

Yes, the difference was that I started out using thread::create etc.,
rather than using tpool. I've put a new version that compares them
all at the end. Anyone can compare timings for themselves if they
have one or two big PDF files (the program needs two but for tests
it is fine if it is the same one).

On an old laptop:

serial (2 LOC) 6.37 sec
multiprocess (19 LOC) 3.33 sec
thread pool (15 LOC) 3.60 sec
threaded (22 LOC) 3.66 sec

I've now gone back to using the multiprocess version.
Here's the full test code.

#!/usr/bin/env tclsh9
# usage: time ./concurrent.tcl <s|m|p|t> <file1.pdf> <file2.pdf>

package require thread 3

const OPT -tsv ;# OR if not supported by older pdftotext use: -bbox
const PDFTOTEXT [auto_execok pdftotext]

proc main {} {
    set pdf1 [lindex $::argv 1]
    set pdf2 [lindex $::argv 2]
    switch [lindex $::argv 0] {
        s { serial $pdf1 $pdf2 }
        m { multiprocess $pdf1 $pdf2 }
        p { thread_pool $pdf1 $pdf2 }
        t { threaded $pdf1 $pdf2 }
    }
}

proc serial {pdf1 pdf2} {
    puts -nonewline "serial "
    set tsv1 [exec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
    set tsv2 [exec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc multiprocess {pdf1 pdf2} {
    puts -nonewline multiprocess
    set p1 [open "|$::PDFTOTEXT $::OPT $pdf1 - 2>@1" r]
    try {
        set p2 [open "|$::PDFTOTEXT $::OPT $pdf2 - 2>@1" r]
        try {
            fconfigure $p1 -blocking 0
            fconfigure $p2 -blocking 0
            set tsv1 ""
            set tsv2 ""
            while {![eof $p1] || ![eof $p2]} {
                append tsv1 [read $p1]
                append tsv2 [read $p2]
                after 1
            }
        } finally {
            close $p2
        }
    } finally {
        close $p1
    }
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc thread_pool {pdf1 pdf2} {
    puts -nonewline "thread pool "
    set pool [tpool::create -minworkers 2]
    set job1 [tpool::post -nowait $pool \
        "exec $::PDFTOTEXT $::OPT $pdf1 - 2>@1"]
    set job2 [tpool::post -nowait $pool \
        "exec $::PDFTOTEXT $::OPT $pdf2 - 2>@1"]
    set job_ids [list $job1 $job2]
    while {[llength $job_ids] > 0} {
        foreach job_id [tpool::wait $pool $job_ids job_ids] {
            if {$job_id eq $job1} {
                set tsv1 [tpool::get $pool $job_id]
            } else {
                set tsv2 [tpool::get $pool $job_id]
            }
        }
    }
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc threaded {pdf1 pdf2} {
    puts -nonewline "threaded "
    set tid1 [thread::create -joinable]
    set tid2 [thread::create -joinable]
    tsv::set shared pdf1 $pdf1
    tsv::set shared pdf2 $pdf2
    tsv::set shared pdftotext $::PDFTOTEXT
    tsv::set shared opt $::OPT
    thread::send -async $tid1 {
        tsv::set shared tsv1 \
            [exec -encoding utf-8 {*}[tsv::get shared pdftotext] \
                [tsv::get shared opt] [tsv::get shared pdf1] - 2>@1]
    }
    thread::send -async $tid2 {
        tsv::set shared tsv2 \
            [exec -encoding utf-8 {*}[tsv::get shared pdftotext] \
                [tsv::get shared opt] [tsv::get shared pdf2] - 2>@1]
    }
    thread::release $tid1
    thread::join $tid1
    thread::release $tid2
    thread::join $tid2
    set tsv1 [tsv::get shared tsv1]
    set tsv2 [tsv::get shared tsv2]
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

main
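The `threaded` version above moves arguments and results through `tsv` shared variables. A possible alternative, sketched under the assumption that Thread's `thread::send -async id script varname` form is used: it sets `varname` in the calling thread once the script finishes, provided the caller is in the event loop (here via `vwait`). `threads_run` and the `::threads_run_result_N` globals are hypothetical names, and error handling is omitted.

```tcl
package require Thread

# Hypothetical helper: one plain thread per command (each command is a list);
# results come back through thread::send's result variable.
proc threads_run {cmds} {
    set tids {}
    set n 0
    foreach cmd $cmds {
        set tid [thread::create]   ;# default thread script is thread::wait
        lappend tids $tid
        # [list] builds the script, so arguments need no tsv round-trip.
        thread::send -async $tid [list exec {*}$cmd] ::threads_run_result_$n
        incr n
    }
    set results {}
    for {set i 0} {$i < $n} {incr i} {
        set var ::threads_run_result_$i
        # The variable is set from the event loop; wait only if not yet set.
        if {![info exists $var]} {
            vwait $var
        }
        lappend results [set $var]
        unset $var
    }
    foreach tid $tids {
        thread::release $tid
    }
    return $results
}
```

Whether this avoids the start-up overhead Mark observed with `thread::create` is untested here; it only removes the shared-variable plumbing.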
--- Synchronet 3.21f-Linux NewsLink 1.2

From Ashok@apnmbx-public@yahoo.com to comp.lang.tcl on Sat May 9 16:03:16 2026

From Newsgroup: comp.lang.tcl

Shameless plug...

Bit late to the topic, but the simplest way to parallelize multiple
processes or threads and wait for completion is promises, if you do not
mind an external package. Bit of a learning curve however.

lappend promises [promise::pexec pdftotext pdf1.pdf pdf1.txt]
lappend promises [promise::pexec pdftotext pdf2.pdf pdf2.txt]
set waiter [promise::all $promises]
# Assumes eventloop not running!
promise::eventloop $waiter

Timing:

% time {demo} <- using promises
2606403 microseconds per iteration
% time {demo2} <- sequential exec's
4762417 microseconds per iteration

https://wiki.tcl-lang.org/page/promise
https://tcl-promise.magicsplat.com/
https://www.magicsplat.com/blog/tags/promises/

--- Synchronet 3.22a-Linux NewsLink 1.2

From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Mon May 11 11:08:06 2026

From Newsgroup: comp.lang.tcl

* Ashok <apnmbx-public@yahoo.com>
| Shameless plug...

| Bit late to the topic, but the simplest way to parallelize multiple
| processes or threads and wait for completion is promises, if you do
| not mind an external package. Bit of a learning curve however. --<snip-snip>--
| https://tcl-promise.magicsplat.com/

Ashok,
since coroutines are already part of TCL, any chance of getting promises
into the core? It would seem to me as a 'natural' addition for async
features in TCL, and the package looks quite mature...

R'
--- Synchronet 3.22a-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Tue May 12 08:40:29 2026

From Newsgroup: comp.lang.tcl

On Sat, 9 May 2026 16:03:16 +0530, Ashok wrote:

> lappend promises [promise::pexec pdftotext pdf1.pdf pdf1.txt]
> lappend promises [promise::pexec pdftotext pdf2.pdf pdf2.txt]
> set waiter [promise::all $promises]
> # Assumes eventloop not running!
> promise::eventloop $waiter

I tried it but hit a problem. Here's the code I used:

proc promised {pdf1 pdf2} {
    set p1 [promise::pexec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
    set p2 [promise::pexec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
    set waiter [promise::all [list $p1 $p2]]
    # Assumes eventloop not running!
    promise::eventloop $waiter
    set tsv1 [$p1 getdata]
    set tsv2 [$p2 getdata]
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

I used promise-1.2.0.tm. Here's the error:

$ ./concurrent.tcl P ~/commercial/pdfs/boson[12].pdf
invalid command name "::oo::Obj22"
    while executing
"$p1 getdata"
    (procedure "promised" line 7)
    invoked from within
"promised $pdf1 $pdf2 "
    (procedure "main" line 9)
    invoked from within
"main"
    (file "./concurrent.tcl" line 112)

--- Synchronet 3.22a-Linux NewsLink 1.2

From Ralf Fassel@ralfixx@gmx.de to comp.lang.tcl on Tue May 12 16:58:55 2026

From Newsgroup: comp.lang.tcl

* Mark Summerfield <m.n.summerfield@gmail.com>
| > https://wiki.tcl-lang.org/page/promise
| > https://tcl-promise.magicsplat.com/
| > https://www.magicsplat.com/blog/tags/promises/

| I tried it but hit a problem. Here's the code I used:

| proc promised {pdf1 pdf2} {
| set p1 [promise::pexec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
| set p2 [promise::pexec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
| set waiter [promise::all [list $p1 $p2]]
| # Assumes eventloop not running!
| promise::eventloop $waiter
| set tsv1 [$p1 getdata]
| set tsv2 [$p2 getdata]

promise::eventloop already returns the result of the 'waiter' promise
(i.e. those registered in promise::all).

So change those two 'getdata' calls to

lassign [promise::eventloop $waiter] tsv1 tsv2

| puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"

HTH
R'
--- Synchronet 3.22a-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Wed May 13 08:54:56 2026

From Newsgroup: comp.lang.tcl

On Tue, 12 May 2026 16:58:55 +0200, Ralf Fassel wrote:

> promise::eventloop already returns the result of the 'waiter' promise
> (i.e. those registered in promise::all).
>
> So change those two 'getdata' calls to
>
> lassign [promise::eventloop $waiter] tsv1 tsv2

Thanks, I've now done that. Here are the new timings (each is the best
of several):

sec method
2.010 serial
1.052 multiprocess
1.065 thread_pool
1.067 threaded
8.366 promised
--- Synchronet 3.22a-Linux NewsLink 1.2

From meshparts@alexandru.dadalau@meshparts.de to comp.lang.tcl on Thu May 14 10:00:10 2026

From Newsgroup: comp.lang.tcl

On 13.05.2026 at 10:54, Mark Summerfield wrote:

> Thanks, I've now done that. Here are the new timings (each is the best
> of several):
>
> sec method
> 2.010 serial
> 1.052 multiprocess
> 1.065 thread_pool
> 1.067 threaded
> 8.366 promised

So with promises it's 4x slower than serial?
--- Synchronet 3.22a-Linux NewsLink 1.2

From Ashok@apnmbx-public@yahoo.com to comp.lang.tcl on Thu May 14 17:12:29 2026

From Newsgroup: comp.lang.tcl

I would not be in support of this myself. As it is, I'm skeptical of
adding packages to the core because there simply are not enough folks to maintain the packages already there.

Additionally, promises are still a "fringe" idiom in Tcl land and not
widely used or adopted.

/Ashok

On 5/11/2026 2:38 PM, Ralf Fassel wrote:

> Ashok,
> since coroutines are already part of TCL, any chance of getting promises
> into the core? It would seem to me as a 'natural' addition for async
> features in TCL, and the package looks quite mature...
>
> R'

--- Synchronet 3.22a-Linux NewsLink 1.2

From Ashok@apnmbx-public@yahoo.com to comp.lang.tcl on Thu May 14 17:20:51 2026

From Newsgroup: comp.lang.tcl

I'm surprised by the promises result below (not that I doubt it). I'll
have to take a look when I have some time.

In my tests that I posted earlier, the promise version took about the
same time as the multiprocess one.

The difference between my example and yours is that in my example,
pdftotext was writing to a file and not to its stdout. In your example,
it is writing back to the pipe and read directly in Tcl.

I wonder if the difference stems from your code essentially doing a busy
loop reading data while the promise version goes through the event loop
though I cannot explain why that would make that much difference.

Worth investigating further when I have time...

/Ashok

On 5/13/2026 2:24 PM, Mark Summerfield wrote:

> Thanks, I've now done that. Here are the new timings (each is the best
> of several):
>
> sec method
> 2.010 serial
> 1.052 multiprocess
> 1.065 thread_pool
> 1.067 threaded
> 8.366 promised

--- Synchronet 3.22a-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Fri May 15 10:38:01 2026

From Newsgroup: comp.lang.tcl

On Thu, 14 May 2026 17:20:51 +0530, Ashok wrote:

> I'm surprised by the promises result below (not that I doubt it). I'll
> have to take a look when I have some time.

In the hope it helps, below is the full source for the example I used.
I ran it on Tcl/Tk 9.0.3 (64-bit), Debian GNU/Linux 12 (bookworm)
Linux 6.1.0-44-amd64 (x86_64), 12th Gen Intel Core i7-12700 20 cores.
I used two PDF files both of 647 pages and did several runs of each
method to find the best time.

#!/usr/bin/env tclsh9
# usage: time ./concurrent.tcl <s|m|p|t|P> <file1.pdf> <file2.pdf>

package require thread 3
tcl::tm::path add .
package require promise

const OPT -tsv ;# If older pdftotext doesn't support -tsv use -bbox
const PDFTOTEXT [auto_execok pdftotext]

proc main {} {
    set pdf1 [lindex $::argv 1]
    set pdf2 [lindex $::argv 2]
    switch [lindex $::argv 0] {
        h - -h - --help {
            puts "usage: <s|m|p|t|P> <file1.pdf> <file2.pdf>"
            exit
        }
        s { serial $pdf1 $pdf2 }
        m { multiprocess $pdf1 $pdf2 }
        p { thread_pool $pdf1 $pdf2 }
        t { threaded $pdf1 $pdf2 }
        P { promised $pdf1 $pdf2 }
    }
}

proc serial {pdf1 pdf2} {
    puts -nonewline "serial "
    set tsv1 [exec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
    set tsv2 [exec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc multiprocess {pdf1 pdf2} {
    puts -nonewline multiprocess
    set p1 [open "|$::PDFTOTEXT $::OPT $pdf1 - 2>@1" r]
    try {
        set p2 [open "|$::PDFTOTEXT $::OPT $pdf2 - 2>@1" r]
        try {
            fconfigure $p1 -blocking 0
            fconfigure $p2 -blocking 0
            set tsv1 ""
            set tsv2 ""
            while {![eof $p1] || ![eof $p2]} {
                append tsv1 [read $p1]
                append tsv2 [read $p2]
                after 1
            }
        } finally {
            close $p2
        }
    } finally {
        close $p1
    }
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc thread_pool {pdf1 pdf2} {
    puts -nonewline "thread pool "
    set pool [tpool::create -minworkers 2]
    set job1 [tpool::post -nowait $pool \
        "exec $::PDFTOTEXT $::OPT $pdf1 - 2>@1"]
    set job2 [tpool::post -nowait $pool \
        "exec $::PDFTOTEXT $::OPT $pdf2 - 2>@1"]
    set job_ids [list $job1 $job2]
    while {[llength $job_ids] > 0} {
        foreach job_id [tpool::wait $pool $job_ids job_ids] {
            if {$job_id eq $job1} {
                set tsv1 [tpool::get $pool $job_id]
            } else {
                set tsv2 [tpool::get $pool $job_id]
            }
        }
    }
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc threaded {pdf1 pdf2} {
    puts -nonewline "threaded "
    set tid1 [thread::create -joinable]
    set tid2 [thread::create -joinable]
    tsv::set shared pdf1 $pdf1
    tsv::set shared pdf2 $pdf2
    tsv::set shared pdftotext $::PDFTOTEXT
    tsv::set shared opt $::OPT
    thread::send -async $tid1 {
        tsv::set shared tsv1 \
            [exec -encoding utf-8 {*}[tsv::get shared pdftotext] \
                [tsv::get shared opt] [tsv::get shared pdf1] - 2>@1]
    }
    thread::send -async $tid2 {
        tsv::set shared tsv2 \
            [exec -encoding utf-8 {*}[tsv::get shared pdftotext] \
                [tsv::get shared opt] [tsv::get shared pdf2] - 2>@1]
    }
    thread::release $tid1
    thread::join $tid1
    thread::release $tid2
    thread::join $tid2
    set tsv1 [tsv::get shared tsv1]
    set tsv2 [tsv::get shared tsv2]
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

proc promised {pdf1 pdf2} {
    puts -nonewline "promised "
    set p1 [promise::pexec $::PDFTOTEXT $::OPT $pdf1 - 2>@1]
    set p2 [promise::pexec $::PDFTOTEXT $::OPT $pdf2 - 2>@1]
    set waiter [promise::all [list $p1 $p2]]
    # Assumes eventloop not running!
    lassign [promise::eventloop $waiter] tsv1 tsv2
    puts " tsv1=[string length $tsv1] tsv2=[string length $tsv2]"
}

main
--- Synchronet 3.22a-Linux NewsLink 1.2

From Ashok@apnmbx-public@yahoo.com to comp.lang.tcl on Sun May 17 12:36:55 2026

From Newsgroup: comp.lang.tcl

On 5/15/2026 4:08 PM, Mark Summerfield wrote:

> In the hope it helps, below is the full source for the example I used.
> I ran it on Tcl/Tk 9.0.3 (64-bit), Debian GNU/Linux 12 (bookworm)
> Linux 6.1.0-44-amd64 (x86_64), 12th Gen Intel Core i7-12700 20 cores.
> I used two PDF files both of 647 pages and did several runs of each
> method to find the best time.

Thanks, having a benchmark source helped. However I cannot reproduce
your results. The promise version is as fast as any other. My laptop is
long in the tooth but that should not make a difference in comparative
terms I think.

Below is what I get using the following shell script:

------
#!/bin/sh

for method in s m p t P; do
    for i in $(seq 1 5); do
        time -p ~/tcl/9.0.3/x64/bin/tclsh9.0 bench.tcl $method x.pdf y.pdf 2>&1 | tr '\n' ' '
        # Appends to previous line!
        echo "Method $method, Run $i"
    done
    echo "---------------------------------------------------------"
done
-----

I used -bbox instead of -tsv as my pdftotext does not support the
latter. Tests were run on Ubuntu 22 under WSL. All versions, including promises, are
about twice as fast as the serial one. On every run, there seem to be
one or two exceptionally fast anomalies, independent of the method. Not
sure why that is; some fortunate cache or memory effect?

Here are the results, more or less as expected.

serial tsv1=3559848 tsv2=3559848 real 5.59 user 0.82 sys 0.27 Method s, Run 1
serial tsv1=3559848 tsv2=3559848 real 5.50 user 0.82 sys 0.29 Method s, Run 2
serial tsv1=3559848 tsv2=3559848 real 5.61 user 0.82 sys 0.26 Method s, Run 3
serial tsv1=3559848 tsv2=3559848 real 4.24 user 0.83 sys 0.25 Method s, Run 4
serial tsv1=3559848 tsv2=3559848 real 5.59 user 0.81 sys 0.27 Method s, Run 5
---------------------------------------------------------
multiprocess tsv1=3559849 tsv2=3559849 real 3.13 user 0.95 sys 0.33 Method m, Run 1
multiprocess tsv1=3559849 tsv2=3559849 real 3.05 user 0.89 sys 0.36 Method m, Run 2
multiprocess tsv1=3559849 tsv2=3559849 real 3.12 user 0.95 sys 0.31 Method m, Run 3
multiprocess tsv1=3559849 tsv2=3559849 real 3.13 user 0.96 sys 0.30 Method m, Run 4
multiprocess tsv1=3559849 tsv2=3559849 real 3.13 user 0.97 sys 0.30 Method m, Run 5
---------------------------------------------------------
thread pool tsv1=3559848 tsv2=3559848 real 3.21 user 0.93 sys 0.40 Method p, Run 1
thread pool tsv1=3559848 tsv2=3559848 real 3.15 user 0.90 sys 0.39 Method p, Run 2
thread pool tsv1=3559848 tsv2=3559848 real 1.79 user 0.94 sys 0.37 Method p, Run 3
thread pool tsv1=3559848 tsv2=3559848 real 3.14 user 0.97 sys 0.31 Method p, Run 4
thread pool tsv1=3559848 tsv2=3559848 real 3.10 user 0.90 sys 0.37 Method p, Run 5
---------------------------------------------------------
threaded tsv1=3559848 tsv2=3559848 real 3.17 user 0.90 sys 0.41 Method t, Run 1
threaded tsv1=3559848 tsv2=3559848 real 3.14 user 0.90 sys 0.39 Method t, Run 2
threaded tsv1=3559848 tsv2=3559848 real 3.14 user 0.90 sys 0.37 Method t, Run 3
threaded tsv1=3559848 tsv2=3559848 real 3.14 user 0.94 sys 0.31 Method t, Run 4
threaded tsv1=3559848 tsv2=3559848 real 3.14 user 0.92 sys 0.35 Method t, Run 5
---------------------------------------------------------
promise tsv1=3559849 tsv2=3559849 real 3.33 user 2.68 sys 0.42 Method P, Run 1
promise tsv1=3559849 tsv2=3559849 real 3.30 user 2.48 sys 0.46 Method P, Run 2
promise tsv1=3559849 tsv2=3559849 real 1.94 user 2.48 sys 0.46 Method P, Run 3
promise tsv1=3559849 tsv2=3559849 real 3.35 user 2.69 sys 0.39 Method P, Run 4
promise tsv1=3559849 tsv2=3559849 real 3.31 user 2.64 sys 0.41 Method P, Run 5
--- Synchronet 3.22a-Linux NewsLink 1.2

From Mark Summerfield@m.n.summerfield@gmail.com to comp.lang.tcl on Mon May 18 07:11:14 2026

From Newsgroup: comp.lang.tcl

The most obvious difference is that I am running Linux directly on the
hardware and (I think) you are running Linux on Windows (WSL). All I can
suggest is trying the same test on Linux running directly on the hardware.
--- Synchronet 3.22a-Linux NewsLink 1.2
