In the context p=index(substr(t,s),r)
it would not be necessary to copy substr(t,s);
the index() function could operate on the original
using some access "descriptor" (say, a pointer and
a length) in read-only mode.
Will (GNU) Awk make a copy of the data value, or does
it use read-only descriptor access to the already
existing substring of variable "t"?
Currently I'm playing with some huge data, and copying
MB-sized data is costly (if it's done repeatedly
with various substr() subscripts).
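For illustration only: the read-only "descriptor" described above is what C programmers usually call a string view. This is a minimal sketch, not gawk's internals; the str_view type and the view_substr()/view_index() helpers are hypothetical names. Taking the "substring" only adjusts a pointer and a length, and the search runs directly over the original bytes, so no MB-sized copy is made.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only "descriptor": a pointer into existing data
   plus a length.  Nothing is copied and nothing is NUL-terminated. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* View of t from 1-based position s to the end, like awk's substr(t, s),
   but O(1): only the pointer and length change.  s == 0 is treated as 1;
   a start past the end yields an empty view. */
static str_view view_substr(str_view t, size_t s)
{
    str_view v;
    if (s < 1)
        s = 1;
    if (s - 1 >= t.len) {
        v.ptr = t.ptr + t.len;
        v.len = 0;
        return v;
    }
    v.ptr = t.ptr + (s - 1);
    v.len = t.len - (s - 1);
    return v;
}

/* 1-based position of needle inside the view, 0 if absent.
   An empty needle returns 0 (a choice made for this sketch). */
static size_t view_index(str_view hay, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || n > hay.len)
        return 0;
    for (size_t i = 0; i + n <= hay.len; i++)
        if (memcmp(hay.ptr + i, needle, n) == 0)
            return i + 1;
    return 0;
}
```

With t = "abcabcXYZabc", view_index(view_substr(t, 5), "abc") gives 6, the same value awk's index(substr(t,5),"abc") returns, but the view still points into t.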
In article <101f9oo$18edp$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> In the context p=index(substr(t,s),r)
> [...]
> Will (GNU) Awk make a copy of the data value, or does
> it use read-only descriptor access to the already
> existing substring of variable "t"?
substr() makes a copy. This is clear in the code.
It's almost impossible to do this via a read-only descriptor.
Consider something like
    x = substr($0, 10, 15)
    getline
    print x
Here getline replaces $0, so a descriptor pointing into the old
record would no longer refer to the text x is supposed to hold.
Gawk manages the storage such that, for something like
your example, the copy will be released after index()
returns a value.
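A small C simulation of the getline example, assuming (for simplicity) a single record buffer that getline overwrites in place; record, fake_getline(), substr_copy() and copy_survives_getline() are hypothetical stand-ins, not gawk code. The private copy still holds "9ABCDEFGHIJKLMN" after the second fake_getline(), while a bare pointer into the buffer now sees the new record's bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the record buffer behind $0:
   fake_getline() overwrites it in place with the next record. */
static char record[64];

static void fake_getline(const char *next)
{
    strncpy(record, next, sizeof record - 1);
    record[sizeof record - 1] = '\0';
}

/* substr() as described above: make a private copy (1-based start,
   length n; caller frees).  Assumes start and n lie inside src. */
static char *substr_copy(const char *src, size_t start, size_t n)
{
    char *out = malloc(n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src + start - 1, n);
    out[n] = '\0';
    return out;
}

/* x = substr($0, 10, 15); getline; print x
   Returns 1 if the copy is intact and the pointer-only view changed. */
static int copy_survives_getline(void)
{
    fake_getline("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    char *x = substr_copy(record, 10, 15);  /* private copy */
    const char *view = record + 9;          /* pointer-only "descriptor" */
    fake_getline("a completely different record");
    int ok = x != NULL
        && strcmp(x, "9ABCDEFGHIJKLMN") == 0          /* copy unaffected */
        && strncmp(view, "9ABCDEFGHIJKLMN", 15) != 0; /* view now stale */
    free(x);
    return ok;
}
```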
On 31.05.2025 21:07, Mack The Knife wrote:...
> In article <101f9oo$18edp$1@dont-email.me>,
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> In the context p=index(substr(t,s),r)
>> [...]
> substr() makes a copy. This is clear in the code.
Okay. Thanks for checking that!
Okay, maybe I could write an extension that works on memory-mapped
files (the data originally stems from a file) and seeks/reads
through "C" mechanisms. (But that's a huge effort compared to some
natively available function. And then I'd probably be better off
implementing it directly in "C" instead of using Awk in the first
place, since I'd have to implement the GNU Awk extension in "C"
anyway.)
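A rough sketch of the memory-mapped route, assuming a POSIX system (open(), fstat(), mmap()); index_from_mem() and index_in_file() are hypothetical names, and the gawk extension glue is not shown. The file is mapped read-only, so the search from an offset reads the file's pages directly instead of copying them into awk strings. Mapped data is not NUL-terminated, which is why the search uses memcmp() with explicit lengths rather than strstr().

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Search data[start-1 .. len-1] (1-based start, as in awk) for needle.
   Returns the 1-based absolute position, or 0 if not found. */
static size_t index_from_mem(const char *data, size_t len,
                             const char *needle, size_t start)
{
    size_t n = strlen(needle);
    if (n == 0 || start < 1 || start > len)
        return 0;
    for (size_t i = start - 1; n <= len - i; i++)
        if (memcmp(data + i, needle, n) == 0)
            return i + 1;
    return 0;
}

/* Map the whole file read-only and search it without read() copies.
   Returns 0 on any error, for an empty file, or if not found. */
static size_t index_in_file(const char *path, const char *needle,
                            size_t start)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return 0;
    }
    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid */
    if (map == MAP_FAILED)
        return 0;
    size_t pos = index_from_mem(map, len, needle, start);
    munmap(map, len);
    return pos;
}
```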
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
> On 31.05.2025 21:07, Mack The Knife wrote:
>> In article <101f9oo$18edp$1@dont-email.me>,
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> In the context p=index(substr(t,s),r)
>>> [...]
>> substr() makes a copy. This is clear in the code.
> Okay. Thanks for checking that!
> Okay, maybe I could write an extension to work on memory
> mapped files [...]
An alternative (depending on the context) would be to consider an
extension that provides an index function with a third argument giving
the initial offset. I've not looked at how extensions get access to
GAWK strings, so this may not be as easy as it sounds, but I would
guess that it might be relatively simple to do.
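To make the suggestion concrete, a minimal C sketch of the core of such a function, under the hypothetical name index_from(); the gawk extension API glue needed to call it from awk is deliberately left out. It reports the position counted from the beginning of the whole haystack, not from start, and treats a start of 0 like 1.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical index_from(): like awk's index(hay, needle), but the
   search begins at 1-based position start and nothing is copied.
   Returns the 1-based position in the WHOLE string, 0 if not found. */
static size_t index_from(const char *hay, const char *needle, size_t start)
{
    size_t hay_len = strlen(hay);
    if (start < 1)
        start = 1;                  /* treat 0 like 1 */
    if (start > hay_len || *needle == '\0')
        return 0;                   /* out of range, or empty needle */
    const char *hit = strstr(hay + (start - 1), needle);
    return hit ? (size_t)(hit - hay) + 1 : 0;
}
```

With hay = "abcabcXYZabc", index_from(hay, "abc", 2) gives 4, whereas awk's index(substr(hay,2),"abc") gives 3, counted from the start of the copied substring.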
> An alternative (depending on the context) would be to consider an
> extension that provides an index function with a third argument giving
> the initial offset. [...]
This, first of all, sounds like a good idea! It would make it
unnecessary to (mis-)use the substr() function as (sort of) a
costly copying-descriptor.[*]
I'm unsure about using an extension here. Would there be a name
clash between a built-in index(haystack,needle) function and an
extension index(haystack,needle,start) function? Should they be
separate functions in the first place? (I don't think so.)
In article <101hecq$22ab2$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
...
>> An alternative (depending on the context) would be to consider an
>> extension that provides an index function with a third argument giving
>> the initial offset. [...]
> [...]
> I'm unsure about using an extension here. Would there be a name
> clash between a built-in index(haystack,needle) function and an
> extension index(haystack,needle,start) function? Should they be
> separate functions in the first place? (I don't think so.)
Nobody is talking about changing the index() built-in function.
This would be a brand new function, written as an extension library.
You could name it something like index_ex() if you like, or you could give
it a brand new name (**).
[...]
(**) My conception of how this would be implemented would handle the "start"-only case (just add the offset to the "haystack" arg of strstr() - with error checking to make sure it doesn't overflow, of course). Implementing "end" would be a bit trickier (but not much).
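A sketch of the footnote's approach, assuming (our choice, not the poster's) that "end" is an inclusive 1-based position and that end == 0 means "up to the end of the string"; index_range() is a hypothetical name. The start-only case is just an offset on strstr()'s haystack argument. For the end bound it is enough to check the first match, because any later match starts, and therefore ends, later. The trade-off is that strstr() may scan past end before the check rejects the match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical index_range(): first match lying entirely inside the
   1-based inclusive range [start, end]; returns its absolute 1-based
   position, or 0.  end == 0 means "to the end of the string". */
static size_t index_range(const char *hay, const char *needle,
                          size_t start, size_t end)
{
    size_t hay_len = strlen(hay);
    size_t n = strlen(needle);
    if (end == 0 || end > hay_len)
        end = hay_len;              /* clamp instead of overflowing */
    if (start < 1)
        start = 1;
    if (n == 0 || start > end || n > end - start + 1)
        return 0;
    /* Start-only part: just offset the haystack pointer. */
    const char *hit = strstr(hay + (start - 1), needle);
    if (hit == NULL)
        return 0;
    size_t pos = (size_t)(hit - hay) + 1;
    /* The earliest match also ends earliest; if it overruns end,
       no later match can fit. */
    return (pos + n - 1 <= end) ? pos : 0;
}
```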
In article <87h60zrbea.fsf@bsb.me.uk>, Ben Bacarisse <ben@bsb.me.uk> wrote: ...
> An alternative (depending on the context) would be to consider an
> extension that provides an index function with a third argument giving
> the initial offset. [...]
The thing about writing GAWK extensions is that the first one is hard, because it is all new stuff to learn (and you have to establish your own conventions for how your extensions are going to look, code-wise). [...]
By the way, if you find the substring at position 900005 (i.e., the 5th (*) char of the searched string), should the function return 5 or 900005?
(*) Or 6th; I'm not sure of my exact notation at this point.
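The two conventions differ only by the start offset: if p is the position relative to the start s (what index(substr(t,s),r) returns today), the absolute position is p + s - 1, so with s = 900001 a match reported as 5 sits at absolute position 900005. The sketch below, with hypothetical function names, reproduces the current copying idiom in C and converts its result.

```c
#include <stdlib.h>
#include <string.h>

/* Today's idiom p = index(substr(t, s), r): copy the tail, then
   search it.  Returns a position RELATIVE to s (1-based, 0 = none). */
static size_t index_of_substr_copy(const char *t, const char *r, size_t s)
{
    size_t len = strlen(t);
    if (s < 1)
        s = 1;
    if (s > len || *r == '\0')
        return 0;
    char *tail = malloc(len - (s - 1) + 1);   /* the costly copy */
    if (tail == NULL)
        return 0;
    strcpy(tail, t + (s - 1));
    const char *hit = strstr(tail, r);
    size_t rel = hit ? (size_t)(hit - tail) + 1 : 0;
    free(tail);
    return rel;
}

/* Relative -> absolute: absolute = relative + s - 1 (0 stays 0). */
static size_t rel_to_abs(size_t rel, size_t s)
{
    return rel ? rel + (s < 1 ? 1 : s) - 1 : 0;
}
```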
You are describing the individual, practical process of getting
used to writing one's own extensions.
Okay. My viewpoint was a different one.
IMO it's a problem if folks each write their own index() extension,
in their own version and with their own code quality. We see a
proliferation of home-grown versions in all areas of IT, and I
don't see that as a desirable goal.
(Anyway. Changes to the core are discouraged, and it won't happen.)
[*] I already find it a bit, umm, strange to have three different
substitution functions in GNU Awk (two historic/standard ones and one
somewhat generalized and extended).
[**] Do we need grep, egrep, and fgrep (on my system they are not
even hard links), or should grep be used with the options grep -E
and grep -F?
In article <101fv4s$1g5c8$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> Okay, maybe I could write an extension to work on memory
> mapped files - the data originally stems from a file -
> and seek/read through "C" mechanisms. [...]
You could check w/Arnold about his consulting rates. Perhaps
he would do a patch for you, or write an extension function
for you.
[...]
You are, of course, free to continue whining that it should be changed in
the core, by the official developers. Lots of luck with that.
On 03.06.2025 08:56, Mack The Knife wrote:
> You could check w/Arnold about his consulting rates. Perhaps
> he would do a patch for you, or write an extension function
> for you.
This is an extremely stupid comment in so many ways!
In article <1022d33$3copv$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> On 03.06.2025 08:56, Mack The Knife wrote:
>> You could check w/Arnold about his consulting rates. Perhaps
>> he would do a patch for you, or write an extension function
>> for you.
> This is an extremely stupid comment in so many ways!
Just out of curiosity, why?
[...]
I'm not interested in meta-chat. - But you can derive part
of the answer already from my recent reply to Kenny's post
(only a few days ago). HTH.