On 03/09/2024 18:54, Stephen Fuld wrote:
On 9/2/2024 11:23 PM, David Brown wrote:
On 02/09/2024 18:46, Stephen Fuld wrote:
On 9/2/2024 1:23 AM, Terje Mathisen wrote:
Anyway, that is all mostly moot since I'm using Rust for this kind
of programming now. :-)
Can you talk about the advantages and disadvantages of Rust versus C?
And also for Rust versus C++ ?
I asked about C versus Rust as Terje explicitly mentioned those two
languages, but you make a good point in general.
I want to know about both :-)
In my field, small-systems embedded development, C has been dominant for
a long time, but C++ use is increasing. Most of my new stuff in recent times has been C++. There are some in the field who are trying out
Rust, so I need to look into it myself - either because it is a better
choice than C++, or because customers might want it.
My impression - based on hearsay for Rust as I have no experience -
is that the key point of Rust is memory "safety". I use scare-quotes
here, since it is simply about correct use of dynamic memory and
buffers.
I agree that memory safety is the key point, although I gather that it
has other features that many programmers like.
Sure. There are certainly plenty of things that I think are a better
idea in a modern programming language and that make it a good step up compared to C. My key interest is in comparison to C++ - it is a step
up in some ways, a step down in others, and a step sideways in many features. But is it overall up or down, for /my/ uses?
Examples of things that I think are good in Rust are making variables immutable by default and pattern matching. Steps down include the lack of function overloading.
On Fri, 13 Sep 2024 04:12:21 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Michael S <already5chosen@yahoo.com> writes:
struct {
char x[8];
int y;
} bar;
bar.y = 0; bar.x[8] = 42;
IMHO, here behavior should be fully defined by implementation. And
in practice it is. Just not in theory.
Do you mean union rather than struct? And do you mean bar.x[7]
rather than bar.x[8]? Surely no one would expect that storing
into bar.x[8] should be well-defined behavior.
If the code were this
union {
char x[8];
int y;
} bar;
bar.y = 0; bar.x[7] = 42;
and assuming sizeof(int) == 4, what is it that you think should
be defined by the C standard but is not? And the same question
for a struct if that is what you meant.
No, I mean struct and I mean 8.
And I mean that a typical implementation-defined behavior would be
bar.y==42 on LE machines and bar.y==42*2**24 on BE machines.
As it actually happens in reality with all production compilers.
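(A minimal sketch of what is being claimed, hypothetical and formally undefined behavior per the C standard since the store is one past the end of x; the observation is that production compilers lay this struct out with y immediately after x, so the stray byte lands in y's lowest-addressed byte:)

  #include <stdio.h>

  struct s {
      char x[8];   /* assuming no padding between x and y */
      int  y;      /* and assuming sizeof(int) == 4       */
  };

  int main(void) {
      struct s bar;
      bar.y = 0;
      bar.x[8] = 42;          /* out of bounds: UB per the standard */
      printf("%d\n", bar.y);  /* typically 42 on LE, 42*2**24 on BE */
      return 0;
  }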
Michael S <already5chosen@yahoo.com> schrieb:
No, I mean struct and I mean 8.
And I mean that a typical implementation-defined behavior would be
bar.y==42 on LE machines and bar.y==42*2**24 on BE machines.
As it actually happens in reality with all production compilers.
Ah, you want to re-introduce Fortran's storage association and
common blocks, but without the type safety.
Good idea, that.
That created *really* interesting bugs, and Real Programmers (TM)
have to have something that pays their salaries, right?
SCNR
On Fri, 13 Sep 2024 21:39:39 +0000, Thomas Koenig wrote:
Ah, you want to re-introduce Fortran's storage association and
common blocks, but without the type safety.
FORTRAN allowed::
subroutine1:
     COMMON /ALPHA/ i,j,k,l,m,n
subroutine2:
     COMMON /ALPHA/ x,y,z
expecting {i,j}, which are INTEGER*4, to overlap with x, a REAL*8; ...
{Completely neglecting the BE/LE problems,...}
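(A rough C analog of that storage association, a hypothetical sketch; the COMMON overlap behaves much like a union, with the same endianness hazards:)

  #include <stdio.h>

  /* Two views of the same storage, as with COMMON /ALPHA/. */
  union alpha {
      int    i[6];    /* INTEGER*4 view: i,j,k,l,m,n */
      double x[3];    /* REAL*8 view:    x,y,z       */
  };

  int main(void) {
      union alpha a;
      a.x[0] = 1.0;
      /* {i[0], i[1]} alias the bytes of x[0]; which half holds the
         exponent depends on endianness -- the "interesting bugs". */
      printf("%08x %08x\n", (unsigned)a.i[0], (unsigned)a.i[1]);
      return 0;
  }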
On 9/13/2024 10:55 AM, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> schrieb:
Most of the commonly used parts of C99 have been "safe" to use for 20
years. There were a few bits that MSVC did not implement until
relatively recently, but I think even MSVC has caught up now.
What about VLAs?
IIRC, VLAs and _Complex and similar still don't work in MSVC.
Most of the rest does now at least.
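(For reference, the kind of C99 feature at issue, a minimal sketch; VLAs were made optional in C11, and MSVC has never implemented them:)

  #include <stdio.h>

  /* C99 variable-length array: the dimension is a runtime value. */
  void print_squares(int n) {
      int a[n];                   /* VLA: rejected by MSVC */
      for (int i = 0; i < n; i++)
          a[i] = i * i;
      for (int i = 0; i < n; i++)
          printf("%d ", a[i]);
      printf("\n");
  }

  int main(void) {
      print_squares(5);
      return 0;
  }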
On Thu, 19 Sep 2024 19:12:41 +0000, Brett wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Thu, 19 Sep 2024 15:07:11 +0000, EricP wrote:
- register specifier fields are either source or dest, never both
I happen to be wishywashy on this
This is deeply interesting; can you expound on why it is fine for a register field to be shared by loads and stores, and sometimes both, as in x86?
My 66000 encodes store data register in the same field position as it
encodes "what kind of branch" is being performed, and the same position
as all calculation (and load) results.
I started doing this in 1982 with Mc88100 ISA, and never found a problem
with the encoding nor in the decoding nor with the pipelining of it.
Let me be clear, I do not support necessarily damaging a source operand
to fit in another destination as::
ADD SP,SP,#0x40
by specifying SP only once in the instruction.
So,
+------+-----+-----+----------------+
| major| Rd | Rs1 | whatever |
+------+-----+-----+----------------+
| BC | cnd | Rs1 | label offset |
+------+-----+-----+----------------+
| LD | Rd | Rb | displacement |
+------+-----+-----+----------------+
| ST | Rs0 | Rb | displacement |
+------+-----+-----+----------------+
Is:
a) no burden in encoding
b) no burden in decoding
c) no burden in pipelining
d) no burden in stealing the Store data port late in the pipeline
{in particular, this saves lots of flip-flops deferring store
data until after cache hit, TLB hit, and data has arrived at
cache.}
I disagree with things like::
+------+-----+-----+----------------+
| big OpCode | Rds | whatever |
+------+-----+-----+----------------+
Where Rds means the specifier is used as both a source and destination.
Notice in my encoding one can ALWAYS take the register specification
fields and wire them directly into the RF/renamer decoder ports.
You lose this property the other way around.
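(A minimal decoder sketch of that property; the bit positions here are hypothetical, the point being that the register specifiers sit at fixed positions regardless of major opcode, so they can drive the RF/renamer ports before decode finishes:)

  #include <stdint.h>

  /* Hypothetical fixed field positions, in the spirit of the table above. */
  #define MAJOR(i)     (((i) >> 26) & 0x3F)   /* major opcode        */
  #define FIELD_D(i)   (((i) >> 21) & 0x1F)   /* Rd / cnd / Rs0 slot */
  #define FIELD_S1(i)  (((i) >> 16) & 0x1F)   /* Rs1 / Rb slot       */

  /* Port indices come straight from the instruction word; only the USE
     of each port (dest vs. source vs. condition) waits on MAJOR(),
     off the critical path. */
  void start_regfile_access(uint32_t insn,
                            unsigned *port_a, unsigned *port_b) {
      *port_a = FIELD_D(insn);   /* store data Rs0, or dest Rd (ignored) */
      *port_b = FIELD_S1(insn);  /* always a source: Rs1 or base Rb      */
  }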
MitchAlsup1 wrote:
I assume in your examples that you want to start your register file
read access and/or rename register lookup access in the decode stage,
and not wait to start at the end of the decode stage.
Effectively pipelining those accesses.
That's fine.
But that's my point - it doesn't make a difference because in both
cases you can wire the reg fields to the reg file or rename directly
and start the access ASAP.
In both cases the enable signal determining what to do shows up
later after decode has done its thing. And the critical path for
that decode enable signal is the same both ways.
And if you are not doing this early access start, but the traditional approach of latching the decode output THEN starting your RegRd or Rename access, it makes no timing difference at all.
Allowing the opcode-Rds style instructions to be *CONSIDERED* opens an avenue to potential instructions that cost little or nothing extra in terms of logic or performance.
And this is particularly useful with fixed width 32-bit instructions where one is trying to pack as much function into a fixed size space as possible. Even more so with 16-bit compact instructions.
For example, a 32-bit fixed format instruction with four 5-bit registers could do a full width integer multiply wide-accumulate
IMAC (Rsd_hi,Rsd_lo) = (Rsd_hi,Rsd_lo) + Rs1 * Rs2
with little more logic than the existing MULL,MULH approach.
It still only needs 2 read ports because Rs1,Rs2 are read first to start
the multiply, then (Rsd_hi,Rsd_lo) second as they aren't needed until
late in the multiply-accumulate.
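(In C terms, the semantics being described are roughly as below, a sketch; unsigned __int128 is a GCC/Clang extension standing in for the register pair:)

  #include <stdint.h>

  /* IMAC (Rsd_hi,Rsd_lo) = (Rsd_hi,Rsd_lo) + Rs1 * Rs2
     The accumulator pair is read late: the Rs1*Rs2 product is started
     first, so two read ports suffice when time-multiplexed. */
  void imac(uint64_t *rsd_hi, uint64_t *rsd_lo,
            uint64_t rs1, uint64_t rs2) {
      unsigned __int128 acc =
          ((unsigned __int128)*rsd_hi << 64) | *rsd_lo;
      acc += (unsigned __int128)rs1 * rs2;   /* full 64x64 -> 128 */
      *rsd_lo = (uint64_t)acc;
      *rsd_hi = (uint64_t)(acc >> 64);
  }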
On 9/18/2024 1:42 PM, MitchAlsup1 wrote:
One simple option would be to assume an instruction looks like:
[Prefix Bytes]
[REX byte]
OP_Byte | 0F+OP_Byte
Mod/RM + SIB + ...
And then use a heuristic to try to guess how to interpret the
instruction stream based on "looks better" (more likely to be aligned
with the instruction stream vs random unaligned garbage).
Though, such a "looks good" heuristic could itself risk skewing the
results.
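(A very rough sketch of such a length guesser, all simplifications deliberate: it ignores immediates, the 0F 38/3A opcode maps, and many special cases, which is exactly where the skew risk comes from:)

  #include <stdint.h>
  #include <stddef.h>

  static int is_prefix(uint8_t b) {     /* legacy + segment prefixes */
      return b == 0x66 || b == 0x67 || b == 0xF0 || b == 0xF2 ||
             b == 0xF3 || b == 0x2E || b == 0x36 || b == 0x3E ||
             b == 0x26 || b == 0x64 || b == 0x65;
  }

  size_t guess_length(const uint8_t *p) {
      size_t n = 0;
      while (is_prefix(p[n])) n++;           /* prefix bytes         */
      if ((p[n] & 0xF0) == 0x40) n++;        /* REX byte             */
      if (p[n] == 0x0F) n++;                 /* two-byte opcode map  */
      n++;                                   /* opcode byte          */
      uint8_t modrm = p[n++];                /* assume ModRM present */
      uint8_t mod = modrm >> 6, rm = modrm & 7;
      if (mod != 3 && rm == 4) n++;          /* SIB byte             */
      if (mod == 1) n += 1;                  /* disp8                */
      else if (mod == 2) n += 4;             /* disp32               */
      else if (mod == 0 && rm == 5) n += 4;  /* RIP-relative disp32  */
      return n;                              /* immediates not counted */
  }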
I may still consider defining an encoding for this, but not yet. It is
in a similar boat as auto-increment. Both add resource cost with
relatively little benefit in terms of overall performance.
Auto-increment because if one has superscalar, the increment can usually be co-executed. And, full [Rb+Ri*Sc+Disp], because it is just too
infrequent to really justify the extra cost of a 3-way adder even if
limited mostly to the low-order bits...
Myopathy--look it up.
OK.
Not sure how that is related (a medical condition involving muscle defects...).
Can also note that a worthwhile design goal is to not add significant
cost over what would be needed for a plain RV64GC implementation, but,
could define a [Rb+Ri*Sc+Disp] encoding or similar if it would likely be beneficial enough to justify its existence.
Myopathy is NEAR SIGHTEDNESS.
On 9/22/24 6:19 PM, MitchAlsup1 wrote:
On 9/19/24 11:07 AM, EricP wrote:
<sound of soap box being dragged out>
This idea that macro-op fusion is some magic solution is bullshit.
The argument is, at best, of Academic Quality, made by a student at the time as a way to justify RISC-V not having certain calculations that are easy for HW to perform.
The RISC-V published argument for fusion is not great, but fusion
(and cracking/fission) seem natural architectural mechanisms *if*
one is stuck with binary compatibility.
On 9/22/24 6:19 PM, MitchAlsup1 wrote:
On Sun, 22 Sep 2024 20:43:38 +0000, Paul A. Clayton wrote:
On 9/19/24 11:07 AM, EricP wrote:
[snip]
If the multiplier is pipelined with a latency of 5 and throughput
of 1,
then MULL takes 5 cycles and MULL,MULH takes 6.
But those two multiplies still are tossing away 50% of their work.
I do not remember how multipliers are actually implemented — and
am not motivated to refresh my memory at the moment — but I
thought a multiply low would not need to generate the upper bits,
so I do not understand where your "50% of their work" is coming
from.
+-----------+     +-----------+
 \  mplier /       \  mcand  /     Big input mux
  +-------+         +-------+
      |                 |
      +--------+--------+
               |
           /        /
          /  Tree  /
         /        /
+---------------+-----------+
|      hi       |    low    |     Products
+---------------+-----------+
Two n-bit operands are multiplied into a 2×n-bit result.
{{All the rest is HOW not what}}
So are you saying the high bits come for free? This seems
contrary to the conception of sums of partial products, where
some of the partial products are only needed for the upper bits
and so could (it seems to me) be uncalculated if one only wanted
the lower bits.
The high result needs the low result carry-out but not the rest of
the result. (An approximate multiply high for multiply by
reciprocal might be useful, avoiding the low result work. There
might also be ways that a multiplier could be configured to also
provide bit mixing similar to middle result for generating a
hash?)
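(A sketch of the partial-product split in C, decomposing into 32-bit halves; it shows why the low half never needs a1*b1, while the high half needs everything plus the carries out of the low half:)

  #include <stdint.h>

  /* 64x64 -> 128 via 32-bit halves: a = a1:a0, b = b1:b0. */
  void mul64_parts(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
      uint64_t a0 = (uint32_t)a, a1 = a >> 32;
      uint64_t b0 = (uint32_t)b, b1 = b >> 32;

      uint64_t p00 = a0 * b0;   /* feeds low (and carries upward) */
      uint64_t p01 = a0 * b1;   /* straddles the hi/lo boundary   */
      uint64_t p10 = a1 * b0;   /* straddles the hi/lo boundary   */
      uint64_t p11 = a1 * b1;   /* feeds high ONLY                */

      uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
      *lo = (mid << 32) | (uint32_t)p00;   /* p11 never touches *lo */
      *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
  }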
I seem to recall a PowerPC implementation did semi-pipelined 32-
bit multiplication 16-bits at a time. This presumably saved area
and power.
You save 1/2 of the tree area, but ultimately consume more power.
The power consumption would seem to depend on how frequently both
multiplier and multiplicand are larger than 16 bits. (However, I
seem to recall that the mentioned implementation only checked one
operand.) I suspect that for a lot of code, small values are
common.
My 66000's CARRY and PRED are "extender prefixes", admittedly included in the original architecture, thus compensating for encoding constraints (e.g., not having 36-bit instruction parcels) rather than for microarchitectural or architectural variation.
[snip]
(I feel that encoding some of the dependency information could
be useful to avoid some of this work. In theory, common
dependency detection could also be more broadly useful; e.g.,
operand availability detection and execution/operand routing.)
So useful that it is encoded directly in My 66000 ISA.
How so? My 66000 does not provide any explicit declaration what
operation will be using a result (or where an operand is being
sourced from). Register names express the dependencies so the
dataflow graph is implicit.
I was speculating that _knowing_ when an operand will be available
and where a result should be sent (rather than broadcasting) could
be useful information.
Even with reduced operations per cycle, fusion could still provide
a net energy benefit.
Here I disagree:: but for a different reason::
In order for RISC-V to use a 64-bit constant as an operand, it has to execute either:: AUIPC+LD from an area of memory containing the 64-bit constant, or a 6-7 instruction stream to build the constant inline. An ISA that directly supports 64-bit constants does not execute any of those.
Thus, while it may save power when seen at the "it's my ISA" level, when seen from the perspective of "it is directly supported in my ISA" it wastes power.
Yes, but "computing" large immediates is obviously less efficient (except for compression), since the computation part is known to be unnecessary. Fusing a comparison and a branch may be a consequence of bad ISA design in not properly estimating how much work an instruction can do (and be encoded in available space), and there is excess decode overhead with separate instructions, but the individual operations seem to be doing actual work.
I suspect there can be cases where different microarchitectures
would benefit from different amounts of instruction/operation
complexity such that cracking and/or fusion may be useful even in
an optimally designed generic ISA.
[snip]
- register specifier fields are either source or dest, never both
This seems mostly a code density consideration. I think using a
single name for both a source and a destination is not so
horrible, but I am not a hardware guy.
All we HW guys want is that, wherever the field is specified, it is specified in exactly one field in the instruction. So, if field<a..b> is used to specify Rd in one instruction, there is no other field<!a..!b> that specifies the Rd register. RISC-V blew this "requirement".
Only with the Compressed extension, I think. The Compressed
extension was somewhat rushed and, in my opinion, philosophically
flawed by being redundant (i.e., every C instruction can be
expanded to a non-C instruction). Things like My 66000's ENTER
provide code density benefits but are contrary to the simplicity
emphasis. Perhaps a Rho (density) extension would have been
better.☺ (The extension letter idea was interesting for an
academic ISA but has been clearly shown to be seriously flawed.)
16-bit instructions could have kept the same register field
placements with masking/truncation for two-register-field
instructions.
Even a non-destructive form might be provided by
different masking or bit inversion for the destination. However,
providing three register fields seems to require significant
irregularity in extracting register names. (Another technique
would be using opcode bits for specifying part or all of a
register name. Some special purpose registers or groups of
registers may not be horrible for compiler register allocation,
but such seems rather funky/clunky.)
It is interesting that RISC-V chose to split the immediate field
for store instructions so that source register names would be in
the same place for all (non-C) instructions.
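(For reference, a sketch of that split using the RV32/RV64 S-type layout: rs1 and rs2 keep the same bit positions as in R-type instructions, and only the immediate is scattered around them:)

  #include <stdint.h>

  /* S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
     rs1 in bits 19:15, rs2 in bits 24:20 -- same as R-type. */
  static inline int32_t s_imm(uint32_t insn) {
      int32_t imm = (int32_t)(insn & 0xFE000000) >> 20; /* imm[11:5], sign-extended */
      imm |= (insn >> 7) & 0x1F;                        /* imm[4:0]                 */
      return imm;
  }
  static inline unsigned s_rs1(uint32_t insn) { return (insn >> 15) & 0x1F; }
  static inline unsigned s_rs2(uint32_t insn) { return (insn >> 20) & 0x1F; }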
Comparing an ISA design to RISC-V is not exactly the same as
comparing to "best in class".