Yes, but that was a misunderstanding. I'm not suggesting that
load/store instructions can access things at any bit position and any
bit size. Any load or store with a pointer whose last 3 bits are not 0 would
presumably signal an error.
Seems like an odd place to put what are in practice just flag bits.
It's a very natural one, tho.
Byte addressing is somewhat arbitrary
(why 8 bits, why not 16 or 4 or 6 or 9 ...?), whereas bit-addressing has
some logic to it (fractional bit addressing would be hard to define).
I don't see why. We agree that nobody expects bit addressing to work, so in fact those
are flag bits.
You can use them to point at bits if you want but there's no
architectural or practical reason to do so.
8-bits won because it was enough (at the time of inception {IBM
360--1963})
8 bits is also a convenient multiple of four bits, which was common
in many machines prior to the 360. The hardware in Burroughs
BCD machines could automatically add/remove the zone digit (bits <7:4>) during data movement.
MitchAlsup1 <mitchalsup@aol.com> wrote:
FORTRAN COMMON blocks require misaligned accesses to double precision
data.
R E Q U I R E in that it is neither optional nor wise to emulate with
exceptions. It is just barely tolerable using LD/ST Left/Right
instructions out of the compiler.
I, personally, went through enough PAIN with misalignment, that over
time my mood swung from "aligned only" to "completely misaligned"::
a) because there is no performant* SW workaround
b) it is SO easy to fix in HW.
c) once fixed in HW, any SW burden is so small as to be barely
measurable.
I'm not so sure (b) is true. Some cases are moderately easy to handle
in hardware (e.g., misaligned loads that stay within a single L1 D-cache
line), but some cases are harder (e.g., misaligned writes that cross L1
D-cache line boundaries) and might need a microcode trap (awkward if the
design wasn't otherwise using microcode). And some cases are even harder
(e.g., misaligned writes crossing L1 D-cache line boundaries where the
two lines are owned by different CPUs in a cache-coherent multiprocessor)
and might need a millicode trap. And some cases may require going all the
way up to the OS (e.g., misaligned writes that cross virtual-memory-page
boundaries where one page is ok but the other is non-resident).
So, allowing this in the architecture has several costs:
* extra hardware implementation effort to make sure the "hardware" cases
  don't cost an extra gate delay or two on some critical path
* extra complexity and debugging time in hardware and in system software
  (think about writing and *debugging* and *verifying* microcode/millicode
  trap handlers for all those messy write-crossing-cache/page-boundary
  cases, especially their interactions with multiprocessor cache
  coherency)
* this extra effort means a longer design time and/or greater design cost,
  and hence (so long as the state-of-the-art of competing systems is still
  steadily improving with time) a net lower price/performance relative to
  competing systems
And, because of the traps and their overheads (which will likely differ
significantly across different implementations of the same architecture,
e.g., different multiprocessor cache-coherency protocols), any code that
actually *uses* unaligned accesses -- especially unaligned writes -- isn't
performance-portable unless the actual dynamic frequency of unaligned
operations is very low.
So yes, allowing unaligned access does help "dusty deck" Fortran code...
but it comes at a significant cost.
On Sat, 21 Dec 2024 23:22:35 +0000, Jonathan Thornburg wrote:
Any competent programmer will ALIGN his data to the extent possible;
there is no reason to penalize {Compiler, assembler, linker, ld.so,...}
just because you want to take 5 days out of design.
These days, the competence of many programmers can be called into
question :-)
ABIs, however, generally require natural alignment for types, so
the point is somewhat moot, at least where user code is concerned.
I think
the VAX was the last major architecture which specified unaligned
struct access.
MitchAlsup1 <mitchalsup@aol.com> wrote:
Aligned data is always best; misaligned data comes at very low cost.
SW overhead = 0
Thinking about this for a bit... for a clean-sheet architecture
like My66000, could there actually be an advantage to doing
struct layout like the VAX did, with everything aligned on byte
boundaries? On the plus side, there would be lower memory use.
On the minus side... very low cost, as you wrote above.
I highly doubt it. Making unaligned accesses work efficiently is great,
but that's no reason to abuse them:
- Going back to Mitch's description, in case B.1 the misalignment is
truly "free", but for B.2, B.3, and B.4 the misalignment does come at
a cost, not necessarily visible in terms of cycles but at least in
terms of cache bandwidth, which can have an impact on overall speed
and energy use.
Of course, properly aligning your data will also come with costs,
but "packed structs" don't come totally free.
- AFAIK most efforts to support concurrency take it for granted that
atomic accesses are supported only when properly aligned.
I expect it's at least as easy (and more portable) to reorder fields by
order of (expected) size to avoid excessive padding in aligned data,
than it is to add manual padding/alignment to avoid the cost of
misalignment in "packed structs".
Stefan