Novell's System Fault Tolerant NetWare 386 (around 1990) supported two
complete servers acting as one, so that any hardware component could
fail and the system would keep running, with nothing noticed by the
clients, even those in the middle of an update/write request.
That's fine for workloads that work that way.
Airline reservation systems historically ran on mainframes because, when
they were invented, that's all there was (the original SABRE ran on two
IBM 7090s), and because they are business-critical and need to be very reliable.
About 30 years ago some guys at MIT realized that route and fare search,
which are some of the most demanding things a CRS does, are easy to
parallelize and don't have to be particularly reliable -- if your search
system crashes, restarts, and reruns the search, and the result is a couple
of seconds late, that's OK. So they started ITA Software, which used racks
of PC servers running parallel applications written in Lisp (they were from
MIT) and blew away the competition.
However, that's just the search part. Actually booking the seats and selling tickets stays
on a mainframe or an Oracle system because double booking or giving away free tickets would
be really bad.
There's also a rule of thumb about databases that says one system of performance 100 is
much better than 100 systems of performance 1 because those 100 systems will spend all
their time contending for database locks.
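A toy illustration of that rule of thumb (my own sketch in C with POSIX
threads, not anything from a real database engine): N workers that all have
to funnel through a single lock stop scaling, because the lock serializes them.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 1000000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&lock);      /* every "transaction" contends here */
        shared_counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(int argc, char **argv)
{
    int n = argc > 1 ? atoi(argv[1]) : 4;
    pthread_t *t = malloc(n * sizeof *t);
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    printf("%d workers, counter = %ld\n", n, shared_counter);
    return 0;
}

Timing this (gcc -O2 -pthread) with 1 worker versus 16 shows wall-clock time
barely improving, and often getting worse: the lock, not the CPU count, is
the limit.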
On Fri, 13 Sep 2024 11:20:06 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
10-15 years ago I talked to another speaker at a conference; he
told me that he was working on high-end open-source LDAP software
using _very_ large memory DBs: their system allowed one US cell
phone company to keep every SIM card (~100M) on a single system,
while a similar-size competitor had been forced to fall back on
17-way sharding (presumably using a hash of the SIM id).
Keeping databases in memory is definitely a thing now... see SAP HANA.
Any architectural implications for this?
Browsing through the SAP pages, it seems they used Intel's Optane
persistent memory, but that is no longer manufactured (?). But
having fast, persistent storage is definitely an advantage for
databases.
Large memory: Of course.
On the ISA level... these databases run on x86, so that seems to
be good enough.
Anything else?
Another thing that SAP HANA seems to use more intensively than anybody
else is Intel TSX. TSX (at least the RTM part; I am not sure about the HLE
part) is still present in the latest Xeon generation, but is strongly de-emphasized.
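For readers who have not met TSX: below is a minimal sketch of how the RTM
side is used, in plain C with the immintrin.h intrinsics (compile with -mrtm).
Whether HANA's use looks like this is my assumption, not something stated in
the thread. The transactional region either commits atomically at _xend() or
aborts and falls back to an ordinary lock.

#include <immintrin.h>   /* _xbegin, _xend, _XBEGIN_STARTED */
#include <pthread.h>

static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;
static long table[1024];

void update(int i, long v)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Transactional path: these stores become visible atomically at
           _xend(), or not at all if the transaction aborts.  A production
           lock-elision scheme would also read the fallback lock here and
           abort if it is held. */
        table[i] += v;
        _xend();
    } else {
        /* Aborted (conflict, capacity, interrupt, ...): redo under a lock. */
        pthread_mutex_lock(&fallback);
        table[i] += v;
        pthread_mutex_unlock(&fallback);
    }
}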
I had also started pontificating that relative disk throughput had gotten
an order of magnitude slower (disks got 3-5 times faster while systems
got 40-50 times faster) since the 360 announcement.
There's also a rule of thumb about databases that says one system of
performance 100 is much better than 100 systems of performance 1
because those 100 systems will spend all their time contending for
database locks.
How many transactions per minute does the world's biggest company need at
peak hours?
Is this number not small relative to the capabilities of even a 15-year-old
dual-Xeon server with a few dozen spinning-rust disks?
Michael S <already5chosen@yahoo.com> schrieb:
On Fri, 13 Sep 2024 11:20:06 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
Anything else?
Another thing that SAP HANA seems to use more intensively than anybody
else is Intel TSX. TSX (at least the RTM part; I am not sure about the HLE
part) is still present in the latest Xeon generation, but is strongly
de-emphasized.
Sounds like a market niche... Mitch, how good is your ESM for
in-memory databases?
How many transactions per minute does the world's biggest company need at
peak hours?
Keeping databases in memory is definitely a thing now... see SAP HANA.
So there's real demand for systems with huge capacity. Not very many of
them, but they have large budgets.
It appears that Michael S <already5chosen@yahoo.com> said:
There's also a rule of thumb about databases that says one system of
performance 100 is much better than 100 systems of performance 1
because those 100 systems will spend all their time contending for
database locks.
How many transactions per minute does the world's biggest company need at
peak hours?
Ten years ago Visa could process 56,000 messages/second. It must be a
lot more now. I think a transaction is two or four messages depending
on the transaction type.
Is this number not small relative to the capabilities of even a 15-year-old
dual-Xeon server with a few dozen spinning-rust disks?
Uh, no, it is not.
but the question is whether the machine has enough RAM for the
database. Our dual-Xeon system from IIRC 2007 has 24GB of RAM; I am
not sure how big it could be
configured; OTOH, we have a single-Xeon system from 2009 or so with
32GB of RAM (and there were bigger Xeons in the market at the time).
Ten years ago Visa could process 56,000 messages/second.
Brett <ggtgp@yahoo.com> writes:
Speaking of complex things, have you looked at Swift output, as it checks
all operations for overflow?
You could add an exception type for that, saving huge numbers of correctly
predicted branch instructions.
The future of programming languages is type safe with checks, you need to
get on that bandwagon early.
MIPS got on that bandwagon early. It has, e.g., add (which traps on
signed overflow) in addition to addu (which performs modulo
arithmetic). It was abandoned and replaced by RISC-V several
years ago.
Alpha got on that bandwagon early. It's a descendant of MIPS, but it
renamed add into addv, and addu into add. It was canceled around
the year 2000.
In article <2024Sep10.094353@mips.complang.tuwien.ac.at>,
Alpha got on that bandwagon early. It's a descendant of MIPS, but it renamed add into addv, and addu into add. It was canceled around
the year 2000.
[ More details about architectures without trapping overflow
instructions ]
Trapping on overflow is basically useless other than as a debug aid,
which clearly nobody values. If you take Rust's approach, and only
detect overflow in debug builds, then you already don't care about performance.
If you want to do almost anything at all other than core dump on
overflow, you need to branch to recovery code. And although it's theoretically possible to recover from the trap, it's worse than any
other approach. So it's added hardware that's HARDER for software to
use. No surprise it's gone away.
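To make "branch to recovery code" concrete, here is roughly what the
non-trapping approach looks like in C using the GCC/Clang checked-arithmetic
builtin (my sketch; the clamp is just one possible recovery policy). The
compiler emits the add plus a normally-not-taken branch on the overflow flag.

#include <stdint.h>
#include <stdio.h>

/* Add two int32_t values; on overflow, branch to recovery code instead of
   trapping. */
int32_t checked_add(int32_t a, int32_t b)
{
    int32_t sum;
    if (__builtin_add_overflow(a, b, &sum)) {   /* add + jo, essentially */
        /* recovery path: clamp; could also log, unwind, or abort */
        return (b > 0) ? INT32_MAX : INT32_MIN;
    }
    return sum;
}

int main(void)
{
    printf("%d\n", checked_add(2000000000, 2000000000));  /* 2147483647 */
    return 0;
}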
But then IEEE 754 exception semantics make even less sense than Linux signals. ...
3) You want to clamp the value to a reasonable range and continue. The
reasonable values need to be looked up somewhere.
On Fri, 20 Sep 2024 22:00:28 +0000, MitchAlsup1 wrote:
But then IEEE 754 exception semantics make even less sense than Linux
signals. ...
Note that what IEEE 754 calls an “exception” is just a bunch of status bits reporting on the current state of the computation: there is no implication of some transfer of control elsewhere.
On Fri, 20 Sep 2024 18:35:26 -0000 (UTC), Kent Dickey wrote:
3) You want to clamp the value to a reasonable range and continue. The
reasonable values need to be looked up somewhere.
This won’t work. The values outside the range are by definition non-representable, so comparisons against them are useless.
On Sat, 21 Sep 2024 1:12:11 +0000, Lawrence D'Oliveiro wrote:
On Fri, 20 Sep 2024 18:35:26 -0000 (UTC), Kent Dickey wrote:
3) You want to clamp the value to a reasonable range and continue. The reasonable values need to be looked up somewhere.
This won’t work. The values outside the range are by definition non-
representable, so comparisons against them are useless.
When a range is 0..10 both -1 and 11 are representable in
the arithmetic of ALL computers, just not in the language
specifying the range.
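Mitch's point as a small C sketch (my illustration): the machine happily
computes -1 or 11; it is only the language-level range 0..10 that rejects
them, and a clamp simply maps the machine value back into that range.

/* A language-level subrange 0..10 stored in an ordinary machine integer.
   -1 and 11 are perfectly representable in the int; the clamp enforces
   the language's range, not the hardware's. */
static int clamp_0_10(int v)
{
    if (v < 0)  return 0;
    if (v > 10) return 10;
    return v;
}

/* e.g. clamp_0_10(5 - 6) == 0, clamp_0_10(7 + 4) == 10 */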
On Sat, 21 Sep 2024 1:09:43 +0000, Lawrence D'Oliveiro wrote:
On Fri, 20 Sep 2024 22:00:28 +0000, MitchAlsup1 wrote:
But then IEEE 754 exception semantics make even less sense than Linux
signals. ...
Note that what IEEE 754 calls an “exception” is just a bunch of status bits reporting on the current state of the computation: there is no
implication of some transfer of control elsewhere.
Then how do you implement the alternate exception model ??? which IS
part of 754-2008 and 754-2019
On Sat, 21 Sep 2024 1:12:11 +0000, Lawrence D'Oliveiro wrote:
On Fri, 20 Sep 2024 18:35:26 -0000 (UTC), Kent Dickey wrote:
3) You want to clamp the value to a reasonable range and continue.
The reasonable values need to be looked up somewhere.
This won’t work. The values outside the range are by definition non-
representable, so comparisons against them are useless.
When a range is 0..10 both -1 and 11 are representable in the arithmetic
of ALL computers, just not in the language specifying the range.
For me error detection of all kinds is useful. It just happens
to not be conveniently supported in C so no one tries it in C.
GCC's -trapv option is not useful for a variety of reasons.
1) it's slow, about a 50% performance hit
2) it's always on for a compilation unit, which is not what programmers need,
as it triggers many false positives, so people turn it off.
I've always paid for mine. My first C compiler came with the WinNT 3.5
beta in 1992 for $99 and included the development kit,
editor, source-code debugger, tools, and documentation.
A few hundred bucks is not going to hurt my business.
In article <2024Oct3.085754@mips.complang.tuwien.ac.at>, anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
If the RISC companies failed to keep up, they only have themselves to
blame. It seems to me that a number of RISC companies had difficulties
with managing the larger projects that the growing die areas allowed.
Another contributing factor was Itanium, which was quite successful at
disrupting the development cycles of the RISC architectures.
Alpha suffered from DEC's mis-management, which led to DEC being taken
over by Compaq. They killed Alpha when Itanium first began to work, and
before it was clear that it was a turkey.
PA-RISC was intended by HP to be replaced by Itanium. They managed that,
but their success was limited because Linux on x86-64 was so much more
cost-effective.
IBM kept POWER development going through the Itanium period, which is a
significant reason why it's still going.
SGI went into Itanium hard and neglected MIPS development, which never
recovered. It had been losing in the performance race anyway.
Sun kept SPARC development going, but made a different mistake, by
spreading their development resources over too many projects. The ones
that succeeded did so too slowly, and they fell behind.
Also, Linux ate
their web-infrastructure market rather quickly.
Linux could not have had the success it did without the large range of
powerful and cheap hardware designed to run Windows.
Alpha suffered before. The 21264 was late, and did not keep up in the
clock race.
On Thu, 3 Oct 2024 23:49 +0100 (BST), John Dallman wrote:
Given all of IBM's missteps, it's mildly surprising they got that
one right. Even a stopped clock is right once a day ...
SGI decided to embrace the platform that was eating their market,
and try to sell Windows NT boxes. Trouble is, those NT boxes, while
only a fraction of the cost of an IRIX-based product, still cost
about 3× what other NT machines were going for.
They could still have sold SPARC hardware running Linux. I can
remember comments saying Linux ran better on that hardware than
Sun's own SunOS/Solaris did.
In article <vdnef0$3uaeh$5@dont-email.me>, ldo@nz.invalid (Lawrence D'Oliveiro) wrote:
On Thu, 3 Oct 2024 23:49 +0100 (BST), John Dallman wrote:
Then there were the SGI Visual Workstations, which ran NT on x86. The
first generation of them was quite nice, but needed a very custom HAL,
and hence couldn't be upgraded to later versions of Windows once SGI
abandoned them.
By this time, SGI had a department of downsizing, whose job was to get
rid of departments and sites. Being an American company, this department
fought for power and budget share, and nobody inside the company seemed
to think that this would spell doom for SGI.
They [SGI ed.] could still have sold SPARC hardware running Linux. I can
remember comments saying Linux ran better on that hardware than
Sun's own SunOS/Solaris did.
They would not have faced up to that.
Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
Alpha suffered before. The 21264 was late, and did not keep up in the
clock race.
https://www.star.bnl.gov/public/daq/HARDWARE/21264_data_sheet.pdf
gives the clock rate as varying between 466 and 600 MHz, and
Wikipedia gives the clock frequency of the Pentium Pro as between
150 and 200 MHz. The Pentium II Overdrive, according to Wikipedia,
had up to 333 MHz.
Is this information wrong?
On 10/3/2024 11:36 PM, Chris M. Thomasson wrote:
On 10/3/2024 9:23 PM, George Neuner wrote:
On Fri, 4 Oct 2024 00:48:43 -0000 (UTC), Lawrence D'Oliveiro
<ldo@nz.invalid> wrote:
On Thu, 03 Oct 2024 06:57:54 GMT, Anton Ertl wrote:
If the RISC companies failed to keep up, they only have themselves to blame.
That’s all past history, anyway. RISC very much rules today, and it is x86
that is struggling to keep up.
You are, of course, aware that the complex "x86" instruction set is an
illusion and that the hardware essentially has been a load-store RISC
with a complex decoder on the front end since the Pentium Pro landed
in 1995.
Yeah. Wrt memory barriers, one is allowed to release a spinlock on "x86"
with a simple store.
The fact that one can release a spinlock using a simple store means that
it's basically load-acquire/release-store.
So a load will do a load then have an implied acquire barrier.
A store will do an implied release barrier then perform the store.
This release behavior is okay for releasing a spinlock with a simple
store, MOV.
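A minimal sketch of that in C11 atomics (my illustration, not from the post):
on x86/x86-64 the release store compiles to a plain MOV, while the acquire
side is an atomic read-modify-write (LOCK XCHG) anyway.

#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Acquire: the locked RMW already provides the needed ordering. */
    while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
        ;  /* spin */
}

static void spin_unlock(spinlock_t *l)
{
    /* Release: on x86 this is an ordinary MOV of 0 to the flag; the
       architecture's store ordering supplies the release semantics. */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}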
On Fri, 4 Oct 2024 7:05:34 +0000, Anton Ertl wrote:
George Neuner <gneuner2@comcast.net> writes:<snipping>
My 66000 has a MemMove instruction, a single 1-word instruction
that leaves DECODE and enters one MEMory unit, where it proceeds
to AGEN and Read, AGEN and Write, leaving the rest of the function
units proceeding to whatever is next.
One thing I did differently here: none of the 3 registers is modified,
yet I retain the ability to take an exception and re-play the instruction
from where it left off {in state never visible to the instruction
stream except via the DECODE stage.}
George Neuner <gneuner2@comcast.net> writes:
You are, of course, aware that the complex "x86" instruction set is an illusion and that the hardware essentially has been a load-store RISC
with a complex decoder on the front end since the Pentium Pro landed
in 1995.
Repeating nonsense does not make it any truer, and this nonsense has
been repeated since at least the Pentium Pro (1995), maybe already
since the 486 (1989). CISC and RISC are about the instruction set,
not about the implementation. And even if you look at the
implementation, it's not true: The P6 has microinstructions that are
~100 bits long, whereas RISCs have 32-bit and 16-bit instructions.
The K7 has load-store microinstructions; RISCs don't have that.
In more recent CPUs, AMD tends to work with macro-instructions between
the decoder and the reorder buffer (i.e., in the part that in the
Pentium Pro may have been used as the justification for the RISC
claim); these macro instructions are load-and-op and read-modify-write instructions.
John Mashey has written about the difference between CISC and RISC
repeatedly <https://homepages.cwi.nl/%7Erobertl/mash/RISCvsCISC>, and
he gives good criteria for classifying instruction sets as RISC or
CISC, and by his criteria the 80286 and IA-32 instruction sets of the
Pentium Pro clearly both are CISCs. I have recently <2024Jan12.145502@mips.complang.tuwien.ac.at> used his criteria on instruction sets that Mashey did not classify (mostly because they
were done after his table), and by these criteria AMD64 is clearly a
CISC, while ARM A64 and RISC-V are clearly RISCs.
In searching for whether he has written something specific about
IA-32, I found <https://yarchive.net/comp/vax.html>, which is an
earlier instance of the recent discussion of whether it would have
been better for DEC to stick with VAX, do an OoO implementation and
extend the architecture to 64 bits, like Intel has done: <https://yarchive.net/comp/vax.html>. He also discusses the problems
of IA-32 there, but mainly in pointing out how much smaller they were
than the VAX ones.
I don't agree with all of that, however. E.g., when discussing a VAX
instruction similar to IA-32's REP MOVS, he considers it to be a big
advantage that the operands of REP MOVS are in registers. That
appears wrong to me; you either have to keep REP MOVS in decoding (and
thus stop decoding any later instructions) until you know the value of
that register coming out of the OoO engine, making REP MOVS a mostly
serializing instruction. Or you have a separate OoO logic for REP
MOVS that keeps generating loads and stores inside the OoO engine. If
you have the latter in the VAX, it does not make much difference if
the operand is on a register or memory. The possibility of trapping
during REP MOVS (or the VAX variant) complicates things, though: the
first part of the REP MOVS has to be committed, and the registers
written to the architectural state, and then execution has to start
again with the REP MOVS. Does not seem much harder on the VAX to me,
however.
- anton
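For reference, this is what the instruction under discussion looks like from
C via GCC/Clang extended asm (a sketch; real memcpy implementations add
alignment and size cutoffs). The count and both addresses sit in RCX, RSI and
RDI, which is the "operands in registers" property being debated.

#include <stddef.h>

/* Copy n bytes with REP MOVSB.  The "+D", "+S", "+c" constraints pin dst,
   src and n to RDI, RSI and RCX, which the instruction both reads and
   updates. */
static void rep_movsb_copy(void *dst, const void *src, size_t n)
{
    asm volatile("rep movsb"
                 : "+D"(dst), "+S"(src), "+c"(n)
                 :
                 : "memory");
}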
On Fri, 04 Oct 2024 07:05:34 GMT, anton@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:
George Neuner <gneuner2@comcast.net> writes:
You are, of course, aware that the complex "x86" instruction set is an illusion and that the hardware essentially has been a load-store RISC with a complex decoder on the front end since the Pentium Pro landed
in 1995.
Repeating nonsense does not make it any truer, and this nonsense has
been repeated since at least the Pentium Pro (1995), maybe already
since the 486 (1989). CISC and RISC are about the instruction set,
not about the implementation. And even if you look at the
implementation, it's not true: The P6 has microinstructions that are
~100 bits long, whereas RISCs have 32-bit and 16-bit instructions.
The K7 has load-store microinstructions; RISCs don't have that.
Anton, you know very well that the hardware does not execute the "x86" instruction set but only /emulates/ it. The decoder translates x86 instructions into sequences of microinstructions that perform the
equivalent operations. The fact that some simple instructions
translate one to one does not change this.
On Mon, 7 Oct 2024 22:26:58 +0300, Michael S wrote:
On Mon, 7 Oct 2024 17:38:54 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
ARM was rather late to the RISC game, this might have been literally
true.
ARM was rather early to the RISC game. Shipped for profit since late
1986.
Shipped in an actual PC, the Acorn Archimedes range.
That was the first time I ever saw a 3D shaded rendition of a flag waving,
on a computer, generated in real time. No other machine could do it,
unless you got up to the really expensive Unix workstation class (e.g.
SGI, custom Evans & Sutherland hardware etc).
Maybe all add/sub/etc opcodes that are immediately followed by an INTO
could be fused into a single ADDO/SUBO/etc version that takes zero extra
cycles as long as the trap part isn't hit?
But then, RISC processors mostly started using exceptions for housekeeping
- SPARC for register window sliding, Alpha for byte, word and misaligned
memory access.
The solution for Alpha was to add back the byte and word instructions,
and add misaligned access support to all memory ops.
Kent Dickey wrote:[...]
GCC's -trapv option is not useful for a variety of reasons....
1) it's slow, about a 50% performance hit
2) it's always on for a compilation unit, which is not what programmers need,
as it triggers many false positives, so people turn it off.
So why should any hardware include an instruction to trap-on-overflow?
Because ALL the negative speed and code size consequences do not occur.