• Re: OLTP Transaction

    From Derek Ignatius Asirvadem@derek.asirvadem@gmail.com to comp.databases.theory on Tue Jun 22 17:11:58 2021
    From Newsgroup: comp.databases.theory

    Nicola
    On Wednesday, 23 June 2021 at 09:52:25 UTC+10, Derek Ignatius Asirvadem wrote:
    On Tuesday, 22 June 2021 at 20:35:05 UTC+10, Nicola wrote:
    On 2021-06-19, Derek Ignatius Asirvadem wrote:

    ______
    Client
    There is ordinary interplay between the Client and the Server. The client may now be a GUI or a webpage or middleware in an app server.

    The VALIDATE block [1] is repeated in the client.

    I assume that this means validating data without interacting with the database, e.g., to check that the input data is within valid ranges,
    etc.

    No.
    Any GUI or input system has to do those checks on input fields, that is fundamental, not usually mentioned.

    LetrCOs say there are 20 WIDGETS currently in the inventory:
    __ why would you allow an user to purchase 6 WIJJETS [that do not exist] ??? __ Why would you allow an user to purchase 100 WIDGETS [that are not in stock] ???
    What does normal GUI interaction mean to you ? Why does it exclude database interaction ? Why would you NOT restrict the user to choosing only those Parts that do exist in the database, and to a purchase quantity that is currently in stock ? Otherwise the GUI is insane, the user interaction is insane.
    Cheers
    Derek
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Nicola@nicola@nohost.org to comp.databases.theory on Wed Jun 23 21:59:43 2021
    From Newsgroup: comp.databases.theory

    On 2021-06-22, Derek Ignatius Asirvadem <derek.asirvadem@gmail.com> wrote:
    OTOH, if you are concerned about data in the database changing between
    the point in the VALIDATE block that checks a particular row, and the
    point the EXECUTE block that updates the relevant row, then yes, that
    is not catered for in the above description. However, that [and one
    other condition) is covered by Optimistic Locking.

    Yes, that is what I meant.

    OLAP Reports
    Why do you worry about locks in OLAP reports ?
    Who cares.
    Read locks are extremely short, and transient (held for the duration
    of the read operation [Statement], not the Transaction). We canrCOt control Read
    locks, even implicitly

    Are you saying that OLAP queries run in a lower isolation level (say,
    READ COMMITTED)?

    Of course. That is what that ISOLATION LEVEL[ READ COMMITTED ] is there for.

    Not just OLAP queries, but all SELECTs outside a Transaction. That [ISOLATION LEVEL] is an ANSI SQL requirement, within the ACID
    requirement. We exclude [wait for completion of] uncommitted changes,
    but we do not need SERIALISED, in fact we want massively parallel, and
    we want the whole result set to be integral within itself (all
    internal dependencies intact and true).

    What does "integral within itself" mean? Transactions at READ COMMITTED,
    even if they are read-only, suffer from some anomalies.

    For understanding. Many systems in the MS world do their OLAP queries
    with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
    result set has no integrity, and (b) contains uncommitted changes
    which may disappear.

    Correct. But queries at READ COMMITTED may view (committed) changes that
    were not there when the transaction started. Do you accept that query
    results may be off sometimes, or do you have some recipe to make them
    avoid inconsistent reads and phantoms?

    For the MVCC world, they have no concept of READ
    COMMITTED/UNCOMMITTED, because they have only the false privatised
    version of the entire database, firmly secured between their big toes,
    which has only nominally rCLcommittedrCY changes, that has no relation to
    the version of other users. That nominally rCLcommittedrCY data blows up
    at COMMIT TRAN, just check the PooGres manuals re all the problems
    that are not supposed to happen in the Stonebraker fantasy, that
    happen, even to people who dutifully sing the mantra.

    According to the manual (https://www.postgresql.org/docs/current/transaction-iso.html), in
    PostgreSQL each isolation level correctly prevents the anomalies that
    are not allowed by the standard to occur at that level, and in some
    cases it prevents more (e.g., READ UNCOMMITTED is like READ COMMITTED).
    What are you referring to when you mention "all the problems that are
    not supposed to happen... that happen"?

    But still, that is not Optimistic Locking. And in this post, there is
    nothing about Optimistic Locking.

    Agreed.

    Add Optimistic Locking, and the methods given in this thread would be complete.

    Ok.

    Nicola
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Derek Ignatius Asirvadem@derek.asirvadem@gmail.com to comp.databases.theory on Thu Jun 24 00:38:26 2021
    From Newsgroup: comp.databases.theory

    Nicola
    On Thursday, 24 June 2021 at 07:59:50 UTC+10, Nicola wrote:
    On 2021-06-22, Derek Ignatius Asirvadem wrote:
    Sorry. Before I respond to the content, I need one clarification.
    For understanding. Many systems in the MS world do their OLAP queries
    with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
    result set has no integrity, and (b) contains uncommitted changes
    which may disappear.

    Correct. But queries at READ COMMITTED may view (committed) changes that were not there when the transaction started.
    What rCLtransaction startedrCY ??? No transaction is involved.
    The report connection would normally do SELECT [at] READ_COMMITTED which holds ReadLocks for the duration of the Statement, but in this abnormal case does SELECT [at] READ_UNCOMMITTED, holding no locks at all, and reading whatever is there on the disk, including uncommitted changes.
    You might be stuck in the mindset that in order to read at { READ_UNCOMMITTED | READ_COMMITTED, } you must open a Transaction. No, that is the MV-non-CC insanity, the Stonebraker fantasy. No, that is not ACID. No, we do not do that.
    As per ACID, the real ACID, not the perverted form that MV-non-CC pretends, we only need to open a Transaction for { REPEATABLE_READ | SERIALISABLE }.
    (I have to stretch, and be very careful, when I am writing these things ... from my everyday position of never having those problems; that insanity; that mindset. The insanity is foreign to me, but I am aware of it. Otherwise I would make serious errors in understanding, and communication would be difficult (correcting simple errors). Please understand, you have to do the converse: from your everyday MV-non-CC insanity and fantasy, you need to think carefully about the Real ACID; Real Ordinary Locking; Real Transactions. Not the Straw Men, which are burning slowly.)
    ----------
    Separately. What rCLrows that were not there [when the transaction started]rCY ??? How can a transaction change a row that is not there ??? Or even have knowledge about a row that is not there ???
    Cheers
    Derek
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Nicola@nicola@nohost.org to comp.databases.theory on Thu Jun 24 09:32:57 2021
    From Newsgroup: comp.databases.theory

    On 2021-06-24, Derek Ignatius Asirvadem <derek.asirvadem@gmail.com> wrote:
    Nicola

    On Thursday, 24 June 2021 at 07:59:50 UTC+10, Nicola wrote:
    On 2021-06-22, Derek Ignatius Asirvadem wrote:

    Sorry. Before I respond to the content, I need one clarification.

    For understanding. Many systems in the MS world do their OLAP queries
    with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
    result set has no integrity, and (b) contains uncommitted changes
    which may disappear.

    Correct. But queries at READ COMMITTED may view (committed) changes that
    were not there when the transaction started.

    What rCLtransaction startedrCY ??? No transaction is involved.

    Ok, now that is clear.

    The report connection would normally do SELECT [at] READ_COMMITTED
    which holds ReadLocks for the duration of the Statement,

    Ok. So, I am back to a question I have already posed to you:

    - OLAP queries run at READ COMMITTED;
    - Each OLAP query is a single SELECT statement;
    - Each SELECT holds read locks for the duration of the statement.

    But the duration of the statement may be significant (minutes, hours,
    ...) Don't such queries create contention (other transactions cannot
    update the data read by the OLAP query until the query is over)?

    Separately. What rCLrows that were not there [when the transaction started]rCY ??? How can a transaction change a row that is not there
    ???

    Of course it can't.

    Or even have knowledge about a row that is not there ???

    Of course it can't. But it can see something that was not there before,
    and it appears at a later time. I explain what I mean with an example:

    Time Transaction Operation
    0 T1,T2 start transaction;
    1 T1 select sum(X) from R;
    2 T2 insert into R(X) values (100);
    3 T2 commit;
    4 T1 select sum(X) from R;

    At READ COMMITTED, T1 at time 4 will see the record inserted by T2 at
    time 2, even if the record did non exist when T1 began. The SELECT at
    time 1 and time 4 would then return different results.

    Of course, this is not an issue if the only queries you run at READ
    COMMITTED are single statementsrCoprovided that the system guarantees statement-level consistency (it seems that some don't: https://stackoverflow.com/questions/4772475/read-committed-isolation-level-in-sql-server-for-a-single-statement).

    Nicola
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Daniel Loth@daniel.j.loth@gmail.com to comp.databases.theory on Thu Jun 24 04:10:20 2021
    From Newsgroup: comp.databases.theory

    On Thursday, June 24, 2021 at 7:33:00 PM UTC+10, Nicola wrote:
    On 2021-06-24, Derek Ignatius Asirvadem <derek.a...@gmail.com> wrote:
    Nicola

    On Thursday, 24 June 2021 at 07:59:50 UTC+10, Nicola wrote:
    On 2021-06-22, Derek Ignatius Asirvadem wrote:

    Sorry. Before I respond to the content, I need one clarification.

    For understanding. Many systems in the MS world do their OLAP queries >> > with READ UN-COMMITTED, to avoid Read Locks, but that means (a) the
    result set has no integrity, and (b) contains uncommitted changes
    which may disappear.

    Correct. But queries at READ COMMITTED may view (committed) changes that >> were not there when the transaction started.

    What rCLtransaction startedrCY ??? No transaction is involved.
    Ok, now that is clear.
    The report connection would normally do SELECT [at] READ_COMMITTED
    which holds ReadLocks for the duration of the Statement,
    Ok. So, I am back to a question I have already posed to you:

    - OLAP queries run at READ COMMITTED;
    - Each OLAP query is a single SELECT statement;
    - Each SELECT holds read locks for the duration of the statement.

    But the duration of the statement may be significant (minutes, hours,
    ...) Don't such queries create contention (other transactions cannot
    update the data read by the OLAP query until the query is over)?
    Separately. What rCLrows that were not there [when the transaction started]rCY ??? How can a transaction change a row that is not there
    ???
    Of course it can't.
    Or even have knowledge about a row that is not there ???
    Of course it can't. But it can see something that was not there before,
    and it appears at a later time. I explain what I mean with an example:

    Time Transaction Operation
    0 T1,T2 start transaction;
    1 T1 select sum(X) from R;
    2 T2 insert into R(X) values (100);
    3 T2 commit;
    4 T1 select sum(X) from R;

    At READ COMMITTED, T1 at time 4 will see the record inserted by T2 at
    time 2, even if the record did non exist when T1 began. The SELECT at
    time 1 and time 4 would then return different results.

    Of course, this is not an issue if the only queries you run at READ COMMITTED are single statementsrCoprovided that the system guarantees statement-level consistency (it seems that some don't: https://stackoverflow.com/questions/4772475/read-committed-isolation-level-in-sql-server-for-a-single-statement).

    Nicola
    Hi Nicola,
    If you're running a query in the READ COMMITTED isolation level then a shared lock is acquired for the duration of the read operation on whatever resource the lock protects. That resource may be a row or a page.
    If you have acquired a row-level lock then the lock will be released once reading the row has finished.
    If you have acquired a page lock then the lock will be released once reading the page has finished.
    A query running for hours under READ COMMITTED would be locking and unlocking rows or pages as it encounters and processes them.
    As for your second part concerning the sequence of events described, you are quite right. This is precisely the phenomenon that the REPEATABLE READ isolation protects against.
    When you run a query with REPEATABLE READ isolation, shared locks are acquired and held for rows / pages until the transaction is committed or rolled back.
    So if you then tried to modify data in those rows or pages, which involves acquiring an exclusive lock for those rows or pages, it would block - the exclusive lock being incompatible with the shared lock - thus allowing you to repeat the REPEATABLE READ and get the same result.
    However, it's important to note that REPEATABLE READ isolation level does not acquire range locks (or predicate locks). This means that you can conceivably insert a new record that satisfies your REPEATABLE READ query's WHERE clause predicate.
    For example, you could insert a new record that satisfies the REPEATABLE READ query predicate but ends up stored on a page that is not subject to a shared lock (the page might not have existed when you first ran the REPEATABLE READ query).
    This is a phantom record, which SERIALIZABLE guards against using the aforementioned range (predicate) locks.
    It's important to recognise that your REPEATABLE READ query - or any query for that matter - is executing in the context of a broader, shared, system. Seeing the outcome of newly committed transactions makes sense if you've chosen READ COMMITTED and understand the concurrency phenomenon that it protects against.
    Importantly, and often overlooked as far as MVCC disadvantages go, the data you read in an hours-long query running under MVCC is arbitrarily old data. If your query takes 2 hours to complete then the very last row processed by your query is potentially 2 hours out of date.
    Daniel
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Nicola@nicola@nohost.org to comp.databases.theory on Fri Jun 25 20:25:25 2021
    From Newsgroup: comp.databases.theory

    On 2021-06-24, Daniel Loth <daniel.j.loth@gmail.com> wrote:
    Hi Nicola,

    Hi Daniel,
    great to see someone else diving in! I hope you will last :)

    Thanks for your comments. In particular:

    Importantly, and often overlooked as far as MVCC disadvantages go, the
    data you read in an hours-long query running under MVCC is arbitrarily
    old data. If your query takes 2 hours to complete then the very last
    row processed by your query is potentially 2 hours out of date.

    Sure, that's a fair criticism of MVCC.

    But if Derek can keep all his queries under 12 seconds with an old (as
    in "mature") 2PC system, don't you think that a modern MVCC system can
    achieve the same or better performance under the same load?

    Nicola
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Nicola@nicola@nohost.org to comp.databases.theory on Fri Jun 25 21:41:16 2021
    From Newsgroup: comp.databases.theory

    On 2021-06-25, Nicola <nicola@nohost.org> wrote:
    I have just repeated my experiment. The above query takes ~2.4s with
    cold data and ~800ms with hot data on my laptop.

    There's a bias in my test, because, the way it is generated, the
    transaction data is physically sorted by CustomerID. After correcting
    the bias, I'm still around 3s anyway for "Subquery SUM", with no
    physical optimization. This query performs better:

    https://www.softwaregems.com.au/Documents/Student%20Resolutions/Tony%20Andrews/Inline%20SUM%20ShowPlan.txt

    taking ~1.2s.

    To insert data in random order:

    insert into CustomerTransaction(CustomerId, SequenceNo, Date, TransactionCode, Amount)
    with T as (
    select CustomerId,
    generate_series(1,2000) as SequenceNo,
    (now() - '1 day'::interval * round(random() * 3650))::date as Date,
    case when random() < 0.5 then 'C' else 'D' end as TransactionCode,
    random() * 10000 as Amount
    from REF_Customer
    )
    select * from T order by random();

    Nicola

    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Derek Ignatius Asirvadem@derek.asirvadem@gmail.com to comp.databases.theory on Fri Jun 25 21:10:00 2021
    From Newsgroup: comp.databases.theory

    On Saturday, 26 June 2021 at 06:25:28 UTC+10, Nicola wrote:
    On 2021-06-24, Daniel Loth wrote:

    Hi Daniel,
    great to see someone else diving in! I hope you will last :)
    Yes !
    But if Derek can keep all his queries under 12 seconds with an old (as
    in "mature") 2PC system
    Sorry if I wasn't clear. Almost all my SELECTs, complex or not, execute sub-second. Almost all even at customer sites. The problem is, on their production system, even though I am hired to reconfigure it, I rarely get the power to change everything that requires correction, there is always some area (eg. SAN config) or some config parm (eg. auditing set up) that I cannot change. Thus I have a few SELECTs on customer systems, that take more than 1 sec, which I remember them well, the worst one being 12 secs.
    2PC
    No.
    Not even "1PL"; "2PL"; "C2PL"; "S2PL"; "SS2PL", which are Straw Man understandings of Ordinary Locking.
    I am trying to give you the real deal on Ordinary Locking, to remove you from the filth that the MV-non-CC mob declare it to be.
    Cheers
    Derek
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Daniel Loth@daniel.j.loth@gmail.com to comp.databases.theory on Sat Jun 26 01:16:30 2021
    From Newsgroup: comp.databases.theory

    Sorry, just a follow-up concerning this remark:

    If an update causes a page split in the clustered index, resulting in rows being moved to new pages, then this will necessitate updating all non-clustered indexes (because non-clustered indexes contain the clustered index key).

    In fact, this is likely not true even in products using MVCC such as Postgres.

    Cheers,
    Daniel
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Nicola@nicola@nohost.org to comp.databases.theory on Sat Jun 26 10:56:35 2021
    From Newsgroup: comp.databases.theory

    On 2021-06-26, Derek Ignatius Asirvadem <derek.asirvadem@gmail.com> wrote:
    Trivially, as MVCC does not use locks.

    Are you agreeing that MV-non-CC cannot support SQL/ACID ? Finally, hallelujah !!!

    I am not sure that by ACID compliance we mean the same thing. You seem
    to imply more than what the acronym stands for (or I am taking a narrow interpretation of it). But I am not excluding that you may be right.
    Hopefully, we will be able to clarify this.

    I have just repeated my experiment. The above query takes ~2.4s with
    cold data and ~800ms with hot data on my laptop. Ok, that's flash
    storage and a laptop one decade newer than the one in your document. But
    it's stock PostgreSQL with default parameters (which are extremely
    conservative), with no physical tuning applied.

    Excellent. The rCyhotrCO figure is the one we want (test is logical not physical I/O).

    Ok.

    Now if we bring it back to the context of locking vs MV-non-CC, run:
    __ 10 x OLTP Transactions doing UPDATE CustomerTransaction randomly, and
    __ 5 of those SELECTs
    __ concurrently.

    Time permitting, I'll try.

    My questions are for
    better understanding on my part, not to tell you that you should run
    things differently.

    But that is not what I meant, what >>I<< did is not relevant. Given
    your [then] concern re Read Locks, and such being held for some
    duration, and such contending with OLTP Transactions, you may be
    interested in what that actually looks like, in terms of metrics in
    a production server.

    With the additional context you have provided, I'm fine.

    To proceed with this thread then:
    __ generally an understanding of OLTP, that is the LARGE SHARED DATA
    BANK context, rather than mere speed in a single-user context,
    __ hopefully resolve my declaration that PissGriss does not support ACID
    __ (ie. any MV-non-CC system cannot support ACID)
    ____ and therefore cannot provide OLTP
    __ the two items that remain from my side:
    ___1 there are two major occurrences in OLTP that have not been
    mentioned yet, that must be covered before we can consider this
    subject complete (MV-non-CC types are totally ignorant about this)
    ___2 Optimistic Locking. We are dancing around it, without confirming
    its absolute need in OLTP.

    Although you have no further questions re the Batch Transaction, that understanding is not complete, it can be completed only when the above outstanding elements are completed.

    Good. Hopefully, we'll get to paint the full picture.

    Nicola
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Daniel Loth@daniel.j.loth@gmail.com to comp.databases.theory on Sat Jun 26 07:50:08 2021
    From Newsgroup: comp.databases.theory

    Hi Nicola,
    Can you elaborate on that? Do you mean that MVCC works well only when
    your goal is to micro-optimize, but it is not suitable to obtain
    system-wide good performance?
    Yes, happy to elaborate by way of anecdote.
    Joe Developer discovers that he can examine a query plan in a tool like SQL Server Management Studio. He compares two queries:
    select FirstName, MiddleName, FamilyName
    from Person
    where PersonId = 1;
    and
    select p.FirstName, pmn.MiddleName, p.FamilyName
    from Person p
    left join PersonMiddleName pmn
    on p.PersonId = pmn.PersonId
    where p.PersonId = 1;
    And rightfully concludes that the former is faster. The former query produces a plan comprising a single clustered index seek. The latter comprises two clustered index seeks, so twice the cost.
    Though note that twice the cost, in this example, really means 1 logical read for the former and 2 logical reads for the latter. So twice as expensive, but really that extra expense is marginal.
    What this developer doesn't consider is that we might only actually need to select the middle name as part of two or three queries, and that these queries that do require the middle name actually comprise a very small proportion of the workload that the DBMS must handle.
    The developer, an individual working on a task, is focused on their own work and very easily takes a myopic view with little thought for next month, let alone next year or the next five years.
    This behaviour is what leads to the aforementioned 70 column table, and the perception that de-normalization leads to higher performance. One person does it, someone else follows their lead, and then it becomes the norm.
    Developers make other poor choices too. All of the time, really. Often due to it being expedient to do so due to perceived or real pressure to deliver.
    For example, they might write code that updates a substantial number of records within a single statement (which has been touched on in recent discussions here).
    They test it locally and it's fine, because they're working with a small set of records locally.
    But in production environments with a substantial volume of data this creates an untenable situation in the context of a database that uses a Lock Manager, because the writers-block-readers situation arises.
    A series of often short-term expedient decisions are taken and death spiralling ensues. Developers inevitably reach a point where they struggle with slow transactions, excessive blocking, deadlocks.
    MVCC is seen as a get-out-of-jail-free card when improper transaction management leads to excessive blocking in a database with a Lock Manager.
    This creates a perverse situation where MVCC is seen to be better because it avoids the untenable: A database that might effectively be single-user-only until the aforementioned writers-block-readers situation subsides and normal service resumes.
    You could say that MVCC provides a safety net. A poorly behaved query won't necessarily bring the system to its knees. At least not for the same reason as a system relying on a Lock Manager, anyway.
    In the Microsoft world, the recommendation nowadays from a number of highly regarded thought-leaders (such as Brent Ozar) is to enable 'snapshot isolation' and 'read committed snapshot isolation' by default (i.e., SQL Server's MVCC).
    Pragmatic DBAs accept that developers are going to make a mess of it, and they recognise that in many companies developers are going to be making database design decisions because those companies reject ideas such as Change Advisory Boards having to approve things. The idea being that CAB is a handbrake on development velocity. And I recognise and accept that a CAB does slow things down in practice (though for the purpose of due diligence).
    In this context I would agree that MVCC can work very well in spite of some of the issues previously discussed. But not because of technological superiority when compared with a RDBMS using a Lock Manager.
    But to circle back to my remarks about utility, I don't think MVCC can physically outperform a system that uses a Lock Manager.
    Reduced utility from resources due to more data pages being required to store the same content (i.e., remnant versions of rows take space).
    This means more pages to cache in RAM and thus more contention for this finite RAM.
    In turn, more cache churn. Page Life Expectancy is lower, so it's more likely that a page won't be cached when it needs to be accessed. So then there's additional IO pressure.
    With those physical realities in mind, I'd expect a database using locking to perform better when implemented properly.
    However, I imagine that if you have a database hosted on a system and that database fits entirely in RAM (and has plenty of headroom) then perhaps they perform comparably.
    Cheers,
    Daniel
    --- Synchronet 3.21d-Linux NewsLink 1.2