• SQL Query usage of MAX for Date with a lot of registers

    From Gustavo Bertazzoli@guberta@gmail.com to comp.sys.tandem on Tue Jun 6 22:45:02 2023
    From Newsgroup: comp.sys.tandem

    Hello guys,

    Yes, I'm here again asking for SQL help. :)

    Basically, I have a table with the following fields:
    ID PIC 9(7) DEFAULT SYSTEM NOT NULL
    POINT PIC 9(8) DEFAULT SYSTEM NOT NULL
    TIMESTAMP LARGEINT DEFAULT SYSTEM NOT NULL

    None of them are unique. This means that ID could appear several times for several different points and in different timestamps.

    The result that I need is
    ID,POINT,TIMESTAMP
    But I just want the last one. With last one I mean just the last time that when an ID, reach a point, based on the timestamp, it's possible but none of my queries are really working as I expect.

    Here is one example:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(TIMESTAMP) from =information S2
    where
    S2.ID between 7000000 and 8999999 browse access
    )
    browse access;

    With the query above, when I try with only one ID instead of range (in between clause), it works. But with this range, the query is running for more than an hour and there is still no result, so I believe something is not right.

    Can you please help me with that? Or better what I'm doing wrong?

    Thanks a lot in advance and greetings from China!
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Gustavo Bertazzoli@guberta@gmail.com to comp.sys.tandem on Tue Jun 6 22:49:42 2023
    From Newsgroup: comp.sys.tandem

    On Wednesday, 7 June 2023 at 13:45:04 UTC+8, Gustavo Bertazzoli wrote:
    Hello guys,

    Yes, I'm here again asking for SQL help. :)

    Basically, I have a table with the following fields:
    ID PIC 9(7) DEFAULT SYSTEM NOT NULL
    POINT PIC 9(8) DEFAULT SYSTEM NOT NULL
    TIMESTAMP LARGEINT DEFAULT SYSTEM NOT NULL

    None of them are unique. This means that ID could appear several times for several different points and in different timestamps.

    The result that I need is
    ID,POINT,TIMESTAMP
    But I just want the last one. With last one I mean just the last time that when an ID, reach a point, based on the timestamp, it's possible but none of my queries are really working as I expect.

    Here is one example:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(TIMESTAMP) from =information S2
    where
    S2.ID between 7000000 and 8999999 browse access
    )
    browse access;

    With the query above, when I try with only one ID instead of range (in between clause), it works. But with this range, the query is running for more than an hour and there is still no result, so I believe something is not right.

    Can you please help me with that? Or better what I'm doing wrong?

    Thanks a lot in advance and greetings from China!

    Just got the result and it's pretty bad. The output is just one line and not all IDs(in the range) with latest timestamp for the last point.

    So, any clue how can I achieve what I need?
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Rob Lesan@rob.lesan@xypro.com to comp.sys.tandem on Wed Jun 7 12:08:37 2023
    From Newsgroup: comp.sys.tandem

    On Wednesday, June 7, 2023 at 1:49:43rC>AM UTC-4, Gustavo Bertazzoli wrote:
    On Wednesday, 7 June 2023 at 13:45:04 UTC+8, Gustavo Bertazzoli wrote:
    Hello guys,

    Yes, I'm here again asking for SQL help. :)

    Basically, I have a table with the following fields:
    ID PIC 9(7) DEFAULT SYSTEM NOT NULL
    POINT PIC 9(8) DEFAULT SYSTEM NOT NULL
    TIMESTAMP LARGEINT DEFAULT SYSTEM NOT NULL

    None of them are unique. This means that ID could appear several times for several different points and in different timestamps.

    The result that I need is
    ID,POINT,TIMESTAMP
    But I just want the last one. With last one I mean just the last time that when an ID, reach a point, based on the timestamp, it's possible but none of my queries are really working as I expect.

    Here is one example:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(TIMESTAMP) from =information S2
    where
    S2.ID between 7000000 and 8999999 browse access
    )
    browse access;

    With the query above, when I try with only one ID instead of range (in between clause), it works. But with this range, the query is running for more than an hour and there is still no result, so I believe something is not right.

    Can you please help me with that? Or better what I'm doing wrong?

    Thanks a lot in advance and greetings from China!
    Just got the result and it's pretty bad. The output is just one line and not all IDs(in the range) with latest timestamp for the last point.

    So, any clue how can I achieve what I need?
    How large is the table? When was the last time the statistics where updated? Have you run an explain plain on this query?
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From invalid@invalid@nowhere.com (JShepherd) to comp.sys.tandem on Wed Jun 7 21:30:49 2023
    From Newsgroup: comp.sys.tandem

    In article <64ab14cb-528c-4d9e-8aff-d9600c4b2b17n@googlegroups.com>, guberta@gmail.com says...

    On Wednesday, 7 June 2023 at 13:45:04 UTC+8, Gustavo Bertazzoli wrote:
    Hello guys,

    Yes, I'm here again asking for SQL help. :)

    Basically, I have a table with the following fields:
    ID PIC 9(7) DEFAULT SYSTEM NOT NULL
    POINT PIC 9(8) DEFAULT SYSTEM NOT NULL
    TIMESTAMP LARGEINT DEFAULT SYSTEM NOT NULL

    None of them are unique. This means that ID could appear several times for sev
    eral different points and in different timestamps.

    The result that I need is
    ID,POINT,TIMESTAMP
    But I just want the last one. With last one I mean just the last time that whe
    n an ID, reach a point, based on the timestamp, it's possible but none of my que
    ries are really working as I expect.

    Here is one example:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(TIMESTAMP) from =information S2
    where
    S2.ID between 7000000 and 8999999 browse access
    )
    browse access;

    With the query above, when I try with only one ID instead of range (in between
    clause), it works. But with this range, the query is running for more than
    an h
    our and there is still no result, so I believe something is not right.

    Can you please help me with that? Or better what I'm doing wrong?

    Thanks a lot in advance and greetings from China!

    Just got the result and it's pretty bad. The output is just one line and not all
    IDs(in the range) with latest timestamp for the last point.

    So, any clue how can I achieve what I need?


    Are you saying that, given a set of rows like this
    you want only the two starred rows returned ?


    ID POINT TIMESTAMP
    ------- -------- --------------------
    7000000 10 1
    7000000 10 2
    7000000 11 3
    7000000 11 4
    *7000000 11 5
    7000002 2 21
    7000002 3 22
    7000002 4 23
    7000002 5 24
    *7000002 6 25


    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Gustavo Bertazzoli@guberta@gmail.com to comp.sys.tandem on Wed Jun 7 18:01:47 2023
    From Newsgroup: comp.sys.tandem

    On Thursday, 8 June 2023 at 05:30:52 UTC+8, JShepherd wrote:
    In article <64ab14cb-528c-4d9e...@googlegroups.com>,
    gub...@gmail.com says...

    On Wednesday, 7 June 2023 at 13:45:04 UTC+8, Gustavo Bertazzoli wrote:
    Hello guys,

    Yes, I'm here again asking for SQL help. :)

    Basically, I have a table with the following fields:
    ID PIC 9(7) DEFAULT SYSTEM NOT NULL
    POINT PIC 9(8) DEFAULT SYSTEM NOT NULL
    TIMESTAMP LARGEINT DEFAULT SYSTEM NOT NULL

    None of them are unique. This means that ID could appear several times for
    sev
    eral different points and in different timestamps.

    The result that I need is
    ID,POINT,TIMESTAMP
    But I just want the last one. With last one I mean just the last time that
    whe
    n an ID, reach a point, based on the timestamp, it's possible but none of my
    que
    ries are really working as I expect.

    Here is one example:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(TIMESTAMP) from =information S2
    where
    S2.ID between 7000000 and 8999999 browse access
    )
    browse access;

    With the query above, when I try with only one ID instead of range (in between
    clause), it works. But with this range, the query is running for more than
    an h
    our and there is still no result, so I believe something is not right.

    Can you please help me with that? Or better what I'm doing wrong?

    Thanks a lot in advance and greetings from China!

    Just got the result and it's pretty bad. The output is just one line and not
    all
    IDs(in the range) with latest timestamp for the last point.

    So, any clue how can I achieve what I need?
    Are you saying that, given a set of rows like this
    you want only the two starred rows returned ?


    ID POINT TIMESTAMP
    ------- -------- --------------------
    7000000 10 1
    7000000 10 2
    7000000 11 3
    7000000 11 4
    *7000000 11 5
    7000002 2 21
    7000002 3 22
    7000002 4 23
    7000002 5 24
    *7000002 6 25


    Hi JShepard,
    This is exactly what I'm looking for, because the 7000000 11 5 have the biggest timestamp and also the 7000002.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Gustavo Bertazzoli@guberta@gmail.com to comp.sys.tandem on Wed Jun 7 19:25:15 2023
    From Newsgroup: comp.sys.tandem

    On Thursday, 8 June 2023 at 03:08:39 UTC+8, Rob Lesan wrote:
    On Wednesday, June 7, 2023 at 1:49:43rC>AM UTC-4, Gustavo Bertazzoli wrote:
    On Wednesday, 7 June 2023 at 13:45:04 UTC+8, Gustavo Bertazzoli wrote:
    Hello guys,

    Yes, I'm here again asking for SQL help. :)

    Basically, I have a table with the following fields:
    ID PIC 9(7) DEFAULT SYSTEM NOT NULL
    POINT PIC 9(8) DEFAULT SYSTEM NOT NULL
    TIMESTAMP LARGEINT DEFAULT SYSTEM NOT NULL

    None of them are unique. This means that ID could appear several times for several different points and in different timestamps.

    The result that I need is
    ID,POINT,TIMESTAMP
    But I just want the last one. With last one I mean just the last time that when an ID, reach a point, based on the timestamp, it's possible but none of my queries are really working as I expect.

    Here is one example:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(TIMESTAMP) from =information S2
    where
    S2.ID between 7000000 and 8999999 browse access
    )
    browse access;

    With the query above, when I try with only one ID instead of range (in between clause), it works. But with this range, the query is running for more than an hour and there is still no result, so I believe something is not right.

    Can you please help me with that? Or better what I'm doing wrong?

    Thanks a lot in advance and greetings from China!
    Just got the result and it's pretty bad. The output is just one line and not all IDs(in the range) with latest timestamp for the last point.

    So, any clue how can I achieve what I need?
    How large is the table? When was the last time the statistics where updated?

    Have you run an explain plain on this query?
    Hi Rob,
    The table is pretty big, I believe something around 1 milion registers or even more, in a fee minutes I can check.
    The statistics are updated every 6 months, but itrCOs pretty accurate because the size of the table is always in average the same due to the business process behind.
    Just got the results and actually the table have:
    (EXPR)
    --------------------
    30818846
    Much more then I thought.
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From J G@goodbodyjg@gmail.com to comp.sys.tandem on Thu Jun 8 15:13:20 2023
    From Newsgroup: comp.sys.tandem

    NonStop SQL isn't really my thing (so maybe I'm not the best person to respond!), but does this work at a functional level?:
    select
    S1.ID,
    S1.POINT,
    max(S1.TIMESTAMP)
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    group by S1.ID
    browse access;
    I don't have NonStop access to test that, so it's possible NonStop SQL won't like it!
    When it comes to performance clearly you want to avoid a scan of all rows in the table. A composite index of ID and TIMESTAMP may allow the optimizer to skip scan down the index, read just one index entry for each unique ID in the range and then jump across with one read of the main table for each index row read.
    If you don't have an index leading with ID and TIMESTAMP then a full table scan may be unavoidable. Even with it I don't know if the optimizer will actually do a skip scan on NonStop.
    Maybe experiment with a copy of the table with just a few thousand rows and updated stats, look at the query plan and see if any of this is of any help.
    Good luck!
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Gustavo Bertazzoli@guberta@gmail.com to comp.sys.tandem on Fri Jun 9 01:26:53 2023
    From Newsgroup: comp.sys.tandem

    On Friday, 9 June 2023 at 06:13:22 UTC+8, J G wrote:
    NonStop SQL isn't really my thing (so maybe I'm not the best person to respond!), but does this work at a functional level?:

    select
    S1.ID,
    S1.POINT,
    max(S1.TIMESTAMP)
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    group by S1.ID
    browse access;

    I don't have NonStop access to test that, so it's possible NonStop SQL won't like it!

    When it comes to performance clearly you want to avoid a scan of all rows in the table. A composite index of ID and TIMESTAMP may allow the optimizer to skip scan down the index, read just one index entry for each unique ID in the range and then jump across with one read of the main table for each index row read.

    If you don't have an index leading with ID and TIMESTAMP then a full table scan may be unavoidable. Even with it I don't know if the optimizer will actually do a skip scan on NonStop.

    Maybe experiment with a copy of the table with just a few thousand rows and updated stats, look at the query plan and see if any of this is of any help.

    Good luck!
    Hello JG,
    Thanks for the suggestions, but unfortunately this idea I already had before and it doesn't return unique IDs. The result is very similar of JShepherd's list.
    Any other suggestions?
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From J G@goodbodyjg@gmail.com to comp.sys.tandem on Fri Jun 9 06:42:16 2023
    From Newsgroup: comp.sys.tandem

    How about:

    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    INNER JOIN (
    SELECT ID, MAX(TIMESTAMP) TIMESTAMP
    FROM =information
    GROUP BY ID
    ) S2 ON S1.id = S2.id AND S1.TIMESTAMP = S2.TIMESTAMP
    browse access;

    I've run this on a MySQL database (without the DEFINEs and browse access) and with JShepherd's data above and it returns:

    ID POINT TIMESTAMP
    7000000 11 5
    7000002 6 25

    I'm not sure about performance but let's see if we can get the right result set first!
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Roberto Zucchinali@roberto.zucchinali@gmail.com to comp.sys.tandem on Mon Jun 12 16:39:39 2023
    From Newsgroup: comp.sys.tandem

    On Friday, June 9, 2023 at 3:42:18rC>PM UTC+2, J G wrote:
    How about:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    INNER JOIN (
    SELECT ID, MAX(TIMESTAMP) TIMESTAMP
    FROM =information
    GROUP BY ID
    ) S2 ON S1.id = S2.id AND S1.TIMESTAMP = S2.TIMESTAMP
    browse access;

    I've run this on a MySQL database (without the DEFINEs and browse access) and with JShepherd's data above and it returns:

    ID POINT TIMESTAMP
    7000000 11 5
    7000002 6 25

    I'm not sure about performance but let's see if we can get the right result set first!
    Hi Gustavo,
    I changed the WHERE clause in your original subquery
    This is my proposal:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(S2.TIMESTAMP) from =information S2
    where
    S2.ID = S1.ID browse access)
    browse access ;
    In order to be able to suggest any changes to optimize the performance of the query, it is mandatory to know the structure of the table (rCLINVOKE =information1rCL from SQLCI) and also, the primary key and the indexes of the table ("FILEINFO =information1, DETAIL" from SQLCI) .

    Other steps that can improve performance include:
    - The reload of the table and/or indexes to reclaim unused space
    - new table and/or index partitions
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Gustavo Bertazzoli@guberta@gmail.com to comp.sys.tandem on Mon Jun 12 19:45:22 2023
    From Newsgroup: comp.sys.tandem

    On Friday, 9 June 2023 at 21:42:18 UTC+8, J G wrote:
    How about:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    INNER JOIN (
    SELECT ID, MAX(TIMESTAMP) TIMESTAMP
    FROM =information
    GROUP BY ID
    ) S2 ON S1.id = S2.id AND S1.TIMESTAMP = S2.TIMESTAMP
    browse access;

    I've run this on a MySQL database (without the DEFINEs and browse access) and with JShepherd's data above and it returns:

    ID POINT TIMESTAMP
    7000000 11 5
    7000002 6 25

    I'm not sure about performance but let's see if we can get the right result set first!

    In SQL/MP it didn't work, I have to change it to adapt a bit. But it took 2 hours and in the end I got an error of too long execution. =[
    --- Synchronet 3.21d-Linux NewsLink 1.2
  • From Gustavo Bertazzoli@guberta@gmail.com to comp.sys.tandem on Mon Jun 12 19:48:06 2023
    From Newsgroup: comp.sys.tandem

    On Tuesday, 13 June 2023 at 07:39:40 UTC+8, Roberto Zucchinali wrote:
    On Friday, June 9, 2023 at 3:42:18rC>PM UTC+2, J G wrote:
    How about:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    INNER JOIN (
    SELECT ID, MAX(TIMESTAMP) TIMESTAMP
    FROM =information
    GROUP BY ID
    ) S2 ON S1.id = S2.id AND S1.TIMESTAMP = S2.TIMESTAMP
    browse access;

    I've run this on a MySQL database (without the DEFINEs and browse access) and with JShepherd's data above and it returns:

    ID POINT TIMESTAMP
    7000000 11 5
    7000002 6 25

    I'm not sure about performance but let's see if we can get the right result set first!
    Hi Gustavo,
    I changed the WHERE clause in your original subquery

    This is my proposal:
    select
    S1.ID,
    S1.POINT,
    S1.TIMESTAMP
    from
    =information S1
    where
    S1.ID between 7000000 and 8999999
    AND S1.TIMESTAMP = (select MAX(S2.TIMESTAMP) from =information S2
    where
    S2.ID = S1.ID browse access)
    browse access ;


    In order to be able to suggest any changes to optimize the performance of the query, it is mandatory to know the structure of the table (rCLINVOKE =information1rCL from SQLCI) and also, the primary key and the indexes of the table ("FILEINFO =information1, DETAIL" from SQLCI) .


    Other steps that can improve performance include:
    - The reload of the table and/or indexes to reclaim unused space
    - new table and/or index partitions
    WOW Roberto! That's perfect! Thanks a lot!
    It worked and actually the result came pretty fast!
    --- Synchronet 3.21d-Linux NewsLink 1.2