• Spotlight database, size, rebuilding?

    From nospam@nospam@de-ster.demon.nl (J. J. Lodder) to uk.comp.sys.mac on Sat Dec 27 22:19:22 2025
    From Newsgroup: uk.comp.sys.mac

    [OSX Sierra, 10.12.6]

    As it happens, because of history, I have three identical big volumes,
    a mother, and two identical daughters,
    both backed up using SuperDuper. [1]
    (but not always at the same time)
    They have very different amounts of free space.

    In column view all folders can be seen to have the same sizes.
    The disks have stopped their rattling, so indexing seems to be complete.
    (and mdsworker is no longer hogging CPU time)

    Turning on invisibility, I can see the sizes of the Spotlight folders.
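
    The same comparison can also be made from Terminal. A minimal sketch,
    assuming the index lives in the usual .Spotlight-V100 folder at the root
    of each volume. The function name is made up for illustration, and
    reading that folder normally needs a root shell:

```shell
# Print the size in KB of the Spotlight index folder on one volume.
# Hypothetical helper; .Spotlight-V100 at the volume root is assumed.
spotlight_index_kb() {
    index_dir="$1/.Spotlight-V100"
    if [ ! -d "$index_dir" ]; then
        echo "no Spotlight folder on $1" >&2
        return 1
    fi
    # du -sk reports the total in 1024-byte blocks; keep only the number.
    du -sk "$index_dir" | awk '{ print $1 }'
}
```

    From a root shell (sudo -s), something like
    spotlight_index_kb /Volumes/Mother would print one number per volume;
    the volume name here is a placeholder.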

    On the mother volume, the spotlight database is relatively small,
    and frozen in size. Spotlight never finds anything there,
    so it must be broken. (while still taking up space)
    So I have deleted it.
    On the two daughters the Spotlight databases are much larger,
    but very different in size.
    (but both work, more or less, and find the same things, more or less)

    What is happening here?
    Can a Spotlight database be full with obsolete data
    referring to files that no longer exist?
    Or can they have different efficiencies in storing?
    If so, should the Spotlight database be rebuilt every now and then?
    (but this takes a lot of time and CPU use, so ultimately kWh-s,
    and wears out the disks by doing lots of reading and writing)

    Does Spotlight work better in later OSes?

    Jan

    [1] SuperDuper does not copy the Spotlight database.
    (which would be pointless, because it refers to file locations on disk)
    It saves the databases before doing its smart copy/delete,
    does its copying and deleting, and restores the database when done.
    I ran into this because my SuperDuper ran into 'disk full' problems,
    despite the Finder saying that there should be room enough.

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Bruce@07.013@scorecrow.com to uk.comp.sys.mac on Sun Dec 28 11:42:53 2025
    From Newsgroup: uk.comp.sys.mac

    If you ever needed to restore a file from these backup disks, would you
    know the filename or would you need Spotlight to find the file first?

    If the former then you could consider using mdutil to disable Spotlight indexing of the backup drives. This would save the space of the index
    and, hopefully, resolve the free space discrepancies that you are seeing between the drives.

    If you do need Spotlight then I can't see why one would be different to
    the other unless indexing had been interrupted in some way. The only way
    to confirm that would be to force a full re-index of each drive, as slow
    and annoying as that would be.
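
    For reference, both suggestions above map onto mdutil options. A minimal
    sketch: mdutil_cmd is a made-up helper that only prints the command
    line, so nothing is changed until you run the printed line yourself.
    Turning indexing off may not by itself remove the existing index folder;
    that is not verified here.

```shell
# Print the mdutil command line for one action on one volume.
# Hypothetical helper: it only prints; run the printed line yourself.
mdutil_cmd() {
    action="$1"
    volume="$2"
    case "$action" in
        status)  echo "mdutil -s \"$volume\"" ;;          # show indexing status
        off)     echo "sudo mdutil -i off \"$volume\"" ;; # stop indexing the volume
        on)      echo "sudo mdutil -i on \"$volume\"" ;;  # resume indexing
        rebuild) echo "sudo mdutil -E \"$volume\"" ;;     # erase the index and re-index
        *)       echo "usage: mdutil_cmd status|off|on|rebuild VOLUME" >&2
                 return 2 ;;
    esac
}
```

    For example, mdutil_cmd rebuild /Volumes/Daughter1 prints the command
    for a full re-index of a volume with that (placeholder) name.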

    Regards,
    --
    Bruce Horrocks
    Hampshire, England
  • From nospam@nospam@de-ster.demon.nl (J. J. Lodder) to uk.comp.sys.mac on Mon Dec 29 13:12:34 2025
    From Newsgroup: uk.comp.sys.mac

    Bruce <07.013@scorecrow.com> wrote:

    > If you ever needed to restore a file from these backup disks, would you
    > know the filename or would you need Spotlight to find the file first?

    I use Spotlight mostly to find by content.

    > If the former then you could consider using mdutil to disable Spotlight
    > indexing of the backup drives.

    I don't have them all connected all the time,
    so I usually search 'On this Mac'.

    > This would save the space of the index
    > and, hopefully, resolve the free space discrepancies that you are seeing
    > between the drives.

    That is what the developer recommends,
    but I don't want to lose the indexing I have.
    For now I have switched the disk with the largest database
    to be the master.
    This avoids out of space errors.

    > If you do need Spotlight then I can't see why one would be different to
    > the other unless indexing had been interrupted in some way.

    That was the question. They differ by about a third.

    > The only way
    > to confirm that would be to force a full re-index of each drive, as slow
    > and annoying as that would be.

    I'll do that with the Spotlight I deleted,
    but it may take a week.
    Activity Monitor reports 'data read and written'
    as more than the total size of the disk.

    I would like to have a better understanding of
    what Spotlight is actually doing while indexing.

    Jan

  • From Bruce@07.013@scorecrow.com to uk.comp.sys.mac on Mon Dec 29 22:03:39 2025
    From Newsgroup: uk.comp.sys.mac

    On 29/12/2025 12:12, J. J. Lodder wrote:
    >> If you ever needed to restore a file from these backup disks, would you
    >> know the filename or would you need Spotlight to find the file first?
    >
    > I use spotlight mostly to find by content.

    Ah, I think I understand what you are doing now: there is a master disk
    which is an external drive and is a disk in its own right, for want of a better term: it is not a backup of another.

    And then you have two separate drives that are backups of the master,
    but taken at different times, e.g. one one day, the other the next day,
    or however frequently you make the backups. It shouldn't really matter.

    This seems like bread and butter for Super Duper, or any backup product,
    tbh.

    Lastly, the master is spotlight indexed because you need it to search by content.

    (Sorry if this seems a bit laborious but it's to help others to better understand and chip in with relevant experience.)


    >> If the former then you could consider using mdutil to disable Spotlight
    >> indexing of the backup drives.

    So ignore this - I originally thought you meant all 3 drives were
    backups of a fourth - an internal perhaps.


    > I don't have them all connected all the time,
    > so I usually search 'On this Mac'.
    >
    >> This would save the space of the index
    >> and, hopefully, resolve the free space discrepancies that you are seeing
    >> between the drives.
    >
    > That is what the developer recommends,
    > but I don't want to lose the indexing I have.

    As far as I can tell from Super Duper documentation, by default it will
    not copy Spotlight indexes. Instead it assumes you will leave the backup
    drive attached for long-enough after Super Duper has finished doing its
    backup for Spotlight to detect all the changed files and re-index them.

    If it did copy Spotlight databases they would be no use to you (either
    on the backup drive or on a new master following a full restore) as the
    files are referenced by inode and the inode would change from one disk
    to another. I can find nothing to suggest that Super Duper is clever
    enough to re-write the inodes as it copies the Spotlight database.

    However there is an option in Super Duper to copy Spotlight databases -
    so if you have this on then the backup drive will need space for two
    complete Spotlight databases: one copied and one re-generated by Spotlight.

    > For now I have switched the disk with the largest database
    > to be the master.
    > This avoids out of space errors.
    >
    >> If you do need Spotlight then I can't see why one would be different to
    >> the other unless indexing had been interrupted in some way.
    >
    > That was the question. They differ by about a third.
    >
    >> The only way
    >> to confirm that would be to force a full re-index of each drive, as slow
    >> and annoying as that would be.
    >
    > I'll do that with the Spotlight I deleted,
    > but it may take a week.
    > Activity monitor reports 'data read and written'
    > as more than the total size of the disk,

    That would be helpful to know - especially relating to the point about
    Super Duper assuming you will leave the disk attached for long enough
    for Spotlight to complete.

    Which begs the question: how do you know that Spotlight has finished. Unfortunately there's no simple command, afaict.
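
    One rough heuristic is to watch how much CPU Spotlight's own processes
    are using. A minimal sketch, assuming the daemons show up in ps under
    the names mds, mds_stores and mdworker (or mdworker_something);
    spotlight_cpu is a made-up helper name:

```shell
# Sum the %CPU of Spotlight's daemons from "ps -Ao %cpu,comm" output on stdin.
# Hypothetical helper; process names mds, mds_stores and mdworker* are assumed.
spotlight_cpu() {
    awk '$NF ~ /(^|\/)(mds|mds_stores|mdworker[_a-z]*)$/ { total += $1 }
         END { printf "%.1f\n", total + 0 }'
}
```

    Used as: ps -Ao %cpu,comm | spotlight_cpu
    A figure that stays near zero for several minutes suggests, but does not
    prove, that indexing has finished.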


    > I would like to have a better understanding about
    > what Spotlight is actually doing, while indexing,

    Essentially Spotlight records metadata about files, e.g. filenames, date
    and time stamps etc. You can see exactly what by using the mdls command,
    e.g.

    $ mdls ~/Desktop

    And then it keeps information about the content of files, for searching,
    using 'importer' plugins. The role of the plugin is to 'understand' the
    format of various file types e.g. .pages, or .xlsx and translate it to something Spotlight understands. You can get a list of importers using
    the command mdimport.
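
    To be specific, the listing comes from mdimport -L. A small sketch for
    boiling its output down to plugin names, assuming each importer appears
    in that output as a path ending in .mdimporter; importer_names is a
    made-up helper name:

```shell
# Reduce "mdimport -L" output on stdin to a sorted list of importer names.
# Hypothetical helper; assumes importers are listed as paths ending in .mdimporter.
importer_names() {
    grep -o '[^/"]*\.mdimporter' | sort -u
}
```

    Used as: mdimport -L 2>&1 | importer_names
    The 2>&1 is there in case the listing is written to stderr rather than
    stdout.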

    There's a more complete explanation with examples here:

    <https://eclecticlight.co/2024/11/29/using-and-troubleshooting-spotlight-in-sequoia-summary/>

    There's a useful diagnostic tree to follow but it mostly boils down to:
    once the index is corrupt or not working then it's too complicated to
    fix by any means other than to start over and re-index.

    Finally, if you scroll down to the heading "Flooding the Zone" here <https://www.shirt-pocket.com/blog/index.php/shadedgrey/C5>

    there's mention of a user with slow Super Duper performance who found
    there were upwards of 6 million files being created by Spotlight
    relating to Photos images. If something similar is happening to you then
    that might explain why your disks are running out of space when they
    should theoretically have enough.

    It doesn't clearly say this in that article but I assume the problem is
    that Spotlight is creating 6m temporary files that are being deleted on
    a rolling basis (otherwise the main disk would run out of space as well)
    but SuperDuper still backs-up a bunch of them that it won't know have
    been deleted until next time it runs.

    Sorry there is nothing terribly concrete here. I think your best course
    of action is to ensure that Super Duper is configured to ignore
    spotlight databases and other temporary files. Then try to determine if Spotlight has completed on the daughter drives before disconnecting.

    Regards,
    --
    Bruce Horrocks
    Hampshire, England
  • From nospam@nospam@de-ster.demon.nl (J. J. Lodder) to uk.comp.sys.mac on Sun Jan 4 21:24:50 2026
    From Newsgroup: uk.comp.sys.mac

    Bruce <07.013@scorecrow.com> wrote:

    > Which begs the question: how do you know that Spotlight has finished?
    > Unfortunately there's no simple command, afaict.

    If you are within hearing distance that is rather obvious.
    If not, Activity Monitor can tell you.

    >> I would like to have a better understanding about
    >> what Spotlight is actually doing, while indexing,
    >
    > Essentially Spotlight records metadata about files, e.g. filenames, date
    > and time stamps etc. You can see exactly what by using the mdls command,
    > e.g.
    >
    > $ mdls ~/Desktop
    >
    > And then it keeps information about the content of files, for searching,
    > using 'importer' plugins. The role of the plugin is to 'understand' the
    > format of various file types e.g. .pages, or .xlsx and translate it to
    > something Spotlight understands. You can get a list of importers using
    > the command mdimport.
    >
    > There's a more complete explanation with examples here:
    >
    > <https://eclecticlight.co/2024/11/29/using-and-troubleshooting-spotlight-in-sequoia-summary/>
    >
    > There's a useful diagnostic tree to follow but it mostly boils down to:
    > once the index is corrupt or not working then it's too complicated to
    > fix by any means other than to start over and re-index.

    Thanks, that gives a lot of food for thought.

    > Finally, if you scroll down to the heading "Flooding the Zone" here
    > <https://www.shirt-pocket.com/blog/index.php/shadedgrey/C5>

    Not applicable to my ancient system, I guess.

    > there's mention of a user with slow Super Duper performance who found
    > there were upwards of 6 million files being created by Spotlight
    > relating to Photos images. If something similar is happening to you then
    > that might explain why your disks are running out of space when they
    > should theoretically have enough.
    >
    > It doesn't clearly say this in that article but I assume the problem is
    > that Spotlight is creating 6m temporary files that are being deleted on
    > a rolling basis (otherwise the main disk would run out of space as well)
    > but SuperDuper still backs-up a bunch of them that it won't know have
    > been deleted until next time it runs.

    I have indeed noticed this in practice.
    When a lot of indexable data is added to an external disk
    the main system disk may run out of disk space.
    The only remedy is moving stuff off temporarily,
    or better, have lots of free space to begin with. [1]

    > Sorry there is nothing terribly concrete here. I think your best course
    > of action is to ensure that Super Duper is configured to ignore
    > spotlight databases and other temporary files. Then try to determine if
    > Spotlight has completed on the daughter drives before disconnecting.

    One problem is that SuperDuper is bad at handling disk full problems.
    It copies everything first, then deletes what is no longer needed.
    This leaves the back-up in an undefined state with no easy escape.
    (except perhaps using another file synchroniser)

    The problems I ran into turned out to have been caused
    by the daughter volume having a much larger Spotlight database
    than the mother volume. (causing disk full problems)
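
    A rough pre-flight check before starting a backup could catch this. A
    minimal sketch: enough_space is a made-up helper, and how much headroom
    to ask for (new files plus a possibly larger index on the target) is the
    caller's own estimate:

```shell
# Succeed if the volume holding $2 has at least $1 KB free.
# Hypothetical pre-flight check; the headroom figure is the caller's guess.
enough_space() {
    needed_kb="$1"
    target="$2"
    # df -Pk: POSIX layout, 1024-byte blocks; column 4 is "Available".
    free_kb=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
    [ "$free_kb" -ge "$needed_kb" ]
}
```

    Used as: enough_space 50000000 /Volumes/Daughter1 || echo "too full"
    (the figure and the volume name are placeholders).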

    Jan

    [1] Seeing this the first time it is somewhat frightening.
    'Something' is eating your main disk free space,
    and you have no idea what could be happening.
