Jump to content


Photo

Database tools - what is Clean phantom records?


  • Please log in to reply
9 replies to this topic

#1 zhangrau

zhangrau

    Advanced Member

  • Members
  • PipPipPip
  • 227 posts

Posted 07 December 2012 - 09:29 AM

I just loaded RM6.0.0.3and decided to run the Database Tools on my existing file. Curiously, there is a new tool, which is not explained in the Help file. What's the difference between "Compact database" and the new tool called "Clean phantom records" ??????

Revision: As I worked my way down the list, I found that "Compact database" still issues a "cannot vacuum" error when run directly after "Rebuild indexes." There is a work-around, however. When I remember to do it, switching the focus to a different individual AFTER "Reindex" and BEFORE "Compact" prevents the "cannot vacuum" error.

Revison #2: Here's a thought: Wouldn't it be nice to have a single button to run ALL of the Database tools, and to include a progress bar?

#2 Renee Zamora

Renee Zamora

    Advanced Member

  • Support
  • PipPipPip
  • 3678 posts

Posted 07 December 2012 - 10:31 AM

I just loaded RM6.0.0.3and decided to run the Database Tools on my existing file. Curiously, there is a new tool, which is not explained in the Help file. What's the difference between "Compact database" and the new tool called "Clean phantom records" ??????


"Clean phantom Records" used to be part of "Rebuild Indexes". It will remove "ghost children and spouses" in the database. I will make a note in the tracking system that we need to add this to Help.

Revision: As I worked my way down the list, I found that "Compact database" still issues a "cannot vacuum" error when run directly after "Rebuild indexes." There is a work-around, however. When I remember to do it, switching the focus to a different individual AFTER "Reindex" and BEFORE "Compact" prevents the "cannot vacuum" error.


Have you opened a support ticket and attached your database so we can test why its not compacting the database?

Revison #2: Here's a thought: Wouldn't it be nice to have a single button to run ALL of the Database tools, and to include a progress bar?


Confirming enhancement request is in our tracking system.
Renee
RootsMagic

#3 zhangrau

zhangrau

    Advanced Member

  • Members
  • PipPipPip
  • 227 posts

Posted 07 December 2012 - 10:34 AM

Yes, I have opened a support ticket. Unfortunately, it is a rather large database (around 330K individuals) and would not upload as an attachment to the ticket.

#4 Renee Zamora

Renee Zamora

    Advanced Member

  • Support
  • PipPipPip
  • 3678 posts

Posted 07 December 2012 - 10:38 AM

If you let them know you have that issue you will be given a link to a service to upload the large file to us.
Renee
RootsMagic

#5 TomH

TomH

    Advanced Member

  • Members
  • PipPipPip
  • 3252 posts

Posted 07 December 2012 - 02:53 PM

Revison #2: Here's a thought: Wouldn't it be nice to have a single button to run ALL of the Database tools, and to include a progress bar?

It would for small databases when no one procedure takes very long. But for large databases, separating the functions is very much needed in order to be selective and efficient about the use of time. That said, there does not seem to be any point in bundling the Test Integrity function with anything else - you need to see its report.

I'm happy to see Delete Phantoms moved out of Rebuild Indexes; I think it could readily and logically be integrated with Compact Databases as there is not much point in compacting without having first cleaned out unused records (Phantoms) to free up the space to be vacuumed by SQLite.

I would further like to see Rebuild Indexes split into REINDEX (the pure SQLite function) and UPDATE Birth/Death Years. The latter can get out of sync with user operations on the database and may get slower when Christening or other facts are incorporated as alternate dates when the main one is unavailable. One should not be deterred from using the needed tool because it takes too long.
Tom user of RM6306 FTM2014 Ancestry.ca FamilySearch.org FindMyPast.com
Posted Imagewiki, exploiting the database in special ways >>> Posted Image app, a growing bundle of RootsMagic utilities.

#6 zhangrau

zhangrau

    Advanced Member

  • Members
  • PipPipPip
  • 227 posts

Posted 07 December 2012 - 07:44 PM

Not being an SQL programmer (or user), I always just run all of the tools in the sequence that the buttons appear. I use RM a lot, but that doesn't mean that I understand the intricasies of the programming involved....

As it is, I find that I almost always have time to dosomething else whilewaiting on the Reindex and Compact tools - they take somewhere around 3 to 5minutes to run. If I could run a single tool, knowing that I have 10 minutes of waiting, I could go do something else and let the computer do its thing.

#7 Ludlow Bay

Ludlow Bay

    Advanced Member

  • Members
  • PipPipPip
  • 771 posts

Posted 07 December 2012 - 08:03 PM

Running the database tools on a test RM6 database of +/- 75,000 individuals, stored on a 512GB SSD ----

Test database integrity - 7.3 seconds
Rebuild indexes - 20.7 seconds
Compact database - 11.9 seconds
Clean phantom records - < 1 second

#8 TomH

TomH

    Advanced Member

  • Members
  • PipPipPip
  • 3252 posts

Posted 07 December 2012 - 09:50 PM

That's fast, Ludlow Bay! Thanks to the pricey solid-state drive and, undoubtedly, a powerful computer. Others running RM on an Atom-powered Netbook with 1GB of RAM and a slow hard drive will see far, far longer results. On my low end Win 7 laptop with 4GB and some not-so-hot AMD dual processor (RM only uses one), a 162,000 person database took 2:20 to Rebuild Indexes under 6.0.0.2. Using SQLiteSpy, the SQLite REINDEX on its own ran in about 1:20. If Rebuild Indexes is taken up roughly 50-50 by REINDEX and UPDATE Birth/Death Year and the overall time is measured in minutes, then I'm all for splitting it into the two parts. We often suggest that users Rebuild Indexes to accomplish just one of those tasks because it's the one that could solve some problem. Short durations encourage use; slogging procedures the opposite.

I'm all for efficient procedures, too. I wrote a SQLite query (see Rebuild Indexes and Update Birth and Death Years) that does the UPDATE Birth/Death year in 25 seconds on this big database, probably less than half the time that it takes RM within the Rebuild Indexes tool. I revised it to factor in Christen, Baptism, Burial and Cremation facts as secondary dates to establish the Birth/Death years, which doubled the time to 55 seconds, probably equal to the current RM procedure. It would be nice if RM could incorporate the secondary dates with no increase in its duration. But any increase is all the more reason to break it out.

The Database Tools menu order is not the sequence I would follow. It should be:
  • Test Database Integrity
  • REINDEX SQLite tables
  • Clean Phantoms
  • Compact database
If #1 is OK, there is no need to do #2. #1 says nothing about whether you should do #3 or #4, nor does File > Properties. My RMGC_Properties - Query does provide information that indicates whether there are significant numbers of records in various tables that are needlessly cluttering the database but it is a very heavy, slow query, largely because it also does a Duplicate Name Search. It makes sense to Clean Phantoms before Compacting but, so far, there is no tool to predict whether it is worthwhile compacting. I have seen this large database at 517MB drop to 230MB after compacting - that's worthwhile! In addition to compacting, disk defragmenting should be done from time to time, as the database may get scattered over disparate areas of the drive, thus slowing database operations.

UPDATE Birth/Death year does not necessarily belong in that menu group and could be done at any time, especially following merging ops. Maybe it should be under the Tools menu, not File.

Edit: I should add that I am a heavy user of Rebuild Indexes, specifically for the SQLite REINDEX function, because of the operations I do on my database (and other people's) using external SQLite tools that can or necessarily do disrupt the database table indexes from RM's point of view. When so disrupted, RM's integrity check will report it and all I want to do, as quickly as possible, is to REINDEX - skip the Update Birth/Death years - so that subsequent operations in RM are carried out correctly.

Edited by TomH, 07 December 2012 - 10:04 PM.

Tom user of RM6306 FTM2014 Ancestry.ca FamilySearch.org FindMyPast.com
Posted Imagewiki, exploiting the database in special ways >>> Posted Image app, a growing bundle of RootsMagic utilities.

#9 zhangrau

zhangrau

    Advanced Member

  • Members
  • PipPipPip
  • 227 posts

Posted 07 December 2012 - 09:50 PM

I just finished a session of merging duplicates in a 7-generation familial sequence. Here's current stats, using my cellphone's timer.
Backup -- 147,627 KB
Integrity check -- 0:57 minutes:seconds
Rebuild index -- 2:28
Compact database -- error "cannot vacuum" -- changed individual focus and re-ran
Compact databse -- 1:19
Clean phantoms -- 0:01
Backup -- 147,352 KB
Properties (not all listed):
People -- 332,086
Families -- 103,502
Events -- 642,715
Places -- 45,438
Sources -- 9,880
Citations -- 1,694,368
Repositories -- 1,106
Multimedia items -- 10,833

So the sequence of tasks took less time than I had previously estimated, and the vacuum error continues to occur.

#10 TomH

TomH

    Advanced Member

  • Members
  • PipPipPip
  • 3252 posts

Posted 07 December 2012 - 10:06 PM

Doing a Properties report before and after Clean Phantom Records will give an indication of how many were deleted for certain tables.
Tom user of RM6306 FTM2014 Ancestry.ca FamilySearch.org FindMyPast.com
Posted Imagewiki, exploiting the database in special ways >>> Posted Image app, a growing bundle of RootsMagic utilities.