Database tools - what is Clean phantom records?
Started by zhangrau, Dec 07 2012 09:29 AM
9 replies to this topic
#1
Posted 07 December 2012 - 09:29 AM
I just loaded RM6.0.0.3and decided to run the Database Tools on my existing file. Curiously, there is a new tool, which is not explained in the Help file. What's the difference between "Compact database" and the new tool called "Clean phantom records" ??????
Revision: As I worked my way down the list, I found that "Compact database" still issues a "cannot vacuum" error when run directly after "Rebuild indexes." There is a work-around, however. When I remember to do it, switching the focus to a different individual AFTER "Reindex" and BEFORE "Compact" prevents the "cannot vacuum" error.
Revison #2: Here's a thought: Wouldn't it be nice to have a single button to run ALL of the Database tools, and to include a progress bar?
Revision: As I worked my way down the list, I found that "Compact database" still issues a "cannot vacuum" error when run directly after "Rebuild indexes." There is a work-around, however. When I remember to do it, switching the focus to a different individual AFTER "Reindex" and BEFORE "Compact" prevents the "cannot vacuum" error.
Revison #2: Here's a thought: Wouldn't it be nice to have a single button to run ALL of the Database tools, and to include a progress bar?
#2
Posted 07 December 2012 - 10:31 AM
zhangrau, on 07 December 2012 - 09:29 AM, said:
I just loaded RM6.0.0.3and decided to run the Database Tools on my existing file. Curiously, there is a new tool, which is not explained in the Help file. What's the difference between "Compact database" and the new tool called "Clean phantom records" ??????
"Clean phantom Records" used to be part of "Rebuild Indexes". It will remove "ghost children and spouses" in the database. I will make a note in the tracking system that we need to add this to Help.
Quote
Revision: As I worked my way down the list, I found that "Compact database" still issues a "cannot vacuum" error when run directly after "Rebuild indexes." There is a work-around, however. When I remember to do it, switching the focus to a different individual AFTER "Reindex" and BEFORE "Compact" prevents the "cannot vacuum" error.
Have you opened a support ticket and attached your database so we can test why its not compacting the database?
Quote
Revison #2: Here's a thought: Wouldn't it be nice to have a single button to run ALL of the Database tools, and to include a progress bar?
Confirming enhancement request is in our tracking system.
Renee
RootsMagic
RootsMagic
#3
Posted 07 December 2012 - 10:34 AM
Yes, I have opened a support ticket. Unfortunately, it is a rather large database (around 330K individuals) and would not upload as an attachment to the ticket.
#4
Posted 07 December 2012 - 10:38 AM
If you let them know you have that issue you will be given a link to a service to upload the large file to us.
Renee
RootsMagic
RootsMagic
#5
Posted 07 December 2012 - 02:53 PM
zhangrau, on 07 December 2012 - 09:29 AM, said:
Revison #2: Here's a thought: Wouldn't it be nice to have a single button to run ALL of the Database tools, and to include a progress bar?
I'm happy to see Delete Phantoms moved out of Rebuild Indexes; I think it could readily and logically be integrated with Compact Databases as there is not much point in compacting without having first cleaned out unused records (Phantoms) to free up the space to be vacuumed by SQLite.
I would further like to see Rebuild Indexes split into REINDEX (the pure SQLite function) and UPDATE Birth/Death Years. The latter can get out of sync with user operations on the database and may get slower when Christening or other facts are incorporated as alternate dates when the main one is unavailable. One should not be deterred from using the needed tool because it takes too long.
Tom user of RM6103 RM5041 RM4121 FO9 PAF5 FTM2012 Ancestry.ca FamilySearch.org FindMyPast.com
: exploiting the database in special ways >>>
: a growing bundle of utilities in an easy app.
: exploiting the database in special ways >>>
: a growing bundle of utilities in an easy app.
#6
Posted 07 December 2012 - 07:44 PM
Not being an SQL programmer (or user), I always just run all of the tools in the sequence that the buttons appear. I use RM a lot, but that doesn't mean that I understand the intricasies of the programming involved....
As it is, I find that I almost always have time to dosomething else whilewaiting on the Reindex and Compact tools - they take somewhere around 3 to 5minutes to run. If I could run a single tool, knowing that I have 10 minutes of waiting, I could go do something else and let the computer do its thing.
As it is, I find that I almost always have time to dosomething else whilewaiting on the Reindex and Compact tools - they take somewhere around 3 to 5minutes to run. If I could run a single tool, knowing that I have 10 minutes of waiting, I could go do something else and let the computer do its thing.
#7
Posted 07 December 2012 - 08:03 PM
Running the database tools on a test RM6 database of +/- 75,000 individuals, stored on a 512GB SSD ----
Test database integrity - 7.3 seconds
Rebuild indexes - 20.7 seconds
Compact database - 11.9 seconds
Clean phantom records - < 1 second
Test database integrity - 7.3 seconds
Rebuild indexes - 20.7 seconds
Compact database - 11.9 seconds
Clean phantom records - < 1 second
#8
Posted 07 December 2012 - 09:50 PM
That's fast, Ludlow Bay! Thanks to the pricey solid-state drive and, undoubtedly, a powerful computer. Others running RM on an Atom-powered Netbook with 1GB of RAM and a slow hard drive will see far, far longer results. On my low end Win 7 laptop with 4GB and some not-so-hot AMD dual processor (RM only uses one), a 162,000 person database took 2:20 to Rebuild Indexes under 6.0.0.2. Using SQLiteSpy, the SQLite REINDEX on its own ran in about 1:20. If Rebuild Indexes is taken up roughly 50-50 by REINDEX and UPDATE Birth/Death Year and the overall time is measured in minutes, then I'm all for splitting it into the two parts. We often suggest that users Rebuild Indexes to accomplish just one of those tasks because it's the one that could solve some problem. Short durations encourage use; slogging procedures the opposite.
I'm all for efficient procedures, too. I wrote a SQLite query (see Rebuild Indexes and Update Birth and Death Years) that does the UPDATE Birth/Death year in 25 seconds on this big database, probably less than half the time that it takes RM within the Rebuild Indexes tool. I revised it to factor in Christen, Baptism, Burial and Cremation facts as secondary dates to establish the Birth/Death years, which doubled the time to 55 seconds, probably equal to the current RM procedure. It would be nice if RM could incorporate the secondary dates with no increase in its duration. But any increase is all the more reason to break it out.
The Database Tools menu order is not the sequence I would follow. It should be:
UPDATE Birth/Death year does not necessarily belong in that menu group and could be done at any time, especially following merging ops. Maybe it should be under the Tools menu, not File.
Edit: I should add that I am a heavy user of Rebuild Indexes, specifically for the SQLite REINDEX function, because of the operations I do on my database (and other people's) using external SQLite tools that can or necessarily do disrupt the database table indexes from RM's point of view. When so disrupted, RM's integrity check will report it and all I want to do, as quickly as possible, is to REINDEX - skip the Update Birth/Death years - so that subsequent operations in RM are carried out correctly.
I'm all for efficient procedures, too. I wrote a SQLite query (see Rebuild Indexes and Update Birth and Death Years) that does the UPDATE Birth/Death year in 25 seconds on this big database, probably less than half the time that it takes RM within the Rebuild Indexes tool. I revised it to factor in Christen, Baptism, Burial and Cremation facts as secondary dates to establish the Birth/Death years, which doubled the time to 55 seconds, probably equal to the current RM procedure. It would be nice if RM could incorporate the secondary dates with no increase in its duration. But any increase is all the more reason to break it out.
The Database Tools menu order is not the sequence I would follow. It should be:
- Test Database Integrity
- REINDEX SQLite tables
- Clean Phantoms
- Compact database
UPDATE Birth/Death year does not necessarily belong in that menu group and could be done at any time, especially following merging ops. Maybe it should be under the Tools menu, not File.
Edit: I should add that I am a heavy user of Rebuild Indexes, specifically for the SQLite REINDEX function, because of the operations I do on my database (and other people's) using external SQLite tools that can or necessarily do disrupt the database table indexes from RM's point of view. When so disrupted, RM's integrity check will report it and all I want to do, as quickly as possible, is to REINDEX - skip the Update Birth/Death years - so that subsequent operations in RM are carried out correctly.
Edited by TomH, 07 December 2012 - 10:04 PM.
Tom user of RM6103 RM5041 RM4121 FO9 PAF5 FTM2012 Ancestry.ca FamilySearch.org FindMyPast.com
: exploiting the database in special ways >>>
: a growing bundle of utilities in an easy app.
: exploiting the database in special ways >>>
: a growing bundle of utilities in an easy app.
#9
Posted 07 December 2012 - 09:50 PM
I just finished a session of merging duplicates in a 7-generation familial sequence. Here's current stats, using my cellphone's timer.
Backup -- 147,627 KB
Integrity check -- 0:57 minutes:seconds
Rebuild index -- 2:28
Compact database -- error "cannot vacuum" -- changed individual focus and re-ran
Compact databse -- 1:19
Clean phantoms -- 0:01
Backup -- 147,352 KB
Properties (not all listed):
People -- 332,086
Families -- 103,502
Events -- 642,715
Places -- 45,438
Sources -- 9,880
Citations -- 1,694,368
Repositories -- 1,106
Multimedia items -- 10,833
So the sequence of tasks took less time than I had previously estimated, and the vacuum error continues to occur.
Backup -- 147,627 KB
Integrity check -- 0:57 minutes:seconds
Rebuild index -- 2:28
Compact database -- error "cannot vacuum" -- changed individual focus and re-ran
Compact databse -- 1:19
Clean phantoms -- 0:01
Backup -- 147,352 KB
Properties (not all listed):
People -- 332,086
Families -- 103,502
Events -- 642,715
Places -- 45,438
Sources -- 9,880
Citations -- 1,694,368
Repositories -- 1,106
Multimedia items -- 10,833
So the sequence of tasks took less time than I had previously estimated, and the vacuum error continues to occur.
#10
Posted 07 December 2012 - 10:06 PM
Doing a Properties report before and after Clean Phantom Records will give an indication of how many were deleted for certain tables.
Tom user of RM6103 RM5041 RM4121 FO9 PAF5 FTM2012 Ancestry.ca FamilySearch.org FindMyPast.com
: exploiting the database in special ways >>>
: a growing bundle of utilities in an easy app.
: exploiting the database in special ways >>>
: a growing bundle of utilities in an easy app.











