"That would test whether backup and restore has any effect on the database, such as the old Pack or Reindex utilities. From my own observations of database properties, backup/restore has no effect - it is just a ZIP utility."
That would ensure that both users were starting off from the same basis. The original database user could have done any number of things _since_ sending the backup that might have an effect _comparatively_ upon gauging speed differences. That was my only point.

RM4 is SLOW
#41
Posted 18 February 2010 - 09:43 PM
---
--- "GENEALOGY, n. An account of one's descent from an ancestor who did not particularly care to trace his own." - Ambrose Bierce
--- "The trouble ain't what people don't know, it's what they know that ain't so." - Josh Billings
---Ô¿Ô---
K e V i N
#42
Posted 19 February 2010 - 05:47 AM
A robust database should maintain its index files unless some manual deletion of data has taken place, and the indexes should be rebuilt after any data import. So although it would be nice to have the button back, I believe there are other database issues that need to be resolved. Either way, users do not want to be exporting and importing gedcoms to resolve some underlying programming problem.
Keeping ones customers and their important views at a distance is never a good approach
User of Family Historian 7.0, Rootsmagic 7.6.3
Excel to Gedcom conversion - simple getting started tutorials here
Root
#43
Posted 19 February 2010 - 11:01 AM
File size: 182MB -> 182MB (from WinXP File properties, Explorer reports 187,198KB -> 186,618KB).
- so sqlite 'packing' made little difference, but then Jim says he does not delete persons that are duplicates, rather he edits the surname to preface it with '1', and unlinks from spouses and parents.
Speed: nothing obvious. Merge > Duplicate Search is very slow on both. The 1st stage, Sorting Names, seems to go reasonably fast (~10s) but 2nd stage, Collecting Names, is a slog. My guess is >45 minutes! My own query that finds and counts duplicate Surname+Givens takes ~3min but is much simpler. Duplicate Search created a file in the same folder as the .rmgc file with the same name and the extension .DUP. Its ~100MB size echoes the high number of duplicates my query found (see below).
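The post does not show the duplicate-name query itself, so the following is only a minimal sketch of how such a count could be written against an SQLite database. NameTable and IsPrimary are named in this thread; the Surname, Given and OwnerID column names, and the sample rows, are assumptions made for illustration.

```python
import sqlite3


def count_duplicate_names(conn):
    """Return (surname, given, n) for each primary name shared by more than one record."""
    return conn.execute(
        """
        SELECT Surname, Given, COUNT(*) AS n
        FROM NameTable
        WHERE IsPrimary = 1
        GROUP BY Surname COLLATE NOCASE, Given COLLATE NOCASE
        HAVING COUNT(*) > 1
        ORDER BY n DESC, Surname, Given
        """
    ).fetchall()


# Tiny in-memory stand-in for an RM4 database (column names are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NameTable (OwnerID INTEGER, Surname TEXT, Given TEXT, IsPrimary INTEGER)"
)
conn.executemany(
    "INSERT INTO NameTable VALUES (?, ?, ?, ?)",
    [
        (1, "Smith", "John", 1),
        (2, "Smith", "John", 1),
        (3, "Smith", "John", 1),
        (4, "Doe", "Jane", 1),
        (5, "Doe", "Jane", 1),
        (6, "Jones", "Mary", 1),
        (1, "Smyth", "John", 0),  # alternate name, ignored by the query
    ],
)

dupes = count_duplicate_names(conn)
for surname, given, n in dupes:
    print(surname, given, n)
```

A single GROUP BY pass like this only counts names; it does not enumerate or score every possible pairing, which may be part of why it finishes so much sooner than Duplicate Search.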
Jim ran my version of his database on his computer and was surprised that it seemed faster than his original but was clearly experiencing slower response to keyboard shortcuts than I was, apparently some difference in computer performance. However, after re-opening his original and aborting out of the Merge > Duplicate Search after a few minutes with the progress bar at 5%, on the next re-opening his database was 'faster than ever!'. So there is inconsistent behaviour and some unaccountable change occurring even without adding to the database.
I also played with the cache size and Max_Page_Count (supposedly reserves space) using the SQLite manager to no obvious benefit either for its own queries or for RootsMagic; these properties are probably not propagated from one connection to the other.
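A small experiment can check the guess that these settings do not carry over. As far as the SQLite documentation goes, `PRAGMA cache_size` applies only to the connection that issues it, and `PRAGMA max_page_count` sets an upper limit on the file size for that connection rather than reserving space. The sketch below uses Python's `sqlite3` module and a throwaway file; the file name and the chosen values are made up.

```python
import os
import sqlite3
import tempfile

# Throwaway file standing in for a RootsMagic database (the name is made up).
path = os.path.join(tempfile.mkdtemp(), "demo.rmgc")

# Connection 1 plays the SQLite manager doing the tuning.
tuner = sqlite3.connect(path)
tuner.execute("CREATE TABLE t (x INTEGER)")
tuner.commit()
default_cache = tuner.execute("PRAGMA cache_size").fetchone()[0]
tuner.execute("PRAGMA cache_size = 50000")
tuner.execute("PRAGMA max_page_count = 123456")

# Connection 2 plays RootsMagic opening the same file.
other = sqlite3.connect(path)
tuned_cache = tuner.execute("PRAGMA cache_size").fetchone()[0]
other_cache = other.execute("PRAGMA cache_size").fetchone()[0]
other_max_pages = other.execute("PRAGMA max_page_count").fetchone()[0]

print("tuned connection cache_size:", tuned_cache)
print("other connection cache_size:", other_cache)
print("other connection max_page_count:", other_max_pages)

tuner.close()
other.close()
```

If the second connection still reports the defaults, tuning from an external manager cannot be expected to change how RootsMagic itself behaves.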
If there is any conclusion, so far, it would be that the reindexing and packing features of RM3 and FO may not be so important to RM4; that slow performance for larger databases may well be a design or database engine issue but inconsistent behaviour suggests that there needs to be better control over something yet unidentified.
Below is my detailed properties report:
Value | Variable | Remark
155501 | People | all records in PersonTable
0 | - Nameless People | no record in NameTable for that RIN
53906 | - Unresolved Duplicate Names | duplicate Given and Surnames, not flagged as "Not a Problem"
0 | - Resolved* Duplicate Names | flagged as "Not a Problem" - flags lost on transfer
61 | - Unresolved Duplicates with Media Links | links lost on merge
72379 | Families | all records in FamilyTable
148662 | Events | all records of EventTable
0 | - Orphaned Events | events for which no person or family match in respective tables
187 | Alternate names | all records in NameTable where IsPrimary=0
0 | - Orphaned Alternate names* | no Primary name record found
5696 | Total Places | all records in PlaceTable incl Places and Place Details (Sites)
139 | - System Places | system supplied Places: LDS Temples
5554 | - User Places | user defined Places excl Sites
1 | -- Unused User Places* | not used by EventTable, will be dropped in a transfer
3 | -- User Place Details | user defined Sites
0 | --- Used, having Place Detail Notes* | Site Notes will be lost in a transfer
2 | --- Unused Place Details* | Sites will be lost in a transfer
1641 | Total Sources | all records from SourceTable
22 | - Unused Sources* | SourceTable records unused by CitationTable
248341 | Total Citations | all records from CitationTable
108 | - Duplicate Citations | identical in most respects, cluttering reports
0 | - Sourceless Citations* | no SourceTable record for this CitationTable record
0 | - Headless Citations* | CitationTable records for which no Person, Event, Family, AltName found; cleaned on transfer
21 | Repositories | all records from AddressTable of type Repository
24 | To-do tasks | all records from ResearchTable
1650 | Multimedia items | all records from MultimediaTable
31 | - lacking thumbnail | probably an imported reference to an image file that has yet to be found
1663 | Multimedia links | all records from MediaLinkTable
15 | - with Date & Description* | if a record has both, the Description is lost in a transfer
320 | Addresses | all records from AddressTable of type Address
0 | - blank names | Name field of AddressTable record is blank
1 | Correspondence | all records from ResearchTable of type Correspondence
* NOT TRANSFERABLE via GEDCOM or Drag&Drop to another RM database
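A few rows of a report like this can be reproduced with simple counting queries. The sketch below is not the script that produced the figures above. It only shows the general shape, using table names from the report and assuming key columns called PersonID, OwnerID, SourceID and CitationID.

```python
import sqlite3

# Label -> counting query. Table names come from the report; key columns are assumed.
REPORT_QUERIES = {
    "People": "SELECT COUNT(*) FROM PersonTable",
    "Alternate names": "SELECT COUNT(*) FROM NameTable WHERE IsPrimary = 0",
    "Unused Sources": """
        SELECT COUNT(*) FROM SourceTable AS s
        WHERE NOT EXISTS (SELECT 1 FROM CitationTable AS c WHERE c.SourceID = s.SourceID)
    """,
    "Sourceless Citations": """
        SELECT COUNT(*) FROM CitationTable AS c
        WHERE NOT EXISTS (SELECT 1 FROM SourceTable AS s WHERE s.SourceID = c.SourceID)
    """,
}


def properties_report(conn):
    """Run every counting query and return {label: count}."""
    return {label: conn.execute(sql).fetchone()[0] for label, sql in REPORT_QUERIES.items()}


# In-memory sample data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PersonTable (PersonID INTEGER PRIMARY KEY);
    CREATE TABLE NameTable (OwnerID INTEGER, IsPrimary INTEGER);
    CREATE TABLE SourceTable (SourceID INTEGER PRIMARY KEY);
    CREATE TABLE CitationTable (CitationID INTEGER PRIMARY KEY, SourceID INTEGER);
    INSERT INTO PersonTable VALUES (1), (2), (3);
    INSERT INTO NameTable VALUES (1, 1), (2, 1), (3, 1), (1, 0);
    INSERT INTO SourceTable VALUES (10), (11);
    INSERT INTO CitationTable VALUES (100, 10), (101, 10), (102, 99);
    """
)

report = properties_report(conn)
for label, value in report.items():
    print(value, label)
```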
Tom user of RM7630 FTM2017 Ancestry.ca FamilySearch.org FindMyPast.com wiki, exploiting the database in special ways >>>
app, a bundle of RootsMagic utilities.
#44
Posted 19 February 2010 - 11:11 AM
Does your eradicate other problems like duplicate Place Details, Media thumbnails, Birth date not appearing in index sidebar etc? - I would expect it should.
#45
Posted 20 February 2010 - 10:39 AM
While my query of Jim's database reported almost 54000 duplicate names, RM4's DUP file, produced by Tools > Merge > Duplicate Search, lists almost 2,000,000 possible duplicate pairings, which is why the file is so large (124MB) and the search so slow. It takes my computer some 20-25 seconds to go through the Sorting Names stage (Jim said his took on the order of a minute) and I think mine took around 45 minutes to complete the Collecting Duplicates stage (Jim closed out due to lack of time).
The DupTable contains only the fields ID1 and ID2 (the pairings of RINs of 'duplicate' persons), Merged (a flag that triggers the Merged indicator in the Merge Duplicate Records dialog), and Score, along with three indexes on the three possible combos of ID1 and ID2.
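For readers who want to picture that layout, here is a hedged sketch of a table with the same four fields. The column types, the index names and the reading of "three possible combos" as (ID1), (ID2) and (ID1, ID2) are assumptions; the actual definitions inside the .DUP file are not shown in this thread.

```python
import sqlite3

dup = sqlite3.connect(":memory:")  # stands in for the .DUP file
dup.executescript(
    """
    CREATE TABLE DupTable (
        ID1    INTEGER,            -- RIN of the first person in the pair
        ID2    INTEGER,            -- RIN of the second person in the pair
        Merged INTEGER DEFAULT 0,  -- flag behind the Merged indicator
        Score  INTEGER             -- match score for the pair
    );
    -- Index names are invented for this sketch.
    CREATE INDEX idxDupID1    ON DupTable (ID1);
    CREATE INDEX idxDupID2    ON DupTable (ID2);
    CREATE INDEX idxDupID1ID2 ON DupTable (ID1, ID2);
    """
)
dup.executemany(
    "INSERT INTO DupTable (ID1, ID2, Score) VALUES (?, ?, ?)",
    [(1, 2, 95), (1, 3, 60), (2, 3, 40)],
)

# PRAGMA index_list returns one row per index; the name is the second column.
index_names = [row[1] for row in dup.execute("PRAGMA index_list('DupTable')")]
pairs_for_person_1 = dup.execute(
    "SELECT ID2, Score FROM DupTable WHERE ID1 = ? ORDER BY Score DESC", (1,)
).fetchall()

print(sorted(index_names))
print(pairs_for_person_1)
```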
Merging a pair from this list took several seconds. Marking as Not a Problem was faster at approximately 1 second but inconsistent - disk access occurred with each mark. The lack of any on-screen indicator that a pair had been so marked meant that I lost my place if I went from repeatedly doing Alt-N to mark down through the list and then used the mouse to jump back to check. From that point forward, I had no idea if I was unmarking pairs that had been previously marked or jumped forward too far and had missed marking some pairs. Navigation through the list was very sporadic and unpredictable similar to that described below but with delays of perhaps 2s.
Tools > Merge > View 'Not Duplicates' List navigation is very clunky. Paging through it sometimes takes 4 seconds to fetch the next page of pairs for the first time but may be immediate if retracing past steps. Similar jerky effect on the scroll bar and down arrow going forward the first time, smooth going back and forward the second.
The DUPs database is not re-usable from within RM4. The next time you run Tools > Merge > Duplicate Search Merge, it is emptied and re-filled. If you cancel the DSM during the Sorting Names stage, RM4 reports a 'SQLite Error 1 - SQL logic error or missing database'. After sending RootsMagic the error report, the progress bar and Cancel button are frozen but the window can be closed. An empty DUP file and associated SQLite journal file are left behind. Re-starting Duplicate Search Merge results in another error: 'Cannot create file D:\.......'. Unfortunately, the window is not wide enough to report what the filename is but we can guess it's the DUP file. Try to delete and Windows reports it cannot because the file is in use by another program, so close the RM database. Still in use, so close RM4. Now it's deletable.
After deleting RMdatabasename.DUP, re-open RM4 and run Duplicate Search Merge. It runs without error. I Cancel after completion of the 'Sorting Names' stage and part way into 'Collecting Duplicates'. The DUP datafile is back to the same size as the one that had been allowed to get to full completion but the Scoring looks very different. My guess is that the two stages ought to be renamed 'Sorting and Collecting Duplicates' followed by 'Scoring Duplicates'. The good news is that the pairs and all their info are visible in the Merge Duplicate Records dialog and I think it would be safe to proceed to use while ignoring the scoring, e.g., to mark people as 'Not a Problem' (although getting through 2million pairs by hand seems a daunting, RSI prone task). It would be desirable to have the option of marking all pairs with a score below some value as 'Not a Problem' so that this utility might be more useful going forward. However, that would probably result in 2million records being added to the ExclusionTable inside the RM database with other potential side effects.
Sorry if this is too much detail for most - it's the beginning of a report to Support...
#46
Posted 20 February 2010 - 10:40 AM
"Good work Tom, very interesting although a little disappointing that there may not be a simple fix to the speed issues. Does your eradicate other problems like duplicate Place Details, Media thumbnails, Birth date not appearing in index sidebar etc? - I would expect it should."
I don't know. Can you provide me with a database with these problems? PM for exchange of email addresses.
#47
Posted 20 February 2010 - 05:35 PM
I think I erred above in saying that the DUP file is at full size at the transition between Sorting Names and Collecting Duplicates. When I tried it again, being a bit quicker in hitting cancel, it was only partially filled. So maybe the two stages are more accurately described as "Sorting Names" and "Collecting and Scoring Duplicates", or maybe there are three stages, "Scoring" being the last of the three.
Using SQLite, I transferred all the duplicate pairings from the DUP file to JimB's RM database ExclusionTable, flagged as "Not a Problem". Now that table has nearly 2million records and the RM database file grew from 182MB to 253MB. Duplicate Search Merge was no faster at getting to the Merge screen but there were no more potential pairings to merge. The View 'Not Duplicates' List was slow to display as before (~4s) and navigation through it not obviously worse.
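The transfer itself is not shown, so what follows is only a sketch of how pairings could be copied from a DUP file into ExclusionTable, here limited to pairs under a score threshold as suggested earlier in the thread. The ExclusionTable columns (RecID, ExclusionType, ID1, ID2) and the value 1 for "Not a Problem" are assumptions, and both databases are in-memory stand-ins. Anything like this should only ever be tried on a copy of a real database.

```python
import sqlite3

rm = sqlite3.connect(":memory:")                   # stands in for the .rmgc file
rm.execute("ATTACH DATABASE ':memory:' AS dupdb")  # stands in for the .DUP file
rm.executescript(
    """
    CREATE TABLE main.ExclusionTable (
        RecID INTEGER PRIMARY KEY,
        ExclusionType INTEGER,   -- assumed: 1 means "Not a Problem"
        ID1 INTEGER,
        ID2 INTEGER
    );
    CREATE TABLE dupdb.DupTable (ID1 INTEGER, ID2 INTEGER, Merged INTEGER, Score INTEGER);
    INSERT INTO dupdb.DupTable VALUES (1, 2, 0, 95), (1, 3, 0, 60), (2, 3, 0, 40), (4, 5, 0, 10);
    INSERT INTO main.ExclusionTable (ExclusionType, ID1, ID2) VALUES (1, 4, 5);
    """
)


def exclude_low_scores(conn, max_score):
    """Copy unmerged pairs scoring below max_score into ExclusionTable, skipping existing pairs."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            """
            INSERT INTO main.ExclusionTable (ExclusionType, ID1, ID2)
            SELECT 1, d.ID1, d.ID2
            FROM dupdb.DupTable AS d
            WHERE d.Score < ?
              AND d.Merged = 0
              AND NOT EXISTS (
                  SELECT 1 FROM main.ExclusionTable AS e
                  WHERE e.ID1 = d.ID1 AND e.ID2 = d.ID2
              )
            """,
            (max_score,),
        )
    return cur.rowcount


added = exclude_low_scores(rm, 70)
excluded = rm.execute("SELECT ID1, ID2 FROM ExclusionTable ORDER BY ID1, ID2").fetchall()
print(added, excluded)
```

Dropping the score condition would copy every pairing, which corresponds to the all-pairs transfer described above and the file growth that came with it.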
#48
Posted 22 February 2010 - 08:32 AM
*Edited - speed slowed when I corrected logic that was skipping a majority of the possible pairings, but still runs 10x faster while returning the same set of pairs.
Edited by TomH, 22 February 2010 - 10:43 PM.
#49
Posted 03 March 2010 - 08:33 PM
"I'm certain that the DSM query design is flawed and that it should execute up to 10x* faster on a large dataset."
Finally got to posting Ticket Number 0BB-1321D7DC-D670 re this issue.
#50
Posted 03 March 2010 - 08:37 PM
Good man Tom, I think my "check for updates" button is broken....

#51
Posted 22 May 2010 - 01:56 PM
Before, on the previous version of RM4, it ran fast and smoothly for the first 100 merges, then it went back to the slow speed again. So it wouldn't be easy to tell just from a fresh import and a couple of merges; you need to run it for a good while.
If you want to try out a large gedcom in it to see if the problem is replicated, you can try the one containing the royal lines for 1,000's of years (I have already imported this one too) at the gedcom database: http://www.genealogy...om/gedr2090.htm
My backup file for either RM3/4 is 250MB.
I also find that even after merging in RM3, there are still some people in the database that show duplicate parents in the manual merge window, i.e. one individual showing two wives with the same names and exactly the same ID no., and both women have the same children. Even when exported to RM4, I think these occasional duplicates still showed up.
isis
#52
Posted 23 May 2010 - 04:03 PM
I have some concerns about merge and there are some problems with ShareMerge creating additional citations etc.
I have been bringing together many small databases in expectation of Dynamic Groups being introduced in the not too distant future and have been surprised with the number of duplicate persons after running the automerges. I exported a sample to look at the _UID of each person expecting RM to be to blame but the _UID's were all different. Thinking about my past practices I then realised that when I dragged and dropped John Doe, selected "Everyone in the same Tree", and confirmed that John Doe was the same person I was dropping onto, then John Doe maintained the same _UID but everyone else in his tree got a new _UID. That is currently my suspicion and I believe it will prove to be true.
When you stated "exactly the same ID", were you referring to the _UID in the gedcom file or the Ref No? If you are not sure, export the two individuals in question and open the gedcom file in Notepad as a text file. At the start of each person you will see a line like "1 _UID 94701BD8919CF543A5A1DFCE4000686E65BC"; that is the Unique Person ID, and if they are both the same I believe RM will merge regardless of other facts.
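Instead of reading the export by eye in Notepad, the same check can be scripted. The sketch below scans GEDCOM text for level-1 `_UID` lines and lists any value carried by more than one individual. The sample records are invented for illustration: the first _UID is the example quoted above and the second is a placeholder.

```python
from collections import defaultdict


def uids_by_person(gedcom_lines):
    """Map each _UID value to the list of INDI record ids that carry it."""
    owners = defaultdict(list)
    current = None
    for raw in gedcom_lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = parts[0], parts[1]
        if level == "0":
            # Individuals look like "0 @I1@ INDI"; any other level-0 line ends the record.
            current = tag if len(parts) == 3 and parts[2] == "INDI" else None
        elif level == "1" and tag == "_UID" and current and len(parts) == 3:
            owners[parts[2]].append(current)
    return owners


def shared_uids(gedcom_lines):
    """Return only the _UID values that appear on more than one individual."""
    return {uid: ids for uid, ids in uids_by_person(gedcom_lines).items() if len(ids) > 1}


sample = """0 HEAD
0 @I1@ INDI
1 NAME John /Doe/
1 _UID 94701BD8919CF543A5A1DFCE4000686E65BC
0 @I2@ INDI
1 NAME John /Doe/
1 _UID 94701BD8919CF543A5A1DFCE4000686E65BC
0 @I3@ INDI
1 NAME Jane /Doe/
1 _UID 0000000000000000000000000000000000AB
0 TRLR
""".splitlines()

shared = shared_uids(sample)
print(shared)
```

For a real export, passing an open file object instead of `sample` should work the same way, since the function only iterates over lines.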
#53
Posted 23 May 2010 - 06:13 PM
Yes, I mean the UID number that RootsMagic gives every person in the database, not any user-generated reference: the number that shows next to the person's name in manual duplicate merge. If two people have the same RM number you can't merge them into one, because it's the same person.
Also, backups take just under a minute in RM4 even with 1,700,000 people, but it takes about 30 mins to 1 hr to back up the same size gedcom in RM3. It takes a few hours to import the gedcom into either RM3 or RM4.
It's just the manual merge, where you merge two people, which is excruciatingly slow, i.e. 30 mins per person, when you click to go back to the main list of people. I haven't tried the automatic duplicate merge search as, going by the manual merge, it will take hours and/or it will crash, and I don't want to corrupt this large gedcom.
Try importing these gedcoms into a database, then try manually merging them in RM4; that will give some idea of our problem:
http://www.genealogy...om/gedr6217.htm
http://www.genealogy...om/gedr2090.htm
http://www.genealogy...om/gedr6472.htm
http://www.genealogy...om/gedr6001.htm
http://www.genealogy...om/gedr6472.htm
http://www.genealogy...om/gedr6463.htm
This will make a comparable large gedcom for testing
#54
Posted 23 May 2010 - 07:06 PM
I was not dismissing the slowness you describe and I am well aware of it on manual merge, as I am currently merging within a large database, although not as large as 1,700,000 yet! I can only hope that something is being worked on on this front and that we as users do not just fall into accepting the current speed of operation.
Two persons with the same _UID will automatically merge when you use ShareMerge; that is what ShareMerge was intended to do, so I would suggest you try it.
I have been working my database within the gedcom using my own scripts and most of the scripts I have run complete in less than 10 seconds. My hope is that as Bruce & Team mature with SQL they extract much more speed of operation for RM.
#55
Posted 17 June 2010 - 09:42 AM
Not the case, today this feature is as painfully slow as it ever was although I am now worried about what type of merging RM was doing so quickly last night and totally confused as to the difference in performance. I tried 4 different computers today with different builds and operating systems and none performed at a speed you would expect bearing in mind the processing involved.
You can read the results and times here and I can only hope someday soon the inherent slow processing operations within RM4 can be finally overcome.

http://www.vyger.co....cate-merge.html
#56
Posted 17 June 2010 - 12:35 PM
What seems to be happening in your case, Jackson, is that the search is re-done after a merge. (I'm assuming that it took about three minutes of searching and processing to arrive at the first screen of suspected duplicates.)
#57
Posted 20 June 2010 - 06:00 AM


Tom, I'm not sure what is going on in the background; only those at RM know what processing is supposed to happen. Without being able to step into or monitor the process it will remain a mystery to me.
The specific file I refer to is one where I have merged most gedcoms into one file and there are a lot of duplication issues that will not help processing times. Renee has indicated that those duplication issues in Sources and Citations and possibly Notes are being looked at so we will watch this space.
Duplication notwithstanding, RM4, at its best, still runs this process much slower than RM3, and the time delay cannot possibly be warranted by the processing involved. IMO
#58
Posted 20 June 2010 - 07:26 AM
This is weird, there was a rather long winded post here which deserved a considered reply, but I decided to have breakfast first and then respond; now it's not there. Was I dreaming or did someone else see it?
I know it was on this thread, or is there a big brother thing going on?
I saw the post by John James and saved it. BB did remove it, it seems.

#59
Posted 20 June 2010 - 08:25 AM
Hmmmm, I'm confused, I don't know what is real anymore, did anyone watch The Prisoner?
There was no bad language in it, so I can't figure out why it was removed.

I was about to reply to the post by John James, but since it was not a quicky I decided to have breakfast first, when I returned it was too late.

#60
Posted 21 June 2010 - 12:48 PM
Thanks, Laura