Merge Mysteries looking at db Properties


15 replies to this topic

#1 TomH

Advanced Member (Members), 3674 posts

Posted 12 February 2010 - 11:09 PM

Given some recent discussion about RM4 slowing after a lot of merging, I ran a little test and logged the database properties at each stage:
  • Original - a new database, drag'n'dropped a Person, his descendants and their spouses from another database, 240 persons in all.
  • Duplicated - repeated the drag'n'drop of the same set of persons so everything should be duplicated.
  • Fully Merged - ran the Automatic Merges tool with all boxes checked, as it comes up by default, i.e., SmartMerge, ShareMerge, SourceMerge and RepositoryMerge; the properties should return to that of #1.
The properties report below shows a doubling of most properties at step 2, which is to be expected. Some exceptions are explainable or questionable:
  • Unresolved duplicate names from 5 to 245: OK because 240 duplicates were added.
  • Unresolved Duplicates with Media Links from 0 to 5: probably OK; the quantity is merely coincidental with #1, and they are not the same duplicates.
  • Places: a little puzzling, all the properties stay constant, indicating that the duplicate Places and Place Details were dropped on import or automatically merged, except the one property "Used, having Place Detail Notes", which doubled. I will have to look into the query that produces this last result; it may be misleading.
  • Multimedia items - constant: indicates that the duplicate item was dropped on import or automatically merged. The high count of missing thumbnails merely indicates that all or most of the items have yet to be viewed in this database as the thumbnail data is dropped in the transfer, the thumbnail being regenerated when first called upon by the program.
  • Addresses - doubled: if Places don't double, then why should addresses if identical?
After step 3, the properties should be back to the same as 1 but are not, in some cases, indicating some fault with the merging process:
  • Families: 1 extra
  • Total sources: 4 extra
  • Total Citations: 244 extra
  • Multimedia links: 2 extra
  • Addresses: double

Original  Duplicated  Fully Merged  Variable                                  Remark
     240         480           240  People                                    all records in PersonTable
       0           0             0  - Nameless People                         no record in NameTable for that RIN
       5         245             5  - Unresolved Duplicate Names              duplicate Given and Surnames, not flagged as "Not a Problem"
       0           0             0  - Resolved* Duplicate Names               flagged as "Not a Problem" - flags lost on transfer
       0           5             0  - Unresolved Duplicates with Media Links  links lost on merge
      86         172            87  Families                                  all records in FamilyTable
     653        1306           653  Events                                    all records of EventTable
       0           0             0  - Orphaned Events                         events for which no person or family match in respective tables
       0           0             0  Alternate names                           all records in NameTable where IsPrimary=0
       0           0             0  - Orphaned Alternate names*               no Primary name record found
     297         297           297  Total Places                              all records in PlaceTable incl Places and Place Details (Sites)
     139         139           139  - System Places                           system supplied Places: LDS Temples
     156         156           156  - User Places                             user defined Places excl Sites
       0           0             0  -- Unused User Places*                    not used by EventTable, will be dropped in a transfer
       2           2             2  -- User Place Details                     user defined Sites
       1           2             1  --- Used, having Place Detail Notes*      Site Notes will be lost in a transfer
       0           0             0  --- Unused Place Details*                 Sites will be lost in a transfer
      10          20            14  Total Sources                             all records from SourceTable
       0           0             0  - Unused Sources                          SourceTable records not used in CitationTable
     252         504           496  Total Citations                           all records from CitationTable
       0           0             0  - Sourceless Citations*                   no SourceTable record for this CitationTable record
       0           0             0  - Headless Citations*                     CitationTable records for which no Person, Event, Family, AltName found; cleaned on transfer
       1           2             1  Repositories                              all records from AddressTable of type Repository
       2           4             2  To-do tasks                               all records from ResearchTable
      17          17            17  Multimedia items                          all records from MultimediaTable
      16          16            16  - lacking thumbnail                       probably an imported reference to an image file that has yet to be found
      17          34            19  Multimedia links                          all records from MediaLinkTable
       0           0             0  - with Date & Description*                if a record has both, the Description is lost in a transfer
      11          22            22  Addresses                                 all records from AddressTable of type Address
       0           0             0  - blank names                             Name field of AddressTable record is blank
       0           0             0  Correspondence                            all records from ResearchTable of type Correspondence

* NOT TRANSFERABLE via GEDCOM or Drag&Drop to another RM database
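
The residue left after step 3 can be read straight off the Original and Fully Merged columns of the report above. As a minimal Python sketch (the dictionary holds only the report rows whose counts changed at some stage, and the helper name `merge_residue` is made up for this illustration), the leftover records are simply merged minus original:

```python
# (original, duplicated, fully merged) counts copied from the properties report.
REPORT = {
    "People": (240, 480, 240),
    "Families": (86, 172, 87),
    "Events": (653, 1306, 653),
    "Total Sources": (10, 20, 14),
    "Total Citations": (252, 504, 496),
    "Repositories": (1, 2, 1),
    "To-do tasks": (2, 4, 2),
    "Multimedia links": (17, 34, 19),
    "Addresses": (11, 22, 22),
}


def merge_residue(report):
    """Return {property: extra records} for rows that did not return to the original count."""
    return {
        name: merged - original
        for name, (original, _duplicated, merged) in report.items()
        if merged != original
    }


if __name__ == "__main__":
    for name, extra in merge_residue(REPORT).items():
        print(f"{name}: {extra} extra")
```

Rows that merged cleanly (People, Events, Repositories, To-do tasks) drop out, leaving only the five properties listed as faults.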

Tom user of RM6314 FTM2014 Ancestry.ca FamilySearch.org FindMyPast.com
SQLite_Tools_For_Roots_Magic_in_PR_Celtiwiki, exploiting the database in special ways >>> RMtrix app, a growing bundle of RootsMagic utilities.


#2 TomH

Posted 13 February 2010 - 12:58 AM

I think I figured out the cause of the unmerged Sources and Citations. RootsMagic fails to merge identical Sources and Citations if any of the Source Text, Source Comment, Citation Text and Citation Comment fields are non-blank. At least, that would appear to be the case for this exercise: there are only 8 unduplicated citations in the merged set and all of the above fields are blank. The other 488 all are non-blank in at least one of those fields - 244 unique citations duplicated to duplicate sources. Support ticket coming up!
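
One way to check this hypothesis against a database is to split the citations into those whose four text fields are all blank and those with at least one non-blank field. The sketch below uses Python's sqlite3 module on a tiny in-memory stand-in. The table and column names (`Source`, `Citation`, `SourceText`, `SourceComment`, `CitText`, `CitComment`) are simplified assumptions for illustration, not the actual RootsMagic schema, and the rows are invented.

```python
import sqlite3

# Hypothetical, simplified schema; the real database layout may differ.
SCHEMA = """
CREATE TABLE Source (SourceID INTEGER PRIMARY KEY, SourceText TEXT, SourceComment TEXT);
CREATE TABLE Citation (CitationID INTEGER PRIMARY KEY, SourceID INTEGER,
                       CitText TEXT, CitComment TEXT);
"""


def count_by_blankness(conn):
    """Return (all_blank, some_non_blank) citation counts."""
    row = conn.execute(
        """
        SELECT SUM(CASE WHEN blank THEN 1 ELSE 0 END),
               SUM(CASE WHEN blank THEN 0 ELSE 1 END)
        FROM (
            SELECT COALESCE(TRIM(s.SourceText), '') = ''
               AND COALESCE(TRIM(s.SourceComment), '') = ''
               AND COALESCE(TRIM(c.CitText), '') = ''
               AND COALESCE(TRIM(c.CitComment), '') = '' AS blank
            FROM Citation c
            JOIN Source s ON s.SourceID = c.SourceID
        )
        """
    ).fetchone()
    return row[0] or 0, row[1] or 0


def make_demo():
    """Build an in-memory database with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Source VALUES (?, ?, ?)", [
        (1, "", None),
        (2, "Transcript of entry", None),
    ])
    conn.executemany("INSERT INTO Citation VALUES (?, ?, ?, ?)", [
        (1, 1, None, None),        # everything blank
        (2, 1, "", "  "),          # whitespace only counts as blank
        (3, 1, None, "see p. 4"),  # non-blank citation comment
        (4, 2, None, None),        # non-blank source text
    ])
    return conn


if __name__ == "__main__":
    print(count_by_blankness(make_demo()))
```

If the hypothesis holds, the surviving duplicates after a merge should all fall in the second group, which is the 8 versus 488 split described above.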

Ticket Number 0E1-13099557-E4C2

Edited by TomH, 13 February 2010 - 10:53 AM.


#3 Nettie

Advanced Member (Members), 1373 posts

Posted 13 February 2010 - 02:50 PM

Thanks Tom for doing the background work. Hopefully they get this fixed. I did not know why things were not working and sometimes the SMARTMERGE does not even merge persons that are identical in name and facts. So something is not right with that process.

I appreciate the extra work you did to figure out what is going on. I hope the programmers really look at this. :)
Genealogy:
"I work on genealogy only on days that end in "Y"." [Grin!!!]
from www.GenealogyDaily.com.
"Documentation....The hardest part of genealogy"
"Genealogy is like Hide & Seek: They Hide & I Seek!"
" Genealogists: People helping people.....that's what it's all about!"
from http://www.rootsweb....nry/gentags.htm
Using FO and RM since FO2.0 = Researching the families of William DeCoursey/cy b. 1756 Baltimore Co. MD found father Leonard DeCause..

#4 TomH

Posted 13 February 2010 - 05:12 PM

Thanks for the encouragement, Nettie. It would be great to see the program doing a thorough job of merging and cleaning up behind.

Delete Duplicate Citations
I've now developed an SQLite procedure which deletes duplicate citations where they are identical in Fact Type, Source Name, Source Ref#, Source Text, Source Comments, Citation (or Source Details) Ref#, Citation Text, Citation Comments. That worked fine on my MergeTest database bringing the number of citations back down to what it was before duplication. It left the extra Sources but then they could be readily picked out in the Source List and deleted from there with accidental deletion of used Sources discouraged by the warning prompt.

That procedure is not rigorous enough for general use because it ignores the Source and Citation Fields content but it worked fine for my test database because they provided no finer differentiation among the citations. Also Source and Details Media were ignored.
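
The general idea of such a clean-up can be sketched as follows. This is not the procedure itself, only a minimal illustration in Python with sqlite3 on an in-memory stand-in: the schema is a simplified assumption (the `OwnerType`/`OwnerID` pair stands in for the fact the citation is attached to, and all column names are hypothetical), and the rows are invented. Within each group of citations that agree on every compared field, it keeps the lowest `CitationID` and deletes the rest. Sources are compared by name and content rather than by ID, so duplicates attached to an unmerged copy of a source are still caught.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE Source (SourceID INTEGER PRIMARY KEY, Name TEXT, RefNumber TEXT,
                     SourceText TEXT, SourceComment TEXT);
CREATE TABLE Citation (CitationID INTEGER PRIMARY KEY, OwnerType INTEGER,
                       OwnerID INTEGER, SourceID INTEGER, RefNumber TEXT,
                       CitText TEXT, CitComment TEXT);
"""

# Keep the lowest CitationID in each group of identical citations.
# LEFT JOIN so that citations without a source row are not swept up by mistake.
DELETE_DUPES = """
DELETE FROM Citation
WHERE CitationID NOT IN (
    SELECT MIN(c.CitationID)
    FROM Citation c
    LEFT JOIN Source s ON s.SourceID = c.SourceID
    GROUP BY c.OwnerType, c.OwnerID,
             s.Name, s.RefNumber, s.SourceText, s.SourceComment,
             c.RefNumber, c.CitText, c.CitComment
)
"""


def delete_duplicate_citations(conn):
    """Delete duplicate citations and return how many rows were removed."""
    cur = conn.execute(DELETE_DUPES)
    conn.commit()
    return cur.rowcount


def make_demo():
    """Build an in-memory database with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Source VALUES (?, ?, ?, ?, ?)", [
        (1, "1851 Census", "C-51", "Household transcript", None),
        (2, "1851 Census", "C-51", "Household transcript", None),  # unmerged copy
    ])
    conn.executemany("INSERT INTO Citation VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 2, 10, 1, "p. 3", "line 14", None),
        (2, 2, 10, 2, "p. 3", "line 14", None),  # duplicate of 1 via the copied source
        (3, 2, 11, 1, "p. 3", "line 14", None),  # different owner: kept
        (4, 2, 10, 1, "p. 4", "line 2", None),   # different detail: kept
    ])
    return conn


if __name__ == "__main__":
    demo = make_demo()
    print(delete_duplicate_citations(demo), "duplicate citation(s) deleted")
```

As with any direct edit of the database, something like this should only ever be tried on a copy. Note that the extra Source row is left in place, matching the behaviour described above.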

Extra Family (bug?)
Looking into the extra Family after the merging, it appears as though a spouse of a descendant of the ancestral line copied over to the database had two links to the same parent in the source database. On copying over the first time, the parent was dropped but the FamilyID persisted in the FamilyTable and the child continued to be shown twice in the ChildTable. On copying over the second time, this duplicate person had two links to the duplicate FamilyID. On merging, the RIN of the child was made the same but the duplicate Family was left alone. So the child belonged to two Families, each doubly linked. In the Edit Person screen, you see 4 rows for Parents (with no names). A procedure to find such duplicate linkages might be helpful.

If this is a bug, it is that the Family/Child linkage was carried over at all, because this was a spouse of a descendant. It wasn't carried over for any other descendant's spouse, so it is triggered by exceptional conditions in the source database.
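
A procedure to find such duplicate linkages could start from a query like the one sketched below, again in Python with sqlite3 on an in-memory stand-in. The `ChildTable` name comes from the discussion above, but its columns here (`RecID`, `ChildID`, `FamilyID`) are assumptions for illustration, and the rows are invented to mimic the case of one child linked twice to each of two families.

```python
import sqlite3


def duplicate_child_links(conn):
    """Return (ChildID, FamilyID, link count) for every child linked more than once to a family."""
    return conn.execute(
        """
        SELECT ChildID, FamilyID, COUNT(*) AS Links
        FROM ChildTable
        GROUP BY ChildID, FamilyID
        HAVING COUNT(*) > 1
        ORDER BY ChildID, FamilyID
        """
    ).fetchall()


def make_demo():
    """Build an in-memory table with invented rows (column names are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ChildTable (RecID INTEGER PRIMARY KEY, ChildID INTEGER, FamilyID INTEGER)"
    )
    conn.executemany("INSERT INTO ChildTable VALUES (?, ?, ?)", [
        (1, 7, 100), (2, 7, 100),  # child 7 linked twice to family 100
        (3, 7, 101), (4, 7, 101),  # and twice to the duplicate family 101
        (5, 8, 100),               # child 8 linked once: not reported
    ])
    return conn


if __name__ == "__main__":
    for child, family, links in duplicate_child_links(make_demo()):
        print(f"child {child} is linked {links} times to family {family}")
```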


#5 TomH

Posted 12 April 2010 - 09:41 PM

I think I figured out the cause of the unmerged Sources and Citations. RootsMagic fails to merge identical Sources and Citations if any of the Source Text, Source Comment, Citation Text and Citation Comment fields are non-blank. At least, that would appear to be the case for this exercise: there are only 8 unduplicated citations in the merged set and all of the above fields are blank. The other 488 all are non-blank in at least one of those fields - 244 unique citations duplicated to duplicate sources. Support ticket coming up!

Ticket Number 0E1-13099557-E4C2


No change in 4.0.8.3.

I find the People View and Sidebar Index don't update after the Auto Share Merge, even though the persons are merged. The People View is also rife with Alternate Name icons beside blank names. Click on one of these or one of the ghost names and Edit Person opens on the last real record viewed. Switching views in either the Main or Sidebar has no effect when you return to People and Index. To refresh the People View, you have to sort a column. To refresh the Index, you have to toggle Options > Show Alternate Names. Not very intuitive. Why doesn't the display update correctly after the merge?


#6 Vyger

Advanced Member (Members), 2374 posts

Posted 16 April 2010 - 06:22 AM

After step 3, the properties should be back to the same as 1 but are not, in some cases, indicating some fault with the merging process:

I have noticed this and although duplication was also created in RM3 this happens to a greater extent in RM4.

1. Export/import a file to clean it of redundant information.
2. Create a fresh database and import the gedcom twice into it.
3. Once Sharemerge and other merges are run you should be back to the properties on the single file, but are not.

This is a problem which needs to be overcome in RM4 and is much more prevalent than in RM3. I don't really think you can market ShareMerge as a feature whilst it has this duplication issue, it's not what users would be expecting.

Software Comparisons - Place Management - How other software packages stack up.
Media Gallery (a critical look) - Written when RM4 was introduced but still applies today.

Relaxation is the key to life and this is where I get some time to relax and catch up on my hobby and research.


#7 Renee Zamora

Advanced Member (Support), 4152 posts

Posted 28 May 2010 - 11:51 AM

These additional findings and observations on this merging issue have been reported.
Renee
RootsMagic

#8 Vyger

Posted 17 March 2014 - 02:44 PM

Delete Duplicate Citations
I've now developed an SQLite procedure which deletes duplicate citations where they are identical in Fact Type, Source Name, Source Ref#, Source Text, Source Comments, Citation (or Source Details) Ref#, Citation Text, Citation Comments. That worked fine on my MergeTest database bringing the number of citations back down to what it was before duplication. It left the extra Sources but then they could be readily picked out in the Source List and deleted from there with accidental deletion of used Sources discouraged by the warning prompt.

That procedure is not rigorous enough for general use because it ignores the Source and Citation Fields content but it worked fine for my test database because they provided no finer differentiation among the citations. Also Source and Details Media were ignored.


TomH, did you ever work further on this duplication removal procedure? Time and RM versions have moved on, and this problem was never resolved by the vendors.

While I was doing some tidy-up work, your excellent RMGC Properties reported 916 Duplicate Citations and 435 Headless Citations in the master database I am building.

I felt my hand reaching out for a sledgehammer but decided to have a cup of tea instead. <_<


#9 TomH

Posted 17 March 2014 - 09:15 PM

Sources - Merge Duplicate Masters was the most recent but I don't think is what you want.
Delete Phantom Citations - Query should clean up the Headless Citations.
All Citations & Dupes Count - Query is an old one but might help you find the duplicates. Possibly it could be built on to merge or delete the duplicates.
Digging into the unpublished queries, I find Citations-Dupes-Delete.sql last revised Dec 2010. I will send it to you to try on a copy of your database.


#10 Vyger

Posted 18 March 2014 - 02:48 AM

Digging into the unpublished queries, I find Citations-Dupes-Delete.sql last revised Dec 2010. I will send it to you to try on a copy of your database.


Cheers TomH. Webtags and recent changes were not an issue; I ran your procedure and it reported 916 records updated. Checks on the individuals I was working on, and who were known to be affected, showed they had all been cleaned of duplicate citations.

That saved a lot of time. It is just a pity that, despite the Rootsmagicians' efforts towards Data Cleaning, RootsMagic V6 still allows this duplication to build up, with no effective built-in utility to clean up after merge operations.


#11 TomH

Posted 18 March 2014 - 08:51 PM

That's good news that it worked okay for your database, Jackson, despite being four years old. For the benefit of anyone else, I have posted it to Citations - Delete Duplicates.

I have not tested merging recently - are you saying that RM 6306 still does not merge duplicate citations or were these a consequence of work you had done in a much earlier version?


#12 TomH

Posted 20 March 2014 - 05:28 AM

A quick test merging a database with a copy of itself proves that, years later, merging still leaves a trail of duplicates and detritus. Database properties shows a doubling or near doubling of several stats. I have not looked into it very much but citations for events are duplicated while citations for persons do not appear to be. I left all the auto merge options checked.


#13 Renee Zamora

Posted 20 March 2014 - 02:44 PM

A quick test merging a database with a copy of itself proves that, years later, merging still leaves a trail of duplicates and detritus. Database properties shows a doubling or near doubling of several stats. I have not looked into it very much but citations for events are duplicated while citations for persons do not appear to be. I left all the auto merge options checked.


I was testing this by creating a new file. I drag n dropped from a small file twice into the new file. Then I used AutoMerge to merge everyone. I found that two sources did not merge. When I did automerge on sources they still didn't merge. They are attached on the person and fact levels so that's not the issue. I can't figure out why these sources are not merging. They appear exactly the same in every way. This has been added to our tracking system for development to look at further.
Renee
RootsMagic

#14 Vyger

Posted 20 March 2014 - 03:01 PM

I thought I had replied to this thread earlier with the same merge problem results that TomH and Renee have highlighted.

Keeping larger databases clean is a growing problem, and identical sources and citations should merge, so let's hope the Rootsmagician can look at Renee's example.


#15 c24m48

Advanced Member (Members), 1211 posts

Posted 20 March 2014 - 05:54 PM

I was testing this by creating a new file. I drag n dropped from a small file twice into the new file. Then I used AutoMerge to merge everyone. I found that two sources did not merge. When I did automerge on sources they still didn't merge. They are attached on the person and fact levels so that's not the issue. I can't figure out why these sources are not merging. They appear exactly the same in every way. This has been added to our tracking system for development to look at further.


The following is a wild theory, and I'm even going to shoot down my own theory.

I've never tested this, but what if what I will call "notes in a source" - Master Source Text, Detail Text, and that sort of thing - have the same bug as general notes, family notes, and event notes where a trailing carriage return line feed is dropped in a GEDCOM export. Then the source or citation would not look the same after being dragged and dropped as it did before being dragged and dropped.

Here is where I shoot down my theory. If you drag and drop the same data twice, and if the same carriage return and line feed in a "note in a source" is dropped both times, then neither dragged and dropped source will match the original, but they should match each other and therefore they should merge.

But what if you created the new database first by copying the old database to the new database (thereby preserving everything) and second by dragging and dropping from the old database to the new database. In this case (and if "notes in a source" have the same GEDCOM bug as regular notes), then it's possible the otherwise identical sources could differ only by the presence or absence of a trailing carriage return linefeed in a "note in a source". And if they differ, then they won't merge.
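
The two scenarios are easy to picture with plain string comparison. The Python sketch below is only an illustration of the theory, assuming the merge does an exact text match; whether RootsMagic actually compares notes this way is not known, and the note text and function names are invented.

```python
def same_exact(a, b):
    """Exact comparison: a trailing CR/LF makes otherwise identical notes differ."""
    return a == b


def same_ignoring_trailing_newlines(a, b):
    """Comparison that ignores trailing carriage returns and line feeds."""
    return a.rstrip("\r\n") == b.rstrip("\r\n")


# Invented example note; "copied" keeps its trailing CR/LF, "dropped" has lost it.
copied = "Transcribed from film.\r\n"
dropped = "Transcribed from film."

if __name__ == "__main__":
    # Two drag-and-drop copies lose the same CR/LF, so they still match each other.
    print(same_exact(dropped, dropped))
    # A straight copy versus a drag-and-drop copy differ only by the CR/LF.
    print(same_exact(copied, dropped))
    # A comparison that strips trailing CR/LF would treat them as equal.
    print(same_ignoring_trailing_newlines(copied, dropped))
```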

Still a shot in the dark!

Jerry


Edited: oops, I didn't reread the whole thread carefully enough. Tom already identified a problem where any non-blank "notes in a source" prevent merging, even if the non-blank "notes in a source" are equal. That very much supersedes my wild theory.

#16 c24m48

Posted 20 March 2014 - 06:05 PM

..... citations for events are duplicated while citations for persons do not appear to be.


Very interesting observation. The failure to merge citations for events is clearly a bug.

However, I think there is really a much deeper problem in that RM's whole view of copying citations feels very wrong to me (see "Adventures in Extreme Splitting" for more details). Which is to say, when you Memorize and Paste a citation, RM effectively creates a "duplicate citation", and does so on purpose.

During a merge, RM only treats citations as duplicates if they are for the same person or (absent the bug for events) for the same event of the same person. That's consistent with the current design, where "duplicate citations" are created on purpose and are not treated as duplicates. But the underlying data model could (and should, in my opinion) be much improved by treating citations much more like Place Details and by not creating what I'm calling "duplicate citations" on purpose.
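
The difference between the two notions of "duplicate" can be shown with a small Python sketch. It is an illustration of the description above, not of RM's actual code: the `Citation` record, its fields, and the sample rows are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Citation(NamedTuple):
    owner: str   # hypothetical label for the person or event being cited
    source: str
    detail: str


def surplus(citations, key):
    """Count the records beyond the first in each group that shares the same key."""
    groups = Counter(key(c) for c in citations)
    return sum(n - 1 for n in groups.values())


CITATIONS = [
    Citation("person 1 / birth", "Parish register", "p. 12"),
    Citation("person 1 / birth", "Parish register", "p. 12"),  # left behind by a merge
    Citation("person 2 / birth", "Parish register", "p. 12"),  # memorized and pasted
]

# Duplicate only if owner and content both match (the merge-time view described above).
by_owner_and_content = surplus(CITATIONS, key=lambda c: c)
# Duplicate whenever the content matches, regardless of owner (the Place Details style view).
by_content_only = surplus(CITATIONS, key=lambda c: (c.source, c.detail))

if __name__ == "__main__":
    print(by_owner_and_content, by_content_only)
```

Under the first key only the merge leftover counts as a duplicate; under the second, the pasted citation does too, which is the kind of sharing a Place Details style model would make explicit.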

Jerry