


Small GEDCOM issue

gedcom export


#1 RobJ


    Advanced Member

  • 56 posts

Posted 04 August 2017 - 04:32 PM

It's been quieter here lately, so I hope Renee doesn't mind this report! My workflow is somewhat different from most: I've been creating RM trees using TreeShare, selecting one or more individuals to export to a GEDCOM, and then importing that GEDCOM into WikiTree. Not surprisingly, there are a number of issues. I finally discovered the RM export option "Extra details (RM specific)" and turned it off, which dropped a LOT of extraneous info from the resulting GEDCOM. But the GEDCOM still includes some RM-specific info that I have to remove manually. Every image, including the image of every source page, carries RM GEDCOM tags (with leading underscores). Here's an example (and there are dozens per person):

 

3 OBJE
4 FILE Z:\Data\RootsMagic\Lynch1_media\31111_4330101-00182.jpg
4 FORM jpg
4 _TYPE PHOTO
4 _SCBK N
4 _PRIM N

(This is attached to a source.) WikiTree's handling of this is partly at fault, and I have to take that up with them. WikiTree drops the file path (a real problem! Lots of info about the image, but no image, not even its file name), and it drops '_PRIM N', but keeps the rest in translated form:

   'FILE Z:\Dat...jpg' becomes 'File   '

   'FORM jpg' becomes 'Format: jpg'

   '_TYPE PHOTO' becomes 'PHOTO     '

   '_SCBK N' becomes 'Scrapbook: N '
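Since WikiTree discards the FILE path, it may be worth saving a list of the image file names before importing. Below is a minimal sketch, not a RootsMagic or WikiTree feature. It assumes the export is plain line-based GEDCOM text (each line being `level [@xref@] TAG [value]`) in UTF-8, and `media_files` and `LINE_RE` are hypothetical names. It pairs each FILE value found directly under an OBJE with the level-0 record (for example an individual) that contains it:

```python
import re
import sys

# GEDCOM line: level, optional @xref@, tag, optional value (assumed 5.5.1-style text).
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def media_files(lines):
    """Return (record_xref, file_path) for every FILE line directly under an OBJE.

    Hypothetical helper; CONC/CONT continuation lines are not handled.
    """
    results = []
    record = None      # xref of the current level-0 record, e.g. "@I1@"
    obje_level = None  # level of the OBJE we are currently inside, if any
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank or malformed lines
        level, xref, tag = int(m.group(1)), m.group(2), m.group(3)
        value = m.group(4) or ""
        if level == 0:
            record = xref  # None for records without an xref, e.g. HEAD or TRLR
        if obje_level is not None and level <= obje_level:
            obje_level = None  # we have left the OBJE subtree
        if tag == "OBJE":
            obje_level = level
        elif tag == "FILE" and obje_level is not None and level == obje_level + 1:
            results.append((record, value))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed): python media_files.py export.ged  (UTF-8 export assumed)
    with open(sys.argv[1], encoding="utf-8-sig") as f:
        for record, path in media_files(f.read().splitlines()):
            print(record, path, sep="\t")
```

The tab-separated output could be kept alongside the WikiTree profiles as a cross-reference, so the images can be matched up again later.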

 

My fix request: please treat _TYPE and _SCBK as RM-specific, and don't include them when the option is off. An Ancestry.com GEDCOM does not include them and is relatively clean (at least for an Ancestry.com export!).
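Until the export changes, the manual cleanup could be scripted. Below is a minimal sketch, not an official RootsMagic feature. It assumes the export is line-based GEDCOM text in UTF-8 and that dropping every _TYPE, _SCBK, and _PRIM line (plus anything nested beneath it), wherever it appears, is acceptable. `strip_tags` and `RM_TAGS` are hypothetical names:

```python
import re
import sys

# Tags assumed to be RM-specific in this export; a hypothetical list, adjust as needed.
RM_TAGS = {"_TYPE", "_SCBK", "_PRIM"}

# GEDCOM line: level, optional @xref@, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")


def strip_tags(lines, tags=RM_TAGS):
    """Drop every line whose tag is in `tags`, plus any lines nested below it."""
    out = []
    skip_level = None  # level of the tag currently being dropped, if any
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            # Blank or malformed line: keep it unless inside a dropped subtree.
            if skip_level is None:
                out.append(line)
            continue
        level, tag = int(m.group(1)), m.group(3)
        if skip_level is not None and level > skip_level:
            continue  # subordinate of a dropped tag
        skip_level = None
        if tag in tags:
            skip_level = level
            continue
        out.append(line)
    return out


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (assumed): python strip_rm_tags.py export.ged cleaned.ged
    with open(sys.argv[1], encoding="utf-8-sig") as src:
        cleaned = strip_tags(src.read().splitlines())
    with open(sys.argv[2], "w", encoding="utf-8") as dst:
        dst.write("\n".join(cleaned) + "\n")
```

Applied to the OBJE example above, this keeps the OBJE, FILE, and FORM lines and drops the three underscore lines. Dropping whole subtrees (not just single lines) avoids leaving orphaned child lines behind a removed tag.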

 

Related issue (sorry, no example here): before I found and turned off the RM-specific info in the export, the sources in the GEDCOM included a lot more garbage. Each source citation had 3 copies of the publishing statement: one in parentheses, one not in parentheses but often missing a random amount from the beginning, and a third that literally included the terms PUBPLACE, PUBYEAR, and PUB something else. It also repeated much of the source name. While I could use global replace for some problems, every source required considerable editing, much of it by hand, one by one.

Edit: I should have said that this may be partly or wholly WikiTree's fault, since it is ill-equipped to handle RM-specific tags.

Edit2: I created a GEDCOM with the RM-specific option turned on, to see if I could find an example of corrupted, truncated data to show here, but could not find any. That would seem to implicate WikiTree as the 'truncator'. WikiTree clearly doesn't know what to do with it all, but tries anyway, concatenating everything into a 'citation', including the 3 variant copies of the publishing statement, 'TID 439', and the 'Publisher', 'PubPlace', and 'PubDate' tags. Obviously I shouldn't have included RM-specific data in the export.

 

Additional comments: something I find quite striking is the media included with the TreeShare download. An Ancestry.com GEDCOM does not include any media (for a tree to which I had not added any media). If I use TreeShare to create an RM tree, a whole bunch of media (much of it duplicated) is downloaded into a Media folder: images of all of the census pages and other source documents. What particularly amazes me is how fast these images are downloaded; the whole lot generally arrives faster than I could open just one of them within Ancestry.com! Presumably Ancestry caches these images and provides immediate access through the API, but if you try to view one within the Ancestry interface, it appears to ignore the cached version and slowly retrieve the original image all over again. (I won't go into the cryptic file names assigned or the duplication of media files; that has been reported by others elsewhere.)






