Import of UTF8 gedcom mangles accented chars


1 reply to this topic

#1 KenCRoy

Posted 22 June 2010 - 06:17 AM

Interestingly, RootsMagic 4 only exports as CHAR UTF8, yet when it imports a GEDCOM file that declares CHAR UTF8 and is encoded as UTF-8 without a BOM, all French accented names get mangled.

For example,

1 NAME Théophile /Soucy/

gets imported as Théophile

It appears to me that RootsMagic is expecting an ANSI encoded file for input and is not looking at the CHAR UTF8 to determine how to handle the input data.
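To illustrate the suspected mechanism, here is a minimal Python sketch. It assumes "ANSI" means Windows-1252 (`cp1252`), which is an assumption about the importer and not something confirmed by RootsMagic. It only shows that decoding UTF-8 bytes as Windows-1252 produces this kind of two-character garbage for each accented letter.

```python
# Sketch: reproduce the mangling by decoding UTF-8 bytes as Windows-1252 ("ANSI").
name = "Th\u00e9ophile"                 # the name with e-acute, as it should appear
utf8_bytes = name.encode("utf-8")       # e-acute becomes two bytes: C3 A9
misread = utf8_bytes.decode("cp1252")   # each byte is shown as its own character
print(misread)                          # the mangled form described in this post

# Reversing the mistake recovers the original text.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == name)
```

If the importer behaves this way, every non-ASCII character in the file would be affected, not only French names.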

Edited - to add Support Ticket submitted on this problem.

Ticket Number 21F-13B37622-23B6

Edited - to add

This problem does not occur with a GEDCOM file generated by RootsMagic 4, which shows a file encoding of UTF-8.

The problem occurs with UTF-8 encoded files that are saved as UTF-8 without a BOM.

This was also added to the ticket.

#2 TreeTraverser

Posted 25 June 2010 - 02:27 PM

It seems that "1 NAME Théophile /Soucy/" is ANSI-encoded. If the GEDCOM were UTF-8, wouldn't it appear as "1 NAME Théophile /Soucy/"? I think the presence of a Byte Order Mark (BOM) is optional and RM should base its parsing on the CHAR tag, which is required in a GEDCOM.
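As a rough sketch of what "base its parsing on the CHAR tag" could look like, the Python below treats a BOM as an optional hint and otherwise reads the CHAR line from the header. The function name `detect_gedcom_codec`, the `CHAR_TO_CODEC` table, and the `cp1252` fallback are all hypothetical choices for illustration. This is not how RootsMagic works internally, and other CHAR values such as ANSEL are deliberately not handled.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Hypothetical mapping from GEDCOM CHAR values to Python codec names.
CHAR_TO_CODEC = {"UTF-8": "utf-8", "UTF8": "utf-8", "ANSI": "cp1252", "ASCII": "ascii"}

def detect_gedcom_codec(data: bytes, default: str = "cp1252") -> str:
    """Pick a codec from an optional BOM, else from the '1 CHAR' header line."""
    if data.startswith(UTF8_BOM):
        return "utf-8-sig"  # this codec also strips the BOM when decoding
    # Header tags are plain ASCII, so the CHAR line can be found in raw bytes.
    match = re.search(rb"^\s*1\s+CHAR\s+(\S+)", data, re.MULTILINE)
    if match:
        value = match.group(1).decode("ascii", errors="replace").upper()
        return CHAR_TO_CODEC.get(value, default)
    return default

# A UTF-8 file without a BOM, like the one in the first post.
sample = "0 HEAD\r\n1 CHAR UTF8\r\n0 @I1@ INDI\r\n1 NAME Th\u00e9ophile /Soucy/\r\n0 TRLR\r\n".encode("utf-8")
codec = detect_gedcom_codec(sample)
text = sample.decode(codec)
print(codec)
```

With this approach the sample decodes correctly whether or not a BOM is present, because the CHAR line decides the codec when no BOM is found.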