SURNAME Spelling Variation Work Around


13 replies to this topic

#1 Vyger

Vyger

    Advanced Member

  • Members
  • PipPipPip
  • 3268 posts

Posted 07 November 2010 - 05:48 AM

Many many years ago I wished for an option to have RM Explorer sorted by Surname Soundex rather than Surname Alphabetical as is the only option now. This was to help group alternate spelling surnames together and spot potential duplicate individuals. I still believe it is a valid wish except that it would now need to be extended to the sidebar and People View.

I do various name studies and one big one has four spelling variations influenced by time frame, geography and literacy. All these spellings have the same soundex code, hence my original wish, but they do not group together in RM Explorer or the sidebar so a particular individual might list in any of those four blocks.

Reading Don's post on the Hints & Tips board inspired me to a workaround which already existed in RM4 but which, in the midst of so much other work, I had overlooked. The Alternate Name fact has a drop-down option for "Other Spelling", amongst others. It might be a little like using a sledgehammer to crack a nut, but for now I can run a piece of script on the GEDCOM file, check the existing surname spelling and insert the variations in blocks like the one below.

1 NAME Anne /Surgenor/
2 GIVN Anne
2 SURN Surgenor
2 TYPE other spelling

Turning on the option to show Alternate Names will now group all those family names together in RM Explorer, Sidebar and People View regardless of the spelling variations. For now I will just exclude the Alternate Name fact from Narrative Reports and keep my fingers crossed that the Soundex wish might become a reality in the future.
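For anyone wanting to try the same thing, here is a minimal sketch of what such a script could look like. It is not the script used above, only an illustration under stated assumptions: the GEDCOM has already been read into a list of lines, only the first NAME of each record is checked, and the variant list is a placeholder (only "Surgenor" comes from this post; the other spellings and the function name `add_other_spellings` are made up for the example).

```python
import re

# Placeholder variant list: only "Surgenor" appears in the post, the rest are hypothetical.
VARIANTS = ["Surgenor", "Surgeoner", "Surgener"]

# Matches a level-1 NAME line such as "1 NAME Anne /Surgenor/"
NAME_RE = re.compile(r"^1 NAME (?P<given>[^/]*)/(?P<surname>[^/]*)/")


def add_other_spellings(lines, variants=VARIANTS):
    """Return GEDCOM lines with 'other spelling' NAME blocks added.

    Only the first NAME of each level-0 record is examined. The new blocks
    are placed right after that name's sub-lines, one per variant that the
    record does not already use as its primary spelling.
    """
    out = []
    pending = []        # alternate-name lines waiting to be written
    seen_name = False   # has this record's primary NAME been handled?
    for line in lines:
        level = line.split(" ", 1)[0]
        # The primary NAME structure ends at the next level-0 or level-1 line
        if pending and level in ("0", "1"):
            out.extend(pending)
            pending = []
        if level == "0":
            seen_name = False
        match = NAME_RE.match(line)
        if match and not seen_name:
            seen_name = True
            given = match.group("given").strip()
            surname = match.group("surname").strip()
            if surname in variants:
                for variant in variants:
                    if variant != surname:
                        pending += [
                            f"1 NAME {given} /{variant}/",
                            f"2 GIVN {given}",
                            f"2 SURN {variant}",
                            "2 TYPE other spelling",
                        ]
        out.append(line)
    out.extend(pending)
    return out
```

Note that this sketch does not check for alternate names that are already present, so running it twice on the same file would add the blocks twice. Work on a copy of the GEDCOM.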

I can't really post this on the Hints & Tips board as it involves a bit of external work; after all, it would be rather time consuming working through all names in the database adding numerous spelling variations one record at a time.

My thanks to Don Newcomb for the spark that brought about this big win for me.

“Your most unhappy customers are your greatest source of learning.” -Bill Gates

 

 

User of Family Historian 6.2.7, Rootsmagic 7.5.8, Family Tree Maker 2014 & Legacy 7.5 (in order of preference)

 

Excel to Gedcom conversion - simple getting started tutorials here

 

Root


#2 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • PipPipPip
  • 3225 posts

Posted 07 November 2010 - 10:00 AM

I would wish against using Soundex for this purpose. My surname is Bryan. The Soundex code for Bryan is the same as the Soundex code for Brown, but the Soundex code for Bryan is not the same as the Soundex code for Bryant (and Bryant and Bryan are often variant spellings for the same family). There are numerous other examples of this problem. For example, Smith and Sandy have the same Soundex code. Do you really want John Sandy sorted right along with John Smith? But the Bryan/Bryant/Brown situation is the one that most often creates problems for me.

Soundex sounds great in theory, but in my experience it's one of the least effective ways that has ever been invented to deal with spelling variations.
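The collisions mentioned above are easy to check. Below is a small sketch of the standard American Soundex rules (keep the first letter, code the consonants, collapse adjacent repeats, pad to four characters). It is written for illustration only and is not necessarily how RM or any other program computes its codes.

```python
# Letter-to-digit table for American Soundex; vowels, Y, H and W have no code.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")   # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not separate same-coded letters
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code               # a vowel resets prev, so a repeat across it counts
    return (first + "".join(digits) + "000")[:4]


for surname in ["Bryan", "Brown", "Bryant", "Smith", "Sandy"]:
    print(surname, soundex(surname))
```

Bryan and Brown both come out as B650 while Bryant is B653, and Smith and Sandy share S530, which is exactly the problem described.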

Jerry

#3 Vyger

Vyger

    Advanced Member

  • Members
  • PipPipPip
  • 3268 posts

Posted 07 November 2010 - 10:40 AM

Good points, and ones I had not realised. I only wished for it as an option, but with the examples you gave it could be more confusing than it would be worth.

There are still other ways known surname variations could be incorporated but for now I am just happy to have found a work around that is of benefit to my own research.



#4 Romer

Romer

    Advanced Member

  • Members
  • PipPipPip
  • 2053 posts

Posted 07 November 2010 - 11:54 AM

Vyger, might the creation of a new report that would group together instances of surnames in your database with the same soundex code be of help?

Edit:
Actually, Soundex looks to be already available in Custom Reports. You could save the results as a Text File, parse it in a spreadsheet application, then group by Soundex (via a pivot table if in Excel).
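If a spreadsheet is not to hand, the same grouping can be sketched in a few lines of Python. This assumes the saved text file is tab-separated with the surname in the first column and the Soundex code in the second; that layout is an assumption for the example and depends on how the Custom Report is set up. The function name `group_by_soundex` is made up.

```python
import csv
import io
from collections import defaultdict


def group_by_soundex(report_file):
    """Group surnames by the Soundex code found in a tab-separated report.

    Assumed layout (hypothetical): column 1 = surname, column 2 = Soundex code.
    """
    groups = defaultdict(set)
    for row in csv.reader(report_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        surname, code = row[0].strip(), row[1].strip()
        groups[code].add(surname)
    # Sort the codes and the surnames within each code for stable output
    return {code: sorted(names) for code, names in sorted(groups.items())}


# A small in-memory stand-in for the exported text file
sample = io.StringIO("Bryan\tB650\nBrown\tB650\nBryant\tB653\nBryan\tB650\n")
for code, names in group_by_soundex(sample).items():
    print(code, ", ".join(names))
```

With a real export, replace the `io.StringIO` sample with `open("report.txt", newline="")`, where the file name is whatever the report was saved as.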

#5 Vyger

Vyger

    Advanced Member

  • Members
  • PipPipPip
  • 3268 posts

Posted 07 November 2010 - 04:09 PM

Thanks Romer, I know I can do all that but really working completely within RM would be my goal.

Once I am finished with my duplicate fact elimination on my large merge database I will write a script to achieve the alternate surname spelling.



#6 Renee Zamora

Renee Zamora

    Advanced Member

  • Support
  • PipPipPip
  • 8121 posts

Posted 12 November 2010 - 11:48 AM

I checked and your enhancement request has not made it into our tracking system for RM4. Do you still think there would be a need for your idea? I can add it if you are still wanting it.
Renee
RootsMagic

#7 Vyger

Vyger

    Advanced Member

  • Members
  • PipPipPip
  • 3268 posts

Posted 12 November 2010 - 01:40 PM

Soundex sorting works fine on the main surnames I research but after reading the feedback of others obviously it does not work so well on other names.

My workaround works for me, apart from having to avoid the use of Alternate Names in sentencing etc. It would be nice for a user to enter known spelling variations of the surnames they research, and for RM to apply them globally so that those surnames always group together in Explorer, Sidebar and People View, but this would be another item that I can't see how to transfer with GEDCOM.

I have entered the known variations of my main surnames to every individual which is really overkill but suits my needs within the existing RM structure. There has been talk of being able to save the "core" of RM which would include the many items that remain specific to the database which is open, if that became a reality then saving information like known surname spelling variations could also be possible.

It's a subject which dogs every researcher so hope other users will interject with their own views on how this could be achieved.



#8 Renee Zamora

Renee Zamora

    Advanced Member

  • Support
  • PipPipPip
  • 8121 posts

Posted 15 November 2010 - 01:37 PM

The method I use in surname studies with many spelling variations is to choose one spelling version and make sure everyone has that as an alternate name. Then I can search for everyone in that family/surname at one time. That makes for a lot less work than listing all the variants they could possibly be known under. This way I know that the Alternate Names used were really in the documents, except maybe that one main spelling.
Renee
RootsMagic

#9 Vyger

Vyger

    Advanced Member

  • Members
  • PipPipPip
  • 3268 posts

Posted 06 December 2018 - 09:18 AM

I'm wishing again for the Option to sort by Soundex on the Index pane which I proposed many many years ago but was never adopted.

 

Back in the day before mass literacy you spoke your name to the registrar and the scribe wrote and spelled it as they thought it should be: different scribe, different variations. I’m about to begin research on a new family of McGreevy (the most popular spelling). The other spellings, in order of popularity, are McGreevey, McGreavey, McGreavy, McGrevy, McGreivy, McGreevery, McGrevey, McGreivey, McGreeavey, McGreav.

 

So that is eleven variations in total, sorted all over the place, yet all with the same Soundex code of M261, even the one without the trailing letter Y and the one with the extra letter R within it. If I had the option to sort by Soundex, all would be fine for that surname and its variant spellings. What I do at present is what Renee suggested all those years ago, but for most users it’s a totally manual, record-by-record process and therefore prohibitively time consuming. Thankfully I can identify the most popular spelling and, through a bit of code, apply it as an Alternate Name to all the other variants, but that option is not available to most. The fact that I'm about to do that for this new research name is what brought this to the fore again.

 

Legacy 7.5 provides the option to display the Soundex on the Index but I don’t believe it can be sorted on, maybe later versions have developed this further.

 

Calculating Soundex on the fly each time a list is reordered wouldn't be practical. Instead, the conversion of a database to RM8 would need to insert the Soundex code into the Name Table, and thereafter it would take a millisecond or so to update it on any name change or save. Then it is only a question of providing the option of sorting on that field.

 

I recognize Jerry's observation that it is not useful for all names, but in all of my name studies it would prove very useful, with the exception of my own surname, which has evolved through two Soundex codes. Another option, which would overcome Jerry's previous objections, would be to let users create and maintain their own lists of spelling variations under a unique code rather than a Soundex code, and to provide the option to group and sort on that code as opposed to the surname spelling. I can see this being especially appealing to many international users where migration to a new country has changed the surname spelling considerably.
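To make the second idea concrete, here is a rough sketch of sorting on a user-maintained group code instead of the spelling. Everything in it is hypothetical: the group codes, the table layout and the function names are invented for illustration, and the spellings are simply ones mentioned in this thread.

```python
# Hypothetical user-maintained table: group code -> spellings treated as one surname
SURNAME_GROUPS = {
    "BRYAN": ["Bryan", "Bryant", "Brian", "Bryon", "O'Bryan", "O'Brien"],
    "MCGREEVY": ["McGreevy", "McGreevey", "McGreavey", "McGreavy"],
}


def build_lookup(groups):
    """Flatten the group table into a spelling -> group code dictionary."""
    return {spelling.casefold(): code
            for code, spellings in groups.items()
            for spelling in spellings}


def sort_key(surname, lookup):
    """Sort by group code first, then by the actual spelling.

    Surnames that belong to no group fall back to their own upper-cased spelling.
    """
    return (lookup.get(surname.casefold(), surname.upper()), surname.casefold())


people = ["Brown", "O'Brien", "McGreavy", "Bryant", "Bryan", "McGreevy"]
lookup = build_lookup(SURNAME_GROUPS)
ordered = sorted(people, key=lambda s: sort_key(s, lookup))
print(ordered)
```

Here Bryant and O'Brien sort alongside Bryan while Brown stays out of the group, which Soundex alone cannot manage.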

 

I believe more people would find this useful than not and the easy to implement feature would add a unique advantage to Rootsmagic.
 




#10 TomH

TomH

    Advanced Member

  • Members
  • PipPipPip
  • 6004 posts

Posted 06 December 2018 - 10:05 AM

Agreed that Soundex coding for surnames ought to be fairly easy. One of SQLite's core functions can encode it on the fly. https://www.sqlite.o...nc.html#soundex
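As a rough illustration of sorting on a Soundex code inside SQLite: the built-in soundex() function is only present when SQLite was compiled with the SQLITE_SOUNDEX option, so this sketch registers a Python version under the made-up name `py_soundex` instead. The `names` table is a stand-in for the example, not the actual RootsMagic schema.

```python
import sqlite3

# American Soundex letter codes; vowels, Y, H and W have no code.
CODES = {ch: d for d, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                  "4": "L", "5": "MN", "6": "R"}.items()
         for ch in letters}


def soundex(name):
    """Four-character American Soundex code for a surname."""
    letters = [c for c in (name or "").upper() if c.isalpha()]
    if not letters:
        return ""
    digits, prev = [], CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue              # H and W do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code
    return (letters[0] + "".join(digits) + "000")[:4]


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as py_soundex(...)
conn.create_function("py_soundex", 1, soundex)

# Stand-in table for the example only
conn.execute("CREATE TABLE names (surname TEXT)")
conn.executemany("INSERT INTO names VALUES (?)",
                 [("McGrevy",), ("Miller",), ("McGreevy",),
                  ("Madden",), ("McGreavey",), ("McGreav",)])

rows = conn.execute(
    "SELECT surname, py_soundex(surname) AS code "
    "FROM names ORDER BY code, surname"
).fetchall()
for surname, code in rows:
    print(code, surname)
```

The McGreevy spellings all land together under M261, ahead of Madden and Miller. Whether to compute the code in the query like this or store it in a column is a separate design choice.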

Tom user of RM7550 FTM2017 Ancestry.ca FamilySearch.org FindMyPast.com
SQLite_Tools_For_Roots_Magic_in_PR_Celti wiki, exploiting the database in special ways >>> RMtrix-tiny.png app, a bundle of RootsMagic utilities.


#11 zhangrau

zhangrau

    Advanced Member

  • Members
  • PipPipPip
  • 1413 posts

Posted 07 December 2018 - 07:31 AM

I have no issue with adding Soundex processing as an Option, but it really wouldn't hold any usefulness for me.

 

My initial project was a surname-study of my family's name. It has been traced to early 1600's in France, through Québec, to all of North America.

 

I have documented variations beginning with Cha, Che, Chi, Ga, Ge, Gha, Gi, Ja, Je, Ji, Jo, Sa, Sha, Sho, Za, Zha  - and with multiple spellings of the ending sound, I once counted over 80 variations in the name's spelling - soundex won't help cut that clutter.

 

As Renee suggested, I list an Alternate Name fact for every variation I find for each individual - including apparent misspellings, name changes, and marriages. With those displayed in the Sidebar Index (and using Duplicate Search Merge), I already have the ability to seek out duplicates.



#12 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • PipPipPip
  • 3225 posts

Posted 07 December 2018 - 08:19 AM

"Sounds like" searches in general are a good idea, and not just in RM. But actual Soundex searches themselves are often of very limited value. There are other "sounds like" systems that are much better than Soundex.

 

Examples abound of the reason I say that actual Soundex searches are often of limited value. Here are just a few.

  • The surname Bryan has the same Soundex code as does the surname Brown. People with the Brown surname are not quite as abundant as are people with the Smith surname, but sometimes it seems like it. If I do a Soundex search for Bryan it can be very difficult to find any Bryans among all the Browns.
  • The surname Bryan does not have the same Soundex code as does the surname Bryant. I have an actual signature of myself and every Bryan ancestor for ten generations, and in no case did anyone ever add a T to the end of their name. But there are many, many, many historical records of my family where a court clerk or a census enumerator or somebody like that has added a T. Soundex searches for Bryan find variations such as Brian or Bryon etc. just fine. But such searches never find a name ending with a T. Wildcard searches are a much better approach.
  • Our immigrant ancestor was named O'Bryan and all his sons dropped the O' prefix. But I still have to look for O'Bryan and its variants such as O'Brien. Wildcards work great for such searches, but not Soundex.
  • The surname Sandy has the same Soundex code as Smith. It seems counter-intuitive, but it is true.
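To show the kind of wildcard search meant in the list above, here is a small sketch using shell-style patterns from Python's standard library. The pattern and the function name are only examples, and the wildcard syntax inside RM or any particular website may well differ.

```python
from fnmatch import fnmatchcase

surnames = ["Bryan", "Bryant", "Brian", "Bryon", "O'Bryan", "O'Brien", "Brown", "Smith"]

# Example pattern: anything, then "br", then i or y, then a, e or o, then "n", then anything
PATTERN = "*br[iy][aeo]n*"


def wildcard_matches(names, pattern):
    """Return the names that match a shell-style wildcard pattern, ignoring case."""
    return [n for n in names if fnmatchcase(n.lower(), pattern.lower())]


print(wildcard_matches(surnames, PATTERN))
```

This one pattern picks up Bryan, Bryant, Brian, Bryon, O'Bryan and O'Brien while leaving out Brown and Smith.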

Jerry 



#13 Vyger

Vyger

    Advanced Member

  • Members
  • PipPipPip
  • 3268 posts

Posted 07 December 2018 - 09:20 PM


OK, but I keep hearing objections and reservations and no suggested or progressive solutions to one of the biggest challenges in genealogy research?

 

* Should genealogy software facilitate grouping and sorting of user-defined lists of similar surnames via a custom surname group?

 

* Should genealogy software facilitate the mass creation of an Alternate Name of the most commonly known surname in a family study?

 

* Outside of Soundex and its known limitations, how should genealogy software evolve to cater for this need of the masses?

 

It's up to us, with our experience of researching migrant families and name changes through time, to make suggestions to Rootsmagic towards aiding further research; if we sit on our hands then we go nowhere. Apart from changes through history, I'm sure you are both well aware of how migrant family names can mutate beyond recognition in some cases, with the variants known only to particular researchers. So how do you suggest genealogy software incorporate this information and facilitate appropriate "like" sorting?




#14 Vyger

Vyger

    Advanced Member

  • Members
  • PipPipPip
  • 3268 posts

Posted 18 December 2018 - 11:38 AM

I didn't frame this up that well, it was originally posted quite a few years ago and the title is far from appropriate.

 

Having just returned from Germany with another spelling variation and another Soundex code, I will write my thoughts up properly and post again. The thing about the German coffee shop experience is that I actually spelled out my surname. There was background noise, but it was only when I checked the phonetics of the German alphabet that I could see just what had happened: a challenge we continue to face every day in genealogy.

