


NameClean & PlaceClean need to learn (not a problem)

dataclean

8 replies to this topic

#1 Vyger

Vyger

    Advanced Member

  • Members
  • 3385 posts

Posted 24 April 2019 - 07:30 PM

I just tried NameClean and it reported problems with over 6000 of my names, where none exist.

 

It seems obvious that if NameClean and PlaceClean continue to report high counts of potential problems where none exist, the feature becomes useless and users will ignore it.

 

NameClean and PlaceClean need some capability to learn from the user, working towards internationally acceptable standards, to build on what is otherwise a useful utility. The ability to apply defaults would still be a welcome option for a thorough analysis of the Name and Place data.


“Your most unhappy customers are your greatest source of learning.” -Bill Gates

It's now time for discretion, trust, patience and support

 

User of Rootsmagic 7.5.9, Family Historian 6.2.7, Family Tree Maker 2014 & Legacy 7.5

 

Excel to Gedcom conversion - simple getting started tutorials here

 

Root


#2 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • 3482 posts

Posted 24 April 2019 - 10:06 PM

It's funny that I don't complain about NameClean and PlaceClean. Instead, I just don't use them because they don't meet my needs very well. A quick look at them always gives the impression that they are excellent tools that are well thought out and that they ought to be extremely useful. They even support filtering people by group, whereas many places in RM that to me ought to support filtering people by group don't do so. So it's hard to describe why the tools don't work well for me, but they don't.

 

I remember that before the tools were formally introduced into RM, the tools were available in the production version of RM on a beta test basis. The beta test version actually worked quite well for me. I can't remember or don't understand the reasons for the beta version working so much better for me than the production version. The production version is objectively much better than was the old beta version, but the beta version worked for me and the production version just doesn't. It's probably a case of "The fault, dear Brutus, is not in our stars, But in ourselves", meaning the fault lies with me and not with the software. But I don't understand quite why. In any case, there are many, many areas of RM that are much more important to me than NameClean and PlaceClean.

 

Jerry

 



#3 keithcstone

keithcstone

    Advanced Member

  • Members
  • 121 posts

Posted 25 April 2019 - 02:34 AM

Funny, I had just been commenting on how handy I found the place list report when I saw this post.

 

I've been spending a lot of time in the Place List and Place Clean in the last year and I agree Place (and Name) clean could use some additional smarts. Not being familiar with all the international naming standards I can't comment on that, but I have some other ideas for intelligence that could be worthy of discussion.

 

- clean should have an option to auto-merge after cleanup. For example if I use data clean to change At Home, Berks co, pa; Berks, PA; Berks, Pennsylvania, etc, I shouldn't be forced to then go into the place list and manually consolidate them.

- You should have the option to auto-flag based on rules, and to scroll through a list of (only) pending changes and then commit them all in one click. 

- you should be able to define transformations that would be "smarter" than simple search and replace and save them for future use, along the lines of IF {place name} = "MO";{place name}="Missouri, United States" and {place name:Country}="U.S.A.";{place name:Country}="United States". So if the only thing in the entire place name field is "MO" I want it to be "Missouri, United States", and if the country portion of the place name is U.S.A. I want it to be United States, but without changing the rest of the place name. There should be a number of standard transformations available, and you should be able to edit and add to them just like you can with fact sentences.
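To make the third idea concrete, here is a minimal Python sketch of field-aware place rules. It is not how RootsMagic works internally; `PlaceRule`, `clean_place` and the assumption that the last comma-separated part of a place name is the country are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlaceRule:
    """A hypothetical user-defined rule: a test plus a transformation."""
    name: str
    applies: Callable[[List[str]], bool]
    transform: Callable[[List[str]], List[str]]


def split_place(place: str) -> List[str]:
    # Assumption: place parts are comma-separated, with the country last.
    return [part.strip() for part in place.split(",")]


# Two rules matching the examples in the post above.
RULES = [
    PlaceRule(
        name="Bare 'MO' becomes 'Missouri, United States'",
        applies=lambda parts: parts == ["MO"],
        transform=lambda parts: ["Missouri", "United States"],
    ),
    PlaceRule(
        name="Country 'U.S.A.' becomes 'United States'",
        applies=lambda parts: parts[-1] == "U.S.A.",
        # Only the country part changes; the rest is left alone.
        transform=lambda parts: parts[:-1] + ["United States"],
    ),
]


def clean_place(place: str, rules: List[PlaceRule] = RULES) -> str:
    """Apply each matching rule in order and rebuild the place name."""
    parts = split_place(place)
    for rule in rules:
        if rule.applies(parts):
            parts = rule.transform(parts)
    return ", ".join(parts)
```

With these two rules, a field containing only "MO" is rewritten, while "Moberly, MO" is left untouched because the first rule requires "MO" to be the entire field.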



#4 Renee Zamora

Renee Zamora

    Advanced Member

  • Support
  • 8369 posts

Posted 25 April 2019 - 10:25 AM

I'm not sure on a couple of these. 

 

You should have the option to auto-flag based on rules, and to scroll through a list of (only) pending changes and then commit them all in one click.

 

What rules? Are they other than the options you select for running Place or Name Clean?

There is a checkbox in the header row that will mark them all. 

 

you should be able to define transformations that would be "smarter" than simple search and replace and save them for future use, along the lines of IF {place name} = "MO";{place name}="Missouri, United States" and {place name:Country}="U.S.A.";{place name:Country}="United States". So if the only thing in the entire place name field is "MO" I want it to be "Missouri, United States", and if the country portion of the place name is U.S.A. I want it to be United States, but without changing the rest of the place name. There should be a number of standard transformations available, and you should be able to edit and add to them just like you can with fact sentences.

 

There is in the PlaceClean settings an option under "Add or Remove Country" where it will add United States for you. 


Renee
RootsMagic

#5 keithcstone

keithcstone

    Advanced Member

  • Members
  • 121 posts

Posted 25 April 2019 - 10:40 AM

I'm not sure on a couple of these. 
 
 
What rules? Are they other than the options you select for running Place or Name Clean?
There is a checkbox in the header row that will mark them all.

Marking them all is suicidal; there could be hundreds or even thousands of entries that aren't what you want. The problem is that the only options are to mark everything and hope it doesn't make things worse, to tediously go through one at a time clicking until your finger cramps, or to take your chances on a find and replace all. "Rules" should be smarter and user-definable, and you should be able to select only some of them to process.
 
 

There is in the PlaceClean settings an option under "Add or Remove Country" where it will add United States for you.

Yes, but that can add country to things you don't want. I'm suggesting "smart" cleaning rules that are cognizant of what part of the field you're worried about.

 

These rules don't have to be terribly elaborate; you already have a basis for them in your sentence structure. Ship the software with a default list that could be enabled or disabled, and allow the user to add or change them just like you do with fact types and sentences.
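As a rough illustration of what I mean, here is a small Python sketch of a rule list with enable/disable switches, a list of only the pending changes, and a commit step that applies only the ones accepted. The names (`CleanRule`, `PendingChange`, `propose_changes`, `commit_changes`) are hypothetical and nothing here reflects the actual DataClean code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CleanRule:
    """A hypothetical rule that could ship as a default or be user-added."""
    name: str
    find: str
    replace: str
    enabled: bool = True


@dataclass
class PendingChange:
    rule_name: str
    old: str
    new: str


def propose_changes(places: List[str], rules: List[CleanRule]) -> List[PendingChange]:
    """Build the list of pending changes without touching the data."""
    pending = []
    for place in places:
        for rule in rules:
            if rule.enabled and place == rule.find:
                pending.append(PendingChange(rule.name, place, rule.replace))
                break  # first matching rule wins
    return pending


def commit_changes(places: List[str], accepted: List[PendingChange]) -> List[str]:
    """Apply only the changes the user accepted, in one step."""
    mapping = {change.old: change.new for change in accepted}
    return [mapping.get(place, place) for place in places]


rules = [
    CleanRule("Berks abbreviation", "Berks, PA", "Berks, Pennsylvania, United States"),
    CleanRule("Bare state code", "MO", "Missouri, United States", enabled=False),
]
places = ["Berks, PA", "MO", "Paris, France"]
pending = propose_changes(places, rules)   # review this list before committing
cleaned = commit_changes(places, pending)  # or pass a hand-picked subset
```

Because the second rule is disabled, "MO" never shows up in the pending list, and nothing changes until `commit_changes` is called with the entries the user actually wants.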



#6 Renee Zamora

Renee Zamora

    Advanced Member

  • Support
  • 8369 posts

Posted 25 April 2019 - 11:59 AM

In DataClean I use one filter at a time so it's focused on the issues I am looking for.

 

Confirming these are on the enhancement request list. 


Renee
RootsMagic

#7 Vyger

Vyger

    Advanced Member

  • Members
  • 3385 posts

Posted 25 April 2019 - 12:12 PM

I agree with the need for some custom input of Rules and published some NameClean Observations here previously.

 

Often users criticize Rootsmagic for being too US-centric, and one cannot expect rules for every country and preference to be incorporated. Custom Place Detail identifiers will vary from country to country, and quite some time ago I suggested facilitating user input strings to help this process. In previous discussions the existing difficulties of dealing with "Mc" and "Mac" surnames have been highlighted, and then we go on to the "Von", "De", "Van" and other country-specific name variations. I believe Rootsmagic needs to embrace some simple rules to enable users to define how these should be dealt with.

 

The image below was what I had suggested for identifying Place Details in other countries some years back.

 

DC1.png




#8 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • 3482 posts

Posted 25 April 2019 - 03:18 PM

I play with DataClean so seldom that this thread inspired me to revisit it. I decided to look at NameClean instead of PlaceClean for this revisit.

 

I'm with Renee in that when I do play with the tool, I find multiple filters to produce far more results than I can cope with. So I never use more than one filter at a time. When I first entered the tool this time, it remembered that the last time I played with the tool many months ago, I was using the "Misplaced prefixes" filter. It's a very Good Thing that NameClean remembers filters so well, especially since many parts of RM don't have such excellent memories.

 

But after that, my revisit of NameClean immediately went south. Out of a 60,000 person database, I had about 41 hits. It's hard to tell exactly how many hits there were because the list of hits doesn't show a count. But more importantly, the list is almost completely unusable. You can't click on any of the hits to see the Edit Person screen for the person or to see the person in Pedigree View or Family View or anything like that. So there is absolutely no context to make a decision about whether to accept RM's recommendations or not. There is not even a Print button so you can make a copy of the list to paper or to a file and work the list from within RM's views and Edit Person screens.

 

Out of curiosity, I did write down one of the people on the list - like write down on paper. So I exited NameClean and went to the person in RM. I had to go to the person by name rather than RIN because the list of hits does not include the RIN. The person's name was Major K. D. Brown. The tool obviously is suggesting that his name was K. D. Brown and that he might have been a Major in a military force of some sort. This was a person I had imported into my database based on a GEDCOM from a fellow researcher. The import was forever ago, back when I was using PAF and even before I switched from PAF to Family Origins. I basically don't import such GEDCOMs any more, but back then I didn't know any better. It turned out that Major K. D. Brown died in 1970 so I'm comfortable using his real data as an example.

 

I am not related to him. Rather, he is the brother of another man who married a woman who was my second cousin three times removed. It didn't take very long to discover that - only a minute or two. The only citation I had was the old one that gave the name of the fellow researcher from whom I had imported the GEDCOM into PAF so long ago. But it only took me another five minutes or so using modern online databases to come up with a death index entry, an image of his actual death certificate, and an image of his obituary. In these documents his name was listed variously as Major G. Brown and Major Gilliam Brown, and "Major" was his actual first name and not a military title. Indeed, there was no indication that he ever served in the military. I can't explain where my colleague came up with the K. D. instead of G. or Gilliam for the middle name.

 

Therefore, there is nothing to do in NameClean for this person except not to accept NameClean's recommendation. However, there is no way to tell NameClean that the name for this person is "not a problem" and not to present it to me again. Well, there kind of is, but it's pretty clunky. I can make a group of everybody in my database and restrict NameClean to that group. Then for people like this Major Gilliam Brown, I can remove them from the group. Then I will have to completely rerun my query because I had to destroy the results of the original query to look at Major K. D. Brown in Edit Person, in Pedigree View, and in Family View.
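What I have in mind for "not a problem" could be as simple as a remembered exclusion list. Here is a minimal Python sketch under my own assumptions: the `Hit` record, the `filter_hits` function and the RIN values are hypothetical, and I have no idea how NameClean actually stores its results.

```python
from typing import List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One NameClean-style hit (hypothetical structure)."""
    rin: int          # record number of the person
    current: str      # name as it is stored now
    suggested: str    # name the tool proposes


def filter_hits(hits: List[Hit], not_a_problem: Set[Tuple[int, str]]) -> List[Hit]:
    """Drop hits the user has already marked as 'not a problem'.

    The key is (RIN, current name), so if the name is edited later
    the person is checked again on the next run.
    """
    return [hit for hit in hits if (hit.rin, hit.current) not in not_a_problem]


# Made-up RINs for illustration only.
hits = [
    Hit(101, "Major Gilliam Brown", "Gilliam Brown"),
    Hit(202, "Dr John Smith", "John Smith"),
]
not_a_problem = {(101, "Major Gilliam Brown")}
remaining = filter_hits(hits, not_a_problem)
```

Carrying the RIN in each hit would also solve the pen and paper problem, since the list could then link straight to the person.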

 

Jerry



#9 Vyger

Vyger

    Advanced Member

  • Members
  • 3385 posts

Posted 26 April 2019 - 05:34 AM

Out of curiosity, I did write down one of the people on the list - like write down on paper. So I exited NameClean and went to the person in RM. I had to go to the person by name rather than RIN because the list of hits does not include the RIN.

 

I agree with Jerry about the functional gap: one often needs to reference the Edit Person screen. I can also identify with the need for pen and paper, which we all hope disappears in RM8.

 

Apart from the ability to mark as Not a Problem, I still maintain that a degree of learning needs to be incorporated in the rules of NameClean and PlaceClean. With improper capitalization selected, NameClean, from memory, deals with James Mcconnell properly by suggesting the surname capitalization James McConnell. But I remember from another run this morning that it does not apply the same "Mc" rule when the "Mc" is in a given name, so it reports James McConnell Brown as a problem and suggests James Mcconnell Brown as a replacement. Presently I have no way to remove these entries from subsequent runs except to follow Rootsmagic's false corrections.
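To show how small the fix could be, here is a Python sketch that applies the same "Mc" rule to every word of a name rather than to the surname alone. This is only my guess at a consistent rule, not Rootsmagic's logic; `fix_word` and `suggest_capitalization` are hypothetical names, and "Mac", "Von", "De", "Van" and similar prefixes are deliberately left out.

```python
import re

# Matches a word starting with "mc" followed by at least one letter.
_MC_WORD = re.compile(r"^mc([a-z])(.*)$", re.IGNORECASE)


def fix_word(word: str) -> str:
    """Capitalize one name word, keeping the 'Mc' + capital pattern."""
    match = _MC_WORD.match(word)
    if match:
        return "Mc" + match.group(1).upper() + match.group(2).lower()
    return word[:1].upper() + word[1:].lower()


def suggest_capitalization(full_name: str) -> str:
    """Apply the same rule to given names and surname alike."""
    return " ".join(fix_word(word) for word in full_name.split())
```

Under this rule James Mcconnell is corrected to James McConnell, and James McConnell Brown comes back unchanged, so it would not be reported as a problem.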

 

I know some people have great confidence that released features will be followed up with refinement and improvement, but this has not been my experience. If DataClean was part of RM6 and still in beta, I would have hoped it would have been refined by the time of public release in RM7, so I cannot share that faith, as I presently see DataClean as another unfinished symphony. I believed DataClean was a valuable feature and tool when released and have taken time to report on anomalies from the time of release, concluding with this detailed post 12 months ago, but there have been no improvements that I have noticed or seen reported.

 

Personally I find it very disappointing that so many features with great promise fall into disuse by the mass of users simply because they have tried them, found various limitations and moved on. So many users don't use Shared Events or Place Details as they don't translate well to other programs, turn off County Check explorer as the help is often not welcome, and now brush DataClean aside due to the many anomalies, false positives and lack of learning.

 

It does make one wonder why one takes time to try and contribute towards helping Rootsmagic succeed.

