
DataClean Beta Feature

dataclean

73 replies to this topic

#21 Allen Prunty

Allen Prunty

    Advanced Member

  • Members
  • 95 posts

Posted 04 May 2013 - 07:57 AM

Renee...

I love the name clean feature... this is a good data scrubber. The place cleaner causes duplicate places.

Allen

#22 Vyger

Vyger

    Advanced Member

  • Members
  • 3546 posts

Posted 04 May 2013 - 08:13 AM

I love the name clean feature... this is a good data scrubber. The place cleaner causes duplicate places.


Allen, I think the more accurate description is "results in duplicate places"; that's where an automerge should kick in, IMO.

Keeping one's customers and their important views at a distance is never a good approach

 

User of Family Historian 7.0, Rootsmagic 7.6.3

 

Excel to Gedcom conversion - simple getting started tutorials here

 

Root


#23 Renee Zamora

Renee Zamora

    Advanced Member

  • Admin
  • 8768 posts

Posted 06 May 2013 - 08:42 AM

Confirming enhancement request is in our tracking system.
Renee
RootsMagic

#24 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • 3974 posts

Posted 06 May 2013 - 04:31 PM

I ran into sort of a new glitch in the name clean feature, which is really just a variation on a theme I've mentioned before. And by "glitch", I do not mean "bug". It's just a usability improvement that's needed to make this tool more valuable.

I've reported previously that I've been spending a lot of time with the misplaced nickname option of the name clean feature. And I've reported before that there needs to be a way to remember where I was and resume, else I have to keep skipping the same names over and over again. Or else there needs to be a way to skip a name "permanently" so I don't keep having to skip it over and over again. But almost more than that, it seems like I need a way to limit the individuals I'm working on at the moment to a Named Group or a color coded collection of individuals or something like that. Which is to say, I need to be able to prioritize the individuals I'm cleaning - like to clean a particular couple of thousand out of the 60,000 in my database or something like that. The ones being prioritized might perhaps be the individuals in a report for an upcoming family reunion.

The example at hand is that I was printing a narrative report that included James Carl (Carl) Doe, and some of the fact sentences were reporting his name as James rather than as Carl. So I was looking at possible bugs in the Sentence Templates, etc. But the solution was much simpler (and more embarrassing) than that. I thought I had cleaned all the names in this particular report by hand long before the Name Clean Beta arrived. But I had missed one name that needed to be cleaned up (at least one, I should say). I still had the given name as "James Carl (Carl)" without the quotes. The misplaced nicknames option would have fixed this, but there's no way to direct the misplaced nicknames option at just the names that are most important to me right now.
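To make the request concrete, here is a minimal sketch of a misplaced-nickname fix that is restricted to a chosen set of people. It is only an illustration: the regular expression, the function names, and the idea of passing a set of person ids to stand in for a Named Group are all assumptions, not how RM actually does it.

```python
import re

# Hypothetical pattern: a given-name field that ends in a parenthesised
# nickname, e.g. "James Carl (Carl)".
_NICKNAME_IN_PARENS = re.compile(r"^(?P<given>.*?)\s*\((?P<nick>[^()]+)\)\s*$")


def split_misplaced_nickname(given_field):
    """Return (given, nickname); nickname is None when nothing matches."""
    match = _NICKNAME_IN_PARENS.match(given_field)
    if match is None:
        return given_field.strip(), None
    return match.group("given").strip(), match.group("nick").strip()


def clean_group(people, group_ids):
    """Apply the split only to people whose id is in group_ids.

    people maps a person id to the raw given-name field; group_ids stands in
    for a Named Group (an assumption made for this sketch).
    """
    cleaned = {}
    for person_id, given_field in people.items():
        if person_id in group_ids:
            cleaned[person_id] = split_misplaced_nickname(given_field)
    return cleaned
```

With something like this, `clean_group(people, reunion_ids)` would touch only the people in the reunion report and leave the other tens of thousands alone.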

Jerry

#25 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • 3974 posts

Posted 06 May 2013 - 04:35 PM

Which is to say, I need to be able to prioritize the individuals I'm cleaning .......


By the way, a really slick way to do this would be to put a Named Group into People View and then to have an option where Name Clean only works on the individuals in the view. Of course, I've advocated that when a Named Group is in People View, there should be a much more global option whereby everything that RM does applies only to those individuals in the view.

Jerry

#26 Renee Zamora

Renee Zamora

    Advanced Member

  • Admin
  • 8768 posts

Posted 07 May 2013 - 09:21 AM

Confirming enhancement requests are in our tracking system.
Renee
RootsMagic

#27 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • 3974 posts

Posted 07 May 2013 - 05:00 PM

The Date Last Edited field for individuals whose data was updated by the Name Clean feature is not updated. It should be.

Jerry

#28 Renee Zamora

Renee Zamora

    Advanced Member

  • Admin
  • 8768 posts

Posted 08 May 2013 - 08:53 AM

Confirming issues noted in our tracking system.
Renee
RootsMagic

#29 JimWalton

JimWalton

    Advanced Member

  • Members
  • 34 posts

Posted 14 May 2013 - 02:20 PM

It was earlier suggested that there should be a not-a-problem option. I second this. I use 5 underscores to indicate a given or surname is unknown. This way it appears properly in a report. The name clean doesn't like that and there is no way to tell it to ignore all those names.
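As a rough illustration of a global not-a-problem rule, the sketch below lets a checker ignore any name part that matches a user-supplied pattern, such as five underscores. The `looks_suspicious` check is a deliberately crude stand-in invented for this example; it is not what Name Clean actually tests for.

```python
import re

# Convention described in the post: exactly five underscores mark an unknown name.
UNKNOWN_PLACEHOLDER = re.compile(r"_{5}")


def looks_suspicious(name_part):
    """Crude stand-in check: flag anything other than letters and  . ' - or space."""
    return any(not (ch.isalpha() or ch in " .'-") for ch in name_part)


def needs_review(name_part, ignore_patterns=(UNKNOWN_PLACEHOLDER,)):
    """Return True if the name part should be shown to the user.

    Name parts that fully match any ignore pattern are never flagged.
    """
    if any(pattern.fullmatch(name_part) for pattern in ignore_patterns):
        return False
    return looks_suspicious(name_part)
```

The point of the design is that the ignore patterns belong to the user, so one rule covers every "_____" in the database instead of one decision per person.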

#30 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • 3974 posts

Posted 15 May 2013 - 10:08 AM

It was earlier suggested that there should be a not-a-problem option. I second this. I use 5 underscores to indicate a given or surname is unknown. This way it appears properly in a report. The name clean doesn't like that and there is no way to tell it to ignore all those names.


This is a really interesting suggestion. I was thinking of "not-a-problem" working on a person by person basis. But this suggestion is really much more global and useful than something that would work on a person by person basis.

I realize that some of the things being mentioned as issues with Name Clean may sound like trifles and whining, but with many thousands of people in a database, some of these trifles can easily render the Name Clean feature pretty worthless.

Jerry

#31 zhangrau

zhangrau

    Advanced Member

  • Members
  • 1588 posts

Posted 16 May 2013 - 04:54 PM

When using Tools | Merge | Duplicate Search Merge, there is a separate *.DUP file maintained to keep track of the USER'S decision that a pair of individuals should not be merged, and then they are never offered again by RM as a merge recommendation. A similar technique could be used by the Name Clean feature to allow the USER to have significant control over how the Name Clean feature operates over multiple instances.
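A minimal sketch of what such a remembered-decisions file might look like for Name Clean follows. The JSON format, the file name, and the class are all hypothetical; nothing here reflects the real *.DUP file layout, which I have not inspected.

```python
import json
from pathlib import Path


class SkipList:
    """Person ids the user has marked "not a problem", persisted to a side file."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self._ids = set(json.loads(self.path.read_text(encoding="utf-8")))
        else:
            self._ids = set()

    def skip_permanently(self, person_id):
        """Record the decision and write it to disk straight away."""
        self._ids.add(person_id)
        self.path.write_text(json.dumps(sorted(self._ids)), encoding="utf-8")

    def __contains__(self, person_id):
        return person_id in self._ids


def pending(person_ids, skip_list):
    """Return the ids that still need to be offered to the user, in order."""
    return [pid for pid in person_ids if pid not in skip_list]
```

Because the decisions live in a file next to the database, a later session can reload them and resume without offering the same names again.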

#32 Vyger

Vyger

    Advanced Member

  • Members
  • 3546 posts

Posted 17 May 2013 - 04:31 AM

When using Tools | Merge | Duplicate Search Merge, there is a separate *.DUP file maintained to keep track of the USER'S decision that a pair of individuals should not be merged, and then they are never offered again by RM as a merge recommendation. A similar technique could be used by the Name Clean feature to allow the USER to have significant control over how the Name Clean feature operates over multiple instances.


Agreed, there needs to be some form of learning and resuming with this feature, and I would imagine there would be in the release version.



#33 fredandrana

fredandrana

    Member

  • Members
  • 28 posts

Posted 29 June 2013 - 08:38 PM

Perhaps I am posting this in the wrong place, but a way to delete duplicate facts would be nice, such as duplicate birth entries or death entries. Especially for after you run the dataclean.

#34 Vyger

Vyger

    Advanced Member

  • Members
  • 3546 posts

Posted 30 June 2013 - 12:46 PM

Perhaps I am posting this in the wrong place, but a way to delete duplicate facts would be nice, such as duplicate birth entries or death entries. Especially for after you run the dataclean.


I would say you are posting in the right place as that is a very big dataclean issue and one which has come up many times over the years.

The one recognized problem is that facts which are truly identical in respect of sources, media, notes, dates, etc. are very rare, so IMO some suggestive prompt to show what RM thinks are duplicates would be good, and then let the user decide.

Under Lists > Fact Type > Print, the drop down does allow you to print a list of those with more than one instance of the selected fact. Personally, I would like to see this either expanded to a find criterion or made to produce a hyperlinked report like the Find Everywhere report.



#35 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • 3974 posts

Posted 30 June 2013 - 01:56 PM

The one recognized problem is that facts which are truly identical in respect of sources, media, notes, dates, etc. are very rare, so IMO some suggestive prompt to show what RM thinks are duplicates would be good, and then let the user decide.


Totally agree. It's hard to imagine any scenario under which RM could be expected to decide reliably which of two duplicate facts to keep. What RM needs to do instead of some sort of smart fact merging is to facilitate user decision making about duplicate facts.

The one exception to this idea is where you export some number of your people from RM to some software or process outside of RM, do some updating there, and then import the people back into RM. In this case you should have the option of having the newly imported people replace the people already in your database. The newly imported people are, after all, people who started out in your database. And this would only be an option. The older merge option would still exist, hopefully reinforced with some RM-facilitated way to resolve issues with duplicate facts.

Jerry

#36 fredandrana

fredandrana

    Member

  • Members
  • 28 posts

Posted 30 June 2013 - 07:29 PM

Vyger....thank you very much! That little tip helps considerably!

As for the duplicate facts, I think it would be beneficial and fairly simple to implement a search that pops up duplicate facts, such as multiple birth facts for one person, and then lets the user choose which one to keep. For example, I merged my GEDCOM with one I had created on another site. If the places or dates were different, it saved a birth fact for each instance of difference. When I ran PlaceClean, several of those became identical. So, I had:


Bohannon, Alvah
birth September 11, 1916
birth September 11, 1916 Alburg, Grand Isle, Vermont, United States
birth September 11, 1916 Alburg, Grand Isle, Vermont, United States

I would dearly love to see an option that would allow me to either delete identical duplicate records, or pop up all matching fact records and allow me to either delete or merge into a single record. Maybe something similar to the people merge.
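A search like that could be sketched roughly as below. This is only an illustration under my own assumptions: facts are plain dictionaries with hypothetical `person`, `type`, `date` and `place` keys, and sources, notes and media are not compared at all, so the output is a list of candidates for the user to review rather than something safe to delete automatically.

```python
from collections import defaultdict


def find_duplicate_facts(facts):
    """Group facts by (person, type, date, place).

    Returns only the groups with more than one fact, mapping each key to the
    list positions of the matching facts.
    """
    groups = defaultdict(list)
    for index, fact in enumerate(facts):
        key = (fact["person"], fact["type"], fact["date"], fact.get("place", ""))
        groups[key].append(index)
    return {key: indexes for key, indexes in groups.items() if len(indexes) > 1}


# The three birth facts from the example above.
facts = [
    {"person": "Bohannon, Alvah", "type": "birth", "date": "September 11, 1916"},
    {"person": "Bohannon, Alvah", "type": "birth", "date": "September 11, 1916",
     "place": "Alburg, Grand Isle, Vermont, United States"},
    {"person": "Bohannon, Alvah", "type": "birth", "date": "September 11, 1916",
     "place": "Alburg, Grand Isle, Vermont, United States"},
]
candidates = find_duplicate_facts(facts)
```

Here `candidates` holds a single group pointing at the second and third facts; the first fact has no place, so it is left for the user to judge separately.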

#37 Vyger

Vyger

    Advanced Member

  • Members
  • 3546 posts

Posted 06 July 2013 - 07:12 AM

I produced a mockup several years ago for an enhanced DSM screen which was more informative regarding sources, notes and media, with check boxes which allowed facts to be dropped or included at that point. My experience has shown me that many of my duplicate facts resulted from merging, where the lack of this important decision-making information made me err on the side of caution, and where, being lazy or trying to speed on, I did not edit the merged individual at the time.

Apart from DSM, if RM had a facility to highlight suspected duplicate facts, some similar UI would work well.

Posted Image



#38 Renee Zamora

Renee Zamora

    Advanced Member

  • Admin
  • 8768 posts

Posted 11 July 2013 - 11:43 AM

I find that a lot of duplicate facts happen while merging because the place list was not cleaned before the merge. Even if you have two places with exactly the same name in your database, the merge will look at them as two separate places. You need to merge your duplicate places.

I also want a tool in DataClean to automatically merge those duplicates for me. But in the meantime, on a recent project helping someone merge 19 PAF files into one RM database, I used DataClean to clean up the places. Then I dragged and dropped the database into a new blank database and all the duplicate places merged for me. Since they didn't have any RM-specific items used in the database, this was an easy fix. I would just make sure that any GEDCOM I wanted to import into RM was cleaned up as much as possible before doing any importing or merging into my main RM database.
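For anyone curious what merging identically named places amounts to, here is a minimal sketch. It is an illustration only and says nothing about RM's internals: the `places` mapping, the `place_id` key on facts, and the rule that the lowest id survives are all assumptions made for the example.

```python
def merge_duplicate_places(places, facts):
    """Collapse places that share exactly the same name.

    places maps a place id to its name; facts is a list of dicts that each
    carry a "place_id". The lowest id for each name survives, and every fact
    is re-pointed at the survivor. Returns (merged_places, remapped_facts).
    """
    survivor_by_name = {}  # place name -> surviving place id
    remap = {}             # every place id -> surviving place id
    for place_id in sorted(places):
        name = places[place_id]
        remap[place_id] = survivor_by_name.setdefault(name, place_id)

    merged_places = {pid: name for pid, name in places.items() if remap[pid] == pid}
    remapped_facts = [{**fact, "place_id": remap[fact["place_id"]]} for fact in facts]
    return merged_places, remapped_facts
```

Once the facts point at a single place record, two births with the same date and place become genuinely identical and are much easier to spot.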
Renee
RootsMagic

#39 Vyger

Vyger

    Advanced Member

  • Members
  • 3546 posts

Posted 11 July 2013 - 03:07 PM

Renee, there is still an issue with Sources becoming duplicated and even though identical they will not merge within RM. I know this as I now have 5 instances of a Master Source which was created through a programmed solution in a gedcom file and each instance is the same.

Anything which the Rootsmagician can pull out to prevent such duplication in the future, and perhaps to score Places on close-match terms so that they are highlighted where maybe only punctuation differs, would be a big step towards data cleaning needs.

You know I have big hopes in this area: whilst RM has a good set of Place-related tools, reconciling and managing Places and Place Details within RM is far from pleasurable and easy.
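A close-match score of the kind suggested above could be sketched as follows, using Python's standard `difflib`. The normalization rules (lower-casing, dropping punctuation, collapsing spaces) and any cut-off value for what counts as a likely duplicate are my own assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_place(name):
    """Lower-case the name, replace punctuation with spaces, collapse whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(without_punctuation.split())


def place_score(first, second):
    """Similarity between 0.0 and 1.0 after normalization.

    Names that differ only in punctuation, spacing or letter case score 1.0.
    """
    return SequenceMatcher(None, normalize_place(first), normalize_place(second)).ratio()
```

Pairs scoring 1.0 differ only in punctuation, spacing or case; pairs scoring a little lower could be shown to the user as possible matches rather than merged automatically.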



#40 Renee Zamora

Renee Zamora

    Advanced Member

  • Admin
  • 8768 posts

Posted 12 July 2013 - 09:45 AM

I have found duplicate sources, but mine have spaces or commas that cause them not to merge during Automatic Merge SourceMerge. I know we have it noted in the tracking system that sources are duplicated. So I am watching for them, but I just haven't found it to be entirely true in the PAF-files-to-RM database I am working on.
Renee
RootsMagic