
Mass deletion or splitting a huge file in half

RM7 data file editing

7 replies to this topic

#1 jim.everhart@gmail.com

    New Member

  • Members
  • 3 posts

Posted 28 May 2019 - 09:49 AM

I am working with a data file of 1.2 million names, and I need to either delete 300,000 names from it or split it into 2 files. Is there a way to filter entries by a date period, i.e. delete everyone with a birth date prior to, say, 1700? Or to prune the ancestors of a chosen person? The creator of this file died 5 years ago. It was donated to my Historical Society, and the pre-1700 records are not needed by us. Manually choosing people and deleting the families one at a time is painfully slow. Any suggestions?



#2 John_of_Ross_County

    Advanced Member

  • Members
  • 661 posts

Posted 28 May 2019 - 10:51 AM

I would guess that there are many disconnected trees. Does the large size make the program slow? What if you delete individuals at the 1700 time frame, or unlink them, leaving older generations in place? There would be a gap for those after 1700. Then do a GEDCOM export, tree by tree, of the remaining trees later than 1700.

 

Just an idea, have not tried it.  Try the procedure on a copy of your file.



#3 Jerry Bryan

    Advanced Member

  • Members
  • 3521 posts

Posted 28 May 2019 - 11:05 AM

By design and policy, RM does not support any sort of mass delete. What you can do instead is a mass copy.

 

Make a new and empty database. Into that database, drag and drop everyone whose birth date is 1700 or after. That should about do it. The original database will remain unchanged.

 

I would do considerable practice with this before doing it for real. I'm leaving out a few of the details on purpose, because of the need to practice. For example, the best place to start is probably to create a group of everyone who was born in 1700 or after. I would put this group into RM's People View, sorted by birth date, and be sure it is what you want.

 

Just as important would be to double check who is not in the group. This is doable, but is probably not quite as easy as you might like. I would do something like color code everyone red who is in the 1700 or after group. You can easily clear the color coding when you are done. Then make a second group of everyone who is not red. You can't make a group of everyone who is not in the first group, but you can make a group of everyone who is not red. Having made the second group, put them into RM's People View, sorted by birth date, and be sure nobody is being left behind that you want to keep.

 

Having gotten this straight, clear the color and drag and drop your first group into the new and empty database.

 

Jerry



#4 jim.everhart@gmail.com

    New Member

  • Members
  • 3 posts

Posted 28 May 2019 - 01:32 PM

Thanks. The color coding is an excellent idea. I stumbled across a webinar on splitting a large database. My problem is that the former genealogist went way overboard on English and French royalty in his own personal tree, which has less than 1/10 of 1% to do with the families of our members. We are trying to make our online tree more relevant for our members. Yes, speed is an issue. The extra fluff is killing us. His family entered our area about 70 years ago. His original work has 1035 pages of problems according to RootsMagic. I never met the man; all I know is he used PAF-5. RootsMagic seems to be handling this file with no problems. Both FTM and Legacy balk when it hits 900,000 people.



#5 jim.everhart@gmail.com

    New Member

  • Members
  • 3 posts

Posted 28 May 2019 - 04:00 PM

I tried creating a group that included everyone in the database, then tried to remove everyone born before 1700. I get 9, and there are tons more? The choices were born less than 1700?

 

   



#6 Jerry Bryan

    Advanced Member

  • Members
  • 3521 posts

Posted 28 May 2019 - 04:52 PM

Without seeing your database, I can only guess the reasons for what you are seeing. Perhaps the people born before 1700 don't have dates that RM recognizes as valid? Perhaps the people born before 1700 are using a birth fact that is different than the standard RM birth fact or perhaps they don't have a birth fact at all? For example, you might run into somebody that was married in 1650 but who has no birth fact. You won't find such people by searching for a birth fact.

 

To simplify all these possible complications and convolutions, try reversing the logic. For the first group, make your selection criterion be Birth > Date > is after > 1699. This should only pick up people who have a birth fact, whose birth date is valid,  AND whose birth date is after 1699.

 

For the second group (which is the "opposite" group), please do follow the very explicit instructions about color coding the first group RED and making the second group be the NOT RED group. (Any other color would do as well. It doesn't have to be red.)

 

The problem with making the first group the "after 1700" group and the second group the "before 1700" group using the date selection for both groups is that these conditions may not be quite as opposite as it might seem. For example, if anybody has a birth date that RM does not recognize as valid or if anybody does not have a birth fact at all, they will not be picked up as being either "after 1700" or "before 1700". The second group needs truly to be the opposite of the first group, and the only way I know really to be sure of this is via the mediation of the color coding trick. You can't make the second group be the opposite of the first group directly. But you can color code directly from a group and having done so you can create a group from the color coding. That way, you can get two groups that are truly opposite, and everybody will be in one group or the other but not both.
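
To illustrate why the two date tests are not true opposites, here is a minimal Python sketch. It is not RootsMagic code and does not touch an RM database; the names and the toy records are made up, and `None` is assumed to stand for a missing or unrecognized birth date.

```python
# Toy records: (name, birth_year). None stands for a missing birth fact
# or a date the program could not recognize (an assumption for this sketch).
people = [
    ("Ann", 1650),
    ("Ben", 1700),
    ("Cal", 1820),
    ("Dee", None),  # no birth fact at all
    ("Eve", None),  # birth date not recognized as valid
]

everyone = {name for name, _ in people}

# Two independent date tests, like building both groups from date criteria.
after_1699 = {name for name, year in people if year is not None and year > 1699}
before_1700 = {name for name, year in people if year is not None and year < 1700}

# People that neither date test picks up.
missed = everyone - after_1699 - before_1700

# The "color coding" route: mark the first group, then take everyone unmarked.
red = set(after_1699)
not_red = everyone - red

print(sorted(after_1699))   # ['Ben', 'Cal']
print(sorted(before_1700))  # ['Ann']
print(sorted(missed))       # ['Dee', 'Eve']
print(sorted(not_red))      # ['Ann', 'Dee', 'Eve']
```

The two date-based groups leave Dee and Eve out entirely, while the red / not red pair covers everybody exactly once.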

 

The problems you are running into are also why you need to practice a bit with the procedure before the actual drag and drop. Problems like invalid dates or missing birth facts will show up much better the way I'm describing than if you do the date testing as a part of the drag and drop. Also, it's really easy to delete your groups and clear your color coding and to start over again without doing any damage at all to your database.

 

Jerry



#7 TomH

    Advanced Member

  • Members
  • 6213 posts

Posted 28 May 2019 - 08:57 PM

I wonder if an adaptation of my SQLite script Living Flag – Set Globally might help in this process. One might (mis-)use the Living Flag temporarily to divide the people into two groups. Currently, it sets the flag to false for everyone born 105 years before today's year along with any ancestor or spouse of said person. That parameter could be changed to 319 years. It also sets it to false for anyone with the Death fact - that could be bypassed. There are also criteria for dates of other types of events that might be modified. Just a thought...
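
As a rough illustration of the idea only (this is not the actual SQLite script, and it does not use the RootsMagic schema), here is a small Python sketch of the arithmetic and of flagging ancestors. The data, the `flag_old` helper and the variable names are all hypothetical, and the spouse rule is left out for brevity.

```python
# 2019 (the year of this thread) minus the 1700 cutoff gives the 319-year parameter.
CURRENT_YEAR = 2019
CUTOFF_YEAR = 1700
years_back = CURRENT_YEAR - CUTOFF_YEAR  # 319

# Hypothetical toy data: person -> birth year (None = no usable birth date).
birth_year = {"A": 1640, "B": None, "C": 1690, "D": 1725, "E": 1750}
# Hypothetical toy data: child -> list of parents.
parents = {"C": ["A", "B"], "D": ["C"], "E": ["D"]}


def flag_old(birth_year, parents, cutoff):
    """Return people born before `cutoff`, plus all of their ancestors."""
    flagged = {p for p, y in birth_year.items() if y is not None and y < cutoff}
    stack = list(flagged)
    while stack:
        person = stack.pop()
        for parent in parents.get(person, []):
            if parent not in flagged:
                flagged.add(parent)
                stack.append(parent)
    return flagged


print(years_back)                                          # 319
print(sorted(flag_old(birth_year, parents, CUTOFF_YEAR)))  # ['A', 'B', 'C']
```

Note that B has no birth date but is still flagged as a parent of C, which is the kind of person a plain birth date search would miss.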


Tom user of RM7550 FTM2017 Ancestry.ca FamilySearch.org FindMyPast.com
SQLite_Tools_For_Roots_Magic_in_PR_Celti wiki, exploiting the database in special ways >>> RMtrix app, a bundle of RootsMagic utilities.


#8 John_of_Ross_County

    Advanced Member

  • Members
  • 661 posts

Posted 29 May 2019 - 06:53 PM

Try the RM option "Count trees".  

 

This will tell you how many disconnected trees are in your file and how many individuals are in each tree.

 

If you can identify which disconnected tree [or trees] relate to the current families of your members, then you could extract just these individual trees as separate databases.

 

Hopefully, these smaller files would be manageable.

 

On the other hand, there might be so many cross connections, that this would not help.
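
Conceptually, counting disconnected trees is the same as counting connected groups of people, where any family link (parent, child or spouse) connects two people. A minimal Python sketch of that idea follows. It is only an illustration under that assumption, not how RM implements "Count trees", and the `count_trees` function and toy data are made up.

```python
from collections import defaultdict


def count_trees(people, links):
    """Return the sizes of the disconnected trees, largest first.

    `links` is a list of (person, person) pairs for any family connection.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    sizes = []
    for start in people:
        if start in seen:
            continue
        # Walk everything reachable from this person and count it as one tree.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            person = stack.pop()
            size += 1
            for other in neighbours[person]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        sizes.append(size)
    return sorted(sizes, reverse=True)


people = ["A", "B", "C", "D", "E", "F"]
links = [("A", "B"), ("B", "C"), ("D", "E")]
print(count_trees(people, links))  # [3, 2, 1]
```

A single cross connection, such as one marriage between two trees, merges them into one, which is why heavy cross linking would defeat this approach.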