Logic error in Select People by Data Groups


10 replies to this topic

#1 JimWalton

JimWalton

    Advanced Member

  • Members
  • 34 posts

Posted 04 June 2013 - 12:11 PM

I created a custom report and got some bizarre output, so I created a group that was supposed to contain just the people I wanted. Here is my logic statement:
Marriage Place contains California AND Marriage Date is after 1948 AND Marriage Date is before 1960.
This should only output people who were married in California from 1949 through 1959. I got a list of 21 names, 11 of whom failed at least one of the logic statements, and 4 failed two.
My first person was married in 1916 in Nebraska. There is nothing in her record that associates marriage with California, and while 1916 is certainly before 1960, it fails the AND portion that it must also be after 1948. Five of the items failed the California test and 10 failed the date test. I found others that passed but were not on the list.

#2 c24m48

c24m48

    Advanced Member

  • Members
  • 2612 posts

Posted 04 June 2013 - 12:22 PM

People with multiple marriage events can end up doing strange things to your search results, but you indicate that this isn't the problem in this particular case. Are you sure you were creating a new group as opposed to editing an existing group which already contained a few people?

One thing you might try (although it should be totally unnecessary) would be to do an "unmark all" at the beginning of your selection process for group membership before you do your marking for California and after 1948 and before 1960.

Another thing to look at is that when you are doing your "mark" operation, it will tell you how many people it marked. Does the number of people it says it marked correspond to the number of people in the group?

Also, did you actually create a Named Group, or did you do your "marking" totally from within the dialog to set up the custom report? The process of marking and unmarking is the same, and you don't have to make a Named Group if you are willing to do your selection from within the dialog to set up the custom report. But if you actually created a Named Group then it's easy to look at all the individuals who are in the Named Group to see who they are. It sounds like you may be figuring out who the people were based on the output from your custom report rather than by looking at an actual Named Group.

Jerry

#3 TomH

TomH

    Advanced Member

  • Members
  • 5464 posts

Posted 04 June 2013 - 01:17 PM

Beware that RootsMagic does not do Boolean logic on the event but on all the criteria, e.g., if there are three Marriage events and one is in "California" with no date, another is anywhere but California and before 1948 and another is anywhere but California and after 1960, the combined criteria have been satisfied. The first event satisfies only the first criterion, the second event passes only the third criterion, and the third event scores on only the second - in RM Explorer's perverse logic, the Person passes the test because all the ANDed criteria have been satisfied by one or more events. Only if there were a single Marriage event would RM's reasoning correspond to yours!

The more criteria you have, the weirder the results can get. You could reduce your event date criteria from two to one by using a date range, e.g., "Between 1948 and 1960". This does not prevent a person with a California Marriage outside the range and a non-California Marriage inside the range from passing the test, but the result set should be smaller: the example in the previous paragraph will fail because all three events fail the date criterion.
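
A minimal Python sketch of the difference, for anyone who wants to see it spelled out. It is not RootsMagic code: the `Marriage` class, the predicate functions and the sample events are all made up for illustration, and treating "Between 1948 and 1960" as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Marriage:
    year: Optional[int]  # None means no date recorded
    place: str


# Hypothetical stand-ins for the search criteria discussed above.
def in_california(m):
    return "California" in m.place

def after_1948(m):
    return m.year is not None and m.year > 1948

def before_1960(m):
    return m.year is not None and m.year < 1960

def between_1948_and_1960(m):
    # Assumption: the range is inclusive at both ends.
    return m.year is not None and 1948 <= m.year <= 1960


def person_level(events, criteria):
    """Each criterion may be satisfied by a different event."""
    return all(any(c(e) for e in events) for c in criteria)


def same_event(events, criteria):
    """One single event must satisfy every criterion."""
    return any(all(c(e) for c in criteria) for e in events)


# The three-marriage example: California with no date, elsewhere before 1948,
# elsewhere after 1960.
events = [
    Marriage(None, "Los Angeles, California"),
    Marriage(1940, "Omaha, Nebraska"),
    Marriage(1965, "Dallas, Texas"),
]
# A California marriage outside the range plus a non-California one inside it.
mixed = [
    Marriage(1970, "San Diego, California"),
    Marriage(1955, "Dallas, Texas"),
]

three = [in_california, after_1948, before_1960]
two = [in_california, between_1948_and_1960]

print(person_level(events, three))  # True: each criterion hits a different event
print(same_event(events, three))    # False: no single event passes all three
print(person_level(events, two))    # False: no event passes the date range
print(person_level(mixed, two))     # True: the exception that remains
```

The first two lines of output show the mismatch between what the search does and what was intended; the last two show why the date range shrinks the result set without removing every false hit.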

However, your description suggests something else is also contributing to the unexpected results - i.e., returning Persons on the list with no Marriage event in California. Your first criterion should have failed. Did you actually use the string "California" in your filter or was it "CA"?

Tom user of RM7230 FTM2017 Ancestry.ca FamilySearch.org FindMyPast.com
SQLite_Tools_For_Roots_Magic_in_PR_Celti wiki, exploiting the database in special ways >>> Rmtrix_tiny.png app, a growing bundle of RootsMagic utilities.


#4 Laura

Laura

    Advanced Member

  • Members
  • 4276 posts

Posted 04 June 2013 - 01:17 PM

I used Texas as I don't have many California marriages in those date ranges.

Mark
Marriage exists is True
AND Marriage place contains Texas

Unmark
Marriage date is blank
OR Marriage date is before 1948
OR Marriage date is after 1960

I could also add AND Sex is Female/AND Sex is Male to the Mark criteria so I would not have both spouses in the Group.

I only had one person who married more than once. One marriage was in Texas in 1951 and the other marriage had no date or place. The person was in the Group because they had a marriage with a date and place which fit the search criteria.

I don't quite know how to explain how I set up search filters, but I think about what I want.

In this case, I wanted Marriages in Texas so I used Mark to get that.

I didn't want marriages with no date or marriages before 1948 or after 1960 so I used Unmark to fine tune the search.

#5 JimWalton

Posted 06 June 2013 - 10:09 AM

I've been doing computer database work since the 1970s when I was teaching Boolean math and logic, so my logic should work. The only explanation that makes sense is that the RootsMagic Boolean search does not look at a single event, but lumps all events of the same type together as was explained by TomH. Even this, however, does not answer the questions.

Here again are my search criteria: Marriage after 1948 AND before 1960 AND in California. I always spell out the full state name.

My grandmother was married in 1916 in Nebraska, then again in 1967 in California. 1916 is not after 1948, but it is before 1960. However, since this is an AND function, both must be true so the 1916 date fails. 1967 fails for the same reason. California is the only criterion that passes. The logic says that if any one item fails, the total statement fails. Unfortunately there is not an option to search on dates between two dates as TomH suggested, so I couldn't test that, but I see no reason why the results would be any different. True, the first date was before 1960, and the second date was after 1948, but that is the function of the AND statement. It must meet both criteria or it fails.

The logic should do this:
1916 Marriage after 1948, failed; before 1960, passed; NOT California, failed. Statement fails.
1961 Marriage after 1948, passed; before 1960, failed; California, passed. Statement fails.

The other issue is the customized report. My grandmother appears on the report with her 1916 Nebraska marriage, but not her 1961 California marriage. My uncle was married in 1945, place unknown. His second marriage was in 1961 in California. Since neither marriage meets the date criteria, he should not be on the list at all. But on the report, his first marriage appears with no location, but the second does not appear at all.

There is a problem in the implementation. The logic just does not work. If RM is not going to follow the rules of Boolean logic, why do they use the Boolean operators in their search function?

#6 TomH

Posted 06 June 2013 - 11:03 AM

Jim, the way I explained RM's perverse handling of operands in its Boolean logic does account for the results you get:
1 marriage event after 1948 TRUE
AND
1 marriage event before 1960 TRUE
AND
1 marriage event in California TRUE
= TRUE
The only way the outcome can be false is for there to be only one Marriage event that is before 1948 or after 1960 and/or not in California
OR, for multiple marriage events, none of them in California or, if one or more is, all of them before 1948 or all of them after 1960.

Yes, you can search between two dates in a single criterion using ...event > Date > Equals > "Between 1948 and 1960". With "equals", you can use any of the valid date formats with modifiers such as after | before | by | from ... to ... | ... - ... , etc. I see an odd result with the "or" modifier; it seems to be equivalent to "-" in this use. So change your criteria to:
Marriage event > Place > contains > "California"
AND
Marriage event > Date > equals > "Between 1948 and 1960"
and you will improve your result set but it won't eliminate the exceptions I described earlier. For that, we need an enhancement previously requested to nest Booleans on the same event, the suggestion being to add a selection "Same criterion" in the dropdown list of criteria types that would be enabled for selection when there is an immediately preceding criterion. The choices for this "Same criterion" would match those of the preceding criterion, which itself could be a "Same criterion".

Edited by TomH, 06 June 2013 - 12:38 PM.


#7 TomH

Posted 06 June 2013 - 11:15 AM

Laura's approach also fails to produce the desired result on persons with multiple marriages. Suppose a person has a marriage event that does fit the desired result, i.e., between 1948 and 1960 in California and a second marriage event anywhere outside the desired time range. That person will be Marked and then Unmarked and not show up in the result set when they should. Jim's initial criteria which use Mark only resulted in undesired inclusions; Laura's Mark/Unmark procedure leads to false exclusions. Both would be solved with the "Same criterion" enhancement.
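
A minimal sketch of that false exclusion, treating Mark and Unmark as set operations. The people, years and places are invented for illustration, California stands in for Laura's Texas, and each step is assumed to test "at least one marriage event matches", as described earlier in the thread.

```python
# Each person maps to a list of (year, place) marriage events; year None = no date.
people = {
    "A": [(1955, "California"), (1970, "Nevada")],  # has a qualifying marriage
    "B": [(1955, "California")],                    # has a qualifying marriage
    "C": [(1940, "California")],                    # no qualifying marriage
}


def in_range(year):
    # Assumption: 1948 and 1960 themselves count as inside the range.
    return year is not None and 1948 <= year <= 1960


# Mark: the person has at least one marriage whose place contains "California".
marked = {
    p for p, evs in people.items()
    if any("California" in place for _, place in evs)
}

# Unmark: the person has at least one marriage with a blank or out-of-range date.
unmark = {
    p for p in marked
    if any(not in_range(year) for year, _ in people[p])
}

group = marked - unmark

# What was wanted: one single event that is both in California and in range.
wanted = {
    p for p, evs in people.items()
    if any("California" in place and in_range(year) for year, place in evs)
}

print(sorted(group))   # ['B']
print(sorted(wanted))  # ['A', 'B']
```

Person A is marked for the 1955 California marriage and then unmarked because of the 1970 Nevada one, so A drops out of the group even though one marriage fits exactly.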


#8 c24m48

Posted 06 June 2013 - 12:14 PM

Tom's explanation is right on, and I'm therefore a little loath to explain further. But that usually doesn't stop me. :)

Truly there is a serious design problem in RM's implementation of the Boolean logic (actually, more than one design problem) in RM Explorer. The problem that's causing the strange search results in this case is the design problem associated with multiple facts of the same type. The fact type is Marriage in your case, but the same problem afflicts any fact type.

I thought I would try to create a bare minimum example that exemplifies the design problem.
  • Suppose you have a person with one marriage that was in 1948 in California. Suppose you search for Marriage Date is before 1950 AND Marriage Place contains California. The search will find your person, as well it should. So far, so good.
  • Suppose you have a person with one marriage that was in 1952 in California. Suppose you search for Marriage Date is before 1950 AND Marriage Place contains California. The search will not find your person, as well it should not. Still so far, so good.
  • Suppose you have a person with two marriages. This is where the design problem can manifest itself. Suppose the first marriage was in 1945 in Texas and suppose the second marriage was in 1952 in California. Suppose you search for Marriage Date is before 1950 AND Marriage Place contains California. You are not expecting the search to find the person, but it does. That's because there was a marriage before 1950 (namely the one in 1945 in Texas) AND there was a marriage in California (namely the one in 1952 in California). So the "Boolean" aspect of the logic is actually correct. It's the "design" aspect of the logic that's wrong because each piece of the AND found a different marriage fact.

Here's yet another way to look at the nature of the design problem. First, let's take an even simpler example. Let's just say that you search for Marriage Date is before 1950 and don't even worry about the Marriage Place. The truth is (and this is very subtle), you are not searching for a marriage even though you think you are. You are actually searching for a person. Namely, you are searching for a person who has at least one marriage fact which has a marriage date before 1950. This may seem like a distinction without a difference. But there actually is a huge difference.

So let's restate our original example with an AND into what it really means. As originally stated, the search was for Marriage Date is before 1950 AND Marriage Place contains California. But the search that really happens is the following: search for a person who has at least one Marriage Fact which has a Marriage Date before 1950 AND where the same person also has at least one Marriage Fact which has a Marriage Place which contains California.

I don't know if that makes it more clear or not. Sometimes, extra explanation can make things more confusing rather than less confusing.

From a technical database point of view, the problem is that the underlying query is searching a table in RM called the Person Table. To take facts into account such as Marriage Date and Marriage Place, the Person Table has to be joined to another table called the Event Table which contains things like births, marriages, deaths, etc. But the fact that the tables are joined does not change the fact that the query is searching the Person Table. To do the search the way some of us might like to see it work, the underlying query would have to search the Event Table rather than the Person Table. The Event Table does have to be joined to the Person Table so that the proper person can be displayed after a search is completed, but the query really does need to search in the Event Table rather than in the Person Table. That's one of the several reasons that those of us who write SQLite queries like the SQLite queries so much. We have the freedom and control to search the correct tables that will give us the most meaningful results.
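
A minimal sketch of the two kinds of query, using Python's built-in sqlite3 module. The `person` and `event` tables and their columns here are invented for illustration and are not RootsMagic's actual schema; the data is the three-person example from the bullets above. It shows the semantics only, not how RM itself runs the search.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical schema, not RootsMagic's real tables.
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        type TEXT, year INTEGER, place TEXT
    );
    INSERT INTO person VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
    INSERT INTO event (person_id, type, year, place) VALUES
        (1, 'Marriage', 1948, 'California'),
        (2, 'Marriage', 1952, 'California'),
        (3, 'Marriage', 1945, 'Texas'),
        (3, 'Marriage', 1952, 'California');
""")

# Searching persons: each criterion only needs SOME marriage event to match.
person_query = """
    SELECT p.name FROM person p
    WHERE EXISTS (SELECT 1 FROM event e
                  WHERE e.person_id = p.id AND e.type = 'Marriage'
                    AND e.year < 1950)
      AND EXISTS (SELECT 1 FROM event e
                  WHERE e.person_id = p.id AND e.type = 'Marriage'
                    AND e.place LIKE '%California%')
    ORDER BY p.name
"""

# Searching events: both criteria must hold on the SAME marriage event.
event_query = """
    SELECT DISTINCT p.name
    FROM event e JOIN person p ON p.id = e.person_id
    WHERE e.type = 'Marriage'
      AND e.year < 1950
      AND e.place LIKE '%California%'
    ORDER BY p.name
"""

person_rows = [row[0] for row in con.execute(person_query)]
event_rows = [row[0] for row in con.execute(event_query)]

print(person_rows)  # ['P1', 'P3']  P3 gets in via two different marriages
print(event_rows)   # ['P1']
```

The only difference between the two is where the AND is applied: across separate EXISTS tests on the person, or inside one WHERE clause on a single event row.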

Jerry

#9 TomH

Posted 06 June 2013 - 12:42 PM

Thankfully, Jerry usually explains things better than I do. Also I happened to discover an error in my explanation, corrected above in post #6.


#10 JimWalton

Posted 06 June 2013 - 02:45 PM

Thanks. That does make a perverse sort of sense, but it certainly isn't right. I did try Tom's idea of between and did get better results. Thanks for that tip, Tom. The custom report still only reports the first marriage, which may or may not be the right one, but at least it is closer than it was. Twenty years ago I could write SQL statements in my sleep; today I fall asleep trying to remember how I did it. But I shouldn't have to resort to that with a commercial program. But then when a company creates its own brand of SQL I have to wonder. Maybe TomH should go to work for RM and teach them how to do it right. :-) My main question is why should tools like his even be necessary?

Anyway, off my soapbox, I've got a work-around (a phrase that means when the manufacturer can't do it right find another way to do it) that gives me a more reasonable approach, so I'll content myself with that because I don't have another choice. Thanks again for the help.

#11 TomH

Posted 06 June 2013 - 03:54 PM

To get multiple marriages listed, not just the first, use Reports > Lists > Fact List > People with this fact type > Marriage; People to include > Select from a list OR the Name of the Group if you have already created one. It won't tell you directly who the spouse was but you could save it to a text file, import it into a spreadsheet, and filter on one date at a time or sort by date and place to find the likely partner.

In a way, RM's logic does make some sense for certain kinds of searches (an example escapes me) by being more inclusive (or more exclusive, depending on whether you Mark or Unmark) than what we think the conventional logic should be.

BTW, RM has not invented its own brand of SQL; it uses the open-source SQLite within a Delphi development platform. It's also possible that the search is not even a SQLite query but rather a high level procedure on a memory table populated by a simpler SQLite query.
