Jump to content


Photo

Sources cleaning


  • Please log in to reply
6 replies to this topic

#1 Kamolga

Kamolga

    Advanced Member

  • Members
  • PipPipPip
  • 69 posts

Posted 12 June 2019 - 06:24 AM

Hi,

 

After importing a gedcom from myheritage, I have about 200 master sources (10k citations) which are websites of MyHeritage.

It is very annoying because when I want to cite a source, I have all those useless master sources to go through.

I would like to use MyHeritage master source and use the name of the website in reference / page number. The problem if I merge the sources is that I lose their content (I need to move part of the master text into reference or Research notes). 

That is why I use SQLite to change the sourceID 1,2,3,4,etc with 150 (the one of MyHeritage), put the website in reference, change quality, etc. Unfortunately, I can not find a way to update the details column, which is much more visual and is included in the footnote -> basically what we would enter manually in Source Details-Page Number on the first tab 'Citation'.

https://ibb.co/VDDdDHF

Do I miss something?


Rootsmagic 7.5.9.0 with a lot of SQL queries (SQLiteSpy) and a bit of Family Historian 6.2 (tree view and map)


#2 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • PipPipPip
  • 3535 posts

Posted 12 June 2019 - 07:07 AM

That data is in the CItationTable rather than in the SourceTable.

 

Jerry



#3 Kamolga

Kamolga

    Advanced Member

  • Members
  • PipPipPip
  • 69 posts

Posted 12 June 2019 - 04:08 PM

It should be since it is different for each citation (SourceTable being the master source list that can be cited). In CitationTable there are Comments, ActualText and RefNumber that would be suitable but none of those field would shows what is written in that space (so we can not write there either)


Rootsmagic 7.5.9.0 with a lot of SQL queries (SQLiteSpy) and a bit of Family Historian 6.2 (tree view and map)


#4 TomH

TomH

    Advanced Member

  • Members
  • PipPipPip
  • 6218 posts

Posted 12 June 2019 - 07:31 PM

It is within xml data in the Fields column of the CitationTable. Iirc, between the tags <Page> and </Page>.

Tom user of RM7550 FTM2017 Ancestry.ca FamilySearch.org FindMyPast.com
SQLite_Tools_For_Roots_Magic_in_PR_Celti wiki, exploiting the database in special ways >>> RMtrix-tiny.png app, a bundle of RootsMagic utilities.


#5 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • PipPipPip
  • 3535 posts

Posted 12 June 2019 - 09:33 PM

To follow up on Tom's comment, it's between the <page> and </page> tags in the XML in the Fields column because you are probably using the Free Form template and "page" is a field in the Free Form template. More generally, if you are using RM's built-in source templates or source templates of your own design, any variable you define in the source template becomes a <"variable name"> and </"variable name"> tag pair in the XML in the Fields column. Some templates other than the Free Form template may also have a "page" variable somewhere, so I could be wrong about the "page" variable coming from the Free Form template.

 

I'm probably missing something obvious, but I always find it awkward to to try to parse RM's Fields data with the SQLite string functions that are available.

 

Jerry



#6 Kamolga

Kamolga

    Advanced Member

  • Members
  • PipPipPip
  • 69 posts

Posted 14 June 2019 - 07:49 AM

Thanks to both of you!!!

 

When I use this

 --View Details reference of the citation
 DROP VIEW IF EXISTS vDetails;
 CREATE TEMP VIEW vDetails AS
 SELECT  C.CitationID
        ,S.Name COLLATE NOCASE AS SourceName
        ,SUBSTR(CAST(C.FIELDS AS TEXT),86, LENGTH(CAST(C.FIELDS AS TEXT))-118) AS Details
  FROM CitationTable C 
  LEFT JOIN SourceTable S 
  USING (SourceID)
  WHERE SourceName IS NOT 'GEDCOM file imported on 22 May 2019'
 ;
 Select * from vDetails

I finally see a 3 columns table with CitationID, Name of source and the 3rd column I called 'Details' where I have this reference I could not find. :) 

Note that I only use free forms and I had to exclude 'GEDCOM file imported on 22 May 2019' source (which was created by RM on importation at my demand) because it was not empty (and should be).

I guess if I used RM Source templates, the query would vary a little (like 68 instead of 86, 100 instead of 116), or I would exclude those templates with WHERE S.TemplateID=0 in my first stage of cleaning

 

From that view I thought "I would like to change the detail reference from CitationID  3 to 'MyNewReference' " and gave it a shot

UPDATE CitationTable
 SET Fields=REPLACE(CAST(FIELDS AS TEXT),SUBSTR(CAST(FIELDS AS TEXT),86, LENGTH(CAST(FIELDS AS TEXT))-118),'MyNewReference')
 WHERE CitationID=3

It worked! Thanks again! :)

 

 


Rootsmagic 7.5.9.0 with a lot of SQL queries (SQLiteSpy) and a bit of Family Historian 6.2 (tree view and map)


#7 Jerry Bryan

Jerry Bryan

    Advanced Member

  • Members
  • PipPipPip
  • 3535 posts

Posted 14 June 2019 - 08:58 AM

Your use of SUBSTR to pseudo-parse the Fields column very creative! I find that any time I have to deal with the Fields column the pseudo-parsing of the XML data to be quite an adventure and nearly always involves SUBSTR one way or another. I wonder if there is a function available somewhere for SQLite that would parse the XML for you.

 

RM's use of XML in the Fields column comes about because Source Templates require the definition of variable names that are not actual column names in the database tables. For a lot of reasons, I suspect that RM's use of XML for the Fields column is cast in stone. Nevertheless, this is one of the very few areas of the RM database design where I wish the developers had done it just a little differently. Basically, the need is to store and use ordered pairs of the form (keyword,value) or keyword=value or something like that. RM's XML does it as <field> <name>keyword</name> <value>value</value> </field> which is a mess to parse unless you have an XML parser handy.

 

The parsing can be avoided entirely if instead you have (probably in another table) a couple of columns called something like Keyword and Value (or since these are Source Templates) something like VariableName and Value. Such a table would have to have lots and lots of rows, which may be the reason the developers didn't go that way. And there would have to be  two such tables, one for the Fields in the CitationTable and one for the Fields in the SourceTable, or else if there were only one such table it would have to have an OwnerType to be able to determine whether to JOIN with the CitationTable or the SourceTable. This other table would be called something like the KeywordTable and would have something like a KeywordID for each row as the ID to JOIN with the CitationTable and the SourceTable. I know it sounds complicated, but I think it would be about 10,000,000 times simpler than parsing the XML. I have seen several commercial products that set their relational databases up in the fashion which I'm describing.

 

Jerry