Find Everywhere Search
Posted 25 November 2012 - 01:07 PM
Posted 25 November 2012 - 01:34 PM
I have a lot of Photos attached to my RM database and when I do a find everywhere search it takes a long time (4 min and 10 sec). The last 4 min was searching Multimedia. If you do not need the multimedia searched once the search gets to the multimedia cancel and the information found up to that point will be displayed.
My comment probably belongs more under Wish List than under Tips and Hints, but I will post here for now.
I love Find Everywhere and I think it's a great new feature. My only complaint is that it's almost too good. Which is to say, it sometimes finds too much. I'm not so worried about how long it takes as I am about how many matches there can be. The list can easily be so long as to be extremely unwieldy and therefore almost unusable. So I almost want to wish for some filtering cababilities. But one of the huge advantages of Find Everywhere is its great power in finding things combined with the extreme ease with which a search can be set up. I'm afraid that if extensive filtering cababilities were to be included, it would no longer be easy to use.
I do realize that some fairly extensive Boolean logic can be set up to "sort of filter" the search to keep the results set manageable. So by filtering, I'm talking about maybe finding only within people or finding only within families, etc. And within people, maybe being able to find only within a Named Group or within a specified color coding or something like that. Etc.
Maybe something like the following would be useful. Maybe it could give you a list of how many people matches there are, now many family matches there are, how many events matches there are, etc., and then you could click any one or more of the summaries to expand and see the matches for that category. This way, the results set wouldn't actually be any smaller, but it might be easier to navigate.
I don't really don't know what to suggest to deal with enormously huge (but valid) results sets from Find Everywhere searches. I'm afraid that if I wish for something, I might get it.
In any case, aside from the ease of use and the great power of the search, the really best thing about Find Everywhere is that it produces what some users in these forums have called a "live report". That means that you can click on an item in the results set and go directly to edit whatever needs to be edited. You don't have to save the results to a txt file or an rtf file or anything like that and then go back to RM to do your editing. The fact that Find Everywhere produces a live report is a huge, huge, huge improvement. I hope it portends more live reports in the future.
Posted 27 November 2012 - 04:47 AM
The fact that Find Everywhere produces a live report is a huge, huge, huge improvement. I hope it portends more live reports in the future.
Ditto... - and good tip from Bill
"Never, for the sake of peace and quiet, deny your own experience or convictions"
Current user of Rootsmagic version 126.96.36.199, Family Tree Maker 2014 and Legacy 7.5 on Win 10
Posted 27 November 2012 - 09:23 AM
Posted 28 November 2012 - 02:28 AM
Posted 28 November 2012 - 05:23 AM
Bill, how many photos are attached? It's crazy that 80% of the search time is spent on Multimedia. Something's wrong.
It's likely... a "little bit" of *this* below ;-)
I have over 10,000 photos attached to my RM database.
There's probably some unnecessary extra processing time being used up to read through 10,000 thumbnail BLOBs as part of accessing column fields in each row that are beyond the BLOB column. I've just had a look at RM's database schema for the first time and see that the MultimediaTable schema currently places the column for Thumbnail BLOBs (for this particular problem) squarely in the middle of the table's other columns. I suspect this is SUB-suboptimal:
When I was designing the SQLite file format, I made the assumption
that BLOBs would be used infrequently and would not be very big.
The file format design is not optimized for storing BLOBs. Indeed,
BLOBs are stored as a singly-linked list of database pages. It
is hard to imagine a more inefficient design.
Much to my surprise, people begin putting multi-megabyte BLOBs
in SQLite databases and reporting that performance really was not
an issue. I have lately taken up this practice myself and routinely
uses SQLite database with BLOBs that are over 10MiB is size. And
it all seems to work pretty well here on my Linux workstation. I
have no explanation for why it works so well, but it does so I'm not
going to complain.
If your images are only 30KiB, you should have no problems.
Here's a hint though - make the BLOB columns the last column in
your tables. Or even store the BLOBs in a separate table which
only has two columns: an integer primary key and the blob itself,
and then access the BLOB content using a join if you need to.
If you put various small integer fields after the BLOB, then
SQLite has to scan through the entire BLOB content (following
the linked list of disk pages) to get to the integer fields at
the end, and that definitely can slow you down.
When it comes to non-trivial sized blobs, here are some best practises:
- Store as the last column in a table, or in a separate table. This is so that SQLite doesn't have to skip over the blob in order to get other values in a row (last column) and so that a read will get multiple rows at once (separate table)
- Make sure your queries explicitly mention which columns are wanted otherwise you'll get the blob every time which consumes more memory and CPU
- Bump up the page size of your database. The default of 1kb isn't that good a fit
- While SQLite will always work with blobs (up to 2GB each), you can also consider storing larger ones as files and then store the filename within SQLite
- Use the incremental blob api for random access to a blob's contents
- Use zeroblob on row creation to ensure that space is allocated for a blob correctly. You may even add extra padding if items grow later so that the row doesn't have to be rewritten on updating blob contents
- A vacuum every now and then after periods of writes will help
--- "GENEALOGY, n. An account of one's descent from an ancestor who did not particularly care to trace his own." - Ambrose Bierce
--- "The trouble ain't what people don't know, it's what they know that ain't so." - Josh Billings
K e V i N
Posted 28 November 2012 - 10:15 AM
As for the effect of the thumbnail images on the speed of searching other fields in the same table, there's nothing the user can do about it except for the last statement: "A vacuum every now and then..." - that's SQLite parlance for the RM Database Tools > Compact Database function.
2012-12-08: the 100 minute duration for Rebuild Indexes must have been an anomaly, perhaps due to some background task (backup?) interfering severely with RM's disk operations. Recent runs have clocked in around 140 seconds. Likewise, there must have been some impact on the SQLiteSpy ops; my Update Birth/Death Years query hit 25 seconds instead of 4 minutes. Strangely, though, SQLite REINDEX is pretty much the same as it was when the long numbers for the other ops were recorded.
Edited by TomH, 08 December 2012 - 03:40 PM.
Posted 28 November 2012 - 04:31 PM
This was clearly taking forever so I did a SQLiteSpy SELECT for a match on the Description field which is a text column 'to the right' of the Thumbnail BLOB column. This search executes in milliseconds.
The very few BLOBS and the speed of SQLiteSpy suggest to me that the RM procedure for Find Everywhere is poorly designed for speed in working through Multimedia.
Additionally, cancelling Find Everywhere before it finishes leaves the RM SQLite engine with unfinished SQL statements such that, at the very least, the Compact Database tool cannot execute, returning SQLite Error 1 - cannot VACUUM - SQL statements in progress. This condition should affect other procedures which will return a similar message. It is necessary to close the database and reopen it to clear the condition.
Whether 100, 600 or 10,000 images, there is something really wrong with Find Everywhere. Maybe it has something to do with the number of links. On another database with 168 media items and only 177 links, it finishes the search quite smartly.
Posted 30 November 2012 - 09:26 AM
Posted 08 December 2012 - 04:14 PM
However, I see another issue with handling large result sets. I think it hits a limit and can go no further but there is no message. On a large database of 162,000 people, it completes a search for a string that contains a surname for which there are 133 people in the database in about 7 minutes. But a similar search for one with 54,000 people in the database did not get out of the People part of the search after several hours. It would be helpful that there be a warning message that the result set has hit the limit - please try to narrow your search.
The other good news is that Cancel does show the partial result set and now closes RM's SQLite connection to the database so that RootsMagic can proceed with other database operations without itself being closed and reopened.