


RM7 Publisher PDF's

Publisher and PDFs

10 replies to this topic

#1 Rick Landrum

    Advanced Member

  • Members
  • 498 posts

Posted 07 April 2020 - 10:46 AM

As far as I can tell, you cannot include a PDF in a Publisher book. You also cannot include PDFs in scrapbooks. I have many attachments in my tree as PDF files. Is there a way to bring these into a Publisher book, or to add them to a scrapbook?

 

I know you can convert the PDF to a JPEG, and then replace the file in RM7 with the JPEG, but I'm hoping there is a less tedious solution.

 

Thanks

Rick


RickL


#2 Jerry Bryan

    Advanced Member

  • Members
  • 3978 posts

Posted 07 April 2020 - 05:02 PM

No part of RM supports PDFs except as type file, not as type image.

That means you can link PDFs into RM anywhere you can link any other file (including images), and you can view them from within RM, because RM will open them in your system's default PDF viewer, such as Adobe Acrobat. But RM will not print PDFs in narrative reports, in Publisher, as part of a photo gallery, or on a Web page created by RM. As I said, no part of RM supports PDFs except as type file.

 

As a practical matter, I don't capture images as PDFs if I'm the one doing the capturing. And if I have no choice but to have a PDF, I will convert it to something else. I am standardizing on PNG format rather than JPG format because JPG is a lossy format and PNG is not: each time a JPG is edited and re-saved it is recompressed and can lose detail, while a PNG keeps every pixel exactly. I understand that there are lossless JPG editors, but I don't yet understand such editors well enough, or trust them enough, to commit all my images to the idea of lossless JPG editing.
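To make the lossy/lossless distinction concrete, here is a minimal Python sketch (standard library only) that writes a tiny 8-bit grayscale PNG by hand and decodes it again. PNG stores pixels with deflate compression, so the decoded pixels match the originals exactly. The two functions are illustrative assumptions, not a general PNG library: they handle only grayscale, non-interlaced images with no row filtering.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Pack one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def encode_gray_png(pixels: list[list[int]]) -> bytes:
    """Encode rows of 0-255 gray values as a PNG (filter type 0 on every row)."""
    height, width = len(pixels), len(pixels[0])
    raw = b"".join(b"\x00" + bytes(row) for row in pixels)  # 0 = no filter
    # width, height, bit depth 8, color type 0 (gray), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 9)) + _chunk(b"IEND", b""))


def decode_gray_png(data: bytes) -> list[list[int]]:
    """Decode PNGs written by encode_gray_png (8-bit gray, filter 0 only)."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG")
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width, height = struct.unpack(">II", body[:8])
        elif kind == b"IDAT":
            idat += body
        pos += 12 + length  # length field + type + data + CRC
    raw = zlib.decompress(idat)
    stride = width + 1  # one filter byte in front of each row
    rows = []
    for y in range(height):
        line = raw[y * stride:(y + 1) * stride]
        if line[0] != 0:
            raise ValueError("only filter type 0 is handled in this sketch")
        rows.append(list(line[1:]))
    return rows


if __name__ == "__main__":
    image = [[(x * 37 + y * 11) % 256 for x in range(16)] for y in range(8)]
    png = encode_gray_png(image)
    assert decode_gray_png(png) == image  # every pixel survives unchanged
    print(f"{len(png)} bytes, round trip exact")
```

A JPEG encoder, by contrast, quantizes the image data when it saves, so the same round trip would generally not reproduce the original values.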

The one sort of exception is that if what I have to start with is a PDF, I will often keep it around in the same folder as the exported PNG but I will link the PNG into RM and not the PDF. Also, you can sometimes copy and paste text out of a PDF and you cannot copy and paste text out of a PNG or JPG. But even if I copy and paste text out of a PDF and keep the PDF around for safekeeping, it's still the PNG export that I link into RM.
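For anyone following the same practice of keeping the original PDF next to its PNG export but linking only the PNG, a short script can list the PDFs that have not been exported yet. This Python sketch assumes a hypothetical naming convention: `record.pdf` is exported either as `record.png` or, for multi-page PDFs, as `record-1.png`, `record-2.png`, and so on. The function name is also hypothetical; adjust the matching to whatever convention you actually use.

```python
import tempfile
from pathlib import Path


def pdfs_without_png(folder: Path) -> list[Path]:
    """List PDFs in `folder` that have no PNG export beside them.

    Hypothetical convention: record.pdf -> record.png, or
    record-1.png, record-2.png, ... for multi-page PDFs.
    Name matching is case-insensitive.
    """
    files = [p for p in folder.iterdir() if p.is_file()]
    png_stems = {p.stem.lower() for p in files if p.suffix.lower() == ".png"}
    missing = []
    for pdf in sorted(p for p in files if p.suffix.lower() == ".pdf"):
        stem = pdf.stem.lower()
        single = stem in png_stems
        paged = any(s.startswith(stem + "-") and s[len(stem) + 1:].isdigit()
                    for s in png_stems)
        if not (single or paged):
            missing.append(pdf)
    return missing


if __name__ == "__main__":
    # Demo in a throwaway folder; point pdfs_without_png at a real folder instead.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ["deed.pdf", "deed.png", "census.pdf", "census-1.png",
                     "will.pdf"]:
            (folder / name).write_bytes(b"")
        print([p.name for p in pdfs_without_png(folder)])  # ['will.pdf']
```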

It's worth remembering sometimes that formats such as JPG, PNG, GIF, and TIF are image formats, but PDF is a text format. PDF is a very rich text format, and a PDF file can and often does include embedded images. That doesn't make PDF an image format, even when there is no text in the PDF file and all it contains is a single embedded image. Such files produce the illusion that PDF is an image format, but it isn't.

None of this means that RM8 would never support PDFs in the same manner in which it supports image formats. It's not impossible, and maybe RM8 will do so (or RM9 or RM10 or some such?).

 

Jerry

 



#3 TomH

    Advanced Member

  • Members
  • 6444 posts

Posted 07 April 2020 - 09:51 PM

I'm not really into the technicalities, but I'm wondering if it's accurate to say PDF is a text format. A document format, yes, one which can include text, vector graphics, and raster images. And the raster images may be stored in streams that are compliant or compatible with JPEG, PNG, or TIFF. That doesn't solve anything, though. Combining a PDF with an RM book currently has to be done outside RM, in something like Acrobat or maybe Word.
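As a rough illustration of the point about image streams, this Python sketch scans an unencrypted PDF for image objects and reports their compression filters. In PDF terms, `DCTDecode` data is JPEG, `FlateDecode` uses the same deflate compression as PNG, and `CCITTFaxDecode` is the fax compression also used in TIFF. The function and table names are hypothetical, and the sketch relies on simple regular expressions rather than a real PDF parser, so it can be fooled, for example, by binary stream data that happens to contain the word `endobj`.

```python
import re
import sys

# PDF image filters and the standalone formats they correspond to.
FILTER_FORMATS = {
    "DCTDecode": "JPEG",
    "JPXDecode": "JPEG 2000",
    "FlateDecode": "deflate, the same compression PNG uses",
    "CCITTFaxDecode": "CCITT fax compression, as also used in TIFF",
    "JBIG2Decode": "JBIG2",
}

# "N G obj ... endobj"; heuristic, since binary stream data could contain "endobj".
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)


def image_filters(pdf: bytes) -> list[str]:
    """Return the filter chain of each image object, e.g. 'DCTDecode'."""
    found = []
    for match in OBJ_RE.finditer(pdf):
        # Inspect only the dictionary, i.e. what precedes the stream data.
        header = match.group(1).split(b"stream", 1)[0]
        if not re.search(rb"/Subtype\s*/Image\b", header):
            continue
        spec = re.search(rb"/Filter\s*(\[[^\]]*\]|/\w+)", header)
        names = re.findall(rb"/(\w+)", spec.group(1)) if spec else []
        found.append(" + ".join(n.decode("ascii") for n in names) or "none")
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for i, chain in enumerate(image_filters(f.read()), start=1):
            # The last filter listed is the innermost encoding of the pixels.
            inner = chain.split(" + ")[-1]
            kind = FILTER_FORMATS.get(inner, "uncompressed or other")
            print(f"image {i}: {chain} ({kind})")
```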


Tom user of RM7630 FTM2017 Ancestry.ca FamilySearch.org FindMyPast.com
SQLite Tools for RootsMagic wiki, exploiting the database in special ways >>> RMtrix app, a bundle of RootsMagic utilities.


#4 Renee Zamora

    Advanced Member

  • Admin
  • 8780 posts

Posted 08 April 2020 - 07:12 AM

You can add blank pages in Publisher and, after generating the book, edit it to insert the PDF pages.


Renee
RootsMagic

#5 zhangrau

    Advanced Member

  • Members
  • 1591 posts

Posted 08 April 2020 - 07:27 AM

I'd add this to Renee's suggestion: insert TEXT pages with appropriate titles, so that those titles will be included in the Table of Contents.



#6 Jerry Bryan

    Advanced Member

  • Members
  • 3978 posts

Posted 08 April 2020 - 07:34 AM

TomH wrote: "Not really into the technicalities but I'm wondering if it's accurate to say PDF is a text format."

 

I suppose it depends on what you mean by text format. Let's put it this way. You can open a PDF file with Notepad and, at least in a simple, uncompressed PDF, see and interpret all the formatting codes. They are standard ASCII characters (well, probably UTF-8 characters these days). You can also see where there are embedded images, because the embedded images look like a stream of Greek in Notepad. The "stream of Greek" is actually pure binary data that encodes the images. In the earliest days of PDF files, techie people would sometimes create simple PDF files by hand using text editors like Notepad. I was such a person who created PDF files by hand. You obviously couldn't create embedded images with a simple text editor, but you could create the text and control portions of a PDF file by hand.
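In the spirit of building a PDF by hand, the following Python sketch assembles a one-page PDF from plain-text pieces. The object layout (catalog, page tree, page, content stream, font) follows the basic structure of a PDF file; the only computed parts are the byte offsets recorded in the xref table. It writes the page content uncompressed, which is why everything stays legible in Notepad; many PDF writers compress content streams, and then even the page text becomes "stream of Greek". The function name `minimal_pdf` is hypothetical, and the sketch handles ASCII text only.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF by hand; every byte of it is plain ASCII text."""
    # Escape the characters that are special inside PDF literal strings.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    content = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
        b" /Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # the xref table records where each object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0 is always the head of the free list
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # fixed-width 20-byte entries
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


if __name__ == "__main__":
    # Every line prints legibly; save the bytes as a .pdf file to view the page.
    print(minimal_pdf("Hello from a hand-built PDF").decode("ascii"))
```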

An HTML file is also a text format. You can open an HTML file with Notepad and see and interpret all the formatting codes. But by contrast with PDF, there are no embedded images. Instead, there are links to the images. Therefore, you will see no "streams of Greek" if you open an HTML file with Notepad.
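The difference is easy to see programmatically. This Python sketch uses the standard library's `html.parser` to list the image references in an HTML page; what it finds are file names or URLs, not image data. The class name, function name, sample markup, and file name are made up for illustration.

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        if tag == "img":
            self.sources.extend(value for name, value in attrs
                                if name == "src" and value)


def image_links(html: str) -> list[str]:
    """Return the image references (links, not pixels) found in the HTML."""
    parser = ImageLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


if __name__ == "__main__":
    page = ('<h2>Marriage record</h2>'
            '<img src="media/marriage-license.png" alt="License">')
    print(image_links(page))  # ['media/marriage-license.png']
```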

Other examples abound, and I suppose there are some gray areas about what is and what is not a text file. You can open a JPG, PNG, TIF, or GIF file with Notepad or any simple text editor. You will see a "stream of Greek" and no meaningful text, because the encoding is all binary. A really smart text editor might be able to interpret some of the formatting codes for you, or at least show you the "stream of Greek" in hex. But what you have is a bunch of binary codes which encode the pixels, and there is no text.
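Even binary files usually begin with a short, recognizable signature (a "magic number"), which is one way a smart editor or file tool can tell formats apart. The Python sketch below checks the first bytes of a file against well-known signatures for the formats mentioned in this thread. The function names are hypothetical, and it is a quick heuristic rather than a validator: for example, some PDF readers tolerate extra bytes before `%PDF-`, which this check would miss.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of formats discussed in this thread.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"{\\rtf", "RTF"),
    (b"PK\x03\x04", "ZIP container (e.g. DOCX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy DOC)"),
]


def sniff_bytes(head: bytes) -> str:
    """Name the format whose signature the given leading bytes start with."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read just the first few bytes of a file and identify its format."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(f"{name}: {sniff_file(Path(name))}")
```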

 

Where you might get into the gray areas is with some rich text file formats. An RTF file, which works fine with Microsoft Word, is perfectly legible in Notepad because all of its text and formatting codes are ASCII characters, and all the formatting codes make sense to a human reader. Except that an RTF file is like a PDF file in its ability to have embedded images. These embedded images look like a "stream of Greek" in Notepad: they are usually written as long runs of hexadecimal digits encoding the pixels or the vectors, which no human reader can interpret.
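How RTF typically embeds a picture can be sketched in a few lines of Python: the image bytes are written as hexadecimal digits inside a `{\pict ...}` group, so the file remains plain characters even though those characters mean nothing to a human reader. The function below is a hypothetical illustration of that layout (PNG data marked with `\pngblip`), not a validated RTF writer; in particular, it sets only the basic `\picw`/`\pich` size keywords.

```python
def _rtf_escape(text: str) -> str:
    """Escape the three characters that are special in RTF text."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def rtf_with_picture(caption: str, png: bytes, width_px: int, height_px: int) -> str:
    """Return a small RTF document whose picture data is written as hex digits."""
    digits = png.hex()
    # Break the hex digits into lines of 64 characters; RTF readers ignore newlines.
    hex_lines = "\n".join(digits[i:i + 64] for i in range(0, len(digits), 64))
    return (
        "{\\rtf1\\ansi\\deff0 {\\fonttbl {\\f0 Times New Roman;}}\n"
        f"\\f0\\fs24 {_rtf_escape(caption)}\\par\n"
        f"{{\\pict\\pngblip\\picw{width_px}\\pich{height_px}\n{hex_lines}\n}}\n"
        "}"
    )


if __name__ == "__main__":
    # Only the 8-byte PNG signature stands in for real image data here.
    print(rtf_with_picture("Deed, page 1", b"\x89PNG\r\n\x1a\n", 1, 1))
```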

 

DOC and DOCX files (native Microsoft Word files) are a little tricky. I definitely think of them as text files, albeit as rich text files. DOC files are an older Microsoft Word format. The text is often legible in Notepad, because it is stored as ordinary character codes, but the formatting codes are binary and look like "streams of Greek" in Notepad. DOCX files are the current Microsoft Word format, and the entirety of a DOCX file looks like streams of Greek in Notepad. That is not a Unicode problem: UTF-8 can encode every Unicode character, including Greek (real Greek!), Russian, Arabic, Hebrew, Chinese, and Japanese. Rather, a DOCX file is a ZIP archive of XML files, so both the text and the formatting codes are compressed. If you unzip a DOCX file, the main text and its formatting codes are in word/document.xml, which is plain XML that Notepad can display.
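A DOCX file is a ZIP archive of XML parts, which is why it looks like "streams of Greek" in Notepad. This Python sketch, using only the standard library, builds a ZIP containing just a `word/document.xml` part (far less than a real DOCX, which also needs `[Content_Types].xml` and relationship parts) and then pulls the text back out of the `<w:t>` elements. The Greek sample text round-trips through UTF-8 without trouble. The function names are hypothetical; `docx_text` should also read the plain paragraph text of a real DOCX, but it ignores tabs, breaks, and paragraph boundaries.

```python
import io
import zipfile
from xml.etree import ElementTree
from xml.sax.saxutils import escape

# Namespace of WordprocessingML, the XML vocabulary inside DOCX files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_docx_like(text: str) -> bytes:
    """Zip up a bare word/document.xml (NOT a complete, valid DOCX file)."""
    xml = ('<?xml version="1.0" encoding="UTF-8"?>'
           f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
           f'<w:t>{escape(text)}</w:t></w:r></w:p></w:body></w:document>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", xml.encode("utf-8"))
    return buffer.getvalue()


def docx_text(data: bytes) -> str:
    """Concatenate the text of every <w:t> element in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    return "".join(node.text or "" for node in root.iter(f"{{{W_NS}}}t"))


if __name__ == "__main__":
    data = make_docx_like("Γάμος (marriage record)")
    print(data[:4])         # b'PK\x03\x04', the ZIP signature
    print(docx_text(data))  # Γάμος (marriage record)
```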

 

Jerry



#7 Rick Landrum

    Advanced Member

  • Members
  • 498 posts

Posted 08 April 2020 - 12:33 PM

Thanks everybody for the insights.
 
I really have two issues related to this problem. First, early on, before I started using Publisher, I was saving some images as PDFs, thinking it was easier to download files from various sources that way. This was partly because a PDF is a single file that can be several pages in length; saving the images as individual image files made the task of loading them to my tree take much longer. The PDFs worked fine with the RM7 database, and a viewer could enlarge the image to see or read it better.
 
Knowing what I do now, it seems fairly obvious that I should not use PDF files for images going forward. The second issue, and the real challenge, is what to do about all the PDF image files that I have already loaded to my tree.

First I tried converting the PDF to another file type, such as JPEG, and then replacing the file in my tree with the new file. Of course this works, but it would take a huge amount of time to convert all the files that need to be changed.

Second, using Renee's hint, I tried inserting a blank page in the Publisher book and then saving the book as a PDF file. I then open the book in Adobe PDF Pro and use Document/Insert to place the PDF image file pages into the book after the blank page. Lastly, I use PDF Pro to delete the blank page and then re-save the book. Depending on how many image PDFs need to be added to the Publisher book, the process can be repeated multiple times before saving and closing the book file.
 
This process works very well, and is reasonably fast, without having to convert files or make changes in my RM7 tree.
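The page-ordering part of this workflow can be written down as a small function, which also makes it repeatable if the book has to be regenerated. This Python sketch works only on page labels, not on actual PDF files; with a PDF library you would apply the same ordering to real pages. The function name, the placeholder labels, and the `keep_placeholder` option (for keeping a titled Text page, so its title stays in the Table of Contents, instead of deleting a blank page) are all hypothetical.

```python
def splice_attachments(book_pages: list[str],
                       attachments: dict[str, list[str]],
                       keep_placeholder: bool = False) -> list[str]:
    """Return the final page order with attachment pages spliced in.

    book_pages       -- page labels in the order Publisher produced them
    attachments      -- maps a placeholder page label to the labels of the
                        PDF pages to insert after it
    keep_placeholder -- False deletes the placeholder (a blank page);
                        True keeps it (e.g. a titled Text page)
    """
    result = []
    for page in book_pages:
        if page in attachments:
            if keep_placeholder:
                result.append(page)
            result.extend(attachments[page])  # insert the PDF pages here
        else:
            result.append(page)
    return result


if __name__ == "__main__":
    book = ["Title", "Family of John Doe", "BLANK-deed", "Descendants"]
    print(splice_attachments(book, {"BLANK-deed": ["deed p1", "deed p2"]}))
    # ['Title', 'Family of John Doe', 'deed p1', 'deed p2', 'Descendants']
```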
 
Thanks again,
Rick



#8 Bob C

    Advanced Member

  • Members
  • 269 posts

Posted 08 April 2020 - 01:32 PM

FYI: IrfanView, a free photo editor, has a batch processor that will convert .pdf to .tiff.



#9 Jerry Bryan

    Advanced Member

  • Members
  • 3978 posts

Posted 08 April 2020 - 01:53 PM

The whole question of multi-page documents is an interesting challenge. One of the virtues of the PDF format is that it can act as a sort of container, literally "containing" multiple pages. This can be a delightful convenience, because it is generally easy to manipulate the multiple pages in a single PDF file: the PDF is, well, a single file from the point of view of your computer's file system. Having multiple pages in a single PDF file can be delightful until and unless the file contains hundreds of pages, each containing lots of images, at which point it becomes so large as to be totally unwieldy, and you might begin to wish that the images were not in a single large container.

It's not quite the same concept, but TIF files also have this ability to be a container to "contain" multiple images. We usually speak of multi-image TIF files instead of speaking as if a TIF file were a single container for multiple images. But it's the same concept with a different name.

 

For a long time, I was so enamored of this concept that I used multi-image TIF files a great deal with RM. A typical example would be modern courthouse marriage records, where one "marriage record" might consist of three images: the marriage application, the marriage license, and the marriage return. So I would put all three images into a single TIF file, effectively containerizing the three images that collectively constituted one marriage record. I would link the TIF file into RM and be done. Happy, happy, happy.

Except that I eventually realized there were two major problems with my plan. One problem is that most Web browsers would not display TIF files, and I wanted to start placing some of my data on the Web, including images. The other problem is that I came to realize that most software that supports a lot of different image formats, including TIF, will only show you the first of the multiple images. Such software will not even warn you that there are other images in the file that you are not seeing. So I decided to abandon the TIF format. Doing so has been a huge amount of work. You can't just replace one TIF file with one JPG or PNG file; that wouldn't be so bad. Instead, you have to replace one TIF file with three JPG files or three PNG files and get them all linked into RM properly. That is very labor intensive, and I rue the day I decided that TIF files were a great idea.
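Because viewers often show only the first image of a multi-image TIF without any warning, it can help to check files before converting them. This Python sketch counts the images in a classic TIFF by following its chain of image file directories (IFDs): each IFD describes one image and ends with the offset of the next IFD, or zero for the last one. The function name is hypothetical; the sketch does not handle BigTIFF and performs only basic sanity checks, so treat it as an illustration.

```python
import struct


def count_tiff_images(data: bytes) -> int:
    """Count the images in a classic TIFF by walking its chain of IFDs."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian
    if len(data) < 8 or order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
    count, seen = 0, set()
    while offset:  # an offset of 0 marks the end of the chain
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)
        (entries,) = struct.unpack(order + "H", data[offset:offset + 2])
        next_at = offset + 2 + 12 * entries  # each directory entry is 12 bytes
        if next_at + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack(order + "I", data[next_at:next_at + 4])
        count += 1
    return count


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            images = count_tiff_images(f.read())
        note = "  <- viewers may show only the first" if images > 1 else ""
        print(f"{name}: {images} image(s){note}")
```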

But the moral of the story is that I do understand the attraction of PDF files as containers for multiple images. As long as the number of images in the PDF container is relatively modest, it's a great tool. Except when it isn't, as when RM doesn't support it for Publisher. For Publisher, I think Renee's suggestion of inserting a blank page in the Publisher book and then adding the PDF images after the fact is probably a good idea. For the Web, I still don't think PDF is a good idea. I used to say (correctly, I think) that most Web browsers didn't support PDF files. I think most or maybe all Web browsers now support PDF files in some manner, but the way they support them still doesn't quite meet my needs. For example, with JPG and PNG files, I can publish my data online with Gedsite and the image files work really well: there is a medium-size thumbnail that the user of my page can click on to see the full-size image. But with PDF files there is no thumbnail of any size. All that appears on my Web page is a link without much clue as to what the link is. So I'm still trying to stay away from PDFs as much as possible.

 

Jerry



#10 Rick Landrum

    Advanced Member

  • Members
  • 498 posts

Posted 08 April 2020 - 02:21 PM

Jerry Bryan wrote: "The whole question of multi-page documents is an interesting challenge. [...]"

Jerry,

The "container concept" is exactly what I was after by loading PDFs instead of individual image files. However, as I said, I did not anticipate the issues with Publisher and scrapbooks that I'm now in the middle of. I have yet to try publishing on a website, so I am pretty uninformed about the issues and pitfalls that might be waiting for me there. Anyway, adding PDFs to my Publisher books using Renee's suggestion seems to be the best solution for me now. I will distribute books to family members as a PDF saved on a USB drive.

Thanks for the input

Rick




#11 Rick Landrum

    Advanced Member

  • Members
  • 498 posts

Posted 10 April 2020 - 06:46 AM

Renee Zamora wrote: "You can add blank pages to the Publisher, and after generating the book edit it to insert the PDF pages."

Thanks Renee,

I have tested your suggestion, as well as Zhangrau's comments, and I have found that it does work for what I was trying to accomplish. One drawback I found is that any edits made at the Text page in the completed book disappear if you need to update and re-save the book, so the changes have to be recreated each time the book is updated. However, given that no tools currently exist in RM7 Publisher to handle PDF files, adding PDF images to the book after the fact seems much better than converting each PDF to, say, a JPEG in the RM7 database. Depending on the size of the PDF (number of pages, etc.), that approach can really take a lot of time.

Thanks again for your help.

Rick

 
