I'm glad you mentioned the caption issue. I have a bad habit of ignoring it when I'm trying to describe how RM's media tags work. But I don't use RM's media captions, so they're sort of out of sight and out of mind for me.
I used to make heavy use of RM's support of captions for media files. But I stopped doing so after a particular RM upgrade. I can't find in RM's version history when the change was made. Prior to the change, each time a media file was used it could have a different caption. After the change, the media file itself could only have one caption, and that one caption applied everywhere the media file was used. I had lots of things like photos of brothers where one of the captions would be something like "John Doe (left) and his brother William" and the other caption would be something like "William Doe (right) and his brother John". Putting aside the wisdom of using captions in that manner (and it probably wasn't wise), it was quite annoying to lose hundreds of captions, and to have the loss be totally outside of my control. There wasn't anything in the announcement of the RM upgrade to warn me that it was going to cost me hundreds of captions. So I resolved never to use RM's captions again. Fool me once, shame on you. Fool me twice, shame on me.
Another approach to captions that's surely better, at least philosophically, is to store captions in the IPTC data that can be associated with image files. That way, the caption stays with the file. But RM doesn't support IPTC data, and my experience with image files and IPTC data is that such data is easily lost when you are editing image files. Also, it's my experience (with which not everybody agrees) that depending on what graphics software you are using, IPTC data can be deeply buried and hard or impossible to find. It's simply not front and center the way I would prefer - sort of like the clickiness problem in the RM7 user interface. Also, I prefer the PNG format to JPG because PNG is lossless and JPG is lossy, but PNG does not support IPTC. And I prefer the PNG format to TIF/TIFF because TIF/TIFF is not supported by Web browsers.
Therefore, I add captions to photographs by editing them, adding white space, and captioning the photo in the white space. It's sort of like writing on the back of printed photographs, except it's like adding white space to printed photographs and writing on the front of the photograph in the added white space. Not everybody likes my approach, but it's what works the best for me. By the way, when I edit photographs in this manner, I always keep the original around completely unmodified.
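For anyone who would rather script this than do it by hand in a graphics editor, the white-space approach can be roughly sketched in Python. This is only an illustration under stated assumptions, and not necessarily how the editing described above is actually done: it assumes the third-party Pillow library is installed, and the function names (caption_layout, add_caption), the strip height, and the margin are all made up for the example.

```python
from pathlib import Path


def caption_layout(width, height, strip_height=60, margin=10):
    """Work out where things go when a white strip is added under a photo.

    Returns (canvas_size, photo_origin, text_origin). The strip height and
    margin are arbitrary illustrative defaults, in pixels.
    """
    canvas_size = (width, height + strip_height)   # photo plus white strip
    photo_origin = (0, 0)                          # photo stays at the top
    text_origin = (margin, height + margin)        # caption starts in the strip
    return canvas_size, photo_origin, text_origin


def add_caption(src, dst, caption, strip_height=60):
    """Write a captioned copy of src to dst, leaving src unmodified."""
    src, dst = Path(src), Path(dst)
    if src.resolve() == dst.resolve():
        # Keep the original around completely unmodified.
        raise ValueError("refusing to overwrite the original photo")

    # Pillow is a third-party package (assumed installed: pip install Pillow).
    from PIL import Image, ImageDraw

    with Image.open(src) as photo:
        rgb = photo.convert("RGB")
        size, photo_at, text_at = caption_layout(rgb.width, rgb.height, strip_height)
        canvas = Image.new("RGB", size, "white")    # blank white canvas
        canvas.paste(rgb, photo_at)                 # original pixels, untouched
        ImageDraw.Draw(canvas).text(text_at, caption, fill="black")  # default font
        canvas.save(dst)                            # format chosen from dst's extension
```

A call might look like add_caption("brothers.jpg", "brothers_captioned.png", "John Doe (left) and his brother William"), where the file names are hypothetical. Saving to a .png name keeps the captioned copy lossless, and because the function refuses to write over its input, the original photo always survives.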
Jerry