Issue with non-Latin characters in metadata
February 20, 2011, 09:47 PM
Hi,
The captions of my pictures, stored as metadata, contain non-Latin characters. Unfortunately, these characters don't display correctly when I upload the pictures.
I have tried storing the caption in the XMP segment, in the IPTC segment (in Caption-Abstract) and in the EXIF segment (in Description). In all three cases the caption is detected when I upload the picture, but the non-Latin characters never display correctly. However, there is no problem when I read the metadata with exiftool on my computer (which runs Mac OS X 10.5.8).
I am aware of other threads where the metadata were written incorrectly by another program. In my case, they were written by a script using exiftool, and the encoding is declared as UTF8 (in IPTC:CodedCharacterSet).
Thanks in advance for your help.
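The symptom described here (exiftool reads the caption fine, but the uploaded picture shows garbage) is typical of UTF-8 bytes being decoded with a different character set. A minimal Python sketch, assuming the reading side decodes the bytes as Latin-1 (an assumption for illustration; the thread does not say what the upload code actually did):

```python
# A caption with non-Latin characters (Greek and Japanese)
caption = "Αθήνα 東京"

# The bytes a UTF-8-aware writer, such as an exiftool script, stores
stored = caption.encode("utf-8")

correct = stored.decode("utf-8")    # reader honours the UTF-8 declaration
garbled = stored.decode("latin-1")  # reader assumes Latin-1 instead

print(correct)  # Αθήνα 東京
print(garbled)  # mojibake: each multi-byte character becomes several symbols

# Latin-1 maps every byte to one character, so the damage is reversible:
# the bytes in the file are intact, only the interpretation was wrong
assert garbled.encode("latin-1").decode("utf-8") == caption
```

This is why the file itself can be perfectly fine even though the caption displays wrongly after upload.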
February 21, 2011, 09:36 AM
You should be aware of the following from the exiftool FAQ:
Most textual information in EXIF is stored in ASCII format, and ExifTool does not convert these tags. However it is not uncommon for applications to write UTF‑8 or other encodings where ASCII is expected, and ExifTool will quite happily read/write any encoding without conversion. For a few EXIF tags (UserComment, GPSProcessingMethod and GPSAreaInformation) the stored text may be encoded either in ASCII, Unicode (UCS-2) or JIS.
If your editing application writes UTF-8 data in fields where ASCII is expected, exiftool will *not* convert it to the ASCII that the EXIF 2.3 standard requires in those fields. As such, using exiftool is no guarantee of correctly formatted EXIF data.
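A minimal Python sketch of how a reader might cope with this. `decode_ascii_field` and `decode_user_comment` are hypothetical helper names, and the "try UTF-8, fall back to Latin-1" rule is an assumed heuristic, not something the EXIF standard or exiftool prescribes. The 8-byte character-code prefixes for UserComment (`ASCII`, `UNICODE`, `JIS`, or eight zero bytes for undefined) come from the EXIF standard; JIS is left unhandled here.

```python
from typing import Optional

# 8-byte character-code prefixes defined by the EXIF standard for UserComment
ASCII_PREFIX = b"ASCII\x00\x00\x00"
UNICODE_PREFIX = b"UNICODE\x00"


def decode_ascii_field(raw: bytes) -> str:
    """Decode an EXIF 'ASCII' value that may actually hold UTF-8."""
    raw = raw.rstrip(b"\x00")  # EXIF ASCII values are NUL-terminated
    try:
        # Pure ASCII is also valid UTF-8, so this covers both cases
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything else is legacy 8-bit (Latin-1) text
        return raw.decode("latin-1")


def decode_user_comment(raw: bytes, big_endian: bool = True) -> Optional[str]:
    """Decode UserComment using its 8-byte character-code prefix."""
    prefix, body = raw[:8], raw[8:]
    if prefix == ASCII_PREFIX:
        return decode_ascii_field(body)
    if prefix == UNICODE_PREFIX:
        # UCS-2; byte order assumed to follow the TIFF header (MM = big, II = little)
        codec = "utf-16-be" if big_endian else "utf-16-le"
        return body.decode(codec, errors="replace").rstrip("\x00")
    # JIS and "undefined" (eight zero bytes) are not handled in this sketch
    return None


print(decode_ascii_field("Αθήνα".encode("utf-8") + b"\x00"))
print(decode_user_comment(UNICODE_PREFIX + "東京".encode("utf-16-be")))
```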
Also keep in mind that the usual default order of preference when reading metadata is EXIF, then IPTC, then XMP (I have no idea in which order 23hq handles these), so wrongly encoded EXIF data will usually be shown even when correctly encoded IPTC or XMP data is present.
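To illustrate why a badly encoded EXIF caption wins under that order, here is a minimal Python sketch of a "first non-empty value" lookup. `pick_caption` is a hypothetical name, and the XMP-first order is only an example of an alternative; it is not what 23hq is known to use.

```python
from typing import Dict, Optional, Sequence

# Common default order of preference: EXIF, then IPTC, then XMP
DEFAULT_ORDER = ("EXIF", "IPTC", "XMP")


def pick_caption(candidates: Dict[str, Optional[str]],
                 order: Sequence[str] = DEFAULT_ORDER) -> Optional[str]:
    """Return the first non-empty caption in the given source order."""
    for source in order:
        value = candidates.get(source)
        if value:
            return value
    return None


caption = "Αθήνα"
candidates = {
    # EXIF holds UTF-8 bytes that were decoded as Latin-1 (garbled)
    "EXIF": caption.encode("utf-8").decode("latin-1"),
    "IPTC": caption,
    "XMP": caption,
}

print(pick_caption(candidates))                                  # garbled EXIF text wins
print(pick_caption(candidates, order=("XMP", "IPTC", "EXIF")))   # Αθήνα
```

A non-empty but garbled value is indistinguishable from a good one in this kind of lookup, so the order alone decides which caption is shown.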
Team 23
February 22, 2011, 05:18 PM
Hey guys -- and thank you for the reply, Henrik.
We are actually using exiftool on our end to extract the data as well, so it might simply be a matter of handling the encoding correctly in our code. Pierre-Jean, can you send an original file exhibiting the problem to my email at steffen@23hq.com?
Thanks,
Steffen
February 22, 2011, 10:54 PM
Steffen, Henrik,
Thanks a lot for your replies. Steffen, I've just sent you an e-mail.
By the way, I find picture metadata quite confusing, with many fields duplicated across the segments. Do you have any recommendations about which fields to use, in particular to ensure future compatibility?
Cheers,
Pierre-Jean
Team 23
February 27, 2011, 01:42 PM
Pierre-Jean,
EXIF and UTF-8 is apparently a pretty tricky combination, but we've made a few changes on our end that should mean non-Latin characters from EXIF are now displayed correctly after upload.
February 28, 2011, 01:00 PM
Steffen,
That works perfectly now with my pictures (caption stored under IPTC:Caption-Abstract, with IPTC:CodedCharacterSet set to UTF8).
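For reference, a minimal Python sketch of how a reader can honour the declared character set. It assumes a raw IPTC IIM block (the datasets themselves, without the surrounding JPEG/Photoshop wrapper). In IIM, CodedCharacterSet is dataset 1:90, whose value `ESC % G` declares UTF-8, and Caption-Abstract is dataset 2:120. `parse_iim`, `caption_from_iim` and `make_dataset` are hypothetical helpers, and falling back to Latin-1 when UTF-8 is not declared is an assumption.

```python
import struct
from typing import Dict, Tuple

UTF8_MARKER = b"\x1b%G"  # ESC % G: ISO 2022 escape sequence declaring UTF-8


def parse_iim(data: bytes) -> Dict[Tuple[int, int], bytes]:
    """Split an IPTC IIM block into {(record, dataset): value}.

    Sketch only: repeated datasets keep the last value, and
    extended-length datasets are rejected.
    """
    datasets: Dict[Tuple[int, int], bytes] = {}
    pos = 0
    while pos + 5 <= len(data):
        # Tag marker 0x1C, record number, dataset number, 2-byte big-endian length
        marker, record, number, length = struct.unpack(">BBBH", data[pos:pos + 5])
        if marker != 0x1C:
            raise ValueError(f"bad tag marker at offset {pos}")
        if length & 0x8000:
            raise ValueError("extended-length datasets are not handled")
        start = pos + 5
        datasets[(record, number)] = data[start:start + length]
        pos = start + length
    return datasets


def caption_from_iim(data: bytes) -> str:
    """Decode Caption-Abstract (2:120) using CodedCharacterSet (1:90)."""
    datasets = parse_iim(data)
    raw = datasets.get((2, 120), b"")
    if datasets.get((1, 90)) == UTF8_MARKER:
        return raw.decode("utf-8")
    return raw.decode("latin-1")  # assumption: fallback when UTF-8 is not declared


def make_dataset(record: int, number: int, value: bytes) -> bytes:
    """Build one IIM dataset (standard, non-extended length)."""
    return struct.pack(">BBBH", 0x1C, record, number, len(value)) + value


caption = "Αθήνα 東京"
block = make_dataset(1, 90, UTF8_MARKER) + make_dataset(2, 120, caption.encode("utf-8"))
print(caption_from_iim(block))  # Αθήνα 東京
```

Dropping the 1:90 dataset from `block` makes the same bytes come out garbled, which matches the behaviour reported before the fix.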
Thanks a lot for your quick reaction!
Cheers,
Pierre-Jean