Need help sorting out how to use 23? Or have you discovered one of those nasty bugs?

Accented characters in tags munged on upload

clvrmnky   March 02, 2009, 05:18 AM

I've mentioned this before, but I didn't want to drag up an old message thread.

I'm still noticing that photos which already have tags with accented characters in them will have those accented characters munged in various ways after upload. I can correct the tags by hand during upload, but the problem is that any existing tags with certain characters in them will

For example, I've just uploaded a test photo (http://www.23hq.com/clvrmnky/photo/3997546) which I tagged with some of the following keywords in Adobe Lightroom and then exported to JPEG: "Français, café, l'hôtel, naïve."

According to "Get Info" in OS X, these keywords are present (and correct) in the JPEG IPTC header. Likewise when I use a third-party IPTC/EXIF viewer. Once I upload the photo, however, the tags look like this in the upload form before saving:

"naã¯ve l'hã´tel cafã© franã§ais testshots testshot tests test na•ve l'h™tel cafŽ cafe butterfly "wings of paradise butterfly conservatory" "test shots" franais" (I'm not sure how these chars will end up displaying in this interface, but you get the idea.)

Note how the individual tags with extended chars are partially duplicated: one copy has the chars removed or replaced, and the other has weird munged characters in their place.

Once I save the photo (without making any changes to the tags in the upload form) these munged tags will be saved to the photo in 23.

I'm not sure what to do. I'm pretty sure Lightroom is not doing anything special, and the IPTC fields are supposed to support such characters. Other applications and the OS seem to understand the contents of the JPEG header. I'm hoping that this can be remedied somehow.

Now, I tried uploading the same photo to Some Other Photo Sharing Site, and I get very similar results. Among the partially duplicated keywords, though, it actually displays the correct ones.

So, something is going on here I can't figure out, and I admit it might be the format of the JPEG. I'd just like to know what, exactly.

I know that I18n stuff can be subtle to get right, so if you need me to provide more test data or explain myself better, just let me know. (I'm pretty tired right now -- up late studying for a mid-term.)

Does anyone else see this?

Thanks.

 
clvrmnky   March 02, 2009, 05:28 AM

I should have done a little research before posting this.

I have to verify this, but it might be an issue with IPTC, or the way Lightroom writes out the IPTC data. Again, I need to confirm, but I'm reading that it is not stored properly as (some sort of) Unicode, and the various tag-cleaning processes that all photo sites use to normalize the data simply can't make much sense of the binary soup they see.

At least, this is what I'm reading.

Apparently the right data is in the image XMP block, though I generally don't flush XMP data out to the image as I change metadata [I don't need it, usually], so a.) I'm not sure it's there and b.) I have no idea if this actually helps, since the uploader would have to be told to use the XMP data upon upload.

I'll try to report back as I learn more, or perhaps someone who has done this already can school me on this.

 
Steffen Tiedemann Christensen Team 23   March 02, 2009, 06:27 PM

Thank you for the very detailed description -- please let us know what you find out. We'd love to have another leg up on that SOPSS you mention ;-)

It may help for you to know that we're using exiftool to read EXIF and IPTC.

 
clvrmnky   March 03, 2009, 03:02 AM

THAT'S the tool I was looking for. Thanks.

I have an idea what is going on. Lightroom is rendering JPEGs with the keywords in three sections: "Keywords", "Subject" and "Hierarchical Subject".

It is "Keywords" that is totally broken, containing all sorts of Unicode garbage. This explains why I see so many duplicates when I upload here (and That Other Site.)
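For anyone who wants to check their own files, a rough Python sketch along these lines should show the same fields. It assumes exiftool is installed and on the PATH; the function names are made up for illustration, and the tag list is only my reading of the three sections named above. `-j` (JSON output) and `-G1` (show group names) are standard exiftool options.

```python
import json
import subprocess

# Tags that Lightroom appears to write keywords into, per the post above.
KEYWORD_TAGS = ["-IPTC:Keywords", "-XMP:Subject", "-XMP:HierarchicalSubject"]


def build_inspect_command(path):
    """Build an exiftool command that dumps the keyword tags as JSON."""
    # -j = JSON output, -G1 = prefix each tag with its specific group name
    return ["exiftool", "-j", "-G1", *KEYWORD_TAGS, path]


def keyword_fields(json_text):
    """Return {tag_name: list_of_values} for every tag in the JSON output."""
    record = json.loads(json_text)[0]  # exiftool emits one object per file
    fields = {}
    for name, value in record.items():
        if name == "SourceFile":
            continue
        # exiftool gives a bare value when a list tag has only one entry
        fields[name] = value if isinstance(value, list) else [value]
    return fields


def inspect(path):
    """Run exiftool on one file (requires exiftool to be installed)."""
    result = subprocess.run(
        build_inspect_command(path),
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return keyword_fields(result.stdout)
```

Calling `inspect("test.jpg")` (a hypothetical filename) would return one entry per keyword field, which makes it easy to see whether only one of them holds the garbage.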

This must be something I can adjust in Lightroom, since it gives you total control over metadata.

I'll follow up once I know more -- I'm chatting with the smart folks over at the Adobe user forums right now. At least I know /why/ this is happening. Just not how to fix it.

At any rate, it appears that 23 is just trying to make sense of the Unicode garbage that somehow made its way into my images upon upload, and is doing nothing wrong.

[Later]

I've raised this as a defect with Adobe. For some reason some small subset of their users have a problem with 8-bit ASCII being inserted into IPTC/EXIF metadata fields.

 
clvrmnky   April 27, 2009, 03:57 AM

An update. What an ordeal trying to sort this out.

This problem looks related to IPTC fields, the missing "Coded Character Set" field, and the fact that Lightroom may default to MacRoman for character data on a Mac. It appears that the improperly encoded IPTC:Keywords data is not understood by any application, especially if the Coded Character Set field is not set.

Since no app or web site can know what char set the data is in, it probably guesses UTF-8 or Latin-1, which is (of course) totally wrong.
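A small Python sketch of how that kind of wrong guess could produce the garbage from the first post. The charsets used for the wrong reading (Windows-1252 and Latin-1) are assumptions, picked only because they reproduce the munged tags quoted above; I don't know what 23 or any other site actually does internally.

```python
TAGS = ["Français", "café", "l'hôtel", "naïve"]


def misread(text, written_as, read_as):
    """Encode text one way, then decode the bytes with the wrong charset."""
    raw = text.encode(written_as)
    # errors="ignore" drops bytes the reading charset has no character for
    return raw.decode(read_as, errors="ignore")


for tag in TAGS:
    # Assumption: MacRoman bytes (IPTC:Keywords) read as Windows-1252
    mac_guess = misread(tag, "mac_roman", "cp1252")
    # Assumption: UTF-8 bytes read as Latin-1, then lowercased by the site
    utf8_guess = misread(tag, "utf-8", "latin-1").lower()
    print(f"{tag!r}: MacRoman read as cp1252 -> {mac_guess!r}, "
          f"UTF-8 read as Latin-1 -> {utf8_guess!r}")
```

Under these assumptions "café" comes out as "cafŽ" and "cafã©", and "Français" loses its ç entirely in the first reading, which matches the partially duplicated tags in the upload form.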

This is a posted bug in Lightroom, which should write these out in UTF-8 and/or set the Coded Character Set (at the very least) and/or give the user some control over how potentially dangerous data is stuffed into EXIF/IPTC/XMP. (I vote for ignoring IPTC altogether since it is rather outdated.)

A workaround is to postprocess the images with ExifTool to just remove the IPTC:Keywords field completely. In fact, I just remove the entire IPTC block with a Lightroom export plugin (http://regex.info/blog/lightroom-goodies/metadata-wrangler) since most of that data is duplicated in the XMP block.
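A minimal sketch of the ExifTool side of that workaround, wrapped in Python. It assumes exiftool is installed and on the PATH; the function names are hypothetical. It relies on exiftool's convention that assigning an empty value to a tag (`-TAG=`) deletes it.

```python
import subprocess


def build_strip_command(path, whole_block=False):
    """Build an exiftool command that deletes IPTC keyword data from a file."""
    # An empty assignment ("-TAG=") tells exiftool to delete that tag;
    # "-IPTC:all=" deletes every tag in the IPTC group.
    target = "-IPTC:all=" if whole_block else "-IPTC:Keywords="
    return ["exiftool", target, path]


def strip_iptc(path, whole_block=False):
    """Run the command; by default exiftool keeps a backup as <file>_original."""
    subprocess.run(build_strip_command(path, whole_block), check=True)
```

Passing `whole_block=True` mirrors what the export plugin does for me (dropping the entire IPTC block); the default only removes the Keywords field.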

Here are some references:

- http://regex.info/blog/lightroom-goodies/flickr (search for "accented characters")
- http://www.lightroomforums.net/showthread.php?t=2313
- http://www.flickr.com/help/forum/30559/?search=lightroom

 