Bug 16127 - broken UTF-8 handling while trimming field length
Alias: None
Product: Infrastructure
Classification: Infrastructure
Component: bugzilla.altlinux.org
Version: unspecified
Hardware: all Linux
Importance: P2 enhancement
Assignee: Mikhail Gusarov
QA Contact: Mikhail Gusarov
URL: https://bugzilla.altlinux.org/buglist...
Depends on: 16711
Reported: 2008-06-21 13:23 MSD by Michael Shigorin
Modified: 2009-01-22 06:51 MSK

Description Michael Shigorin 2008-06-21 13:23:11 MSD
It seems that the field shortening (used at least in buglists) is a bit naïve about multibyte characters and uses byte counts.  This results in Cyrillic strings being cut too early (and Chinese text would get even fewer characters):

udev depends on udev_static-addon instead of udev_static
не все правила отрабатывают пр�...

The second one also has its last character damaged, because the cut falls between the two bytes of that character.
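
To make the difference concrete, here is a minimal sketch in Python (used for illustration only; Bugzilla itself is written in Perl). The function names and the limit of 6 are made up for this example and are not taken from the Bugzilla code.

```python
# Illustration only: cut_bytes/cut_chars and the limit are hypothetical.

def cut_bytes(text, limit):
    """Truncate the UTF-8 encoding at `limit` bytes (the naive approach)."""
    raw = text.encode("utf-8")[:limit]
    # A character split in the middle decodes to U+FFFD, the replacement character.
    return raw.decode("utf-8", errors="replace")

def cut_chars(text, limit):
    """Truncate at `limit` characters (code points)."""
    return text[:limit]

if __name__ == "__main__":
    sample = "не все правила"
    print(cut_bytes(sample, 6))  # three characters plus a damaged fourth
    print(cut_chars(sample, 6))  # six intact characters
```

Each Cyrillic letter takes two bytes in UTF-8, so the same limit yields roughly half as many characters when counted in bytes.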

In a perfect world there would be no need to cut things at all; more realistically, strings should preferably be cut on whitespace/punctuation boundaries.
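
A rough sketch of cutting on a whitespace/punctuation boundary, again in Python for illustration. The `shorten` helper, its parameters and the "..." suffix are assumptions for this example, not existing Bugzilla code; "boundary" here simply means any non-alphanumeric character.

```python
def shorten(text, limit, ellipsis="..."):
    """Cut `text` to at most `limit` characters, preferring a boundary.

    Hypothetical helper: counts characters (not bytes) and treats any
    non-alphanumeric character as a possible cut point.
    """
    if len(text) <= limit:
        return text
    if limit <= len(ellipsis):
        return text[:limit]          # no room for the ellipsis, hard cut
    room = limit - len(ellipsis)
    head = text[:room]
    # If the cut falls inside a word, back off to the last boundary character.
    if text[room].isalnum() and head[-1].isalnum():
        boundary = max(
            (i for i, ch in enumerate(head) if not ch.isalnum()),
            default=-1,
        )
        if boundary > 0:
            head = head[:boundary]
    return head.rstrip() + ellipsis

if __name__ == "__main__":
    print(shorten("не все правила отрабатывают", 20))
```

If no boundary is found within the limit, the sketch falls back to a hard cut at the character limit.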
Comment 1 Mikhail Gusarov 2008-07-02 23:54:38 MSD
Yes, Bugzilla does a simple substr() on bytestrings. D'oh.

I can invent a quick hack for our UTF-8 Bugzilla, but making it suitable for upstream means a lot of work (essentially converting all the internals from bytestrings to Unicode strings :)
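
One possible shape of such a quick hack, sketched in Python rather than Perl and not taken from any actual patch: keep working on bytestrings, but step the cut point back so that it never lands inside a multibyte character. It assumes the input is valid UTF-8; `truncate_utf8` is a hypothetical name.

```python
def truncate_utf8(raw: bytes, max_bytes: int) -> bytes:
    """Cut a UTF-8 bytestring to at most `max_bytes` bytes without
    splitting a character. Assumes `raw` is valid UTF-8."""
    if len(raw) <= max_bytes:
        return raw
    end = max_bytes
    # Continuation bytes have the form 0b10xxxxxx. If the first dropped byte
    # is one of them, the cut is inside a character, so step back to its start.
    while end > 0 and (raw[end] & 0xC0) == 0x80:
        end -= 1
    return raw[:end]

if __name__ == "__main__":
    data = "не все".encode("utf-8")
    print(truncate_utf8(data, 6).decode("utf-8"))
```

This only avoids the damaged last character. The limit is still counted in bytes, so Cyrillic strings would still come out shorter than ASCII ones.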
Comment 2 Vitaly Fedrushkov 2008-12-02 12:26:20 MSK
https://bugzilla.mozilla.org/show_bug.cgi?id=363153 fixed in 3.2
Comment 3 Mikhail Gusarov 2009-01-22 06:51:52 MSK
Yep, fixed in 3.2.