#16127 – broken UTF-8 handling while trimming field length

Status:	CLOSED FIXED

Alias:	None

Product:	Infrastructure
Classification:	Infrastructure
Component:	bugzilla.altlinux.org
Version:	unspecified
Hardware:	all Linux

Importance:	P2 enhancement
Assignee:	Mikhail Gusarov
QA Contact:	Mikhail Gusarov

URL:	https://bugzilla.altlinux.org/buglist...
Keywords:

Depends on:	16711
Blocks:

Reported:	2008-06-21 13:23 MSD by Michael Shigorin
Modified:	2009-01-22 06:51 MSK
CC List:	1 user

See Also:

Description Michael Shigorin 2008-06-21 13:23:11 MSD

The field shortening (used at least in bug lists) seems to be a bit naive about multibyte characters: it counts bytes rather than characters. As a result, Cyrillic strings are cut too early (and Chinese text would keep even fewer characters):

udev depends on udev_static-addon instead of udev_static
не все правила отрабатывают пр�...

The second one also has its last character damaged, because the cut falls between that character's two bytes.
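To make the difference concrete, here is a minimal Python sketch (Bugzilla itself is written in Perl, so this only illustrates the effect and is not its code). The sample string, the limit of 10 and both function names are assumptions chosen so that the byte cut lands in the middle of a two-byte character.

```python
def truncate_bytes(text: str, limit: int) -> str:
    """Cut after `limit` bytes of the UTF-8 encoding (the problematic approach)."""
    raw = text.encode("utf-8")[:limit]
    # errors="replace" turns the dangling half of a character into U+FFFD.
    return raw.decode("utf-8", errors="replace")


def truncate_chars(text: str, limit: int) -> str:
    """Cut after `limit` code points, so no character is split."""
    return text[:limit]


# Hypothetical sample: 27 characters, 51 bytes in UTF-8.
sample = "не все правила отрабатывают"

print(truncate_bytes(sample, 10))  # не вс� (5 whole characters plus a broken one)
print(truncate_chars(sample, 10))  # не все пра (10 whole characters)
```

With the same nominal limit, the byte-based cut keeps about half as many Cyrillic characters and ends in a replacement character. Note that slicing a Python `str` works on code points, so combining sequences could still be separated.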

In a perfect world there would be no need to cut strings at all; closer to reality, strings are preferably cut on whitespace/punctuation boundaries.
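Cutting on a whitespace/punctuation boundary could look roughly like the following Python sketch. The function name, the set of boundary characters and the `...` suffix are all assumptions made for illustration, not anything Bugzilla defines.

```python
import re

# Assumed boundary set: any whitespace plus a few common punctuation marks.
_BOUNDARY = re.compile(r"[\s.,;:!?]")


def truncate_at_boundary(text: str, limit: int, ellipsis: str = "...") -> str:
    """Shorten `text` to at most `limit` characters, preferring a word boundary."""
    if len(text) <= limit:
        return text
    # Look one character past the limit so a word that ends exactly
    # at the limit is kept whole.
    window = text[: limit + 1]
    positions = [m.start() for m in _BOUNDARY.finditer(window)]
    # Fall back to a hard cut when there is no usable boundary.
    cut = positions[-1] if positions and positions[-1] > 0 else limit
    return text[:cut].rstrip() + ellipsis


sample = "не все правила отрабатывают"  # hypothetical sample string
print(truncate_at_boundary(sample, 10))  # не все...
```

The limit here is counted in characters, so the result never ends in half of a multibyte character, and the hard-cut fallback only applies to a single word longer than the limit.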

Comment 1 Mikhail Gusarov 2008-07-02 23:54:38 MSD

Yes, Bugzilla does a simple substr() on bytestrings. D'oh.

I can invent a quick hack for our UTF-8 Bugzilla, but making it suitable for upstream means a lot of work (essentially converting all the internals from bytestrings to Unicode strings :)
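For illustration only, one shape such a UTF-8-only hack could take is to keep working on bytestrings but move the cut back to the nearest character boundary. The Python sketch below is an assumption about the general technique, not the hack considered here and not the upstream fix; `safe_byte_cut` is a hypothetical name, and the input is assumed to be valid UTF-8.

```python
def safe_byte_cut(raw: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 `raw` to at most `limit` bytes without splitting a character."""
    if len(raw) <= limit:
        return raw
    end = limit
    # UTF-8 continuation bytes look like 0b10xxxxxx. While the first
    # excluded byte is a continuation byte, the cut is inside a character,
    # so step back until it points at the start of one.
    while end > 0 and (raw[end] & 0xC0) == 0x80:
        end -= 1
    return raw[:end]


sample = "не все правила отрабатывают".encode("utf-8")  # hypothetical sample
print(safe_byte_cut(sample, 10).decode("utf-8"))  # не вс
```

This avoids the damaged last character but still measures the limit in bytes, so Cyrillic strings would remain shorter on screen than Latin ones. That is consistent with the comment's point that a proper fix means moving the internals to Unicode strings.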

Comment 2 Vitaly Fedrushkov 2008-12-02 12:26:20 MSK

https://bugzilla.mozilla.org/show_bug.cgi?id=363153 fixed in 3.2

Comment 3 Mikhail Gusarov 2009-01-22 06:51:52 MSK

Yep, fixed in 3.2.