#978580 xgettext: should provide an option to ignore some directories

Package:
gettext
Source:
gettext
Description:
GNU Internationalization utilities
Submitter:
Lucas Nussbaum
Date:
2024-06-24 17:30:04 UTC
Severity:
important
Tags:
#978580#5
Date:
2020-12-26 21:52:15 UTC
From:
To:
Hi,

During a rebuild of all packages in sid, your package failed to build
on amd64.

Relevant part (hopefully):
http://qa-logs.debian.net/2020/12/26/dasher_5.0.0~beta~repack2-1_unstable.log

A list of current common problems and possible solutions is available at
http://wiki.debian.org/qa.debian.org/FTBFS . You're welcome to contribute!

If you reassign this bug to another package, please mark it as 'affects'-ing
this package. See https://www.debian.org/Bugs/server-control#affects

If you fail to reproduce this, please provide a build log and diff it with mine
so that we can identify if something relevant changed in the meantime.

About the archive rebuild: The rebuild was done on EC2 VM instances from
Amazon Web Services, using a clean, minimal and up-to-date chroot. Every
failed build was retried once to eliminate random failures.

#978580#10
Date:
2020-12-26 23:13:55 UTC
From:
To:
FTR, the file that causes the problem is Testing/gtest/test/gtest_unittest.cc.
It does not contain anything to be translated, so we'd need
some option to just ignore Testing/ entirely.

Samuel

#978580#15
Date:
2020-12-26 23:18:05 UTC
From:
To:
Samuel Thibault wrote on Sun, 27 Dec 2020 00:13:55 +0100:

More precisely, it is line

  EXPECT_STREQ("(Invalid Unicode 0xABCDFF)",
   WideStringToUtf8(L"\xABCDFF", -1).c_str());

which is *expected* to be an invalid code-point...

Samuel

#978580#20
Date:
2020-12-26 23:29:01 UTC
From:
To:
Hello gettext maintainers,

Lucas Nussbaum wrote on Sat, 26 Dec 2020 22:52:15 +0100:

And this is due to this part of dasher:

  EXPECT_STREQ("(Invalid Unicode 0xABCDFF)",
   WideStringToUtf8(L"\xABCDFF", -1).c_str());

which is precisely *expected* to be an invalid code-point... Since this
string is not marked for translation, xgettext shouldn't error out
about it, should it?

Samuel

#978580#25
Date:
2020-12-26 23:59:23 UTC
From:
To:
I don't know.

It must be noted that I uploaded gettext 0.21 for unstable recently
and it propagated to testing today. Apparently the new gettext is more
nitpicky than the old one.

My feeling is that somehow you are calling xgettext "implicitly", i.e.
without being really aware that you are in fact calling xgettext.

If required, I can ask gettext upstream about this, but we would need a
minimal test case and a bit more investigation on our side.

Thanks.

#978580#30
Date:
2020-12-27 00:17:42 UTC
From:
To:
Santiago Vila wrote on Sun, 27 Dec 2020 00:59:23 +0100:

Well, it does seem that upstream's intent really is to call xgettext
over the source code, to extract translatable strings.

€ cat test.c

#include <wchar.h>

void f(const wchar_t *str) { }

void g(void) {
	f(L"\xABCDFF");
}


€ xgettext test.c
xgettext: x-c.c:1666: phase5_get: Assertion `UNICODE_VALUE (c) >= 0 && UNICODE_VALUE (c) < 0x110000' failed.

Samuel
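
For reference, the failed assertion only checks that the value lies in the Unicode code point range 0 to 0x10FFFF. The following is a minimal sketch of that same bounds check in shell arithmetic; the function name is_unicode_code_point is hypothetical and is not part of gettext, and the sketch does not reproduce anything else xgettext does with the literal.

```shell
#!/bin/sh
# Hypothetical helper (not part of gettext): succeed if the given number
# lies in the range 0 .. 0x10FFFF, the same bounds as the failed assertion.
is_unicode_code_point() {
    [ "$(( $1 ))" -ge 0 ] && [ "$(( $1 ))" -lt "$(( 0x110000 ))" ]
}

# The escape from the test case, \xABCDFF, is far above 0x10FFFF.
if is_unicode_code_point 0xABCDFF; then
    echo "0xABCDFF: within range"
else
    echo "0xABCDFF: out of range"
fi
```

Run as is, this prints "0xABCDFF: out of range", which matches the condition the assertion rejects.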

#978580#35
Date:
2020-12-27 17:05:54 UTC
From:
To:
Greetings.

The upload of gettext 0.21 to Debian unstable has caused the package "dasher",
maintained by Samuel Thibault (in Cc), to no longer build, as reported here
by Lucas Nussbaum:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=978315

We are not sure where exactly the problem is (either "dasher" or "gettext").

In short: xgettext seems to parse and complain about UTF conformance
of strings even if they are not marked for translation.

Here is a minimal test case provided by Samuel:
----- Begin forwarded message -----

€ cat test.c

#include <wchar.h>

void f(const wchar_t *str) { }

void g(void) {
	f(L"\xABCDFF");
}


€ xgettext test.c
xgettext: x-c.c:1666: phase5_get: Assertion `UNICODE_VALUE (c) >= 0 && UNICODE_VALUE (c) < 0x110000' failed.

Samuel

----- End forwarded message -----

Thanks.

#978580#43
Date:
2020-12-27 18:40:41 UTC
From:
To:
Hi Santiago, Samuel,

This behaviour was introduced in gettext 0.20, with the ability to grok
C11 and C++11 string literals.

In the next gettext release, functions like 'f' (which take a 'const wchar_t *'
argument) can be designated as gettext-like functions, for which the argument
needs to be extracted and put into the POT file. For this, it must be possible
to convert it to UTF-8.

The assertion could be converted to a reasonable error message, sure.

Having a reasonable error message (with line number) *and* emitting this error
message only when the string actually gets extracted would make xgettext more
complex.

Since Samuel says:

  ... the file that causes the problem is Testing/gtest/test/gtest_unittest.cc.
  It does not contain anything to be translated, so we'd need
  some option to just ignore Testing/ entirely.

this looks like the better option.

Bruno
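
One possible way to keep a directory such as Testing/ away from xgettext is to build the list of input files outside xgettext and pass it in with --files-from. The sketch below assumes a POSIX find; the helper name list_sources, the file extensions, and the po/ paths in the usage comment are illustrative assumptions, not taken from dasher's build system.

```shell
#!/bin/sh
# Hypothetical helper: print the C/C++ sources under directory $1,
# skipping the subdirectory $2 (given relative to $1) entirely.
list_sources() {
    root=$1
    skip=$2
    # -prune stops find from descending into the skipped directory.
    find "$root" -path "$root/$skip" -prune -o \
        -type f \( -name '*.c' -o -name '*.cc' -o -name '*.cpp' \) -print | sort
}

# Example usage (assumes xgettext is installed; the po/ paths are placeholders):
#   list_sources . Testing > po/sources.list
#   xgettext --files-from=po/sources.list --output=po/messages.pot
```

With this approach the file containing the deliberately invalid literal is never handed to xgettext, so it is not parsed at all.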