#567210 libsaxonb-java: doc-available always returns false without network

#567210#5
Date: 2010-01-27 23:23:08 UTC
In certain cases, the doc-available function wrongly returns false when
the document does, in fact, exist.  The attached testcase contains three
files.  Running "saxonb-xslt -s:foo.xml -xsl:foo.xsl" produces the
output:

Requested document is file:/tmp/saxonb-testcase/index.xhtml
Missing‽
Requested document is index.xhtml
Missing‽
Requested document is file:/tmp/saxonb-testcase/index.xhtml
Missing‽
Requested document is index.xhtml
Missing‽

Note that if Java networking works, it works correctly:

Requested document is file:/tmp/saxonb-testcase/index.xhtml
It exists!
Requested document is index.xhtml
It exists!
Requested document is file:/tmp/saxonb-testcase/index.xhtml
It exists!
Requested document is index.xhtml
It exists!

Nevertheless, there is no reason to require networking for file: URIs.
Saxon B should work correctly for file: URIs regardless of the state of
networking, period.  There is no legitimate reason to make a network
request when all documents are local.

Upon further examination, it appears that Saxon is attempting to do a
lookup of the DTD on www.w3.org, even when -expand:off and -dtd:off are
specified.  It does not use catalogs to do so, and therefore fails.
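Whether the external DTD is fetched at all is decided by the XML parser, not by Saxon. The sketch below shows one parser-level switch using only standard JAXP and the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`, which the JDK's built-in parser also accepts. The class name `NoExternalDtdParse` is hypothetical. This is a different technique from the catalog approach discussed in this bug, and it only helps when the document does not depend on entities declared in the DTD: an XHTML page using `&nbsp;` would then fail with an undeclared-entity error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NoExternalDtdParse {
    // Xerces feature controlling whether a non-validating parser
    // reads the external DTD subset named in the DOCTYPE.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses xml without fetching its external DTD; returns the element count. */
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(LOAD_EXTERNAL_DTD, false); // no request to www.w3.org
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }
}
```

With this feature off, a document whose DOCTYPE points at `http://www.w3.org/...` parses offline, as long as it uses no DTD-declared entities.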

#567210#10
Date: 2010-01-28 01:15:17 UTC
I forgot to include the testcase, so it is now attached.  Also, since
I'm really only using Saxon-B via Ant, if a new feature is added to
resolve this problem, I really need it to be accessible via the Ant
task.

#567210#17
Date: 2011-08-29 17:35:50 UTC
Hi Brian,

thanks for your report. I am afraid that we need someone to implement
the feature. Please note that I have just uploaded a newer upstream
version 9.1.0.8 of saxonb.

Sorry,
Torsten

#567210#22
Date: 2011-08-29 18:45:02 UTC
I'm not using saxonb via ant right now, so the ant task is not critical
for me anymore.  Do you still need someone to fix the underlying bug
(doc-available returns false for file URIs) or just the ant task?  I'll
try to look into the former sometime this week if that's the case to see
if I can fix it.

#567210#27
Date: 2011-08-29 20:00:25 UTC
That would be nice.

Thanks,
Torsten

#567210#32
Date: 2013-05-17 06:04:58 UTC
tags 567210 important
thanks

Technically this renders the package unusable if you do not have an
internet connection, right? Shouldn't this be marked 'grave' then?

#567210#39
Date: 2013-10-17 07:55:54 UTC
tags 567210 patch
severity 567210 grave
thanks

Instead of:

$ cat /usr/bin/saxon-xslt
#!/bin/sh

exec java -classpath /usr/share/java/saxon.jar com.icl.saxon.StyleSheet "$@"

it should read:

#!/bin/sh

java -cp /etc/xml/resolver:/usr/share/java/xslthl.jar:/usr/share/java/xml-resolver.jar:/usr/share/java/saxonb.jar \
  net.sf.saxon.Transform -x \
  org.apache.xml.resolver.tools.ResolvingXMLReader -y \
  org.apache.xml.resolver.tools.ResolvingXMLReader -r \
  org.apache.xml.resolver.tools.CatalogResolver "$@"

#567210#48
Date: 2013-12-31 19:40:53 UTC
Hello,

The patch/different arguments for invoking saxonb-xslt attached to this
bug report doesn't appear to work with the test case provided by Brian
Carlson and the current version of libsaxonb-java in the archive in
jessie/sid.  Or at least it's not working for me; it hangs after the
first "Requested document is file:/path/to/index.xhtml" and I can
confirm via wireshark that it is phoning home to www.w3.org.

I also tried specifying the Apache resolver (as in [0], but with the
paths updated to match Debian's libxml-commons-resolver1.1-java JAR,
etc.), and it hangs in the same way.

Can someone demonstrate test cases that show that this bug is addressed?

Thank you,
tony

[0] https://bugs.launchpad.net/ubuntu/+source/saxonb/+bug/400277

#567210#53
Date: 2014-01-07 13:39:26 UTC
Dear submitter,

  Could you please confirm that the patch proposed at:

http://bugs.debian.org/567210#39

  does solve the issue for you?

Thanks much

#567210#58
Date: 2014-01-08 00:35:37 UTC
It does not appear to solve the problem.  It still does not work when
networking is disabled, and it takes a long time and hangs even when
networking is enabled (probably because of rate-limiting on the W3C's
server).  With networking disabled:

  vauxhall ok % cat saxonb-xslt
  #!/bin/sh

  java -cp /etc/xml/resolver:/usr/share/java/xslthl.jar:/usr/share/java/xml-resolver.jar:/usr/share/java/saxonb.jar net.sf.saxon.Transform -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader -r org.apache.xml.resolver.tools.CatalogResolver "$@"
  vauxhall no % ./saxonb-xslt -s:foo.xml -xsl:foo.xsl
  Requested document is file:/tmp/saxonb-testcase/index.xhtml
  Missing‽
  Requested document is index.xhtml
  Missing‽
  Requested document is file:/tmp/saxonb-testcase/index.xhtml
  Missing‽
  Requested document is index.xhtml
  Missing‽

#567210#63
Date: 2014-01-09 08:17:10 UTC
Dear Mike,

  I am trying to solve an issue reported against the Debian saxonb
9.1.0.8 package. The full report is at:

http://bugs.debian.org/567210

  In summary, the documentation at:

http://www.saxonica.com/documentation/sourcedocs/xml-catalogs.html

  does not seem to apply within a Debian installation. Indeed, if one
downloads the testcase from:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567210#10

  and run:

$ java -cp /etc/xml/resolver:/usr/share/java/xml-resolver.jar:/usr/share/java/saxonb.jar \
  -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 \
  net.sf.saxon.Transform -x \
  org.apache.xml.resolver.tools.ResolvingXMLReader -y \
  org.apache.xml.resolver.tools.ResolvingXMLReader -r \
  org.apache.xml.resolver.tools.CatalogResolver -s:foo.xml -xsl:foo.xsl

I always get an error without network; however, it seems to be doing
something else when the network is up. I have not been able to track
down what is actually missing during this XSLT transform.

The file /etc/xml/resolver/CatalogManager.properties comes from
libxml-commons-resolver1.1-java package.

Could you please let me know if there is a way to debug (log) what is
actually downloaded from the net that makes this test script fail.

Thanks much in advance,

#567210#68
Date: 2014-01-09 09:05:50 UTC
These can be tricky to debug. Are you seeing a trace of URI requests from the catalog resolver? If not, have you tried using a monitoring tool such as Wireshark?

The normal cause of problems is that a source file contains a DTD or schema reference to files hosted on www.w3.org. These requests will fail immediately if there is no network, and they will usually fail after a 30 second delay if there is a network, because W3C (since a couple of years ago) has been throttling requests to serve these files. Recent versions of Saxon deal with this by automatically redirecting requests to a local copy held within Saxon itself, but with earlier releases (including 9.1.0.8) the redirection has to be done at user level.

This is all several levels removed from Saxon itself: Saxon calls the XML parser (usually Xerces) to do the parsing, and requests for DTDs etc emanate from the parser, not from Saxon. So Saxon doesn't actually know what files are being requested.

Regards,

Michael Kay
Saxonica
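Since the DTD requests come from the parser, the user-level hook for redirecting them is a SAX `EntityResolver` installed on that parser. The following is a minimal sketch, assuming local copies of the needed W3C DTDs already exist on disk; the class `LocalDtdResolver`, its `map` method and any paths are hypothetical and not part of Saxon or Debian. Known public IDs are served from local files, any other `http:`/`https:` DTD is rejected immediately instead of hanging on www.w3.org, and everything else falls through to the parser's default behaviour.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical resolver (not part of Saxon or Debian): serves DTDs whose
 * public ID has a registered local copy, refuses remote DTD fetches, and
 * leaves file: and other non-HTTP system IDs to the parser.
 */
class LocalDtdResolver implements EntityResolver {
    private final Map<String, Path> localCopies = new HashMap<>();

    /** Register a local file as the copy of the DTD with this public ID. */
    void map(String publicId, Path localCopy) {
        localCopies.put(publicId, localCopy);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        // 1. Known public ID: point the parser at the local file instead.
        if (publicId != null) {
            Path local = localCopies.get(publicId);
            if (local != null) {
                InputSource source = new InputSource(local.toUri().toString());
                source.setPublicId(publicId);
                return source;
            }
        }
        // 2. Unknown remote DTD: fail fast rather than block on www.w3.org.
        if (systemId != null
                && (systemId.startsWith("http://") || systemId.startsWith("https://"))) {
            throw new SAXException("No local copy of PUBLIC " + publicId
                    + " SYSTEM " + systemId + "; refusing network fetch");
        }
        // 3. Anything else (file:, jar:, ...): null means "use default resolution".
        return null;
    }
}
```

It is installed with `xmlReader.setEntityResolver(resolver)`. Because SAX parsers report system IDs already made absolute, relative module references inside a locally served DTD arrive as `file:` URIs and take branch 3.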

#567210#73
Date: 2014-01-28 13:28:03 UTC
Hi Michael,

Thanks for your previous reply.
I checked with tcpdump and Wireshark that requests are made to
www.w3.org [0]. Hence the failure without network.
Could you please elaborate how to patch Saxonb 9.1.0.8 to redirect
those requests to a local copy held within Saxon itself?

You can find previous discussion here:
http://bugs.debian.org/567210

[0] http://paste.debian.net/78819/

Thanks & Regards,
Eugene

#567210#78
Date: 2014-01-28 14:25:16 UTC
Saxon-B 9.1 does not include copies of these resources.

You can always write a URIResolver and direct the request to copies held at application level, but it can't be done "behind the scenes".

My recommendation would be to move forward to a later Saxon release that fixes the problem. The current release is 9.5. We have no plans to issue further maintenance releases for 9.1, although we do appreciate that some users have been sticking with that release because of the discontinuities introduced between 9.1 and 9.2.

Michael Kay
Saxonica
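A URIResolver along the lines suggested above could look roughly like this JAXP sketch. The class name `OfflineUriResolver` is hypothetical. It resolves the `href` passed in against the caller's base URI and returns a `SAXSource` whose `XMLReader` carries an application-supplied `EntityResolver`, so DTD lookups for the fetched document go through application code. It assumes that Saxon uses the `XMLReader` carried by a returned `SAXSource` and routes `doc()`/`doc-available()` through the registered URIResolver; both should be checked against the Saxon version in use.

```java
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/**
 * Hypothetical URIResolver: each requested document is parsed by a fresh
 * XMLReader that uses the supplied EntityResolver for DTD lookups.
 */
class OfflineUriResolver implements URIResolver {
    private final EntityResolver dtdResolver;

    OfflineUriResolver(EntityResolver dtdResolver) {
        this.dtdResolver = dtdResolver;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve the (possibly relative) href against the caller's base URI.
            URI absolute = (base == null || base.isEmpty())
                    ? new URI(href)
                    : new URI(base).resolve(href);
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // XSLT/XPath need namespace-aware parsing
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setEntityResolver(dtdResolver); // DTD requests go here, not to the network
            return new SAXSource(reader, new InputSource(absolute.toString()));
        } catch (URISyntaxException | IllegalArgumentException
                | ParserConfigurationException | SAXException e) {
            throw new TransformerException(e);
        }
    }
}
```

It can be registered with `transformerFactory.setURIResolver(...)` or `transformer.setURIResolver(...)`. To name it via Saxon's `-r` command-line option instead, it would need a public no-argument constructor, which this sketch does not have.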

#567210#83
Date: 2014-01-29 08:28:42 UTC
We have Saxon-HE 9.4.0.7 in the Debian archive, so I tried the above
test-case with it:
$ java -cp /etc/xml/resolver:/usr/share/java/xml-resolver.jar:/usr/share/java/Saxon-HE.jar \
  -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 \
  net.sf.saxon.Transform -s:foo.xml -xsl:foo.xsl

The result is it still fails without network. With network it works.
Also, when I look into the source code of Saxon-HE 9.4.0.7 at [0], I
cannot find the local copies of those resources. So I don't understand
how it would work without the network. What did I miss?

[0] https://dev.saxonica.com/repos/archive/opensource/tags/9.4.0.7/

Eugene

#567210#88
Date: 2014-01-29 09:00:29 UTC
If you use the -t option on the command line, then attempts to use local copies of W3C DTDs will be traced on System.err. Hopefully this will shed more light on why the mechanism isn't working for you.

The EntityResolver that Saxon uses in 9.4 can be found here:

https://dev.saxonica.com/repos/archive/opensource/tags/9.4.0.7/hej/net/sf/saxon/lib/StandardEntityResolver.java

I'm not sure why the data files aren't included under the 9.4.0.7 Subversion tag, but the files are here:

https://dev.saxonica.com/repos/archive/opensource/latest9.4/data/w3c/

I note that your JAR file has been renamed, so it's possible it has also been rebuilt. Look inside it with a ZIP utility and check for the directory named "w3c".

A list of the W3C documents bundled with Saxon for 9.5 can also be found here:

http://www.saxonica.com/documentation/index.html#!sourcedocs/w3c-dtds

and the corresponding list for 9.4 is at:

http://www.saxonica.com/documentation9.4-demo/index.html#!sourcedocs/w3c-dtds

Michael Kay
Saxonica

#567210#93
Date: 2014-01-29 12:41:13 UTC
option, when the test succeeds, you can see Saxon fetching a local
copy, but that doesn't seem to be the case without network.

[0] http://sourceforge.net/projects/saxon/files/Saxon-HE/9.4/SaxonHE9-4-0-7J.zip/download

Eugene

#567210#98
Date: 2014-01-29 13:01:28 UTC
OK, so the problem seems to be here:

Cannot read xhtml11/xhtml-inlpres-1.mod file

and the reason would appear to be the absence of the w3c/ prefix on the file name.

This takes us to here:

https://saxonica.plan.io/boards/3/topics/5625

and that in turn leads me to

https://saxonica.plan.io/issues/1813

which I think is fixed in the 9.5 branch but not in 9.4.

The underlying cause is inconsistent use of system IDs and public IDs in the W3C-published DTDs.

Michael Kay
Saxonica

#567210#103
Date: 2014-01-29 13:25:08 UTC
I tried with the latest 9.4 and it wasn't the case anymore, but the main
problem persisted.
network, but now it gives some valuable output:
[...]
Warning: SXXP0005: The source document is in namespace
http://www.w3.org/2005/Atom, but none of the
  template rules match elements in this namespace
[...]
Saxon does not have a local copy of PUBLIC -//W3C//DTD XHTML+RDFa
1.0//EN SYSTEM http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd
[...]

[0] http://sourceforge.net/projects/saxon/files/Saxon-HE/9.5/SaxonHE9-5-1-4J.zip/download

#567210#108
Date: 2014-01-29 14:25:00 UTC
You can ignore that warning for present purposes.

Unfortunately there is no complete list of DTDs on the W3C site that might potentially be needed, and even if there were, I probably wouldn't want to ship them all with Saxon. So you might have to go back to using catalogs. On the other hand, if you can identify where this was referenced from, I can take a look and see if it ought to be included. It looks as if it comes from one of the XHTML variants, but there seem to be many of these in use.

Michael Kay
Saxonica

#567210#113
Date: 2014-01-30 11:49:57 UTC
with the latest Saxon-HE from [0]:
$ java -cp /etc/xml/resolver:/usr/share/java/xml-resolver.jar:../saxon9he.jar \
  -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 \
  net.sf.saxon.Transform -t \
  -x:org.apache.xml.resolver.tools.ResolvingXMLReader \
  -y:org.apache.xml.resolver.tools.ResolvingXMLReader \
  -r:org.apache.xml.resolver.tools.CatalogResolver -s:foo.xml \
  -xsl:foo.xsl

It fails immediately without network, but hangs and succeeds with
network. "Saxon does not have a local copy of PUBLIC -//W3C//DTD
XHTML+RDFa" is not shown anymore! With tcpdump and wireshark I see a
request:
GET /MarkUp/DTD/xhtml-rdfa-1.dtd HTTP/1.1
User-Agent: Java/1.6.0_27
Host: www.w3.org
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive

As you say in the previous letter there is no complete list of DTDs on
the W3C site that might potentially be needed, and even if there were,
Saxon would not ship them all.
Should we conclude then this is a corner test-case and is not
supported by Saxon offline?

[0] http://sourceforge.net/projects/saxon/files/Saxon-HE/9.5/SaxonHE9-5-1-4J.zip/download

Eugene

#567210#118
Date: 2014-01-31 08:07:34 UTC
I just wanted to amend this test-case with more-or-less full HTTP
request/response chain from tcpdump: http://paste.debian.net/79423/

#567210#123
Date: 2014-02-06 13:25:07 UTC
Hello,

A fresh look/opinion is welcome! Could someone please take a fresh
look at bug #567210?
After a closer look at my last test results (described in my last two
mails to the bug), I tend to think the problem lies in Saxon-{B|HE}.

Any comments are very much appreciated,
Eugene