- Package:
- libsaxonb-java
- Source:
- saxonb
- Submitter:
- "brian m. carlson"
- Date:
- 2014-11-14 17:18:04 UTC
- Severity:
- normal
In certain cases, the doc-available function wrongly returns false when the document does, in fact, exist. The attached testcase contains three files. Running "saxonb-xslt -s:foo.xml -xsl:foo.xsl" produces the output:

  Requested document is file:/tmp/saxonb-testcase/index.xhtml Missing‽
  Requested document is index.xhtml Missing‽
  Requested document is file:/tmp/saxonb-testcase/index.xhtml Missing‽
  Requested document is index.xhtml Missing‽

Note that if Java networking works, it works correctly:

  Requested document is file:/tmp/saxonb-testcase/index.xhtml It exists!
  Requested document is index.xhtml It exists!
  Requested document is file:/tmp/saxonb-testcase/index.xhtml It exists!
  Requested document is index.xhtml It exists!

Nevertheless, there is no reason to require networking for file: URIs. Saxon B should work correctly for file: URIs regardless of the state of networking, period. There is no legitimate reason to make a network request when all documents are local.

Upon further examination, it appears that Saxon is attempting to do a lookup of the DTD on www.w3.org, even when -expand:off and -dtd:off are specified. It does not use catalogs to do so, and therefore fails.
I forgot to include the testcase, so it is now attached. Also, since I'm really only using Saxon-B via Ant, if a new feature is added to resolve this problem, I really need it to be accessible via the Ant task.
Hi Brian, thanks for your report. I am afraid that we need someone to implement the feature. Please note that I have just uploaded a newer upstream version 9.1.0.8 of saxonb. Sorry, Torsten
I'm not using saxonb via ant right now, so the ant task is not critical for me anymore. Do you still need someone to fix the underlying bug (doc-available returns false for file URIs) or just the ant task? If it's the former, I'll try to look into it sometime this week to see if I can fix it.
That would be nice. Thanks, Torsten
tags 567210 important
thanks

Technically this renders the package unusable if you do not have an internet connection, right? Shouldn't this be marked 'grave' then?
tags 567210 patch
severity 567210 grave
thanks

Instead of:

  $ cat /usr/bin/saxon-xslt
  #!/bin/sh
  exec java -classpath /usr/share/java/saxon.jar com.icl.saxon.StyleSheet "$@"

it should read:

  #!/bin/sh
  java -cp /etc/xml/resolver:/usr/share/java/xslthl.jar:/usr/share/java/xml-resolver.jar:/usr/share/java/saxonb.jar net.sf.saxon.Transform -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader -r org.apache.xml.resolver.tools.CatalogResolver "$@"
Hello, The patch/different arguments for invoking saxonb-xslt attached to this bug report don't appear to work with the test case provided by Brian Carlson and the current version of libsaxonb-java in the archive in jessie/sid. Or at least it's not working for me; it hangs after the first "Requested document is file:/path/to/index.xhtml" and I can confirm via wireshark that it is phoning home to www.w3.org. I also tried specifying the Apache resolver (as in [0], but with the paths updated to match Debian's libxml-commons-resolver1.1-java JAR, etc.), and it hangs in the same way. Can someone demonstrate test cases that show that this bug is addressed? Thank you, tony [0] https://bugs.launchpad.net/ubuntu/+source/saxonb/+bug/400277
Dear submitter, Could you please confirm whether the patch proposed at http://bugs.debian.org/567210#39 solves the issue for you? Thanks much
It does not appear to solve the problem. It still does not work when networking is disabled, and it takes a long time and hangs even when networking is enabled (probably because of rate-limiting on the W3C's server). With networking disabled:

  vauxhall ok % cat saxonb-xslt
  #!/bin/sh
  java -cp /etc/xml/resolver:/usr/share/java/xslthl.jar:/usr/share/java/xml-resolver.jar:/usr/share/java/saxonb.jar net.sf.saxon.Transform -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader -r org.apache.xml.resolver.tools.CatalogResolver "$@"
  vauxhall no % ./saxonb-xslt -s:foo.xml -xsl:foo.xsl
  Requested document is file:/tmp/saxonb-testcase/index.xhtml Missing‽
  Requested document is index.xhtml Missing‽
  Requested document is file:/tmp/saxonb-testcase/index.xhtml Missing‽
  Requested document is index.xhtml Missing‽
Dear Mike, I am trying to solve an issue reported against the Debian saxonb 9.1.0.8 package. The full report is at: http://bugs.debian.org/567210

In summary, the documentation at http://www.saxonica.com/documentation/sourcedocs/xml-catalogs.html does not seem to apply when used within a Debian installation. Indeed, if one downloads the testcase from http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=567210#10 and runs:

  $ java -cp /etc/xml/resolver:/usr/share/java/xml-resolver.jar:/usr/share/java/saxonb.jar -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 net.sf.saxon.Transform -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader -r org.apache.xml.resolver.tools.CatalogResolver -s:foo.xml -xsl:foo.xsl

I always get an error without network; however, it seems to be doing something else when the network is up. I have not been able to track down what is actually missing during this XSLT transform. The file /etc/xml/resolver/CatalogManager.properties comes from the libxml-commons-resolver1.1-java package. Could you please let me know if there is a way to debug (log) what is actually downloaded from the net that makes this test script fail? Thanks much in advance,
These can be tricky to debug. Are you seeing a trace of URI requests from the catalog resolver? If not, have you tried using a monitoring tool such as Wireshark? The normal cause of problems is that a source file contains a DTD or schema reference to files hosted on www.w3.org. These requests will fail immediately if there is no network, and they will usually fail after a 30 second delay if there is a network, because W3C (since a couple of years ago) has been throttling requests to serve these files. Recent versions of Saxon deal with this by automatically redirecting requests to a local copy held within Saxon itself, but with earlier releases (including 9.1.0.8) the redirection has to be done at user level. This is all several levels removed from Saxon itself: Saxon calls the XML parser (usually Xerces) to do the parsing, and requests for DTDs etc emanate from the parser, not from Saxon. So Saxon doesn't actually know what files are being requested. Regards, Michael Kay Saxonica
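A minimal sketch of what user-level redirection could look like at the parser level, assuming a plain JAXP parser rather than Saxon's own plumbing: a SAX EntityResolver that serves registered DTDs from local files and refuses to fetch anything else over HTTP. The class name LocalDtdResolver and its register/parse helpers are hypothetical, and wiring such a resolver into saxonb-xslt (for example through a custom XMLReader class passed with -x) is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical resolver: serves DTDs from local files, never from the network. */
class LocalDtdResolver implements EntityResolver {
    private final Map<String, Path> localCopies = new HashMap<>();

    /** Register a local file as the copy of the entity published at systemId. */
    void register(String systemId, Path localFile) {
        localCopies.put(systemId, localFile);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        Path local = localCopies.get(systemId);
        if (local == null) {
            if (systemId != null && systemId.startsWith("http")) {
                // Fail fast instead of letting the parser open a connection.
                throw new SAXException("No local copy registered for " + systemId);
            }
            return null; // default handling for file: and other local URIs
        }
        InputSource source = new InputSource(Files.newInputStream(local));
        source.setPublicId(publicId);
        // Keep the original system ID so relative references inside the DTD
        // resolve to absolute URIs that come back through this resolver.
        source.setSystemId(systemId);
        return source;
    }

    /** Parse XML text with this resolver installed on a plain JAXP parser. */
    Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(this);
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Throwing on unregistered HTTP system IDs is a design choice for this sketch: it turns a silent 30-second hang or an opaque "missing document" result into an immediate error that names the DTD being requested.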
Hi Michael, Thanks for your previous reply. I checked with tcpdump and Wireshark that requests are made to www.w3.org [0]. Hence the failure without network. Could you please elaborate on how to patch Saxonb 9.1.0.8 to redirect those requests to a local copy held within Saxon itself? You can find the previous discussion here: http://bugs.debian.org/567210 [0] http://paste.debian.net/78819/ Thanks & Regards, Eugene
Saxon-B 9.1 does not include copies of these resources. You can always write a URIResolver and direct the request to copies held at application level, but it can't be done "behind the scenes". My recommendation would be to move forward to a later Saxon release that fixes the problem. The current release is 9.5. We have no plans to issue further maintenance releases for 9.1, although we do appreciate that some users have been sticking with that release because of the discontinuities introduced between 9.1 and 9.2. Michael Kay Saxonica
We have Saxon-HE 9.4.0.7 in the Debian archive. So I tried the above test-case with it:

  $ java -cp /etc/xml/resolver:/usr/share/java/xml-resolver.jar:/usr/share/java/Saxon-HE.jar -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 net.sf.saxon.Transform -s:foo.xml -xsl:foo.xsl

The result is that it still fails without network. With network it works. Also, when I look into the source code of Saxon-HE 9.4.0.7 at [0], I cannot find the local copies of those resources. So I don't understand how it would work without the network. What did I miss?

[0] https://dev.saxonica.com/repos/archive/opensource/tags/9.4.0.7/

Eugene
If you use the -t option on the command line, then attempts to use local copies of W3C DTDs will be traced on System.err. Hopefully this will shed more light on why the mechanism isn't working for you. The EntityResolver that Saxon uses in 9.4 can be found here: https://dev.saxonica.com/repos/archive/opensource/tags/9.4.0.7/hej/net/sf/saxon/lib/StandardEntityResolver.java I'm not sure why the data files aren't included under the 9.4.0.7 Subversion tag, but the files are here: https://dev.saxonica.com/repos/archive/opensource/latest9.4/data/w3c/ I note that your JAR file has been renamed, so it's possible it has also been rebuilt. Look inside it with a ZIP utility and check for the directory named "w3c". A list of the W3C documents bundled with Saxon for 9.5 can also be found here: http://www.saxonica.com/documentation/index.html#!sourcedocs/w3c-dtds and the corresponding list for 9.4 is at: http://www.saxonica.com/documentation9.4-demo/index.html#!sourcedocs/w3c-dtds Michael Kay Saxonica
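The JAR check suggested above can also be scripted. The following is a small sketch, not part of Saxon: the class BundledDtdCheck and its method name are made up for illustration, and it only lists JAR entries that sit under a directory with the given name (for example "w3c"), wherever that directory appears in the archive.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: lists JAR entries located under a named directory. */
class BundledDtdCheck {
    /** Returns the names of all entries that have dir as a path segment. */
    static List<String> entriesUnder(String jarPath, String dir) throws IOException {
        List<String> found = new ArrayList<>();
        try (ZipFile jar = new ZipFile(jarPath)) {
            for (ZipEntry entry : Collections.list(jar.entries())) {
                String name = entry.getName();
                // Match "w3c/..." at the top level or ".../w3c/..." deeper down.
                if (name.startsWith(dir + "/") || name.contains("/" + dir + "/")) {
                    found.add(name);
                }
            }
        }
        return found;
    }
}
```

Calling BundledDtdCheck.entriesUnder("/usr/share/java/Saxon-HE.jar", "w3c") and getting an empty list would suggest that the rebuilt JAR does not carry the bundled W3C files; the JAR path here is the one used in the commands earlier in this report.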
option, when the test succeeds, you can see Saxon fetching a local copy, but that doesn't seem to be the case without network. [0] http://sourceforge.net/projects/saxon/files/Saxon-HE/9.4/SaxonHE9-4-0-7J.zip/download Eugene
OK, so the problem seems to be here: Cannot read xhtml11/xhtml-inlpres-1.mod file and the reason would appear to be the absence of the w3c/ prefix on the file name. This takes us to here: https://saxonica.plan.io/boards/3/topics/5625 and that in turn leads me to https://saxonica.plan.io/issues/1813 which I think is fixed in the 9.5 branch but not in 9.4. The underlying cause is inconsistent use of system IDs and public IDs in the W3C-published DTDs. Michael Kay Saxonica
I tried with the latest 9.4 and it wasn't the case anymore, but the main problem persisted. network, but now it gives some valuable output: [...] Warning: SXXP0005: The source document is in namespace http://www.w3.org/2005/Atom, but none of the template rules match elements in this namespace [...] Saxon does not have a local copy of PUBLIC -//W3C//DTD XHTML+RDFa 1.0//EN SYSTEM http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd [...] [0] http://sourceforge.net/projects/saxon/files/Saxon-HE/9.5/SaxonHE9-5-1-4J.zip/download
You can ignore that warning for present purposes. Unfortunately there is no complete list of DTDs on the W3C site that might potentially be needed, and even if there were, I probably wouldn't want to ship them all with Saxon. So you might have to go back to using catalogs. On the other hand, if you can identify where this was referenced from, I can take a look and see if it ought to be included. It looks as if it comes from one of the XHTML variants, but there seem to be many of these in use. Michael Kay Saxonica
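One way to find out which input file references a given DTD is to read each file's DOCTYPE declaration without fetching the DTD itself. The sketch below is an illustration only: the class DoctypeReport is hypothetical, and it relies on the Xerces-specific load-external-dtd feature, which the parser built into the JDK also recognises but which other parsers may reject.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical helper: reports the DOCTYPE identifiers of an XML document. */
class DoctypeReport {
    /** Returns "publicId | systemId" from the DOCTYPE, or null if there is none. */
    static String doctypeOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Xerces-specific feature: do not fetch the external DTD subset.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd",
                false);
        SAXParser parser = factory.newSAXParser();

        final String[] found = new String[1];
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                // Called once when the parser reaches the DOCTYPE declaration.
                found[0] = publicId + " | " + systemId;
            }
        };
        // DOCTYPE events are only delivered to a registered LexicalHandler.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return found[0];
    }
}
```

Running this over the source document, the stylesheet, and every file loaded through doc() or doc-available() should show which of them declares the XHTML+RDFa DOCTYPE.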
with the latest Saxon-HE from [0]:

  $ java -cp /etc/xml/resolver:/usr/share/java/xml-resolver.jar:../saxon9he.jar -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 net.sf.saxon.Transform -t -x:org.apache.xml.resolver.tools.ResolvingXMLReader -y:org.apache.xml.resolver.tools.ResolvingXMLReader -r:org.apache.xml.resolver.tools.CatalogResolver -s:foo.xml -xsl:foo.xsl

It fails immediately without network, but hangs and succeeds with network. "Saxon does not have a local copy of PUBLIC -//W3C//DTD XHTML+RDFa" is not shown anymore! With tcpdump and wireshark I see a request:

  GET /MarkUp/DTD/xhtml-rdfa-1.dtd HTTP/1.1
  User-Agent: Java/1.6.0_27
  Host: www.w3.org
  Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
  Connection: keep-alive

As you say in the previous letter, there is no complete list of DTDs on the W3C site that might potentially be needed, and even if there were, Saxon would not ship them all. Should we conclude, then, that this is a corner test-case that Saxon does not support offline?

[0] http://sourceforge.net/projects/saxon/files/Saxon-HE/9.5/SaxonHE9-5-1-4J.zip/download

Eugene
I just wanted to amend this test-case with more-or-less full HTTP request/response chain from tcpdump: http://paste.debian.net/79423/
Hello,
A fresh look/opinion is welcome! Could someone please take a fresh
look at bug #567210?
After a closer look at my last test results (described in my last two
mails to the bug), I tend to think the problem lies in Saxon-{B|HE}.
Any comments are very much appreciated,
Eugene