#670040 pdfannotextractor & pax do not work at all

Package:
texlive-latex-extra
Source:
texlive-extra
Submitter:
Juhapekka Tolvanen
Date:
2022-07-23 22:12:08 UTC
Severity:
important
Tags:
#670040#5
Date:
2012-04-22 13:27:57 UTC
From:
To:
% java -jar /usr/share/texlive/texmf-dist/scripts/pax/pax.jar kilinorganisaatio.pdf
Exception in thread "main" java.lang.NoClassDefFoundError: org/pdfbox/cos/ICOSVisitor
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2442)
        at java.lang.Class.getMethod0(Class.java:2685)
        at java.lang.Class.getMethod(Class.java:1620)
        at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:488)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:480)
Caused by: java.lang.ClassNotFoundException: org.pdfbox.cos.ICOSVisitor
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        ... 6 more


I tried that with three different PDF-files. One of them was my
Master's Thesis:

http://iki.fi/juhtolv/gradu/sosl70c.screen.pdf

My java-command points to this:

% whence -savc java
/usr/bin/java -> /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java

That file belongs to package called openjdk-7-jre-headless

#670040#10
Date:
2012-04-22 15:33:30 UTC
From:
To:
severity 670040 normal
stop

On 22.04.12 Juhapekka Tolvanen (juhtolv@iki.fi) wrote:

Hi,
This is overdone. Just b/c you found a single program in
texlive-latex-extra not working doesn't mean the package is completely useless.
AFAICT the problem at the moment is that the pdfbox package in Debian
unstable ships a pdfbox lib that is too recent for pax to use.
Please read
http://ftp3.gwdg.de/pub/ctan/macros/latex/contrib/pax/README (I'm
pretty sure it is packaged in Debian). Does this help you?

H.

#670040#17
Date:
2012-04-22 22:59:58 UTC
From:
To:
Dear Heiko,

we got a bug report in Debian that pax does not work, and the
reason is that pdfbox is much newer in Debian, namely 1.6.0 from
Apache:
http://pdfbox.apache.org/index.html
while pax uses the one from
	pdfbox.org
which is an empty WordPress site with a single download. I don't know
what the history of that is; maybe pdfbox was later taken over by Apache
and developed further.

I am quite sure that the pdfbox from pdfbox.org will not be
packaged in Debian.

Heiko, do you see any problem in porting your pax to the recent
pdfbox by apache?

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining            preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan                                 TeX Live & Debian Developer
DSA: 0x09C5B094   fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
HIBBING (n.)
The marks left on the outside breast pocket of a storekeeper's overall
where he has put away his pen and missed.
			--- Douglas Adams, The Meaning of Liff

#670040#22
Date:
2012-04-22 23:05:32 UTC
From:
To:
Dear Heiko,

(now to Heiko's correct email address)

we got a bug report in Debian that pax does not work, and the
reason is that pdfbox is much newer in Debian, namely 1.6.0 from
Apache:
http://pdfbox.apache.org/index.html
while pax uses the one from
	pdfbox.org
which is an empty WordPress site with a single download. I don't know
what the history of that is; maybe pdfbox was later taken over by Apache
and developed further.

I am quite sure that the pdfbox from pdfbox.org will not be
packaged in Debian.

Heiko, do you see any problem in porting your pax to the recent
pdfbox by apache?

Best wishes

Norbert

#670040#27
Date:
2012-04-23 06:23:57 UTC
From:
To:
Hi Norbert,

pax works with pdfbox 0.7.2 or 0.7.3.

The download page for pdfbox
http://pdfbox.apache.org/download.html
contains a section "Download old releases" and a link to
the old sourceforge page:
http://sourceforge.net/project/showfiles.php?group_id=78314
Both 0.7.2 and 0.7.3 can be downloaded there.

Currently I don't have time to update pax to the new pdfbox.
TL freeze is nearing and my bundle update is more important to me.
At least the update of README is on the way to CTAN.

Yours sincerely
  Heiko Oberdiek

#670040#32
Date:
2012-04-23 09:51:08 UTC
From:
To:
Hi Heiko,

thanks for the quick answer!

Thanks for all your great work!

Best wishes

Norbert
------------------------------------------------------------------------
Norbert Preining            preining@{jaist.ac.jp, logic.at, debian.org}
JAIST, Japan                                 TeX Live & Debian Developer
DSA: 0x09C5B094   fp: 14DF 2E6C 0307 BE6D AD76  A9C0 D2BF 4AA3 09C5 B094
------------------------------------------------------------------------
SPOFFORTH (vb.)
To tidy up a room before the cleaning lady arrives.
			--- Douglas Adams, The Meaning of Liff

#670040#43
Date:
2014-03-23 22:32:28 UTC
From:
To:
Dear Heiko,
<snip>
Any chance to have pax/pdfannotextractor working w/ recent pdfbox in
the future?

Thanks!

Regards,
  Hilmar

#670040#52
Date:
2016-08-30 11:44:26 UTC
From:
To:
Hi,

Please correct me if I'm wrong, but it seems to me that pdfannotextractor
does work fine provided that:

 1 - one runs pdfannotextractor --install once (which downloads pdfbox
     0.7.3 and saves it to ~/.texliveYYYY/..., so that it can be used
     later on).

 2 - one does not have the libpdfbox-java package installed, because in
     this situation the package from /usr/share/java/pdfbox.jar takes
     priority. For example we see:

    localhost ~ $ pdfannotextractor  --install
    PDFAnnotExtractor 0.1l, 2012/04/18 - Copyright (c) 2008, 2011, 2012 by Heiko Oberdiek.
    * Nothing to do, because PDFBox is already found:
      /usr/share/java/pdfbox.jar


IMO, the latter can be considered a problem, because it leads to the
error message encountered by the reporter. In a sense, the libpdfbox-java
package (which is pulled, e.g., by jabref) conflicts with that
functionality of texlive-latex-extra.

I suggest amending the Perl script
/usr/share/texlive/texmf-dist/scripts/pax/pdfannotextractor.pl so that it
deliberately skips a jar file that does *not* contain the class
org/pdfbox/cos/ICOSVisitor.class, as a hint that we're probably dealing
with a newer pdfbox version which pdfannotextractor doesn't grok yet.

This is essentially what the attached patch does.
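The check described above can be sketched roughly as follows. This is not the attached patch (which targets the Perl script); it is a minimal Python illustration, and the function names `has_legacy_pdfbox` and `pick_pdfbox_jar` are hypothetical. It assumes only that a jar is a zip archive and that the class named in the reporter's stack trace is a usable marker for the old pdfbox.

```python
import zipfile

# Class named in the reporter's stack trace. Using it as the marker
# for a pax-compatible (old) pdfbox jar is the assumption made here.
LEGACY_CLASS = "org/pdfbox/cos/ICOSVisitor.class"


def has_legacy_pdfbox(jar_path):
    """Return True if jar_path is a readable jar containing LEGACY_CLASS."""
    try:
        with zipfile.ZipFile(jar_path) as jar:
            return LEGACY_CLASS in jar.namelist()
    except (OSError, zipfile.BadZipFile):
        # Missing, unreadable, or not a zip archive: treat as unusable.
        return False


def pick_pdfbox_jar(candidates):
    """Return the first candidate jar that pax can use, or None."""
    for path in candidates:
        if has_legacy_pdfbox(path):
            return path
    return None
```

With a search order that lists /usr/share/java/pdfbox.jar first, a newer system jar would be skipped and the search would fall through to the next candidate instead of stopping at "already found".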

Cheers,

E.

#670040#62
Date:
2019-08-11 10:01:13 UTC
From:
To:
Dear Heiko,

Reminder for an old issue.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=670040
<snip>
Any chance to have pax/pdfannotextractor working w/ recent pdfbox in
the future?

Thanks!

Regards,
  Hilmar

#670040#67
Date:
2020-02-15 23:17:54 UTC
From:
To:
control: tags -1 + patch

Fixed here:
https://github.com/bastien-roucaries/latex-pax

The Java classpath is not fixed yet, but pax can be run by using:

java -cp /usr/share/texlive/texmf-dist/scripts/pax/pax.jar:/usr/share/java/pdfbox.jar:/usr/share/java/commons-logging.jar pax.PDFAnnotExtractor file.pdf
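
For scripting, the java -cp invocation in this message could be assembled along these lines. This is only a sketch: the jar paths and the main class are copied from the message and are specific to a Debian install, while the helper name `build_pax_command` is hypothetical. It presumably only helps with a pax.jar rebuilt from the fixed sources.

```python
# Paths and main class as quoted in this message (Debian-specific).
PAX_JAR = "/usr/share/texlive/texmf-dist/scripts/pax/pax.jar"
PDFBOX_JAR = "/usr/share/java/pdfbox.jar"
COMMONS_LOGGING_JAR = "/usr/share/java/commons-logging.jar"
MAIN_CLASS = "pax.PDFAnnotExtractor"


def build_pax_command(pdf_path, java="java"):
    """Return the argv list for running pax on pdf_path with an explicit classpath."""
    # ":" is the classpath separator on Unix-like systems, as in the message.
    classpath = ":".join([PAX_JAR, PDFBOX_JAR, COMMONS_LOGGING_JAR])
    return [java, "-cp", classpath, MAIN_CLASS, pdf_path]
```

The resulting list can be handed to `subprocess.run(build_pax_command("file.pdf"), check=True)`; passing a list rather than a shell string avoids quoting problems with unusual file names.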

#670040#74
Date:
2020-02-15 23:46:40 UTC
From:
To:
Hi Bastien,

Would you like to take over PAX as upstream? Heiko has left all behind
and most packages are maintained by others now.

Best

Norbert

#670040#79
Date:
2020-02-16 16:47:22 UTC
From:
To:
Maybe. How can I become upstream?

BTW, you will need to rebuild pax from source. And I have no idea how to
wrap Java on macOS and Windows.

#670040#84
Date:
2021-01-16 15:01:37 UTC
From:
To:
On 16.02.2020 at 17:47, Bastien ROUCARIES wrote:

Hi Norbert, hi Bastien,
This issue should have been solved by the patch for #951398, right? At
least I can use the pax jar file again.

Consider using Bastien's repository as upstream so we can drop parts of
the patch.

Hilmar

#670040#89
Date:
2022-07-23 22:10:16 UTC
From:
To:
On 22.04.2012 at 15:27, Juhapekka Tolvanen wrote:

Hi,
hille@sid-amd64:~$ java -jar /usr/share/texlive/texmf-dist/scripts/pax/pax.jar tps62120.pdf
* Processing file `tps62120.pdf' ...
hille@sid-amd64:~$ ls -ltr

This should have been solved years ago; we just forgot to close it, sorry!

Hilmar