#910280 pandoc: Please reduce the binary size

Package:
pandoc
Source:
pandoc
Description:
general markup converter
Submitter:
Горбешко Богдан
Date:
2023-12-20 14:06:06 UTC
Severity:
wishlist
Tags:
#910280#5
Date:
2018-10-04 11:02:59 UTC
From:
To:
Dear Maintainer,

the pandoc binary is extremely large. It's the largest file in my
/usr/bin, almost twice the size of even blender's binary.

In my experience, ghc is not good at making small binaries, and even
stripping doesn't do much. However, UPX does its job well on binaries
produced by ghc. I tried compressing pandoc in --best mode and it shrank
to 14% of its original size (from 141M to 20M); however, the compression
took more than an hour on my system.
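
As a quick check of the figures above, here is a minimal Python sketch of the arithmetic. The helper name `compression_summary` is made up for illustration; the only inputs are the two sizes quoted in the report (141M and 20M).

```python
def compression_summary(original_mb: float, compressed_mb: float) -> dict:
    """Summarise a size reduction as percentages and megabytes saved."""
    ratio = compressed_mb / original_mb
    return {
        "remaining_pct": round(ratio * 100, 1),    # size left, as % of original
        "saved_pct": round((1 - ratio) * 100, 1),  # size removed, as % of original
        "saved_mb": original_mb - compressed_mb,   # absolute saving
    }


# Sizes taken from the report: 141M before UPX, 20M after.
summary = compression_summary(141, 20)
print(summary)
```

So the "14%" refers to the size that remains after compression, not to the amount saved.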

If you are worried about the performance loss that UPXing may cause,
you can make pandoc a virtual package that points by default to a
non-compressed real package, while also offering a compressed real
package for those who care about disk space.

#910280#10
Date:
2018-10-04 16:31:55 UTC
From:
To:
I'm happy to leave this issue up to the Debian
maintainer, since the upx compression would need to
be done in the packaging process and doesn't involve
upstream.

I tested upx compression (with --best --ultra-brute)
on macOS.  Time for 'pandoc --version' went from 0.031s
without compression to 0.756s with compression.  This
is potentially an issue for some users of pandoc
(particularly people who shell out to pandoc to
convert small bits of text).  Offering a compressed
version as an option sounds like a good idea.
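
A rough way to repeat this kind of startup measurement is sketched below in Python. It is only an illustration: `time_command` is a made-up helper, and `pandoc-upx` is a hypothetical name for a UPX-compressed copy of the binary, not something any package ships. Binaries that are not on PATH are skipped.

```python
import shutil
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Return the fastest wall-clock time (seconds) over `runs` executions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Baseline: how long it takes to start a trivial process at all.
    baseline = time_command([sys.executable, "-c", "pass"])
    print(f"python baseline: {baseline:.3f}s")

    # "pandoc-upx" is a hypothetical name for a compressed copy.
    for name in ("pandoc", "pandoc-upx"):
        path = shutil.which(name)
        if path is None:
            print(f"{name}: not found on PATH, skipped")
            continue
        print(f"{name}: {time_command([path, '--version']):.3f}s")
```

With the numbers quoted above, the compressed binary would add roughly 0.7s to every invocation, which adds up quickly for callers that run pandoc once per small snippet.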

#910280#15
Date:
2018-10-04 17:33:57 UTC
From:
To:
Hi Горбешко,

Quoting Горбешко Богдан (2018-10-04 13:02:59)

I agree that the binary is big, but I disagree with shipping a
compressed binary - even as an alternative only.

The reason Pandoc is big is that it is statically linked.  If Blender
were statically linked with FFmpeg, Boost, Cairo, Mesa, GDAL, GTK+,
HDF4, HDF5, Lapack, etc., it would be much, much larger than Pandoc.

Providing a compressed binary will just shift the burden elsewhere, and
providing it as an alternative shifts the burden to the distribution
mirrors.

The proper solution here, I guess (but I am not an expert in Haskell, so
I may be wrong), is to switch to shared linking, so that 5 Haskell
binaries will not consume 5 x the disk space of the parts reused among
them.
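
The disk-space argument can be sketched with a small Python calculation. All sizes below are invented purely for illustration (they are not measurements of pandoc or of any Haskell library), and the function names are hypothetical.

```python
def static_total(n_binaries: int, shared_mb: float, own_mb: float) -> float:
    """Disk use when every binary carries its own copy of the shared code."""
    return n_binaries * (shared_mb + own_mb)


def dynamic_total(n_binaries: int, shared_mb: float, own_mb: float) -> float:
    """Disk use when the shared code is installed once as shared libraries."""
    return shared_mb + n_binaries * own_mb


# Assumed figures: 100 MB of reusable library code, 10 MB unique per binary.
static_mb = static_total(5, 100, 10)
dynamic_mb = dynamic_total(5, 100, 10)
print(f"static: {static_mb} MB, shared: {dynamic_mb} MB")
```

Under these assumed numbers, shared linking only pays off once more than one Haskell binary is installed; with a single binary both totals are the same.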


 - Jonas

#910280#22
Date:
2018-10-04 17:58:54 UTC
From:
To:
Jonas Smedegaard <jonas@jones.dk> writes:

Yes, in theory.  But this didn't work well in practice
when Arch Linux tried it.  It meant that installing
pandoc forced installation of a very large number of
dynamic libraries, and people really didn't like this.

https://www.reddit.com/r/haskell/comments/6jj8ha/whats_going_on_in_archlinux_pandoc_requires_1gb/

#910280#27
Date:
2018-10-04 22:50:55 UTC
From:
To:
Quoting John MacFarlane (2018-10-04 19:58:54)

Well, it seems they chose not to properly separate development code
from runtime code.

Try installing a single KDE program - e.g. Krita - on an otherwise
GNOME or Xfce system - that'll also pull in several hundred megabytes.
But then the next one will not.  I would expect similarly that
installing Pandoc as the _only_ application written in Haskell would
pull in maybe 50 MB or maybe 150 MB (but not 1GB when sensibly packaged)
and then installing another Haskell-based application would require far
less.

But I don't know.


 - Jonas

#910280#32
Date:
2018-10-05 00:28:17 UTC
From:
To:
[2018-10-05 00:50] Jonas Smedegaard <jonas@jones.dk>

We could generate packages with compressed binaries in a way similar to
*-dbg packages. All compiled languages except C (Go, Rust, Haskell)
would benefit, but it is quite a bit of work -- changes to debhelper and
reprepro, at least.

#910280#37
Date:
2018-10-05 07:33:34 UTC
From:
To:
Quoting KAction@gnu.org (2018-10-05 02:28:17)

We could, yes.

Personally I will not drive that effort, however.  I encourage you to
consider driving it yourself, KAction.


 - Jonas

#910280#42
Date:
2018-10-06 13:45:36 UTC
From:
To:
[2018-10-04 17:41] John MacFarlane <jgm@berkeley.edu>