#1082023 python3-fitz: PyMuPDF is not built with OCR support

Package:
python3-fitz
Source:
python3-fitz
Description:
Python binding for MuPDF
Submitter:
Alex Pyrgiotis
Date:
2026-06-15 11:35:01 UTC
Severity:
normal
Tags:
#1082023#5
Date:
2024-09-17 16:21:03 UTC
From:
To:
Dear Maintainer,

PyMuPDF has built-in OCR support, based on Tesseract. See:

https://pymupdf.readthedocs.io/en/latest/recipes-ocr.html

The python3-fitz package in Debian lacks this support though. Is there
any particular reason why this is the case? If not, is it possible to
enable it?
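One way to observe the missing support from Python is a small probe like the one below. It is a minimal sketch, not an official diagnostic: the helper name `probe_ocr_support` and its status strings are made up for illustration, and it assumes the module is importable as `fitz` and that `Page.get_textpage_ocr` is the entry point being tested. A failure can also be caused by missing Tesseract language data rather than by the build itself, so the exception text needs to be read to tell the two apart.

```python
def probe_ocr_support(language="eng"):
    """Return (status, detail) describing whether PyMuPDF can run OCR here.

    status is one of "no-pymupdf", "ocr-failed" or "ocr-ok" (hypothetical
    labels chosen for this sketch).
    """
    try:
        import fitz  # PyMuPDF, as shipped by the python3-fitz package
    except ImportError as exc:
        return ("no-pymupdf", str(exc))

    try:
        doc = fitz.open()        # new, empty in-memory PDF
        page = doc.new_page()    # one blank page is enough for the probe
        # full=True asks for OCR of the whole rendered page, so the call
        # actually reaches the OCR code path even on a blank page.
        page.get_textpage_ocr(language=language, dpi=72, full=True)
    except Exception as exc:
        # Covers both "built without OCR" and "language data not found";
        # the message in `detail` indicates which one it was.
        return ("ocr-failed", f"{type(exc).__name__}: {exc}")
    return ("ocr-ok", "")


if __name__ == "__main__":
    print(probe_ocr_support())
```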

Thanks,
Alex

#1082023#10
Date:
2026-06-10 22:02:53 UTC
From:
To:
Hi Alex, and the mupdf maintainers,

pymupdf's inability to use OCR seems to be caused by mupdf itself being
compiled without support for it, so I'm reassigning this issue to
src:mupdf.

I'm attaching a patch to enable OCR using Tesseract and Leptonica.
This would enable its use via pymupdf, amongst other things.

It would mean all mupdf users would fetch tesseract and leptonica, but
the data files aren't actually required.

Would you be keen to add these dependencies in order to support OCR? It
would enable a set of use cases, especially automatically running OCR
and adding a layer of invisible text on top of PDFs built by mupdf.
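As an illustration of that use case, the sketch below turns a scanned image into a one-page PDF carrying an OCR text layer. It assumes an OCR-enabled build and installed Tesseract language data, and relies on PyMuPDF's `Pixmap.pdfocr_tobytes`; the function name `ocr_image_to_searchable_pdf` and the file names are hypothetical.

```python
def ocr_image_to_searchable_pdf(image_path, output_path, language="eng"):
    """Write a 1-page PDF with an OCR text layer; return its size in bytes."""
    # Imported lazily so that merely defining the helper does not require
    # PyMuPDF to be installed.
    import fitz  # PyMuPDF

    pix = fitz.Pixmap(image_path)  # load the scanned image
    # Runs Tesseract on the pixmap and returns a PDF with a text layer.
    pdf_bytes = pix.pdfocr_tobytes(language=language)
    with open(output_path, "wb") as fh:
        fh.write(pdf_bytes)
    return len(pdf_bytes)
```

A call might look like `ocr_image_to_searchable_pdf("scan.png", "scan-ocr.pdf")`. On a build without OCR support the `pdfocr_tobytes` call is expected to raise instead of returning data.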

All the best,
Alexis

PS: This is my first time using the BTS, so I hope I did things right :-)
------ BEGIN PATCH ------

diff --git a/debian/control b/debian/control
index dff28c5..6c30d6c 100644
--- a/debian/control
+++ b/debian/control
@@ -19,9 +19,11 @@ Build-Depends: afdko-bin:native,
                 libjbig2dec0-dev (>= 0.20),
                 libjpeg-dev,
                 liblcms2-dev,
+               libleptonica-dev,
                 libmujs-dev (>= 1.3.8),
                 libopenjp2-7-dev,
                 libssl-dev,
+               libtesseract-dev,
                 libx11-dev,
                 libxext-dev,
                 libxrandr-dev,
diff --git a/debian/rules b/debian/rules
index a496f69..dfc9c0d 100755
--- a/debian/rules
+++ b/debian/rules
@@ -43,6 +43,10 @@ BUILD_FLAGS += USE_SYSTEM_MUJS=yes
  BUILD_FLAGS += USE_SYSTEM_LIBS=yes
  # Force using system lcms2 (was not included in default system library set)
  BUILD_FLAGS += USE_SYSTEM_LCMS2=yes
+# Enable Tesseract OCR support, linking against the system tesseract and
+# leptonica libraries. This appends "-tesseract" to the build suffix,
+# so keep LIB_DIR below in sync.
+BUILD_FLAGS += tesseract=yes USE_SYSTEM_TESSERACT=yes USE_SYSTEM_LEPTONICA=yes
  BUILD_FLAGS += LD=$(DEB_HOST_GNU_TYPE)-ld

  ifeq (,$(filter terse,$(DEB_BUILD_OPTIONS)))
@@ -73,7 +77,7 @@ override_dh_auto_install:
         install -m744 -T $(CURDIR)/debian/mupdf.sh $(CURDIR)/debian/tmp/usr/bin/mupdf
         sh debian/install_icons.sh
  ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
-       $(MAKE) -C $(CURDIR)/debian/tests test LIB_DIR=$(CURDIR)/build/shared-debug-Py_LIMITED_API_0x030d0000
+       $(MAKE) -C $(CURDIR)/debian/tests test LIB_DIR=$(CURDIR)/build/shared-debug-Py_LIMITED_API_0x030d0000-tesseract
  endif

  override_dh_gencontrol:
------ END PATCH ------