- Package:
- src:pandas
- Source:
- src:pandas
- Submitter:
- Bas Couwenberg
- Date:
- 2025-08-17 18:14:31 UTC
- Severity:
- normal
- Tags:
Dear Maintainer,

Your package FTBFS with python-xarray 2024.11.0 in unstable.

There are quite a few failures like this:

 >       tm.assert_frame_equal(result.to_dataframe(), expected)
 E       AssertionError: Attributes of DataFrame.iloc[:, 5] (column name="f") are different
 E
 E       Attribute "dtype" are different
 E       [left]:  CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=False, categories_dtype=object)
 E       [right]: object

 pandas/tests/generic/test_to_xarray.py:59: AssertionError

See the attached buildlog for details.

The autopkgtest likewise fails, see:

 https://ci.debian.net/packages/p/pandas/testing/amd64/55005061/

Kind Regards,

Bas
Not *obviously* known upstream. There are some type mismatches (like this one) and some shape mismatches that suggest different handling of duplicates in indexes. (There are also two other apparently unrelated issues that can make pandas FTBFS: test_spss_metadata, and parso not supporting Python 3.13.) I intend to investigate further.
These are testing conversion between DataFrame and xarray, and are probably failing because xarray now handles extension types in some places it previously didn't, but not everywhere - https://github.com/pydata/xarray/issues/9661 .

- "Attributes of DataFrame.iloc[:, 5] (column name="f") are different" - Categorical columns now keep that dtype where they previously lost it (which is actually an improvement, but the tests need to be told to expect it).
- "DataFrame.index classes are different" - However, extension-type indices now lose that type where they previously kept it.
- "DataFrame shape mismatch" - Possibly because of this, DataFrames with repeated index values now gain more rows on conversion to and from xarray, and the number suggests that n repeats becomes n^2 repeats.

(test_spss_metadata is a new item added in later pyreadstat, not an actual problem, so the fix is to change the reference to expect it: https://github.com/pandas-dev/pandas/pull/60109 . The parso failure no longer appears, presumably because it now supports Python 3.13.)
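The dtype failure quoted in the report can be illustrated without xarray at all. The following is only a sketch: `left` and `right` are hypothetical stand-ins for the two sides of the comparison (the real test obtains them through to_xarray() and to_dataframe()), and `frames_match` is a helper introduced here for illustration.

```python
import pandas as pd
import pandas.testing as tm

# Hypothetical stand-ins: "left" keeps the categorical dtype,
# "right" is the object-dtype column the old expectation assumed.
left = pd.DataFrame({"f": pd.Categorical(list("abcd"))})
right = pd.DataFrame({"f": pd.Series(list("abcd"), dtype=object)})


def frames_match(a, b):
    """Return True if assert_frame_equal accepts the pair, else False."""
    try:
        tm.assert_frame_equal(a, b)
    except AssertionError:
        return False
    return True


# The values agree, but the dtype attribute does not.
print(frames_match(left, right))
# Dropping the categorical dtype on the left makes the frames compare equal.
print(frames_match(left.astype(object), right))
```

This only shows that the mismatch is in the dtype attribute rather than in the values; which side a test should be adjusted on is a separate decision.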
After fixing the easy parts (see Salsa), there are 3 remaining issues:

- Datetime-with-timezone, categorical, python-string, and nullable indices lose this dtype (becoming object dtype, keeping their values; note that none of the nullable indices tested actually contain nulls) when going from pandas to xarray.
- For datetime-with-timezone indices (but not the other cases with the above issue), the values of the categorical column (but not all the columns) are lost, replaced by NaNs. As the xarray object does not display this column's values, I do not know which step this happens at. *Possibly* related to the exception when *all* columns are nullable reported in upstream https://github.com/pydata/xarray/issues/9661 .
- If the index contains duplicate values, each row gets repeated as many times as there are rows with that index value, when going from xarray to pandas. Not obviously known upstream.
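The row inflation described in the last item can be worked through with a small pure-Python model. This is a sketch of the observed arithmetic only, not of xarray's implementation; `repeat_rows` and `expected_row_count` are hypothetical helpers introduced for illustration.

```python
from collections import Counter


def expected_row_count(index_labels):
    """Rows after inflation: a label occurring n times contributes n * n rows."""
    counts = Counter(index_labels)
    return sum(n * n for n in counts.values())


def repeat_rows(rows):
    """Repeat each (label, value) row once per row sharing its label."""
    counts = Counter(label for label, _ in rows)
    inflated = []
    for label, value in rows:
        inflated.extend([(label, value)] * counts[label])
    return inflated


# Label "a" appears twice and "b" once, so 3 rows become 2*2 + 1*1 rows.
rows = [("a", 1), ("a", 2), ("b", 3)]
inflated = repeat_rows(rows)
print(len(rows), "->", len(inflated))
print(expected_row_count([label for label, _ in rows]))
```

Under this model a frame whose index values are all unique is unchanged, which matches the failures being limited to the tests that use duplicated index values.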
Hi, Looking at the latest uploads from Pandas, it seems this bug has been solved since 2.2.3+dfsg-6, which includes changelog entries about solving xarray incompatibilities. Is this really still a bug, or can it be closed?
didn't close this bug). As far as I'm aware, the status of this bug hasn't changed since the description earlier in this thread.
Hi Rebecca, Thanks for the reply. Maybe the bug severity can be downgraded to important, given that the package is building correctly?
severity 1088988 important
thanks

Hi, as mentioned in my previous update, I'm downgrading severity to important. The severity was serious because the package was failing to build from source. It's now building successfully, and thus this shouldn't be marked serious.