#1088988 pandas<->xarray data conversion issues

Package:
src:pandas
Source:
src:pandas
Submitter:
Bas Couwenberg
Date:
2025-08-17 18:14:31 UTC
Severity:
normal
Tags:
#1088988#5
Date:
2024-12-03 20:36:34 UTC
From:
To:
Dear Maintainer,

Your package FTBFS with python-xarray 2024.11.0 in unstable.

There are quite a few failures like this:

 >       tm.assert_frame_equal(result.to_dataframe(), expected)
 E       AssertionError: Attributes of DataFrame.iloc[:, 5] (column name="f") are different
 E
 E       Attribute "dtype" are different
 E       [left]:  CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=False, categories_dtype=object)
 E       [right]: object

 pandas/tests/generic/test_to_xarray.py:59: AssertionError
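The failing assertion can be reproduced without xarray. In the minimal sketch below, both frames are hypothetical stand-ins: `result` plays the role of `result.to_dataframe()` under newer xarray, with the categorical dtype kept, and `expected` plays the old reference with an object column. It uses the public `pandas.testing.assert_frame_equal` instead of the internal `tm` alias, and the helper `frames_match` is hypothetical. Casting the reference column to the result's dtype is one way to make a test expect the categorical dtype. It is not necessarily the fix that was applied to the package.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-ins for the test's frames.
result = pd.DataFrame({"f": pd.Categorical(list("abcd"))})  # categorical kept
expected = pd.DataFrame({"f": list("abcd")})                # old object reference


def frames_match(left, right):
    """Return True if assert_frame_equal passes, False if it raises."""
    try:
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


# Same values, but the dtypes differ (category vs object), so this fails.
print(frames_match(result, expected))  # False

# Telling the reference to expect the categorical dtype makes it pass.
expected["f"] = expected["f"].astype(result["f"].dtype)
print(frames_match(result, expected))  # True
```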

See the attached buildlog for details.

The autopkgtest likewise fails, see:

https://ci.debian.net/packages/p/pandas/testing/amd64/55005061/

Kind Regards,

Bas

#1088988#10
Date:
2024-12-03 22:42:46 UTC
From:
To:
Not *obviously* known upstream.  Some type mismatches (like this one),
some shape mismatches that suggest different handling of duplicates in
indexes.

(There are also two other, apparently unrelated, issues that can make pandas
FTBFS: test_spss_metadata, and parso not supporting Python 3.13.)

I intend to investigate further.

#1088988#19
Date:
2025-01-04 22:42:42 UTC
From:
To:
These are testing conversion between DataFrame and xarray, and are
probably failing because xarray now handles extension types in some
places it previously didn't, but not everywhere -
https://github.com/pydata/xarray/issues/9661 .
"Attributes of DataFrame.iloc[:, 5] (column name="f") are different" -
Categorical columns now keep that dtype where they previously lost it
(which is actually an improvement, but the tests need to be told to
expect it).
"DataFrame.index classes are different" - However, extension-type
indices now lose that type where they previously kept it.
"DataFrame shape mismatch" - Possibly because of this, DataFrames with
repeated index values now gain rows on conversion to and from xarray;
the row counts suggest that a value repeated n times comes back repeated
n^2 times.
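The n^2 arithmetic can be checked against an index before conversion. The minimal sketch below uses a hypothetical helper, `inflated_length`, that predicts the row count when every row is repeated once for each row sharing its label, which is the sum of the squared value counts. A self-join on the duplicated index gives the same count in plain pandas, so it works as a yardstick. Whether xarray does anything join-like internally is not established here.

```python
import pandas as pd


def inflated_length(index):
    """Predicted row count if each row is repeated once per row sharing its label."""
    counts = index.value_counts()  # occurrences of each label
    return int((counts ** 2).sum())


idx = pd.Index(["x", "x", "y", "z", "z", "z"])
print(len(idx))              # 6
print(inflated_length(idx))  # 2**2 + 1**2 + 3**2 = 14

# A many-to-many self-join on the duplicated index inflates rows the same way.
df = pd.DataFrame({"v": range(6)}, index=idx)
joined = df.join(df, lsuffix="_l", rsuffix="_r")
print(len(joined))           # 14
```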

(test_spss_metadata is a new item added in later pyreadstat, not an
actual problem, so the fix is to change the reference to expect it:
https://github.com/pandas-dev/pandas/pull/60109 .  The parso failure no
longer appears, presumably because it now supports Python 3.13.)

#1088988#24
Date:
2025-01-05 14:49:43 UTC
From:
To:
After fixing the easy parts (see Salsa), there are 3 remaining issues:

- Datetime-with-timezone, categorical, python-string, and nullable
indices lose their dtype (becoming object dtype, keeping their values;
note that none of the nullable indices tested actually contain nulls)
when going from pandas to xarray.
- For datetime-with-timezone indices (but not the other cases with the
above issue), the values of the categorical column (but not of the other
columns) are lost, replaced by NaNs.  As the xarray object does not
display this column's values, I do not know at which step this happens.
*Possibly* related to the exception when *all* columns are nullable
reported in upstream https://github.com/pydata/xarray/issues/9661 .
- If the index contains duplicate values, each row gets repeated as many
times as there are rows with that index value, when going from xarray to
pandas.  Not obviously known upstream.
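The first issue in the list above can be checked index kind by index kind. The minimal sketch below builds the four index types named there and compares each index dtype before and after a round trip passed in as a function. The helper `index_dtype_survives` and the stand-in `simulated_lossy_roundtrip` are hypothetical. The stand-in imitates the reported behaviour (values kept, index cast to object) so that the sketch runs without xarray.

```python
import pandas as pd


def index_dtype_survives(df, roundtrip):
    """True if roundtrip(df) keeps the index dtype of df."""
    # Original (extension) dtype on the left so its __eq__ handles the comparison.
    return df.index.dtype == roundtrip(df).index.dtype


# The four index kinds reported as losing their dtype.
indices = {
    "datetime-with-timezone": pd.date_range("2000-01-01", periods=3, tz="UTC"),
    "categorical": pd.CategoricalIndex(["a", "b", "c"]),
    "python-string": pd.Index(["a", "b", "c"], dtype="string[python]"),
    "nullable": pd.Index([1, 2, 3], dtype="Int64"),
}


def simulated_lossy_roundtrip(df):
    """Hypothetical stand-in: keeps the values but casts the index to object."""
    return df.set_axis(df.index.astype(object), axis=0)


for name, idx in indices.items():
    frame = pd.DataFrame({"v": [1.0, 2.0, 3.0]}, index=idx)
    print(name, index_dtype_survives(frame, simulated_lossy_roundtrip))
```

With xarray installed, the real conversion can be passed in instead, for example `lambda df: df.to_xarray().to_dataframe()`. Which dtypes survive will depend on the installed xarray and pandas versions.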

#1088988#33
Date:
2025-04-06 13:07:21 UTC
From:
To:
Hi,

Looking at the latest uploads of pandas, it seems this bug has been
fixed since 2.2.3+dfsg-6, which includes changelog entries about
resolving xarray incompatibilities.

Is this really still a bug, or can it be closed?

#1088988#38
Date:
2025-04-06 15:31:27 UTC
From:
To:
didn't close this bug).

As far as I'm aware, the status of this bug hasn't changed since the
description earlier in this thread.

#1088988#43
Date:
2025-04-06 15:53:32 UTC
From:
To:
Hi Rebecca,

Thanks for the reply. Maybe the bug severity can be downgraded to
important, given that the package is building correctly?

#1088988#48
Date:
2025-04-13 08:36:59 UTC
From:
To:
severity 1088988 important
thanks

Hi, as mentioned in my previous update, I'm downgrading severity to
important. The severity was serious because the package was failing to
build from source. It's now building successfully, and thus this
shouldn't be marked serious.