BUG: incorrect unstacking with NaNs in the index #7403

Closed
@jorisvandenbossche

Related to #7401, but a separate issue, I think. Some more strange behaviour with NaNs in the index when unstacking (this time not specific to datetime).

First case:

In [9]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ...:                    'B':range(8),
   ...:                    'C':range(8)})

In [10]: df.set_index(['A', 'B']).unstack(0)
Out[10]:
    C
A   a   b
B
0   0 NaN
1   1 NaN
2   2 NaN
3   3 NaN
4 NaN   4
5 NaN   5
6 NaN   6
7 NaN   7

In [11]: df.iloc[3,1] = np.nan

In [12]: df.set_index(['A', 'B']).unstack(0)
Out[12]:
      C
A     a   b
B
 0    3 NaN
 1    0 NaN
 2    1 NaN
NaN NaN NaN
 4  NaN   2
 5  NaN   4
 6  NaN   5
 7    6   7

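The values in the first column are totally mixed up (the 3 that belongs to the NaN row shows up at B = 0, and a 6 appears under a at B = 7), and the second column is off as well.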
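
For comparison, here is a minimal sketch of one way to see the layout you would expect, by replacing the NaN with a placeholder before building the index. The `SENTINEL` value of -1 is only an assumption for illustration (it must not already occur in `B`), and the frame is built with the NaN in place rather than assigned afterwards. This is a workaround sketch, not a fix for the unstacking itself.

```python
import numpy as np
import pandas as pd

# Same data as the first case, with the NaN already in column B.
df = pd.DataFrame({'A': list('aaaabbbb'),
                   'B': [0, 1, 2, np.nan, 4, 5, 6, 7],
                   'C': range(8)})

# Hypothetical placeholder; assumed not to be a real value of B.
SENTINEL = -1.0

# Replace the NaN so that no index level contains missing values,
# then unstack level 0 ('A') of the 'C' column.
filled = df.assign(B=df['B'].fillna(SENTINEL))
expected = filled.set_index(['A', 'B'])['C'].unstack(0)
print(expected)
```

With the placeholder in place, the row labelled -1 carries the value 3 under `a`, and every other value stays on its own `B` label.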

Second case (with repeating values in the second level):

In [13]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ....:                    'B':list(range(4))*2,
   ....:                    'C':range(8)})
In [14]: df
Out[14]:
   A  B  C
0  a  0  0
1  a  1  1
2  a  2  2
3  a  3  3
4  b  0  4
5  b  1  5
6  b  2  6
7  b  3  7

In [15]: df.set_index(['A', 'B']).unstack(0)
Out[15]:
   C
A  a  b
B
0  0  4
1  1  5
2  2  6
3  3  7

In [16]: df.iloc[2,1] = np.nan

In [17]: df.set_index(['A', 'B']).unstack(0)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-17-2f4735e48b98> in <module>()
----> 1 df.set_index(['A', 'B']).unstack(0)

...

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\reshape.pyc in _make_selectors(self)
    139
    140         if mask.sum() < len(self.index):
--> 141             raise ValueError('Index contains duplicate entries, '
    142                              'cannot reshape')
    143

ValueError: Index contains duplicate entries, cannot reshape
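
The message is misleading here, since the (A, B) pairs do not actually repeat. A quick sanity check on the columns before calling `set_index`, sketched below, shows this; the variable name `dupes` is just for illustration, and the frame is built with the NaN already in place.

```python
import numpy as np
import pandas as pd

# Same data as the second case, with NaN at row 2 of column B.
df = pd.DataFrame({'A': list('aaaabbbb'),
                   'B': [0, 1, np.nan, 3, 0, 1, 2, 3],
                   'C': range(8)})

# Flag every row whose (A, B) pair occurs more than once.
dupes = df.duplicated(subset=['A', 'B'], keep=False)

# No pair is repeated, so the index built from A and B has no duplicates.
print(dupes.any())
```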

And a different error message when the NaN is in the last position (of the sublevel):

In [20]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ....:                    'B':list(range(4))*2,
   ....:                    'C':range(8)})
In [21]: df.iloc[3,1] = np.nan

In [22]: df.set_index(['A', 'B']).unstack(0)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)

...

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\reshape.pyc in get_result(self)

    173                 values_indexer = com._ensure_int64(l[~mask])
    174                 for i, j in enumerate(values_indexer):
--> 175                     values[j] = orig_values[i]
    176             else:
    177                 index = index.take(self.unique_groups)

IndexError: index 4 is out of bounds for axis 0 with size 4

I know NaNs in the index are not really recommended, but I'm just exploring this (I was caught by such an issue, and you don't always think of checking whether you have NaNs when you get such errors).
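
For anyone hitting similar errors, a small helper along these lines can tell you whether an index level contains NaNs before you unstack. This is only a sketch: `levels_with_nan` is a hypothetical name, not a pandas function, and it relies only on `get_level_values` and `isna`.

```python
import numpy as np
import pandas as pd

def levels_with_nan(index):
    """Return the names (or positions, if unnamed) of levels holding missing values.

    Hypothetical helper for illustration; not part of pandas.
    """
    if isinstance(index, pd.MultiIndex):
        found = []
        for pos, name in enumerate(index.names):
            # get_level_values returns the full-length labels for one level.
            if index.get_level_values(pos).isna().any():
                found.append(name if name is not None else pos)
        return found
    # Flat index: report its single name if any label is missing.
    return [index.name] if index.isna().any() else []

df = pd.DataFrame({'A': list('aaaabbbb'),
                   'B': [0, 1, 2, np.nan, 4, 5, 6, 7],
                   'C': range(8)})

indexed = df.set_index(['A', 'B'])
print(levels_with_nan(indexed.index))
```

An empty list means no level has missing values; anything else names the levels worth inspecting before reshaping.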

Labels: Bug, Missing-data, Reshaping
