Related to #7401, but I think this is a separate issue. There is more strange behaviour with NaNs in the index when unstacking (this time not specific to datetimes).
First case:
In [9]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ...:                    'B': range(8),
   ...:                    'C': range(8)})
In [10]: df.set_index(['A', 'B']).unstack(0)
Out[10]:
     C
A    a    b
B
0    0  NaN
1    1  NaN
2    2  NaN
3    3  NaN
4  NaN    4
5  NaN    5
6  NaN    6
7  NaN    7
In [11]: df.iloc[3,1] = np.nan
In [12]: df.set_index(['A', 'B']).unstack(0)
Out[12]:
       C
A      a    b
B
0      3  NaN
1      0  NaN
2      1  NaN
NaN  NaN  NaN
4    NaN    2
5    NaN    4
6    NaN    5
7      6    7
The values in the first column are totally mixed up.
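One way to spot this kind of scrambling without eyeballing the output is to compare every source row against the cell it should land in after unstacking. The sketch below is only an illustration: `mismatched_cells` is a hypothetical helper (not part of pandas), it assumes the two index columns uniquely identify each row, and it skips rows whose inner label is NaN.

```python
import numpy as np
import pandas as pd


def mismatched_cells(df, outer, inner, value):
    """Hypothetical helper: list (outer, inner, value) rows that do not
    end up in the expected cell after unstacking the outer level."""
    wide = df.set_index([outer, inner])[value].unstack(0)
    bad = []
    for a, b, c in zip(df[outer], df[inner], df[value]):
        if pd.isna(b):
            # Rows with a NaN inner label are skipped in this sketch.
            continue
        # NaN in the cell compares unequal to c, so a lost value is flagged too.
        if wide.loc[b, a] != c:
            bad.append((a, b, c))
    return bad


# Same data as the first case, with B built as floats so it can hold NaN.
df = pd.DataFrame({'A': list('aaaabbbb'),
                   'B': [0.0, 1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0],
                   'C': range(8)})
print(mismatched_cells(df, 'A', 'B', 'C'))
```

An empty list means every checked row landed where it should. With output like `Out[12]` above, the rows for `a` would be reported.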
Second case (with repeating values in the second level):
In [13]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ....:                    'B': list(range(4))*2,
   ....:                    'C': range(8)})
In [14]: df
Out[14]:
   A  B  C
0  a  0  0
1  a  1  1
2  a  2  2
3  a  3  3
4  b  0  4
5  b  1  5
6  b  2  6
7  b  3  7
In [15]: df.set_index(['A', 'B']).unstack(0)
Out[15]:
   C
A  a  b
B
0  0  4
1  1  5
2  2  6
3  3  7
In [16]: df.iloc[2,1] = np.nan
In [17]: df.set_index(['A', 'B']).unstack(0)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-2f4735e48b98> in <module>()
----> 1 df.set_index(['A', 'B']).unstack(0)
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\reshape.pyc in _make_selectors(self)
139
140 if mask.sum() < len(self.index):
--> 141 raise ValueError('Index contains duplicate entries, '
142 'cannot reshape')
143
ValueError: Index contains duplicate entries, cannot reshape
A different error is raised when the NaN is in the last position (of the sublevel):
In [20]: df = pd.DataFrame({'A': list('aaaabbbb'),
   ....:                    'B': list(range(4))*2,
   ....:                    'C': range(8)})
In [21]: df.iloc[3,1] = np.nan
In [22]: df.set_index(['A', 'B']).unstack(0)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
...
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\reshape.pyc in get_result(self)
173 values_indexer = com._ensure_int64(l[~mask])
174 for i, j in enumerate(values_indexer):
--> 175 values[j] = orig_values[i]
176 else:
177 index = index.take(self.unique_groups)
IndexError: index 4 is out of bounds for axis 0 with size 4
I know that NaNs in the index are not really recommended; I'm just exploring this because I was caught by such an issue. When you get errors like these, you don't always think to check whether the index contains NaNs.
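Since neither error message mentions NaNs, a quick check of the index before calling `unstack` can save some head-scratching. The snippet below is a minimal sketch of such a check; `levels_with_nan` is a hypothetical helper name, not a pandas function, and it only relies on `get_level_values` and `isna`.

```python
import numpy as np
import pandas as pd


def levels_with_nan(index):
    """Hypothetical helper: return the names (or positions, for unnamed
    levels) of the index levels that contain missing values."""
    if isinstance(index, pd.MultiIndex):
        found = []
        for pos in range(index.nlevels):
            if index.get_level_values(pos).isna().any():
                name = index.names[pos]
                found.append(name if name is not None else pos)
        return found
    # Plain (single-level) index.
    return [index.name] if index.isna().any() else []


# Data from the last example, with the NaN in the last position of the sublevel.
df = pd.DataFrame({'A': list('aaaabbbb'),
                   'B': [0.0, 1.0, 2.0, np.nan, 0.0, 1.0, 2.0, 3.0],
                   'C': range(8)})
indexed = df.set_index(['A', 'B'])
print(levels_with_nan(indexed.index))  # ['B']
```

If the returned list is not empty, the NaN labels are a likely suspect for the "duplicate entries" or out-of-bounds errors shown above.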