Description
The value "-nan" (yes, this is a valid type of NaN!) in a CSV input file causes that column to be treated as 'object' instead of float64.
In [15]: pd.read_csv(StringIO.StringIO('a,b\n1,2.0\n2,nan\n3,-nan')).b
Out[15]:
0 2.0
1 NaN
2 -nan
Name: b, dtype: object
In [16]: pd.read_csv(StringIO.StringIO('a,b\n1,2.0\n2,nan\n')).b
Out[16]:
0 2
1 NaN
Name: b, dtype: float64
When the file is sufficiently large, the following warning is emitted:
In [3]: pd.read_csv('big.bad.csv')
/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py:1033: DtypeWarning: Columns (58,64) have mixed types. Specify dtype option on import or set low_memory=False.
data = self._reader.read(nrows)
If the string "-nan" is replaced with "nan", all is well. I don't need to distinguish negative NaN from NaN, but I would like to be able to read my data files without having to pre-process them to scrub every "-nan".
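
A possible workaround in the meantime is to list "-nan" explicitly as a missing-value marker through the `na_values` parameter of `read_csv`. The sketch below is written for Python 3 (where `StringIO` lives in `io`) and has not been checked against the pandas version shown in the warning above; the names `csv_text`, `b_dtype` and `n_missing` are only illustrative.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the report, including the problematic "-nan".
csv_text = 'a,b\n1,2.0\n2,nan\n3,-nan\n'

# Treat "-nan" as a missing value in addition to the default NA strings.
df = pd.read_csv(StringIO(csv_text), na_values=['-nan'])

b_dtype = str(df.b.dtype)          # dtype of column b after parsing
n_missing = int(df.b.isna().sum()) # number of values parsed as missing

print(b_dtype, n_missing)
```

If this behaves as expected, column b is parsed as float64 with both "nan" and "-nan" read as missing, so no pre-processing of the file is needed. The sign of the NaN is not preserved, which is fine for the use case described here.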