BUG in clipboard (linux, python2) with unicode and separator #13747

Closed
@pijucha

Description

This is probably a known bug, but I couldn't find a GitHub issue for it.

There is a disabled test in test_clipboard.py that fails with the following error:

======================================================================
FAIL: test_round_trip_frame_sep (pandas.io.tests.test_clipboard.TestClipboard)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/tests/test_clipboard.py", line 73, in test_round_trip_frame_sep
    self.check_round_trip_frame(dt, sep=',')
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/io/tests/test_clipboard.py", line 69, in check_round_trip_frame
    tm.assert_frame_equal(data, result, check_dtype=False)
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/util/testing.py", line 1276, in assert_frame_equal
    right.columns))
  File "/home/users/piotr/workspace/pandas-pijucha/pandas/util/testing.py", line 1022, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame are different

DataFrame shape (number of columns) are different
[left]:  2, Index([u'en', u'es'], dtype='object')
[right]: 0, Index([], dtype='object')

Code Sample, a copy-pastable example if possible

More explicitly (the example from the above test):

import pandas as pd
from pandas import read_clipboard

nonascii = pd.DataFrame({'en': 'in English'.split(), 'es': 'en español'.split()})

nonascii.to_clipboard(sep=',')

read_clipboard(sep=',', index_col=0)
Out[154]: 
Empty DataFrame
Columns: []
Index: [0       in       en, 1  English  español]

read_clipboard()
Out[155]: 
        en       es
0       in       en
1  English  español

Expected Output

read_clipboard(sep=',', index_col=0)
Out[134]: 
        en       es
0       in       en
1  English  español

read_clipboard()
Out[135]: 
              ,en,es
0            0,in,en
1  1,English,español

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.20-1
machine: x86_64
processor: Intel(R)_Core(TM)_i5-2520M_CPU_@_2.50GHz
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.18.1+240.gbb6b5e5
nose: 1.3.7
pip: 8.1.2
setuptools: 21.2.0
Cython: 0.24.1
numpy: 1.11.0

There are probably 2 issues in the code.

  1. `.encode('utf-8')` is called on a py2 string, which raises if the string contains a non-ASCII character, and then
  2. to_clipboard falls back to the to_string method.
    (In this case, fixing 1 solves the problem. But in general, if something else raises and we fall back here, the separator is ignored.)
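
The failure in point 1 can be reproduced without a clipboard. On Python 2, calling `.encode('utf-8')` on a `str` (a byte string) first decodes it implicitly with the default ASCII codec, and that step is what raises. The sketch below imitates that behaviour so it also runs on Python 3; `encode_like_py2` and `py2_str` are hypothetical names used only for illustration and are not part of pandas.

```python
# The UTF-8 bytes that a Python 2 `str` would hold for the text 'en español'.
py2_str = u'en espa\xf1ol'.encode('utf-8')


def encode_like_py2(byte_string, encoding='utf-8'):
    """Imitate Python 2's str.encode(): implicit ASCII decode, then encode.

    Hypothetical helper for illustration only.
    """
    return byte_string.decode('ascii').encode(encoding)


try:
    encode_like_py2(py2_str)
    failed = False
except UnicodeDecodeError:
    # The non-ASCII byte 0xc3 cannot be decoded as ASCII.
    failed = True
```

A pure-ASCII byte string such as `b'in English'` passes through unchanged, which matches the observation that the bug only shows up with non-ASCII data.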

I don't know what to do about 2, but 1 seems to be easy to fix.
The relevant code in util.clipboard.py calls subprocess.Popen.communicate(), which operates on byte types (bytes in PY3, str in PY2). So encode/decode are needed only in PY3.
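
A minimal sketch of that idea, assuming the goal is just to hand `Popen.communicate()` a byte string on either Python version. It is not the code from the commit mentioned below, and it checks the value's type rather than the interpreter version. The names `to_pipe_bytes` and `round_trip` are hypothetical, and a child Python process that echoes stdin stands in for a real clipboard program.

```python
import subprocess
import sys

# Child script that copies stdin to stdout as raw bytes on both PY2 and PY3.
_ECHO = (
    "import sys;"
    "src = getattr(sys.stdin, 'buffer', sys.stdin);"
    "dst = getattr(sys.stdout, 'buffer', sys.stdout);"
    "dst.write(src.read())"
)


def to_pipe_bytes(text, encoding='utf-8'):
    """Return `text` as the byte type that Popen.communicate() expects.

    A PY2 `str` (or PY3 `bytes`) is already bytes and is returned untouched;
    only text strings are encoded.
    """
    if isinstance(text, bytes):
        return text
    return text.encode(encoding)


def round_trip(text):
    """Send `text` through a child process and return the bytes it echoes."""
    proc = subprocess.Popen([sys.executable, '-c', _ECHO],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(input=to_pipe_bytes(text))
    return out
```

With this shape, a CSV payload containing `español` reaches the child process intact whether it starts out as text or as already-encoded bytes.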

I believe commit 6d4fdb0 fixes the problem. So far I have tested only one pair of copy/paste functions (in KDE), and I have no way to test it on OS X.

Labels: Bug, Compat, IO Data, Unicode