
API/ENH: Accept nan-likes in StringArray constructor #40839

Closed
@lithomas1

Description


Is your feature request related to a problem?

Currently, StringArray can only be instantiated directly from an ndarray containing strings or missing values represented by pd.NA. The only way to create a StringArray with other missing-value indicators (like np.nan and None) is to use pandas.array, which has the side effect of casting non-string elements to strings instead of raising an error.
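For illustration, here is a minimal sketch of the two construction paths described above. It only shows the parts that should hold regardless of pandas version: the direct constructor accepting pd.NA, and pandas.array turning np.nan and None into missing values. Whether the direct constructor rejects np.nan/None, and how non-strings are handled, depends on the pandas version, so neither is asserted here.

```python
import numpy as np
import pandas as pd
from pandas.arrays import StringArray

# Direct construction: an object ndarray holding strings and pd.NA.
direct = StringArray(np.array(["a", pd.NA], dtype=object))

# pandas.array accepts other missing-value indicators and treats
# them as missing in the resulting string array.
converted = pd.array(["a", np.nan, None], dtype="string")
missing_mask = converted.isna().tolist()

print(direct.isna().tolist())
print(missing_mask)
```

The trade-off raised in this issue is that the second path also casts non-string elements, so it cannot double as a strict validator.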

The proposed solution would allow a StringArray to be instantiated from a NumPy array containing np.nan/None without casting non-strings. This is useful when you want the StringArray constructor to validate that the inputs are strings while also accepting missing values other than pd.NA. At the very least, it should support np.nan, since a StringArray is created from a NumPy array and np.nan is NumPy's missing-value indicator.

Describe the solution you'd like

Either accept nan-likes in the constructor directly (a breaking change), or add a constructor parameter that allows other NA values, perhaps modeled on the na_values parameter of read_csv.
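To make the requested semantics concrete, here is a rough pure-Python sketch of strict validation with an opt-in for nan-likes. Everything in it is hypothetical and is not pandas API: `NA` is a stand-in sentinel for pd.NA, and `validate_strings` and its `accept_nan_likes` flag are names invented for this illustration (a simplified boolean in place of a read_csv-style na_values list).

```python
import math

# Hypothetical stand-in for pd.NA; not the real pandas object.
NA = object()


def _is_nan(value):
    """True only for float NaN."""
    return isinstance(value, float) and math.isnan(value)


def validate_strings(values, accept_nan_likes=False):
    """Check in one pass that every element is a str or a missing value.

    With accept_nan_likes=True, None and float NaN are normalized to NA.
    Any other non-string raises instead of being cast to a string.
    """
    result = []
    for position, value in enumerate(values):
        if isinstance(value, str) or value is NA:
            result.append(value)
        elif accept_nan_likes and (value is None or _is_nan(value)):
            result.append(NA)
        else:
            raise ValueError(
                f"element {position} is not a string or accepted NA value: {value!r}"
            )
    return result


cleaned = validate_strings(["a", None, float("nan")], accept_nan_likes=True)
print([("NA" if v is NA else v) for v in cleaned])
```

Validating and normalizing in the same pass is the point of the sketch: the caller does not need a separate pre-validation step before construction.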

API breaking implications

Either a breaking change or a new parameter.

Describe alternatives you've considered

You'd have to do the validation yourself, and validating the data once yourself and then having StringArray validate it again is bad for performance.

cc @jorisvandenbossche

Labels: Enhancement; NA - MaskedArrays (related to pd.NA and nullable extension arrays); Strings (string extension data type and string data)