Description
Is your feature request related to a problem?
Currently, StringArray can only be instantiated directly from an ndarray of strings, with missing values represented by pd.NA. The only way to instantiate a StringArray with other missing value indicators (like np.nan and None) is to use pandas.array, which has the side effect of casting non-string elements to strings instead of raising an error.
The proposed solution would allow StringArray instantiation from a numpy array containing np.nan/None without casting non-strings. This is useful if you want the StringArray constructor to validate that inputs are strings while also accepting missing values other than pd.NA. At the very least, it should support np.nan, since a StringArray is created from a numpy array and np.nan is numpy's missing value indicator.
Describe the solution you'd like
Either accept nan-likes in the constructor directly (a breaking change) or add a parameter to the constructor that allows other NA values, maybe something like the na_values parameter of read_csv.
API breaking implications
Either breaking change or new parameter.
Describe alternatives you've considered
You'd have to do the validation yourself. Validating yourself and then having StringArray validate the same data again is bad for performance.
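For illustration, the manual workaround could look roughly like the sketch below. The helper names is_missing and to_string_array are hypothetical and not part of pandas, and the input is assumed to be one-dimensional. The helper rejects non-strings, normalizes None and NaN to pd.NA, and only then calls the StringArray constructor, which checks the data a second time.

```python
import numpy as np
import pandas as pd


def is_missing(value):
    # Hypothetical helper: treat None, pd.NA, and float NaN as missing.
    if value is None or value is pd.NA:
        return True
    # NaN is the only float that is not equal to itself.
    return isinstance(value, float) and value != value


def to_string_array(values):
    # Hypothetical helper, assuming 1-D input: validate and normalize first,
    # then hand the cleaned object array to the StringArray constructor.
    out = np.array(values, dtype=object)  # copies, so the input is untouched
    for i, value in enumerate(out):
        if is_missing(value):
            out[i] = pd.NA
        elif not isinstance(value, str):
            raise ValueError(
                f"element {i} is {type(value).__name__}, expected str or a missing value"
            )
    # The constructor validates the same elements again (the duplicated work).
    return pd.arrays.StringArray(out)


arr = to_string_array(["a", np.nan, None])

# By contrast, pandas.array casts the integer to a string instead of raising,
# as described in this issue.
cast = pd.array(["a", 1], dtype="string")
```

The Python-level loop is the first validation pass and the constructor performs the second, which is the performance concern raised here.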