Closed
Description
I am using groupby().apply() to compute new columns in a dataframe, for example like this:
from pandas import DataFrame
df1 = DataFrame([{"val1": 1, "val2": 20}, {"val1": 1, "val2": 19}, {"val1": 2, "val2": 27}, {"val1": 2, "val2": 12}])
def func(dataf):
    return dataf["val2"] - dataf["val2"].mean()
print(type(df1.groupby("val1").apply(func)))  # this is a Series
df1["centered"] = df1.groupby("val1").apply(func)
print(df1)
However, if the grouping column ("val1") contains only one distinct value (i.e. there is a single group), the groupby above returns a DataFrame instead of a Series, and assigning the result to a column fails:
df2 = DataFrame([{"val1": 1, "val2": 20}, {"val1": 1, "val2": 19}, {"val1": 1, "val2": 27}, {"val1": 1, "val2": 12}])
def func(dataf):
    return dataf["val2"] - dataf["val2"].mean()
print(type(df2.groupby("val1").apply(func)))  # this is a DataFrame
df2["centered"] = df2.groupby("val1").apply(func)  # this fails: cannot assign a DataFrame to a column
print(df2)
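The rule that appears to cause this can be illustrated with a simplified, hypothetical model (not pandas' actual code): when every group's result is a Series and all of those Series share the same index, apply() stacks them as rows of a DataFrame; otherwise it concatenates them into one Series. The helpers `per_group_results` and `combine_group_results` below are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"val1": [1, 1, 2, 2], "val2": [20, 19, 27, 12]})
df2 = pd.DataFrame({"val1": [1, 1, 1, 1], "val2": [20, 19, 27, 12]})


def func(dataf):
    return dataf["val2"] - dataf["val2"].mean()


def per_group_results(df):
    # Hypothetical helper: run func on each group, keeping the group keys.
    keys, results = [], []
    for key, group in df.groupby("val1"):
        keys.append(key)
        results.append(func(group))
    return keys, results


def combine_group_results(keys, results):
    # Hypothetical, simplified model of how apply() combines per-group Series.
    first_index = results[0].index
    if all(r.index.equals(first_index) for r in results):
        # All results share one index (trivially true for a single group):
        # stack them as rows of a DataFrame whose columns are that index.
        return pd.DataFrame(
            [r.to_numpy() for r in results],
            index=pd.Index(keys, name="val1"),
            columns=first_index,
        )
    # Indexes differ between groups: concatenate into one long Series.
    return pd.concat(results)


print(type(combine_group_results(*per_group_results(df1))))  # a Series
print(type(combine_group_results(*per_group_results(df2))))  # a DataFrame
```

Because each group's result keeps the index labels of that group's own rows, two or more groups can never share an index with a function like func, so in this model only the single-group case takes the DataFrame path.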
As a result, my code is littered with checks on whether the grouping column has a single distinct value:
if len(df2["val1"].unique()) == 1:
    df2["centered"] = func(df2)
else:
    df2["centered"] = df2.groupby("val1").apply(func)
Am I taking the wrong approach here? If not, could groupby().apply() return a consistent type regardless of the number of groups?
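If the goal is only a per-group column aligned with the original rows, `groupby(...)[col].transform(...)` may be a better fit than apply(): transform returns a Series indexed like the input, whether the grouping column has one distinct value or many. A minimal sketch, assuming the same centering computation as func above and a reasonably recent pandas:

```python
import pandas as pd

df1 = pd.DataFrame({"val1": [1, 1, 2, 2], "val2": [20, 19, 27, 12]})
df2 = pd.DataFrame({"val1": [1, 1, 1, 1], "val2": [20, 19, 27, 12]})

for df in (df1, df2):
    # transform() calls the lambda once per group and aligns the results
    # back to df's original index, so the assignment works in both cases.
    df["centered"] = df.groupby("val1")["val2"].transform(lambda s: s - s.mean())

print(df1)
print(df2)
```

An equivalent form is `df["val2"] - df.groupby("val1")["val2"].transform("mean")`, which broadcasts each group's mean back to its rows without a Python-level lambda per group.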
Thanks!