Numpy unwrap error when using apply() on Pandas dataframe

Problem description

I have a Pandas DataFrame with two columns containing angles in the range [-pi, pi). I need to calculate the instantaneous angular velocity on each row, which I can do using diff(). However, this naive approach fails when my data crosses the discontinuity from pi to -pi.

I'm trying to use numpy.unwrap() on my columns, but when I try the code below I get a ValueError.

angle_data["theta"].apply(np.unwrap)
<Traceback message> 
ValueError: diff requires input that is at least one dimensional

This also occurs if I copy the columns to a Pandas Series and try to use apply(np.unwrap). I can work around this by doing

angle_data["theta"] = pd.Series(np.unwrap(angle_data["theta"]))

or by using apply on multiple columns at once, but I'd like to know why the apply(np.unwrap) method doesn't work for a Pandas Series.

Tags: python, pandas, numpy

Solution

From the docs:

Help on function unwrap in module numpy:

unwrap(p, discont=3.141592653589793, axis=-1)
    ...
    Parameters
    ----------
    p : array_like
        Input array.
    ...

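The traceback is telling you that Series.apply calls the function once per element, so unwrap receives each value of the column as a scalar rather than as an array. unwrap needs an input p with at least one dimension, and the diff call inside it raises because a scalar has zero dimensions.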
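
As a minimal sketch of that failure (the sample values are made up for illustration), calling np.unwrap on a single element reproduces the ValueError, while the same call on the whole array succeeds:

```python
import numpy as np

# Made-up angles that cross the pi -> -pi discontinuity.
theta_values = np.array([3.0, 3.1, -3.1, -3.0])

# A single element is a zero-dimensional scalar, which is what
# Series.apply would hand to np.unwrap on each call.
try:
    np.unwrap(theta_values[0])
    scalar_error = None
except ValueError as exc:
    scalar_error = str(exc)

print(scalar_error)

# The whole one-dimensional array is accepted.
unwrapped = np.unwrap(theta_values)
print(unwrapped)
```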

You can see this by passing a function that prints its argument:

def my_print(x):
    print(x)
    print('-'*50)
df['theta'].apply(my_print)

You will see that each value of the column is passed as an argument, one after the other. In other words, you are looping as you would through a list, which is quite inefficient.
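
The question notes that apply on several columns at once does work. A likely explanation, sketched here with made-up data and a hypothetical record_ndim helper, is that DataFrame.apply with its default axis hands the function a whole column (a one-dimensional Series), whereas Series.apply hands it one scalar at a time:

```python
import numpy as np
import pandas as pd

# Made-up data; the column names are only for illustration.
df = pd.DataFrame({"theta": [3.0, -3.0, -2.9], "phi": [0.1, 0.2, 0.3]})

seen_by_series_apply = []
seen_by_frame_apply = []


def record_ndim(store):
    """Hypothetical helper: build a function that logs the ndim of its input."""
    def _inner(x):
        store.append(np.ndim(x))
        return x
    return _inner


# Series.apply: the function is called once per element (ndim 0).
df["theta"].apply(record_ndim(seen_by_series_apply))

# DataFrame.apply (default axis): the function is called once per column (ndim 1).
df[["theta", "phi"]].apply(record_ndim(seen_by_frame_apply))

print(seen_by_series_apply)
print(seen_by_frame_apply)
```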

You already found the right way to use unwrap: call it directly on the whole series, so nothing iterates over it element by element: np.unwrap(df['theta']).
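
Putting this together for the original goal, here is a rough sketch with made-up sample angles. It assumes evenly spaced samples, so the per-row difference stands in for angular velocity. Passing index=angle_data.index is an addition to the workaround in the question: it keeps the result aligned if the DataFrame does not use the default integer index.

```python
import numpy as np
import pandas as pd

# Made-up angles that cross the pi -> -pi discontinuity.
angle_data = pd.DataFrame({"theta": [3.0, 3.1, -3.1, -3.0]})

# Naive difference: shows a spurious jump of about -2*pi at the wrap.
naive = angle_data["theta"].diff()

# Unwrap the whole column in one call, then wrap it back into a Series
# that shares the DataFrame's index.
unwrapped = pd.Series(
    np.unwrap(angle_data["theta"].to_numpy()),
    index=angle_data.index,
    name="theta",
)

# Difference of the unwrapped angles: no jump at the wrap.
velocity = unwrapped.diff()

print(naive)
print(velocity)
```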

This is how NumPy's array functions are meant to be used (spoiler alert: dropping "apply" usually brings large performance gains).

So as a rule of thumb: stay away from "apply" when you can (and most of the time, you can) and stick to NumPy functions or pandas built-ins.

