Pandas中的数据重塑(Reshaping): pivot, pivot_table, stack, unstack.

1. pivot_table()

Generalization of pivot that can handle duplicate values for one index/column pair.

import pandas as pd
import numpy as np
df = pd.DataFrame({
    "A": ["aaa", "aaa", "aaa", "aaa", "aaa", "ccc", "ccc", "ccc", "ccc"],
    "B": ["xxx", "xxx", "xxx", "yyy", "yyy", "xxx", "xxx", "yyy", "yyy"],
    "C": [
        "small", "LARGE", "LARGE", "small", "small", "LARGE", "small", "small",
        "LARGE"
    ],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]
})

df

pd.pivot_table(df, values='D', index=['A', 'B'],
               columns=['C'], aggfunc=np.sum)

pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'], aggfunc=np.sum, fill_value=0)

pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
            aggfunc={'D': [np.mean,np.min],
                     'E': [np.mean,np.max]})

2. pivot()

image-20190731153901039

df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
                           'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
df

df.pivot(index='foo', columns='bar', values='baz')
# or
df.pivot(index='foo', columns='bar')['baz']

df.pivot(index='foo', columns='bar')

# 如果有重复的值则会报错:

df = pd.DataFrame({"foo": ['one', 'one', 'two', 'two'],
                   "bar": ['A', 'A', 'B', 'C'],
                   "baz": [1, 2, 3, 4]})
df

df.pivot(index='foo', columns='bar', values='baz')
# ValueError: Index contains duplicate entries, cannot reshape

3. stack() & unstack()

image-20190731153917418

image-20190731153935946

image-20190731154004573

image-20190731154014423

Stack the prescribed level(s) from columns to index.

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex).

The level involved will automatically get sorted.

data = pd.DataFrame(np.random.randn(9,2), columns=['num1','num2'],
                    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],[1, 2, 3, 1, 3, 1, 2, 2, 3]])
data

data.unstack()

data.unstack().stack()

data.unstack(level=0)


frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame

frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame

frame.unstack()

frame.index.names = [None,None]
frame.columns.names = [None,None]
frame

frame.stack(level=1)

frame.stack(level=1).stack()

frame1 = frame.stack(level=1).reset_index()
frame1

pd.pivot_table(frame1, index=['level_0','level_1'],columns=['level_2'])

pd.pivot_table(frame1, index=['level_0','level_1','level_2'])

REFERENCE