Column-wise calculations in Python's Pandas

July 25, 2025

Column-wise calculation with Pandas is one of the most important and most frequently used operations in data analysis and preprocessing.

This article covers the topic step by step, from basic arithmetic between columns to common errors and how to handle them.

Basics: arithmetic on columns

In Pandas, a DataFrame column (a Series) can be treated like a vector, so calculations can be written as follows.

import pandas as pd

df = pd.DataFrame({
    'price': [100, 200, 300],
    'quantity': [1, 3, 2]
})

# Add a column for the total amount
df['total'] = df['price'] * df['quantity']
print(df)

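Characteristics

The operation is applied element-wise to every row, with no explicit loop
Broadcasting is supported: the other operand can be a Series of the same length or a scalar constant, and two Series are matched by index label rather than by position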
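The index-matching behaviour is easy to miss, so here is a minimal sketch of it. The Series `a` and `b` and their labels are hypothetical example data, not part of the DataFrame used in this article.

```python
import pandas as pd

# Two Series with partly different index labels (hypothetical example data)
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])

# Values are paired by index label, not by position;
# labels present in only one Series produce NaN
aligned_sum = a + b
print(aligned_sum)

# A scalar is broadcast to every element
scaled = a * 10
print(scaled)
```

Within a single DataFrame all columns share one index, which is why the row-by-row behaviour described above holds there.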

Operations between a column and a constant

# Operation with a constant (price after a 10% discount)
df['discount_price'] = df['price'] * 0.9
print(df[['price', 'discount_price']])

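→ Each element is multiplied by 0.9, which gives the price after a 10% discount.

Logical operations on conditions across multiple columns

# Condition: quantity is at least 2 and price is greater than 150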
df['is_bulk'] = (df['quantity'] >= 2) & (df['price'] > 150)
print(df[['quantity', 'price', 'is_bulk']])

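Key points

Use the bitwise operators &, |, ~ (and, or, not), and wrap each condition in parentheses because these operators bind more tightly than comparisons
and and or are Python's scalar operators and cannot be used on a Series (they raise ValueError: The truth value of a Series is ambiguous)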
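The points above can be sketched as follows. This is a small illustration that reuses the article's `price` and `quantity` columns; the variable names `raised`, `either`, and `not_bulk` are chosen here for the example.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300], 'quantity': [1, 3, 2]})

# `and` asks for a single True/False for the whole Series, which pandas refuses
try:
    (df['quantity'] >= 2) and (df['price'] > 150)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Element-wise OR and NOT use | and ~, with each condition parenthesized
either = (df['quantity'] >= 2) | (df['price'] > 150)
not_bulk = ~((df['quantity'] >= 2) & (df['price'] > 150))
print(either.tolist(), not_bulk.tolist())
```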

Concrete examples of addition, subtraction, and division

# Average unit price (total ÷ quantity)
df['average'] = df['total'] / df['quantity']

# Flat discount of 20
df['discount'] = df['price'] - 20
print(df[['total', 'average', 'discount']])

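Combining column calculations with functions (apply)

Use apply when you want to run your own function on each element of a column.

# Define a custom function and apply it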
def custom_price(price):
    if price > 250:
        return price * 0.8
    else:
        return price * 0.95

df['special_price'] = df['price'].apply(custom_price)
print(df[['price', 'special_price']])

A lambda function works as well.

# Tax-inclusive price (10% consumption tax added)
df['tax_included'] = df['price'].apply(lambda x: x * 1.1)
print(df[['price', 'tax_included']])
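Because apply calls the Python function once per element, the same results can often be written with vectorized expressions, which tend to be faster on large columns. The following is a sketch of one possible rewrite using `np.where`; it is an alternative shown for comparison, not the method used above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300]})

# Vectorized equivalent of custom_price: 20% off above 250, otherwise 5% off
df['special_price'] = np.where(df['price'] > 250,
                               df['price'] * 0.8,
                               df['price'] * 0.95)

# Vectorized equivalent of the lambda: plain column-times-constant arithmetic
df['tax_included'] = df['price'] * 1.1
print(df)
```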

Applying a function row by row across multiple columns (`axis=1`)

def calc_total(row):
    return row['price'] * row['quantity']

df['calc_total'] = df.apply(calc_total, axis=1)

Setting axis=1 makes apply pass each row to the function, so the processing is done row by row.
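To make the row-wise behaviour concrete, the sketch below checks the axis=1 result against the plain column arithmetic from the beginning of the article. The names `row_wise` and `vectorized` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300], 'quantity': [1, 3, 2]})

def calc_total(row):
    # With axis=1, `row` is a Series whose index holds the column names
    return row['price'] * row['quantity']

row_wise = df.apply(calc_total, axis=1)
vectorized = df['price'] * df['quantity']

# Both approaches give the same values
print(row_wise.tolist())
print(vectorized.tolist())
```

When the logic can be expressed with column arithmetic, that form is usually preferable; axis=1 is most useful for logic that is hard to vectorize.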

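Transforming columns with statistics (normalization and standardization)

For example, a Z-score (a value rescaled to mean 0 and standard deviation 1) can be computed as follows.

# Standardize to mean 0 and standard deviation 1 (Z-score)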
df['price_zscore'] = (df['price'] - df['price'].mean()) / df['price'].std()
print(df[['price', 'price_zscore']])
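Two details are worth a short sketch. First, `Series.std()` uses the sample standard deviation (`ddof=1`) by default, so passing `ddof=0` gives the population version. Second, normalization in the narrower sense usually means min-max scaling to the range 0 to 1. The column names `z_sample`, `z_population`, and `price_minmax` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300]})
centered = df['price'] - df['price'].mean()

# Default std() is the sample standard deviation (ddof=1)
df['z_sample'] = centered / df['price'].std()

# ddof=0 gives the population standard deviation
df['z_population'] = centered / df['price'].std(ddof=0)

# Min-max normalization: rescale to the range [0, 1]
price_min, price_max = df['price'].min(), df['price'].max()
df['price_minmax'] = (df['price'] - price_min) / (price_max - price_min)
print(df)
```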

String processing on columns (the str accessor)

df_str = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']})

# Convert to uppercase and count characters
df_str['upper_name'] = df_str['name'].str.upper()
df_str['name_length'] = df_str['name'].str.len()
print(df_str)
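Real data often contains missing names, so here is a brief sketch of how the str accessor behaves in that case. The missing value and the column names `starts_with_a` and `initial` are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical data with one missing name
df_str = pd.DataFrame({'name': ['Alice', 'Bob', None]})

# Missing values propagate through str methods instead of raising
df_str['name_length'] = df_str['name'].str.len()

# na=False treats a missing value as "no match" in boolean tests
df_str['starts_with_a'] = df_str['name'].str.startswith('A', na=False)

# Indexing through .str takes the first character of each value
df_str['initial'] = df_str['name'].str[0]
print(df_str)
```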

Date and time processing on columns (the dt accessor)

df_date = pd.DataFrame({
    'date': pd.to_datetime(['2025-07-01', '2025-07-15', '2025-07-24'])
})

# Extract the day of the week
df_date['day_of_week'] = df_date['date'].dt.day_name()
print(df_date)
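The dt accessor exposes other components in the same way. The sketch below reuses the same dates; the column names `month`, `is_weekend`, and `days_since_first` are hypothetical.

```python
import pandas as pd

df_date = pd.DataFrame({
    'date': pd.to_datetime(['2025-07-01', '2025-07-15', '2025-07-24'])
})

# Month number
df_date['month'] = df_date['date'].dt.month

# dayofweek is 0 for Monday through 6 for Sunday
df_date['is_weekend'] = df_date['date'].dt.dayofweek >= 5

# Subtracting datetimes gives timedeltas, which also have a dt accessor
df_date['days_since_first'] = (df_date['date'] - df_date['date'].min()).dt.days
print(df_date)
```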

Fast column calculations combined with NumPy

import numpy as np

# Take the natural logarithm
df['log_price'] = np.log(df['price'])
print(df[['price', 'log_price']])

Pandas is built on top of NumPy, so functions such as np.sqrt, np.exp, and np.power can be applied to a column directly.
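As a small sketch of this, applying a NumPy function to a column returns a pandas Series that keeps the original index. The index labels and the column name `price_squared` below are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300]}, index=['a', 'b', 'c'])

# The result is still a pandas Series with the same index labels
sqrt_price = np.sqrt(df['price'])
print(type(sqrt_price).__name__, sqrt_price.index.tolist())

# The result can be assigned back as a new column
df['price_squared'] = np.power(df['price'], 2)
print(df)
```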

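Updating, deleting, and reordering columns

# Update values
df['price'] = df['price'] * 1.2

# Delete a column
df.drop(columns=['total'], inplace=True)

# Reorder columns (any order); columns not listed here are dropped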
df = df[['price', 'quantity', 'discount_price', 'tax_included', 'log_price']]
print(df)
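The operations above modify `df` in place. When the original data should stay untouched, one option is to use methods that return a new DataFrame, as in this sketch. The names `updated`, `without_total`, and `reordered` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300], 'quantity': [1, 3, 2]})
df['total'] = df['price'] * df['quantity']

# assign returns a new DataFrame with the column replaced
updated = df.assign(price=df['price'] * 1.2)

# drop without inplace=True also returns a new DataFrame
without_total = updated.drop(columns=['total'])

# Selecting with a list of names reorders the columns
reordered = without_total[['quantity', 'price']]

print(df.columns.tolist())         # the original is unchanged
print(reordered.columns.tolist())
```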

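Common errors and how to handle them

Error	Cause	Solution
`ValueError: operands could not be broadcast together`	Operand lengths differ, typically a column combined with a NumPy array of another length (two Series of different lengths are aligned by index and yield NaN instead)	Check that both operands have the same length
`ValueError: The truth value of a Series is ambiguous`	`and` / `or` used in place of `&` / `|`, or parentheses missing around a condition	Use `&`, `|`, `~` and parenthesize each condition
`SettingWithCopyWarning`	Chained assignment and similar patterns that may behave unexpectedly	Assign explicitly with `.loc`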
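For the last row of the table, a minimal sketch of the `.loc` pattern is shown below. The condition and the value assigned are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300], 'quantity': [1, 3, 2]})

# Chained indexing such as df[df['quantity'] >= 2]['price'] = 0
# may write to a temporary copy and leave df unchanged.

# .loc selects the rows and the column and assigns in a single step
df.loc[df['quantity'] >= 2, 'price'] = 0
print(df)
```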

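Summary

Operation	Method	Example use
Column-to-column operation	`df['A'] + df['B']`	Arithmetic, ratios, etc.
Column-and-constant operation	`df['A'] * 1.08`	Tax-inclusive prices, discounts, etc.
Logical operation	`(df['A'] > 100) & (df['B'] < 50)`	Condition checks
Function application	`apply(lambda x: ...)`	Custom calculations
Row-wise function	`apply(func, axis=1)`	price × quantity, etc.
Statistical processing	`.mean()`, `.std()`	Normalization, Z-scores, etc.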