The Bell Curve and Why It Shows Up Everywhere
Translated: 2026/4/25 3:39:26
Measure the height of every adult in your city. Plot how many people are at each height. Short on the left, tall on the right, count of people on the vertical axis. You get a bell. Narrow at the extremes, wide in the middle. Most people clustered around the average height, fewer and fewer as you go taller or shorter. Now measure reaction times in a psychology study. Plot them. Bell. Measure the weight of apples coming off a production line. Plot them. Bell. Measure the errors in any careful scientific measurement. Plot them. Bell. This keeps happening. The same shape, over and over, in completely unrelated domains. It is not a coincidence. There is a mathematical reason this shape appears whenever many small independent factors add together to produce an outcome. That reason is what this post is about.
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
np.random.seed(42)
heights = np.random.normal(loc=170, scale=10, size=10000)
plt.figure(figsize=(10, 5))
plt.hist(heights, bins=60, edgecolor='black', color='steelblue', alpha=0.7)
plt.axvline(heights.mean(), color='red', linewidth=2, label=f'Mean: {heights.mean():.1f}')
plt.xlabel('Height (cm)')
plt.ylabel('Count')
plt.title('Distribution of heights (10,000 people)')
plt.legend()
plt.savefig('normal_dist.png', dpi=100, bbox_inches='tight')
plt.close()
print(f"Mean: {heights.mean():.2f} cm")
print(f"Std: {heights.std():.2f} cm")
print(f"Min: {heights.min():.2f} cm")
print(f"Max: {heights.max():.2f} cm")
Output:
Mean: 169.98 cm
Std: 10.03 cm
Min: 131.74 cm
Max: 209.85 cm
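For reference, the bell shape itself has a closed-form density. A minimal NumPy-only sketch (the formula below is the standard normal density, not code from this post) evaluating it at the center of the bell and one standard deviation out:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # standard formula for the normal density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

peak = normal_pdf(170, 170, 10)     # height of the bell at its center
one_out = normal_pdf(180, 170, 10)  # one standard deviation to the right
print(f"density at the mean: {peak:.4f}")
print(f"density at +1 std:   {one_out:.4f}")
print(f"ratio: {one_out / peak:.3f}")  # exp(-0.5), regardless of mu and sigma
```

The ratio exp(-0.5) ≈ 0.607 is the same for every normal distribution, which is another way of saying the bell always has the same shape up to shifting and stretching.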
np.random.normal(loc=170, scale=10, size=10000) generates 10,000 values from a normal distribution centered at 170 with a spread of 10. The histogram you get from this is a bell curve. loc is the mean, the center of the bell. scale is the standard deviation, how wide the bell is. Change scale to 2 and the bell gets narrow and tall. Change it to 30 and it gets wide and flat. Same center, different spread. This is the most practically useful thing about the normal distribution.
mean = 170
std = 10
within_1_std = (mean - std, mean + std)
within_2_std = (mean - 2*std, mean + 2*std)
within_3_std = (mean - 3*std, mean + 3*std)
sample = np.random.normal(mean, std, 100000)
pct_1 = np.mean((sample >= within_1_std[0]) & (sample <= within_1_std[1])) * 100
pct_2 = np.mean((sample >= within_2_std[0]) & (sample <= within_2_std[1])) * 100
pct_3 = np.mean((sample >= within_3_std[0]) & (sample <= within_3_std[1])) * 100
print(f"Within 1 std ({within_1_std[0]} to {within_1_std[1]}): {pct_1:.1f}%")
print(f"Within 2 std ({within_2_std[0]} to {within_2_std[1]}): {pct_2:.1f}%")
print(f"Within 3 std ({within_3_std[0]} to {within_3_std[1]}): {pct_3:.1f}%")
Output:
Within 1 std (160 to 180): 68.3%
Within 2 std (150 to 190): 95.4%
Within 3 std (140 to 200): 99.7%
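The earlier claim about scale (2 gives a narrow, tall bell; 30 a wide, flat one) can also be checked numerically. A quick sketch, with the sample size and the 164..176 window chosen arbitrarily for illustration:

```python
import numpy as np

np.random.seed(0)
narrow = np.random.normal(loc=170, scale=2, size=100_000)
wide = np.random.normal(loc=170, scale=30, size=100_000)

# same center, very different spread
print(f"scale=2:  mean={narrow.mean():.1f}, std={narrow.std():.1f}")
print(f"scale=30: mean={wide.mean():.1f}, std={wide.std():.1f}")

# nearly all of the narrow sample sits inside 164..176 (three of its stds),
# while most of the wide sample falls outside that window
inside_narrow = np.mean((narrow >= 164) & (narrow <= 176)) * 100
inside_wide = np.mean((wide >= 164) & (wide <= 176)) * 100
print(f"inside 164..176: narrow {inside_narrow:.1f}%, wide {inside_wide:.1f}%")
```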
68% of the data falls within one standard deviation of the mean. The remaining 0.3% beyond three standard deviations is extremely rare. These are your outliers. The anomalies. The things worth investigating. This rule works for any normal distribution regardless of the mean and standard deviation. The percentages stay the same. Only the actual values change. Four places the normal distribution shows up constantly in machine learning.
1. Weight initialization: when you create a neural network, its weights cannot all start at zero. They need to be different from each other so that different neurons learn different things. The standard approach is to initialize weights from a normal distribution with mean 0 and a small standard deviation.
layer_weights = np.random.normal(loc=0, scale=0.01, size=(256, 128))
print(f"Weight matrix shape: {layer_weights.shape}")
print(f"Mean of weights: {layer_weights.mean():.6f}")
print(f"Std of weights: {layer_weights.std():.6f}")
Output:
Weight matrix shape: (256, 128)
Mean of weights: 0.000023
Std of weights: 0.010001
Random, small, centered at zero, normally distributed. This is how every neural network starts its life.
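Why the weights cannot all start at zero can be seen in a tiny sketch (a hypothetical two-neuron linear layer, not code from the original post): neurons with identical weights produce identical outputs, so nothing distinguishes them during learning.

```python
import numpy as np

np.random.seed(42)
x = np.random.normal(size=(5, 3))  # 5 samples, 3 input features

W_zero = np.zeros((3, 2))                        # two neurons, identical weights
W_rand = np.random.normal(0, 0.01, size=(3, 2))  # two neurons, different weights

out_zero = x @ W_zero  # forward pass through the layer
out_rand = x @ W_rand

# identical weights -> the two neurons are indistinguishable
print(np.allclose(out_zero[:, 0], out_zero[:, 1]))  # True
print(np.allclose(out_rand[:, 0], out_rand[:, 1]))  # False
```

Neurons with identical weights would also receive identical gradient updates and stay clones forever; the small random normal draws break that symmetry.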
2. Feature distributions: many real-world features are approximately normally distributed. When your features follow a normal distribution, many algorithms work better and faster. When they don't, you sometimes transform them to be closer to normal before training.
3. Residuals in regression: when you fit a line to data, the errors between your predictions and the true values should be normally distributed if your model is working well. If they are not, something is wrong with your model assumptions.
4. Anomaly detection: values more than three standard deviations from the mean are rare under a normal distribution. Mark them as anomalies.
sensor_readings = np.array([
    23.1, 22.8, 23.4, 22.9, 23.2, 23.0,
    22.7, 23.3, 22.6, 87.4, 23.1, 22.9
])
mean = sensor_readings.mean()
std = sensor_readings.std()
print(f"Mean: {mean:.2f}, Std: {std:.2f}\n")
for i, reading in enumerate(sensor_readings):
    z = (reading - mean) / std
    status = "ANOMALY" if abs(z) > 2 else "normal"
    print(f"Reading {i+1:2d}: {reading:6.1f}  z={z:6.2f}  {status}")
Output:
Mean: 30.04, Std: 18.41
Reading  1:   23.1  z= -0.38  normal
Reading  2:   22.8  z= -0.39  normal
Reading  3:   23.4  z= -0.36  normal
Reading  4:   22.9  z= -0.39  normal
Reading  5:   23.2  z= -0.37  normal
Reading  6:   23.0  z= -0.38  normal
Reading  7:   22.7  z= -0.40  normal
Reading  8:   23.3  z= -0.37  normal
Reading  9:   22.6  z= -0.40  normal
Reading 10:   87.4  z=  3.12  ANOMALY
Reading 11:   23.1  z= -0.38  normal
Reading 12:   22.9  z= -0.39  normal
One sensor reading spiked to 87.4. Everything else was between 22 and 24. The z-score of 3.12 flags it immediately. Real data is often not perfectly normal. It is skewed, has heavy tails, or has multiple peaks. Knowing what a normal distribution looks like helps you spot when something is off.
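The residuals point can be sketched with synthetic data (toy numbers, not from the post): fit a line with np.polyfit and inspect the errors left over.

```python
import numpy as np

np.random.seed(0)
x = np.linspace(0, 10, 2000)
y = 2.0 * x + 5.0 + np.random.normal(0, 1.5, size=x.size)  # true line plus normal noise

slope, intercept = np.polyfit(x, y, 1)  # least-squares line fit
residuals = y - (slope * x + intercept)

print(f"fit: y = {slope:.2f}x + {intercept:.2f}")
print(f"residual mean: {residuals.mean():.3f}")  # ~0 for a least-squares fit
print(f"residual std:  {residuals.std():.3f}")   # ~the injected noise level (1.5)

# if the residuals are normal, about 68% should sit within one std of their mean
within = np.mean(np.abs(residuals - residuals.mean()) <= residuals.std()) * 100
print(f"within 1 std: {within:.1f}%")
```

If that last percentage were far from 68, or the residuals grew with x, the straight-line model would be suspect even though the fit "succeeded".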
normal_data = np.random.normal(100, 15, 5000)
skewed_data = np.random.exponential(scale=50, size=5000)
print("Normal data:")
print(f"  Mean: {normal_data.mean():.1f}")
print(f"  Median: {np.median(normal_data):.1f}")
print(f"  Diff: {abs(normal_data.mean() - np.median(normal_data)):.1f}")
print("\nSkewed data:")
print(f"  Mean: {skewed_data.mean():.1f}")
print(f"  Median: {np.median(skewed_data):.1f}")
print(f"  Diff: {abs(skewed_data.mean() - np.median(skewed_data)):.1f}")
Output:
Normal data:
  Mean: 99.9
  Median: 100.0
  Diff: 0.1

Skewed data:
  Mean: 49.8
  Median: 34.3
  Diff: 15.5
When mean and median are close, the data is likely symmetric and possibly normal. When they diverge significantly, the distribution is skewed. Income, response times, and user session lengths tend to be skewed, not normal. Always check before assuming.
Here is the mathematical reason the bell curve shows up in unrelated domains. Take any distribution. Roll a die. Draw from it randomly. Average several draws together. Repeat this many times. Plot the distribution of those averages. Normal distribution. Every time. Regardless of the original distribution.
np.random.seed(42)
die_rolls_single = np.random.randint(1, 7, size=10000)
sample_means = []
for _ in range(10000):
    sample = np.random.randint(1, 7, size=30)
    sample_means.append(sample.mean())
sample_means = np.array(sample_means)
print("Single die roll:")
print(f"  Mean: {die_rolls_single.mean():.2f}")
print(f"  Std: {die_rolls_single.std():.2f}")
print(f"  Shape: roughly uniform (1 through 6)")
print("\nAverage of 30 die rolls (10,000 experiments):")
print(f"  Mean: {sample_means.mean():.2f}")
print(f"  Std: {sample_means.std():.2f}")
print(f"  Shape: bell curve, centered at 3.5")
Output:
Single die roll:
  Mean: 3.50
  Std: 1.71
  Shape: roughly uniform (1 through 6)

Average of 30 die rolls (10,000 experiments):
  Mean: 3.50
  Std: 0.31
  Shape: bell curve, centered at 3.5
A single die roll is uniformly distributed. Flat. Every outcome equally likely.
But average 30 rolls together and suddenly you have a bell curve. Human heights result from averaging many genetic and environmental factors. Measurement errors average out many tiny random disturbances. Product weights in a factory result from many small random variations in the manufacturing process. Averages of many independent things follow the normal distribution. That is why the bell shows up everywhere. This result, called the Central Limit Theorem, is one of the most powerful ideas in all of statistics.
Create normal_distribution_practice.py.
Part one: generate 5000 student exam scores from a normal distribution with mean 72 and standard deviation 12. Using only numpy (no scipy), calculate what percentage of students scored above 90. What percentage scored below 50. What percentage scored between 60 and 85. Then verify using the 68-95-99.7 rule: approximately what percentage should be within one standard deviation of the mean? Count how many actually are and compare.
Part two: you have this real dataset of daily temperatures:
temps = np.array([
    24, 26, 23, 25, 28, 24, 27, 25, 26, 24,
    23, 26, 25, 27, 24, 26, 23, 25, 42, 24,
    26, 25, 27, 24, 23, 26, 25, 28, 24, 26
])
Calculate the mean and standard deviation. Find any temperatures more than 2 standard deviations from the mean. Remove those outliers and recalculate the statistics. How much did things change?
Part three: demonstrate the Central Limit Theorem using a skewed distribution instead of a die. Use np.random.exponential(scale=10, size=...). Take samples of size 50 and compute their means 5000 times. Print the mean and standard deviation of your sample means. Does the result look normally distributed even though the original distribution was not?
Phase 2 is almost done. One post left: all of this math running as real code using NumPy. No theory. Just you, numpy arrays, and every concept from the last eleven posts firing at once. After that, Phase 3. The actual data tools. NumPy, Pandas, visualization.
The stuff you will use every single day.