**Page updated:** January 19, 2021

**Author:** Emmanuel Boss

# Creating Particle Size Distributions from Data

Emmanuel Boss and Nils Haentjens contributed to this page.

Particle size distributions (PSDs), which describe how particle number, area, or volume depends on particle size, are useful tools in oceanography, with applications such as characterizing ecosystems (Cermeno and Figueiras, 2008) or computing the carbon flux to depth (Guidi et al., 2009). While PSDs have been used extensively in the literature, surprisingly little has been written about how they are actually derived from measured data, which motivated this short note.

### Empirical PSDs

How one constructs a PSD depends on the tool used to size the assembly of particles. Single-particle analyzers, such as the Coulter Counter or cytometers, provide information on the size of each individual particle passing through the instrument (typically based on volume or cross-sectional area and assuming an equivalent sphere). Other methods, such as the Laser In Situ Scattering and Transmission meter (LISST, Agrawal and Pottsmith, 2000), provide a PSD of the bulk assembly of particles by inverting a bulk measurement (near-forward angular scattering in the case of the LISST) to obtain the most likely underlying PSD.

The process of building a PSD from the size information of individual particles is as follows. Choose a number of bins ($M$) and denote their boundaries $b_1, b_2, \ldots, b_{M+1}$. Place a particle in the bin for which its diameter ($D$) obeys $b_j \le D < b_{j+1}$ (“placing it” means that the number of particles in that bin is incremented by one). Ideally, the size characterizing each bin is based on the mean size of the particles in that bin. Typically, however, that size is based on the boundaries of the bin (e.g. the arithmetic or geometric mean of the bin boundaries). Thus the “discrete” PSD, denoted by $N(D_j)$, gives the number of particles with mean diameter $D_j$ (units of number per volume of water). To obtain a continuous PSD, $n(D_j)$ (for example, to compare different instruments that have different bin sizes), one divides the discrete PSD by the bin width: $n(D_j)=N(D_j)/(b_{j+1}-b_j)$ (units of number per volume per size). To obtain a volume or area size distribution, the number of particles in each bin is multiplied by the average volume or area of a particle in that bin, respectively.
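The binning procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_psd`, the sample volume, and the toy diameters are hypothetical, and the geometric mean of the boundaries is used as the representative size of each bin.

```python
import numpy as np

def build_psd(diameters_um, boundaries_um, sample_volume_cm3):
    """Bin individual particle diameters into discrete and continuous PSDs."""
    diameters_um = np.asarray(diameters_um, dtype=float)
    boundaries_um = np.asarray(boundaries_um, dtype=float)
    # np.histogram counts particles with b_j <= D < b_{j+1}
    # (only the last bin also includes its upper boundary).
    counts, _ = np.histogram(diameters_um, bins=boundaries_um)
    # Discrete PSD N(D_j): number of particles per volume of water in each bin.
    N = counts / sample_volume_cm3
    # Continuous PSD n(D_j): discrete PSD divided by the bin width.
    n = N / np.diff(boundaries_um)
    # Representative size: geometric mean of the bin boundaries (a common choice).
    D_geo = np.sqrt(boundaries_um[:-1] * boundaries_um[1:])
    return D_geo, N, n

# Toy data (hypothetical): six particles, three bins, 2 cm^3 of water sampled.
boundaries = np.array([1.0, 2.0, 4.0, 8.0])
diameters = np.array([1.1, 1.5, 1.9, 2.5, 3.0, 5.0])
D_geo, N, n = build_psd(diameters, boundaries, sample_volume_cm3=2.0)
```

Dividing by the bin width is what makes results from instruments with different bins comparable; the discrete values `N` alone depend on how wide each bin happens to be.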

Continuous particle number size distributions in the surface ocean are often approximated by a power-law distribution (e.g. Jackson et al., 1997):

$$n(D)=A D^{-\xi}\quad\left[\text{number}\ \text{cm}^{-3}\ \mu\text{m}^{-1}\right]. \tag{1}$$

Power-law differential distributions are observed to have an exponent ($\xi$ above) varying between 3 and 4 in the surface ocean (Jackson et al., 1997; Sheldon et al., 1972). An exponent of 4 implies that the particle volume is the same in every bin when bin sizes increase following a power-law rule (e.g. Sheldon et al., 1972). A problem with the above equation is that, in principle, a quantity that has physical units should never be raised to a fractional power. Often, (1) is instead written as a function of a nondimensional ratio, e.g.

$$n(D)=A\left(D/D_0\right)^{-\xi},$$

with $D_0$ being a reference diameter. In what follows we assume, without loss of generality, that $D_0=1\ \mu\text{m}$ and that $D$ is reported in $\mu\text{m}$, so this normalization is implicit.

Before a PSD can be built, certain decisions need to be made: the upper and lower bounds of the particle size range, the number of bins, and the rule according to which bins are allocated. Traditionally, due to the rapid decrease in particle concentration with size, bin sizes have been chosen to follow a power-law scaling (Sheldon et al., 1972; Jackson et al., 1997; Agrawal and Pottsmith, 2000). That is to say, each bin is $q$ times wider than the previous one (for other possible choices see the Appendix below). This choice has the advantage that bins are of equal width on a logarithmic $D$ (abscissa) axis and that oceanic volume distributions are nearly flat (Sheldon et al., 1972), which provides a quick check on the data. The downside is that over a decade in size the number of particles per bin still decreases rapidly. For example, for a choice of ten bins over a decade, the number of particles per bin falls by a factor of $\sim 1000$ between the first and last bin. Because the counting error scales like $\sqrt{N}$, so that the relative error scales like $1/\sqrt{N}$, reducing the counting error to 10% in the largest bin requires about 100 particles there and hence, for such a choice, more than 100,000 particles per sample, which is typically unrealistically large for cytometry.

### Parametric Description of a PSD

Assume we want to produce a size distribution for oceanic plankton (e.g. Fig. 2 in Lombard et al., 2019). Denote the boundaries of the bins by $b_1, b_2, \ldots, b_{M+1}$, and assume that the widths of the bins they bound grow following a power law:

$$q=\frac{b_3-b_2}{b_2-b_1}=\frac{b_4-b_3}{b_3-b_2}=\cdots=\frac{b_{M+1}-b_M}{b_M-b_{M-1}},$$

which is satisfied if for any $j$,

$$b_j=b_1 q^{j-1}\quad\Rightarrow\quad b_{M+1}=b_1 q^{M}\quad\Rightarrow\quad q=\sqrt[M]{\frac{b_{M+1}}{b_1}}.$$

Thus, if we have the smallest and largest boundaries of the PSD ($b_1$ and $b_{M+1}$) and the number of bins in the PSD ($M$), we can compute the boundaries of all other bins. The volume of material associated with spherical particles distributed as a power law with a differential exponent of 4 (the “canonical” value of Sheldon et al., 1972) is

$$V(b_j<D<b_{j+1})=\frac{\pi A}{6}\int_{b_j}^{b_{j+1}}D^{-1}\,dD=\frac{\pi A}{6}\ln q,$$

which, as discussed above, is the same for all bins.
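As a quick numerical check of the two results above, the following sketch computes logarithmically spaced boundaries from $b_1$, $b_{M+1}$, and $M$, and then the volume per bin for $\xi=4$. The function names and the choice of 20 bins between 1 and 100 µm are illustrative assumptions, not part of the original note.

```python
import numpy as np

def log_bin_boundaries(b_first, b_last, n_bins):
    """Return the M+1 boundaries b_j = b_1 * q**(j-1) and the ratio q."""
    q = (b_last / b_first) ** (1.0 / n_bins)
    boundaries = b_first * q ** np.arange(n_bins + 1)
    return boundaries, q

def bin_volume_xi4(boundaries, A=1.0):
    """Volume per bin for spheres distributed as n(D) = A * D**-4."""
    # Integral of (pi/6) * D**3 * A * D**-4 dD = (pi*A/6) * ln(b_{j+1} / b_j)
    return np.pi * A / 6.0 * np.log(boundaries[1:] / boundaries[:-1])

# Hypothetical example: 20 bins spanning 1 to 100 micrometers.
boundaries, q = log_bin_boundaries(1.0, 100.0, 20)
volumes = bin_volume_xi4(boundaries)  # every entry equals (pi*A/6) * ln(q)
```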

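The average size of a particle associated with each bin for such a PSD is

$$\overline{D}(b_j<D<b_{j+1})=\frac{\int_{b_j}^{b_{j+1}}n(D)\,D\,dD}{\int_{b_j}^{b_{j+1}}n(D)\,dD}=\frac{\int_{b_j}^{b_{j+1}}D^{-3}\,dD}{\int_{b_j}^{b_{j+1}}D^{-4}\,dD}=\frac{3}{2}\,\frac{b_j^{-2}-b_{j+1}^{-2}}{b_j^{-3}-b_{j+1}^{-3}}=\frac{3}{2}\,b_j\,\frac{q\left(q^{2}-1\right)}{q^{3}-1}. \tag{2}$$

This, however, is different from the typical size chosen to represent bins in the literature. Typically the size associated with PSD bins is computed as the geometric mean of the bin boundaries:

$$\overline{D}(b_j<D<b_{j+1})=\sqrt{b_{j+1}b_j}=b_j\sqrt{q}=b_1 q^{j-1}\sqrt{q},$$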

which is larger than the size computed in (2). While the two may be close (their ratio is constant and depends on $q$), they are not identical. Choosing the arithmetic mean to represent the bin is even more biased.

This issue is of importance because the mean size of the bin is used to convert between number, area, and volume size distributions. For example, the LISST particle size output is a volume distribution (in parts per million, ppm). Converting it to a number size distribution requires dividing the volume in each bin by the volume of the average particle in that bin. The choice of the average diameter, because it is cubed, can result in a significant bias.

In coastal areas the differential PSD power-law slope of particles has an exponent closer to $\xi=3$ (e.g. Jackson et al., 1997). In such cases, where we expect the power-law exponent not to be 4, the mean size representing a bin changes from that of (2) to (Boss et al., 2001):

$$\overline{D}(b_j<D<b_{j+1})=\frac{\int_{b_j}^{b_{j+1}}n(D)\,D\,dD}{\int_{b_j}^{b_{j+1}}n(D)\,dD}=\frac{\int_{b_j}^{b_{j+1}}D^{1-\xi}\,dD}{\int_{b_j}^{b_{j+1}}D^{-\xi}\,dD}=\frac{(1-\xi)\left(b_{j+1}^{2-\xi}-b_j^{2-\xi}\right)}{(2-\xi)\left(b_{j+1}^{1-\xi}-b_j^{1-\xi}\right)}.$$
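To get a feel for how much the choice of representative size matters, the sketch below evaluates the closed-form mean diameter above for a single bin and compares it with the geometric and arithmetic means of the boundaries. The bin with $q=2$ and the function name `mean_bin_diameter` are assumptions made for illustration only.

```python
import numpy as np

def mean_bin_diameter(b_lo, b_hi, xi):
    """Number-weighted mean diameter in [b_lo, b_hi] for n(D) ~ D**-xi.

    The closed form is singular for xi = 1 and xi = 2, which are not handled here.
    """
    num = (1.0 - xi) * (b_hi ** (2.0 - xi) - b_lo ** (2.0 - xi))
    den = (2.0 - xi) * (b_hi ** (1.0 - xi) - b_lo ** (1.0 - xi))
    return num / den

b_lo, b_hi = 1.0, 2.0                 # one bin with q = 2 (illustrative)
d_geo = np.sqrt(b_lo * b_hi)          # geometric mean of the boundaries
d_arith = 0.5 * (b_lo + b_hi)         # arithmetic mean of the boundaries

for xi in (3.0, 4.0):
    d_mean = mean_bin_diameter(b_lo, b_hi, xi)
    # Particle volume scales as D**3, so a ratio of diameters is cubed when
    # converting between volume and number distributions.
    print(xi, d_mean, (d_geo / d_mean) ** 3, (d_arith / d_mean) ** 3)
```

For both exponents the geometric mean exceeds the number-weighted mean and the arithmetic mean exceeds it further, and cubing the ratio amplifies the difference.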

Normalizing a PSD by the total number of particles between the lower and upper boundaries provides a probability distribution for a particle to be within a specific bin. For a power-law PSD with exponent $\xi$:

$$p(b_j<D<b_{j+1})=\frac{\int_{b_j}^{b_{j+1}}n(D)\,dD}{\int_{b_1}^{b_{M+1}}n(D)\,dD}=\frac{b_{j+1}^{1-\xi}-b_j^{1-\xi}}{b_{M+1}^{1-\xi}-b_1^{1-\xi}}.$$

The cumulative distribution, $P(D<D_c)=\int_{b_1}^{D_c}p(D)\,dD$, is therefore

$$P(b_1<D<D_c)=\frac{\int_{b_1}^{D_c}n(D)\,dD}{\int_{b_1}^{b_{M+1}}n(D)\,dD}=\frac{D_c^{1-\xi}-b_1^{1-\xi}}{b_{M+1}^{1-\xi}-b_1^{1-\xi}},$$

which is useful to answer questions such as how numerically abundant certain planktonic species are compared to all other groups. A similar calculation is used to derive the probability distribution for particle area or volume.
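The two expressions above are straightforward to evaluate numerically. The sketch below does so for ten logarithmic bins over one decade with $\xi=4$ (an illustrative choice), and also estimates how many particles a sample would need for the last bin to hold about 100 particles, i.e. a relative counting error of roughly 10%. Function and variable names are hypothetical.

```python
import numpy as np

def bin_probabilities(boundaries, xi):
    """Probability that a particle falls in each bin for n(D) ~ D**-xi (xi != 1)."""
    b = np.asarray(boundaries, dtype=float)
    p = 1.0 - xi
    return (b[1:] ** p - b[:-1] ** p) / (b[-1] ** p - b[0] ** p)

def cumulative_fraction(D_c, b_first, b_last, xi):
    """Fraction of particles with b_first < D < D_c (xi != 1)."""
    p = 1.0 - xi
    return (D_c ** p - b_first ** p) / (b_last ** p - b_first ** p)

# Ten logarithmic bins between 1 and 10 micrometers, xi = 4.
boundaries = np.logspace(0.0, 1.0, 11)
prob = bin_probabilities(boundaries, xi=4.0)

# Fraction of all particles that are smaller than 2 micrometers.
frac_below_2um = cumulative_fraction(2.0, 1.0, 10.0, xi=4.0)

# Total count needed so that the last bin is expected to hold 100 particles.
particles_needed = 100.0 / prob[-1]
```

Under these assumptions `particles_needed` comes out slightly above 100,000, in line with the estimate given earlier for ten bins per decade.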

Jackson et al. (1997) further discuss the case of computing the PSD for the solid fraction of particles if the particles are fractals (as is the case for oceanic aggregates). This requires assumptions regarding the change of fractal dimension with size and hence is not discussed further here (see, for example, Maggi, 2007; Khelifa and Hill, 2006).

### Ecological Size Spectra

In ecology, size spectra are typically represented by an abundance or biomass size spectrum with the size axis often represented as mass or volume (e.g., the review by Blanchard et al., 2017). A power-law function is often fit to the spectrum, and its exponent is interpreted based on ecological theory. In such a case the number distribution is represented as

$$n(V)=A V^{-\xi_2}\quad\left[\text{particles}\ \text{cm}^{-3}\ \mu\text{m}^{-1}\right],$$

and we expect that $\xi_2=\xi/3$, which is less steep than the distribution expressed as a function of diameter in (1). Because it is less steep, using the arithmetic mean for bin size is reasonable for the canonical distribution, more so than the geometric mean. In this case,

$$\overline{D}(b_j<V<b_{j+1})=\frac{\int_{b_j}^{b_{j+1}}n(V)\,D\,dV}{\int_{b_j}^{b_{j+1}}n(V)\,dV}=\left(\frac{6}{\pi}\right)^{1/3}\frac{\int_{b_j}^{b_{j+1}}V^{1/3-\xi_2}\,dV}{\int_{b_j}^{b_{j+1}}V^{-\xi_2}\,dV}=\left(\frac{6}{\pi}\right)^{1/3}\frac{(1-\xi_2)\left(b_{j+1}^{4/3-\xi_2}-b_j^{4/3-\xi_2}\right)}{(4/3-\xi_2)\left(b_{j+1}^{1-\xi_2}-b_j^{1-\xi_2}\right)},$$

where the bin boundaries $b_j$ are now volumes and $D=(6V/\pi)^{1/3}$ is the equivalent spherical diameter.

### Correctly Fitting a Power Law to a PSD

Suppose one is interested in fitting a power-law model to a PSD, for example as a simple descriptor of the relative contribution of large vs. small particles in a given sample. What is the correct way to fit the PSD such that the exponent will be appropriately computed whether it is computed from a number, area, or volume distribution?

The answer is as follows. The number distribution $n(D_i)$, where $D_i$ is the size representing the $i^{\text{th}}$ bin, has an uncertainty $\delta(D_i)$. For example, if the uncertainty is due to counting alone, $\delta(D_i)=\sqrt{n(D_i)}$ (another source of uncertainty may be instrument sensitivity, particularly at the small end of the PSD). If the metric for fitting is to minimize the root-mean-square error, we want to find the fit parameters $A$ and $\xi$ that minimize the cost function

$$\chi=\sum_{i=1}^{M}\left[\frac{n(D_i)-A D_i^{-\xi}}{\delta(D_i)}\right]^{2}.$$

A more robust fit that reduces the weight of outliers may be found by minimizing

$$\chi=\sum_{i=1}^{M}\frac{\left|n(D_i)-A D_i^{-\xi}\right|}{\delta(D_i)}.$$

If the appropriate uncertainties are used for each type of size distribution (be it number, area, or volume), the exponents found should be consistent with one another; e.g., the exponent of the differential volume distribution should equal that of the differential number distribution plus three.

On the other hand, if one simply fits a type-I regression line to $\log\{n(D_i)\}$ vs. $\log\{D_i\}$, the implicit assumption is that the relative uncertainty in $n(D_i)$ is constant and the exponent obtained will not be consistent between the different size distributions.
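A minimal sketch of the uncertainty-weighted least-squares fit described above is given here, using only NumPy. It relies on the fact that, for a fixed trial exponent, the best amplitude $A$ has a closed form, so a one-dimensional grid search over $\xi$ suffices. The search range, the grid resolution, the synthetic noise-free data, and the function name `fit_power_law` are all assumptions for illustration; a dedicated optimizer could be substituted.

```python
import numpy as np

def fit_power_law(D, n, delta, xi_grid=None):
    """Fit n(D) = A * D**-xi by minimizing sum(((n - A*D**-xi) / delta)**2)."""
    D, n, delta = (np.asarray(x, dtype=float) for x in (D, n, delta))
    if xi_grid is None:
        xi_grid = np.linspace(-2.0, 8.0, 10001)  # assumed search range, step 0.001
    w = 1.0 / delta ** 2
    best_chi, best_A, best_xi = np.inf, np.nan, np.nan
    for xi in xi_grid:
        shape = D ** (-xi)
        # Closed-form weighted least-squares amplitude for this trial exponent.
        A = np.sum(w * n * shape) / np.sum(w * shape ** 2)
        chi = np.sum(w * (n - A * shape) ** 2)
        if chi < best_chi:
            best_chi, best_A, best_xi = chi, A, xi
    return best_A, best_xi, best_chi

# Synthetic number distribution with xi = 3.5 and counting-type uncertainty.
D = np.logspace(0.0, 1.0, 10)
n = 1000.0 * D ** -3.5
delta_n = np.sqrt(n)
A_n, xi_n, _ = fit_power_law(D, n, delta_n)

# Volume distribution: both the data and their uncertainties scale by pi*D**3/6.
sphere = np.pi * D ** 3 / 6.0
A_v, xi_v, _ = fit_power_law(D, sphere * n, sphere * delta_n)
```

Because the uncertainties are propagated along with the data, the two fits differ by three in the exponent (here `xi_n - xi_v`), which is the consistency property discussed above.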

### References

Agrawal, Y. C. and H. C. Pottsmith. 2000. Instruments for particle size and settling velocity observations in sediment transport. Mar. Geol. 168(1–4), 89–114.

Boss, E., M. S. Twardowski, and S. Herring. 2001. Shape of the particulate beam attenuation spectrum and its inversion to obtain the shape of the particulate size distribution. Appl. Opt. 40(27), 4885–4893.

Cermeno, P. and F. G. Figueiras. 2008. Species richness and cell-size distribution: size structure of phytoplankton communities. Mar. Ecol. Prog. Ser. 357, 79–85.

Guidi, L., L. Stemmann, G. A. Jackson, F. Ibanez, H. Claustre, L. Legendre, M. Picheral, and G. Gorsky. 2009. Effects of phytoplankton community on production, size, and export of large aggregates: a world-ocean analysis. Limnol. Oceanogr. 54, 1951–1963.

Jackson, G. A., R. Maffione, D. K. Costello, A. L. Alldredge, B. E. Logan, and H. G. Dam. 1997. Particle size spectra between 1 µm and 1 cm at Monterey Bay determined using multiple instruments. Deep Sea Res. Part I Oceanogr. Res. Pap. 44(11), 1739–1767.

Khelifa, A. and P. S. Hill. 2006. Models for effective density and settling velocity of flocs. J. Hydraul. Res. 44, 390–401.

Lombard, F., E. Boss, A. M. Waite, M. Vogt, J. Uitz, L. Stemmann, H. M. Sosik, J. Schulz, J-B. Romagnan, M. Picheral, J. Pearlman, M. D. Ohman, B. Niehoff, K. O. Möller, P. Miloslavich, A. Lara-López, R. Kudela, R. M. Lopes, R. Kiko, L. Karp-Boss, J. S. Jaffe, M. H. Iversen, J-O. Irisson, K. Fennel, H. Hauss, L. Guidi, G. Gorsky, S. L. C. Giering, P. Gaube, S. Gallager, G. Dubelaar, R. K. Cowen, F. Carlotti, C. Briseño-Avena, L. Berline, K. Benoit-Bird, N. Bax, S. Batten, S. D. Ayata, L. F. Artigas, and W. Appeltans. 2019. Globally Consistent Quantitative Observations of Planktonic Ecosystems. Front. Mar. Sci. 6:196. doi: 10.3389/fmars.2019.00196.

Maggi, F. 2007. Variable fractal dimension: A major control for floc structure and flocculation kinematics of suspended cohesive sediment. J. Geophys. Res. 112(C7), C07012.

Sheldon, R. W., A. Prakash, and W. H. Sutcliffe. 1972. The size distribution of particles in the ocean. Limnol. Oceanogr. 17, 327–340.

### Appendix: Building PSDs with a Different Rule for the Bin Size

In principle, one could use a bin size convention different from the canonical choice of bins increasing as a power law. For example, if one desires a PSD where, for any $\xi\ne 4$, the volume in each bin is the same, the process is as follows (for $\xi=4$ the power-law bins discussed above already have this property). The total volume (assuming spheres) is

$$\int_{b_1}^{b_{M+1}}A\frac{\pi D^{3}}{6}D^{-\xi}\,dD=\frac{A\pi\left(b_1^{4-\xi}-b_{M+1}^{4-\xi}\right)}{6(\xi-4)}=V_0.$$

For every bin to have the same volume we have

$$\int_{b_j}^{b_{j+1}}A\frac{\pi D^{3}}{6}D^{-\xi}\,dD=\frac{A\pi\left(b_j^{4-\xi}-b_{j+1}^{4-\xi}\right)}{6(\xi-4)}=\frac{V_0}{M},$$

which implies that

$$b_1^{4-\xi}-b_2^{4-\xi}=b_2^{4-\xi}-b_3^{4-\xi}=\cdots=b_M^{4-\xi}-b_{M+1}^{4-\xi}=\frac{b_1^{4-\xi}-b_{M+1}^{4-\xi}}{M}.$$

With a choice of $b_1$, $b_{M+1}$, and $M$, we can compute all the other bin boundaries.

On the other hand, if one wanted bins that each contain the same number of particles (for example, to have similar counting errors in all bins), then assuming we have $N_0$ particles in $M$ bins spanning from $b_1$ to $b_{M+1}$ gives

$$\int_{b_1}^{b_{M+1}}A D^{-\xi}\,dD=\frac{A\left(b_1^{1-\xi}-b_{M+1}^{1-\xi}\right)}{\xi-1}=N_0.$$

For every bin to have the same number of particles requires that

$$\int_{b_j}^{b_{j+1}}A D^{-\xi}\,dD=\frac{A\left(b_j^{1-\xi}-b_{j+1}^{1-\xi}\right)}{\xi-1}=\frac{N_0}{M},$$

which implies that

$$b_1^{1-\xi}-b_2^{1-\xi}=b_2^{1-\xi}-b_3^{1-\xi}=\cdots=b_M^{1-\xi}-b_{M+1}^{1-\xi}=\frac{b_1^{1-\xi}-b_{M+1}^{1-\xi}}{M}$$

or

$$b_2^{1-\xi}=b_1^{1-\xi}-\frac{b_1^{1-\xi}-b_{M+1}^{1-\xi}}{M},\ \cdots,\ b_{j+1}^{1-\xi}=b_j^{1-\xi}-\frac{b_1^{1-\xi}-b_{M+1}^{1-\xi}}{M}.$$

Again, with a choice of ${b}_{1}$, ${b}_{M+1}$, and $M$, we can compute all of the other bin boundaries.
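Both recipes in this appendix amount to spacing the boundaries equally in $b^{p}$, with $p=4-\xi$ for equal volume and $p=1-\xi$ for equal particle number. A minimal sketch under that observation follows; the function name `equal_content_boundaries` and the example values ($\xi=3$, five bins between 1 and 100 µm) are hypothetical.

```python
import numpy as np

def equal_content_boundaries(b_first, b_last, n_bins, exponent):
    """Boundaries b_j whose powers b_j**exponent are equally spaced.

    exponent = 1 - xi gives equal particle number per bin (xi != 1);
    exponent = 4 - xi gives equal volume per bin (xi != 4).
    """
    if exponent == 0:
        raise ValueError("exponent must be nonzero; use logarithmic bins instead")
    # Equal steps in b**exponent between the first and last boundary.
    powered = np.linspace(b_first ** exponent, b_last ** exponent, n_bins + 1)
    return powered ** (1.0 / exponent)

xi = 3.0
b_num = equal_content_boundaries(1.0, 100.0, 5, 1.0 - xi)  # equal number per bin
b_vol = equal_content_boundaries(1.0, 100.0, 5, 4.0 - xi)  # equal volume per bin
```

For $\xi=3$ the equal-volume bins turn out to be linearly spaced, since the volume distribution is then flat in $D$, while the equal-number bins are crowded toward the small sizes where most particles are found.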