Sampling methods
Sampling is a way of learning about a large group of items or events by examining smaller subsets – or samples – and then inferring things about the larger population from what the samples indicate. Sampling can save time and money over measuring and inspecting every item, but it does pose a risk. There will always be a degree of uncertainty about whether the sample really reflects what is going on with the whole group.
Having a well-thought-out sampling plan is important for ensuring samples are accurately gathered and data accuracy is maintained. Key aspects of the sampling plan are determining how variable the population being studied is, deciding on the appropriate sampling methods, and then ensuring an appropriate sample size and frequency of collection.
Analyzing population data requires different tools than those used for analyzing process data, so you must be clear about what you are sampling. Population data involves unchanging sets of items, objects, or events, so selecting a representative sample from a population is comparatively straightforward. Because Six Sigma teams are, by their very nature, focused on improving quality and business processes, understanding how to accurately sample process data is vitally important. Process data has a time element and gives more information than population data.
The various sampling techniques have specific uses. Six Sigma relies heavily on four commonly used sampling methods:
- Simple random sampling is the most straightforward way of selecting a representative small group from a much larger group of items. By randomly selecting samples, each element in the population under study has an equal chance of being selected. This can be achieved through the use of computer-based random number generators, using manual random numbers tables, or even by drawing numbers from a hat. However, for the results of simple random sampling to be reliable and representative, the sample must be taken from a homogeneous population. If the items in the population are similar, then randomly selecting one here and one there will probably produce a representative sample.
- Stratified sampling is useful for data that can be logically subgrouped. Stratifying sample data is often vital to understanding heterogeneous populations. If the population is highly varied, randomly selecting only a few points is likely to produce a nonrepresentative sample, so simple random sampling is ineffective in a great many situations. When the group of items under investigation is highly varied, it can first be divided into subgroups, called strata. Then a random sample can be drawn from each subgroup. If stratification factors are well chosen, the subgroups will be more homogeneous than the population as a whole, and random sampling within each subgroup will generate more representative results.
- Systematic sampling, also known as interval sampling, involves leaving a fixed interval between the items selected for the sample. You might choose to sample every tenth part on a production line, or sample output from a process once every hour. Systematic sampling is the most practical and unbiased method when a sample must be taken from a continually changing process. In Six Sigma, this method is often used to sample items coming off a manufacturing line or to test whether machines are working to specification. It can also be useful for monitoring mixtures, such as taking output water samples to monitor the various chemical contaminant levels.
- Rational sampling is a specialized form used to assess process variation when you already know a bit about the relevant properties of the population. In rational sampling, small groups of items produced consecutively by a process are chosen based on prior knowledge. Then, the short-term variation among the samples is used to predict the long-term variation of the whole process. Restricting data collection to such a short period or small number of items makes special cause variation less likely to appear. The hope is that any variation will be common cause, or noise. Rational sampling is useful when you can't get a large enough random sample to be representative otherwise.
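The four methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed Six Sigma procedure: the function names, the `stratum_of` callback, the `fraction` and `gap` parameters, and the example of 100 numbered parts split into two shifts are all hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(items, n, seed=None):
    """Draw n items so that every item has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(list(items), n)


def stratified_sample(items, stratum_of, fraction, seed=None):
    """Divide items into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Take the same fraction of every stratum, but at least one item.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def systematic_sample(items, interval, start=0):
    """Take every interval-th item, beginning at index start."""
    return list(items)[start::interval]


def rational_subgroups(items, subgroup_size, gap):
    """Take small runs of consecutive items, skipping gap items between runs."""
    items = list(items)
    step = subgroup_size + gap
    return [items[i:i + subgroup_size]
            for i in range(0, len(items) - subgroup_size + 1, step)]


# Hypothetical data: 100 parts numbered in production order.
parts = list(range(1, 101))

print(simple_random_sample(parts, 8, seed=1))
# Assumed strata: parts 1-50 came from shift A, parts 51-100 from shift B.
print(stratified_sample(parts, lambda p: "A" if p <= 50 else "B", 0.1, seed=1))
print(systematic_sample(parts, 10, start=9))  # every tenth part: 10, 20, ..., 100
print(rational_subgroups(parts, 5, 20)[0])    # first subgroup: [1, 2, 3, 4, 5]
```

In this sketch the subgroup size and gap for rational sampling are passed in by the caller, standing in for the prior process knowledge the method depends on.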
Before employing any of the common sampling techniques, there are several decisions to make regarding the key concepts associated with sampling:
- Bias is the influence of any factor that makes the process being studied seem different from what it actually is. A biased sample is by definition not representative, and decisions based on it are, at the very least, questionable.
- Confidence level is an indicator of how certain you are that a sample represents the whole. The decision as to required confidence level depends upon the purpose of the investigation.
- The precision level must be high enough that you can be sure your conclusions are based on facts. The purposes of the investigation and what you'll do with the results determine how precise you need to be.
- The sampling frequency is the number of times a measurement is taken per sampling period. This applies only to process sampling.
- One of the most significant decisions involved in sampling is the determination of the sample size. The key is to find the smallest-sized sample that adequately represents the population or process as a whole.
- For a simple random sample to be truly representative, the population must be homogeneous. If the population is widely dissimilar, then stratified sampling will be necessary.
- Very large sample sizes can be prohibitively expensive to gather and evaluate. Sometimes the costs can exceed the relative value of the information the data provides. You should gather only as much data as needed for a representative sample.
- If the process has been studied before, or if a similar process has been studied, large amounts of historical data may be available. This prior knowledge provides perfectly valid information you can use to supplement or even replace your own collection efforts.
- The ease or difficulty of collecting the data sets a limit on how big a sample you can take and how frequently you can take it. If getting the sample is dangerous, difficult, or impacts the operation under investigation, the sample size may need to be small and collected infrequently. In such cases, collect the data you can reasonably collect, and be extra careful about the conclusions you draw.
- The variability of the population or process under investigation is a major determinant of sample size. If the population varies greatly, the sample size needed to get a representative random sample may be very large, and stratification may be required. The more homogeneous a population, the smaller the sample size needed.
- Whether you are investigating a population or a process affects the size and number of samples you must collect. Populations do not change, whereas processes are continually changing. You must also know whether you have a finite or infinite population.
- Different formulas for determining the suggested sample size exist, depending on whether you are gathering continuous or discrete data. These formulas are based on the level of precision desired, the required confidence level, and the estimated standard deviation of the data. These calculations can be adjusted to increase sample reliability for very small populations.
- It is important to maintain data accuracy and integrity. If your data's reliability becomes suspect, all subsequent decisions based on the corrupt or inaccurate data will be questionable. This can lead to costly mistakes. Some best practices include reviewing the collection process periodically, detecting and removing data entry errors, and recording changes as soon as possible.
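The sample size formulas mentioned in the list are not given in the text. As a rough sketch, the following uses the common textbook forms n = (z·s/d)² for continuous data and n = z²·p(1−p)/d² for discrete (proportion) data, where z comes from the confidence level, s is the estimated standard deviation, p is the estimated proportion, and d is the desired precision. The finite population correction shown is one widely used adjustment for small populations and is an assumption here; it may not be the adjustment the text has in mind. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_continuous(std_dev, precision, confidence=0.95):
    """n = (z * s / d)^2, for estimating a mean to within +/- precision."""
    z = z_value(confidence)
    return math.ceil((z * std_dev / precision) ** 2)


def sample_size_discrete(proportion, precision, confidence=0.95):
    """n = z^2 * p * (1 - p) / d^2, for estimating a proportion."""
    z = z_value(confidence)
    return math.ceil(z ** 2 * proportion * (1 - proportion) / precision ** 2)


def finite_population_adjustment(n, population_size):
    """Finite population correction: n / (1 + (n - 1) / N)."""
    return math.ceil(n / (1 + (n - 1) / population_size))


# Continuous data: estimated standard deviation 10, precision +/- 2 units.
print(sample_size_continuous(10, 2))            # 97
# Discrete data: estimated defect proportion 0.5, precision +/- 0.05.
n = sample_size_discrete(0.5, 0.05)
print(n)                                        # 385
# The same study when the whole population is only 1,000 items.
print(finite_population_adjustment(n, 1000))    # 279
```

Results are rounded up so the stated precision is not undershot. Higher confidence or tighter precision raises the required size quickly, which is why the cost and collection-difficulty considerations above matter.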
Finding the most efficient means of gathering a representative sample and then following best practices for maintaining data accuracy and integrity will help ensure your sampling efforts provide useful process improvement information.
Copyright 2008 SkillSoft.
