After each round of scaling, several statistics and graphs are displayed in the top half of the window. This section describes 1) the postrefinement statistics, 2) the overall statistics, and 3) the ten diagnostic graphs.
The top of the screen will show the global refinement statistics, including the results from postrefinement: the unit cell parameters and information about the number of reflections (Figure 90). If you have selected the Use Auto Corrections option, additional statistics will be available on the left side of the display (Figure 91). If you have selected the No Postrefinement option, only information about the reflections will be available (Figure 92).
Figure 90. Global Refinement with postrefinement information
Figure 91. Global Refinement with postrefinement information and autocorrections
Figure 92. Global statistics with no postrefinement
Ten diagnostic graphs will appear below the global refinement statistics. A scroll bar on the right allows you to view graphs that do not initially fit in the window.
Most of the graphs have an explanation button that describes the graph. For example, the explanation for Completeness vs. Resolution describes what the different colored lines mean (different sigma levels). Graphs that are initially displayed on a logarithmic scale also have a button to toggle the Y-axis between logarithmic and linear representations. Note that some plots have two vertical axes; in most cases, the data are color-coded to match their axis. For example, blue data are plotted with respect to the blue axis and red data with respect to the red axis.
When the initial round of scaling is finished, examine the output plots and adjust the Error Scale Factor to bring the overall χ^{2} down to around 1 (the overall χ^{2} is also explicitly given at the end of the log file). After changing the Error Scale Factor, you must delete the reject file by clicking the Delete Reject File button in the lower Controls panel, and then click Scale Sets again. Repeat these steps until the overall χ^{2} is acceptable, then proceed to subsequent rounds of scaling by activating Use Rejections on Next Run and scaling again. Repeat the cycle of scaling and rejecting until the number of new rejections has declined to just a few. At this point, the scaling is complete.
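As a rough guide, because the Error Scale Factor multiplies the estimated σ values, the overall χ^{2} falls approximately as the square of that factor. A minimal sketch of this rule of thumb follows (an illustration only, not part of HKL-2000; the function name and the numbers are hypothetical):

def suggest_error_scale_factor(current_esf, overall_chi2):
    """Suggest an Error Scale Factor expected to bring the overall chi^2 close to 1,
    using the approximation chi^2 ~ 1 / ESF^2 (sigma is multiplied by the ESF)."""
    return current_esf * overall_chi2 ** 0.5

# Example: if scaling with an Error Scale Factor of 1.3 gave an overall chi^2 of 1.7,
# a value near 1.3 * sqrt(1.7), i.e. about 1.69, is a reasonable next guess.
print(round(suggest_error_scale_factor(1.3, 1.7), 2))   # 1.69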
The first two charts can help you determine if your data collection strategy was appropriate and if the overall data collection went well. Examine the Scale and B vs. Frame plot (Figure 93, right) to see if there are significant variations in the intensity and B factor of frames across the data set, which could indicate the crystal rotating out of the beam. Variations can also result from the crystal morphology. For example, plates will have more intense reflections when the crystal's edge is along the beam than when the face of the plate is perpendicular to the beam. A dramatic rise of the B factor (to around 10) that does not decrease is usually an indication of radiation damage. The Completeness vs. Resolution plot (Figure 93, left) shows the completeness of the data for different resolution shells. The blue line represents all reflections, the purple line reflections with I/σ greater than 3, the red line reflections with I/σ greater than 5, the orange line reflections with I/σ greater than 10, and the yellow line reflections with I/σ greater than 20. This chart can be very informative for deciding if you have collected sufficient data to solve your structure. It is not necessary to achieve 100% completeness, but datasets with low completeness can severely hinder or thwart the process. Low-resolution reflections are particularly important for molecular replacement solutions.
Figure 93. The Scale and B vs. Frame, Completeness vs. Resolution, and I/Sigma and CC_{1/2} vs. Resolution graphs
The I/Sigma and CC_{1/2} vs. Resolution plot shows the ratio of the intensity to the error of the intensity (I/σ, blue line) and CC_{1/2}, the correlation coefficient between intensity estimates from half data sets (upper red line), as a function of resolution (Figure 94). The traditional approach was to set the maximum resolution to the point where I/σ falls below 2, but recent analysis suggests that you can obtain useful information as long as CC_{1/2} is above 0.2. This graph will also include an I/σ plot for anomalous data if such data are present.
Figure 94. I/Sigma and CC_{1/2} vs. Resolution
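To make the two criteria concrete, the sketch below computes 〈I/σ〉 and CC_{1/2} for a single resolution shell and applies the cutoff guidance from the paragraph above. It is an illustration only, not HKL-2000 code; the function names and input arrays are hypothetical, and the half-data-set intensity estimates are assumed to be available already.

import numpy as np

def shell_statistics(i_obs, sig_obs, i_half1, i_half2):
    """Return <I/sigma> and CC1/2 for one resolution shell.
    i_half1 and i_half2 are mean intensities of the same unique reflections
    computed from two randomly chosen half data sets."""
    i_over_sig = float(np.mean(np.asarray(i_obs) / np.asarray(sig_obs)))
    cc_half = float(np.corrcoef(i_half1, i_half2)[0, 1])   # Pearson correlation
    return i_over_sig, cc_half

def shell_is_useful(i_over_sig, cc_half):
    """Keep a shell while CC1/2 > 0.2 (or, by the traditional criterion, I/sigma > 2)."""
    return cc_half > 0.2 or i_over_sig > 2.0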
Detecting an anomalous signal is very easy. In the first step, the data are scaled and postrefined normally with the anomalous option selected (not the scale anomalous option!). This tells the program to output a *.sca file in which the I+ and I− reflections are separated. In this case, Scalepack will treat the I+ data and the I− data as two separate measurements within a data set, and the statistics that result from merging the two will reflect the differences between the I+ and I− reflections. Obviously, for a centric reflection there is no distinct I+, so the merging statistics will only reflect the noncentric reflections.
The presence of an anomalous signal is detected by examining the χ^{2} values on the χ^{2} and R-Factor vs. Resolution plot (Figure 95). The two χ^{2} curves show the χ^{2} values for each resolution shell, calculated for merged and unmerged Friedel pairs, in orange and blue respectively. Assuming that the error scale factors were reasonable and there is no useful anomalous signal in your data, both curves should be flat and average around 1, whether Friedel pairs are merged or not. On the other hand, if χ^{2} > 1 and you see a clear resolution dependence of the χ^{2} for scaling with merged Friedel pairs (i.e., the χ^{2} increases as the resolution decreases), this is a strong indication of the presence of an anomalous signal. The resolution dependence allows you to determine where to cut off your resolution to calculate an anomalous difference Patterson map with an optimal signal-to-noise ratio. Please note, however, that this analysis assumes that the error model is reasonable and gives a χ^{2} close to 1 when no anomalous signal is present. You should only use scale anomalous when you have enough redundancy to treat the F+ and F− reflections completely independently.
Figure 95. Anomalous signal detection for a strong (left) and moderate (right) anomalous signal.
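The sketch below illustrates the idea behind this test (an illustration only, not HKL-2000 code). For each resolution shell the same χ^{2} statistic is computed twice: once with observations grouped by hkl with Friedel mates merged, and once with I+ and I− observations grouped separately. A shell-level χ^{2} that rises toward low resolution only in the merged case points to a real anomalous difference.

import numpy as np

def chi2_contribution(intensities, sigmas):
    """Chi^2 contribution of one unique reflection: the sum over its observations of
    (I - <I>)^2 / sigma^2, where <I> is the error-weighted mean, with n - 1 degrees
    of freedom (the -1 accounts for the correlation between each I and <I>)."""
    i = np.asarray(intensities, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    if len(i) < 2:
        return 0.0, 0
    mean_i = np.average(i, weights=1.0 / s**2)
    return float(np.sum((i - mean_i) ** 2 / s**2)), len(i) - 1

# Shell chi^2 = (sum of contributions) / (sum of degrees of freedom), computed once
# grouping observations by (h, k, l) and once grouping by (h, k, l, Friedel sign).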
The control box in the scaling panel (Figure 96) contains the Scale Sets button and several other operations that can only be performed after at least one round of scaling.
Figure 96. The Scaling Control panel
Scale Sets is the button that starts the scaling process using the parameters selected in the Scaling Options and Global Refinement boxes.
Adjust Error Model allows you to set different error levels for each resolution shell, giving you more control over the error model than the overall effect of changing the Error Scale Factor.
The error model is the estimate of the systematic error for each of the resolution shells. There will be exactly as many error estimates as there are resolution zones.
The error estimates do not all have to be the same. The estimated error applies to the data read in after the corresponding keyword, so you can apply different error estimates to subsequent batches by repeating this input with different values. This is an important point if you read in data from a previous Scalepack output whose σ values do not need to be increased.
The error estimates should be approximately equal to the R-factors for the resolution shells where statistical errors are small, namely the lower resolution shells where the data are strong. This is a crude estimate of the systematic error (to be multiplied by I) and is usually invariant with resolution. The default is 0.03 (i.e., 3%) for all zones. Examine the difference between the total error and the statistical error in the final table of statistics (this can be viewed by clicking the Show Log File button and scrolling to the bottom). The difference between these numbers tells you what contribution systematic error makes to the total error (σ). If the difference is small, then reasonable changes in the estimated error values will not help your χ^{2} much. This is because the estimated errors represent your guess and/or knowledge of the contribution of the systematic error to the total error, and a small difference indicates that systematic error is not contributing much. If the difference between total and statistical error is significant, and the χ^{2} values are far from 1, then consider adjusting the estimated error values in the affected resolution shells.
Figure 97. Adjusting the error model
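As an approximate picture of what these numbers mean (a hedged sketch, not the program's exact internal formula), the estimated error acts as a fractional systematic error that is multiplied by the intensity and added in quadrature to the statistical (counting) error, with the Error Scale Factor multiplying the result:

def total_sigma(i, sigma_statistical, estimated_error=0.03, error_scale_factor=1.0):
    """Approximate combined error for one observation: statistical and systematic
    contributions added in quadrature, then scaled by the Error Scale Factor."""
    return error_scale_factor * (sigma_statistical**2 + (estimated_error * i) ** 2) ** 0.5

# For a strong low-resolution reflection the systematic term dominates, so raising the
# per-zone estimated error changes its sigma (and hence its chi^2 contribution) noticeably:
print(round(total_sigma(10000.0, 150.0, estimated_error=0.03)))   # ~335
print(round(total_sigma(10000.0, 150.0, estimated_error=0.05)))   # ~522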
Edit Rejection File will open the reject file in an external editor, allowing you to remove particular reflections from the list of rejected reflections. This is rarely necessary and not advised.
Show Redundancies will display a window that presents a histogram showing the percentage of reflections with a specific number of redundancies. A redundancy value is the number of times a unique reflection has been measured after symmetry-related reflections have been merged. Buttons for different resolution ranges are below the histogram, allowing you to view your redundancies in different bins of data (Figure 98).
Figure 98. Redundancies Window
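A minimal sketch of what such a histogram summarizes (an illustration only, not HKL-2000 code; the input list is hypothetical and assumed to hold one (h, k, l) entry per measurement, already mapped to the asymmetric unit):

from collections import Counter

def redundancy_histogram(observations):
    """Return {redundancy: number of unique reflections measured that many times}."""
    per_unique = Counter(observations)        # unique hkl -> number of measurements
    return Counter(per_unique.values())       # redundancy -> count of unique reflections

print(redundancy_histogram([(1, 0, 0), (1, 0, 0), (2, 1, 3), (0, 0, 4), (0, 0, 4), (0, 0, 4)]))
# one reflection measured once, one measured twice, one measured three times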
The Display Frame button works in conjunction with the number input box next to it. It will open the specified frame in the XDisp display program.
The Load Output File will open a file selection dialog and allow you to see the output from previous scaling rounds.
The Abort button will stop the current round of scaling. This may come in handy if you realize you have set a parameter wrong.
Reindex allows you to change the reference frame for the Miller indices, effectively changing the orientation of the lattice. For example, if the data have been indexed such that a screw axis lies along the wrong axis, you can apply a permutation to get the symmetry element on the correct axis. The Reindexing dialog lets you transpose one unit cell axis to another by reassigning the Miller indices. For example, if you have indexed your data in the primitive orthorhombic crystal class, the analysis of systematic absences may indicate that you have one screw axis. By convention, this screw axis should be along the c axis. The original indexing might have this screw axis along a different axis; therefore, you will need to reindex the data to make it conform to space group conventions.
Reindexing may also be necessary when comparing two or more data sets that were collected and processed independently. Denzo, when confronted with a choice of more than one possible indexing convention, makes a random choice. This is not a problem unless it makes a different choice for a second data set, because the two will not be compatible for scaling together without first reindexing. One cannot distinguish nonequivalent alternatives without scaling the data, which is why this is not done in Denzo (i.e., at the indexing or integration step). You can tell that you need to reindex one of two data sets if the χ^{2} values upon merging the two are very high (e.g., 50). This makes sense when you consider that scaling two or more data sets together involves comparing reflections with the same hkl index.
To reindex the data, use the Reindex button. You will be presented with different reindexing options and a manual assignment option (Figure 99). If you are working with more than one data set, you can select which data sets will be reindexed, or you can reindex all of them. For example, if you have screw axes on the b and c axes, but they should be on the a and b axes, you should select the option "abc → bca: h←k, k←l, l←h". After choosing reindexing options, click Reindex. This example is consistent with the POINTLESS results in Figure 104.
Figure 99. The Reindexing dialog
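A minimal sketch of what that particular option does to each reflection's indices (an illustration only, not HKL-2000 code):

def reindex_abc_to_bca(h, k, l):
    """Apply the abc -> bca permutation: new h <- old k, new k <- old l, new l <- old h."""
    return k, l, h

# A 0k0 reflection (screw axis along the old b axis) becomes an h00 reflection,
# i.e., the screw axis now lies along the new a axis:
print(reindex_abc_to_bca(0, 3, 0))   # (3, 0, 0)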
No reindexing, no new autoindexing, and nothing except changing the sign of y scale in Denzo can change the sign of the anomalous signal.
There are two ways to manually check whether or not you have systematic absences. The first is to look at the Intensities of systematic absences table at the bottom of the log file, which can be shown using the Show Log File button. If you have scaled your data in a space group that is expected to have systematic absences, then the reflections that should be missing will be listed at the bottom of the log file. The table lists these reflections' intensity, sigma, and I/sigma. The other way to evaluate the systematic absences is to view particular slices of reciprocal space using the reciprocal lattice button.
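For instance, for a 2_{1} screw axis along c the reflection condition is that 00l reflections are present only for l = 2n, so the reflections that should be absent are the 00l with l odd. A minimal sketch of this check (an illustration only, not HKL-2000 code; the input format is hypothetical):

def absence_violations_21_along_c(reflections):
    """reflections: iterable of ((h, k, l), intensity, sigma) tuples.
    Return the 00l reflections with l odd, i.e., those that should be absent,
    together with I, sigma, and I/sigma for inspection."""
    rows = []
    for (h, k, l), i, sig in reflections:
        if h == 0 and k == 0 and l % 2 != 0:
            rows.append(((h, k, l), i, sig, i / sig))
    return rows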
Show Log File will open the log file that resulted from the last round of scaling. Much of the data in this file is displayed in the graphs as described below. Most of the important information is in the tables at the end of the log. The Scalepack log file can be very long since it contains the list of all *.x files read in and a record of every cycle of scaling. The log file is divided into several sections. In order these are:
1. the list of *.x files read in, and the list of reflections from each of these files that will be rejected;
2. the output file name;
3. the total number of raw reflections used before combining partial reflections;
4. the initial scale and B factors for each frame, goniostat parameters, and the space group;
5. the initial scaling procedure;
6. analysis and results tables;
7. systematic absences (if expected);
8. automatic corrections statistics (if used).
The log file should be examined after each iteration. In particular, the errors, χ^{2} and both R factors should be checked.
Reciprocal Lattice will open a display of one of three slices of the reciprocal lattice. There are two good reasons you may want to view the reciprocal lattice: 1) to verify the space group, especially the presence of a screw axis or centering, and 2) to visualize the completeness of the data. Clicking on the Reciprocal Lattice button will reveal the choice of three different slices of the reciprocal lattice to view: h,0,l; h,k,0; 0,k,l. Looking at these slices can make it apparent if you have systematic absences or, conversely, if you are missing the data necessary to determine whether you have systematic absences. The h,0,l slice of the P2_{1}2_{1}2_{1} data in Figure 100 clearly shows screw axes along a and c because the odd reflections along the h and l axes are systematically absent.
Figure 100. The reciprocal lattice viewer
Reprocess lets you take a step back and reprocess all of the images using values calculated during postrefinement. You can use 1) the unit cell from postrefinement, 2) the crystal orientation from postrefinement, and/or 3) the mosaicity from postrefinement (Figure 101). This dialog box allows you to select more than one of these parameters. This can be beneficial if these values changed significantly as the data were initially processed. This option will overwrite your original *.x files.
Figure 101. Reprocessing options
Delete Reject File removes the file that contains the reflections that are excluded from scaling if you have the Use rejections on next run option selected. It is important to delete the reject file if you change any of the options that affect the error model.
Diagnostics will present you with a selector panel that allows you to run three types of diagnostics: 1) Check Integration, 2) Check Spindle Movement, and 3) Check Shutter Timing. The Check Integration option will display four of the graphs generated while the data were being integrated (Figure 102). If you have more than one data set, each set will have a tab. The resulting graphs should be relatively flat. If they are not, you may benefit from the Reprocess option described above.
The Check Spindle Movement calculates the Rmerge of your data using a macro that corrects for uneven rotations of the spindle axis. This is not a common problem but does happen. If no problems are found, you should see the message "No problems with uneven spindle movement were detected." Otherwise, you will get the message, "The decrease in Rmerge indicates uneven spindle movement." You should report this type of problem to the site administrator where you collected your data.
The Check Shutter Timing calculates the Rmerge of your data using a macro that corrects for uneven shutter timings. This is not a common problem but does happen. If no problems are detected, you should see the message "No problems with synchronization between shutter and spindle axis rotation were found," otherwise you will get the message "The decrease in Rmerge indicates poor synchronization between the shutter and the spindle axis." You should report this type of problem to the site administrator where you collected your data.
Figure 102. The Integration Graphs from the Inspect Integration option
Exclude Frames provides you with the flexibility to omit particular images from the scaling process. After integration, some frames may appear as strong outliers in the integration information plots, e.g., with χ^{2} values far above the mean. You can exclude those frames from scaling by clicking on the Exclude Frames button. A table with the list of frames will appear (Figure 103). Select a single frame by clicking on its box or on a point on the plot. Select a range of frames by checking Multiple Select and clicking twice, once on the first frame and once on the last. You may have to move to a different page with the Previous or Next buttons to see all frames. You may also exclude all frames above a certain limit by selecting the Select Greater Than option. This will turn on crosshairs controlled by the mouse; when you click, all frames with a χ^{2} higher than the horizontal crosshair will be selected. Frames marked in red will be excluded when you close the dialog and rescale. There is also an Auto Select button that will automatically select images with a χ^{2} higher than 10.
If you would like to see an individual frame, select the Inspect Object button, and then click on an image tile.
Figure 103. Frames selected to be excluded from scaling are marked red
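A minimal sketch of the selection rule used by Auto Select (an illustration only, not HKL-2000 code; the input dictionary is hypothetical):

def frames_to_exclude(chi2_by_frame, threshold=10.0):
    """chi2_by_frame: frame number -> chi^2 from the last round of scaling.
    Return the frames whose chi^2 exceeds the threshold."""
    return sorted(frame for frame, chi2 in chi2_by_frame.items() if chi2 > threshold)

print(frames_to_exclude({1: 1.2, 2: 0.9, 3: 14.7, 4: 1.1}))   # [3]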
Check Space Group and Space Group Diagram are described in detail in the next section.
Check in PDB will check the PDB to see if any deposited structures have a similar unit cell. It is unlikely, although not impossible, that two different proteins will crystallize with the same unit cell. This option was added to help alert you to the possibility that you have crystallized one of the proteins that are known to co-purify with recombinant proteins, such as maltose-binding protein or trypsin.
If your installation of HKL has been linked to CCP4, then the first approach to space group determination is to use the Check Space Group button. This button runs the CCP4 program POINTLESS and presents the results in an easy-to-read table. To use this button, first scale the data in the primary space group for the crystal lattice (the default space group selected after integration). The dialog will not only provide the probability for each possible space group but will also indicate whether reindexing the data is necessary to ensure compatibility with space group conventions (Figure 104). The log file from POINTLESS can also be viewed. If you click the Apply button, the space group you select will be applied and placed in the Space Group selection box. This will not, however, reindex your data, even if reindexing is suggested. To do this, use the Reindex button described above.
Figure 104. The Space Group Probability results from POINTLESS.
If your installation is not linked to CCP4, or if you would like to verify the suggested space group, Scalepack can be used to determine the space group of your crystal. What follows is a description of how to continue from the lattice type given by Denzo to determine your space group. This analysis only applies to enantiomorphic compounds such as proteins. It does not necessarily apply to small molecules, which may crystallize in centrosymmetric space groups. If you expect a centrosymmetric space group, you should use a space group that is a subgroup of the Laue class to which your crystal belongs. You also need enough data for this analysis to work, so that the systematic absences are visible.
To determine your space group using just HKL, follow these steps:
1. Determine the lattice type in Denzo.
2. Scale by the primary space group in Scalepack. The primary space groups are the first space groups listed for each Bravais lattice type in the table that follows this discussion. In the absence of lattice pseudosymmetries (e.g., monoclinic with β = 90°), the primary space group will not incorrectly relate symmetry-related reflections. Note the χ^{2} statistics. Now try a higher symmetry space group (next down the list) and repeat the scaling, keeping everything else the same. If the χ^{2} is about the same, then this choice is acceptable, and you can continue. If the χ^{2} is much worse, then this is the wrong space group, and the previous choice was your space group (a sketch of this comparison follows the list below). The exception is primitive hexagonal, where you should try P6_{1} if both P3_{1}21 and P3_{1}12 fail.
3. Examine the bottom of the log file (the Show Log File button) or the simulated reciprocal lattice picture (the Reciprocal Lattice button) for systematic absences. If this is the correct space group, all of the systematically absent reflections should indeed be absent (or have very small intensities). Compare this list with the reflection conditions listed for each of the candidate space groups. The set of absences seen in your data that corresponds to the absences characteristic of the listed space groups identifies your space group or pair of space groups. Note that you cannot do any better than this (i.e., determine the handedness of screw axes) without phase information.
4. If it turns out that your space group is orthorhombic and contains one or two screw axes, you may need to reindex to align the screw axes with the standard definition. If you have one screw axis, your space group is P222_{1}, with the screw axis along c. If you have two screw axes, then your space group is P2_{1}2_{1}2, with the screw axes along a and b. If the Denzo indexing is not the same as these, then you should reindex using the reindex button.
5. So far, this is the way to index according to the conventions of the International Tables. If you prefer to use a private convention, you may have to work out your own transformations. One such transformation has been provided in the case of space groups P2 and P2_{1}.
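The sketch below illustrates the χ^{2} comparison from steps 2 and 3 as a simple decision loop (an illustration only, not HKL-2000 code; the space group names, χ^{2} values, and the tolerance factor are hypothetical, and the sketch does not handle the primitive hexagonal exception noted above):

def choose_space_group(chi2_by_space_group, candidates_in_order, tolerance=1.5):
    """candidates_in_order: the primary space group first, then candidates of
    increasing symmetry. chi2_by_space_group: overall chi^2 from scaling in each."""
    best = candidates_in_order[0]
    reference = chi2_by_space_group[best]
    for candidate in candidates_in_order[1:]:
        if chi2_by_space_group[candidate] <= tolerance * reference:
            best = candidate      # chi^2 about the same: the higher symmetry is acceptable
        else:
            break                 # chi^2 much worse: the previous choice was the space group
    return best

print(choose_space_group({"P3_1": 1.1, "P3_121": 1.2, "P6_1": 8.4},
                         ["P3_1", "P3_121", "P6_1"]))   # P3_121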
To manually select a space group, use the dropdown box. A selection list will appear with the 14 Bravais lattices. When you click on a lattice type, it will expand to show the available space groups within that lattice type. Click on a space group to select it.
Figure 105. Manual space group selection
The table below lists, for each Bravais lattice, the primary assigned space group used for scaling, followed by the candidate space groups (space group number and symbol) and the reflection conditions along the screw axes.

Primitive Cubic
  P2_{1}3:  195 P23;  198 P2_{1}3 (2n,0,0)
  P4_{1}32:  207 P432;  208 P4_{2}32 (2n,0,0);  212 P4_{3}32 (4n,0,0)*;  213 P4_{1}32 (4n,0,0)*
I Centered Cubic
  I2_{1}3:  197 I23 *;  199 I2_{1}3 *
  I4_{1}32:  211 I432;  214 I4_{1}32 (4n,0,0)
F Centered Cubic
  F23:  196 F23
  F4_{1}32:  209 F432;  210 F4_{1}32 (2n,0,0)
Primitive Rhombohedral
  R3:  146 R3
  R32:  155 R32
Primitive Hexagonal
  P3_{1}:  143 P3;  144 P3_{1} (0,0,3n)*;  145 P3_{2} (0,0,3n)*
  P3_{1}12:  149 P312;  151 P3_{1}12 (0,0,3n)*;  153 P3_{2}12 (0,0,3n)*
  P3_{1}21:  150 P321;  152 P3_{1}21 (0,0,3n)*;  154 P3_{2}21 (0,0,3n)*
  P6_{1}:  168 P6;  169 P6_{1} (0,0,6n)*;  170 P6_{5} (0,0,6n)*;  171 P6_{2} (0,0,3n)**;  172 P6_{4} (0,0,3n)**;  173 P6_{3} (0,0,2n)
  P6_{1}22:  177 P622;  178 P6_{1}22 (0,0,6n)*;  179 P6_{5}22 (0,0,6n)*;  180 P6_{2}22 (0,0,3n)**;  181 P6_{4}22 (0,0,3n)**;  182 P6_{3}22 (0,0,2n)
Primitive Tetragonal
  P4_{1}:  75 P4;  76 P4_{1} (0,0,4n)*;  77 P4_{2} (0,0,2n);  78 P4_{3} (0,0,4n)*
  P4_{1}2_{1}2:  89 P422;  90 P42_{1}2 (0,2n,0);  91 P4_{1}22 (0,0,4n)*;  95 P4_{3}22 (0,0,4n)*;  93 P4_{2}22 (0,0,2n);  94 P4_{2}2_{1}2 (0,0,2n),(0,2n,0);  92 P4_{1}2_{1}2 (0,0,4n),(0,2n,0)**;  96 P4_{3}2_{1}2 (0,0,4n),(0,2n,0)**
I Centered Tetragonal
  I4_{1}:  79 I4;  80 I4_{1} (0,0,4n)
  I4_{1}22:  97 I422;  98 I4_{1}22 (0,0,4n)
Primitive Orthorhombic
  P2_{1}2_{1}2_{1}:  16 P222;  17 P222_{1} (0,0,2n);  18 P2_{1}2_{1}2 (2n,0,0),(0,2n,0);  19 P2_{1}2_{1}2_{1} (2n,0,0),(0,2n,0),(0,0,2n)
C Centered Orthorhombic
  C222_{1}:  20 C222_{1} (0,0,2n);  21 C222
I Centered Orthorhombic
  I2_{1}2_{1}2_{1}:  23 I222 *;  24 I2_{1}2_{1}2_{1} *
F Centered Orthorhombic
  F222:  22 F222
Primitive Monoclinic
  P2_{1}:  3 P2;  4 P2_{1} (0,2n,0)
C Centered Monoclinic
  C2:  5 C2
Primitive Triclinic
  P1:  1 P1
Note that for pairs of candidate space groups marked with the * (or **) symbol, scaling and merging of diffraction intensities cannot resolve which member of the pair your crystal form belongs to. This is inherent to crystallography.
If you have already integrated your data, you may want to rescale it without starting from scratch, either during the same session or later on. If you have just finished processing and scaling the data, changing the output file name and changing the parameters is all that is necessary to rescale the data in a different space group. If you are doing this to try to determine the space group, it is a good idea to include the space group in the names of the output file (the *.sca file) and the log file.
If you want to scale *.x files that you have previously processed in a new session of HKL, select Scaling Only when the initial site selection dialog opens, then set the Output Data Directory in the Data panel to the directory where your *.x files are located using the directory tree. It is not necessary to set the Raw Data Directory. Next, click the Scale Sets Only button followed by Load Data Sets. You will see a list of *.x files in the dialog box. Select the set you want to scale, followed by OK. If you want to scale additional sets of *.x files, repeat this process and add them to the list. The individual sets do not have to be in the same directory; each set will have its file locations associated with it. Make sure the Scale button is selected for each set you want to scale together. For example, if you want to scale two sets of *.x files together (say, from two different crystals, from high- and low-resolution passes of data collection, or from a native and a derivative), make sure that Scale is clicked for each one. If you don't want them to be scaled together, then make sure that Scale is not clicked. You don't have to worry about the Image Display or the Experiment Geometry, since these were set when you first generated the *.x files.
The quality of Xray data is initially assessed by statistics. In smallmolecule crystallography, there is almost always a large excess of strong data, so this allows the crystallographer to discard a substantial amount of suspect data and still accurately determine a structure. Compared to small molecules, however, protein crystals diffract poorly. Moreover, important phase information comes from weak differences, and we must be sure these differences do not arise from the noise caused by the limitations of the Xray exposure and detection apparatus. As a result, we cannot simply throw away or statistically downweight marginal data without first making a sophisticated judgment about which data are good and which are bad.
To accurately describe the structure of a protein molecule, we often need higher resolution data than the crystal provides. That is life. One of the main judgments the crystallographer makes in assessing the quality of the data is thus the resolution to which the crystal diffracts. In making this judgment, we wish to use the statistical criteria which are most discriminatory and least subjective. In practice, there are three ways of assessing the high-resolution limit of diffraction. The first is the ratio of the intensity to the error of the intensity, i.e., I/σ. The second way, which is traditional but inferior, is the agreement between symmetry-related reflections, i.e., R_{merge}. The most recently adopted convention for selecting a data cutoff is to use data with CC_{1/2} > 0.2.
From a statistical point of view, I/σ is a superior criterion, for two reasons. First, it defines a resolution limit, since by definition I/σ is the signal-to-noise ratio of your measurements. In contrast, R_{merge} is an unweighted statistic that does not take the measurement error into account.
Second, the σ assigned to each intensity derives its validity from the χ^{2}'s, which represent the weighted ratio of the difference between the observed I and the average value 〈I〉 squared, divided by the square of the error model and multiplied by a factor correcting for the correlation between I and 〈I〉. Since it depends on an explicit declaration of the expected error in the measurement, the user of the program is part of the Bayesian reasoning process behind the error estimation.
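In one common notation (a standard textbook formulation, not quoted verbatim from the HKL-2000 documentation), with n_{hkl} observations I_{i} of each unique reflection, 〈I_{hkl}〉 the error-weighted mean, and N the total number of observations, the two statistics discussed here can be written as

\chi^2 \;=\; \frac{1}{N}\sum_{hkl}\sum_{i=1}^{n_{hkl}} \frac{\left(I_{i} - \langle I_{hkl}\rangle\right)^2}{\sigma_i^2}\,\frac{n_{hkl}}{n_{hkl}-1}
\qquad
R_{merge} \;=\; \frac{\sum_{hkl}\sum_{i=1}^{n_{hkl}} \left| I_{i} - \langle I_{hkl}\rangle \right|}{\sum_{hkl}\sum_{i=1}^{n_{hkl}} I_{i}}

where the factor n_{hkl}/(n_{hkl}−1) is the correction for the correlation between each I_{i} and 〈I_{hkl}〉 mentioned above, and the R_{merge} sums contain no weights at all.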
The essence of Bayesian reasoning in Scalepack is that you bring χ^{2} (or technically speaking, the goodnessoffit, which is related to the total χ^{2} by a constant) close to 1.0 by manipulating the parameters of the error model. R_{merge}, on the other hand, is an unweighted statistic which is independent of the error model. It is sensitive to both intentional and unintentional manipulation of the data used to calculate it, and may not correlate with the quality of the data. An example of this is seen when collecting more and more data from the same crystal. As the redundancy goes up, the final averaged data quality definitely improves, yet the R_{merge} also goes up. As a result, R_{merge} is only really useful when comparing data which has been accumulated and treated the same. This will be discussed again later.
In short, I/σ is the preferred way of assessing the quality of diffraction data because it derives its validity from the χ^{2} (likelihood) analysis. Unless all of the explicit and implicit assumptions which have been made in the calculation of an R_{merge} are known, this criterion is less meaningful. This is particularly true when searching for a number that can be used by others to critically evaluate your work.
There are two modes of analysis of data using χ^{2}s. The first mode keeps the χ^{2} (or more precisely, the goodness-of-fit) constant and compares the error models. This means that you are adjusting your estimates of the errors associated with the measurement until the deviations within observations agree with the expectation based on the error model.
The second mode keeps the error model constant and compares χ^{2}s. This mode is computationally much faster and is used in refinement procedures. Of the two modes, the first is more informative, because it forces you to consider changes in the error model. Which mode you use generally depends on what you are comparing. When assessing the general quality of your detector, the first mode is used. When comparing a derivative to a native, the second mode is used due to an incomplete error model which does not take into account important factors like nonisomorphism. Thus, the χ^{2} of scaling between a native and a derivative provides a measure of nonisomorphism, assuming the detector error is accurately modeled for both samples.
R_{merge} was historically used as an estimate of the nonisomorphism of data collected on film using several different crystals, and for this purpose, it still has some validity because we do not account for nonisomorphism in our error model. It is not so important now that complete Xray data sets are collected from single, frozen crystals.
One of the drawbacks of using R_{merge} as a measure of the quality of a data set is that it can be intentionally and unintentionally manipulated. Unintentional factors which can artificially lower R_{merge} generally have the effect of reducing the redundancy of the data or eliminating weaker observations. In crystallography, the greater the redundancy of the data, the worse the R_{merge}, because it is the correlation between I and 〈I〉 that artificially reduces R_{merge}, and the greater the redundancy, the lower this correlation. For two measurements with the same σ, the correlation is 50%, so R_{merge} is underestimated by a factor of √2 compared to the case of no correlation. Known unintentional factors which lower R_{merge} include the following:
1. Data collected so that lower resolution shells, where the data is strong, have a higher redundancy than the higher resolution shells, where the data is generally weaker. This can be accomplished by collecting data on detectors where 2θ ≠ 0, or including data from the corners of rectangular or square image plates. There is nothing wrong with using this data; it will just artificially lower the R_{merge}.
2. Inclusion of single measurements in the calculation of R_{merge} in one widely used program, which is why a table using this erroneous calculation used to be presented in the Scalepack output. Although the bug in the program was unintentional, it nonetheless reduced the R_{merge}, and this may have accounted for the parameter's longevity. A second, more subtle bug that reduced R_{merge} prompted the introduction of the keyword background correction. Fortunately, both bugs have now been fixed, but the point is that errors of this type can persist.
3. The omission of negative or weak reflections from the calculation of R_{merge}. This is often undocumented behavior of crystallographic data scaling/merging software. Examples include:
a) elimination from the R_{merge} calculation of reflections that have negative intensities;
b) conversion of I < 0 to I = 0 before the calculation of R_{merge}, with the reflection kept in the data set (statistics of this type are included in the Scalepack output for comparison; this is the first R_{merge} table in the log file, not the final one);
c) omission of reflections with 〈I〉 < 0 from the calculation of R_{merge} while these reflections are still included in the output data set;
d) default σ cutoffs set to unreasonable values, like 1. This is, in fact, the default of the software commonly used to process image plate data.
4. Use of unreasonable default or recommended rejection criteria in other programs. These eliminate individual Is which should contribute to R_{merge} and yet are still statistically sound measurements.
5. Use of the eigenvalue filter to determine the overall B factor of a data set collected on a nonfrozen, decaying crystal. In this case, the eigenvalue filter will calculate an overall B factor, which is appropriate for the middle of the data set, yet apply this to all data. As a result, the high-resolution data will be downweighted compared to data processed with the first, least decayed frame as the reference. The high-resolution data is generally weaker than the low-resolution data, and as a result, is more likely to result in a higher R_{merge}. By downweighting the high-resolution data, the R_{merge} is artificially lowered. Any program which does not allow the option of setting the reference frame will have this problem. Of course, there is no problem with nondecaying crystals.
There are also intentional ways of lowering your R_{merge}. Like those ways listed above, they generally result from the statistically invalid elimination of weak reflections, reduction of the redundancy of the data, or deemphasis of weak data. The difference between these methods and those listed above is that they are generally under the control of the user.
1. Use of an unreasonable sigma cutoff (e.g., ≥ 0). The rejection of weak data will always improve R_{merge} (a short numerical illustration appears after this discussion). There is a further discussion of the sigma cutoff in the Scalepack Keywords Descriptions section.
2. Use of a resolution limit cutoff. Again, the omission of weak data will improve R_{merge}. A reasonable resolution cutoff is the zone where I/σ < 2.
3. Combining into a single zone, for the purposes of calculation, those resolution shells where R_{merge} is rapidly changing. In this case, the shell will be dominated by the strong data at the low-resolution end of the zone and give the impression that the high-resolution limit of the zone has better statistics than it really does. For example, if you combined all of your data into a single zone, the R_{merge} in the final shell would appear quite good (equal to the overall R_{merge}), when in fact it is substantially worse. As described above, it is more sensible to divide the data into at least 10 (preferably more) zones of equal volume.
4. Omitting partially recorded reflections. This has the effect of a) reducing the redundancy, and b) eliminating poorer reflections. Partially recorded reflections will always have a higher σ associated with them because they have a higher total background, due to the inclusion of background from more than one frame in the reflection.
5. Scaling I+ and I− reflections separately in the absence of a legitimate anomalous signal (scale anomalous). This has the effect of reducing the redundancy.
6. Ignoring overloaded reflections using the ignore overloads option in Scalepack. The intensity of overloaded, or saturated, reflections cannot be directly measured because some of the pixels are saturated. Profile fitting only measures these reflections indirectly, by fitting a calculated profile to the spot using the information contained in the wings or tail of the spot. Ignoring the inaccuracies inherent in this method by ignoring overloads may have a dramatic effect on R_{merge}. Note that in the case of molecular replacement the option include overloads should be used.
The ignore overloads option is often a useful tool. For example, when calculating anomalous differences, you do not want to use overloaded reflections because you are looking for very, very small differences and want to use only the most accurate data. Another time you might ignore overloads is when you collect multipass data. In this case, a crystal is exposed twice, once for a short time and once for a longer time. The longer exposure is needed to sufficiently darken the high-resolution reflections, but it will result in saturated low-resolution reflections. Since the low-resolution reflections can be obtained from the short exposures, the overloaded ones can be ignored in the long exposures.
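The following short simulation illustrates the sigma-cutoff point (item 1 of the list above) with synthetic numbers: discarding the weak observations lowers R_{merge} even though the underlying data are unchanged. It is an illustration only; the intensity distribution and error level are invented.

import numpy as np

rng = np.random.default_rng(0)
true_i = rng.exponential(50.0, size=2000)                 # true intensities, many of them weak
obs = np.stack([true_i + rng.normal(0.0, 10.0, 2000) for _ in range(2)])   # two measurements each

def r_merge(pairs):
    """R_merge = sum |I - <I>| / sum I over all observations."""
    mean_i = pairs.mean(axis=0)
    return float(np.abs(pairs - mean_i).sum() / pairs.sum())

keep = obs.mean(axis=0) > 2.0 * 10.0                      # a crude "I/sigma > 2" style cutoff
print(round(r_merge(obs), 3), round(r_merge(obs[:, keep]), 3))   # the cutoff value is lower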
A statistically sensible cutoff for the resolution of a diffraction data set is that shell where the average I/σ is 2 after correctly integrating and scaling the data. The resolution of the data is a distinct criterion from its completeness. The best data will be 100% complete. Completeness may suffer due to anisotropic diffraction, overlapped reflections, or geometrical constraints (data from the corners of the detector was used, beam stop or cooling nozzle shadows, etc.). In order to properly estimate the resolution of the data, you have to take into account the I/σ, R_{sym} and the completeness of the data.