FreeSAS Package¶
This is the public API of FreeSAS. Since most of the number crunching part is implemented in Cython, it is not documented here: please refer to the source code.
freesas.align¶
-
class
InputModels
[source]¶ Bases:
object
-
assign_models
(molecule=None)[source]¶ Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.
- Parameters
molecule – optional 2d array, coordinates of the atoms for the model to create
- Return self.models
list of SASModel
-
rcalculation
()[source]¶ Calculation the maximal value for the R-factors, which is the mean of all the R-factors of inputs plus 2 times the standard deviation. R-factors are saved in the attribute self.rfactors, 1d array, and in percentage.
- Return rmax
maximal value for the R-factor
-
models_selection
()[source]¶ Check if each model respect the limit for the R-factor
- Return self.validmodels
1d array, 0 for a non valid model, else 1
-
rfactorplot
(filename=None, save=False)[source]¶ Create a png file with the table of R factor for each model. A threshold is computed to discarded models with Rfactor>Rmax.
- Parameters
filename – filename for the figure, default to Rfactor.png
save – save automatically the figure if True, else show it
- Return fig
the wanted figures
-
-
class
AlignModels
(files, slow=True, enantiomorphs=True)[source]¶ Bases:
object
Used to align DAM from pdb files
-
assign_models
()[source]¶ Create SASModels from pdb files saved in self.inputfiles and saved them in self.models. Center of mass, inertia tensor and canonical parameters are computed for each SASModel.
- Return self.models
list of SASModel
-
optimize
(reference, molecule, symmetry)[source]¶ Use scipy.optimize to optimize transformation parameters to minimize NSD
- Parameters
reference – SASmodel
molecule – SASmodel
symmetry – 3-list of +/-1
- Return p
transformation parameters optimized
- Return dist
NSD after optimization
-
alignment_sym
(reference, molecule)[source]¶ Apply 8 combinations to the molecule and select the one which minimize the distance between it and the reference.
- Parameters
reference – SASModel, the one which do not move
molecule – SASModel, the one wich has to be aligned
- Return combinaison
best symmetry to minimize NSD
- Return p
transformation parameters optimized if slow is true, unoptimized else
-
makeNSDarray
()[source]¶ Calculate the NSD correlation table and save it in self.arrayNSD
- Return self.arrayNSD
2d array, NSD correlation table
-
plotNSDarray
(rmax=None, filename=None, save=False)[source]¶ Create a png file with the table of NSD and the average NSD for each model. A threshold is computed to segregate good models and the ones to exclude.
- Parameters
rmax – threshold of R factor for the validity of a model
filename – filename for the figure, default to nsd.png
save – save automatically the figure if True, else show it
- Return fig
the wanted figures
-
find_reference
()[source]¶ Find the reference model among the models aligned. The reference model is the one with lower average NSD with other models.
- Return ref_number
position of the reference model in the list self.models
-
freesas.autorg¶
Functions for calculating the radius of gyration and forward scattering intensity.
-
auto_gpa
(data, Rg_min=1.0, qRg_max=1.3, qRg_min=0.5)[source]¶ Uses the GPA theory to guess quickly Rg, the radius of gyration and I0, the forwards scattering
The theory is described in Guinier peak analysis for visual and automated inspection of small-angle X-ray scattering data Christopher D. Putnam J. Appl. Cryst. (2016). 49, 1412–1419
This fits sqrt(q²Rg²)*exp(-q²Rg²/3)*I0/Rg to the curve I*q = f(q²)
The Guinier region goes arbitrary from 0.5 to 1.3 q·Rg qRg_min and qRg_max can be provided
- Parameters
data – the raw data read from disc. Only q and I are used.
Rg_min – the minimal accpetable value for the radius of gyration
qRg_max – the default upper bound for the Guinier region.
qRg_min – the default lower bound for the Guinier region.
- Returns
autRg result with limited information
-
auto_guinier
(data, Rg_min=1.0, qRg_max=1.3, relax=1.2)[source]¶ Yet another implementation of the Guinier fit
The idea: * extract the reasonable range * convert to the Guinier space (ln(I) = f(q²) * scan all possible intervall * keep any with qRg_max<1.3 (or 1.5 in relaxed mode) * select the begining and the end of the guinier region according to the contribution of two parameters:
(q_max·Rg - q_min·Rg)/qRg_max –> in favor of large ranges
1 / RMSD –> in favor of good quality data
For each start and end point, the contribution of all ranges are averaged out (using histograms) The best solution is the start/end position with the maximum average.
All ranges within this region are averaged out to measure Rg, I0 and more importantly their deviation.
The quality is still to be calculated
Aggergation is assessed according a second order polynom fit.
- Parameters
data – 2D array with (q,I,err)
Rg_min – minimum value for Rg
qRg_max – upper bound of the Guinier region
relax – relaxation factor for the upper bound
resolution – step size of the slope histogram
- Returns
autRg result
freesas.average¶
-
class
Grid
(inputfiles)[source]¶ Bases:
object
This class is used to create a grid which include all the input models
-
spatial_extent
()[source]¶ Calculate the maximal extent of input models
- Return self.size
6-list with x,y,z max and then x,y,z min
-
-
class
AverModels
(inputfiles, grid)[source]¶ Bases:
object
Provides tools to create an averaged models using several aligned dummy atom models
-
read_files
(reference=None)[source]¶ Read all the pdb file in the inputfiles list, creating SASModels. The SASModels created are save in a list, the reference model is the first model in the list.
- Parameters
reference – position of the reference model file in the inputfiles list
-
calc_occupancy
(griddot)[source]¶ Assign an occupancy and a contribution factor to the point of the grid.
- Parameters
griddot – 1d-array, coordinates of a point of the grid
- Return tuple
2-tuple containing (occupancy, contribution)
-
assign_occupancy
()[source]¶ For each point of the grid, total occupancy and contribution factor are computed and saved. The grid is then ordered with decreasing value of occupancy. The fourth column of the array correspond to the occupancy of the point and the fifth to the contribution for this point.
- Return sortedgrid
2d-array, coordinates of each point of the grid
-
freesas.bift¶
Bayesian Inverse Fourier Transform
This code is the implementation of Steen Hansen J. Appl. Cryst. (2000). 33, 1415-1421
Based on the BIFT from Jesse Hopkins, available at: https://sourceforge.net/p/bioxtasraw/git/ci/master/tree/bioxtasraw/BIFT.py
Many thanks to Pierre Paleo for the auto-alpha guess
-
auto_bift
(data, Dmax=None, alpha=None, npt=100, start_point=None, end_point=None, scan_size=11, Dmax_over_Rg=3)[source]¶ Calculates the inverse Fourier tranform of the data using an optimisation of the evidence
- Parameters
data – 2D array with q, I(q), δI(q). q can be in 1/nm or 1/A, it imposes the unit for r & Dmax
Dmax – Maximum diameter of the object, this is the starting point to be refined. Can be guessed
alpha – Regularisation parameter, let it to None for automatic scan
npt – Number of point for the curve p(r)
start_point – First useable point in the I(q) curve, this is not the start of the Guinier region
end_point – Last useable point in the I(q) curve
scan_size – size of the initial geometrical scan for alpha values.
Dmax_over_Rg – In average, protein’s Dmax is 3x Rg, use this to adjust
- Returns
BIFT object. Call the get_best to retrieve the optimal solution
freesas.collections¶
Set of namedtuples defined a bit everywhere
-
class
RG_RESULT
(Rg, sigma_Rg, I0, sigma_I0, start_point, end_point, quality, aggregated)¶ Bases:
tuple
-
I0
¶ Alias for field number 2
-
Rg
¶ Alias for field number 0
-
aggregated
¶ Alias for field number 7
-
end_point
¶ Alias for field number 5
-
quality
¶ Alias for field number 6
-
sigma_I0
¶ Alias for field number 3
-
sigma_Rg
¶ Alias for field number 1
-
start_point
¶ Alias for field number 4
-
-
class
FIT_RESULT
(slope, sigma_slope, intercept, sigma_intercept, R, R2, chi2, RMSD)¶ Bases:
tuple
-
R
¶ Alias for field number 4
-
R2
¶ Alias for field number 5
-
RMSD
¶ Alias for field number 7
-
chi2
¶ Alias for field number 6
-
intercept
¶ Alias for field number 2
-
sigma_intercept
¶ Alias for field number 3
-
sigma_slope
¶ Alias for field number 1
-
slope
¶ Alias for field number 0
-
-
class
RT_RESULT
(Vc, sigma_Vc, Qr, sigma_Qr, mass, sigma_mass)¶ Bases:
tuple
-
Qr
¶ Alias for field number 2
-
Vc
¶ Alias for field number 0
-
mass
¶ Alias for field number 4
-
sigma_Qr
¶ Alias for field number 3
-
sigma_Vc
¶ Alias for field number 1
-
sigma_mass
¶ Alias for field number 5
-
-
class
RadiusKey
(Dmax, npt)¶ Bases:
tuple
-
Dmax
¶ Alias for field number 0
-
npt
¶ Alias for field number 1
-
-
class
PriorKey
(type, npt)¶ Bases:
tuple
-
npt
¶ Alias for field number 1
-
type
¶ Alias for field number 0
-
-
class
TransfoValue
(transfo, B, sum_dia)¶ Bases:
tuple
-
B
¶ Alias for field number 1
-
sum_dia
¶ Alias for field number 2
-
transfo
¶ Alias for field number 0
-
-
class
EvidenceKey
(Dmax, alpha, npt)¶ Bases:
tuple
-
Dmax
¶ Alias for field number 0
-
alpha
¶ Alias for field number 1
-
npt
¶ Alias for field number 2
-
-
class
EvidenceResult
(evidence, chi2r, regularization, radius, density, converged)¶ Bases:
tuple
-
chi2r
¶ Alias for field number 1
-
converged
¶ Alias for field number 5
-
density
¶ Alias for field number 4
-
evidence
¶ Alias for field number 0
-
radius
¶ Alias for field number 3
-
regularization
¶ Alias for field number 2
-
-
class
StatsResult
(radius, density_avg, density_std, evidence_avg, evidence_std, Dmax_avg, Dmax_std, alpha_avg, alpha_std, chi2r_avg, chi2r_std, regularization_avg, regularization_std, Rg_avg, Rg_std, I0_avg, I0_std)¶ Bases:
tuple
-
Dmax_avg
¶ Alias for field number 5
-
Dmax_std
¶ Alias for field number 6
-
I0_avg
¶ Alias for field number 15
-
I0_std
¶ Alias for field number 16
-
Rg_avg
¶ Alias for field number 13
-
Rg_std
¶ Alias for field number 14
-
alpha_avg
¶ Alias for field number 7
-
alpha_std
¶ Alias for field number 8
-
chi2r_avg
¶ Alias for field number 9
-
chi2r_std
¶ Alias for field number 10
-
density_avg
¶ Alias for field number 1
-
density_std
¶ Alias for field number 2
-
evidence_avg
¶ Alias for field number 3
-
evidence_std
¶ Alias for field number 4
-
radius
¶ Alias for field number 0
-
regularization_avg
¶ Alias for field number 11
-
regularization_std
¶ Alias for field number 12
-
save
(filename, source=None)¶ Save the results of the fit to the file
-
freesas.cormap¶
-
class
LongestRunOfHeads
[source]¶ Bases:
object
Implements the “longest run of heads” by Mark F. Schilling The College Mathematics Journal, Vol. 21, No. 3, (1990), pp. 196-207
See: http://www.maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020742.02p0021g.pdf
-
A
(n, c)[source]¶ Calculate A(number_of_toss, length_of_longest_run)
- Parameters
n – number of coin toss in the experiment, an integer
c – length of the longest run of
- Returns
The A parameter used in the formula
-
B
(n, c)[source]¶ Calculate B(number_of_toss, length_of_longest_run) to have either a run of Heads either a run of Tails
- Parameters
n – number of coin toss in the experiment, an integer
c – length of the longest run of
- Returns
The B parameter used in the formula
-
probaHeadOrTail
(n, c)[source]¶ Calculate the probability of a longest run of head or tails to occur
- Parameters
n – number of coin toss in the experiment, an integer
c – length of the longest run of heads or tails, an integer
- Returns
The probablility of having c subsequent heads or tails in a n toss of fair coin
-
probaLongerRun
(n, c)[source]¶ Calculate the probability for the longest run of heads or tails to exceed the observed length
- Parameters
n – number of coin toss in the experiment, an integer
c – length of thee observed run of heads or tails, an integer
- Returns
The probablility of having more than c subsequent heads or tails in a n toss of fair coin
-
-
gof
(data1, data2)[source]¶ Calculate the probability for a couple of dataset to be equivalent
Implementation according to: http://www.nature.com/nmeth/journal/v12/n5/full/nmeth.3358.html
- Parameters
data1 – numpy array
data2 – numpy array
- Returns
probablility for the 2 data to be equivalent
freesas.invariants¶
This module is mainly about the calculation of the Rambo-Tainer invariant described in:
https://dx.doi.org/10.1038%2Fnature12070
Some formula taken from Putnam et al, 2007, Table 1 in the review
-
extrapolate
(data, guinier)[source]¶ Extrapolate SAS data according to the Guinier fit until q=0 Uncertainties are extrapolated (linearly) from the Guinier region
- Parameters
data – SAS data in q,I,dI format
guinier – result of a Guinier fit
- Returns
extrapolated SAS data
-
calc_Porod
(data, guinier)[source]¶ Calculate the particle volume according to Porod’s formula:
V = 2*π²I₀²/(sum_q I(q)q² dq)
Formula from Putnam’s review, 2007, table 1 Intensities are extrapolated to q=0 using Guinier fit.
- Parameters
data – SAS data in q,I,dI format
Guinier – result of a Guinier fit (instance of RT_RESULT)
- Returns
Volume calculated according to Porrod’s formula
-
calc_Vc
(data, Rg, dRg, I0, dI0, imin)[source]¶ Calculates the Rambo-Tainer invariant Vc, including extrapolation to q=0
- Parameters
data – SAS data in q,I,dI format, cropped to maximal q that should be used for calculation (normally 2 nm-1)
Rg,dRg,I0,dI0 – results from Guinier approximation/autorg
imin – minimal index of the Guinier range, below that index data will be extrapolated by the Guinier approximation
- Returns
Vc and an error estimate based on non-correlated error propagation
-
calc_Rambo_Tainer
(data, guinier, qmax=2.0)[source]¶ calculates the invariants Vc and Qr from the Rambo & Tainer 2013 Paper, also the the mass estimate based on Qr for proteins
- Parameters
data – data in q,I,dI format, q in nm^-1
guinier – RG_RESULT instance with result from the Guinier fit
qmax – maximum q-value for the calculation in nm^-1
@return: dict with Vc, Qr and mass plus errors
freesas.model¶
-
delta_expand
(vec1, vec2)[source]¶ Create a 2d array with the difference vec1[i]-vec2[j]
- Parameters
vec2 (vec1,) – 1d-array
- Return v1 - v2
difference for any element of v1 and v2 (i.e a 2D array)
-
class
SASModel
(molecule=None)[source]¶ Bases:
object
Tools for Dummy Atoms Model manipulation
-
read
(filename)[source]¶ Read the PDB file, extract coordinates of each dummy atom, extract the R-factor of the model, coordinates of each dummy atom and pdb file header.
- Parameters
filename – name of the pdb file to read
-
save
(filename)[source]¶ Save the position of each dummy atom in a PDB file.
- Parameters
filename – name of the pdb file to write
-
centroid
()[source]¶ Calculate the position of the center of mass of the molecule.
- Return self.com
1d array, coordinates of the center of mass of the molecule
-
inertiatensor
()[source]¶ calculate the inertia tensor of the protein
- Return self.inertensor
inertia tensor of the molecule
-
canonical_translate
()[source]¶ Calculate the translation matrix to translate the center of mass of the molecule on the origin of the base.
- Return trans
translation matrix
-
canonical_rotate
()[source]¶ Calculate the rotation matrix to align inertia momentum of the molecule on principal axis.
- Return rot
rotation matrix det==1
-
canonical_parameters
()[source]¶ Save the 6 canonical parameters of the initial molecule: x0, y0, z0, the position of the center of mass phi, theta, psi, the three Euler angles of the canonical rotation (axis:x,y’,z’’)
-
calc_invariants
(use_cython=True)[source]¶ - Calculate the invariants of the structure:
fineness, ie. average distance between an atoms and its nearest neighbor
radius of gyration of the model
diameter of the model
- Return invariants
3-tuple containing (fineness, Rg, Dmax)
-
property
fineness
¶
-
property
Rg
¶
-
property
Dmax
¶
-
dist
(other, molecule1, molecule2, use_cython=True)[source]¶ Calculate the distance with another model.
- Parameters
self,other – two SASModel
molecule1 – 2d array of the position of each atom of the first molecule
molecule2 – 2d array of the position of each atom of the second molecule
- Return D
NSD between the 2 molecules, in their position molecule1 and molecule2
-
transform
(param, symmetry, reverse=None)[source]¶ Calculate the new coordinates of each dummy atoms of the molecule after a transformation defined by six parameters and a symmetry
- Parameters
param – 6 parameters of transformation (3 coordinates of translation, 3 Euler angles)
symmetry – list of three constants which define a symmetry to apply
- Return mol
2d array, coordinates after transformation
-
dist_after_movement
(param, other, symmetry)[source]¶ The first molecule, molref, is put on its canonical position. The second one, mol2, is moved following the transformation selected
- Parameters
param – list of 6 parameters for the transformation, 3 coordinates of translation and 3 Euler angles
symmetry – list of three constants which define a symmetry to apply
- Return distance
the NSD between the first molecule and the second one after its movement
-
freesas.plot¶
Functions to generating graphs related to SAS.
-
scatter_plot
(data, guinier=None, ift=None, filename=None, img_format='svg', unit='nm', title='Scattering curve', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a scattering plot I = f(q) in semi_log_y.
- Parameters
data – data read from an ASCII file, 3 column (q,I,err)
filename – name of the file where the cuve should be saved
img_format – image format
unit – Unit name for Rg and 1/q
guinier – output of autoRg
ift – converged instance of BIFT (output of auto_bift)
ax – subplot where the plot shall go in
- Returns
the matplotlib figure
-
kratky_plot
(data, guinier, filename=None, img_format='svg', unit='nm', title='Dimensionless Kratky plot', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a Kratky plot q²Rg²I/I₀ = f(q·Rg)
- Parameters
data – data read from an ASCII file, 3 column (q,I,err)
guinier – output of autoRg
filename – name of the file where the cuve should be saved
img_format – image format
unit – Unit name for Rg and 1/q
ax – subplot where the plot shall go in
- Returns
the matplotlib figure
-
guinier_plot
(data, guinier, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a guinier plot: ln(I) = f(q²)
- Parameters
data – data read from an ASCII file, 3 column (q,I,err)
guinier – A RG_RESULT object from AutoRg
filename – name of the file where the cuve should be saved
img_format – image format
- Param
ax: subplot where to plot in
- Returns
the matplotlib figure
-
density_plot
(ift, filename=None, img_format='png', unit='nm', ax=None, labelsize=None, fontsize=None)[source]¶ Generate a density plot p(r)
- Parameters
ift – An IFT result comming out of BIFT
filename – name of the file where the cuve should be saved
img_format – image image format
ax – subplotib where to plot in
- Returns
the matplotlib figure
-
hplc_plot
(hplc, fractions=None, title='Chromatogram', filename=None, img_format='png', ax=None, labelsize=None, fontsize=None)[source]¶ Generate an HPLC plot I=f(t)
- Parameters
hplc – stack of diffraction data
fractions – list of 2tuple with first and last ndex if each fraction
filename – name of the file where the cuve should be saved
img_format – image image format
ax – subplotib where to plot in
- Returns
the matplotlib figure
freesas.sasio¶
Contains helper functions for loading SAS data from differents sources.
-
load_scattering_data
(filename: Union[os.PathLike, str, bytes, _io.StringIO, _io.BytesIO]) → numpy.ndarray[source]¶ Load scattering data q, I, err into a numpy array.
- Parameters
filename – ASCII file, 3 column (q,I,err)
- Returns
numpy array with 3 column (q,I,err)
-
parse_ascii_data
(input_file_text: List[str], number_of_columns: int) → numpy.ndarray[source]¶ Parse data from an ascii file into an N column numpy array
- Parameters
input_file_text – List containing one line of input data per element
number_of_columns – Expected number of lines in the data file
- Returns
numpy array with 3 column (q,I,err)