Miscellaneous
This is a collection of various additional features, including time series manipulation, plotting and reporting.
Time series
- sfctools.misc.timeseries.convert_numeric(df)
converts all columns of a dataframe to numeric values
- Parameters
df – a pandas dataframe
- Returns
converted dataframe
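A minimal sketch of what such a conversion could look like, assuming that convert_numeric applies pandas' `pd.to_numeric` to every column and that entries which cannot be parsed should become NaN (`errors="coerce"`). The helper name `convert_numeric_sketch` is hypothetical; this is not the sfctools source.

```python
import pandas as pd


def convert_numeric_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: convert every column to a numeric dtype.

    Assumption: unparseable entries are turned into NaN.
    """
    # DataFrame.apply forwards extra keyword arguments to the function
    return df.apply(pd.to_numeric, errors="coerce")


raw = pd.DataFrame({"gdp": ["1.5", "2.0", "n/a"], "year": ["2019", "2020", "2021"]})
converted = convert_numeric_sketch(raw)
print(converted.dtypes)
```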
- sfctools.misc.timeseries.convert_quarterly_to_datetime(df)
converts a dataframe with a string index of the form ‘2020-Q2’ (typical Eurostat format) to a dataframe with a datetime index
- Parameters
df – a pandas dataframe
- Returns
df_converted, a pandas dataframe with datetime-formatted index
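One way to perform this conversion with plain pandas is to parse the labels as quarterly periods and map each period to the timestamp at the start of its quarter. This is a sketch under that assumption (sfctools may place the timestamp elsewhere in the quarter); `quarterly_index_to_datetime` is a hypothetical name.

```python
import pandas as pd


def quarterly_index_to_datetime(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn an index like '2020-Q2' into a DatetimeIndex."""
    # pandas parses quarter strings such as '2020-Q2' as quarterly periods
    periods = pd.PeriodIndex(df.index, freq="Q")
    out = df.copy()
    # Assumption: each quarter is represented by its first day
    out.index = periods.to_timestamp(how="start")
    return out


df = pd.DataFrame({"value": [1.0, 2.0]}, index=["2020-Q1", "2020-Q2"])
result = quarterly_index_to_datetime(df)
print(result.index)
```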
- sfctools.misc.timeseries.cross_correlate(x1, x2, lags=15)
computes the cross-correlation of x1 and x2 using plt.xcorr and returns a dataframe with the xcorr data
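To make the computation concrete, here is a NumPy-only sketch of the normalised cross-correlation that `plt.xcorr` computes with its default `normed=True` and no detrending: the full correlation from `np.correlate`, divided by `sqrt(dot(x1, x1) * dot(x2, x2))`, cut to lags `-lags..lags`. The function name and the column names `lag` and `xcorr` are hypothetical; the dataframe returned by sfctools may be laid out differently.

```python
import numpy as np
import pandas as pd


def cross_correlate_sketch(x1, x2, lags=15):
    """Hypothetical re-implementation of a normalised cross-correlation."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = len(x1)
    if len(x2) != n:
        raise ValueError("x1 and x2 must have the same length")
    if not 1 <= lags < n:
        raise ValueError("lags must satisfy 1 <= lags < len(x1)")

    # Full correlation has length 2n - 1; lag 0 sits at index n - 1
    full = np.correlate(x1, x2, mode="full")
    full = full / np.sqrt(np.dot(x1, x1) * np.dot(x2, x2))

    lag_values = np.arange(-lags, lags + 1)
    values = full[n - 1 - lags : n + lags]
    return pd.DataFrame({"lag": lag_values, "xcorr": values})


x = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0]
res = cross_correlate_sketch(x, x, lags=3)
print(res)
```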
- sfctools.misc.timeseries.cross_correlate_plot(x1, x2, normalize=True, ax=None, show_plot=True, ylabel='Cross-Correlation', lags=15, color='black')
generates a cross-correlation plot of x1 and x2
- Parameters
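lags – maximum number of lags to show
ylabel – y axis label (default ‘Cross-Correlation’)
normalize – whether to normalize the cross-correlation (default True)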
color – color of the plot
- sfctools.misc.timeseries.difference(x0)
computes the difference of a pandas time series x0, i.e. [x1-x0]
- sfctools.misc.timeseries.interpolate(y, stretch_factor, kind='cubic')
interpolates an array using scipy interp1d.
x is datetime, y is float or otherwise numerical. stretch_factor is the factor by which the resolution is increased, e.g. stretch_factor = 3 to go from quarterly to monthly data.
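A sketch of the stretching idea with scipy's `interp1d`, assuming equally spaced samples and that stretching by a factor k inserts k - 1 new points between each pair of neighbours (so n points become (n - 1) * k + 1). Whether sfctools uses exactly this grid is an assumption, and `interpolate_sketch` is a hypothetical name.

```python
import numpy as np
from scipy.interpolate import interp1d


def interpolate_sketch(y, stretch_factor, kind="cubic"):
    """Hypothetical helper: refine an equally spaced series by stretch_factor."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y))  # assumed equally spaced sample positions
    # (n - 1) * factor intervals between the first and the last sample
    x_new = np.linspace(0, len(y) - 1, (len(y) - 1) * stretch_factor + 1)
    return interp1d(x, y, kind=kind)(x_new)


quarterly = [100.0, 102.0, 101.0, 105.0]  # toy values; cubic needs at least 4 points
monthly = interpolate_sketch(quarterly, 3)
print(len(monthly))  # 10
```

Every third value of the result coincides with an original observation, since the interpolant passes through the input points.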
- sfctools.misc.timeseries.log_difference(x0)
computes the difference of log[x0] of a pandas time series x0, i.e. log[x1/x0]
- sfctools.misc.timeseries.percentage_change(x0)
computes the percentage change of a pandas time series x0
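For orientation, the three transformations above correspond to standard pandas operations: `diff()` for the difference, `np.log(...).diff()` for the log difference and `pct_change()` for the relative change. This is a sketch with toy values, not the sfctools code; in particular, whether sfctools drops the leading NaN or scales the percentage change by 100 is not stated here (pandas' `pct_change` returns a fraction).

```python
import numpy as np
import pandas as pd

# Toy series growing by 10% per step
s = pd.Series([100.0, 110.0, 121.0], index=pd.date_range("2020-01-01", periods=3, freq="D"))

difference = s.diff().dropna()                # x1 - x0
log_difference = np.log(s).diff().dropna()    # log(x1 / x0)
percentage_change = s.pct_change().dropna()   # (x1 - x0) / x0, as a fraction

print(difference.tolist(), log_difference.tolist(), percentage_change.tolist())
```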
- sfctools.misc.timeseries.stretch_datetime(x, factor)
stretches a series or array containing datetime objects by a certain factor
- Parameters
x – an array
factor – int, factor > 1 by which to stretch the data
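A sketch of how datetime values can be stretched: convert them to nanoseconds since the epoch, interpolate linearly on a finer grid and convert back. It assumes the same grid convention as above ((n - 1) * factor + 1 points); `stretch_datetime_sketch` is a hypothetical name, not the sfctools implementation.

```python
import numpy as np
import pandas as pd


def stretch_datetime_sketch(x, factor):
    """Hypothetical helper: insert factor - 1 evenly spaced datetimes between neighbours."""
    # Nanoseconds since the epoch as integers
    stamps = np.asarray(pd.to_datetime(x), dtype="datetime64[ns]").astype(np.int64)
    positions = np.arange(len(stamps))
    new_positions = np.linspace(0, len(stamps) - 1, (len(stamps) - 1) * factor + 1)
    # Float interpolation may introduce sub-microsecond rounding for large timestamps
    new_stamps = np.interp(new_positions, positions, stamps.astype(float))
    return pd.to_datetime(np.round(new_stamps).astype(np.int64))


result = stretch_datetime_sketch(["2020-01-01", "2020-01-03"], 2)
print(result)
```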
- sfctools.misc.timeseries.stretch_pandas(s, factor, kind='cubic')
stretches a pandas dataframe or series by a certain factor
- Parameters
s – pandas dataframe or series
factor – factor by which you want to stretch
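A pandas-only sketch of stretching a series with a datetime index: build a finer, evenly spaced grid, merge it with the original index so that the known values are kept, interpolate in time and keep only the new grid. It swaps in pandas' time-weighted linear interpolation (`method="time"`) for the cubic default documented above, and `stretch_pandas_sketch` is a hypothetical name.

```python
import pandas as pd


def stretch_pandas_sketch(s, factor):
    """Hypothetical helper: refine a datetime-indexed series or dataframe."""
    new_index = pd.date_range(s.index[0], s.index[-1], periods=(len(s) - 1) * factor + 1)
    # Union keeps the original observations available to the interpolation
    combined = s.reindex(s.index.union(new_index))
    return combined.interpolate(method="time").reindex(new_index)


# Unevenly spaced toy observations: value equals days since 2020-01-01
s = pd.Series([0.0, 1.0, 4.0], index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-05"]))
stretched = stretch_pandas_sketch(s, 2)
print(stretched)
```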
- sfctools.misc.timeseries.stretch_to_length(df, new_length, method='cubic')
stretches a dataframe with datetime index to a new, higher length (by interpolation of the values).
- Parameters
df – a pandas data frame with datetime-formatted index
new_length – new desired length
method – interpolation style (default ‘cubic’)
- Returns
df_new, the stretched dataframe
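A sketch of stretching a dataframe to a given length: place `new_length` evenly spaced timestamps between the first and last index entry and interpolate every column against time with `np.interp`. Linear interpolation is used here instead of the documented cubic default, and `stretch_to_length_sketch` is a hypothetical name; it illustrates the idea, not the sfctools code.

```python
import numpy as np
import pandas as pd


def stretch_to_length_sketch(df, new_length):
    """Hypothetical helper: resample a datetime-indexed dataframe to new_length rows."""
    if new_length < len(df):
        raise ValueError("new_length must not be smaller than the current length")
    new_index = pd.date_range(df.index[0], df.index[-1], periods=new_length)
    # Interpolate against nanosecond timestamps so uneven spacing is respected
    old_t = np.asarray(df.index, dtype="datetime64[ns]").astype(np.int64).astype(float)
    new_t = np.asarray(new_index, dtype="datetime64[ns]").astype(np.int64).astype(float)
    columns = {c: np.interp(new_t, old_t, df[c].to_numpy(dtype=float)) for c in df.columns}
    return pd.DataFrame(columns, index=new_index)


df = pd.DataFrame({"a": [0.0, 2.0], "b": [10.0, 30.0]},
                  index=pd.to_datetime(["2020-01-01", "2020-01-03"]))
df_new = stretch_to_length_sketch(df, 3)
print(df_new)
```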
Plotting
- sfctools.misc.mpl_plotting.matplotlib_barplot(data, xlabel, ylabel, title, color='indigo', hatches=None, size=(6, 5), tight_layout=True, fmt='{0:f}', stacked=False, show_labels=True, legend='best', show=True, yerr=None, ax=None)
Creates a bar plot in matplotlib. The ‘macro’ plotting style will be used if it is set up.
- Parameters
xlabel – x axis label
ylabel – y axis label
title – plot title
color – plot color
hatches – hatch pattern
size – figure size
tight_layout – re-format plot
fmt – number format, {0:d} or {0:f} for example
stacked – stack the columns (True) or show them side by side (False)
legend – location of legend, or ‘off’
show – if False, only the figure is returned. Otherwise, the plot is shown
ax – axis to plot on
yerr – data for error bar (if any)
show_labels – show the column labels at the bottom (boolean)
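The options above map onto standard matplotlib and pandas features. The following sketch shows a stacked bar chart with axis labels, a title and value labels formatted with a Python format string in the spirit of `fmt`. It uses `DataFrame.plot.bar` and `Axes.bar_label` (matplotlib 3.4 or newer) directly and is not how sfctools implements matplotlib_barplot; the data are toy values.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: one column per bar series, one row per category on the x axis
data = pd.DataFrame({"households": [3.0, 4.0], "firms": [1.0, 2.5]}, index=["2020", "2021"])

fig, ax = plt.subplots(figsize=(6, 5))
data.plot.bar(ax=ax, stacked=True, color=["indigo", "gray"])
ax.set_xlabel("year")
ax.set_ylabel("value")
ax.set_title("Stacked bars")

# Label each bar segment using a format string such as '{0:.1f}'
for container in ax.containers:
    ax.bar_label(container, labels=["{0:.1f}".format(v) for v in container.datavalues],
                 label_type="center")
fig.tight_layout()
```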
- sfctools.misc.mpl_plotting.matplotlib_lineplot(data, xlabel=None, ylabel=None, title='', xlim=None, ylim=None, color='indigo', legend='best', marker=None, show=False, ax=None)
creates a line plot in matplotlib. The ‘macro’ plotting style will be used if it is set up.
- Parameters
xlabel – x axis label
ylabel – y axis label
title – plot title
color – plot color
xlim, ylim – tuples of plot limits
legend – location of legend, or ‘off’, or ‘outside’
show – if False, only the figure is returned. Otherwise, the plot is shown
ax – axis to plot on
When overlaying a bar plot and a line plot, make sure to add an additional axis, see https://stackoverflow.com/questions/42948576/pandas-plot-does-not-overlay
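The additional axis mentioned above is usually created with `Axes.twinx()`, which adds a second y axis sharing the same x axis. A minimal matplotlib sketch with toy values (it does not call the sfctools plotting functions):

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2015, 2021)
output = np.array([10.0, 11.0, 11.5, 12.0, 12.2, 11.0])  # toy bar values
growth = np.array([1.0, 10.0, 4.5, 4.3, 1.7, -9.8])      # toy line values

fig, ax_bar = plt.subplots()
ax_bar.bar(years, output, color="indigo")
ax_bar.set_ylabel("output")

# Second y axis on the same x axis, so the line is not hidden behind the bars
ax_line = ax_bar.twinx()
ax_line.plot(years, growth, color="black", marker="o")
ax_line.set_ylabel("growth (%)")
```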
- sfctools.misc.mpl_plotting.plot_sankey(data, title='', show_plot=True, show_values=True, colors=None, label_rot=65, fontsize=10, separation=0.8, norm_factor=1.5, min_width=0.1, dy=10, dx=55, dx_space=8.5, alpha=0.5, filling_fraction=0.85)
plots a sankey diagram from data
- Parameters
data – list of pandas dataframes in the format shown in the example below; the list must contain at least two dataframes
title – title of the plot
show_plot – boolean switch to show plot window (default True). If False, figure is returned
show_values – boolean switch to print numerical values of data as text label (default True)
colors – (optional) a list of colors (if None, default colors are chosen)
label_rot – rotation of the labels (default 65 degrees)
fontsize – size of label font
separation – separation point of the label along the bands, between 0 and 1
norm_factor – normalization factor for width
min_width – minimum width of bands
dy – vertical distance of bands
dx – horizontal distance of bands
dx_space – space between layers
filling_fraction – fraction by which the bands align (scaling with dy)
- Returns
fig, None or a matplotlib figure
Example:
First dataframe (data[0]):
from  to  value  color_id
A     C   1.0    0
A     D   2.0    1
B     D   3.0    2
Second dataframe (data[1]):
from  to  value  color_id
C     E   5.0    1
C     F   6.0    2
D     G   8.0    3
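The example tables can be built as pandas dataframes as follows. This sketch assumes that plot_sankey reads exactly the column names shown (from, to, value, color_id); the call itself is left as a comment because it requires sfctools to be installed.

```python
import pandas as pd

# First layer: flows from A and B into C and D
layer_1 = pd.DataFrame({
    "from": ["A", "A", "B"],
    "to": ["C", "D", "D"],
    "value": [1.0, 2.0, 3.0],
    "color_id": [0, 1, 2],
})

# Second layer: flows from C and D onward to E, F and G
layer_2 = pd.DataFrame({
    "from": ["C", "C", "D"],
    "to": ["E", "F", "G"],
    "value": [5.0, 6.0, 8.0],
    "color_id": [1, 2, 3],
})

data = [layer_1, layer_2]  # a list of at least two dataframes

# With sfctools installed, the diagram could then be drawn with (not executed here):
# from sfctools.misc.mpl_plotting import plot_sankey
# fig = plot_sankey(data, title="Example", show_plot=False)
```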
Reporting
- class sfctools.misc.reporting_sheet.DistributionReport(xlabel, ylabel, data=None)
Bases: object
A generic class for logging a distribution of values.
- Parameters
xlabel – x axis title of the report
ylabel – y axis title of the report
data – initial data (default None). Must have an ‘append’ method; ideally a list of tuples (data_i, tag)
- add_data(x, label=None)
adds data to the data structure (a sorted version of x is stored).
- Parameters
x – list or numpy array to store
label – some additional tag
- plot_data(ax, color=None, s=None)
plots the data onto a matplotlib axis
- Parameters
ax – a matplotlib axis
color – plotting color
s – scatter size
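A minimal sketch of the storage behaviour described for add_data: each call appends a tuple (sorted data, tag) to a list-like container. The class name `SortedDistributionLog` is hypothetical and plotting is omitted; this illustrates the documented behaviour and is not the sfctools class.

```python
import numpy as np


class SortedDistributionLog:
    """Hypothetical stand-in for a distribution logger."""

    def __init__(self, xlabel, ylabel, data=None):
        self.xlabel = xlabel
        self.ylabel = ylabel
        # Any container with an 'append' method works; a list is the default
        self.data = [] if data is None else data

    def add_data(self, x, label=None):
        # Store a sorted copy of x together with its tag
        self.data.append((np.sort(np.asarray(x, dtype=float)), label))


report = SortedDistributionLog("agent rank", "wealth")
report.add_data([3.0, 1.0, 2.0], label="t0")
print(report.data)
```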
- class sfctools.misc.reporting_sheet.IndicatorReport(xlabel, ylabel, data=None)
Bases: object
A generic class for logging scalar values (‘indicators’).
- Parameters
xlabel – x axis label, str
ylabel – y axis label, str
data – some initial data (default None). Must have an ‘append’ method
- add_data(x)
inserts some data into the data structure
- Parameters
x – a scalar value
- classmethod getitem(key)
retrieves a report from the instances created so far.
- Parameters
key – the ylabel of the respective report
- plot_data(ax, s=1.8, color='black', scatter_color='gray')
plots the data onto a matplotlib axis
- Parameters
ax – a matplotlib axis
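getitem looks up a report by its ylabel among all instances created. One common way to obtain this behaviour is a class-level registry filled in the constructor. The sketch below assumes that design; the class name `IndicatorLog` is hypothetical and the code is not taken from sfctools.

```python
class IndicatorLog:
    """Hypothetical stand-in for a scalar indicator logger with a registry."""

    _instances = {}  # shared by all instances, keyed by ylabel

    def __init__(self, xlabel, ylabel, data=None):
        self.xlabel = xlabel
        self.ylabel = ylabel
        self.data = [] if data is None else data
        IndicatorLog._instances[ylabel] = self

    def add_data(self, x):
        # Store one scalar value per call
        self.data.append(float(x))

    @classmethod
    def getitem(cls, key):
        # Retrieve a previously created report by its ylabel
        return cls._instances[key]


gdp = IndicatorLog("period", "GDP")
gdp.add_data(1.0)
gdp.add_data(1.2)
print(IndicatorLog.getitem("GDP").data)
```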
- class sfctools.misc.reporting_sheet.ReportingSheet(instances=None)
Bases: object
A reporting sheet is an overview sheet for reporting model results in the form of a grid plot.
- Parameters
instances – optional list of Report instances to place on the sheet (default None)
- add_report(report)
Adds a report item to this sheet.
- Parameters
report – any IndicatorReport or DistributionReport
- plot(show_plot=True, verbose=False)
Generates a plot grid using matplotlib
- Parameters
show_plot – whether to show the figure in a window (default True). If False, the figure is returned
verbose – print number of rows and columns if True (default False)
- Returns
fig, a matplotlib figure
- to_dataframe()
combines the data into a single pandas dataframe
- to_latex()
Generates LaTeX code (not yet implemented)
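When plot arranges the reports in a grid, it has to choose a number of rows and columns (which verbose=True prints). A common choice is a near-square grid; the sketch below assumes that rule, which may differ from the one sfctools uses, and `grid_shape` is a hypothetical name.

```python
import math


def grid_shape(n_reports):
    """Hypothetical helper: near-square (rows, columns) layout for n_reports panels."""
    if n_reports < 1:
        raise ValueError("need at least one report")
    n_cols = math.ceil(math.sqrt(n_reports))
    n_rows = math.ceil(n_reports / n_cols)
    return n_rows, n_cols


print(grid_shape(5))  # (2, 3)
```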