Miscellaneous

This is a collection of various additional features, among them time series manipulation, plotting, and reporting.

Time series

sfctools.misc.timeseries.convert_numeric(df)

converts all columns of a dataframe to numeric values

Parameters

df – a pandas dataframe

Returns

converted dataframe
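Here is a minimal sketch of the same idea in plain pandas. It assumes that entries which cannot be parsed should become NaN (errors="coerce"). The helper name convert_numeric_sketch is hypothetical, and the sfctools function may treat such entries differently.

```python
import pandas as pd


def convert_numeric_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Convert every column of df to a numeric dtype (hypothetical helper)."""
    # assumption: values that cannot be parsed become NaN instead of raising
    return df.apply(pd.to_numeric, errors="coerce")


raw = pd.DataFrame({"gdp": ["1.5", "2.0", "n/a"], "year": ["2019", "2020", "2021"]})
converted = convert_numeric_sketch(raw)
print(converted.dtypes)
```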

sfctools.misc.timeseries.convert_quarterly_to_datetime(df)

converts a dataframe with a string index of the form ‘2020-Q2’ (typical Eurostat format) to a dataframe with a datetime index

Parameters

df – a pandas dataframe

Returns

df_converted, a pandas dataframe with datetime-formatted index
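In pandas, one way to express this conversion is to parse the strings as quarterly periods and take the start of each quarter. This is a sketch, not the sfctools implementation. The helper name and the choice of the first day of the quarter are both assumptions.

```python
import pandas as pd


def quarterly_to_datetime_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Replace a '2020-Q2'-style string index by a DatetimeIndex (hypothetical helper)."""
    out = df.copy()
    # '2020-Q2' -> '2020Q2', a string pandas parses as a quarterly period
    periods = pd.PeriodIndex(out.index.str.replace("-", "", regex=False), freq="Q")
    # assumption: each quarter is represented by its first day
    out.index = periods.to_timestamp(how="start")
    return out


eurostat = pd.DataFrame({"value": [1.0, 2.0]}, index=["2020-Q2", "2020-Q3"])
print(quarterly_to_datetime_sketch(eurostat).index)  # 2020-04-01, 2020-07-01
```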

sfctools.misc.timeseries.cross_correlate(x1, x2, lags=15)

computes the cross-correlation of x1 and x2 using plt.xcorr and returns a dataframe with the xcorr data
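With its defaults (normed=True, no detrending), plt.xcorr correlates the full sequences with np.correlate, divides by sqrt(dot(x1, x1) * dot(x2, x2)), and keeps the lags -lags..lags. Below is a numpy sketch of that computation. It returns a dataframe whose column names, lag and xcorr, are hypothetical.

```python
import numpy as np
import pandas as pd


def cross_correlate_sketch(x1, x2, lags=15):
    """Normalized cross-correlation of x1 and x2 for lags -lags..lags (hypothetical helper)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = len(x1)
    if len(x2) != n:
        raise ValueError("x1 and x2 must have the same length")
    if not 1 <= lags < n:
        raise ValueError("lags must be between 1 and len(x1) - 1")
    full = np.correlate(x1, x2, mode="full")
    full /= np.sqrt(np.dot(x1, x1) * np.dot(x2, x2))  # same normalization as normed=True
    # index n - 1 of the 'full' output corresponds to lag 0
    window = full[n - 1 - lags : n + lags]
    return pd.DataFrame({"lag": np.arange(-lags, lags + 1), "xcorr": window})


x = np.sin(np.linspace(0.0, 6.0, 40))
res = cross_correlate_sketch(x, x, lags=5)
print(res)
```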

sfctools.misc.timeseries.cross_correlate_plot(x1, x2, normalize=True, ax=None, show_plot=True, ylabel='Cross-Correlation', lags=15, color='black')

generates a cross-correlation plot of x1 and x2

Parameters
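  • lags – maximum number of lags to show

  • ylabel – y axis label (default ‘Cross-Correlation’)

  • normalize – whether to normalize the cross-correlation (default True)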

  • color – color of the plot

sfctools.misc.timeseries.difference(x0)

computes the first difference of a pandas time series, i.e. [x1-x0]

sfctools.misc.timeseries.interpolate(y, stretch_factor, kind='cubic')

interpolates an array using scipy interp1d.

x is datetime, y is float or otherwise numerical. Example: stretch_factor = 3 converts quarterly to monthly data.
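Below is a minimal numpy sketch of stretching an array by stretch_factor. To stay free of extra dependencies, it uses linear np.interp in place of scipy's interp1d(kind='cubic'). It also assumes the stretched array has (len(y) - 1) * stretch_factor + 1 points, so that every original value is kept. The sfctools function may size its output differently.

```python
import numpy as np


def interpolate_sketch(y, stretch_factor):
    """Stretch y by an integer factor with linear interpolation (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x_old = np.arange(n)
    # assumption: (n - 1) * stretch_factor + 1 points, so every original sample is kept
    x_new = np.linspace(0, n - 1, (n - 1) * int(stretch_factor) + 1)
    # linear stand-in for scipy's interp1d(kind='cubic')
    return np.interp(x_new, x_old, y)


quarterly = [100.0, 103.0, 109.0]
monthly = interpolate_sketch(quarterly, 3)  # stretch_factor = 3: quarter to month
print(monthly)  # approximately [100. 101. 102. 103. 105. 107. 109.]
```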

sfctools.misc.timeseries.log_difference(x0)

computes the difference of log[x0], i.e. log[x1/x0], of a pandas time series

sfctools.misc.timeseries.percentage_change(x0)

computes the percentage change of a pandas time series, i.e. the change [x1-x0] relative to x0
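Each of the three transformations maps onto a pandas built-in. The short sketch below assumes the percentage change is expressed as a fraction (pct_change) rather than multiplied by 100. The sfctools functions may also drop the leading NaN instead of keeping it.

```python
import numpy as np
import pandas as pd

s = pd.Series([100.0, 110.0, 121.0])

diff = s.diff()              # x1 - x0          -> NaN, 10.0, 11.0
log_diff = np.log(s).diff()  # log(x1 / x0)     -> NaN, ~0.0953, ~0.0953
pct = s.pct_change()         # (x1 - x0) / x0   -> NaN, ~0.1, ~0.1 (assumed fraction, not %)

print(pd.DataFrame({"difference": diff, "log_difference": log_diff, "pct": pct}))
```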

sfctools.misc.timeseries.stretch_datetime(x, factor)

stretches a series or array containing datetime objects by a certain factor

Parameters
  • x – an array

  • factor – int, factor > 1 by which to stretch the data
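Here is a numpy sketch of stretching datetimes. It converts the timestamps to seconds since the first one, interpolates linearly, and converts back. The helper name and the output length (len(x) - 1) * factor + 1 are assumptions. Note that linear spacing in time does not land on calendar month starts.

```python
import numpy as np


def stretch_datetime_sketch(x, factor):
    """Insert factor - 1 evenly spaced timestamps between neighbouring entries (hypothetical helper)."""
    factor = int(factor)
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    t = np.asarray(x, dtype="datetime64[s]")
    # seconds since the first timestamp; float64 is exact at this scale
    secs = (t - t[0]).astype("int64").astype(float)
    n = len(t)
    # assumption: (n - 1) * factor + 1 output points, original stamps kept
    pos_new = np.linspace(0, n - 1, (n - 1) * factor + 1)
    secs_new = np.interp(pos_new, np.arange(n), secs)
    return t[0] + np.round(secs_new).astype("int64").astype("timedelta64[s]")


quarters = np.array(["2020-01-01", "2020-04-01", "2020-07-01"], dtype="datetime64[D]")
stretched = stretch_datetime_sketch(quarters, 3)
print(stretched)  # second entry: 2020-01-31T08:00:00, not 2020-02-01
```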

sfctools.misc.timeseries.stretch_pandas(s, factor, kind='cubic')

stretches a pandas dataframe or series by a certain factor

Parameters
  • s – pandas dataframe or series

  • factor – factor by which to stretch

  • kind – interpolation kind (default ‘cubic’)

sfctools.misc.timeseries.stretch_to_length(df, new_length, method='cubic')

stretches a dataframe with datetime index to a new, higher length (by interpolation of the values).

Parameters
  • df – a pandas data frame with datetime-formatted index

  • new_length – new desired length

  • method – interpolation style (default ‘cubic’)

Returns

df_new, the stretched dataframe
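The same operation can be sketched in pandas, assuming evenly spaced new timestamps between the first and last index entry. Instead of the cubic default, the sketch uses time-weighted linear interpolation (interpolate(method="time")), so it needs no scipy. The helper name is hypothetical.

```python
import pandas as pd


def stretch_to_length_sketch(df: pd.DataFrame, new_length: int) -> pd.DataFrame:
    """Resample df onto new_length evenly spaced timestamps (hypothetical helper)."""
    # assumption: the new index spans exactly the old first..last timestamps
    new_index = pd.date_range(df.index[0], df.index[-1], periods=new_length)
    # put old and new timestamps on one index so the gaps can be filled
    combined = df.reindex(df.index.union(new_index))
    filled = combined.interpolate(method="time")  # linear in time, not cubic
    return filled.loc[new_index]


df = pd.DataFrame({"v": [0.0, 10.0]}, index=pd.to_datetime(["2020-01-01", "2020-01-11"]))
out = stretch_to_length_sketch(df, 21)  # one value every 12 hours
print(out.head())
```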

Plotting

sfctools.misc.mpl_plotting.matplotlib_barplot(data, xlabel, ylabel, title, color='indigo', hatches=None, size=(6, 5), tight_layout=True, fmt='{0:f}', stacked=False, show_labels=True, legend='best', show=True, yerr=None, ax=None)

Creates a bar plot in matplotlib. The ‘macro’ plotting style will be used if it is set up

Parameters
  • xlabel – x axis label

  • ylabel – y axis label

  • title – plot title

  • color – plot color

  • hatches – hatch pattern

  • size – figure size

  • tight_layout – apply matplotlib's tight layout to the plot (boolean)

  • fmt – number format, {0:d} or {0:f} for example

  • stacked – stack the columns (True) or show them side by side (False)

  • legend – location of legend, or ‘off’

  • show – if False, only the figure is returned; otherwise the plot is shown

  • ax – axis to plot on

  • yerr – data for the error bars (if any)

  • show_labels – show the column labels at the bottom (boolean)
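The fmt examples {0:d} and {0:f} are Python str.format templates. If the bar plot applies fmt to each value label via str.format (an assumption), the sketch below shows how common templates render. It uses a hypothetical helper, render_labels. Note that {0:d} only accepts integers.

```python
def render_labels(values, fmt="{0:f}"):
    """Format each bar value with a str.format template (hypothetical helper)."""
    return [fmt.format(v) for v in values]


print(render_labels([1, 2.5]))             # ['1.000000', '2.500000']
print(render_labels([1, 2.5], "{0:.1f}"))  # ['1.0', '2.5']
print(render_labels([3, 4], "{0:d}"))      # ['3', '4']

try:
    render_labels([2.5], "{0:d}")  # 'd' accepts integers only
except ValueError as err:
    print("ValueError:", err)
```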

sfctools.misc.mpl_plotting.matplotlib_lineplot(data, xlabel=None, ylabel=None, title='', xlim=None, ylim=None, color='indigo', legend='best', marker=None, show=False, ax=None)

creates a line plot in matplotlib. The ‘macro’ plotting style will be used if it is set up

Parameters
  • xlabel – x axis label

  • ylabel – y axis label

  • title – plot title

  • color – plot color

  • xlim, ylim – tuples of plot limits

  • legend – location of legend, or ‘off’, or ‘outside’

  • show – if False, only the figure is returned; otherwise the plot is shown

  • ax – axis to plot on

When overlaying a bar plot and a line plot, make sure to add an additional axis; see https://stackoverflow.com/questions/42948576/pandas-plot-does-not-overlay
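The following generic matplotlib sketch shows the overlay independently of sfctools. It draws the bars and the line at the same integer x positions and gives the line its own y axis with twinx(). The series and values are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# illustrative data only
bars = pd.Series([3.0, 5.0, 4.0], index=["2019", "2020", "2021"])
line = pd.Series([1.2, 0.8, 1.5], index=bars.index)

positions = range(len(bars))  # shared x positions for both plots
fig, ax_bar = plt.subplots()
ax_bar.bar(positions, bars.values, color="indigo")
ax_bar.set_xticks(list(positions))
ax_bar.set_xticklabels(bars.index)

ax_line = ax_bar.twinx()  # second y axis sharing the same x axis
ax_line.plot(list(positions), line.values, color="black", marker="o")
fig.tight_layout()
```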

sfctools.misc.mpl_plotting.plot_sankey(data, title='', show_plot=True, show_values=True, colors=None, label_rot=65, fontsize=10, separation=0.8, norm_factor=1.5, min_width=0.1, dy=10, dx=55, dx_space=8.5, alpha=0.5, filling_fraction=0.85)

plots a sankey diagram from data

Parameters
  • data – list of pandas dataframes in the format shown in the example below (at least two dataframes)

  • title – title of the plot

  • show_plot – boolean switch to show plot window (default True). If False, figure is returned

  • show_values – boolean switch to print numerical values of data as text label (default True)

  • colors – (optional) a list of colors (if None, default colors are chosen)

  • label_rot – rotation of the labels (default 65 degrees)

  • fontsize – size of label font

  • separation – separation point of the label along the bands, between 0 and 1

  • norm_factor – normalization factor for width

  • min_width – minimum width of bands

  • dy – vertical distance of bands

  • dx – horizontal distance of bands

  • dx_space – space between layers

  • filling_fraction – fraction by which the bands align (scaling with dy)

Returns

fig, None or a matplotlib figure (the figure is returned if show_plot is False)

Example (data as a list of two dataframes):

    from  to  value  color_id
    A     C   1.0    0
    A     D   2.0    1
    B     D   3.0    2

    from  to  value  color_id
    C     E   5.0    1
    C     F   6.0    2
    D     G   8.0    3
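The example data above can be built as pandas dataframes as shown below. The check layers_are_chained is a hypothetical helper. It rests on an assumption that the example suggests: the source nodes of each dataframe are target nodes of the previous one.

```python
import pandas as pd

columns = ["from", "to", "value", "color_id"]
layer_1 = pd.DataFrame(
    [["A", "C", 1.0, 0], ["A", "D", 2.0, 1], ["B", "D", 3.0, 2]], columns=columns
)
layer_2 = pd.DataFrame(
    [["C", "E", 5.0, 1], ["C", "F", 6.0, 2], ["D", "G", 8.0, 3]], columns=columns
)
data = [layer_1, layer_2]


def layers_are_chained(layers):
    """True if every 'from' node appears as a 'to' node of the previous layer (assumed rule)."""
    return all(
        set(nxt["from"]) <= set(prev["to"]) for prev, nxt in zip(layers, layers[1:])
    )


print(layers_are_chained(data))  # True
```

Passing data to plot_sankey(data, title=...) should then produce the diagram, with the keyword arguments documented above controlling its layout.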

Reporting

class sfctools.misc.reporting_sheet.DistributionReport(xlabel, ylabel, data=None)

Bases: object

A generic class for logging a distribution.

Parameters
  • xlabel – x axis title of the report

  • ylabel – y axis title of the report

  • data – initial data (default None). Must have an ‘append’ method; ideally a list of tuples (data_i, tag)

add_data(x, label=None)

adds data to the data structure (stores a sorted version of x).

Parameters
  • x – list or numpy array to store

  • label – an optional tag stored together with x (default None)

plot_data(ax, color=None, s=None)

plot the data onto a matplotlib axis

Parameters
  • ax – a matplotlib axis

  • color – plotting color

  • s – scatter size
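The hypothetical stand-in class below illustrates the documented storage behaviour: a sorted copy of x kept as a tuple (data_i, tag). It is not the sfctools implementation and it omits plotting.

```python
class MiniDistributionReport:
    """Hypothetical stand-in mirroring the documented storage of DistributionReport."""

    def __init__(self, xlabel, ylabel, data=None):
        self.xlabel = xlabel
        self.ylabel = ylabel
        # any object with an 'append' method works; a list of tuples is ideal
        self.data = data if data is not None else []

    def add_data(self, x, label=None):
        # store a sorted copy of x together with its tag; x itself is not modified
        self.data.append((sorted(x), label))


report = MiniDistributionReport("rank", "wealth")
sample = [3.0, 1.0, 2.0]
report.add_data(sample, label="t=1")
print(report.data)  # [([1.0, 2.0, 3.0], 't=1')]
```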

class sfctools.misc.reporting_sheet.IndicatorReport(xlabel, ylabel, data=None)

Bases: object

A generic class for logging scalar values (‘indicators’).

Parameters
  • xlabel – x axis label, str

  • ylabel – y axis label, str

  • data – some initial data (default None). Must have an ‘append’ method

add_data(x)

inserts some data into the data structure

Parameters

x – a scalar value

classmethod getitem(key)

retrieves a report from the instances created so far.

Parameters

key – the ylabel of the respective report

plot_data(ax, s=1.8, color='black', scatter_color='gray')

plots the data onto a matplotlib axis

Parameters

ax – a matplotlib axis
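Because getitem retrieves reports from the instances created, the class presumably keeps track of every report it creates. The following is a minimal sketch of such a class-level registry. It uses a hypothetical class, MiniIndicatorReport, rather than the actual sfctools code.

```python
class MiniIndicatorReport:
    """Hypothetical sketch of a report class that remembers its instances."""

    instances = []  # class-level registry shared by all reports

    def __init__(self, xlabel, ylabel, data=None):
        self.xlabel = xlabel
        self.ylabel = ylabel
        self.data = data if data is not None else []
        MiniIndicatorReport.instances.append(self)  # register on creation

    def add_data(self, x):
        self.data.append(x)  # x is a scalar value

    @classmethod
    def getitem(cls, key):
        # look the report up by its ylabel
        for report in cls.instances:
            if report.ylabel == key:
                return report
        raise KeyError(key)


gdp = MiniIndicatorReport("time", "GDP")
gdp.add_data(1.02)
print(MiniIndicatorReport.getitem("GDP") is gdp)  # True
```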

class sfctools.misc.reporting_sheet.ReportingSheet(instances=None)

Bases: object

A reporting sheet is an overview sheet for reporting model results in the form of a grid plot.

Parameters

instances – optional list of report instances to add to the sheet (default None)

add_report(report)

Adds a report item to this sheet.

Parameters

report – any IndicatorReport or DistributionReport

plot(show_plot=True, verbose=False)

Generates a plot grid using matplotlib

Parameters
  • show_plot – whether to show the figure in a window (default True). If False, the figure is returned

  • verbose – print number of rows and columns if True (default False)

Returns

fig, a matplotlib figure
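This section does not document how the grid is sized. One common choice is a near-square layout with ceil(sqrt(n)) columns, shown here as a hypothetical helper. sfctools may lay out its grid differently.

```python
import math


def grid_shape(n_reports):
    """Return (rows, cols) of a near-square grid holding n_reports panels (hypothetical helper)."""
    if n_reports < 1:
        raise ValueError("need at least one report")
    cols = math.ceil(math.sqrt(n_reports))
    rows = math.ceil(n_reports / cols)  # just enough rows to fit every panel
    return rows, cols


print(grid_shape(5))  # (2, 3)
```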

to_dataframe()

combines the data into a single pandas dataframe
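The sketch below combines logged data into a single dataframe. It assumes each report becomes one column, aligned by position, with shorter reports padded with NaN. The dictionary of values and its labels are hypothetical.

```python
import pandas as pd

# hypothetical logged indicators: ylabel -> list of scalar values
logged = {"GDP": [1.0, 1.1, 1.2], "unemployment": [0.05, 0.06]}

# one column per report; missing trailing values become NaN
combined = pd.concat(
    {name: pd.Series(values) for name, values in logged.items()}, axis=1
)
print(combined)
```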

to_latex()

Generates LaTeX code (not yet implemented).