A Python library for basic statistical tests

  • One-sample proportion confidence interval
  • One-sample proportion significance test
  • Two-sample proportion significance test
  • One-sample mean confidence interval
  • One-sample test of mean
  • Two-sample test of mean
  • Multi-sample test of mean
  • Paired-sample test of mean
  • Correlation coefficient and non-correlation test for two continuous variables
  • Chi-squared test for two categorical variables
  • Mutual information of two categorical variables

To install, run one of the following:

python setup.py install    # from a source checkout
pip install stat-tests     # from the package index

To use

>>> import numpy as np
>>> import stat_tests as st

One-sample proportion confidence interval

>>> st.one_sample_proportion_confidence_interval(n_successes=200, n_trials=1000, confidence=0.95)
(0.17520819870781754, 0.22479180129218249)
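The interval above is numerically consistent with the normal-approximation (Wald) interval, p_hat ± z * sqrt(p_hat * (1 - p_hat) / n). A minimal standard-library sketch under that assumption follows; `wald_interval` is a hypothetical helper for illustration, not part of stat_tests, and the library's internals may differ.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(n_successes, n_trials, confidence=0.95):
    """Normal-approximation (Wald) interval for a proportion (illustrative helper)."""
    p_hat = n_successes / n_trials
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n_trials)
    return p_hat - margin, p_hat + margin


low, high = wald_interval(200, 1000, confidence=0.95)
print(low, high)
```

The Wald interval is known to behave poorly when the proportion is close to 0 or 1 or when the sample is small, so it is best read as a large-sample approximation.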

One-sample proportion significance test

>>> z_score, p_value = st.one_sample_proportion_test(n_successes=5123, n_trials=10000, p_hypo=0.5)
>>> z_score, p_value
(2.460744684807139, 0.013864899319339763)
>>> z_score, p_value = st.one_sample_proportion_test(n_successes=5123, n_trials=10000, p_hypo=0.5, one_side=True)
>>> z_score, p_value
(2.460744684807139, 0.0069324496596698815)
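The z-scores above match a z-test whose standard error is computed from the sample proportion rather than from the hypothesized one. The sketch below reproduces that arithmetic with the standard library only; `proportion_z_test` is a hypothetical name, and the choice of standard error is an assumption inferred from the printed numbers.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(n_successes, n_trials, p_hypo, one_side=False):
    """One-sample z-test for a proportion (illustrative helper)."""
    p_hat = n_successes / n_trials
    # Assumption: standard error uses the sample proportion p_hat.
    se = sqrt(p_hat * (1 - p_hat) / n_trials)
    z = (p_hat - p_hypo) / se
    # Upper-tail probability of |z| under the standard normal.
    tail = 1 - NormalDist().cdf(abs(z))
    p_value = tail if one_side else 2 * tail
    return z, p_value


z_two, p_two = proportion_z_test(5123, 10000, p_hypo=0.5)
z_one, p_one = proportion_z_test(5123, 10000, p_hypo=0.5, one_side=True)
print(z_two, p_two, p_one)
```

The one-sided p-value is half the two-sided one because the normal distribution is symmetric. A textbook score test would instead use p_hypo in the standard error, which gives z = 2.46 for the same data.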

Two-sample proportion significance test

>>> z_score, p_value = st.two_sample_proportion_test(n_successes_1=20, n_trials_1=300, n_successes_2=21, n_trials_2=298)
>>> z_score, p_value
(-0.18400991456652802, 0.85400567703455788)
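The result above is consistent with the usual pooled two-proportion z-test, where both samples share one estimate of the success probability under the null hypothesis. The following sketch assumes that form; `two_proportion_z_test` is a hypothetical helper, not the library's function.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(n_successes_1, n_trials_1, n_successes_2, n_trials_2):
    """Pooled two-sample z-test for proportions (illustrative helper)."""
    p1 = n_successes_1 / n_trials_1
    p2 = n_successes_2 / n_trials_2
    # Pooled proportion under the null hypothesis p1 == p2.
    pooled = (n_successes_1 + n_successes_2) / (n_trials_1 + n_trials_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_trials_1 + 1 / n_trials_2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return z, p_value


z, p = two_proportion_z_test(20, 300, 21, 298)
print(z, p)
```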

One-sample mean confidence interval

>>> data = 1.80 + 0.2 * np.random.randn(1000)
>>> st.one_sample_mean_confidence_interval(data, confidence=0.95)
(1.7980145129603369, 1.8228186217572906)
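A confidence interval for a mean is commonly built from the sample mean, the standard error, and a Student t critical value. The sketch below shows that construction with NumPy and SciPy; whether stat_tests uses the t or the normal critical value is not stated, and `mean_confidence_interval` is a hypothetical helper.

```python
import numpy as np
from scipy import stats


def mean_confidence_interval(data, confidence=0.95):
    """t-based confidence interval for the mean (illustrative helper)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    mean = data.mean()
    # Standard error of the mean, using the unbiased sample standard deviation.
    sem = data.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return mean - t_crit * sem, mean + t_crit * sem


rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
sample = 1.80 + 0.2 * rng.standard_normal(1000)
low, high = mean_confidence_interval(sample, confidence=0.95)
print(low, high)
```

With 1000 observations the t and normal critical values differ only in the third decimal place, so either choice yields nearly the same interval.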

One-sample test of mean

>>> data = 1.80 + 0.2 * np.random.randn(1000)
>>> t_score, p_value = st.one_sample_mean_test(data, p_hypo=1.78)
>>> t_score, p_value
(4.8430777499530517, 1.4808428763412875e-06)
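For reference, the one-sample t statistic is the distance between the sample mean and the hypothesized mean, measured in standard errors. The sketch below computes it by hand and compares it with `scipy.stats.ttest_1samp`; that stat_tests wraps this SciPy function is an assumption, and `mu_0` is an illustrative variable name.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
sample = 1.80 + 0.2 * rng.standard_normal(1000)
mu_0 = 1.78  # hypothesized mean

# t = (sample mean - hypothesized mean) / standard error
sem = sample.std(ddof=1) / np.sqrt(sample.size)
t_manual = (sample.mean() - mu_0) / sem
# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=sample.size - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=mu_0)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```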

Two-sample test of mean

>>> data1 = 1.80 + 0.2 * np.random.randn(1000)
>>> data2 = 1.70 + 0.2 * np.random.randn(1000)
>>> t_score, p_value = st.two_sample_mean_test(data1, data2)
>>> t_score, p_value
(11.720297373853812, 9.9185358995527835e-31)
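Two common variants of the two-sample t-test exist: Student's test, which pools the variances, and Welch's test, which does not. Which one stat_tests uses is not stated here, so the sketch below simply runs both through `scipy.stats.ttest_ind` on seeded data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
sample_1 = 1.80 + 0.2 * rng.standard_normal(1000)
sample_2 = 1.70 + 0.2 * rng.standard_normal(1000)

# Student's t-test: assumes both groups share one variance.
student = stats.ttest_ind(sample_1, sample_2, equal_var=True)
# Welch's t-test: allows the variances to differ.
welch = stats.ttest_ind(sample_1, sample_2, equal_var=False)

print(student.statistic, student.pvalue)
print(welch.statistic, welch.pvalue)
```

When both samples have the same size, as here, the two statistics coincide and only the degrees of freedom differ. Welch's variant is generally the safer default when the group variances may be unequal.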

Multi-sample test of mean

# insignificant case
>>> data1 = 1.80 + 0.3 * np.random.randn(1000)
>>> data2 = 1.80 + 0.3 * np.random.randn(1000)
>>> data3 = 1.80 + 0.3 * np.random.randn(1000)
>>> data_sets = (data1, data2, data3)
>>> f_score, p_value = st.multi_sample_mean_test(data_sets)
>>> f_score, p_value
(0.28181526543047031, 0.75443302736823015)
# significant case
>>> data1 = 1.82 + 0.3 * np.random.randn(1000)
>>> data2 = 1.80 + 0.3 * np.random.randn(1000)
>>> data3 = 1.85 + 0.3 * np.random.randn(1000)
>>> data_sets = (data1, data2, data3)
>>> f_score, p_value = st.multi_sample_mean_test(data_sets)
>>> f_score, p_value
(12.397923609923124, 4.343121211676659e-06)
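The F score in a multi-sample test of means is typically the one-way ANOVA statistic: the between-group mean square divided by the within-group mean square. The sketch below computes it by hand and checks it against `scipy.stats.f_oneway`; that the library follows this exact procedure is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
groups = [mu + 0.3 * rng.standard_normal(1000) for mu in (1.82, 1.80, 1.85)]

k = len(groups)
n_total = sum(g.size for g in groups)
grand_mean = np.concatenate(groups).mean()

# Variation of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```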

Paired-sample test of mean

# insignificant case
>>> data1 = 1.80 + 0.2 * np.random.randn(1000)
>>> data2 = data1 + 0.01 * np.random.randn(len(data1))
>>> t_score, p_value = st.paired_sample_mean_test(data1, data2)
>>> t_score, p_value
(-0.43398665574510392, 0.66439183811383029)
# significant case
>>> data2 = data1 + 0.01 + 0.01 * np.random.randn(len(data1))
>>> t_score, p_value = st.paired_sample_mean_test(data1, data2)
>>> t_score, p_value
(-34.149842899654423, 5.452067026533736e-170)
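A paired test is equivalent to a one-sample t-test on the per-pair differences against zero, which is why a tiny but consistent shift of 0.01 is detected so strongly above. The sketch below illustrates that equivalence with SciPy; `before` and `after` are illustrative names, and the use of `scipy.stats.ttest_rel` inside stat_tests is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
before = 1.80 + 0.2 * rng.standard_normal(1000)
after = before + 0.01 + 0.01 * rng.standard_normal(before.size)

# Paired t-test on the two measurements.
paired = stats.ttest_rel(before, after)
# Equivalent one-sample t-test on the differences against zero.
on_diffs = stats.ttest_1samp(before - after, popmean=0.0)

print(paired.statistic, paired.pvalue)
print(on_diffs.statistic, on_diffs.pvalue)
```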

Correlation coefficient and non-correlation test for two continuous variables

# insignificant case
>>> data1 = 3 + 0.2 * np.random.randn(1000)
>>> data2 = 3 + 0.2 * np.random.randn(1000)
>>> r, p_value = st.correlation_coef(data1, data2)
>>> r, p_value
(0.0031232102929028708, 0.92142308109769133)
# significant case
>>> data2 = data1 + 0.5 * np.random.randn(len(data1))
>>> r, p_value = st.correlation_coef(data1, data2)
>>> r, p_value
(0.37799832675973383, 2.566197851682784e-35)
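The non-correlation test for a Pearson coefficient r can be expressed as a t-test with n - 2 degrees of freedom, using t = r * sqrt((n - 2) / (1 - r^2)). The sketch below computes this by hand and compares it with `scipy.stats.pearsonr`; that stat_tests reports a Pearson coefficient is an assumption based on the section title.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seeded so the sketch is repeatable
x = 3 + 0.2 * rng.standard_normal(200)
y = x + 0.5 * rng.standard_normal(x.size)

n = x.size
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
# Under the null hypothesis of no correlation, t follows a t distribution
# with n - 2 degrees of freedom.
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)

r_scipy, p_scipy = stats.pearsonr(x, y)
print(r, p_manual)
print(r_scipy, p_scipy)
```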

Chi-squared test for two categorical variables

# insignificant case
>>> data1 = np.random.choice(4, 1000)
>>> data2 = np.random.choice(2, 1000)
>>> st.make_contingency(data1, data2)
data1    0    1    2    3
data2
0      126  136  121  130
1      115   96  131  145
>>> chi2, p_value = st.chisq(data1, data2)
>>> chi2, p_value
(2.3626413559788064, 0.50062680584524433)
# significant case
>>> data2 = [np.random.choice(2, p=(0.9, 0.1)) if i in (0, 2) else np.random.choice(2, p=(0.1, 0.9)) for i in data1]
>>> st.make_contingency(data1, data2)
data1    0    1    2    3
data2
0      209   27  226   25
1       32  205   26  250
>>> chi2, p_value = st.chisq(data1, data2)
>>> chi2, p_value
(609.12165019624854, 1.0614940711031585e-131)
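The Pearson chi-squared statistic compares each observed cell count with the count expected under independence, row total * column total / grand total. The sketch below recomputes it from the contingency table of the significant case using plain Python; it is an illustration of the formula, not the library's code.

```python
# Contingency table from the significant case above (rows: data2, columns: data1).
observed = [
    [209, 27, 226, 25],
    [32, 205, 26, 250],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi2, dof)
```

The p-value is the upper-tail probability of a chi-squared distribution with `dof` degrees of freedom evaluated at `chi2`, which can be obtained with, for example, `scipy.stats.chi2.sf(chi2, dof)`.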

Mutual information of two categorical variables

# insignificant case
>>> data1 = np.random.choice(4, 1000)
>>> data2 = np.random.choice(2, 1000)
>>> mutual_information = st.mutual_information(data1, data2)
>>> mutual_information
0.0017608733794212128
# significant case
>>> data2 = [np.random.choice(2, p=(0.9, 0.1)) if i in (0, 2) else np.random.choice(2, p=(0.1, 0.9)) for i in data1]
>>> mutual_information = st.mutual_information(data1, data2)
>>> mutual_information
0.32882984773406321
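Mutual information can be estimated directly from counts as the sum over cells of p(x, y) * log(p(x, y) / (p(x) * p(y))). The sketch below uses the natural logarithm, so the result is in nats; the unit used by stat_tests is not stated, and `mutual_information_nats` is a hypothetical helper.

```python
from collections import Counter
from math import log


def mutual_information_nats(xs, ys):
    """Plug-in estimate of mutual information in nats (illustrative helper)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)
    count_y = Counter(ys)
    # (c / n) * log((c / n) / ((cx / n) * (cy / n))) simplifies to the form below.
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


# Identical variables share all their information: MI equals the entropy, ln 2.
mi_identical = mutual_information_nats([0, 1] * 50, [0, 1] * 50)
# Perfectly balanced independent variables share none.
mi_independent = mutual_information_nats([0, 0, 1, 1], [0, 1, 0, 1])
print(mi_identical, mi_independent)
```

Unlike the chi-squared test, this quantity comes without a p-value; values near zero suggest independence, and larger values indicate stronger dependence.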

Yang Zhang

Software Engineering SMTS at Salesforce Commerce Cloud Einstein
