Generalized linear mixed model
==============================

.. currentmodule:: statistical.glmm

.. autofunction:: run

The following code example shows how to apply the GLMM for data evaluation.

.. code:: python

    import numpy as np
    np.random.seed(0)

    import finn.statistical.glmm as glmm

    data_size = 100000
    random_factor_count = 20
    nested_random_factor_count = 2

    # Measured (dependent) variable: two normal distributions with different means.
    data_01 = np.random.normal(0, 3, int(data_size/2))
    data_02 = np.random.normal(1, 2, int(data_size/2))
    data_0 = np.concatenate((data_01, data_02))
    data_0 = np.expand_dims(data_0, axis = 0)

    # Categorical factor A: strongly coupled to the group structure of the measured variable.
    data_11 = np.random.binomial(1, 0.1, int(data_size/2))
    data_12 = np.random.binomial(1, 0.9, int(data_size/2))
    data_1 = np.concatenate((data_11, data_12))
    data_1 = np.expand_dims(data_1, axis = 0)

    # Categorical factor B: identically distributed in both groups, i.e. uninformative.
    data_21 = np.random.binomial(1, 0.5, int(data_size/2))*2-2
    data_22 = np.random.binomial(1, 0.5, int(data_size/2))*2-2
    data_2 = np.concatenate((data_21, data_22))
    data_2 = np.expand_dims(data_2, axis = 0)

    # Continuous factor A: weakly coupled to the group structure of the measured variable.
    data_31 = np.random.normal(0, 1, int(data_size/2))
    data_32 = np.random.normal(0.25, 1, int(data_size/2))
    data_3 = np.concatenate((data_31, data_32))
    data_3 = np.expand_dims(data_3, axis = 0)

    # Random effect and nested random effect: balanced, randomly shuffled group assignments.
    data_4 = np.repeat(np.arange(0, random_factor_count), int(data_size/random_factor_count))
    np.random.shuffle(data_4)
    data_4 = np.expand_dims(data_4, axis = 0)
    data_5 = np.repeat(np.arange(0, nested_random_factor_count), int(data_size/nested_random_factor_count))
    np.random.shuffle(data_5)
    data_5 = np.expand_dims(data_5, axis = 0)

    data = np.concatenate((data_0, data_1, data_2, data_3, data_4, data_5), axis = 0).transpose()
    data_label = ["measured_variable", "categorical_factor_A", "categorical_factor_B",
                  "continous_factor_A", "random_effect_A", "nested_random_effect_A"]

    glm_formula = "measured_variable ~ categorical_factor_A + categorical_factor_B + continous_factor_A + categorical_factor_A:continous_factor_A + (1|random_effect_A) + (1|random_effect_A:nested_random_effect_A)"
    glm_factor_types = ["continuous", "categorical", "categorical", "continuous", "categorical", "categorical"]
    glm_contrasts = "list(categorical_factor_A = contr.sum, categorical_factor_B = contr.sum, continous_factor_A = contr.sum, random_effect_A = contr.sum, nested_random_effect_A = contr.sum)"
    glm_model_type = "gaussian"

    stat_results = glmm.run(data = data, label_name = data_label, factor_type = glm_factor_types,
                            formula = glm_formula, contrasts = glm_contrasts, data_type = glm_model_type)

    print("Demo may return a singular fit since the naively applied data generation of this example\n does not guarantee sufficient observations for any random factor/nested random factor.\n")

    for (factor_idx, factor_name) in enumerate(stat_results[5]):
        print("factor: %s | p-value: %2.2f | effect size: %2.2f | std error %2.2f"
              % (factor_name, stat_results[2][factor_idx], stat_results[3][factor_idx], stat_results[4][factor_idx]))

Applying the generalized linear mixed model identifies categorical_factor_A as significant with a large effect size. continous_factor_A is also significant, but with a much smaller effect size, and the intercept is the third significant term, with a relatively small effect size. Neither categorical_factor_B nor the interaction between categorical_factor_A and continous_factor_A is statistically significant or exhibits a large effect size (especially in reference to the std error).
======================================= ======= =========== =========
Name                                    p-value effect-size std-error
======================================= ======= =========== =========
categorical_factor_A                    0.00    0.82        0.02
categorical_factor_B                    0.17    -0.02       0.02
continous_factor_A                      0.00    0.04        0.01
categorical_factor_A:continous_factor_A 0.36    -0.01       0.02
(Intercept)                             0.00    0.10        0.01
======================================= ======= =========== =========

Note: This demo code may return a singular fit, since the naively applied data generation of this example does not guarantee sufficient observations for any random factor/nested random factor.
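
For downstream processing it can be convenient to collect the per-factor statistics into a dictionary instead of printing them. The sketch below is a hypothetical helper (not part of finn.statistical.glmm); it relies only on the result-tuple indices already used in the demo above (index 5 for factor names, 2 for p-values, 3 for effect sizes, 4 for standard errors).

.. code:: python

    def results_to_dict(stat_results):
        # Hypothetical convenience helper, not part of the module.
        # Uses the same result-tuple indices as the demo loop above:
        # stat_results[5] -> factor names, stat_results[2] -> p-values,
        # stat_results[3] -> effect sizes, stat_results[4] -> standard errors.
        return {factor_name: {"p_value": stat_results[2][factor_idx],
                              "effect_size": stat_results[3][factor_idx],
                              "std_error": stat_results[4][factor_idx]}
                for (factor_idx, factor_name) in enumerate(stat_results[5])}

    # Example: list the factors that reach significance at alpha = 0.05.
    significant = [name for (name, stats) in results_to_dict(stat_results).items()
                   if stats["p_value"] < 0.05]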