Overview

Dataset statistics

Number of variables: 4
Number of observations: 4
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 256.0 B
Average record size in memory: 64.0 B

Variable types

Categorical: 4

Warnings

humedad is highly correlated with temperatura and 1 other field (High correlation)
temperatura is highly correlated with humedad and 2 other fields (High correlation)
df_index is highly correlated with humedad and 2 other fields (High correlation)
presion is highly correlated with temperatura and 1 other field (High correlation)
df_index is uniformly distributed (Uniform)
temperatura is uniformly distributed (Uniform)
df_index has unique values (Unique)
temperatura has unique values (Unique)

Reproduction

Analysis started: 2021-04-25 17:00:02.738456
Analysis finished: 2021-04-25 17:00:06.283925
Duration: 3.55 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

df_index
Categorical

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 4
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 160.0 B
Values: 12h, 6h, 3h, 9h

Length

Max length: 3
Median length: 2
Mean length: 2.25
Min length: 2
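
The length statistics above follow directly from the string lengths of the four df_index values. The following is a minimal sketch, assuming the values are the ones listed in this variable's sample rows and using only the Python standard library. The variable names are illustrative and are not taken from pandas-profiling.

```python
from statistics import mean, median

# The four df_index values as they appear in the report's sample rows.
values = ["3h", "6h", "9h", "12h"]

# Length of each value in characters.
lengths = [len(v) for v in values]

length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)
```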

Characters and Unicode

Total characters: 9
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
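
As a rough illustration of how such character properties can be read, the sketch below counts the Unicode general category of every character in the df_index values using Python's standard `unicodedata` module. It is an assumption-laden example, not the code the report runs: the category codes `Nd` and `Ll` correspond to the report's "Decimal Number" and "Lowercase Letter" labels, and script and block properties are not covered here.

```python
import unicodedata
from collections import Counter

# The four df_index values as they appear in the report's sample rows.
values = ["3h", "6h", "9h", "12h"]

# Concatenate all values so every character is counted once per occurrence.
chars = "".join(values)

total_characters = len(chars)
distinct_characters = len(set(chars))

# General category code per character, e.g. "Nd" (Decimal Number), "Ll" (Lowercase Letter).
category_counts = Counter(unicodedata.category(c) for c in chars)

print(total_characters, distinct_characters, dict(category_counts))
```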

Unique

Unique: 4
Unique (%): 100.0%

Sample

1st row: 3h
2nd row: 6h
3rd row: 9h
4th row: 12h

Value  Count  Frequency (%)
12h    1      25.0%
6h     1      25.0%
3h     1      25.0%
9h     1      25.0%

Figure: Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
h      4      44.4%
3      1      11.1%
6      1      11.1%
9      1      11.1%
1      1      11.1%
2      1      11.1%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    5      55.6%
Lowercase Letter  4      44.4%

Most frequent character per category

Decimal Number:
Value  Count  Frequency (%)
3      1      20.0%
6      1      20.0%
9      1      20.0%
1      1      20.0%
2      1      20.0%

Lowercase Letter:
Value  Count  Frequency (%)
h      4      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  5      55.6%
Latin   4      44.4%

Most frequent character per script

Common:
Value  Count  Frequency (%)
3      1      20.0%
6      1      20.0%
9      1      20.0%
1      1      20.0%
2      1      20.0%

Latin:
Value  Count  Frequency (%)
h      4      100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  9      100.0%

Most frequent character per block

Value  Count  Frequency (%)
h      4      44.4%
3      1      11.1%
6      1      11.1%
9      1      11.1%
1      1      11.1%
2      1      11.1%

humedad
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 75.0%
Missing: 0
Missing (%): 0.0%
Memory size: 160.0 B
Values: 65, 67, 63

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 8
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 50.0%
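
The figures for humedad are consistent with "Distinct" counting the different values present and "Unique" counting only the values that occur exactly once. The sketch below reproduces both numbers under that reading, using the four humedad values from the sample rows. The variable names are illustrative.

```python
from collections import Counter

# The four humedad values as they appear in the report's sample rows.
values = ["65", "63", "65", "67"]

counts = Counter(values)

# Distinct: how many different values occur at all.
n_distinct = len(counts)

# Unique: how many values occur exactly once ("65" occurs twice, so it is excluded).
n_unique = sum(1 for c in counts.values() if c == 1)

pct_distinct = 100 * n_distinct / len(values)
pct_unique = 100 * n_unique / len(values)
print(n_distinct, pct_distinct, n_unique, pct_unique)
```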

Sample

1st row: 65
2nd row: 63
3rd row: 65
4th row: 67

Value  Count  Frequency (%)
65     2      50.0%
67     1      25.0%
63     1      25.0%

Figure: Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
6      4      50.0%
5      2      25.0%
3      1      12.5%
7      1      12.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  8      100.0%

Most frequent character per category

Value  Count  Frequency (%)
6      4      50.0%
5      2      25.0%
3      1      12.5%
7      1      12.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  8      100.0%

Most frequent character per script

Value  Count  Frequency (%)
6      4      50.0%
5      2      25.0%
3      1      12.5%
7      1      12.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  8      100.0%

Most frequent character per block

Value  Count  Frequency (%)
6      4      50.0%
5      2      25.0%
3      1      12.5%
7      1      12.5%

temperatura
Categorical

HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 4
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 160.0 B
Values: 35.5, 39.7, 22.3, 36.7

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 16
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): 100.0%

Sample

1st row: 35.5
2nd row: 36.7
3rd row: 22.3
4th row: 39.7

Value  Count  Frequency (%)
35.5   1      25.0%
39.7   1      25.0%
22.3   1      25.0%
36.7   1      25.0%

Figure: Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
3      4      25.0%
.      4      25.0%
5      2      12.5%
7      2      12.5%
2      2      12.5%
6      1      6.2%
9      1      6.2%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     12     75.0%
Other Punctuation  4      25.0%

Most frequent character per category

Decimal Number:
Value  Count  Frequency (%)
3      4      33.3%
5      2      16.7%
7      2      16.7%
2      2      16.7%
6      1      8.3%
9      1      8.3%

Other Punctuation:
Value  Count  Frequency (%)
.      4      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  16     100.0%

Most frequent character per script

Value  Count  Frequency (%)
3      4      25.0%
.      4      25.0%
5      2      12.5%
7      2      12.5%
2      2      12.5%
6      1      6.2%
9      1      6.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  16     100.0%

Most frequent character per block

Value  Count  Frequency (%)
3      4      25.0%
.      4      25.0%
5      2      12.5%
7      2      12.5%
2      2      12.5%
6      1      6.2%
9      1      6.2%

presion
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 75.0%
Missing: 0
Missing (%): 0.0%
Memory size: 160.0 B
Values: 5.5, 2.4, 3.6

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 12
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): 50.0%

Sample

1st row: 3.6
2nd row: 2.4
3rd row: 5.5
4th row: 5.5

Value  Count  Frequency (%)
5.5    2      50.0%
2.4    1      25.0%
3.6    1      25.0%

Figure: Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
.      4      33.3%
5      4      33.3%
3      1      8.3%
6      1      8.3%
2      1      8.3%
4      1      8.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     8      66.7%
Other Punctuation  4      33.3%

Most frequent character per category

Decimal Number:
Value  Count  Frequency (%)
5      4      50.0%
3      1      12.5%
6      1      12.5%
2      1      12.5%
4      1      12.5%

Other Punctuation:
Value  Count  Frequency (%)
.      4      100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  12     100.0%

Most frequent character per script

Value  Count  Frequency (%)
.      4      33.3%
5      4      33.3%
3      1      8.3%
6      1      8.3%
2      1      8.3%
4      1      8.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  12     100.0%

Most frequent character per block

Value  Count  Frequency (%)
.      4      33.3%
5      4      33.3%
3      1      8.3%
6      1      8.3%
2      1      8.3%
4      1      8.3%

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
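
The calculation described above can be sketched in a few lines of standard-library Python. This is an illustration only: it assumes the humedad and presion columns are parsed as numbers, which this report does not do (it types all four variables as Categorical), and the function name `pearson_r` is hypothetical.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their sample standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two observations")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Sample rows from the report, converted to numbers for this example.
humedad = [65, 63, 65, 67]
presion = [3.6, 2.4, 5.5, 5.5]

r = pearson_r(humedad, presion)

# Shifting and rescaling one variable (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([2 * h + 3 for h in humedad], presion)
print(round(r, 3), round(r_rescaled, 3))
```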

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
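
A minimal sketch of this rank-based calculation follows, again in standard-library Python. It assumes tied values receive the average of the ranks they span, which is a common convention rather than something this report states, and that the columns are parsed as numbers. The helper names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Sample rows from the report, converted to numbers for this example.
humedad = [65, 63, 65, 67]
presion = [3.6, 2.4, 5.5, 5.5]
rho = spearman_rho(humedad, presion)

# A nonlinear but strictly increasing relationship still yields a rho of 1.
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
print(round(rho, 3), rho_cubic)
```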

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
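
The pair-counting definition can be sketched as follows. This implements the plain form described in the text, often called tau-a, in which tied pairs count as neither concordant nor discordant. Statistical libraries commonly apply a tie correction (tau-b) instead, so their results can differ when ties are present. The function name is hypothetical, and the columns are assumed to be parsed as numbers.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least two observations")
    pairs = list(combinations(range(len(x)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        # Positive product: both variables move in the same direction for this pair.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # A product of zero is a tie and contributes to neither count.
    return (concordant - discordant) / len(pairs)

# Sample rows from the report, converted to numbers for this example.
temperatura = [35.5, 36.7, 22.3, 39.7]
presion = [3.6, 2.4, 5.5, 5.5]

tau = kendall_tau_a(temperatura, presion)
print(tau)
```

With these four rows there are six pairs: two concordant, three discordant and one tied on presion.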

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
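
As a hedged illustration of a bias-corrected Cramér's V, the sketch below follows the correction usually attributed to Bergsma (2013): the chi-squared-based φ² is reduced by (k-1)(r-1)/(n-1), and the numbers of rows r and columns k are shrunk in the same spirit. It is not taken from pandas-profiling's source, the function name is hypothetical, and the data is made up for the example because a four-row sample is too small to be informative.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of nominal values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two observations")
    row_totals, col_totals = Counter(x), Counter(y)
    observed = Counter(zip(x, y))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        # Guard chosen for this sketch: no usable variation in at least one variable.
        return 0.0
    return sqrt(phi2_corr / denom)

# Illustrative data (not from the report): y is fully determined by x.
x_assoc = ["a"] * 10 + ["b"] * 10
y_assoc = ["u"] * 10 + ["v"] * 10
v_assoc = cramers_v_corrected(x_assoc, y_assoc)

# Illustrative data: every combination occurs equally often, so x and y are independent.
x_indep = ["a", "a", "b", "b"] * 5
y_indep = ["u", "v", "u", "v"] * 5
v_indep = cramers_v_corrected(x_indep, y_indep)
print(v_assoc, v_indep)
```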

Missing values

Figure: A simple visualization of nullity by column.
Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   df_index  humedad  temperatura  presion
0  3h        65       35.5         3.6
1  6h        63       36.7         2.4
2  9h        65       22.3         5.5
3  12h       67       39.7         5.5

Last rows

   df_index  humedad  temperatura  presion
0  3h        65       35.5         3.6
1  6h        63       36.7         2.4
2  9h        65       22.3         5.5
3  12h       67       39.7         5.5