{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", " \"Open\n", " \n", "
\n", "
\n", "

\n", " 💡 Si el botó no funciona: Copia i enganxa aquesta URL al teu navegador:
\n", " \n", "https://colab.research.google.com/github/wisaaco/TallerPythonFEE/blob/main/notebooks/2_PandasIntroduccio.ipynb\n", " \n", "

\n", "
" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Unitat 2. La llibreria Pandas\n", "\n", "## Pandas\n", "\n", "**Pandas** és la llibreria per excel·lència per a l'anàlisi de dades del llenguatge Python. El seu nom prové de “panel data” (terme economètric). Inspirada en les funcionalitats de fulls de càlcul i _R_, però amb el potencial d'aquest llenguatge de propòsit general.\n", "\n", "**Pandas** inclou totes les funcionalitats necessàries per al procés d'anàlisi de dades: càrrega, filtrat, tractament, síntesi, agrupament, emmagatzematge i visualització. A més, s'integra amb la resta de llibreries de càlcul numèric com Numpy, visualització amb Matplotlib, aprenentatge automàtic amb scikit-learn, etc.\n", "\n", "**L'objectiu d'aquest curs introductori és adquirir capacitats bàsiques d'anàlisis de dades amb Pandas.**\n", "\n", "Referències:\n", "\n", "- [Pandas](https://pandas.pydata.org/)\n", "\n", "- [Característiques Pandas](https://github.com/pandas-dev/pandas#*main-*features)\n", "\n", "*Nota*: Pandas és una llibreria que ja es troba instal·lada a _Google Colab_." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Càrrega de dades i estructures bàsiques de Pandas.\n", "\n", "\n", "A Pandas hi ha dos tipus de variables: **dataframes i series**.\n", "\n", "A continuació llegirem el nostre primer *dataframe*, ho farem a partir d'un fitxer de dades que presenta una estructura tabular. Aquests fitxers amb extensió CSV (Comma Separated Value) són un dels formats més emprats." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "df_who = pd.read_csv(\"data/WHO.csv\") #dataframe\n", "print(type(df_who))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryCountryIDContinentAdolescent fertility rate (%)Adult literacy rate (%)Gross national income per capita (PPP international $)Net primary school enrolment ratio female (%)Net primary school enrolment ratio male (%)Population (in thousands) totalPopulation annual growth rate (%)...Total_CO2_emissionsTotal_incomeTotal_reservesTrade_balance_goods_and_servicesUnder_five_mortality_from_CMEUnder_five_mortality_from_IHMEUnder_five_mortality_rateUrban_populationUrban_population_growthUrban_population_pct_of_total
0Afghanistan11151.028.0NaNNaNNaN26088.04.0...692.50NaNNaNNaN257.00231.9257.005740436.05.4422.9
1Albania2227.098.76000.093.094.03172.00.6...3499.124.790000e+0978.14-2.040000e+0918.4715.518.471431793.92.2145.4
2Algeria336.069.95940.094.096.033351.01.5...137535.566.970000e+10351.364.700000e+0940.0031.240.0020800000.02.6163.3
3Andorra42NaNNaNNaN83.083.074.01.0...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
4Angola53146.067.43890.049.051.016557.02.8...8991.461.490000e+1027.139.140000e+09164.10242.5164.108578749.04.1453.3
..................................................................
197Vietnam198625.090.32310.091.096.086206.01.4...101826.234.480000e+1047.11-1.940000e+0920.2023.420.2021900000.02.9026.4
198West Bank and Gaza1991NaNNaNNaNNaNNaNNaNNaN...655.863.780000e+09NaNNaN28.0025.828.002596216.03.3371.6
199Yemen200183.054.12090.065.085.021732.03.0...20148.341.150000e+10114.528.310000e+0882.4087.982.405759120.54.3727.3
200Zambia2013161.068.01140.094.090.011696.01.9...2366.944.090000e+0910.41-4.470000e+08175.30163.8175.304017411.01.9535.0
201Zimbabwe2023101.089.5NaN88.087.013228.00.8...11457.335.620000e+093.39-1.710000e+08106.5067.0106.504709965.01.9035.9
\n", "

202 rows × 358 columns

\n", "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "2 Algeria 3 3 6.0 \n", "3 Andorra 4 2 NaN \n", "4 Angola 5 3 146.0 \n", ".. ... ... ... ... \n", "197 Vietnam 198 6 25.0 \n", "198 West Bank and Gaza 199 1 NaN \n", "199 Yemen 200 1 83.0 \n", "200 Zambia 201 3 161.0 \n", "201 Zimbabwe 202 3 101.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "2 69.9 \n", "3 NaN \n", "4 67.4 \n", ".. ... \n", "197 90.3 \n", "198 NaN \n", "199 54.1 \n", "200 68.0 \n", "201 89.5 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "2 5940.0 \n", "3 NaN \n", "4 3890.0 \n", ".. ... \n", "197 2310.0 \n", "198 NaN \n", "199 2090.0 \n", "200 1140.0 \n", "201 NaN \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "2 94.0 \n", "3 83.0 \n", "4 49.0 \n", ".. ... \n", "197 91.0 \n", "198 NaN \n", "199 65.0 \n", "200 94.0 \n", "201 88.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "2 96.0 \n", "3 83.0 \n", "4 51.0 \n", ".. ... \n", "197 96.0 \n", "198 NaN \n", "199 85.0 \n", "200 90.0 \n", "201 87.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "2 33351.0 1.5 ... \n", "3 74.0 1.0 ... \n", "4 16557.0 2.8 ... \n", ".. ... ... ... \n", "197 86206.0 1.4 ... \n", "198 NaN NaN ... \n", "199 21732.0 3.0 ... \n", "200 11696.0 1.9 ... \n", "201 13228.0 0.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "2 137535.56 6.970000e+10 351.36 \n", "3 NaN NaN NaN \n", "4 8991.46 1.490000e+10 27.13 \n", ".. ... ... ... \n", "197 101826.23 4.480000e+10 47.11 \n", "198 655.86 3.780000e+09 NaN \n", "199 20148.34 1.150000e+10 114.52 \n", "200 2366.94 4.090000e+09 10.41 \n", "201 11457.33 5.620000e+09 3.39 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "2 4.700000e+09 40.00 \n", "3 NaN NaN \n", "4 9.140000e+09 164.10 \n", ".. ... ... \n", "197 -1.940000e+09 20.20 \n", "198 NaN 28.00 \n", "199 8.310000e+08 82.40 \n", "200 -4.470000e+08 175.30 \n", "201 -1.710000e+08 106.50 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "2 31.2 40.00 \n", "3 NaN NaN \n", "4 242.5 164.10 \n", ".. ... ... \n", "197 23.4 20.20 \n", "198 25.8 28.00 \n", "199 87.9 82.40 \n", "200 163.8 175.30 \n", "201 67.0 106.50 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "2 20800000.0 2.61 63.3 \n", "3 NaN NaN NaN \n", "4 8578749.0 4.14 53.3 \n", ".. ... ... ... \n", "197 21900000.0 2.90 26.4 \n", "198 2596216.0 3.33 71.6 \n", "199 5759120.5 4.37 27.3 \n", "200 4017411.0 1.95 35.0 \n", "201 4709965.0 1.90 35.9 \n", "\n", "[202 rows x 358 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "A continuació veurem diferents maneres d'explorar un dataframe:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(202, 358)\n" ] } ], "source": [ "# Estructura\n", "print(df_who.shape) # files o mostres x columnes" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 202 entries, 0 to 201\n", "Columns: 358 entries, Country to Urban_population_pct_of_total\n", "dtypes: float64(355), int64(2), object(1)\n", "memory usage: 565.1+ KB\n" ] } ], "source": [ "df_who.info()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryIDContinentAdolescent fertility rate (%)Adult literacy rate (%)Gross national income per capita (PPP international $)Net primary school enrolment ratio female (%)Net primary school enrolment ratio male (%)Population (in thousands) totalPopulation annual growth rate (%)Population in urban areas (%)...Total_CO2_emissionsTotal_incomeTotal_reservesTrade_balance_goods_and_servicesUnder_five_mortality_from_CMEUnder_five_mortality_from_IHMEUnder_five_mortality_rateUrban_populationUrban_population_growthUrban_population_pct_of_total
count202.000000202.000000177.000000131.000000178.000000179.000000179.0000001.930000e+02193.000000193.000000...1.860000e+021.780000e+02128.0000001.710000e+02181.000000170.000000181.0000001.880000e+02188.000000188.000000
mean101.5000003.57920859.45762778.87175611250.11236084.03352085.6983243.409805e+041.29792754.911917...1.483596e+052.015567e+1157.2535163.424012e+0856.67762454.35647156.6776241.665763e+072.16585155.195213
std58.4565371.80826349.10528620.41576012586.75341717.78804715.4512121.304957e+051.16386423.554182...6.133091e+059.400689e+11138.6692985.943043e+1060.06092961.16055660.0609295.094867e+071.59662823.742122
min1.0000001.0000000.00000023.600000260.0000006.00000011.0000002.000000e+00-2.50000010.000000...2.565000e+015.190000e+070.990000-7.140000e+112.9000003.0000002.9000001.545600e+04-1.16000010.000000
25%51.2500002.00000019.00000068.4000002112.50000079.00000079.5000001.340000e+030.50000036.000000...1.672615e+033.317500e+0916.292500-1.210000e+0912.4000008.47500012.4000009.171623e+051.10500035.650000
50%101.5000003.00000046.00000086.5000006175.00000090.00000090.0000006.762000e+031.30000057.000000...1.021157e+041.145000e+1028.515000-2.240000e+0829.98000027.60000029.9800003.427661e+061.94500057.300000
75%151.7500005.00000091.00000095.30000014502.50000096.00000096.0000002.173200e+042.10000073.000000...6.549217e+048.680000e+1055.3100001.024000e+0988.70000082.90000088.7000009.837113e+063.25250072.750000
max202.0000007.000000199.00000099.80000060870.000000100.000000100.0000001.328474e+064.300000100.000000...5.776432e+061.100000e+131334.8600001.390000e+11267.000000253.700000267.0000005.270000e+087.850000100.000000
\n", "

8 rows × 357 columns

\n", "
" ], "text/plain": [ " CountryID Continent Adolescent fertility rate (%) \\\n", "count 202.000000 202.000000 177.000000 \n", "mean 101.500000 3.579208 59.457627 \n", "std 58.456537 1.808263 49.105286 \n", "min 1.000000 1.000000 0.000000 \n", "25% 51.250000 2.000000 19.000000 \n", "50% 101.500000 3.000000 46.000000 \n", "75% 151.750000 5.000000 91.000000 \n", "max 202.000000 7.000000 199.000000 \n", "\n", " Adult literacy rate (%) \\\n", "count 131.000000 \n", "mean 78.871756 \n", "std 20.415760 \n", "min 23.600000 \n", "25% 68.400000 \n", "50% 86.500000 \n", "75% 95.300000 \n", "max 99.800000 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "count 178.000000 \n", "mean 11250.112360 \n", "std 12586.753417 \n", "min 260.000000 \n", "25% 2112.500000 \n", "50% 6175.000000 \n", "75% 14502.500000 \n", "max 60870.000000 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "count 179.000000 \n", "mean 84.033520 \n", "std 17.788047 \n", "min 6.000000 \n", "25% 79.000000 \n", "50% 90.000000 \n", "75% 96.000000 \n", "max 100.000000 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "count 179.000000 \n", "mean 85.698324 \n", "std 15.451212 \n", "min 11.000000 \n", "25% 79.500000 \n", "50% 90.000000 \n", "75% 96.000000 \n", "max 100.000000 \n", "\n", " Population (in thousands) total Population annual growth rate (%) \\\n", "count 1.930000e+02 193.000000 \n", "mean 3.409805e+04 1.297927 \n", "std 1.304957e+05 1.163864 \n", "min 2.000000e+00 -2.500000 \n", "25% 1.340000e+03 0.500000 \n", "50% 6.762000e+03 1.300000 \n", "75% 2.173200e+04 2.100000 \n", "max 1.328474e+06 4.300000 \n", "\n", " Population in urban areas (%) ... Total_CO2_emissions Total_income \\\n", "count 193.000000 ... 1.860000e+02 1.780000e+02 \n", "mean 54.911917 ... 1.483596e+05 2.015567e+11 \n", "std 23.554182 ... 6.133091e+05 9.400689e+11 \n", "min 10.000000 ... 2.565000e+01 5.190000e+07 \n", "25% 36.000000 ... 1.672615e+03 3.317500e+09 \n", "50% 57.000000 ... 1.021157e+04 1.145000e+10 \n", "75% 73.000000 ... 6.549217e+04 8.680000e+10 \n", "max 100.000000 ... 5.776432e+06 1.100000e+13 \n", "\n", " Total_reserves Trade_balance_goods_and_services \\\n", "count 128.000000 1.710000e+02 \n", "mean 57.253516 3.424012e+08 \n", "std 138.669298 5.943043e+10 \n", "min 0.990000 -7.140000e+11 \n", "25% 16.292500 -1.210000e+09 \n", "50% 28.515000 -2.240000e+08 \n", "75% 55.310000 1.024000e+09 \n", "max 1334.860000 1.390000e+11 \n", "\n", " Under_five_mortality_from_CME Under_five_mortality_from_IHME \\\n", "count 181.000000 170.000000 \n", "mean 56.677624 54.356471 \n", "std 60.060929 61.160556 \n", "min 2.900000 3.000000 \n", "25% 12.400000 8.475000 \n", "50% 29.980000 27.600000 \n", "75% 88.700000 82.900000 \n", "max 267.000000 253.700000 \n", "\n", " Under_five_mortality_rate Urban_population Urban_population_growth \\\n", "count 181.000000 1.880000e+02 188.000000 \n", "mean 56.677624 1.665763e+07 2.165851 \n", "std 60.060929 5.094867e+07 1.596628 \n", "min 2.900000 1.545600e+04 -1.160000 \n", "25% 12.400000 9.171623e+05 1.105000 \n", "50% 29.980000 3.427661e+06 1.945000 \n", "75% 88.700000 9.837113e+06 3.252500 \n", "max 267.000000 5.270000e+08 7.850000 \n", "\n", " Urban_population_pct_of_total \n", "count 188.000000 \n", "mean 55.195213 \n", "std 23.742122 \n", "min 10.000000 \n", "25% 35.650000 \n", "50% 57.300000 \n", "75% 72.750000 \n", "max 100.000000 \n", "\n", "[8 rows x 357 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.describe()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryCountryIDContinentAdolescent fertility rate (%)Adult literacy rate (%)Gross national income per capita (PPP international $)Net primary school enrolment ratio female (%)Net primary school enrolment ratio male (%)Population (in thousands) totalPopulation annual growth rate (%)...Total_CO2_emissionsTotal_incomeTotal_reservesTrade_balance_goods_and_servicesUnder_five_mortality_from_CMEUnder_five_mortality_from_IHMEUnder_five_mortality_rateUrban_populationUrban_population_growthUrban_population_pct_of_total
0Afghanistan11151.028.0NaNNaNNaN26088.04.0...692.50NaNNaNNaN257.00231.9257.005740436.05.4422.9
1Albania2227.098.76000.093.094.03172.00.6...3499.124.790000e+0978.14-2.040000e+0918.4715.518.471431793.92.2145.4
2Algeria336.069.95940.094.096.033351.01.5...137535.566.970000e+10351.364.700000e+0940.0031.240.0020800000.02.6163.3
3Andorra42NaNNaNNaN83.083.074.01.0...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
4Angola53146.067.43890.049.051.016557.02.8...8991.461.490000e+1027.139.140000e+09164.10242.5164.108578749.04.1453.3
\n", "

5 rows × 358 columns

\n", "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "0 Afghanistan 1 1 151.0 \n", "1 Albania 2 2 27.0 \n", "2 Algeria 3 3 6.0 \n", "3 Andorra 4 2 NaN \n", "4 Angola 5 3 146.0 \n", "\n", " Adult literacy rate (%) \\\n", "0 28.0 \n", "1 98.7 \n", "2 69.9 \n", "3 NaN \n", "4 67.4 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "0 NaN \n", "1 6000.0 \n", "2 5940.0 \n", "3 NaN \n", "4 3890.0 \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "0 NaN \n", "1 93.0 \n", "2 94.0 \n", "3 83.0 \n", "4 49.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "0 NaN \n", "1 94.0 \n", "2 96.0 \n", "3 83.0 \n", "4 51.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "0 26088.0 4.0 ... \n", "1 3172.0 0.6 ... \n", "2 33351.0 1.5 ... \n", "3 74.0 1.0 ... \n", "4 16557.0 2.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "0 692.50 NaN NaN \n", "1 3499.12 4.790000e+09 78.14 \n", "2 137535.56 6.970000e+10 351.36 \n", "3 NaN NaN NaN \n", "4 8991.46 1.490000e+10 27.13 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "0 NaN 257.00 \n", "1 -2.040000e+09 18.47 \n", "2 4.700000e+09 40.00 \n", "3 NaN NaN \n", "4 9.140000e+09 164.10 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "0 231.9 257.00 \n", "1 15.5 18.47 \n", "2 31.2 40.00 \n", "3 NaN NaN \n", "4 242.5 164.10 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "0 5740436.0 5.44 22.9 \n", "1 1431793.9 2.21 45.4 \n", "2 20800000.0 2.61 63.3 \n", "3 NaN NaN NaN \n", "4 8578749.0 4.14 53.3 \n", "\n", "[5 rows x 358 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Aquesta funció admet un paràmetre numèric, especifica el nombre de files\n", "# Per defecte és 5\n", "df_who.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CountryCountryIDContinentAdolescent fertility rate (%)Adult literacy rate (%)Gross national income per capita (PPP international $)Net primary school enrolment ratio female (%)Net primary school enrolment ratio male (%)Population (in thousands) totalPopulation annual growth rate (%)...Total_CO2_emissionsTotal_incomeTotal_reservesTrade_balance_goods_and_servicesUnder_five_mortality_from_CMEUnder_five_mortality_from_IHMEUnder_five_mortality_rateUrban_populationUrban_population_growthUrban_population_pct_of_total
197Vietnam198625.090.32310.091.096.086206.01.4...101826.234.480000e+1047.11-1.940000e+0920.223.420.221900000.02.9026.4
198West Bank and Gaza1991NaNNaNNaNNaNNaNNaNNaN...655.863.780000e+09NaNNaN28.025.828.02596216.03.3371.6
199Yemen200183.054.12090.065.085.021732.03.0...20148.341.150000e+10114.528.310000e+0882.487.982.45759120.54.3727.3
200Zambia2013161.068.01140.094.090.011696.01.9...2366.944.090000e+0910.41-4.470000e+08175.3163.8175.34017411.01.9535.0
201Zimbabwe2023101.089.5NaN88.087.013228.00.8...11457.335.620000e+093.39-1.710000e+08106.567.0106.54709965.01.9035.9
\n", "

5 rows × 358 columns

\n", "
" ], "text/plain": [ " Country CountryID Continent Adolescent fertility rate (%) \\\n", "197 Vietnam 198 6 25.0 \n", "198 West Bank and Gaza 199 1 NaN \n", "199 Yemen 200 1 83.0 \n", "200 Zambia 201 3 161.0 \n", "201 Zimbabwe 202 3 101.0 \n", "\n", " Adult literacy rate (%) \\\n", "197 90.3 \n", "198 NaN \n", "199 54.1 \n", "200 68.0 \n", "201 89.5 \n", "\n", " Gross national income per capita (PPP international $) \\\n", "197 2310.0 \n", "198 NaN \n", "199 2090.0 \n", "200 1140.0 \n", "201 NaN \n", "\n", " Net primary school enrolment ratio female (%) \\\n", "197 91.0 \n", "198 NaN \n", "199 65.0 \n", "200 94.0 \n", "201 88.0 \n", "\n", " Net primary school enrolment ratio male (%) \\\n", "197 96.0 \n", "198 NaN \n", "199 85.0 \n", "200 90.0 \n", "201 87.0 \n", "\n", " Population (in thousands) total Population annual growth rate (%) ... \\\n", "197 86206.0 1.4 ... \n", "198 NaN NaN ... \n", "199 21732.0 3.0 ... \n", "200 11696.0 1.9 ... \n", "201 13228.0 0.8 ... \n", "\n", " Total_CO2_emissions Total_income Total_reserves \\\n", "197 101826.23 4.480000e+10 47.11 \n", "198 655.86 3.780000e+09 NaN \n", "199 20148.34 1.150000e+10 114.52 \n", "200 2366.94 4.090000e+09 10.41 \n", "201 11457.33 5.620000e+09 3.39 \n", "\n", " Trade_balance_goods_and_services Under_five_mortality_from_CME \\\n", "197 -1.940000e+09 20.2 \n", "198 NaN 28.0 \n", "199 8.310000e+08 82.4 \n", "200 -4.470000e+08 175.3 \n", "201 -1.710000e+08 106.5 \n", "\n", " Under_five_mortality_from_IHME Under_five_mortality_rate \\\n", "197 23.4 20.2 \n", "198 25.8 28.0 \n", "199 87.9 82.4 \n", "200 163.8 175.3 \n", "201 67.0 106.5 \n", "\n", " Urban_population Urban_population_growth Urban_population_pct_of_total \n", "197 21900000.0 2.90 26.4 \n", "198 2596216.0 3.33 71.6 \n", "199 5759120.5 4.37 27.3 \n", "200 4017411.0 1.95 35.0 \n", "201 4709965.0 1.90 35.9 \n", "\n", "[5 rows x 358 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Aquesta funció admet un paràmetre numèric, especifica el nombre de files\n", "# Per defecte és 5\n", "df_who.tail()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Index(['Country', 'CountryID', 'Continent', 'Adolescent fertility rate (%)',\n", " 'Adult literacy rate (%)',\n", " 'Gross national income per capita (PPP international $)',\n", " 'Net primary school enrolment ratio female (%)',\n", " 'Net primary school enrolment ratio male (%)',\n", " 'Population (in thousands) total', 'Population annual growth rate (%)',\n", " ...\n", " 'Total_CO2_emissions', 'Total_income', 'Total_reserves',\n", " 'Trade_balance_goods_and_services', 'Under_five_mortality_from_CME',\n", " 'Under_five_mortality_from_IHME', 'Under_five_mortality_rate',\n", " 'Urban_population', 'Urban_population_growth',\n", " 'Urban_population_pct_of_total'],\n", " dtype='object', length=358)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_who.columns #Ens mostra les columnes del dataframe, es pot indexar com una llista" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Carrega de dades (II)\n", "\n", "Els fitxers _csv_ poden tenir informació de maneres no són estàndard: per exemple tenir separadors que no siguin ',' o formats de codificació del text no oberts.\n", "\n", "Podem saber més informació de les possibilitats que ens ofereix la funció *read_csv* si mirem la seva [documentació](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)\n", "\n", "Anem a veure que passa quan agafem dades d'una administració pública: [enllaç](https://datos.gob.es/es/catalogo?q=bilbao&g-recaptcha-response=&administration_level=L&theme_id=economia&sort=score+desc%2C+metadata_created+desc)\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "ename": "ParserError", "evalue": "Error tokenizing data. C error: Expected 2 fields in line 8, saw 4\n", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mParserError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mdf_gastos\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread_csv\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"data/presupuesto_gastos_2023.csv\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[1;32m~\\miniconda3\\envs\\fee\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread_csv\u001b[1;34m(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)\u001b[0m\n\u001b[0;32m 686\u001b[0m )\n\u001b[0;32m 687\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 688\u001b[1;33m \u001b[1;32mreturn\u001b[0m \u001b[0m_read\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfilepath_or_buffer\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mkwds\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 689\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 690\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\miniconda3\\envs\\fee\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36m_read\u001b[1;34m(filepath_or_buffer, kwds)\u001b[0m\n\u001b[0;32m 458\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 459\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m--> 460\u001b[1;33m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 461\u001b[0m \u001b[1;32mfinally\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 462\u001b[0m \u001b[0mparser\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\miniconda3\\envs\\fee\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 1196\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1197\u001b[0m \u001b[0mnrows\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0m_validate_integer\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"nrows\"\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 1198\u001b[1;33m \u001b[0mret\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 1199\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 1200\u001b[0m \u001b[1;31m# May alter columns / col_dict\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32m~\\miniconda3\\envs\\fee\\lib\\site-packages\\pandas\\io\\parsers.py\u001b[0m in \u001b[0;36mread\u001b[1;34m(self, nrows)\u001b[0m\n\u001b[0;32m 2155\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mnrows\u001b[0m\u001b[1;33m=\u001b[0m\u001b[1;32mNone\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2156\u001b[0m \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m-> 2157\u001b[1;33m \u001b[0mdata\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_reader\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mread\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mnrows\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 2158\u001b[0m \u001b[1;32mexcept\u001b[0m \u001b[0mStopIteration\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 2159\u001b[0m \u001b[1;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0m_first_chunk\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader.read\u001b[1;34m()\u001b[0m\n", "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_low_memory\u001b[1;34m()\u001b[0m\n", "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._read_rows\u001b[1;34m()\u001b[0m\n", "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.TextReader._tokenize_rows\u001b[1;34m()\u001b[0m\n", "\u001b[1;32mpandas\\_libs\\parsers.pyx\u001b[0m in \u001b[0;36mpandas._libs.parsers.raise_parser_error\u001b[1;34m()\u001b[0m\n", "\u001b[1;31mParserError\u001b[0m: Error tokenizing data. C error: Expected 2 fields in line 8, saw 4\n" ] } ], "source": [ "df_gastos = pd.read_csv(\"data/presupuesto_gastos_2023.csv\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_gastos = pd.read_csv(\"data/presupuesto_gastos_2023.csv\", delimiter=\";\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Codificació de caràcters\n", "\n", "És un sistema que assigna un valor numèric (codi) a cada caràcter utilitzat en un conjunt de caràcters o alfabet. Aquesta codificació permet representar text i caràcters no només en la forma llegible per als humans, sinó també en una forma que pugui ser emmagatzemada i processada per un ordinador.\n", "\n", "Existeixen moltes codificacions diferents, cada una d'elles amb un conjunt de caràcters i una assignació de codis específica. Alguns exemples inclouen:\n", "\n", "- *ASCII* (American Standard Code for Information Interchange): Una de les codificacions més antigues i bàsiques, que utilitza 7 bits per a representar 128 caràcters, com l'alfabet anglès, números i alguns símbols.\n", "\n", "- *UTF-8* (Unicode Transformation Format - 8 bits): Una codificació Unicode ampliament utilitzada que utilitza 8 bits (1 byte) per a representar una àmplia gamma de caràcters de molts idiomes diferents. És compatible amb ASCII.\n", "\n", "- *UTF-16* (Unicode Transformation Format - 16 bits): Una altra codificació Unicode que utilitza 16 bits (2 bytes) per a representar caràcters Unicode. Pot representar un conjunt encara més gran de caràcters.\n", "\n", "- *ISO-8859-1* (Latin-1): Una codificació àmpliament utilitzada per a representar caràcters llatins, com els utilitzats en molts idiomes europeus.\n", "\n", "\n", "```python\n", "var = \"camión\"\n", "var = \"lul·lià\"\n", "var = \"Ζεύς\"\n", "var = \"ประเทศไทย\"\n", "var = \"日本語で\"\n", "```\n", "\n", "Codificacions:\n", "- [Llistat codificacions](https://docs.python.org/3.11/library/codecs.html#standard-encodings)\n", "- [UTF-8](https://es.wikipedia.org/wiki/UTF-8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# cp1250 | windows-1250 | Central and Eastern Europe\n", "\n", "# Mètode bàsic: \"Prova i error\" !!!\n", "df_gastos = pd.read_csv(\"data/presupuesto_gastos_2023.csv\", delimiter=\";\", encoding=\"cp1250\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_gastos.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Guardant un dataframe\n", "\n", "Durant aquest curs aprendrem a modificar els dataframes, afegirem i eliminarem columnes i també modificarem les que ja tenim. Per tant ens serà molt interessant poder crear un fitxer a partir del que tenim en el nostre programa." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_gastos.to_csv('data/tmp_file.csv', encoding='utf-8') # guardant un dataframe en un fitxer, especificant el format" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_tmp = pd.read_csv(\"data/tmp_file.csv\") # test codificació i separador" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "### Carrega de dades (III)\n", "\n", "Pandas pot carregar diferents formats de fitxers (csv, json, excel, etc.) els diferents formats tenen les dades organitzades de diferent manera.\n", "\n", "En el següent [enllaç](https://pandas.pydata.org/docs/reference/io.html) podem veure més informació." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "url = \"https://es.wikipedia.org/wiki/Anexo:Comunidades_y_ciudades_aut%C3%B3nomas_de_Espa%C3%B1a\"\n", "comunidades_esp = pd.io.html.read_html(url)\n", "comunidades_esp # Alerta! Encara no és un dataframe! En una web pots trobar més \"taules\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(type(comunidades_esp[0]))\n", "df_comunidades_esp = comunidades_esp[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "df_comunidades_esp.head() # funciona perfectament??" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Activitat\n", "\n", "En aquesta activitat practicarem la càrrega de dades en diferents formats. Al món real les dades no sempre tenen una estructura i un format com ens agradaríem.\n", "\n", "L'objectiu és que analitzeu i descriviu com ha funcionat la càrrega d'aquestes dades amb les originals mitjançant l'ús de funcions de pandas i de les vostres pròpies observacions :
\n", "- Quina dimensió tenen les dades reals i carregades?\n", "- Quines columnes?\n", "- El concepte de columna com atribut o característica i el concepte de fila com a mostra està present en l'estructura de les dades?\n", "- Coincideixen amb la informació del fitxer?" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### A) Municipios y fenómeno demográfico de les Illes Balears\n", "\n", "- **Font:** Descarregueu el fitxer en format csv. [enllaç](https://datos.gob.es/es/catalogo/ea0010587-balears-illes-por-municipios-y-fenomeno-demografico-mnpd-identificador-api-t20-e301-fenom-a2020-l0-23007-px1).\n", "- **Extra:** També podeu provar de carregar en format json.\n", "- **Fitxer:** També el trobareu a: \"data/municipis.csv\"\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "#TODO" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### B) Speculation Watch List\n", "\n", "- **Font:** [enllaç](https://data.cityofnewyork.us/Housing-Development/Speculation-Watch-List/adax-9mit)\n", "- **Fitxer:** També el trobareu a: \"data/Speculation_Watch_List.csv\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "#TODO" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### C) Europe Inflation monthly data (annual rate of change)\n", "\n", "- **Font:** [enllaç](https://ec.europa.eu/eurostat/databrowser/view/PRC_HICP_MANR__custom_3761882/bookmark/table?lang=en&bookmarkId=4ad27e6f-358a-4a3d-82a0-587d69a833eb)\n", "- **Fitxer:** També el podeu trobar a: \"data/prc_hicp_manr__custom_3761882_spreadsheet.xlsx\"\n", "\n", "- **Nota:**\n", " - Com podem agafar una pàgina concreta del full de càlcul? [documentació](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html)\n", " - Per aquesta activitat necessitem instal·lar una llibreria específica: executa la següent cel·la" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install openpyxl" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "import pandas as pd\n", "#TODO\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### D) Taula de Naixements\n", "- **Font:** [enllaç](https://ibestat.caib.es/ibestat/estadistiques/poblacio/naixements/414cab4f-b402-4cd1-af05-6617443de384)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "#TODO\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### E) Nights spent at tourist accommodation establishments by residents/non-residents - monthly data\n", "- **Font:** [enllaç](https://ec.europa.eu/eurostat/databrowser/view/tin00171/default/table?lang=en)\n", "- **Fitxer:** També el podeu trobar a: \"data/tin00171_linear.csv.gz\"
\n", "- **Nota:** Els fitxers comprimits amb format .gz podem obrir-se directament com si fossin fitxers de dades amb pandas, i en aquest cas és un fitxer del tipus CSV. És a dir, no cal descomprimir.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "#TODO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Activitat 2 del homework\n", "\n", "Com a final del dia d'avui i per la setmana que vé heu d'entregar aquesta activitat." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
\n", "Isaac Lera and Gabriel Moya
\n", "Universitat de les Illes Balears
\n", "isaac.lera@uib.edu, gabriel.moya@uib.edu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9.7 64-bit ('3.9.7')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "1dd6de46b7c7be538353ccdeaa948031a5b4d43665739a26f55ca28b247c53cc" } } }, "nbformat": 4, "nbformat_minor": 2 }