{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## xwmooc R Meetup, Session 4\n",
"### Date: Wed, 11/15, 7:30 PM to 10:30 PM, TOZ Gangnam\n",
"### Presenter: 이상열\n",
"\n",
"### Reference \n",
"- https://github.com/catboost/catboost/blob/master/catboost/tutorials/catboost_r_tutorial.ipynb\n",
"- https://www.slideshare.net/MiguelFierro1/speeding-up-machinelearning-applications-with-the-lightgbm-library\n",
"\n",
"### CatBoost: A machine learning library to handle categorical (CAT) data automatically\n",
"\n",
"## The name “CatBoost” comes from two words: “Category” and “Boosting”.\n",
"\n",
"![boosting](./img/boosting_1.png)\n",
"![boosting](./img/xgboost.jpg)\n",
"![boosting](./img/lightgbm.jpeg)\n",
"\n",
"![boosting](./img/xgbvslightgbm1.png)\n",
"![boosting](./img/xgbvslightgbm2.png)\n",
"![boosting](./img/xgbvslightgbm3.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Python install\n",
"```\n",
"pip install catboost\n",
"```\n",
"\n",
"### R install\n",
"```\n",
"install.packages('devtools')\n",
"devtools::install_github('catboost/catboost', subdir = 'catboost/R-package')\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"library(catboost)  # provides catboost.load_pool(), used in the cells below\n",
"library(caret)\n",
"library(titanic)"
]
},
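{
"cell_type": "markdown",
"metadata": {},
"source": [
"![catboost_1](./img/catboost_1.png)\n",
"\n",
"### CatBoost advantages\n",
"- Performance: CatBoost delivers state-of-the-art results and is competitive with the leading machine learning algorithms.\n",
"- Automatic handling of categorical features: CatBoost can be used without explicit preprocessing to convert categories into numbers. It converts categorical values into numbers using various statistics on combinations of categorical features and on combinations of categorical and numeric features.\n",
"- Robustness: It reduces the need for extensive hyperparameter tuning and lowers the chance of overfitting, which leads to more general models. CatBoost still exposes several tunable parameters, such as the number of trees, learning rate, regularization, tree depth, fold size, and bagging temperature; the CatBoost documentation describes all of them.\n",
"- Ease of use: CatBoost can be used from the command line or through user-friendly APIs for both Python and R.\n",
"\n",
"### How CatBoost converts categorical variables into numeric variables\n",
"- Permuting the set of input objects in a random order.\n",
"- Converting the target from a floating point value to an integer.\n",
"    - The number of borders for target binarization is configurable. It is used only for regression problems. Allowed values are integers from 1 to 255 inclusive. The default value is 1.\n",
"    \n",
"- Binarization\n",
"    - Before training, the possible values of the objects are divided into disjoint ranges (buckets) delimited by threshold values (splits).\n",
"    - The size of the binarization (the number of splits) is determined by the starting parameters, set separately for numeric features and for the numbers obtained by converting categorical features to numeric ones.\n",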
"\n",
"![catboost_1](./img/catboost_2.png)\n",
"\n",
"![catboost_1](./img/catboost_3.png)\n",
"\n",
"![catboost_1](./img/catboost_4.png)\n",
"\n",
"![catboost_1](./img/catboost_5.png)"
]
},
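{
"cell_type": "markdown",
"metadata": {},
"source": [
"The permutation and binarization steps above can be sketched in base R. This is only an illustration under stated assumptions, not CatBoost's implementation: the helpers `ordered_target_stat` and `binarize`, the `prior` of 0.5, the quantile-based borders, and the toy data are all hypothetical. The label is assumed to be binary (0/1) already, so the target conversion step is skipped.\n",
"\n",
"```r\n",
"# Hypothetical helper: ordered target statistic for one categorical column.\n",
"# Rows are visited in a random order, and each row is encoded only from the\n",
"# rows visited before it:\n",
"#   (earlier rows with the same category and label 1 + prior) /\n",
"#   (earlier rows with the same category + 1)\n",
"ordered_target_stat <- function(category, label, prior = 0.5, seed = 42) {\n",
"  set.seed(seed)\n",
"  perm <- sample(length(category))   # random permutation of the row indices\n",
"  stat <- numeric(length(category))\n",
"  seen_total <- list()               # rows seen so far, per category\n",
"  seen_pos <- list()                 # rows with label 1 seen so far, per category\n",
"  for (idx in perm) {\n",
"    key <- as.character(category[idx])\n",
"    total <- if (is.null(seen_total[[key]])) 0 else seen_total[[key]]\n",
"    pos <- if (is.null(seen_pos[[key]])) 0 else seen_pos[[key]]\n",
"    stat[idx] <- (pos + prior) / (total + 1)\n",
"    seen_total[[key]] <- total + 1\n",
"    seen_pos[[key]] <- pos + as.numeric(label[idx] == 1)\n",
"  }\n",
"  stat\n",
"}\n",
"\n",
"# Hypothetical helper: split a numeric feature into buckets delimited by borders.\n",
"# Borders are placed at evenly spaced quantiles (an assumption of this sketch).\n",
"binarize <- function(x, n_borders = 3) {\n",
"  probs <- seq_len(n_borders) / (n_borders + 1)\n",
"  borders <- unique(quantile(x, probs = probs, names = FALSE))\n",
"  list(borders = borders, bucket = findInterval(x, borders))\n",
"}\n",
"\n",
"# Toy data, not taken from the Adult dataset\n",
"category <- c('a', 'b', 'a', 'a', 'b', 'c')\n",
"label <- c(1, 0, 1, 0, 1, 1)\n",
"stat <- ordered_target_stat(category, label)\n",
"bins <- binarize(c(1, 2, 3, 4, 5, 6, 7, 8), n_borders = 3)\n",
"print(stat)\n",
"print(bins)\n",
"```\n",
"\n",
"Because each row only uses rows that precede it in the permutation, its own label never enters its encoded value."
]
},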
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"pool_path = system.file(\"extdata\", \"adult_train.1000\", package = \"catboost\")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"'/Library/Frameworks/R.framework/Versions/3.3/Resources/library/catboost/extdata/adult_train.1000'"
],
"text/latex": [
"'/Library/Frameworks/R.framework/Versions/3.3/Resources/library/catboost/extdata/adult\\_train.1000'"
],
"text/markdown": [
"'/Library/Frameworks/R.framework/Versions/3.3/Resources/library/catboost/extdata/adult_train.1000'"
],
"text/plain": [
"[1] \"/Library/Frameworks/R.framework/Versions/3.3/Resources/library/catboost/extdata/adult_train.1000\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"pool_path"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"column_description_path = system.file(\"extdata\", \"adult.cd\", package = \"catboost\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"pool <- catboost.load_pool(pool_path, column_description = column_description_path)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"catboost.Pool\n",
"1000 rows, 14 columns"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"pool"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate}\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 28\n",
"\\item 3.89195830301646e+36\n",
"\\item 120135\n",
"\\item -1.04016785957128e-34\n",
"\\item 11\n",
"\\item 1.26145722489731e+32\n",
"\\item -371032621056\n",
"\\item 8.07870890394598e-34\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 49\n",
"\\item -1.81610708810891e-18\n",
"\\item 57665\n",
"\\item 5.92781046506593e-19\n",
"\\item 13\n",
"\\item -1601.40100097656\n",
"\\item -1.81610708810891e-18\n",
"\\item -1.10627302684918e+32\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 30\n",
"\\item 3.89195830301646e+36\n",
"\\item 496414\n",
"\\item -1.74022152049343e+23\n",
"\\item 16\n",
"\\item 1.26145722489731e+32\n",
"\\item 1.14009086947129e-25\n",
"\\item 8.07870890394598e-34\n",
"\\item -9.78215504232611e+30\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item -1.81610708810891e-18\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 55\n",
"\\item 3.89195830301646e+36\n",
"\\item 353881\n",
"\\item 9.32786036967877e+35\n",
"\\item 4\n",
"\\item -2.32643418585933e-34\n",
"\\item -3.51661115686911e-30\n",
"\\item 9.09455348664148e-37\n",
"\\item -9.78215504232611e+30\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 50\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 34\n",
"\\item -6.04938814187138e+22\n",
"\\item 355700\n",
"\\item 4.16527045605261e-18\n",
"\\item 9\n",
"\\item 4.24491702233354e-07\n",
"\\item 3.7142509766905e-21\n",
"\\item -0.00183462153654546\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 20\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\end{enumerate}\n"
],
"text/markdown": [
"1. 1. 1\n",
"2. 1\n",
"3. 28\n",
"4. 3.89195830301646e+36\n",
"5. 120135\n",
"6. -1.04016785957128e-34\n",
"7. 11\n",
"8. 1.26145722489731e+32\n",
"9. -371032621056\n",
"10. 8.07870890394598e-34\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"2. 1. 1\n",
"2. 1\n",
"3. 49\n",
"4. -1.81610708810891e-18\n",
"5. 57665\n",
"6. 5.92781046506593e-19\n",
"7. 13\n",
"8. -1601.40100097656\n",
"9. -1.81610708810891e-18\n",
"10. -1.10627302684918e+32\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"3. 1. 1\n",
"2. 1\n",
"3. 30\n",
"4. 3.89195830301646e+36\n",
"5. 496414\n",
"6. -1.74022152049343e+23\n",
"7. 16\n",
"8. 1.26145722489731e+32\n",
"9. 1.14009086947129e-25\n",
"10. 8.07870890394598e-34\n",
"11. -9.78215504232611e+30\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. -1.81610708810891e-18\n",
"\n",
"\n",
"\n",
"4. 1. 1\n",
"2. 1\n",
"3. 55\n",
"4. 3.89195830301646e+36\n",
"5. 353881\n",
"6. 9.32786036967877e+35\n",
"7. 4\n",
"8. -2.32643418585933e-34\n",
"9. -3.51661115686911e-30\n",
"10. 9.09455348664148e-37\n",
"11. -9.78215504232611e+30\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 50\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"5. 1. 1\n",
"2. 1\n",
"3. 34\n",
"4. -6.04938814187138e+22\n",
"5. 355700\n",
"6. 4.16527045605261e-18\n",
"7. 9\n",
"8. 4.24491702233354e-07\n",
"9. 3.7142509766905e-21\n",
"10. -0.00183462153654546\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 20\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"[[1]]\n",
" [1] 1.000000e+00 1.000000e+00 2.800000e+01 3.891958e+36 1.201350e+05\n",
" [6] -1.040168e-34 1.100000e+01 1.261457e+32 -3.710326e+11 8.078709e-34\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[2]]\n",
" [1] 1.000000e+00 1.000000e+00 4.900000e+01 -1.816107e-18 5.766500e+04\n",
" [6] 5.927810e-19 1.300000e+01 -1.601401e+03 -1.816107e-18 -1.106273e+32\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[3]]\n",
" [1] 1.000000e+00 1.000000e+00 3.000000e+01 3.891958e+36 4.964140e+05\n",
" [6] -1.740222e+23 1.600000e+01 1.261457e+32 1.140091e-25 8.078709e-34\n",
"[11] -9.782155e+30 -3.163861e-08 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] -1.816107e-18\n",
"\n",
"[[4]]\n",
" [1] 1.000000e+00 1.000000e+00 5.500000e+01 3.891958e+36 3.538810e+05\n",
" [6] 9.327860e+35 4.000000e+00 -2.326434e-34 -3.516611e-30 9.094553e-37\n",
"[11] -9.782155e+30 -3.163861e-08 0.000000e+00 0.000000e+00 5.000000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[5]]\n",
" [1] 1.000000e+00 1.000000e+00 3.400000e+01 -6.049388e+22 3.557000e+05\n",
" [6] 4.165270e-18 9.000000e+00 4.244917e-07 3.714251e-21 -1.834622e-03\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 2.000000e+01\n",
"[16] 1.219625e+24\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(pool, 5)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The file has no header row, so the column names are supplied explicitly.\n",
"data <- read.table(\"http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data\",\n",
"                   sep = \",\", header = FALSE,\n",
"                   col.names = c(\"age\", \"type_employer\", \"fnlwgt\", \"education\",\n",
"                                 \"education_num\", \"marital\", \"occupation\", \"relationship\", \"race\", \"sex\",\n",
"                                 \"capital_gain\", \"capital_loss\", \"hr_per_week\", \"country\", \"income\"),\n",
"                   fill = FALSE, strip.white = TRUE)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllllllllllll}\n",
" age & type\\_employer & fnlwgt & education & education\\_num & marital & occupation & relationship & race & sex & capital\\_gain & capital\\_loss & hr\\_per\\_week & country & income\\\\\n",
"\\hline\n",
"\t 39 & State-gov & 77516 & Bachelors & 13 & Never-married & Adm-clerical & Not-in-family & White & Male & 2174 & 0 & 40 & United-States & <=50K \\\\\n",
"\t 50 & Self-emp-not-inc & 83311 & Bachelors & 13 & Married-civ-spouse & Exec-managerial & Husband & White & Male & 0 & 0 & 13 & United-States & <=50K \\\\\n",
"\t 38 & Private & 215646 & HS-grad & 9 & Divorced & Handlers-cleaners & Not-in-family & White & Male & 0 & 0 & 40 & United-States & <=50K \\\\\n",
"\t 53 & Private & 234721 & 11th & 7 & Married-civ-spouse & Handlers-cleaners & Husband & Black & Male & 0 & 0 & 40 & United-States & <=50K \\\\\n",
"\t 28 & Private & 338409 & Bachelors & 13 & Married-civ-spouse & Prof-specialty & Wife & Black & Female & 0 & 0 & 40 & Cuba & <=50K \\\\\n",
"\t 37 & Private & 284582 & Masters & 14 & Married-civ-spouse & Exec-managerial & Wife & White & Female & 0 & 0 & 40 & United-States & <=50K \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| age | type_employer | fnlwgt | education | education_num | marital | occupation | relationship | race | sex | capital_gain | capital_loss | hr_per_week | country | income | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K | \n",
"| 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K | \n",
"| 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K | \n",
"| 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K | \n",
"| 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K | \n",
"| 37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0 | 0 | 40 | United-States | <=50K | \n",
"\n",
"\n"
],
"text/plain": [
" age type_employer fnlwgt education education_num marital \n",
"1 39 State-gov 77516 Bachelors 13 Never-married \n",
"2 50 Self-emp-not-inc 83311 Bachelors 13 Married-civ-spouse\n",
"3 38 Private 215646 HS-grad 9 Divorced \n",
"4 53 Private 234721 11th 7 Married-civ-spouse\n",
"5 28 Private 338409 Bachelors 13 Married-civ-spouse\n",
"6 37 Private 284582 Masters 14 Married-civ-spouse\n",
" occupation relationship race sex capital_gain capital_loss\n",
"1 Adm-clerical Not-in-family White Male 2174 0 \n",
"2 Exec-managerial Husband White Male 0 0 \n",
"3 Handlers-cleaners Not-in-family White Male 0 0 \n",
"4 Handlers-cleaners Husband Black Male 0 0 \n",
"5 Prof-specialty Wife Black Female 0 0 \n",
"6 Exec-managerial Wife White Female 0 0 \n",
" hr_per_week country income\n",
"1 40 United-States <=50K \n",
"2 13 United-States <=50K \n",
"3 40 United-States <=50K \n",
"4 40 United-States <=50K \n",
"5 40 Cuba <=50K \n",
"6 40 United-States <=50K "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# This tutorial uses the “Adult” dataset hosted on the UCI Machine Learning Repository.\n",
"head(data)"
]
},
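{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Categorical features must be converted to numeric columns with your own method (for example, a string hash). \n",
"- Indices in the cat_features vector are zero-based and may differ from the indices in the .cd file."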
]
},
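{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small base-R sketch of the index shift described above, assuming the 15-column layout used in this notebook with the label in column 1. R column positions are 1-based while `cat_features` is zero-based, and dropping the label column moves every remaining column one position to the left. The variable names are only illustrative.\n",
"\n",
"```r\n",
"# 1-based positions of the categorical columns in the full 15-column table\n",
"cat_columns_r <- c(3, 5, 7, 8, 9, 10, 11, 15)\n",
"target <- 1   # the label is the first column\n",
"\n",
"# Positions of the same columns once the label column is dropped (still 1-based)\n",
"feature_columns <- setdiff(seq_len(15), target)\n",
"cat_in_features <- match(cat_columns_r, feature_columns)\n",
"\n",
"# Zero-based indices, the convention used by cat_features\n",
"cat_features_zero_based <- cat_in_features - 1\n",
"print(cat_features_zero_based)\n",
"```\n",
"\n",
"With the label in column 1, the result equals the original 1-based positions minus 2."
]
},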
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"pool_path <- system.file(\"extdata\", \"adult_train.1000\", package = \"catboost\")\n",
"\n",
"# Column 1 is the label; cat_features holds the 1-based positions of the categorical columns.\n",
"column_description_vector <- rep('numeric', 15)\n",
"cat_features <- c(3, 5, 7, 8, 9, 10, 11, 15)\n",
"for (i in cat_features)\n",
"    column_description_vector[i] <- 'factor'\n",
"\n",
"# The file is tab-separated with no header row; missing values are written as NAN.\n",
"data <- read.table(pool_path, header = FALSE, sep = \"\\t\", colClasses = column_description_vector, na.strings = 'NAN')"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllllllllllll}\n",
" V1 & V2 & V3 & V4 & V5 & V6 & V7 & V8 & V9 & V10 & V11 & V12 & V13 & V14 & V15\\\\\n",
"\\hline\n",
"\t 1 & 28 & Private & 120135 & Assoc-voc & 11 & Never-married & Sales & Not-in-family & White & Female & 0 & 0 & 40 & United-States \\\\\n",
"\t 1 & 49 & ? & 57665 & Bachelors & 13 & Divorced & ? & Own-child & White & Female & 0 & 0 & 40 & United-States \\\\\n",
"\t 1 & 30 & Private & 496414 & Doctorate & 16 & Never-married & Prof-specialty & Not-in-family & White & Male & 0 & 0 & 40 & ? \\\\\n",
"\t 1 & 55 & Private & 353881 & 7th-8th & 4 & Married-civ-spouse & Transport-moving & Husband & White & Male & 0 & 0 & 50 & United-States \\\\\n",
"\t 1 & 34 & State-gov & 355700 & HS-grad & 9 & Separated & Adm-clerical & Unmarried & White & Female & 0 & 0 & 20 & United-States \\\\\n",
"\t 1 & 23 & Private & 231160 & Some-college & 10 & Never-married & Other-service & Own-child & White & Male & 0 & 0 & 25 & United-States \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 28 | Private | 120135 | Assoc-voc | 11 | Never-married | Sales | Not-in-family | White | Female | 0 | 0 | 40 | United-States | \n",
"| 1 | 49 | ? | 57665 | Bachelors | 13 | Divorced | ? | Own-child | White | Female | 0 | 0 | 40 | United-States | \n",
"| 1 | 30 | Private | 496414 | Doctorate | 16 | Never-married | Prof-specialty | Not-in-family | White | Male | 0 | 0 | 40 | ? | \n",
"| 1 | 55 | Private | 353881 | 7th-8th | 4 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 0 | 50 | United-States | \n",
"| 1 | 34 | State-gov | 355700 | HS-grad | 9 | Separated | Adm-clerical | Unmarried | White | Female | 0 | 0 | 20 | United-States | \n",
"| 1 | 23 | Private | 231160 | Some-college | 10 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 25 | United-States | \n",
"\n",
"\n"
],
"text/plain": [
" V1 V2 V3 V4 V5 V6 V7 V8 \n",
"1 1 28 Private 120135 Assoc-voc 11 Never-married Sales \n",
"2 1 49 ? 57665 Bachelors 13 Divorced ? \n",
"3 1 30 Private 496414 Doctorate 16 Never-married Prof-specialty \n",
"4 1 55 Private 353881 7th-8th 4 Married-civ-spouse Transport-moving\n",
"5 1 34 State-gov 355700 HS-grad 9 Separated Adm-clerical \n",
"6 1 23 Private 231160 Some-college 10 Never-married Other-service \n",
" V9 V10 V11 V12 V13 V14 V15 \n",
"1 Not-in-family White Female 0 0 40 United-States\n",
"2 Own-child White Female 0 0 40 United-States\n",
"3 Not-in-family White Male 0 0 40 ? \n",
"4 Husband White Male 0 0 50 United-States\n",
"5 Unmarried White Female 0 0 20 United-States\n",
"6 Own-child White Male 0 0 25 United-States"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(data)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate}\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 28\n",
"\\item 4\n",
"\\item 120135\n",
"\\item 9\n",
"\\item 11\n",
"\\item 5\n",
"\\item 12\n",
"\\item 2\n",
"\\item 5\n",
"\\item 1\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 32\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 49\n",
"\\item 1\n",
"\\item 57665\n",
"\\item 10\n",
"\\item 13\n",
"\\item 1\n",
"\\item 1\n",
"\\item 4\n",
"\\item 5\n",
"\\item 1\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 32\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 30\n",
"\\item 4\n",
"\\item 496414\n",
"\\item 11\n",
"\\item 16\n",
"\\item 5\n",
"\\item 10\n",
"\\item 2\n",
"\\item 5\n",
"\\item 2\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 55\n",
"\\item 4\n",
"\\item 353881\n",
"\\item 6\n",
"\\item 4\n",
"\\item 3\n",
"\\item 14\n",
"\\item 1\n",
"\\item 5\n",
"\\item 2\n",
"\\item 0\n",
"\\item 0\n",
"\\item 50\n",
"\\item 32\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 34\n",
"\\item 7\n",
"\\item 355700\n",
"\\item 12\n",
"\\item 9\n",
"\\item 6\n",
"\\item 2\n",
"\\item 5\n",
"\\item 5\n",
"\\item 1\n",
"\\item 0\n",
"\\item 0\n",
"\\item 20\n",
"\\item 32\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 23\n",
"\\item 4\n",
"\\item 231160\n",
"\\item 16\n",
"\\item 10\n",
"\\item 5\n",
"\\item 8\n",
"\\item 4\n",
"\\item 5\n",
"\\item 2\n",
"\\item 0\n",
"\\item 0\n",
"\\item 25\n",
"\\item 32\n",
"\\end{enumerate*}\n",
"\n",
"\\end{enumerate}\n"
],
"text/markdown": [
"1. 1. 1\n",
"2. 1\n",
"3. 28\n",
"4. 4\n",
"5. 120135\n",
"6. 9\n",
"7. 11\n",
"8. 5\n",
"9. 12\n",
"10. 2\n",
"11. 5\n",
"12. 1\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 32\n",
"\n",
"\n",
"\n",
"2. 1. 1\n",
"2. 1\n",
"3. 49\n",
"4. 1\n",
"5. 57665\n",
"6. 10\n",
"7. 13\n",
"8. 1\n",
"9. 1\n",
"10. 4\n",
"11. 5\n",
"12. 1\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 32\n",
"\n",
"\n",
"\n",
"3. 1. 1\n",
"2. 1\n",
"3. 30\n",
"4. 4\n",
"5. 496414\n",
"6. 11\n",
"7. 16\n",
"8. 5\n",
"9. 10\n",
"10. 2\n",
"11. 5\n",
"12. 2\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 1\n",
"\n",
"\n",
"\n",
"4. 1. 1\n",
"2. 1\n",
"3. 55\n",
"4. 4\n",
"5. 353881\n",
"6. 6\n",
"7. 4\n",
"8. 3\n",
"9. 14\n",
"10. 1\n",
"11. 5\n",
"12. 2\n",
"13. 0\n",
"14. 0\n",
"15. 50\n",
"16. 32\n",
"\n",
"\n",
"\n",
"5. 1. 1\n",
"2. 1\n",
"3. 34\n",
"4. 7\n",
"5. 355700\n",
"6. 12\n",
"7. 9\n",
"8. 6\n",
"9. 2\n",
"10. 5\n",
"11. 5\n",
"12. 1\n",
"13. 0\n",
"14. 0\n",
"15. 20\n",
"16. 32\n",
"\n",
"\n",
"\n",
"6. 1. 1\n",
"2. 1\n",
"3. 23\n",
"4. 4\n",
"5. 231160\n",
"6. 16\n",
"7. 10\n",
"8. 5\n",
"9. 8\n",
"10. 4\n",
"11. 5\n",
"12. 2\n",
"13. 0\n",
"14. 0\n",
"15. 25\n",
"16. 32\n",
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"[[1]]\n",
" [1] 1 1 28 4 120135 9 11 5 12 2\n",
"[11] 5 1 0 0 40 32\n",
"\n",
"[[2]]\n",
" [1] 1 1 49 1 57665 10 13 1 1 4 5 1\n",
"[13] 0 0 40 32\n",
"\n",
"[[3]]\n",
" [1] 1 1 30 4 496414 11 16 5 10 2\n",
"[11] 5 2 0 0 40 1\n",
"\n",
"[[4]]\n",
" [1] 1 1 55 4 353881 6 4 3 14 1\n",
"[11] 5 2 0 0 50 32\n",
"\n",
"[[5]]\n",
" [1] 1 1 34 7 355700 12 9 6 2 5\n",
"[11] 5 1 0 0 20 32\n",
"\n",
"[[6]]\n",
" [1] 1 1 23 4 231160 16 10 5 8 4\n",
"[11] 5 2 0 0 25 32\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Transform categorical features to numeric.\n",
"for (i in cat_features)\n",
"    data[,i] <- as.numeric(factor(data[,i]))\n",
"\n",
"target <- c(1)\n",
"data_matrix <- as.matrix(data)\n",
"# cat_features holds 1-based positions in the full table. Dropping the label column\n",
"# and switching to zero-based indices shifts each position by 2.\n",
"pool <- catboost.load_pool(as.matrix(data[,-target]),\n",
"                           label = as.matrix(data[,target]),\n",
"                           cat_features = cat_features - 2)\n",
"head(pool, 6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### From data.frame\n",
"- Categorical features must be converted to factors (use as.factor(), the colClasses argument of read.table(), and so on). \n",
"- Numeric features must be stored as numeric. The target feature must also be numeric."
]
},
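{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of those two requirements, the toy data frame below (hypothetical columns, not the Adult data) converts its character column to a factor and then checks that every column is either a factor or numeric, with a numeric label.\n",
"\n",
"```r\n",
"# Toy data: label in column 1, one numeric and one categorical feature\n",
"toy <- data.frame(label = c(1, -1, 1),\n",
"                  age = c(28, 49, 30),\n",
"                  workclass = c('Private', '?', 'Private'),\n",
"                  stringsAsFactors = FALSE)\n",
"\n",
"# Convert every character column to a factor\n",
"is_chr <- vapply(toy, is.character, logical(1))\n",
"toy[is_chr] <- lapply(toy[is_chr], as.factor)\n",
"\n",
"# Every column should now be either a factor or numeric\n",
"col_ok <- vapply(toy, function(col) is.factor(col) || is.numeric(col), logical(1))\n",
"stopifnot(all(col_ok), is.numeric(toy$label))\n",
"str(toy)\n",
"```\n",
"\n",
"Reading the file with an explicit colClasses vector, as the next cells do, gives the same column types without a separate conversion step."
]
},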
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"train_path = system.file(\"extdata\", \"adult_train.1000\", package=\"catboost\")\n",
"test_path = system.file(\"extdata\", \"adult_test.1000\", package=\"catboost\")\n",
"\n",
"column_description_vector = rep('numeric', 15)\n",
"cat_features <- c(3, 5, 7, 8, 9, 10, 11, 15)\n",
"for (i in cat_features)\n",
" column_description_vector[i] <- 'factor'"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"train <- read.table(train_path, header = FALSE, sep = \"\\t\", colClasses = column_description_vector, na.strings = 'NAN')\n",
"test <- read.table(test_path, header = FALSE, sep = \"\\t\", colClasses = column_description_vector, na.strings = 'NAN')\n",
"target <- c(1)"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1000\n",
"\\item 15\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1000\n",
"2. 15\n",
"\n",
"\n"
],
"text/plain": [
"[1] 1000 15"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"dim(train)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"\n",
" -1 1 \n",
"500 500 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"table(train$V1)"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllllllllllll}\n",
" V1 & V2 & V3 & V4 & V5 & V6 & V7 & V8 & V9 & V10 & V11 & V12 & V13 & V14 & V15\\\\\n",
"\\hline\n",
"\t 1 & 28 & Private & 120135 & Assoc-voc & 11 & Never-married & Sales & Not-in-family & White & Female & 0 & 0 & 40 & United-States \\\\\n",
"\t 1 & 49 & ? & 57665 & Bachelors & 13 & Divorced & ? & Own-child & White & Female & 0 & 0 & 40 & United-States \\\\\n",
"\t 1 & 30 & Private & 496414 & Doctorate & 16 & Never-married & Prof-specialty & Not-in-family & White & Male & 0 & 0 & 40 & ? \\\\\n",
"\t 1 & 55 & Private & 353881 & 7th-8th & 4 & Married-civ-spouse & Transport-moving & Husband & White & Male & 0 & 0 & 50 & United-States \\\\\n",
"\t 1 & 34 & State-gov & 355700 & HS-grad & 9 & Separated & Adm-clerical & Unmarried & White & Female & 0 & 0 & 20 & United-States \\\\\n",
"\t 1 & 23 & Private & 231160 & Some-college & 10 & Never-married & Other-service & Own-child & White & Male & 0 & 0 & 25 & United-States \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | \n",
"|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
"| 1 | 28 | Private | 120135 | Assoc-voc | 11 | Never-married | Sales | Not-in-family | White | Female | 0 | 0 | 40 | United-States | \n",
"| 1 | 49 | ? | 57665 | Bachelors | 13 | Divorced | ? | Own-child | White | Female | 0 | 0 | 40 | United-States | \n",
"| 1 | 30 | Private | 496414 | Doctorate | 16 | Never-married | Prof-specialty | Not-in-family | White | Male | 0 | 0 | 40 | ? | \n",
"| 1 | 55 | Private | 353881 | 7th-8th | 4 | Married-civ-spouse | Transport-moving | Husband | White | Male | 0 | 0 | 50 | United-States | \n",
"| 1 | 34 | State-gov | 355700 | HS-grad | 9 | Separated | Adm-clerical | Unmarried | White | Female | 0 | 0 | 20 | United-States | \n",
"| 1 | 23 | Private | 231160 | Some-college | 10 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 25 | United-States | \n",
"\n",
"\n"
],
"text/plain": [
" V1 V2 V3 V4 V5 V6 V7 V8 \n",
"1 1 28 Private 120135 Assoc-voc 11 Never-married Sales \n",
"2 1 49 ? 57665 Bachelors 13 Divorced ? \n",
"3 1 30 Private 496414 Doctorate 16 Never-married Prof-specialty \n",
"4 1 55 Private 353881 7th-8th 4 Married-civ-spouse Transport-moving\n",
"5 1 34 State-gov 355700 HS-grad 9 Separated Adm-clerical \n",
"6 1 23 Private 231160 Some-college 10 Never-married Other-service \n",
" V9 V10 V11 V12 V13 V14 V15 \n",
"1 Not-in-family White Female 0 0 40 United-States\n",
"2 Own-child White Female 0 0 40 United-States\n",
"3 Not-in-family White Male 0 0 40 ? \n",
"4 Husband White Male 0 0 50 United-States\n",
"5 Unmarried White Female 0 0 20 United-States\n",
"6 Own-child White Male 0 0 25 United-States"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(train)"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate}\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 28\n",
"\\item 3.89195830301646e+36\n",
"\\item 120135\n",
"\\item -1.04016785957128e-34\n",
"\\item 11\n",
"\\item 1.26145722489731e+32\n",
"\\item -371032621056\n",
"\\item 8.07870890394598e-34\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 49\n",
"\\item -1.81610708810891e-18\n",
"\\item 57665\n",
"\\item 5.92781046506593e-19\n",
"\\item 13\n",
"\\item -1601.40100097656\n",
"\\item -1.81610708810891e-18\n",
"\\item -1.10627302684918e+32\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 30\n",
"\\item 3.89195830301646e+36\n",
"\\item 496414\n",
"\\item -1.74022152049343e+23\n",
"\\item 16\n",
"\\item 1.26145722489731e+32\n",
"\\item 1.14009086947129e-25\n",
"\\item 8.07870890394598e-34\n",
"\\item -9.78215504232611e+30\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item -1.81610708810891e-18\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 55\n",
"\\item 3.89195830301646e+36\n",
"\\item 353881\n",
"\\item 9.32786036967877e+35\n",
"\\item 4\n",
"\\item -2.32643418585933e-34\n",
"\\item -3.51661115686911e-30\n",
"\\item 9.09455348664148e-37\n",
"\\item -9.78215504232611e+30\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 50\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 34\n",
"\\item -6.04938814187138e+22\n",
"\\item 355700\n",
"\\item 4.16527045605261e-18\n",
"\\item 9\n",
"\\item 4.24491702233354e-07\n",
"\\item 3.7142509766905e-21\n",
"\\item -0.00183462153654546\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 20\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\end{enumerate}\n"
],
"text/markdown": [
"1. 1. 1\n",
"2. 1\n",
"3. 28\n",
"4. 3.89195830301646e+36\n",
"5. 120135\n",
"6. -1.04016785957128e-34\n",
"7. 11\n",
"8. 1.26145722489731e+32\n",
"9. -371032621056\n",
"10. 8.07870890394598e-34\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"2. 1. 1\n",
"2. 1\n",
"3. 49\n",
"4. -1.81610708810891e-18\n",
"5. 57665\n",
"6. 5.92781046506593e-19\n",
"7. 13\n",
"8. -1601.40100097656\n",
"9. -1.81610708810891e-18\n",
"10. -1.10627302684918e+32\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"3. 1. 1\n",
"2. 1\n",
"3. 30\n",
"4. 3.89195830301646e+36\n",
"5. 496414\n",
"6. -1.74022152049343e+23\n",
"7. 16\n",
"8. 1.26145722489731e+32\n",
"9. 1.14009086947129e-25\n",
"10. 8.07870890394598e-34\n",
"11. -9.78215504232611e+30\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. -1.81610708810891e-18\n",
"\n",
"\n",
"\n",
"4. 1. 1\n",
"2. 1\n",
"3. 55\n",
"4. 3.89195830301646e+36\n",
"5. 353881\n",
"6. 9.32786036967877e+35\n",
"7. 4\n",
"8. -2.32643418585933e-34\n",
"9. -3.51661115686911e-30\n",
"10. 9.09455348664148e-37\n",
"11. -9.78215504232611e+30\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 50\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"5. 1. 1\n",
"2. 1\n",
"3. 34\n",
"4. -6.04938814187138e+22\n",
"5. 355700\n",
"6. 4.16527045605261e-18\n",
"7. 9\n",
"8. 4.24491702233354e-07\n",
"9. 3.7142509766905e-21\n",
"10. -0.00183462153654546\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 20\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"[[1]]\n",
" [1] 1.000000e+00 1.000000e+00 2.800000e+01 3.891958e+36 1.201350e+05\n",
" [6] -1.040168e-34 1.100000e+01 1.261457e+32 -3.710326e+11 8.078709e-34\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[2]]\n",
" [1] 1.000000e+00 1.000000e+00 4.900000e+01 -1.816107e-18 5.766500e+04\n",
" [6] 5.927810e-19 1.300000e+01 -1.601401e+03 -1.816107e-18 -1.106273e+32\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[3]]\n",
" [1] 1.000000e+00 1.000000e+00 3.000000e+01 3.891958e+36 4.964140e+05\n",
" [6] -1.740222e+23 1.600000e+01 1.261457e+32 1.140091e-25 8.078709e-34\n",
"[11] -9.782155e+30 -3.163861e-08 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] -1.816107e-18\n",
"\n",
"[[4]]\n",
" [1] 1.000000e+00 1.000000e+00 5.500000e+01 3.891958e+36 3.538810e+05\n",
" [6] 9.327860e+35 4.000000e+00 -2.326434e-34 -3.516611e-30 9.094553e-37\n",
"[11] -9.782155e+30 -3.163861e-08 0.000000e+00 0.000000e+00 5.000000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[5]]\n",
" [1] 1.000000e+00 1.000000e+00 3.400000e+01 -6.049388e+22 3.557000e+05\n",
" [6] 4.165270e-18 9.000000e+00 4.244917e-07 3.714251e-21 -1.834622e-03\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 2.000000e+01\n",
"[16] 1.219625e+24\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"target <- c(1)\n",
"train_pool <- catboost.load_pool(data=train[,-target], label = train[,target])\n",
"test_pool <- catboost.load_pool(data=test[,-target], label = test[,target])\n",
"head(train_pool, 5)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate}\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 73\n",
"\\item -1220010961797120\n",
"\\item 30958\n",
"\\item -40904704\n",
"\\item 10\n",
"\\item -2.32643418585933e-34\n",
"\\item -371032621056\n",
"\\item 9.09455348664148e-37\n",
"\\item -9.78215504232611e+30\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 25\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 33\n",
"\\item 3.89195830301646e+36\n",
"\\item 123031\n",
"\\item 4.16527045605261e-18\n",
"\\item 9\n",
"\\item 1.49414066382737e-29\n",
"\\item 3.7142509766905e-21\n",
"\\item -0.00183462153654546\n",
"\\item -5.7961030040131e+35\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 30\n",
"\\item 3.89195830301646e+36\n",
"\\item 159442\n",
"\\item 4.16527045605261e-18\n",
"\\item 9\n",
"\\item 1.26145722489731e+32\n",
"\\item 3.7142509766905e-21\n",
"\\item 8.07870890394598e-34\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 35\n",
"\\item -7.60635120394048e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 30\n",
"\\item 3.89195830301646e+36\n",
"\\item 187279\n",
"\\item 4.16527045605261e-18\n",
"\\item 9\n",
"\\item -2.32643418585933e-34\n",
"\\item -1.06509231510206e-17\n",
"\\item 9.09455348664148e-37\n",
"\\item -9.78215504232611e+30\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 44\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\item \\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 41\n",
"\\item 3.89195830301646e+36\n",
"\\item 112763\n",
"\\item 4.16527045605261e-18\n",
"\\item 9\n",
"\\item -1601.40100097656\n",
"\\item -1.06509231510206e-17\n",
"\\item -1.10627302684918e+32\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 2597\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n",
"\n",
"\\end{enumerate}\n"
],
"text/markdown": [
"1. 1. 1\n",
"2. 1\n",
"3. 73\n",
"4. -1220010961797120\n",
"5. 30958\n",
"6. -40904704\n",
"7. 10\n",
"8. -2.32643418585933e-34\n",
"9. -371032621056\n",
"10. 9.09455348664148e-37\n",
"11. -9.78215504232611e+30\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 25\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"2. 1. 1\n",
"2. 1\n",
"3. 33\n",
"4. 3.89195830301646e+36\n",
"5. 123031\n",
"6. 4.16527045605261e-18\n",
"7. 9\n",
"8. 1.49414066382737e-29\n",
"9. 3.7142509766905e-21\n",
"10. -0.00183462153654546\n",
"11. -5.7961030040131e+35\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"3. 1. 1\n",
"2. 1\n",
"3. 30\n",
"4. 3.89195830301646e+36\n",
"5. 159442\n",
"6. 4.16527045605261e-18\n",
"7. 9\n",
"8. 1.26145722489731e+32\n",
"9. 3.7142509766905e-21\n",
"10. 8.07870890394598e-34\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 35\n",
"16. -7.60635120394048e+24\n",
"\n",
"\n",
"\n",
"4. 1. 1\n",
"2. 1\n",
"3. 30\n",
"4. 3.89195830301646e+36\n",
"5. 187279\n",
"6. 4.16527045605261e-18\n",
"7. 9\n",
"8. -2.32643418585933e-34\n",
"9. -1.06509231510206e-17\n",
"10. 9.09455348664148e-37\n",
"11. -9.78215504232611e+30\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 44\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"5. 1. 1\n",
"2. 1\n",
"3. 41\n",
"4. 3.89195830301646e+36\n",
"5. 112763\n",
"6. 4.16527045605261e-18\n",
"7. 9\n",
"8. -1601.40100097656\n",
"9. -1.06509231510206e-17\n",
"10. -1.10627302684918e+32\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 2597\n",
"14. 0\n",
"15. 40\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n",
"\n",
"\n",
"\n"
],
"text/plain": [
"[[1]]\n",
" [1] 1.000000e+00 1.000000e+00 7.300000e+01 -1.220011e+15 3.095800e+04\n",
" [6] -4.090470e+07 1.000000e+01 -2.326434e-34 -3.710326e+11 9.094553e-37\n",
"[11] -9.782155e+30 -3.163861e-08 0.000000e+00 0.000000e+00 2.500000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[2]]\n",
" [1] 1.000000e+00 1.000000e+00 3.300000e+01 3.891958e+36 1.230310e+05\n",
" [6] 4.165270e-18 9.000000e+00 1.494141e-29 3.714251e-21 -1.834622e-03\n",
"[11] -5.796103e+35 -3.163861e-08 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[3]]\n",
" [1] 1.000000e+00 1.000000e+00 3.000000e+01 3.891958e+36 1.594420e+05\n",
" [6] 4.165270e-18 9.000000e+00 1.261457e+32 3.714251e-21 8.078709e-34\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 3.500000e+01\n",
"[16] -7.606351e+24\n",
"\n",
"[[4]]\n",
" [1] 1.000000e+00 1.000000e+00 3.000000e+01 3.891958e+36 1.872790e+05\n",
" [6] 4.165270e-18 9.000000e+00 -2.326434e-34 -1.065092e-17 9.094553e-37\n",
"[11] -9.782155e+30 -3.163861e-08 0.000000e+00 0.000000e+00 4.400000e+01\n",
"[16] 1.219625e+24\n",
"\n",
"[[5]]\n",
" [1] 1.000000e+00 1.000000e+00 4.100000e+01 3.891958e+36 1.127630e+05\n",
" [6] 4.165270e-18 9.000000e+00 -1.601401e+03 -1.065092e-17 -1.106273e+32\n",
"[11] -9.782155e+30 -9.047987e-38 2.597000e+03 0.000000e+00 4.000000e+01\n",
"[16] 1.219625e+24\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(test_pool, 5)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Nrows: 1000 , Ncols: 14 \n",
"\n",
"First row: "
]
},
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 1\n",
"\\item 1\n",
"\\item 28\n",
"\\item 3.89195830301646e+36\n",
"\\item 120135\n",
"\\item -1.04016785957128e-34\n",
"\\item 11\n",
"\\item 1.26145722489731e+32\n",
"\\item -371032621056\n",
"\\item 8.07870890394598e-34\n",
"\\item -9.78215504232611e+30\n",
"\\item -9.04798744184095e-38\n",
"\\item 0\n",
"\\item 0\n",
"\\item 40\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 1\n",
"2. 1\n",
"3. 28\n",
"4. 3.89195830301646e+36\n",
"5. 120135\n",
"6. -1.04016785957128e-34\n",
"7. 11\n",
"8. 1.26145722489731e+32\n",
"9. -371032621056\n",
"10. 8.07870890394598e-34\n",
"11. -9.78215504232611e+30\n",
"12. -9.04798744184095e-38\n",
"13. 0\n",
"14. 0\n",
"15. 40\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n"
],
"text/plain": [
" [1] 1.000000e+00 1.000000e+00 2.800000e+01 3.891958e+36 1.201350e+05\n",
" [6] -1.040168e-34 1.100000e+01 1.261457e+32 -3.710326e+11 8.078709e-34\n",
"[11] -9.782155e+30 -9.047987e-38 0.000000e+00 0.000000e+00 4.000000e+01\n",
"[16] 1.219625e+24"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Last row: "
]
},
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item -1\n",
"\\item 1\n",
"\\item 71\n",
"\\item -1.81610708810891e-18\n",
"\\item 177906\n",
"\\item 5.92781046506593e-19\n",
"\\item 13\n",
"\\item -2.32643418585933e-34\n",
"\\item -1.81610708810891e-18\n",
"\\item 9.09455348664148e-37\n",
"\\item -9.78215504232611e+30\n",
"\\item -3.16386135068569e-08\n",
"\\item 0\n",
"\\item 0\n",
"\\item 10\n",
"\\item 1.21962493117738e+24\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. -1\n",
"2. 1\n",
"3. 71\n",
"4. -1.81610708810891e-18\n",
"5. 177906\n",
"6. 5.92781046506593e-19\n",
"7. 13\n",
"8. -2.32643418585933e-34\n",
"9. -1.81610708810891e-18\n",
"10. 9.09455348664148e-37\n",
"11. -9.78215504232611e+30\n",
"12. -3.16386135068569e-08\n",
"13. 0\n",
"14. 0\n",
"15. 10\n",
"16. 1.21962493117738e+24\n",
"\n",
"\n"
],
"text/plain": [
" [1] -1.000000e+00 1.000000e+00 7.100000e+01 -1.816107e-18 1.779060e+05\n",
" [6] 5.927810e-19 1.300000e+01 -2.326434e-34 -1.816107e-18 9.094553e-37\n",
"[11] -9.782155e+30 -3.163861e-08 0.000000e+00 0.000000e+00 1.000000e+01\n",
"[16] 1.219625e+24"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Column names: "
]
},
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 'V2'\n",
"\\item 'V3'\n",
"\\item 'V4'\n",
"\\item 'V5'\n",
"\\item 'V6'\n",
"\\item 'V7'\n",
"\\item 'V8'\n",
"\\item 'V9'\n",
"\\item 'V10'\n",
"\\item 'V11'\n",
"\\item 'V12'\n",
"\\item 'V13'\n",
"\\item 'V14'\n",
"\\item 'V15'\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 'V2'\n",
"2. 'V3'\n",
"3. 'V4'\n",
"4. 'V5'\n",
"5. 'V6'\n",
"6. 'V7'\n",
"7. 'V8'\n",
"8. 'V9'\n",
"9. 'V10'\n",
"10. 'V11'\n",
"11. 'V12'\n",
"12. 'V13'\n",
"13. 'V14'\n",
"14. 'V15'\n",
"\n",
"\n"
],
"text/plain": [
" [1] \"V2\" \"V3\" \"V4\" \"V5\" \"V6\" \"V7\" \"V8\" \"V9\" \"V10\" \"V11\" \"V12\" \"V13\"\n",
"[13] \"V14\" \"V15\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# number of rows and columns\n",
"cat(\"Nrows: \", nrow(train_pool), \", Ncols: \", ncol(train_pool), \"\\n\")\n",
"# first rows of pool\n",
"cat(\"\\nFirst row: \")\n",
"head(train_pool, n = 1)[[1]]\n",
"cat(\"\\nLast row: \")\n",
"tail(train_pool, n = 1)[[1]]\n",
"cat(\"\\nColumn names: \")\n",
"colnames(train_pool)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"fit_params <- list(iterations = 100,\n",
" thread_count = 10,\n",
" loss_function = 'Logloss',\n",
" ignored_features = c(4,9),\n",
" border_count = 32,\n",
" depth = 5,\n",
" learning_rate = 0.03,\n",
" l2_leaf_reg = 3.5,\n",
" border = 0.5,\n",
" train_dir = 'train_dir')\n",
"\n",
"#parameter tuning : https://tech.yandex.com/catboost/doc/dg/concepts/parameter-tuning-docpage/\n",
"model <- catboost.train(train_pool, test_pool, fit_params)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"$handle\n",
"\n",
"\n",
"$tree_count\n",
"[1] 100\n",
"\n",
"attr(,\"class\")\n",
"[1] \"catboost.Model\""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"model"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sample predictions: 0.5017934 0.57932 0.1432269 0.2339487 0.6170065 \n"
]
}
],
"source": [
"calc_accuracy <- function(prediction, expected) {\n",
" labels <- ifelse(prediction > 0.5, 1, -1)\n",
" accuracy <- sum(labels == expected) / length(labels)\n",
" return(accuracy)\n",
"}\n",
"\n",
"prediction <- catboost.predict(model, test_pool, prediction_type = 'Probability')\n",
"cat(\"Sample predictions: \", sample(prediction, 5), \"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" \n",
"labels -1 1\n",
" 0 416 103\n",
" 1 84 397"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Accuracy: 0.813 \n",
"\n",
"Feature importances \n"
]
},
{
"data": {
"text/latex": [
"\\begin{description*}\n",
"\\item[V2] 7.51815126009209\n",
"\\item[V3] 0.801424775117198\n",
"\\item[V4] 0.225919455909302\n",
"\\item[V5] 13.2889492080106\n",
"\\item[V6] 0\n",
"\\item[V7] 24.652523633112\n",
"\\item[V8] 12.5094112649986\n",
"\\item[V9] 8.36512301356345\n",
"\\item[V10] 0.996459337062923\n",
"\\item[V11] 0\n",
"\\item[V12] 24.4416936685521\n",
"\\item[V13] 0.581038264067558\n",
"\\item[V14] 5.25896002757146\n",
"\\item[V15] 1.36034609194284\n",
"\\end{description*}\n"
],
"text/markdown": [
"V2\n",
": 7.51815126009209V3\n",
": 0.801424775117198V4\n",
": 0.225919455909302V5\n",
": 13.2889492080106V6\n",
": 0V7\n",
": 24.652523633112V8\n",
": 12.5094112649986V9\n",
": 8.36512301356345V10\n",
": 0.996459337062923V11\n",
": 0V12\n",
": 24.4416936685521V13\n",
": 0.581038264067558V14\n",
": 5.25896002757146V15\n",
": 1.36034609194284\n",
"\n"
],
"text/plain": [
" V2 V3 V4 V5 V6 V7 V8 \n",
" 7.5181513 0.8014248 0.2259195 13.2889492 0.0000000 24.6525236 12.5094113 \n",
" V9 V10 V11 V12 V13 V14 V15 \n",
" 8.3651230 0.9964593 0.0000000 24.4416937 0.5810383 5.2589600 1.3603461 "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Tree count: 100 \n"
]
}
],
"source": [
"labels <- catboost.predict(model, test_pool, prediction_type = 'Class')\n",
"table(labels, test[,target])\n",
"\n",
"# works properly only for Logloss\n",
"accuracy <- calc_accuracy(prediction, test[,target])\n",
"cat(\"\\nAccuracy: \", accuracy, \"\\n\")\n",
"\n",
"# feature splits importances (not finished)\n",
"\n",
"cat(\"\\nFeature importances\", \"\\n\")\n",
"catboost.get_feature_importance(model, train_pool)\n",
"\n",
"cat(\"\\nTree count: \", model$tree_count, \"\\n\")"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TRUE \n",
"TRUE"
]
}
],
"source": [
"library(iterators)\n",
"staged_predictions <- catboost.staged_predict(model, test_pool, ntree_start = 2, ntree_end = 5,\n",
" eval_period = 2, prediction_type = 'Probability')\n",
"\n",
"staged_prediction_2_4 = nextElem(staged_predictions) # 2nd and 3rd trees\n",
"staged_prediction_2_5 = nextElem(staged_predictions) # 2nd, 3rd and 4th trees\n",
"\n",
"prediction_2_4 = catboost.predict(model, test_pool, ntree_start = 2, ntree_end = 4, prediction_type = 'Probability')\n",
"prediction_2_5 = catboost.predict(model, test_pool, ntree_start = 2, ntree_end = 5, prediction_type = 'Probability')\n",
"cat(all(prediction_2_4 == staged_prediction_2_4), '\\n')\n",
"cat(all(prediction_2_5 == staged_prediction_2_5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
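"### Useful features\n",
"- If you have a validation set, using the overfitting detector is always an easy and worthwhile way to speed up training.\n",
"- od_wait: the number of iterations for which training continues after the iteration with the best loss function value."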
]
},
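{
"cell_type": "markdown",
"metadata": {},
"source": [
"The stopping rule described above can be sketched in plain R. This is only an illustration under stated assumptions, not CatBoost's actual implementation: `iter_od_stop` is a hypothetical helper, and the validation losses are made-up toy values. It assumes an 'Iter'-style rule that stops once `od_wait` iterations have passed without improving on the best validation loss.\n",
"\n",
"```r\n",
"# Hypothetical helper (not part of the catboost package): given the validation\n",
"# loss recorded after each iteration, report where a patience-based detector\n",
"# with patience `od_wait` would stop training.\n",
"iter_od_stop <- function(eval_loss, od_wait) {\n",
"  best_iter <- 1\n",
"  for (i in seq_along(eval_loss)) {\n",
"    if (eval_loss[i] < eval_loss[best_iter]) {\n",
"      # New best validation loss: remember this iteration.\n",
"      best_iter <- i\n",
"    } else if (i - best_iter >= od_wait) {\n",
"      # No improvement for od_wait iterations: stop here.\n",
"      return(list(stopped_at = i, best_iter = best_iter))\n",
"    }\n",
"  }\n",
"  # The detector never fired: all iterations were used.\n",
"  list(stopped_at = length(eval_loss), best_iter = best_iter)\n",
"}\n",
"\n",
"# Toy validation curve: improves for four iterations, then degrades.\n",
"toy_loss <- c(0.60, 0.50, 0.45, 0.44, 0.46, 0.47, 0.48, 0.49)\n",
"iter_od_stop(toy_loss, od_wait = 3)\n",
"```\n",
"\n",
"With these toy values the best loss occurs at iteration 4, so with `od_wait = 3` the sketch stops at iteration 7. A larger `od_wait` tolerates longer plateaus at the cost of more training iterations."
]
},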
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Simple model tree count: 500 \n",
"Model with od tree count: 224 \n"
]
}
],
"source": [
"params_simple <- list(iterations = 500,\n",
" loss_function = 'Logloss',\n",
" train_dir = 'train_dir')\n",
"model_simple <- catboost.train(train_pool, test_pool, params_simple)\n",
"\n",
"params_with_od <- list(iterations = 500,\n",
" loss_function = 'Logloss',\n",
" train_dir = 'train_dir',\n",
" od_type = 'Iter',\n",
" od_wait = 30)\n",
"model_with_od <- catboost.train(train_pool, test_pool, params_with_od)\n",
"\n",
"cat('Simple model tree count: ', model_simple$tree_count, '\\n')\n",
"cat('Model with od tree count: ', model_with_od$tree_count, '\\n')"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Simple model accuracy: 0.805 \n",
"The best model accuracy: 0.816 \n"
]
}
],
"source": [
"params_simple <- list(iterations = 1000,\n",
" loss_function = 'Logloss',\n",
" train_dir = 'train_dir')\n",
"model_simple <- catboost.train(train_pool, test_pool, params_simple)\n",
"\n",
"params_best <- list(iterations = 1000,\n",
" loss_function = 'Logloss',\n",
" train_dir = 'train_dir',\n",
" use_best_model = TRUE)\n",
"model_best <- catboost.train(train_pool, test_pool, params_best)\n",
"\n",
"prediction_simple <- catboost.predict(model_simple, test_pool, prediction_type = 'Probability')\n",
"prediction_best <- catboost.predict(model_best, test_pool, prediction_type = 'Probability')\n",
"\n",
"cat('Simple model accuracy: ', calc_accuracy(prediction_simple, test[,target]), '\\n')\n",
"cat('The best model accuracy: ', calc_accuracy(prediction_best, test[,target]), '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CatBoost with caret\n",
"- Load and preprocess the Titanic dataset"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"set.seed(12345)\n",
"\n",
"data <- as.data.frame(as.matrix(titanic_train), stringsAsFactors=TRUE)\n",
"\n",
"age_levels <- levels(data$Age)\n",
"most_frequent_age <- which.max(table(data$Age))\n",
"data$Age[is.na(data$Age)] <- age_levels[most_frequent_age]\n",
"\n",
"drop_columns = c(\"PassengerId\", \"Survived\", \"Name\", \"Ticket\", \"Cabin\")\n",
"x <- data[,!(names(data) %in% drop_columns)]\n",
"y <- data[,c(\"Survived\")]"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{tabular}{r|lllllll}\n",
" Pclass & Sex & Age & SibSp & Parch & Fare & Embarked\\\\\n",
"\\hline\n",
"\t 3 & male & 22.00 & 1 & 0 & 7.2500 & S \\\\\n",
"\t 1 & female & 38.00 & 1 & 0 & 71.2833 & C \\\\\n",
"\t 3 & female & 26.00 & 0 & 0 & 7.9250 & S \\\\\n",
"\t 1 & female & 35.00 & 1 & 0 & 53.1000 & S \\\\\n",
"\t 3 & male & 35.00 & 0 & 0 & 8.0500 & S \\\\\n",
"\t 3 & male & 24.00 & 0 & 0 & 8.4583 & Q \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| Pclass | Sex | Age | SibSp | Parch | Fare | Embarked | \n",
"|---|---|---|---|---|---|---|\n",
"| 3 | male | 22.00 | 1 | 0 | 7.2500 | S | \n",
"| 1 | female | 38.00 | 1 | 0 | 71.2833 | C | \n",
"| 3 | female | 26.00 | 0 | 0 | 7.9250 | S | \n",
"| 1 | female | 35.00 | 1 | 0 | 53.1000 | S | \n",
"| 3 | male | 35.00 | 0 | 0 | 8.0500 | S | \n",
"| 3 | male | 24.00 | 0 | 0 | 8.4583 | Q | \n",
"\n",
"\n"
],
"text/plain": [
" Pclass Sex Age SibSp Parch Fare Embarked\n",
"1 3 male 22.00 1 0 7.2500 S \n",
"2 1 female 38.00 1 0 71.2833 C \n",
"3 3 female 26.00 0 0 7.9250 S \n",
"4 1 female 35.00 1 0 53.1000 S \n",
"5 3 male 35.00 0 0 8.0500 S \n",
"6 3 male 24.00 0 0 8.4583 Q "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(x)"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'data.frame':\t891 obs. of 7 variables:\n",
" $ Pclass : Factor w/ 3 levels \"1\",\"2\",\"3\": 3 1 3 1 3 3 1 3 3 2 ...\n",
" $ Sex : Factor w/ 2 levels \"female\",\"male\": 2 1 1 1 2 2 2 2 1 1 ...\n",
" $ Age : Factor w/ 88 levels \" 0.42\",\" 0.67\",..: 29 52 35 48 48 32 70 7 36 19 ...\n",
" $ SibSp : Factor w/ 7 levels \"0\",\"1\",\"2\",\"3\",..: 2 2 1 2 1 1 1 4 1 2 ...\n",
" $ Parch : Factor w/ 7 levels \"0\",\"1\",\"2\",\"3\",..: 1 1 1 1 1 1 1 2 3 1 ...\n",
" $ Fare : Factor w/ 248 levels \" 0.0000\",\" 4.0125\",..: 19 208 42 190 44 52 187 125 75 155 ...\n",
" $ Embarked: Factor w/ 4 levels \"\",\"C\",\"Q\",\"S\": 4 2 4 4 4 3 4 4 4 2 ...\n"
]
}
],
"source": [
"str(x)"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/latex": [
"\\begin{enumerate*}\n",
"\\item 0\n",
"\\item 1\n",
"\\item 1\n",
"\\item 1\n",
"\\item 0\n",
"\\item 0\n",
"\\end{enumerate*}\n"
],
"text/markdown": [
"1. 0\n",
"2. 1\n",
"3. 1\n",
"4. 1\n",
"5. 0\n",
"6. 0\n",
"\n",
"\n"
],
"text/plain": [
"[1] 0 1 1 1 0 0\n",
"Levels: 0 1"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(y)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"fit_control <- trainControl(method = \"cv\",\n",
" number = 5,\n",
" classProbs = TRUE)\n",
"\n",
"grid <- expand.grid(depth = c(4, 6, 8),\n",
" learning_rate = 0.1,\n",
" iterations = 100,\n",
" l2_leaf_reg = 0.1,\n",
" rsm = 0.95,\n",
" border_count = 64)\n",
"\n",
"model <- train(x, as.factor(make.names(y)),\n",
" method = catboost.caret,\n",
" verbose = FALSE, preProc = NULL,\n",
" tuneGrid = grid, trControl = fit_control)"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Catboost \n",
"\n",
"891 samples\n",
" 7 predictor\n",
" 2 classes: 'X0', 'X1' \n",
"\n",
"No pre-processing\n",
"Resampling: Cross-Validated (5 fold) \n",
"Summary of sample sizes: 714, 712, 713, 713, 712 \n",
"Resampling results across tuning parameters:\n",
"\n",
" depth Accuracy Kappa \n",
" 4 0.8046929 0.5725103\n",
" 6 0.8001922 0.5602606\n",
" 8 0.8136693 0.5945060\n",
"\n",
"Tuning parameter 'learning_rate' was held constant at a value of 0.1\n",
"\n",
"Tuning parameter 'rsm' was held constant at a value of 0.95\n",
"Tuning\n",
" parameter 'border_count' was held constant at a value of 64\n",
"Accuracy was used to select the optimal model using the largest value.\n",
"The final values used for the model were depth = 8, learning_rate =\n",
" 0.1, iterations = 100, l2_leaf_reg = 0.1, rsm = 0.95 and border_count = 64.\n",
"custom variable importance\n",
"\n",
" Overall\n",
"Fare 18.825\n",
"Age 18.711\n",
"Pclass 17.359\n",
"Sex 16.098\n",
"SibSp 11.023\n",
"Parch 9.566\n",
"Embarked 8.419\n"
]
}
],
"source": [
"print(model)\n",
"\n",
"importance <- varImp(model, scale = FALSE)\n",
"print(importance)"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
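"<table>\n",
"<thead><tr><th>X0</th><th>X1</th></tr></thead>\n",
"<tbody>\n",
"\t<tr><td>0.940814734</td><td>0.05918527</td></tr>\n",
"\t<tr><td>0.004609430</td><td>0.99539057</td></tr>\n",
"\t<tr><td>0.257634360</td><td>0.74236564</td></tr>\n",
"\t<tr><td>0.005060246</td><td>0.99493975</td></tr>\n",
"\t<tr><td>0.975453030</td><td>0.02454697</td></tr>\n",
"\t<tr><td>0.986188723</td><td>0.01381128</td></tr>\n",
"</tbody>\n",
"</table>\n"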
],
"text/latex": [
"\\begin{tabular}{r|ll}\n",
" X0 & X1\\\\\n",
"\\hline\n",
"\t 0.940814734 & 0.05918527 \\\\\n",
"\t 0.004609430 & 0.99539057 \\\\\n",
"\t 0.257634360 & 0.74236564 \\\\\n",
"\t 0.005060246 & 0.99493975 \\\\\n",
"\t 0.975453030 & 0.02454697 \\\\\n",
"\t 0.986188723 & 0.01381128 \\\\\n",
"\\end{tabular}\n"
],
"text/markdown": [
"\n",
"| X0 | X1 | \n",
"|---|---|\n",
"| 0.940814734 | 0.05918527 | \n",
"| 0.004609430 | 0.99539057 | \n",
"| 0.257634360 | 0.74236564 | \n",
"| 0.005060246 | 0.99493975 | \n",
"| 0.975453030 | 0.02454697 | \n",
"| 0.986188723 | 0.01381128 | \n",
"\n",
"\n"
],
"text/plain": [
" X0 X1 \n",
"1 0.940814734 0.05918527\n",
"2 0.004609430 0.99539057\n",
"3 0.257634360 0.74236564\n",
"4 0.005060246 0.99493975\n",
"5 0.975453030 0.02454697\n",
"6 0.986188723 0.01381128"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"head(predict(model, type = 'prob'))"
]
}
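,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The class probabilities above can be turned into class labels and compared with the observed outcome. The following is a minimal sketch, not part of the original tutorial: it assumes `model` and `y` from the cells above are still in the workspace, and the 0.5 cutoff is an illustrative choice rather than a tuned value.\n",
"\n",
"```r\n",
"library(caret)\n",
"\n",
"# Assumes `model` and `y` from the cells above are still in the workspace.\n",
"prob <- predict(model, type = 'prob')\n",
"\n",
"# Illustrative 0.5 cutoff on the probability of class X1 (the original label 1).\n",
"pred <- factor(ifelse(prob$X1 > 0.5, 'X1', 'X0'), levels = c('X0', 'X1'))\n",
"obs  <- factor(make.names(y), levels = c('X0', 'X1'))\n",
"\n",
"# Rows are predictions, columns are observed classes.\n",
"confusionMatrix(pred, obs, positive = 'X1')\n",
"```\n",
"\n",
"Because these predictions are made on the same rows the model was trained on, the resulting accuracy is likely to be more optimistic than the cross-validated accuracy reported by `print(model)`."
]
}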
],
"metadata": {
"kernelspec": {
"display_name": "R",
"language": "R",
"name": "ir"
},
"language_info": {
"codemirror_mode": "r",
"file_extension": ".r",
"mimetype": "text/x-r-source",
"name": "R",
"pygments_lexer": "r",
"version": "3.3.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}