
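Hyperparameter tuning is nearly as important in machine learning as feature engineering.
Having translated a book on feature engineering, I wondered whether parameter tuning also has a methodology worth writing down.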

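This post records what I have learned from tuning parameters myself. Not all of it is necessarily correct, and I will update it as I go. It is a personal record.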

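The overall procedure:
  • First use a small amount of data to narrow down the rough range of each parameter.
  • Pay attention to the learning rate: if it is too large, the model can jump straight out of the search space.
  • Compare the evaluation metric on the training and test data to decide whether the model is overfitting or underfitting.
  • If it is overfitting, add regularization or pruning.
  • If it is underfitting, increase the number of iterations, the model size, or its complexity.
  • If the fit on the small sample is good, increase the data and repeat the steps above.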
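The diagnosis step in that procedure can be sketched as a small helper. This is a minimal sketch that assumes a higher-is-better metric (for example accuracy or AUC) computed on a training set and a held-out validation set. The thresholds `target_score` and `max_gap` are hypothetical, task-specific values, not standard settings.

```python
def diagnose(train_score: float, valid_score: float,
             target_score: float, max_gap: float) -> str:
    """Classify a fit from higher-is-better scores.

    target_score and max_gap are hypothetical thresholds chosen per task.
    """
    gap = train_score - valid_score
    if gap > max_gap:
        # Fits the training data much better than the held-out data.
        return "overfitting"
    if train_score < target_score:
        # Cannot even fit the training data well enough.
        return "underfitting"
    return "ok"


# Suggested next step for each diagnosis, following the procedure above.
NEXT_STEP = {
    "overfitting": "add regularization or pruning",
    "underfitting": "more iterations, or a larger / more complex model",
    "ok": "use more data and repeat",
}

if __name__ == "__main__":
    for train, valid in [(0.99, 0.80), (0.70, 0.69), (0.93, 0.91)]:
        d = diagnose(train, valid, target_score=0.90, max_gap=0.05)
        print(f"train={train:.2f} valid={valid:.2f} -> {d}: {NEXT_STEP[d]}")
```

The gap is checked first. A large train/validation gap points to overfitting even when the training score itself is mediocre.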

Tuning Bagging and Boosting models

Representative Bagging models

Random Forest
Extra Trees

Representative Boosting models

LightGBM
XGBoost
CatBoost

Differences in tuning

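This post won't go into how these models are constructed.
Within each family, tuning is broadly similar from one model to the next, but the two families differ considerably. For the meaning of each specific parameter, refer to the documentation of the package you are calling.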

Bagging is a parallel approach; Boosting is a sequential one.

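So during training, Bagging builds a set of mutually independent trees, while Boosting builds a chain of trees, each depending on the ones before it.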
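To make the parallel/sequential contrast concrete, here is a toy sketch on 1-D regression data that uses one-split "stumps" instead of full trees. The stump learner and the function names are hypothetical illustrations. The boosting loop is plain gradient boosting for squared loss, which fits each new stump to the current residuals. Neither function is how Random Forest, LightGBM or the other libraries are actually implemented.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data (predicts a mean per side)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:
            continue
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All x values identical: fall back to a constant prediction.
        c = mean(ys)
        return lambda x: c
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging(xs, ys, n_models=50, seed=0):
    """Bagging: each stump sees its own bootstrap sample, independent of the others."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # Independent models could be trained in parallel; predictions are averaged.
    return lambda x: mean(m(x) for m in models)


def boosting(xs, ys, n_models=20, learning_rate=0.5):
    """Boosting (squared loss): each stump is fit to the residuals left by earlier ones."""
    base = mean(ys)
    models = []
    residuals = [y - base for y in ys]
    for _ in range(n_models):
        m = fit_stump(xs, residuals)  # depends on every earlier stump
        models.append(m)
        residuals = [r - learning_rate * m(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(m(x) for m in models)


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1, 1, 1, 5, 5, 5]
    bag, boost = bagging(xs, ys), boosting(xs, ys)
    print(bag(1), bag(6), boost(1), boost(6))
```

The design difference shows up in the loops. The bagging loop never looks at earlier models, so its iterations could run in parallel. Each boosting iteration needs the residuals left by the previous one.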

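As a result, Boosting relies little on pruning. It mainly uses regularization to avoid overfitting, for example a small learning rate and penalties on the number of leaves and on leaf weights.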

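Bagging, on the other hand, trains all its trees independently. It controls overfitting with pruning-like limits on each tree, such as a maximum depth or a minimum number of samples per leaf, while averaging many trees reduces variance.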
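XGBoost is one example of pruning expressed through regularization. Per its documentation, each tree is penalized by $\gamma T + \frac{1}{2}\lambda\sum_j w_j^2$, where $T$ is the number of leaves and $w_j$ are the leaf weights. A split is worth keeping only if its gain stays positive after subtracting $\gamma$.

The sketch below computes the optimal leaf weight and the split gain from $G$ and $H$, the sums of first- and second-order gradients in each child. It mirrors the documented formulas but is an illustration, not XGBoost's actual code. The function names are hypothetical.

```python
def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda); lambda shrinks it toward 0."""
    return -G / (H + lam)


def split_gain(GL: float, HL: float, GR: float, HR: float,
               lam: float, gamma: float) -> float:
    """Loss reduction from splitting a node into left/right children.

    gamma is the penalty for the extra leaf: a split with gain <= 0 is pruned.
    """
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


if __name__ == "__main__":
    # Same gradient statistics, increasing regularization.
    for lam, gamma in [(0.0, 0.0), (2.0, 0.0), (2.0, 5.0)]:
        print(lam, gamma, split_gain(-4.0, 2.0, 4.0, 2.0, lam, gamma))
```

With the statistics in the example, the gain drops from 8.0 with no regularization to 4.0 with λ = 2. It becomes -1.0 once γ = 5, at which point the split would be pruned.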

Linear algebra:
"Linear" is about relationships.
"Algebra" is about form.

There is also such a thing as nonlinear algebra. What is that about?

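Why do courses teach determinants before matrices? Students get a pile of computation rules that are easily confused with the body of matrix theory that comes later.
Why not teach matrices and their properties first, and then introduce the determinant as a special kind of computation on a matrix?
Is it a matter of difficulty?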
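Purely as an illustration of that alternative ordering (not taken from any particular textbook), the determinant can be introduced as a number computed from a square matrix, here in the $2\times 2$ case, with its key properties stated in matrix terms. $|\det A|$ is also the factor by which the linear map $x \mapsto Ax$ scales areas.

```latex
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad
\det A = ad - bc, \qquad
\det(AB) = \det A \, \det B, \qquad
A \text{ is invertible} \iff \det A \neq 0 .
```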

I wanted to install AML (Azure Machine Learning) Workbench to explore its capabilities and gauge whether it would help with my machine learning work.
I followed the official installation steps.
But I was out of luck: the installation failed many times in a row and produced the following log (the full log is long; I extracted only the parts that seem related to the failure):

......
2018/7/13 2:37:56 30: Installer [Information] - 0: Executing command: /Users/liguang/Library/Caches/AmlWorkbench/Python/bin/python -s -E -m conda install --no-deps --yes --force --offline "/private/tmp/AmlInstaller/six.macos-1.11.0/six-1.11.0-py35_1.tar.bz2"
2018/7/13 2:37:58 33: Installer [Information] - 0: Returned exit code 1
2018/7/13 2:37:58 33: Installer [Information] - 0: Output: [02:37:58] StandardError: An unexpected error has occurred.
[02:37:58] StandardError: Please consider posting the following information to the
[02:37:58] StandardError: conda GitHub issue tracker at:
[02:37:58] StandardError:
[02:37:58] StandardError: https://github.com/conda/conda/issues
[02:37:58] StandardError:
[02:37:58] StandardError:
[02:37:58] StandardError:
[02:37:58] StandardError: Current conda install:
[02:37:58] StandardError:
[02:37:58] StandardError: platform : osx-64
[02:37:58] StandardError: conda version : 4.3.27
[02:37:58] StandardError: conda is private : False
[02:37:58] StandardError: conda-env version : 4.3.27
[02:37:58] StandardError: conda-build version : not installed
[02:37:58] StandardError: python version : 3.5.2.final.0
[02:37:58] StandardError: requests version : 2.11.1
[02:37:58] StandardError: root environment : /Users/liguang/Library/Caches/AmlWorkbench/Python (writable)
[02:37:58] StandardError: default environment : /Users/liguang/Library/Caches/AmlWorkbench/Python
[02:37:58] StandardError: envs directories : /Users/liguang/Library/Caches/AmlWorkbench/Python/envs
[02:37:58] StandardError: /Users/liguang/.conda/envs
[02:37:58] StandardError: package cache : /Users/liguang/Library/Caches/AmlWorkbench/Python/pkgs
[02:37:58] StandardError: /Users/liguang/.conda/pkgs
[02:37:58] StandardError: channel URLs : https://repo.continuum.io/pkgs/main/osx-64 (offline)
[02:37:58] StandardError: https://repo.continuum.io/pkgs/main/noarch (offline)
[02:37:58] StandardError: https://repo.continuum.io/pkgs/free/osx-64 (offline)
[02:37:58] StandardError: https://repo.continuum.io/pkgs/free/noarch (offline)
[02:37:58] StandardError: https://repo.continuum.io/pkgs/r/osx-64 (offline)
[02:37:58] StandardError: https://repo.continuum.io/pkgs/r/noarch (offline)
[02:37:58] StandardError: https://repo.continuum.io/pkgs/pro/osx-64 (offline)
[02:37:58] StandardError: https://repo.continuum.io/pkgs/pro/noarch (offline)
[02:37:58] StandardError: config file : None
[02:37:58] StandardError: netrc file : None
[02:37:58] StandardError: offline mode : True
[02:37:58] StandardError: user-agent : conda/4.3.27 requests/2.11.1 CPython/3.5.2 Darwin/17.6.0 OSX/10.13.5
[02:37:58] StandardError: UID:GID : 501:20
[02:37:58] StandardError:
[02:37:58] StandardError: $ /Users/liguang/Library/Caches/AmlWorkbench/Python/lib/python3.5/site-packages/conda/__main__.py install --no-deps --yes --force --offline /private/tmp/AmlInstaller/six.macos-1.11.0/six-1.11.0-py35_1.tar.bz2
[02:37:58] StandardError:
......

Searching the web turned up no solution that helped directly, and at one point I wanted to give up. Frustratingly, I could not even tell which error keywords to search for.

Fortunately, I kept at it.
I went back and reread the log, and suspected the failure might be related to conda.
In a spirit of trial and error, I:

  • manually installed Anaconda3;
  • then installed AML again.

That worked for me, although I don't know why; the retry simply succeeded. It may not work for others, since I can't explain the cause.

Also, after installing Anaconda3, some Python packages I already had, such as xgboost, LightGBM and tensorflow, seemed to disappear. I don't know why either; I simply installed them again. +_+
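One plausible cause is that the new Anaconda3 `python` now comes first on `PATH`, so packages installed for the previous interpreter are not visible to it. This is an assumption; the log above does not confirm it.

Before reinstalling, a small standard-library check can show which interpreter is running and which packages it cannot find. The package list is just the three mentioned above, using their import names. Run the script with the same `python` you normally use.

```python
import importlib.util
import sys

# Import names of the packages mentioned above; edit this list as needed.
PACKAGES = ["xgboost", "lightgbm", "tensorflow"]


def missing_packages(names):
    """Return the top-level packages the *current* interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Missing:", missing_packages(PACKAGES) or "none")
```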

Regularization

Adding a regularization term to the cost down-weights the high-order terms to avoid overfitting.

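This works because high-order terms give the model enough flexibility to fit the training data, noise included, as closely as possible, which leads to overfitting.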
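As a concrete case, take linear regression with squared error and an L2 (ridge) penalty, the setting in which the $\theta_0$ notation is common. This is an assumption: the text above does not say which model or penalty it means.

With polynomial features $x_j = x^j$, the penalty pulls the coefficients of the high-order terms toward zero. The penalty sum starts at $j = 1$, so the intercept's update has no shrinkage term.

```latex
\begin{aligned}
h_\theta(x) &= \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n,
  \qquad x_j = x^j \text{ for a polynomial model} \\
J(\theta) &= \frac{1}{2m}\left[\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2
  + \lambda \sum_{j=1}^{n} \theta_j^2\right] \\
\theta_0 &:= \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) \\
\theta_j &:= \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)x_j^{(i)}
  + \frac{\lambda}{m}\theta_j\right], \qquad j = 1,\dots,n
\end{aligned}
```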

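The intercept $θ_0$ is left out of the penalty: it only shifts the output and does not make the model more flexible.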
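A minimal pure-Python sketch of this rule follows, assuming linear regression with squared error and an L2 penalty. The function names are hypothetical. The intercept `theta[0]` is updated from the data term only, while every other weight also receives a `lam * theta[j] / m` shrinkage.

```python
from typing import List

Vector = List[float]


def predict(theta: Vector, x: Vector) -> float:
    """theta[0] is the intercept; x holds the features x_1..x_n (no bias column)."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


def regularized_cost(theta: Vector, X: List[Vector], y: Vector, lam: float) -> float:
    """Squared-error cost plus an L2 penalty that skips theta[0]."""
    m = len(y)
    sq_err = sum((predict(theta, x) - yi) ** 2 for x, yi in zip(X, y))
    penalty = lam * sum(t * t for t in theta[1:])  # theta_0 is not penalized
    return (sq_err + penalty) / (2 * m)


def gradient_step(theta: Vector, X: List[Vector], y: Vector,
                  lam: float, alpha: float) -> Vector:
    """One batch gradient-descent step on regularized_cost."""
    m = len(y)
    errors = [predict(theta, x) - yi for x, yi in zip(X, y)]
    # Intercept: data term only, no shrinkage.
    new_theta = [theta[0] - alpha * sum(errors) / m]
    for j in range(1, len(theta)):
        grad = sum(e * x[j - 1] for e, x in zip(errors, X)) / m
        grad += lam * theta[j] / m  # L2 shrinkage for j >= 1 only
        new_theta.append(theta[j] - alpha * grad)
    return new_theta


if __name__ == "__main__":
    # Polynomial features x, x^2, x^3 for a few points on a straight line.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    X = [[x, x ** 2, x ** 3] for x in xs]
    y = [1.0 + 2.0 * x for x in xs]
    for lam in (0.0, 1.0):
        theta = [0.0, 0.0, 0.0, 0.0]
        for _ in range(5000):
            theta = gradient_step(theta, X, y, lam, alpha=0.01)
        print(lam, [round(t, 3) for t in theta], regularized_cost(theta, X, y, lam))
```

Changing `theta[0]` alone changes only the squared-error part of the cost, never the penalty.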