Data Mining Algorithm Study Notes (Series Index)
Data Mining Algorithms (1) – K-Nearest Neighbors (KNN)
Data Mining Algorithms (2) – Decision Trees
Data Mining Algorithms (3) – Logistic Regression

Before introducing logistic regression, let's review a few basic mathematical concepts that will help with the derivation later on.

Basic mathematical background

1. The log-likelihood function

Suppose the population X is discrete with probability mass function

$$P(X=x)=p(x,\theta)$$

where $\theta$ is an unknown parameter. Let $(X_1,X_2,\ldots,X_n)$ be a sample of size $n$ drawn from this population. The joint distribution of $(X_1,X_2,\ldots,X_n)$ is

$$\prod_{i=1}^{n}p(x_i,\theta)$$

Now let $(x_1,x_2,\ldots,x_n)$ be a set of observed values of $(X_1,X_2,\ldots,X_n)$. The probability that the sample $X_1,X_2,\ldots,X_n$ takes the observed values $x_1,x_2,\ldots,x_n$ is

$$L(\theta)=L(x_1,x_2,\ldots,x_n;\theta)=\prod_{i=1}^{n}p(x_i,\theta)$$

This probability varies with $\theta$; viewed as a function of $\theta$, $L(\theta)$ is called the likelihood function of the sample. Because a product of many factors is awkward to work with, we take the natural logarithm of $L(\theta)$, which turns the product into a sum:

$$\ln L(\theta)=\sum_{i=1}^{n}\ln p(x_i,\theta)$$

2. The logistic function

The logistic function, or logistic curve, is the common "S"-shaped sigmoid curve, given by:

$$f(x)=\frac{L}{1+e^{-k(x-x_0)}}$$
where

  • $e$ is the base of the natural logarithm;
  • $x_0$ is the $x$-value of the sigmoid's midpoint;
  • $L$ is the curve's maximum value;
  • $k$ is the steepness of the curve.

(Figure: the logistic curve for $L=1$, $k=1$, $x_0=0$.)
Here we mainly point out a property of this function's derivative, which will be used in the derivation later.
$$f(x)=\frac{1}{1+e^{-x}}=\frac{e^{x}}{1+e^{x}}$$

$$\frac{d}{dx}f(x)=\frac{e^{x}(1+e^{x})-e^{x}\,e^{x}}{(1+e^{x})^2}$$

$$\frac{d}{dx}f(x)=\frac{e^{x}}{(1+e^{x})^2}=f(x)(1-f(x))$$

Mathematical derivation of logistic regression

Let's start with a simple example:

(Figure: points in the plane separated into two classes by a red line.)

We classify the points in the plane into two categories, with the red line as the boundary: predict class $y=1$ if $-3+x_1+x_2\geq 0$, and predict class $y=0$ if $-3+x_1+x_2<0$.
In this example,

$$h_{\theta}(x)=g(\theta_0+\theta_1x_1+\theta_2x_2)$$

When classifying higher-dimensional data, a linear decision boundary takes the following form (using the convention $x_0=1$, so the intercept $\theta_0$ is absorbed into $\theta^Tx$):

$$\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n=\theta^Tx$$
From logistic regression, the prediction function is:

$$h_{\theta}(x^{(i)})=g(\theta^Tx^{(i)})=\frac{1}{1+e^{-\theta^Tx^{(i)}}}$$
The value of $h_{\theta}(x)$ has a special meaning: it is the probability that the outcome is 1. Therefore, for an input $x$, the probabilities of class 1 and class 0 are, respectively:

$$P(y=1\mid x;\theta)=h_{\theta}(x)$$

$$P(y=0\mid x;\theta)=1-h_{\theta}(x)$$

These two cases can be combined into a single expression:

$$P(y\mid x;\theta)=(h_\theta(x))^{y}(1-h_\theta(x))^{1-y}$$

(Plugging in $y=1$ recovers $h_\theta(x)$, and $y=0$ recovers $1-h_\theta(x)$.)
Taking the likelihood over the $m$ training examples gives:

$$L(\theta)=\prod_{i=1}^{m}P(y^{(i)}\mid x^{(i)};\theta)$$
Taking the natural logarithm yields the log-likelihood function:

$$l(\theta)=\ln L(\theta)=\sum_{i=1}^{m}\left(y^{(i)}\ln h_{\theta}(x^{(i)})+(1-y^{(i)})\ln\bigl(1-h_{\theta}(x^{(i)})\bigr)\right)$$
Maximum likelihood estimation finds the $\theta$ that maximizes $l(\theta)$; the resulting $\theta$ is the best-fit parameter vector. We solve for it with gradient ascent, which updates $\theta$ as follows:

$$\theta_{j} := \theta_{j} + \alpha \frac{\partial}{\partial\theta_{j}}l(\theta)\qquad(j=0,\ldots,n)$$
First, we compute the partial derivative of $l(\theta)$:
$$\frac{\partial}{\partial\theta_{j}}l(\theta)=\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)})-(1-y^{(i)})\frac{1}{1-h_{\theta}(x^{(i)})}\frac{\partial}{\partial\theta_{j}}h_{\theta}(x^{(i)})\right)$$

$$=\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{g(\theta^Tx^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta^Tx^{(i)})}\right)\frac{\partial}{\partial\theta_{j}}g(\theta^Tx^{(i)})$$

Because $g(\theta^Tx^{(i)})$ is the logistic function,

$$g(\theta^Tx^{(i)})=\frac{1}{1+e^{-\theta^Tx^{(i)}}}$$
we can use the derivative property of the logistic function shown earlier to transform the partial derivative of $l(\theta)$:
$$\frac{\partial}{\partial\theta_{j}}l(\theta)=\sum_{i=1}^{m}\left(y^{(i)}\frac{1}{g(\theta^Tx^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta^Tx^{(i)})}\right)g(\theta^Tx^{(i)})\bigl(1-g(\theta^Tx^{(i)})\bigr)\frac{\partial}{\partial\theta_{j}}\theta^Tx^{(i)}$$

$$=\sum_{i=1}^{m}\left(y^{(i)}\bigl(1-g(\theta^Tx^{(i)})\bigr)-(1-y^{(i)})g(\theta^Tx^{(i)})\right)x_{j}^{(i)}$$

$$=\sum_{i=1}^{m}\left(y^{(i)}-g(\theta^Tx^{(i)})\right)x_{j}^{(i)}$$

$$=\sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}(x^{(i)})\right)x_{j}^{(i)}$$

This yields the update rule:

$$\theta_{j} := \theta_{j} + \alpha \sum_{i=1}^{m}\left(y^{(i)}-h_{\theta}(x^{(i)})\right)x_{j}^{(i)}\qquad(j=0,\ldots,n)$$

In matrix form this is $\theta := \theta + \alpha X^T\bigl(y - g(X\theta)\bigr)$, which is exactly what the implementation below computes.

Python implementation

Runtime environment for the code in this article:
python: 3.5.1
pandas: 0.19.2
Other environments may behave slightly differently.

# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data: the leading columns are features (column 0 is a constant
# 1.0 bias term) and the last column is the 0/1 class label.
data = pd.read_table("./logistic.txt", sep="\t", header=None)
dataMat = data.iloc[:, 0:-1]
labelMat = data.iloc[:, -1]


def sigmoid(dataSeries):
    return 1.0 / (1 + np.exp(-dataSeries))


# Batch gradient ascent: theta := theta + alpha * X^T (y - g(X theta))
def gradAscent(dataMatrix, LabelsVector):
    n = dataMatrix.shape[1]  # number of features (bias column included)
    alpha = 0.001            # learning rate
    maxCycles = 500          # number of iterations
    thetas = np.ones((n, 1))
    for k in range(maxCycles):  # heavy on matrix operations
        h = sigmoid(dataMatrix * thetas)  # matrix mult: h_theta for all rows
        error = LabelsVector.T - h        # vector subtraction: y - h
        thetas = thetas + alpha * dataMatrix.T * error  # matrix mult
    return thetas


def plotBestFit(thetas, data):
    """
    :param thetas: type DataFrame, the fitted thetas
    :param data: type DataFrame, all the data
    :return: None (displays the plot)
    """
    X1 = data[data[3] == 0]  # points with label 0
    X2 = data[data[3] == 1]  # points with label 1
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(X1[1], X1[2], s=30, c='red', marker='s')
    ax.scatter(X2[1], X2[2], s=30, c='green')
    x = np.arange(-3.0, 3.0, 0.1)
    # Decision boundary: theta0 + theta1 * x1 + theta2 * x2 = 0
    y = (-thetas.iloc[0, 0] - thetas.iloc[1, 0] * x) / thetas.iloc[2, 0]
    ax.plot(x, y)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()

thetas = gradAscent(np.mat(dataMat), np.mat(labelMat))
plotBestFit(pd.DataFrame(thetas), data)

The resulting plot is shown below:

(Figure: scatter of the two classes with the fitted decision boundary line.)
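Since $h_\theta(x)$ is the probability of class 1, the fitted thetas can also classify new points by thresholding that probability at 0.5. A usage sketch (the classify helper and the sample point are mine, assuming column 0 of logistic.txt is the constant 1.0 bias term that the plotting code implies):

def classify(features, thetas):
    # features: [1.0, x1, x2] -- the leading 1.0 is the bias term
    prob = sigmoid(np.mat(features) * thetas)  # P(y = 1 | x; theta)
    return 1 if prob >= 0.5 else 0

print(classify([1.0, 0.5, 7.0], thetas))  # hypothetical new point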

Code and data download: http://pan.baidu.com/s/1hs6CKL2 (password: 308l)

References

1. https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
2. https://en.wikipedia.org/wiki/Logistic_function
