
Reducing the Dimensionality of Data with Neural Networks: Dimensionality Reduction and Reconstruction

2019-08-27 08:50


 

1. Data Dimensionality

PCA: principal component analysis
PCA is an analysis method applied across all kinds of data analysis, including feature set compression.
Whenever you visualize data, you can apply principal component analysis.

Two-dimensional data


One-dimensional data


The data is not strictly one-dimensional; there are small deviations in places. But to understand the data, I am willing to treat those deviations as noise and regard this as a one-dimensional data set:


PCA is particularly good at handling shifts and rotations of the coordinate system.

[Reading notes] Autoencoder-based data dimensionality reduction and reconstruction

High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors.

6. PCA for Data Transformation

Whatever the shape of your data:

PCA finds a new coordinate system that's obtained from the old one by translation and rotation only.

PCA moves the center of the coordinate system to the center of the data.

PCA moves the x-axis onto the principal axis of variation, where you see the most variation relative to all the data points.

PCA moves the y-axis to an orthogonal, less important direction of variation.

Principal component analysis finds these axes for you and tells you how important each axis is.
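These steps can be sketched in a few lines of numpy. The data set below is invented for illustration: points centered at (2, 3) and spread mostly along the direction (1, 1), so PCA should recover both the translation (the mean) and the rotation (the principal axis).

```python
import numpy as np
from numpy.linalg import eigh

# Hypothetical 2-D data: large spread along (1, 1)/sqrt(2), centered at (2, 3).
rng = np.random.RandomState(0)
t = rng.normal(scale=2.0, size=200)        # large spread along the main axis
noise = rng.normal(scale=0.1, size=200)    # small spread along the other axis
direction = np.array([1.0, 1.0]) / np.sqrt(2)
ortho = np.array([-1.0, 1.0]) / np.sqrt(2)
X = np.array([2.0, 3.0]) + np.outer(t, direction) + np.outer(noise, ortho)

# Translation: PCA moves the origin to the mean of the data.
center = X.mean(axis=0)

# Rotation: the principal axes are the eigenvectors of the covariance matrix,
# ordered by eigenvalue (= variance along that axis).
cov = np.cov((X - center).T)
eigvals, eigvecs = eigh(cov)               # eigenvalues in ascending order
first_pc = eigvecs[:, -1]                  # axis of maximal variance

print(center)      # close to (2, 3)
print(first_pc)    # close to +/- (1/sqrt(2), 1/sqrt(2))
```

The eigenvector with the largest eigenvalue is the new x-axis; the orthogonal one is the new, less important y-axis.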

Category: DL | 2013-07-20 18:18 | 1655 reads | 1 comment

High-dimensional data can be converted to low-dimensional codes by training a multilayer neural network with a small central layer to reconstruct high-dimensional input vectors.


7. The Center of the New Coordinate System

(2,3)


This is an academic paper I found while learning about autoencoders and read through. I mainly wanted to answer two questions: 1. how does an autoencoder differ from an RBM; 2. how is the result an autoencoder learns actually used.

In the paper's "autoencoder" networks, gradient descent can be used to fine-tune the weights precisely, but only if the initial weights are close enough to a good solution.


8. The Principal Axis of the New Coordinate System


Δx = 1
Δy = 1

Basic information about the paper:

Gradient descent can be used for fine-tuning the weights in such "autoencoder" networks, but this works well only if the initial weights are close to a good solution.


9. The Second Principal Component of the New System


Δx = -1
Δy = 1

When writing vectors in PCA, by convention the lowest output value of the vector is set to 1.

After normalizing the PCA component vectors:
Δx (black) = 1/√2
Δy (black) = 1/√2  # the new x-axis
The vectors for the new x-axis and the new y-axis are orthogonal.
Δx (red) = -1/√2
Δy (red) = 1/√2  # the new y-axis
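A quick numpy check of the two normalized axis vectors above (the values are as given; only the check itself is new):

```python
import numpy as np

# The two unit vectors (1/sqrt(2), 1/sqrt(2)) and (-1/sqrt(2), 1/sqrt(2))
# form an orthonormal basis: the new x-axis and the new y-axis.
new_x = np.array([1.0, 1.0]) / np.sqrt(2)   # black: the new x-axis
new_y = np.array([-1.0, 1.0]) / np.sqrt(2)  # red: the new y-axis

print(np.dot(new_x, new_x))  # ~1.0 (unit length)
print(np.dot(new_y, new_y))  # ~1.0 (unit length)
print(np.dot(new_x, new_y))  # ~0.0 (orthogonal)
```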

Title: Autoencoder-based data dimensionality reduction and reconstruction

We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that, as a tool for reducing the dimensionality of data, work better than principal components analysis.


11. Exercise: Finding the New Axes

PCA also gives you an important number for each axis: its spread value.
When the scatter around the main axis is small, the spread value tends to be large for the principal axis and much smaller for the second principal component.

Author: 胡昭华 (Hu Zhaohua)

We describe an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.


12. Which Data Can Be Used with PCA


Part of the beauty of PCA is that the data doesn't have to be perfectly 1D in order to find the principal axis!

Affiliation: School of Electronic and Optical Engineering, Nanjing University of Science and Technology

Dimensionality reduction facilitates the classification, visualization, communication, and storage of high-dimensional data.


13. When Does an Axis Dominate

Does the long axis dominate?
The long axis is said to dominate when its importance value, that is, the long axis's eigenvalue, is larger than the short axis's eigenvalue.

Year: 2009, published in Journal of Electronics & Information Technology (《电子与信息学报》)

Dimensionality reduction facilitates the classification, visualization, communication, and storage of high-dimensional data.

 

14. Exercise: Measurable Features vs. Latent Features

Given some parameters of a house, which of the following algorithms should you use to predict its price?
□ decision tree classifier
□ SVC
□ √ linear regression
Because the output we expect is continuous, a classifier is not appropriate.

Notes follow:

A simple and widely used method is principal components analysis, which finds the directions of greatest variance in the data set and represents each data point by its coordinates along each of these directions.

 

15. From Four Features to Two

Given some parameters of a house, predict its price.

Measurable features:
square footage
no. of rooms
school ranking
neighborhood safety

Latent features:
size
neighborhood

  1. An autoencoder is used for dimensionality reduction

  2. Existing dimensionality-reduction methods

A simple and widely used method is principal components analysis, which finds the directions of greatest variance in the data set and represents each data point by its coordinates along each of these directions.


16. Compressing While Preserving Information

What is the best way to compress the four features into two so that we really capture the core information? What we actually want to investigate are the two features size and neighborhood. Which is the most suitable tool for selecting features?
□ √ SelectKBest (K is the number of features to keep)
□ SelectPercentile (specify the percentage of features you want to keep)
Because we already know we want two features, we use SelectKBest: it keeps the two strongest features and discards all the others.

If we knew how many candidate features there were as well as how many features we need in the end, we could also use SelectPercentile.
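A hedged sketch of SelectKBest on made-up housing-style data: the feature and target values below are invented, and only the first two columns actually drive the target, so k=2 should keep exactly those two.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical data: four "house" features (sq. footage, rooms, school, safety),
# of which only the first two actually influence the price.
rng = np.random.RandomState(42)
X = rng.normal(size=(100, 4))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Keep the K strongest features according to a univariate regression test.
selector = SelectKBest(score_func=f_regression, k=2)
X_two = selector.fit_transform(X, y)

print(X_two.shape)             # (100, 2)
print(selector.get_support())  # mask of the columns that were kept
```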

Linear dimensionality-reduction methods such as principal component analysis, independent component analysis, and factor analysis work well when the high-dimensional data set has linear structure and a Gaussian distribution. When the data set is highly warped in the high-dimensional space, these methods have difficulty discovering the nonlinear structure embedded in the data set and recovering its intrinsic structure. An autoencoder handles these problems well.

Download link for the paper and complete source code:


17. Composite Features

I have many features available, but suppose only a small subset of them drives the patterns in the data. I will then construct a composite feature from them in order to get at the underlying phenomenon.
This composite/combined feature is also called a principal component. PCA is a very powerful algorithm; in this lesson we mainly discuss it for feature dimensionality reduction, shrinking a large pile of features down to just a few.
PCA is also a very powerful standalone algorithm in unsupervised learning.


Example: turning square footage and no. of rooms into size.
The picture above looks a bit like linear regression, but PCA is not linear regression. Linear regression tries to predict an output corresponding to the input values, whereas PCA does not predict anything; it determines the dominant direction of the data, so that the data can be projected onto that direction while losing as little information as possible.

Once I have found the principal component, that is, the direction of this vector, I apply a treatment to every data point, called projection. The data starts out two-dimensional, but after I project it onto the principal component it becomes one-dimensional.
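The projection step can be sketched in a few lines (the points and the direction here are hypothetical):

```python
import numpy as np

# Two-dimensional points collapsed onto the direction (1, 1)/sqrt(2);
# each point is replaced by a single coordinate along that direction.
points = np.array([[1.0, 1.0],
                   [2.0, 2.0],
                   [3.0, 2.0]])
pc = np.array([1.0, 1.0]) / np.sqrt(2)   # the principal component (unit vector)

centered = points - points.mean(axis=0)  # PCA centers the data first
coords_1d = centered @ pc                # one number per point: the new feature

print(coords_1d.shape)   # (3,): the data is now one-dimensional
```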

  1. How the autoencoder works


18. Maximal Variance

variance

  • the willingness/flexibility of an algorithm to learn
  • technical term in statistics: roughly the "spread" of a data distribution (similar to standard deviation)
    A feature with large variance has samples spread over a wide range of values; with small variance, the samples are usually tightly clustered.

In the figure above, draw an ellipse around the data so that it contains most of the points. The ellipse can be parameterized by two numbers: the lengths of its long axis and its short axis. Of these two lines, which one points in the direction of maximal variance of the data, that is, along which direction are the data more spread out?

The line along the long axis is the direction of maximal variance.

The encoder network performs the dimensionality reduction: it maps the high-dimensional raw data down onto a low-dimensional embedded structure of a certain dimensionality. The decoder network performs the reconstruction and can be seen as the inverse of the encoder: it maps points on the low-dimensional embedding back to high-dimensional data. Between the encoder and decoder there is a shared part called the "code layer", the core of the whole autoencoder network: it captures the essential regularities of a high-dimensional data set with an embedded structure and determines the data set's intrinsic dimensionality.
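As a toy illustration of the encoder / code layer / decoder structure, here is a linear autoencoder with a one-unit code layer trained by gradient descent on synthetic 2-D data. Everything here is invented for illustration; the networks in the paper are deep and nonlinear.

```python
import numpy as np

# Synthetic 2-D data that is essentially one-dimensional (points near the line y = 2x).
rng = np.random.RandomState(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2.0 * t]) + 0.05 * rng.normal(size=(200, 2))

W_enc = rng.normal(scale=0.1, size=(2, 1))   # encoder: 2-D input -> 1-D code layer
W_dec = rng.normal(scale=0.1, size=(1, 2))   # decoder: 1-D code -> 2-D reconstruction
lr = 0.01

def loss(W_enc, W_dec):
    return np.mean((X - X @ W_enc @ W_dec) ** 2)

err_before = loss(W_enc, W_dec)
for _ in range(500):
    code = X @ W_enc                         # encode
    X_hat = code @ W_dec                     # decode (reconstruct)
    diff = X_hat - X                         # reconstruction error
    # Gradients of the squared-error loss (constant factor absorbed into lr).
    grad_dec = code.T @ diff / len(X)
    grad_enc = X.T @ diff @ W_dec.T / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
err_after = loss(W_enc, W_dec)

print(err_before > err_after)   # training reduces the reconstruction error
```

With a linear network and squared error, this is closely related to PCA; the point of the paper is that deep nonlinear versions of the same encode/decode idea can do much better, given good initial weights.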



19. The Advantage of Maximal Variance

The principal component of a data set is the direction that has the largest variance. Why?


Why do you think we define the principal component this way? What's the advantage of looking for the direction that has the largest variance? When we project this two-dimensional feature space down onto one dimension, why do we project all the data points onto the long red line instead of the shorter line?
□ low computational complexity
□ √ it retains the maximum amount of information from the original data
□ it is just a convention; there is no real reason behind it

When we project along the dimension of maximal variance, we preserve the most information from the original data.

The autoencoder network works as follows: first initialize the weights of the encoder and decoder networks, then train the autoencoder by minimizing the error between the original training data and its reconstruction. If the autoencoder's initial weights are close to the optimum, gradient descent achieves very good training results. Hinton and Salakhutdinov used a two-layer network called a restricted Boltzmann machine (RBM) to obtain suitable initial weights for the autoencoder, and then trained the autoencoder with backpropagation. When the input is a vector of real values, a CRBM (continuous RBM) is needed.


20. Maximal Variance and Information Loss

safety problems, school ranking → (PCA) neighborhood quality
find the direction of maximal variance
The direction of maximal variance is the direction that minimizes the loss of information.


When I project these two-dimensional points onto the one-dimensional line, information is lost; the amount lost for a given point equals the distance between that point and its new position on the line.

  1. Experiments


21. Information Loss and Principal Components

Information loss: the sum of the distances between each point and its newly projected position on the line.

When we maximize the variance, we are actually minimizing the distance between each point and its projection onto the line.
Projection onto the direction of maximal variance minimizes the distance from the old (higher-dimensional) data point to its new transformed value
→ minimizes information loss
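This equivalence can be checked numerically on synthetic data: the mean squared point-to-line distance is smallest for the first principal component, here compared against projecting onto the y-axis instead.

```python
import numpy as np

# Hypothetical elongated 2-D data.
rng = np.random.RandomState(1)
t = rng.normal(scale=3.0, size=300)
X = np.column_stack([t, 0.5 * t + rng.normal(scale=0.3, size=300)])
X = X - X.mean(axis=0)

def reconstruction_error(X, direction):
    """Mean squared distance from each point to its projection on the line."""
    d = direction / np.linalg.norm(direction)
    X_proj = np.outer(X @ d, d)          # project each point onto the line
    return np.mean(np.sum((X - X_proj) ** 2, axis=1))

# First principal component = top eigenvector of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
pc1 = eigvecs[:, -1]

err_pc = reconstruction_error(X, pc1)
err_other = reconstruction_error(X, np.array([0.0, 1.0]))
print(err_pc < err_other)   # max variance direction <=> min information loss
```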

The experiments are done on image data, mainly comparing which model reconstructs the data better. I did not fully understand this part.

 

23. PCA for Feature Transformation

PCA as a general algorithm for feature transformation
We feed all four features into PCA together; it automatically combines them into new features and ranks the relative power of those new features. If, as in our case, there are two latent features driving most of the variation in the data, PCA will pick them out and make them the first and second principal components, the first principal component being the most influential feature.
Because the first principal component is produced by mixing, it may contain a little of every input feature. Still, this unsupervised algorithm is very powerful and can give you a fundamental understanding of the latent features in your data. Even if you knew nothing about house prices, PCA would still let you form insights such as: overall, two factors drive the variation in house prices. Whether those two factors are neighborhood and size is up to you to decide. So besides performing dimensionality reduction, you also learn important information about the patterns of variation in the data.

  1. Other


25. Review/Definition of PCA

review/definition of PCA

  • a systematized way to transform input features into principal components
  • use the principal components as new features in regression/classification
  • you can also rank the principal components: the more variance the data has along a given principal component, the higher that principal component is ranked. The one with the most variance is the first principal component, the second will be the second principal component, and so on.
  • the principal components are all perpendicular to each other, so the second principal component is mathematically guaranteed not to overlap at all with the first, the third will not overlap with the first two, and so on; in that sense you can treat them as independent features.
  • there is a maximum number of principal components you can find: it equals the number of input features in your data set. Usually you'll only use the first handful of principal components, but you could go all the way and use the maximum number. In that case, though, you are not really gaining anything; you are just representing your features in a different way. PCA won't give you a wrong answer, but if you use all of the principal components together in a regression or classification task, it gives you no advantage over just using the original input features.
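The ranking and maximum-count points above can be verified with a few lines of sklearn. The data below is made up, with deliberately unequal column scales so the ranking is visible.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data whose four columns have different spreads.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

pca = PCA()   # no n_components: keep the maximum number of components
pca.fit(X)

ratios = pca.explained_variance_ratio_
print(pca.n_components_)   # 4: at most as many components as input features
print(ratios)              # ranked from largest variance to smallest
print(ratios.sum())        # ~1.0 when all components are kept
```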

This paper answered my first question, the difference between an autoencoder and an RBM, but not my second.


26. Applying PCA to Real Data

In the next few videos, Katie and Sebastian look at some of Enron's financial data and examine applications of PCA.

Remember, to get the repository containing the project code and this data set, visit:

https://github.com/udacity/ud120-projects

The Enron data is at: final_project/


28. PCA in sklearn

import matplotlib.pyplot as plt

def doPCA():
    from sklearn.decomposition import PCA
    pca = PCA(n_components=2)
    pca.fit(data)  # `data` is defined earlier in the lesson
    return pca

pca = doPCA()
print pca.explained_variance_ratio_  # variance ratio: the concrete form of the eigenvalues; tells you what percentage of the data's variation the first/second principal component accounts for
first_pc = pca.components_[0]
second_pc = pca.components_[1]

transformed_data = pca.transform(data)
for ii, jj in zip(transformed_data, data):
    plt.scatter(first_pc[0] * ii[0], first_pc[1] * ii[0], color='r')
    plt.scatter(second_pc[0] * ii[1], second_pc[1] * ii[1], color='c')
    plt.scatter(jj[0], jj[1], color='b')


29. When to Use PCA

  • latent features driving the patterns in data (big shots at Enron)
    If you want access to latent features that you think might be showing up in the patterns of your data, maybe the entire point of what you're trying to do is figure out whether there is a latent feature; in other words, you just want to know the size of the first principal component, and then measure who the big shots are at Enron.
  • dimensionality reduction
    -- visualize high-dimensional data
    Sometimes you have more than two features: you would have to represent three, four, or many numbers about a data point when you only have two dimensions in which to draw. What you can do is project the data down to the first two principal components and just draw that scatter plot.
    -- reduce noise
    The hope is that your strongest principal components, the first and the second, capture the actual patterns in the data, while the smaller principal components just represent noisy variations about those patterns; so by throwing away the less important principal components, you get rid of that noise.
    -- make other algorithms (regression, classification) work better with fewer inputs (eigenfaces)
    You can use PCA as pre-processing before another algorithm, say a regression or classification task. If you have very high dimensionality and a complex classification algorithm, the algorithm can have very high variance, end up fitting noise in the data, or run very slowly; lots of things can happen when you have very high input dimensionality with some of these algorithms. One thing you can do is use PCA to reduce the dimensionality of your input features so that your classification algorithm works better.
    In the example of eigenfaces, a method of applying PCA to pictures of people, the input space has very high dimensionality: many, many pixels per picture. Say you want to identify who is pictured in the image, i.e. run some kind of facial identification. With PCA you can reduce the very high input dimensionality to something maybe a factor of ten lower and feed this into an SVM, which can then do the actual classification of trying to figure out who's pictured. The inputs are now the principal components instead of the original pixels of the images.


30. PCA for Facial Recognition

PCA for facial recognition
What makes facial recognition in pictures well suited to PCA?
□ √ pictures of faces generally have high input dimensionality (many pixels)
In this situation dimensionality reduction matters a lot, because an SVM has a hard time handling a million features.
□ √ faces have general patterns that could be captured in a smaller number of dimensions (two eyes on top, mouth/chin on bottom, etc.)
Between two portraits, it is not the case that all million pixels differ; there are only a few main points of difference, and we may be able to use PCA to pick those out and put them to best use.
□ × facial recognition is simple using machine learning (humans do it easily)
It is very hard to implement facial recognition with, say, a decision tree.


31. Eigenfaces Code

For facial recognition, combining PCA with an SVM is very powerful.

"""
===================================================
Faces recognition example using eigenfaces and SVMs
===================================================

The dataset used in this example is a preprocessed excerpt of the
"Labeled Faces in the Wild", aka LFW_:

  http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)

  .. _LFW: http://vis-www.cs.umass.edu/lfw/

  original source: http://scikit-learn.org/stable/auto_examples/applications/face_recognition.html

"""
print __doc__

from time import time
import logging
import pylab as pl
import numpy as np

from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import RandomizedPCA
from sklearn.svm import SVC

# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
###############################################################################
# Download the data, if not already on disk and load it as numpy arrays
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape
np.random.seed(42)

# for machine learning we use the data directly (as relative pixel
# position info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]

# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print "Total dataset size:"
print "n_samples: %d" % n_samples
print "n_features: %d" % n_features
print "n_classes: %d" % n_classes


###############################################################################
# Split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150

print "Extracting the top %d eigenfaces from %d faces" % (n_components, X_train.shape[0])
t0 = time()
pca = RandomizedPCA(n_components=n_components, whiten=True).fit(X_train) # figuring out what the principal components are
print "the ratio is ", pca.explained_variance_ratio_ # explained variance of each principal component: 0.19346534  0.15116844
print "done in %0.3fs" % (time() - t0)

eigenfaces = pca.components_.reshape((n_components, h, w)) #asks for the eigenfaces

print "Projecting the input data on the eigenfaces orthonormal basis"
t0 = time()
X_train_pca = pca.transform(X_train) # transform data into the principal components representation
X_test_pca = pca.transform(X_test)
print "done in %0.3fs" % (time() - t0)


###############################################################################
# Train a SVM classification model

print "Fitting the classifier to the training set"
t0 = time()
param_grid = {
    'C': [1e3, 5e3, 1e4, 5e4, 1e5],
    'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
# for sklearn version 0.16 or prior, the class_weight parameter value is 'auto'
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)  # SVC using the principal components as the features
print "done in %0.3fs" % (time() - t0)
print "Best estimator found by grid search:"
print clf.best_estimator_


###############################################################################
# Quantitative evaluation of the model quality on the test set

print "Predicting the people names on the testing set"
t0 = time()
y_pred = clf.predict(X_test_pca) # the SVC tries to identify who appears in each test-set picture
print "done in %0.3fs" % (time() - t0)

print classification_report(y_test, y_pred, target_names=target_names)
print confusion_matrix(y_test, y_pred, labels=range(n_classes))


###############################################################################
# Qualitative evaluation of the predictions using matplotlib

def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    pl.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    pl.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        pl.subplot(n_row, n_col, i + 1)
        pl.imshow(images[i].reshape((h, w)), cmap=pl.cm.gray)
        pl.title(titles[i], size=12)
        pl.xticks(())
        pl.yticks(())


# plot the result of the prediction on a portion of the test set

def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                         for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# plot the gallery of the most significative eigenfaces

eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

pl.show()

The eigenfaces are basically the principal components of the face data.


At last, the algorithm shows you the eigenfaces.
In the SVM, the composite images produced by PCA are used as features, and they are very useful for predicting the identity of the face in a picture.


33. PCA Mini-Project

We spent a lot of time on theory while discussing PCA, so in this mini-project we will ask you to write some sklearn code. The eigenfaces code is interesting and rich enough to serve as a testbed for this entire mini-project.

The starter code can be found in pca/eigenfaces.py. It is mostly taken from the example in the sklearn documentation here.

Note that when running the code, one parameter has changed for the SVC call on line 94 of pca/eigenfaces.py. For the "class_weight" parameter, the string "auto" is a valid value for sklearn version 0.16 and earlier, but is deprecated as of 0.19. If you are running sklearn version 0.17 or later, the expected string is "balanced". If you get an error or warning when running pca/eigenfaces.py, make sure line 98 contains the correct parameter for your installed sklearn version.

sklearn 0.16 or earlier: class_weight='auto'
sklearn 0.17 or later: class_weight='balanced'


34. Explained Variance of Each Principal Component

We mentioned that PCA ranks the principal components: the first principal component has the largest variance, the second the second-largest, and so on. How much of the variance is explained by the first principal component? And by the second?

print "the ratio is ", pca.explained_variance_ratio_  # explained variance of each principal component: 0.19346534  0.15116844

How much of the variation does the first principal component explain? 0.19346534
And the second? 0.15116844

We have found that the Pillow module (used in this example) can sometimes cause trouble. If you get an error related to the fetch_lfw_people() command, try the following command:

pip install --upgrade PILLOW

 

35. How Many Principal Components Should You Use?

Now you will experiment with keeping different numbers of principal components. In a multiclass classification problem like this one (more than two labels apply), accuracy is a less intuitive metric than in the two-class case. Instead, a more common metric is the F1 score.
We will learn about the F1 score in the evaluation-metrics lesson, but you should figure out for yourself whether a good classifier is characterized by a high or a low F1 score. You will do this by varying the number of principal components and watching how the F1 score changes in response.
As you add more principal components as features for training your classifier, do you expect it to get better or worse performance?
□ √ could go either way
While ideally adding components should provide us additional signal to improve our performance, it is possible that we end up at a complexity where we overfit.


36. F1 Score vs. Number of Principal Components Used

Change n_components to the following values: [10, 15, 25, 50, 100, 250]. For each number of components, note the F1 score for Ariel Sharon. (With 10 principal components, the plotting code will break, but you should still be able to see the F1 scores.)

Ariel Sharon F1 score:
n_components = 10  → f1 = 0.11
n_components = 15  → f1 = 0.33
n_components = 50  → f1 = 0.67
n_components = 100 → f1 = 0.67
n_components = 150 → f1 = 0.65
n_components = 250 → f1 = 0.62

If you see a higher F1 score, does that mean the classifier is doing better, or worse?
□ √ better

 

37. Dimensionality Reduction and Overfitting

Did you see any evidence of overfitting when using a large number of PCs? Did PCA's dimensionality reduction help performance?
□ √ yes, performance starts to drop with many PCs

 

38. Selecting the Number of Principal Components

selecting a number of principal components
Think about how to select how many principal components you should look at. There is no cut-and-dried answer for how many principal components to use; you have to figure it out.

What's a good way to figure out how many PCs to use?
□ × just take the top 10%
□ √ train on different numbers of PCs, and see how accuracy responds; cut off when it becomes apparent that adding more PCs doesn't buy you much more discrimination
□ × perform feature selection on input features before putting them into PCA, then use as many PCs as you have input features
PCA is going to find a way to combine information from potentially many different input features, so if you throw out input features before you do PCA, you are throwing away information that PCA might have been able to rescue. It's fine to do feature selection on the principal components after you have made them, but you want to be very careful about throwing out information before performing PCA.
PCA can be fairly computationally expensive, so if you have a very large input feature space and you know that a lot of the features are potentially completely irrelevant, go ahead and try tossing them out, but proceed with caution.
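The recommended procedure can be sketched as below. Everything here is a stand-in chosen for illustration: the sklearn digits data in place of the faces, logistic regression in place of the SVM, and an arbitrary list of component counts.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Train with different numbers of principal components and watch how
# held-out accuracy responds.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

scores = {}
for n in [5, 10, 20, 40]:
    pca = PCA(n_components=n).fit(X_train)       # fit PCA on training data only
    clf = LogisticRegression(max_iter=2000)
    clf.fit(pca.transform(X_train), y_train)
    scores[n] = clf.score(pca.transform(X_test), y_test)

for n, s in sorted(scores.items()):
    print(n, round(s, 3))
```

The cut-off is where the score curve flattens out (or begins to drop): past that point, extra components aren't buying you much more discrimination.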

 

 

Fig. 1. Pretraining consists of learning a stack of restricted Boltzmann machines (RBMs), each having only one layer of feature detectors. The learned feature activations of one RBM are used as the "data" for training the next RBM in the stack. After the pretraining, the RBMs are "unrolled" to create a deep autoencoder, which is then fine-tuned using backpropagation of error derivatives.

 

 

 

 

 

Starting with random weights in the two networks, they can be trained together by minimizing the discrepancy between the original data and its reconstruction. The required gradients are easily obtained by using the chain rule to backpropagate error derivatives first through the decoder network and then through the encoder network (1). The whole system is called an autoencoder and is depicted in Fig. 1.

It is difficult to optimize the weights in nonlinear autoencoders that have multiple hidden layers (2–4). With large initial weights, autoencoders typically find poor local minima; with small initial weights, the gradients in the early layers are tiny, making it infeasible to train autoencoders with many hidden layers. If the initial weights are close to a good solution, gradient descent works well, but finding such initial weights requires a very different type of algorithm that learns one layer of features at a time. We introduce this pretraining procedure for binary data, generalize it to real-valued data, and show that it works well for a variety of data sets.

An ensemble of binary vectors (e.g., images) can be modeled using a two-layer network called a restricted Boltzmann machine (RBM) (5, 6) in which stochastic, binary pixels are connected to stochastic, binary feature detectors using symmetrically weighted connections. The pixels correspond to "visible" units of the RBM because their states are observed; the feature detectors correspond to "hidden" units. A joint configuration (v, h) of the visible and hidden units has an energy (7) given by

E(v, h) = -\sum_{i \in \text{pixels}} b_i v_i - \sum_{j \in \text{features}} b_j h_j - \sum_{i,j} v_i h_j w_{ij}

where v_i and h_j are the binary states of pixel i and feature j, b_i and b_j are their biases, and w_ij is the weight between them. The network assigns a probability to every possible image via this energy function, as explained in (8).
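The energy formula transcribes directly into numpy. The tiny network below (3 visible units, 2 hidden units, made-up states, biases, and weights) is purely illustrative:

```python
import numpy as np

# E(v, h) = -sum_i b_i v_i - sum_j b_j h_j - sum_ij v_i h_j w_ij
v = np.array([1.0, 0.0, 1.0])        # binary states of the visible (pixel) units
h = np.array([1.0, 1.0])             # binary states of the hidden (feature) units
b_v = np.array([0.1, -0.2, 0.3])     # visible biases
b_h = np.array([0.05, -0.1])         # hidden biases
W = np.array([[0.5, -0.3],
              [0.2, 0.4],
              [-0.1, 0.6]])          # w_ij: weight between pixel i and feature j

E = -b_v @ v - b_h @ h - v @ W @ h
print(E)   # lower energy = higher probability for this (v, h) configuration
```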

 

 

The probability of a training image can be raised by adjusting the weights and biases to lower the energy of that image and to raise the energy of similar, confabulated images that the network would prefer to the real data.

 

 

 

 

 

 

 
