Roteiv Talking



Classic TensorFlow network code, written with the TFLearn wrapper

Posted on 2017-11-06 | Category: deep-learning

AlexNet

"ImageNet Classification with Deep Convolutional Neural Networks" describes the model that Hinton and his student Alex Krizhevsky used in the 2012 ImageNet Challenge, setting a new record for image classification.

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression
import tflearn.datasets.oxflower17 as oxflower17
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))
# Building 'AlexNet'
network = input_data(shape=[None, 227, 227, 3])
network = conv_2d(network, 96, 11, strides=4, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 256, 5, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = conv_2d(network, 384, 3, activation='relu')
network = conv_2d(network, 384, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = max_pool_2d(network, 3, strides=2)
network = local_response_normalization(network)
network = fully_connected(network, 4096, activation='tanh')
network = dropout(network, 0.5)
network = fully_connected(network, 4096, activation='tanh')
network = dropout(network, 0.5)
network = fully_connected(network, 17, activation='softmax')
network = regression(network, optimizer='momentum',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
# Training
model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                    max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id='alexnet_oxflowers17')
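
After fit() finishes, the trained weights can be saved and reused for inference. A minimal sketch using TFLearn's DNN API (the file name is only an example):

model.save('alexnet_oxflowers17.tfl')   # persist the trained weights
model.load('alexnet_oxflowers17.tfl')   # restore them in a later session
pred = model.predict(X[0:1])            # class probabilities for one image, shape (1, 17)
print(pred)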

SqueezeNet

SqueezeNet reaches the same accuracy as AlexNet with roughly 50x fewer parameters!

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected, flatten
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.merge_ops import merge
from tflearn.layers.estimator import regression

# Placeholders expected by this snippet (it originally came from an
# emotion-recognition project); adjust the class labels and checkpoint
# directory to your own data.
EMOTIONS = ['class_%d' % i for i in range(10)]
SAVE_DIRECTORY = 'checkpoints'

network = input_data(shape=[None, 256, 256, 1])
network = conv_2d(network, 96, 3, strides=3, activation='relu')
network = max_pool_2d(network, 3, strides=2)
# Fire 2
fire2_squeeze = conv_2d(network, 16, 1, activation='relu')
fire2_expand1 = conv_2d(fire2_squeeze, 64, 1, activation='relu')
fire2_expand2 = conv_2d(fire2_squeeze, 64, 3, activation='relu')
# Concatenate the expand branches along the channel axis (NHWC -> axis 3)
network = merge([fire2_expand1, fire2_expand2], mode='concat', axis=3)
# Fire 3
fire3_squeeze = conv_2d(network, 16, 1, activation='relu')
fire3_expand1 = conv_2d(fire3_squeeze, 64, 1, activation='relu')
fire3_expand2 = conv_2d(fire3_squeeze, 64, 3, activation='relu')
network = merge([fire3_expand1, fire3_expand2], mode='concat', axis=3)
# Fire 4
fire4_squeeze = conv_2d(network, 32, 1, activation='relu')
fire4_expand1 = conv_2d(fire4_squeeze, 128, 1, activation='relu')
fire4_expand2 = conv_2d(fire4_squeeze, 128, 3, activation='relu')
network = merge([fire4_expand1, fire4_expand2], mode='concat', axis=3)
# MaxPool 4
network = max_pool_2d(network, 2)
# Fire 5
fire5_squeeze = conv_2d(network, 32, 1, activation='relu')
fire5_expand1 = conv_2d(fire5_squeeze, 128, 1, activation='relu')
fire5_expand2 = conv_2d(fire5_squeeze, 128, 3, activation='relu')
network = merge([fire5_expand1, fire5_expand2], mode='concat', axis=3)
# Fire 6
fire6_squeeze = conv_2d(network, 48, 1, activation='relu')
fire6_expand1 = conv_2d(fire6_squeeze, 192, 1, activation='relu')
fire6_expand2 = conv_2d(fire6_squeeze, 192, 3, activation='relu')
network = merge([fire6_expand1, fire6_expand2], mode='concat', axis=3)
# Fire 7
fire7_squeeze = conv_2d(network, 48, 1, activation='relu')
fire7_expand1 = conv_2d(fire7_squeeze, 192, 1, activation='relu')
fire7_expand2 = conv_2d(fire7_squeeze, 192, 3, activation='relu')
network = merge([fire7_expand1, fire7_expand2], mode='concat', axis=3)
# Fire 8
fire8_squeeze = conv_2d(network, 64, 1, activation='relu')
fire8_expand1 = conv_2d(fire8_squeeze, 256, 1, activation='relu')
fire8_expand2 = conv_2d(fire8_squeeze, 256, 3, activation='relu')
network = merge([fire8_expand1, fire8_expand2], mode='concat', axis=3)
# MaxPool 8
network = max_pool_2d(network, 2)
# Fire 9
fire9_squeeze = conv_2d(network, 64, 1, activation='relu')
fire9_expand1 = conv_2d(fire9_squeeze, 256, 1, activation='relu')
fire9_expand2 = conv_2d(fire9_squeeze, 256, 3, activation='relu')
network = merge([fire9_expand1, fire9_expand2], mode='concat', axis=3)
network = dropout(network, 0.5)
# Conv 10
network = conv_2d(network, 10, 1, activation='relu', padding='valid')
# Global average pooling over the remaining spatial dimensions
network = avg_pool_2d(network, 3)
network = flatten(network)
network = fully_connected(network, len(EMOTIONS), activation='softmax')
network = regression(network,
                     optimizer='momentum',
                     loss='categorical_crossentropy')
model = tflearn.DNN(
    network,
    checkpoint_path=SAVE_DIRECTORY + '/squeezenet',
    max_checkpoints=1,
    tensorboard_verbose=2
)
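
The nine fire blocks above all repeat the same squeeze/expand pattern, so the network can be written much more compactly with a small helper. A minimal sketch, assuming TFLearn's conv_2d and merge layers (the helper name fire_module is ours, not part of TFLearn):

def fire_module(incoming, squeeze_filters, expand_filters):
    # 1x1 squeeze convolution, followed by parallel 1x1 and 3x3 expand
    # convolutions concatenated along the channel axis (NHWC -> axis 3).
    squeeze = conv_2d(incoming, squeeze_filters, 1, activation='relu')
    expand_1x1 = conv_2d(squeeze, expand_filters, 1, activation='relu')
    expand_3x3 = conv_2d(squeeze, expand_filters, 3, activation='relu')
    return merge([expand_1x1, expand_3x3], mode='concat', axis=3)

# Example: the fire2 block above becomes
# network = fire_module(network, 16, 64)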

GoogLeNet

GoogLeNet won ILSVRC 2014; the paper is "Going Deeper with Convolutions".

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d, avg_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.merge_ops import merge
from tflearn.layers.estimator import regression
import tflearn.datasets.oxflower17 as oxflower17
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(227, 227))
network = input_data(shape=[None, 227, 227, 3])
conv1_7_7 = conv_2d(network, 64, 7, strides=2, activation='relu', name = 'conv1_7_7_s2')
pool1_3_3 = max_pool_2d(conv1_7_7, 3,strides=2)
pool1_3_3 = local_response_normalization(pool1_3_3)
conv2_3_3_reduce = conv_2d(pool1_3_3, 64,1, activation='relu',name = 'conv2_3_3_reduce')
conv2_3_3 = conv_2d(conv2_3_3_reduce, 192,3, activation='relu', name='conv2_3_3')
conv2_3_3 = local_response_normalization(conv2_3_3)
pool2_3_3 = max_pool_2d(conv2_3_3, kernel_size=3, strides=2, name='pool2_3_3_s2')
inception_3a_1_1 = conv_2d(pool2_3_3, 64, 1, activation='relu', name='inception_3a_1_1')
inception_3a_3_3_reduce = conv_2d(pool2_3_3, 96,1, activation='relu', name='inception_3a_3_3_reduce')
inception_3a_3_3 = conv_2d(inception_3a_3_3_reduce, 128,filter_size=3, activation='relu', name = 'inception_3a_3_3')
inception_3a_5_5_reduce = conv_2d(pool2_3_3,16, filter_size=1,activation='relu', name ='inception_3a_5_5_reduce' )
inception_3a_5_5 = conv_2d(inception_3a_5_5_reduce, 32, filter_size=5, activation='relu', name= 'inception_3a_5_5')
inception_3a_pool = max_pool_2d(pool2_3_3, kernel_size=3, strides=1, )
inception_3a_pool_1_1 = conv_2d(inception_3a_pool, 32, filter_size=1, activation='relu', name='inception_3a_pool_1_1')
# merge the inception_3a__
inception_3a_output = merge([inception_3a_1_1, inception_3a_3_3, inception_3a_5_5, inception_3a_pool_1_1], mode='concat', axis=3)
inception_3b_1_1 = conv_2d(inception_3a_output, 128,filter_size=1,activation='relu', name= 'inception_3b_1_1' )
inception_3b_3_3_reduce = conv_2d(inception_3a_output, 128, filter_size=1, activation='relu', name='inception_3b_3_3_reduce')
inception_3b_3_3 = conv_2d(inception_3b_3_3_reduce, 192, filter_size=3, activation='relu',name='inception_3b_3_3')
inception_3b_5_5_reduce = conv_2d(inception_3a_output, 32, filter_size=1, activation='relu', name = 'inception_3b_5_5_reduce')
inception_3b_5_5 = conv_2d(inception_3b_5_5_reduce, 96, filter_size=5, activation='relu', name='inception_3b_5_5')
inception_3b_pool = max_pool_2d(inception_3a_output, kernel_size=3, strides=1, name='inception_3b_pool')
inception_3b_pool_1_1 = conv_2d(inception_3b_pool, 64, filter_size=1,activation='relu', name='inception_3b_pool_1_1')
#merge the inception_3b_*
inception_3b_output = merge([inception_3b_1_1, inception_3b_3_3, inception_3b_5_5, inception_3b_pool_1_1], mode='concat',axis=3,name='inception_3b_output')
pool3_3_3 = max_pool_2d(inception_3b_output, kernel_size=3, strides=2, name='pool3_3_3')
inception_4a_1_1 = conv_2d(pool3_3_3, 192, filter_size=1, activation='relu', name='inception_4a_1_1')
inception_4a_3_3_reduce = conv_2d(pool3_3_3, 96, filter_size=1, activation='relu', name='inception_4a_3_3_reduce')
inception_4a_3_3 = conv_2d(inception_4a_3_3_reduce, 208, filter_size=3, activation='relu', name='inception_4a_3_3')
inception_4a_5_5_reduce = conv_2d(pool3_3_3, 16, filter_size=1, activation='relu', name='inception_4a_5_5_reduce')
inception_4a_5_5 = conv_2d(inception_4a_5_5_reduce, 48, filter_size=5, activation='relu', name='inception_4a_5_5')
inception_4a_pool = max_pool_2d(pool3_3_3, kernel_size=3, strides=1, name='inception_4a_pool')
inception_4a_pool_1_1 = conv_2d(inception_4a_pool, 64, filter_size=1, activation='relu', name='inception_4a_pool_1_1')
inception_4a_output = merge([inception_4a_1_1, inception_4a_3_3, inception_4a_5_5, inception_4a_pool_1_1], mode='concat', axis=3, name='inception_4a_output')
inception_4b_1_1 = conv_2d(inception_4a_output, 160, filter_size=1, activation='relu', name='inception_4b_1_1')
inception_4b_3_3_reduce = conv_2d(inception_4a_output, 112, filter_size=1, activation='relu', name='inception_4b_3_3_reduce')
inception_4b_3_3 = conv_2d(inception_4b_3_3_reduce, 224, filter_size=3, activation='relu', name='inception_4b_3_3')
inception_4b_5_5_reduce = conv_2d(inception_4a_output, 24, filter_size=1, activation='relu', name='inception_4b_5_5_reduce')
inception_4b_5_5 = conv_2d(inception_4b_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4b_5_5')
inception_4b_pool = max_pool_2d(inception_4a_output, kernel_size=3, strides=1, name='inception_4b_pool')
inception_4b_pool_1_1 = conv_2d(inception_4b_pool, 64, filter_size=1, activation='relu', name='inception_4b_pool_1_1')
inception_4b_output = merge([inception_4b_1_1, inception_4b_3_3, inception_4b_5_5, inception_4b_pool_1_1], mode='concat', axis=3, name='inception_4b_output')
inception_4c_1_1 = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu',name='inception_4c_1_1')
inception_4c_3_3_reduce = conv_2d(inception_4b_output, 128, filter_size=1, activation='relu', name='inception_4c_3_3_reduce')
inception_4c_3_3 = conv_2d(inception_4c_3_3_reduce, 256, filter_size=3, activation='relu', name='inception_4c_3_3')
inception_4c_5_5_reduce = conv_2d(inception_4b_output, 24, filter_size=1, activation='relu', name='inception_4c_5_5_reduce')
inception_4c_5_5 = conv_2d(inception_4c_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4c_5_5')
inception_4c_pool = max_pool_2d(inception_4b_output, kernel_size=3, strides=1)
inception_4c_pool_1_1 = conv_2d(inception_4c_pool, 64, filter_size=1, activation='relu', name='inception_4c_pool_1_1')
inception_4c_output = merge([inception_4c_1_1, inception_4c_3_3, inception_4c_5_5, inception_4c_pool_1_1], mode='concat', axis=3,name='inception_4c_output')
inception_4d_1_1 = conv_2d(inception_4c_output, 112, filter_size=1, activation='relu', name='inception_4d_1_1')
inception_4d_3_3_reduce = conv_2d(inception_4c_output, 144, filter_size=1, activation='relu', name='inception_4d_3_3_reduce')
inception_4d_3_3 = conv_2d(inception_4d_3_3_reduce, 288, filter_size=3, activation='relu', name='inception_4d_3_3')
inception_4d_5_5_reduce = conv_2d(inception_4c_output, 32, filter_size=1, activation='relu', name='inception_4d_5_5_reduce')
inception_4d_5_5 = conv_2d(inception_4d_5_5_reduce, 64, filter_size=5, activation='relu', name='inception_4d_5_5')
inception_4d_pool = max_pool_2d(inception_4c_output, kernel_size=3, strides=1, name='inception_4d_pool')
inception_4d_pool_1_1 = conv_2d(inception_4d_pool, 64, filter_size=1, activation='relu', name='inception_4d_pool_1_1')
inception_4d_output = merge([inception_4d_1_1, inception_4d_3_3, inception_4d_5_5, inception_4d_pool_1_1], mode='concat', axis=3, name='inception_4d_output')
inception_4e_1_1 = conv_2d(inception_4d_output, 256, filter_size=1, activation='relu', name='inception_4e_1_1')
inception_4e_3_3_reduce = conv_2d(inception_4d_output, 160, filter_size=1, activation='relu', name='inception_4e_3_3_reduce')
inception_4e_3_3 = conv_2d(inception_4e_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_4e_3_3')
inception_4e_5_5_reduce = conv_2d(inception_4d_output, 32, filter_size=1, activation='relu', name='inception_4e_5_5_reduce')
inception_4e_5_5 = conv_2d(inception_4e_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_4e_5_5')
inception_4e_pool = max_pool_2d(inception_4d_output, kernel_size=3, strides=1, name='inception_4e_pool')
inception_4e_pool_1_1 = conv_2d(inception_4e_pool, 128, filter_size=1, activation='relu', name='inception_4e_pool_1_1')
inception_4e_output = merge([inception_4e_1_1, inception_4e_3_3, inception_4e_5_5,inception_4e_pool_1_1],axis=3, mode='concat')
pool4_3_3 = max_pool_2d(inception_4e_output, kernel_size=3, strides=2, name='pool_3_3')
inception_5a_1_1 = conv_2d(pool4_3_3, 256, filter_size=1, activation='relu', name='inception_5a_1_1')
inception_5a_3_3_reduce = conv_2d(pool4_3_3, 160, filter_size=1, activation='relu', name='inception_5a_3_3_reduce')
inception_5a_3_3 = conv_2d(inception_5a_3_3_reduce, 320, filter_size=3, activation='relu', name='inception_5a_3_3')
inception_5a_5_5_reduce = conv_2d(pool4_3_3, 32, filter_size=1, activation='relu', name='inception_5a_5_5_reduce')
inception_5a_5_5 = conv_2d(inception_5a_5_5_reduce, 128, filter_size=5, activation='relu', name='inception_5a_5_5')
inception_5a_pool = max_pool_2d(pool4_3_3, kernel_size=3, strides=1, name='inception_5a_pool')
inception_5a_pool_1_1 = conv_2d(inception_5a_pool, 128, filter_size=1,activation='relu', name='inception_5a_pool_1_1')
inception_5a_output = merge([inception_5a_1_1, inception_5a_3_3, inception_5a_5_5, inception_5a_pool_1_1], axis=3,mode='concat')
inception_5b_1_1 = conv_2d(inception_5a_output, 384, filter_size=1,activation='relu', name='inception_5b_1_1')
inception_5b_3_3_reduce = conv_2d(inception_5a_output, 192, filter_size=1, activation='relu', name='inception_5b_3_3_reduce')
inception_5b_3_3 = conv_2d(inception_5b_3_3_reduce, 384, filter_size=3,activation='relu', name='inception_5b_3_3')
inception_5b_5_5_reduce = conv_2d(inception_5a_output, 48, filter_size=1, activation='relu', name='inception_5b_5_5_reduce')
inception_5b_5_5 = conv_2d(inception_5b_5_5_reduce,128, filter_size=5, activation='relu', name='inception_5b_5_5' )
inception_5b_pool = max_pool_2d(inception_5a_output, kernel_size=3, strides=1, name='inception_5b_pool')
inception_5b_pool_1_1 = conv_2d(inception_5b_pool, 128, filter_size=1, activation='relu', name='inception_5b_pool_1_1')
inception_5b_output = merge([inception_5b_1_1, inception_5b_3_3, inception_5b_5_5, inception_5b_pool_1_1], axis=3, mode='concat')
pool5_7_7 = avg_pool_2d(inception_5b_output, kernel_size=7, strides=1)
pool5_7_7 = dropout(pool5_7_7, 0.4)
loss = fully_connected(pool5_7_7, 17,activation='softmax')
network = regression(loss, optimizer='momentum',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
model = tflearn.DNN(network, checkpoint_path='model_googlenet',
                    max_checkpoints=1, tensorboard_verbose=2)
model.fit(X, Y, n_epoch=1000, validation_set=0.1, shuffle=True,
          show_metric=True, batch_size=64, snapshot_step=200,
          snapshot_epoch=False, run_id='googlenet_oxflowers17')
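
Every inception block above is the same four-branch pattern (1x1, 3x3 with reduction, 5x5 with reduction, pooled projection), so it can also be factored into a helper. A minimal sketch, assuming TFLearn's layer API (the helper name inception_module is ours):

def inception_module(incoming, n_1x1, n_3x3_reduce, n_3x3,
                     n_5x5_reduce, n_5x5, n_pool_proj, name):
    # Four parallel branches, concatenated along the channel axis (NHWC).
    branch_1x1 = conv_2d(incoming, n_1x1, 1, activation='relu', name=name + '_1_1')
    reduce_3x3 = conv_2d(incoming, n_3x3_reduce, 1, activation='relu', name=name + '_3_3_reduce')
    branch_3x3 = conv_2d(reduce_3x3, n_3x3, 3, activation='relu', name=name + '_3_3')
    reduce_5x5 = conv_2d(incoming, n_5x5_reduce, 1, activation='relu', name=name + '_5_5_reduce')
    branch_5x5 = conv_2d(reduce_5x5, n_5x5, 5, activation='relu', name=name + '_5_5')
    pool = max_pool_2d(incoming, kernel_size=3, strides=1, name=name + '_pool')
    branch_pool = conv_2d(pool, n_pool_proj, 1, activation='relu', name=name + '_pool_1_1')
    return merge([branch_1x1, branch_3x3, branch_5x5, branch_pool],
                 mode='concat', axis=3, name=name + '_output')

# Example: the inception_3a block above becomes
# inception_3a_output = inception_module(pool2_3_3, 64, 96, 128, 16, 32, 32, 'inception_3a')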

VGGnet

VGGNet comes from Oxford's Visual Geometry Group and took second place at ILSVRC 2014.

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
# Data loading and preprocessing
import tflearn.datasets.oxflower17 as oxflower17
X, Y = oxflower17.load_data(one_hot=True, resize_pics=(224, 224))
# Building 'VGG Network' (VGG takes 224x224 RGB inputs)
network = input_data(shape=[None, 224, 224, 3])
network = conv_2d(network, 64, 3, activation='relu')
network = conv_2d(network, 64, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 128, 3, activation='relu')
network = conv_2d(network, 128, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 256, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = conv_2d(network, 256, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = conv_2d(network, 512, 3, activation='relu')
network = max_pool_2d(network, 2, strides=2)
network = fully_connected(network, 4096, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 4096, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 17, activation='softmax')
network = regression(network, optimizer='rmsprop',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)
# Training
model = tflearn.DNN(network, checkpoint_path='model_vgg',
                    max_checkpoints=1, tensorboard_verbose=0)
model.fit(X, Y, n_epoch=500, shuffle=True,
          show_metric=True, batch_size=32, snapshot_step=500,
          snapshot_epoch=False, run_id='vgg_oxflowers17')

Residual Network

"Deep Residual Learning for Image Recognition" won ILSVRC 2015 and was the strongest network model at the time.

import tflearn
# Residual blocks
# 32 layers: n=5, 56 layers: n=9, 110 layers: n=18
n = 5
# Data loading
from tflearn.datasets import cifar10
(X, Y), (testX, testY) = cifar10.load_data()
Y = tflearn.data_utils.to_categorical(Y, 10)
testY = tflearn.data_utils.to_categorical(testY, 10)
# Real-time data preprocessing
img_prep = tflearn.ImagePreprocessing()
img_prep.add_featurewise_zero_center(per_channel=True)
# Real-time data augmentation
img_aug = tflearn.ImageAugmentation()
img_aug.add_random_flip_leftright()
img_aug.add_random_crop([32, 32], padding=4)
# Building Residual Network
net = tflearn.input_data(shape=[None, 32, 32, 3],
                         data_preprocessing=img_prep,
                         data_augmentation=img_aug)
net = tflearn.conv_2d(net, 16, 3, regularizer='L2', weight_decay=0.0001)
net = tflearn.residual_block(net, n, 16)
net = tflearn.residual_block(net, 1, 32, downsample=True)
net = tflearn.residual_block(net, n-1, 32)
net = tflearn.residual_block(net, 1, 64, downsample=True)
net = tflearn.residual_block(net, n-1, 64)
net = tflearn.batch_normalization(net)
net = tflearn.activation(net, 'relu')
net = tflearn.global_avg_pool(net)
# Regression
net = tflearn.fully_connected(net, 10, activation='softmax')
mom = tflearn.Momentum(0.1, lr_decay=0.1, decay_step=32000, staircase=True)
net = tflearn.regression(net, optimizer=mom,
                         loss='categorical_crossentropy')
# Training
model = tflearn.DNN(net, checkpoint_path='model_resnet_cifar10',
                    max_checkpoints=10, tensorboard_verbose=0,
                    clip_gradients=0.)
model.fit(X, Y, n_epoch=200, validation_set=(testX, testY),
          snapshot_epoch=False, snapshot_step=500,
          show_metric=True, batch_size=128, shuffle=True,
          run_id='resnet_cifar10')
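
After training, accuracy on the held-out CIFAR-10 test split can be checked with TFLearn's built-in evaluation; a minimal sketch:

score = model.evaluate(testX, testY)
print('Test accuracy: %.4f' % score[0])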

Cleaning Up Redundant Keys in Redis, Again

Posted on 2016-09-19 | Category: bigdata

In one of our systems, a Redis store sets a key TTL of 5 years. Later, memory pressure grew and older data had to be cleaned out. Based on the characteristics of the business logic, we decided to delete every key whose remaining TTL is less than 4 years (i.e., keys created more than 1 year ago).

Export the Redis RDB file

There are usually two options:

  • Copy the dump.rdb file directly;
  • Export it with redis-cli's --rdb option (a minimal example follows this list).
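
A minimal sketch of the second option (host, port and output path are only examples):

redis-cli -h 127.0.0.1 -p 6379 --rdb /tmp/dump.rdb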

Extract the keys to delete from the RDB file

First, manually install LesTR's redis-rdb-tools.

Export all keys, together with their TTL and other metadata, to a CSV file:

rdb --command memory dump.rdb > usertoken_keys.csv

The generated CSV file looks like this:

database,type,key,size_in_bytes,encoding,num_elements,len_largest_element,ttl
0,string,"kt.usertoken-C102685U46553459",237,string,86,86,130272503.0
0,string,"kt.usertoken-C102685U402255982",238,string,86,86,135267585.0
0,string,"kt.usertoken-C103439U432281392",238,string,86,86,148310228.0
0,string,"kt.usertoken-C103367U440243305",238,string,86,86,150836002.0

Extract the key field for rows whose TTL is less than 4 years (126144000 seconds):

sed '1d' usertoken_keys.csv | awk -F',' '{if($8 < 126144000){gsub(/[:\"]/,"",$3); print $3}}' > rkeys.csv

The generated rkeys.csv file now contains all the keys to delete; a small Python program can read this file and perform the actual deletion (a sketch follows the sample below). Sample file contents:

kt.usertoken-C102685U46553459
kt.usertoken-C102685U402255982
kt.usertoken-C101287U35606393
kt.usertoken-C101675U404094112
kt.usertoken-C101140U36998042
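
A minimal deletion sketch, assuming the redis-py package and a local Redis instance (host, port and file name are illustrative). Keys are deleted in pipelined batches to limit round trips:

import redis

r = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)

with open('rkeys.csv') as f:
    pipe = r.pipeline(transaction=False)
    count = 0
    for line in f:
        key = line.strip()
        if not key:
            continue
        pipe.delete(key)
        count += 1
        if count % 1000 == 0:
            pipe.execute()          # flush a batch of 1000 DELs
            pipe = r.pipeline(transaction=False)
    pipe.execute()                  # flush the remaining keys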

How to easily set up Kafka

Posted on 2016-08-04 | Category: bigdata

Kafka is a distributed message queue.

Environment

  • Three Linux servers with a JDK installed.
    The demo Kafka server IPs are 192.168.200.1, 192.168.200.2 and 192.168.200.3.
    An existing Zookeeper server at 192.168.100.2 is used.
  • Download Kafka 0.9.x from kafka.apache.org
  • Extract the Kafka package to /opt and create a symlink at /opt/kafka (see the sketch after this list)
  • All commands below are run from /opt/kafka
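
A minimal sketch of the extract-and-symlink step (the archive name depends on the exact Scala/Kafka version downloaded):

tar -xzf kafka_2.11-0.9.0.1.tgz -C /opt
ln -s /opt/kafka_2.11-0.9.0.1 /opt/kafka
cd /opt/kafka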

Configuration

Global (all servers)

  • Create target directory
$ mkdir /kafka
  • Modify file config/server.properties
port=9092
log.dirs=/kafka
zookeeper.connect=192.168.100.2:2181/kafka

Node (192.168.200.1)

  • Modify file config/server.properties
broker.id=1
advertised.host.name=192.168.200.1

Node (192.168.200.2)

  • Modify file config/server.properties
broker.id=2
advertised.host.name=192.168.200.2

Node (192.168.200.3)

  • Modify file config/server.properties
broker.id=3
advertised.host.name=192.168.200.3

Startup & Shutdown

  • Start
./bin/kafka-server-start.sh config/server.properties
  • Stop
./bin/kafka-server-stop.sh config/server.properties

Usage

Create Topic

./bin/kafka-topics.sh --zookeeper=192.168.100.2:2181/kafka --create --topic test --partitions 3 --replication-factor 3

Send Message To Topic (Test)

./bin/kafka-console-producer.sh --broker-list=192.168.200.1:9092,192.168.200.2:9092,192.168.200.3:9092 --topic=test

Receive Message From Topic (Test)

./bin/kafka-console-consumer.sh --zookeeper=192.168.100.2:2181/kafka --topic=test

Mirror Server

Environment

  • The mirror server IP is 192.168.200.10
  • The kafka2 server group
    Zookeeper: 192.168.100.2:2181/kafka2
    The demo server IPs are 192.168.200.11, 192.168.200.12, 192.168.200.13
  • Install to /opt/kafka, and configure it.
  • Data is mirrored from kafka to kafka2

Configuration

  • Modify file config/consumer.properties
zookeeper.connect=192.168.100.2:2181/kafka
  • Modify file config/producer.properties
bootstrap.servers=192.168.200.11:9092,192.168.200.12:9092,192.168.200.13:9092
metadata.broker.list=192.168.200.11:9092,192.168.200.12:9092,192.168.200.13:9092

Startup

./bin/kafka-mirror-maker.sh --consumer.config=config/consumer.properties --producer.config=config/producer.properties --blacklist='ignore*'

How to easily set up a fully distributed HBase

Posted on 2016-07-26 | Category: bigdata

HBase is a NoSQL database built on top of Hadoop.

Environment

  • Three Linux servers with a JDK installed.
    The ntpd service is already running.
    The demo HBase server IPs are 192.168.200.1, 192.168.200.2 and 192.168.200.3.
    An existing HDFS server at 192.168.100.1 is used.
    An existing Zookeeper server at 192.168.100.2 is used.
  • Download HBase 1.2.x from hbase.apache.org
  • Extract the HBase package to /opt and create a symlink at /opt/hbase
  • All commands below are run from /opt/hbase

Configuration

Master (192.168.200.1)

  • Modify file conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.100.1:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.100.2</value>
  </property>
  <property>
    <name>hbase.zookeeper.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
  <property>
    <name>hbase.master.hostname</name>
    <value>192.168.200.1</value>
  </property>
</configuration>

Set hbase.master.hostname to this server's real IP address; do not use the hostname.

RegionServer (192.168.200.2/3)

  • Modify file conf/hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.100.1:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.100.2</value>
  </property>
  <property>
    <name>hbase.zookeeper.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
  </property>
  <property>
    <name>hbase.regionserver.hostname</name>
    <value>192.168.200.2</value>
  </property>
</configuration>

Set hbase.regionserver.hostname to each server's own real IP address; do not use the hostname.

More Configuration (conf/hbase-site.xml)

  • Replication factor for HBase files in HDFS
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

Startup & Shutdown

Master

  • Start
./bin/hbase-daemon.sh start master
  • Stop
./bin/hbase-daemon.sh stop master

You can also start other server types, such as thrift, thrift2, rest, etc.
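
For example, the Thrift gateway is started with the same daemon script:

./bin/hbase-daemon.sh start thrift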

RegionServer

  • Start
./bin/hbase-daemon.sh start regionserver
  • Stop
./bin/hbase-daemon.sh stop regionserver

Web interface

Open http://192.168.200.1:16010 in a web browser and enjoy.

Phoenix installation (if you need it)

Phoenix is a SQL wrapper for HBase.

  • Download phoenix-[version]-bin.tar.gz from phoenix.apache.org
  • Extract phoenix-[version]-server.jar and copy it to the /opt/hbase/lib directory on every HBase instance

Configuration

  • Modify file conf/hbase-site.xml
<property>
  <name>hbase.table.sanity.checks</name>
  <value>false</value>
</property>

Usage

  • Restart all HBase instances after the installation.
  • To connect, use the Phoenix sqlline client; a minimal example follows this list.
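
A minimal connection sketch, run from the extracted Phoenix package directory and pointing at the Zookeeper quorum and znode configured above:

./bin/sqlline.py 192.168.100.2:2181:/hbase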

Cleaning Up Redundant Keys in Redis

Posted on 2016-05-05 | Category: bigdata

In one of our systems, a single Redis instance used to store all cache data. Later, because that Redis consumed too many resources, a new Redis instance was added and the original one kept only a single kind of data. The data no longer used in the original Redis then had to be cleaned out. This post records that process.

Export the Redis RDB file

There are usually two options:

  • Copy the dump.rdb file directly;
  • Export it with redis-cli's --rdb option.

Extract the keys to delete from the RDB file

First install redis-rdb-tools, which can be done with pip:

pip install rdbtools

Export all keys to be deleted to a CSV file.
In our scenario, we exclude every key containing kt.usertoken:

rdb --command memory dump.rdb | grep -v "kt.usertoken" > rkeys.csv

The generated CSV file looks like this:

database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
0,string,"PREFIX_USER_MSG_912c05fac706a6fa679326e95d188f67_0",142,string,2,2
0,string,"PREFIX_USER_MSG_99641fecd9d30a3c07b02836c7a8147f2618cb56_6847",438,string,287,287
0,string,"PREFIX_USER_MSG_c0c694ddbec7a82d1755fcd79026260e834a0e29_0",150,string,2,2
0,string,"PREFIX_USER_MSG_e246022a79f1fbd52b3dbecef18c51d27a6f3e9f_0",150,string,2,2
0,string,"PREFIX_USER_MSG_719bd47ebf34743150b5535d841e9628bc27decd_6851",153,string,2,2

Extract the key field from the CSV:

sed '1d' rkeys.csv | awk 'match($0, /\,\"(.*)\"\,/, a) {print a[1]}' > rkeys.txt

The generated rkeys.txt file now contains all the keys; you can write a small Python program that reads this file and performs the actual deletion. Sample file contents:

PREFIX_USER_MSG_912c05fac706a6fa679326e95d188f67_0
PREFIX_USER_MSG_99641fecd9d30a3c07b02836c7a8147f2618cb56_6847
PREFIX_USER_MSG_c0c694ddbec7a82d1755fcd79026260e834a0e29_0
PREFIX_USER_MSG_e246022a79f1fbd52b3dbecef18c51d27a6f3e9f_0

Pitfalls encountered along the way

  • csvtool could not handle the large file, so awk was used instead.
  • Reading rkeys.txt from a bash script and calling redis-cli del for each key easily runs into "special character" problems, so it is better to do the deletion from a program.