Use neural network to learn a square wave function
Out of curiosity, I am trying to build a simple fully connected NN using TensorFlow to learn a square wave function like the one generated in the code below (y alternates between 0 and 1 every 0.1 in x).
Therefore the input is a 1D array of x values (as the horizontal axis), and the output is a binary scalar value. I used tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function and tf.nn.relu as the activation. There are 3 hidden layers (100*100*100), and a single input node and output node. The input data is generated to match the above wave shape, and therefore the data size is not a problem.
However, the trained model seems to fail completely, always predicting the negative class.
So I am trying to figure out why this happened: whether the NN configuration is suboptimal, or whether it is due to some mathematical flaw in the NN beneath the surface (though I think a NN should be able to imitate this function).
Thanks.
As per the suggestions in the comment section, here is the full code. One thing I noticed I stated wrongly earlier: there are actually 2 output nodes (due to the 2 output classes):
""" see if neural net can find piecewise linear correlation in data """ import time import os import tensorflow tf import numpy np def generate_placeholder(batch_size): x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1)) y_placeholder = tf.placeholder(tf.float32, shape=(batch_size)) return x_placeholder, y_placeholder def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop): x_selected = [[none]] * batch_size y_selected = [none] * batch_size in range(batch_size): x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0] y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i] feed_dict = {x_placeholder: x_selected, y_placeholder: y_selected} return feed_dict def inference(input_x, h1_units, h2_units, h3_units): tf.name_scope('h1'): weights = tf.variable(tf.truncated_normal([1, h1_units], stddev=1.0/2), name='weights') biases = tf.variable(tf.zeros([h1_units]), name='biases') a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases) tf.name_scope('h2'): weights = tf.variable(tf.truncated_normal([h1_units, h2_units], stddev=1.0/h1_units), name='weights') biases = tf.variable(tf.zeros([h2_units]), name='biases') a2 = tf.nn.relu(tf.matmul(a1, weights) + biases) tf.name_scope('h3'): weights = tf.variable(tf.truncated_normal([h2_units, h3_units], stddev=1.0/h2_units), name='weights') biases = tf.variable(tf.zeros([h3_units]), name='biases') a3 = tf.nn.relu(tf.matmul(a2, weights) + biases) tf.name_scope('softmax_linear'): weights = tf.variable(tf.truncated_normal([h3_units, 2], stddev=1.0/np.sqrt(h3_units)), name='weights') biases = tf.variable(tf.zeros([2]), name='biases') logits = tf.matmul(a3, weights) + biases return logits def loss(logits, labels): labels = tf.to_int32(labels) cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy') return tf.reduce_mean(cross_entropy, name='xentropy_mean') def inspect_y(labels): return tf.reduce_sum(tf.cast(labels, tf.int32)) def training(loss, learning_rate): tf.summary.scalar('lost', loss) optimizer = tf.train.gradientdescentoptimizer(learning_rate) global_step = tf.variable(0, name='global_step', trainable=false) train_op = optimizer.minimize(loss, global_step=global_step) return train_op def evaluation(logits, labels): labels = tf.to_int32(labels) correct = tf.nn.in_top_k(logits, labels, 1) return tf.reduce_sum(tf.cast(correct, tf.int32)) def run_training(x, y, batch_size): tf.graph().as_default(): x_placeholder, y_placeholder = generate_placeholder(batch_size) logits = inference(x_placeholder, 100, 100, 100) loss = loss(logits, y_placeholder) y_sum = inspect_y(y_placeholder) train_op = training(loss, 0.01) init = tf.global_variables_initializer() sess = tf.session() sess.run(init) max_steps = 10000 step in range(max_steps): start_time = time.time() feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step) _, loss_val = sess.run([train_op, loss], feed_dict = feed_dict) duration = time.time() - start_time if step % 100 == 0: print('step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration)) x_test = np.array(range(1000)) * 0.001 x_test = np.reshape(x_test, (1000, 1)) _ = sess.run(logits, feed_dict={x_placeholder: x_test}) print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1])) print(_) if __name__ == '__main__': population = 10000 input_x = np.random.rand(population) input_y = np.copy(input_x) bin in range(10): print(bin, bin/10, 0.5 - 0.5*(-1)**bin) input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin 
batch_size = 1000 input_x = np.reshape(input_x, (population, 1)) run_training(input_x, input_y, batch_size)
The sample output shows that the model always prefers the first class over the second, as shown by min(_[:, 0]) > max(_[:, 1]), i.e. the minimum logit output for the first class is higher than the maximum logit output for the second class, over a sample size of population.
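For reference, here is a small sketch (my own addition, assuming _ still holds the (1000, 2) logits array printed above) of how that observation translates into predictions:

    import numpy as np

    logits_out = _                                  # (1000, 2) test logits from sess.run(logits, ...)
    preds = np.argmax(logits_out, axis=1)           # predicted class per test point
    print(np.unique(preds))                         # [0] -- the second class is never predicted
    print(logits_out[:, 0].min() > logits_out[:, 1].max())   # True, matching the observation above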
My mistake. The problem occurred in these lines:
    for i in range(batch_size):
        x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
        y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]
Because x_selected is built as [[None]] * batch_size, every row aliases the same inner list, so Python mutates the whole list of x_selected to the same value. With that code issue resolved, the problem goes away. The fix is:
    x_selected = np.zeros((batch_size, 1))
    y_selected = np.zeros((batch_size,))
    for i in range(batch_size):
        x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
        y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]
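To see why the original version misbehaved, here is a minimal standalone sketch (my own illustration) of the list-aliasing pitfall:

    # [[None]] * batch_size repeats a reference to the SAME inner list,
    # so writing to rows[i][0] changes every row at once.
    rows = [[None]] * 3
    rows[0][0] = 42
    print(rows)    # [[42], [42], [42]] -- all rows changed together

    # Independent inner lists (or a NumPy array, as in the fix) behave as intended:
    rows = [[None] for _ in range(3)]
    rows[0][0] = 42
    print(rows)    # [[42], [None], [None]]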
After this fix, the model shows more variation: it now outputs class 0 for x <= 0.5 and class 1 for x > 0.5. That is still far from ideal, though.
After changing the network configuration to 100 nodes * 4 layers, and after 1 million training steps (batch size = 100, sample size = 10 million), the model performs well, only showing errors at the edges where y flips. Therefore the question is closed.
You are trying to learn a periodic function, and the function is highly non-linear and non-smooth, so it is not as simple as it looks. In short, a better representation of the input feature helps.
Suppose we have a period T = 2, i.e. f(x) = f(x+2). The reduced problem, when the inputs/outputs are integers, is the function f(x) = 1 if x is odd else -1.
In that case, your problem is reduced to this discussion on training a neural network to distinguish between odd and even numbers.
I guess the second bullet in that post should help (even for the general case when the inputs are float numbers):
Try representing the numbers in binary using a fixed length precision.
In our reduced problem above, it's easy to see that the output is determined if and only if the least-significant bit is known.
    decimal    binary    -> output
    1:         0 0 1     -> 1
    2:         0 1 0     -> -1
    3:         0 1 1     -> 1
    ...
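To make that suggestion concrete, here is a minimal sketch (my own illustration, not part of the original answer) of such a fixed-length binary encoding for integer inputs; to_binary_features is a hypothetical helper name:

    import numpy as np

    def to_binary_features(values, n_bits=8):
        # Encode non-negative integers as fixed-length binary feature vectors,
        # most-significant bit first.
        values = np.asarray(values, dtype=np.int64)
        bits = [(values >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]
        return np.stack(bits, axis=1).astype(np.float32)

    print(to_binary_features([1, 2, 3, 4], n_bits=3))
    # [[0. 0. 1.]
    #  [0. 1. 0.]
    #  [0. 1. 1.]
    #  [1. 0. 0.]]
    # The last column (the least-significant bit) alone determines f(x) = 1 if x is odd else -1,
    # so a network fed these features only has to learn a very simple rule.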