machine learning - Use neural network to learn a square wave function


Out of curiosity, I am trying to build a simple fully connected NN using TensorFlow to learn a square wave function such as the following one (credits: www.thedawstudio.com).

Therefore the input is a 1D array of x values (the horizontal axis), and the output is a binary scalar value. I used tf.nn.sparse_softmax_cross_entropy_with_logits as the loss function and tf.nn.relu as the activation. There are 3 hidden layers (100*100*100), a single input node, and a single output node. The input data is generated to match the wave shape above, so the data size is not a problem.

However, the trained model seems to fail completely, always predicting the negative class.

So I am trying to figure out why this happened: whether the NN configuration is suboptimal, or whether it is due to some mathematical flaw in the NN beneath the surface (though I think an NN should be able to imitate this function).

Thanks.


As per the suggestions in the comment section, here is the full code. One thing I noticed I said wrong earlier is that there are actually 2 output nodes (due to the 2 output classes):

"""     see if neural net can find piecewise linear correlation in data """  import time import os import tensorflow tf import numpy np  def generate_placeholder(batch_size):     x_placeholder = tf.placeholder(tf.float32, shape=(batch_size, 1))     y_placeholder = tf.placeholder(tf.float32, shape=(batch_size))     return x_placeholder, y_placeholder  def feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, loop):     x_selected = [[none]] * batch_size     y_selected = [none] * batch_size     in range(batch_size):         x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]         y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]     feed_dict = {x_placeholder: x_selected,                  y_placeholder: y_selected}     return feed_dict  def inference(input_x, h1_units, h2_units, h3_units):      tf.name_scope('h1'):         weights = tf.variable(tf.truncated_normal([1, h1_units], stddev=1.0/2), name='weights')          biases = tf.variable(tf.zeros([h1_units]), name='biases')         a1 = tf.nn.relu(tf.matmul(input_x, weights) + biases)      tf.name_scope('h2'):         weights = tf.variable(tf.truncated_normal([h1_units, h2_units], stddev=1.0/h1_units), name='weights')          biases = tf.variable(tf.zeros([h2_units]), name='biases')         a2 = tf.nn.relu(tf.matmul(a1, weights) + biases)      tf.name_scope('h3'):         weights = tf.variable(tf.truncated_normal([h2_units, h3_units], stddev=1.0/h2_units), name='weights')          biases = tf.variable(tf.zeros([h3_units]), name='biases')         a3 = tf.nn.relu(tf.matmul(a2, weights) + biases)      tf.name_scope('softmax_linear'):         weights = tf.variable(tf.truncated_normal([h3_units, 2], stddev=1.0/np.sqrt(h3_units)), name='weights')          biases = tf.variable(tf.zeros([2]), name='biases')         logits = tf.matmul(a3, weights) + biases      return logits  def loss(logits, labels):     labels = tf.to_int32(labels)     cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits, name='xentropy')     return tf.reduce_mean(cross_entropy, name='xentropy_mean')  def inspect_y(labels):     return tf.reduce_sum(tf.cast(labels, tf.int32))  def training(loss, learning_rate):     tf.summary.scalar('lost', loss)     optimizer = tf.train.gradientdescentoptimizer(learning_rate)     global_step = tf.variable(0, name='global_step', trainable=false)     train_op = optimizer.minimize(loss, global_step=global_step)     return train_op  def evaluation(logits, labels):     labels = tf.to_int32(labels)     correct = tf.nn.in_top_k(logits, labels, 1)     return tf.reduce_sum(tf.cast(correct, tf.int32))  def run_training(x, y, batch_size):     tf.graph().as_default():         x_placeholder, y_placeholder = generate_placeholder(batch_size)         logits = inference(x_placeholder, 100, 100, 100)         loss = loss(logits, y_placeholder)         y_sum = inspect_y(y_placeholder)         train_op = training(loss, 0.01)         init = tf.global_variables_initializer()         sess = tf.session()         sess.run(init)         max_steps = 10000         step in range(max_steps):             start_time = time.time()             feed_dict = feed_placeholder(x, y, x_placeholder, y_placeholder, batch_size, step)             _, loss_val = sess.run([train_op, loss], feed_dict = feed_dict)             duration = time.time() - start_time             if step % 100 == 0:                 print('step {}: loss = {:.2f} {:.3f}sec'.format(step, loss_val, duration))     x_test = 
np.array(range(1000)) * 0.001     x_test = np.reshape(x_test, (1000, 1))     _ = sess.run(logits, feed_dict={x_placeholder: x_test})     print(min(_[:, 0]), max(_[:, 0]), min(_[:, 1]), max(_[:, 1]))     print(_)  if __name__ == '__main__':      population = 10000      input_x = np.random.rand(population)     input_y = np.copy(input_x)      bin in range(10):         print(bin, bin/10, 0.5 - 0.5*(-1)**bin)         input_y[input_x >= bin/10] = 0.5 - 0.5*(-1)**bin      batch_size = 1000      input_x = np.reshape(input_x, (population, 1))      run_training(input_x, input_y, batch_size) 

The sample output shows that the model prefers the first class over the second, as evidenced by min(_[:, 0]) > max(_[:, 1]), i.e. the minimum logit output for the first class is higher than the maximum logit output for the second class, over a sample the size of the population.
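For illustration, here is a small check of what that condition implies, using hypothetical logit values (not the actual run output):

import numpy as np

# Hypothetical logits for 5 test points; column 0 = class 0, column 1 = class 1.
logits = np.array([[2.3, -1.0],
                   [1.8, -0.5],
                   [2.9, -2.1],
                   [1.5, -0.2],
                   [2.0, -1.7]])

# If the smallest class-0 logit exceeds the largest class-1 logit,
# argmax picks class 0 for every sample, i.e. the model always predicts the first class.
print(logits[:, 0].min() > logits[:, 1].max())  # True
print(np.argmax(logits, axis=1))                # [0 0 0 0 0]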


My mistake. The problem occurred in these lines:

for i in range(batch_size):
    x_selected[i][0] = x[min(loop*batch_size, loop*batch_size % len(x)) + i, 0]
    y_selected[i] = y[min(loop*batch_size, loop*batch_size % len(y)) + i]

Python was mutating the whole list of x_selected to the same value: x_selected = [[None]] * batch_size creates batch_size references to the same inner list, so writing to x_selected[i][0] changes every row at once. This code issue is now resolved. The fix is:

x_selected = np.zeros((batch_size, 1))
y_selected = np.zeros((batch_size,))
for i in range(batch_size):
    x_selected[i, 0] = x[(loop*batch_size + i) % x.shape[0], 0]
    y_selected[i] = y[(loop*batch_size + i) % y.shape[0]]
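As a side note, here is a minimal snippet (mine, not part of the original code) illustrating the aliasing behaviour that caused the bug:

# [[None]] * batch_size repeats a reference to the SAME inner list,
# so writing to row i overwrites every row.
rows = [[None]] * 3
rows[0][0] = 42
print(rows)  # [[42], [42], [42]]

# Independent rows need a fresh inner list per element
# (or a NumPy array, as used in the fix above).
rows = [[None] for _ in range(3)]
rows[0][0] = 42
print(rows)  # [[42], [None], [None]]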

After the fix, the model is showing more variation. It currently outputs class 0 for x <= 0.5 and class 1 for x > 0.5, which is still far from ideal.


So, after changing the network configuration to 100 nodes * 4 layers and training for 1 million steps (batch size = 100, sample size = 10 million), the model performs well, showing errors only at the edges where y flips. The question is therefore closed.

You are trying to learn a periodic function, and the function is highly non-linear and non-smooth. It is not as simple as it looks. In short, a better representation of the input features helps.

Suppose the function has period T = 2, i.e. f(x) = f(x+2). For the reduced problem where inputs and outputs are integers, the function becomes f(x) = 1 if x is odd else -1. In this case, the problem reduces to this discussion on training a neural network to distinguish between odd and even numbers.

I guess the second bullet in that post should work (even in the general case when the inputs are float numbers):

Try representing the numbers in binary using a fixed length of precision.

In our reduced problem above, it is easy to see that the output is fully determined once the least-significant bit is known.

decimal  binary  -> output
1:       0 0 1   -> 1
2:       0 1 0   -> -1
3:       0 1 1   -> 1
...
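As a rough sketch of what such a feature transform could look like (my own illustration, not from the answer, assuming a fixed-point encoding with a chosen number of integer and fractional bits):

import numpy as np

def to_fixed_binary(x, int_bits=4, frac_bits=8):
    """Encode a non-negative float as a fixed-length binary feature vector."""
    # Scale so the fractional part becomes integer bits, then read the bits
    # from most significant to least significant.
    scaled = int(round(x * (1 << frac_bits)))
    total_bits = int_bits + frac_bits
    return np.array([(scaled >> i) & 1 for i in reversed(range(total_bits))],
                    dtype=np.float32)

# Each input x becomes a binary vector; for the integer odd/even toy problem,
# the last integer bit (just before the fractional bits) alone determines the output.
print(to_fixed_binary(3.0, int_bits=4, frac_bits=2))   # [0. 0. 1. 1. 0. 0.]
print(to_fixed_binary(0.75, int_bits=4, frac_bits=2))  # [0. 0. 0. 0. 1. 1.]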
