So it turns out that an analytic solution to the ant fighting for aichallenge.org is, well, non-trivial. My new ant bot now does a pretty reasonable job of it, yet the lazy part of me wonders if there's an easier way.
I've been taking the online Machine Learning course at Stanford, so naturally I wanted to apply last week's theory to it. I wouldn't expect a linear classifier (a one-vs-all linear logistic regression classifier) to actual perform all that well. But it does.
So. I took replays JSON of games played by the #1 ranked bot. I put the replays through a simple-ish Groovy script, and every time an ant was in proximity to an enemy ant I captured the local surroundings and the corresponding order as a training data example. A few games yielded over 100k unique training data points - which I shuffled to remove any bias in the distribution for the training and test data sets.
Using Octave the actual programming for the training was very very simple. Here's a taste of how simple (I haven't posted the sigmoid.m, costFunctionReg.m, or predict.m because they're coursework answers):And this is the output: (the target classification labels here are N, E, S, W, and - respectively)clear ; close all; clc
data = load('training.all.nodupes.shuffled.txt');
yRaw = data(:, [1]); X = [ones(size(yRaw)) data(:, 2:size(data,2))];
trainSize = 110000;
testSize = length(yRaw)-trainSize;
lambda = 300;
fprintf('Training set size: %d\n', trainSize);
fprintf('Test set size: %d\n', testSize);
fprintf('Regularization lambda: %d\n', lambda);
for yTarget=1:5
fprintf('Target classification label = %d ...\n', yTarget);
y = yRaw;
for i=1:length(y)
if (y(i) == yTarget)
y(i) = 1;
else
y(i) = 0;
end
end
trainX = X(1:trainSize, :);
trainY = y(1:trainSize, :);
testX = X(trainSize+1:trainSize+testSize, :);
testY = y(trainSize+1:trainSize+testSize, :);
[m, n] = size(trainX);
initial_theta = zeros(n, 1);
options = optimset('GradObj', 'on', 'MaxIter', 1000);
[theta, cost] = ...
fminunc(@(t)(costFunctionReg(t, trainX, trainY, lambda)),
initial_theta, options);
fprintf('Train set accuracy: %f\n',
mean(double(predict(theta, trainX) == trainY)) * 100);
fprintf('Test set accuracy: %f\n',
mean(double(predict(theta, testX) == testY)) * 100);
endImpressive! Well, the proof is in the pudding. Once this is integrated into an Ant bot I first need to see if it can beat my current Bot.octave-3.4.0:36> train
Training set size: 110000
Test set size: 27180
Regularization lambda: 300
Target classification label = 1 ...
Train set accuracy: 79.710909
Test set accuracy: 79.337748
Target classification label = 2 ...
Train set accuracy: 78.755455
Test set accuracy: 78.491538
Target classification label = 3 ...
Train set accuracy: 80.984545
Test set accuracy: 81.111111
Target classification label = 4 ...
Train set accuracy: 78.599091
Test set accuracy: 78.863135
Target classification label = 5 ...
Train set accuracy: 82.936364
Test set accuracy: 83.153054
If you're reading the Octave source code and feeling cheated, yes, the magic happens inside the fminunc function. This is part of the beauty of Octave. You don't have to sweat to write a hyper-efficient numerical methods implementation. You don't have to wade through questionable third-party libraries for something that might work. You can use one that's provided and does the job well. Very nice indeed.
Nov 2, 2011
Cheating at the Prime Directive?
Subscribe to:
Post Comments (Atom)
2 comments:
Hey, that's good fun! I'm also taking the ML course, and after this week's homework on training neural network I'm really interested in seeing what would be possible with that approach.
When I've looked at the game "indata for bots" it strikes me that it's maybe not always that easy to identify how the bots played in the previous turns, especially if there were casualties. How did you handle that?
Regards / Claes W (DrClaes in the ai challenge)
Hey Claes W,
The first step is to get your representation of your features in the most friendly format. I think the mistake I made was to consider it a classification problem. In fact I think it's more appropriate to take out the symmetry of the moves and instead focus on regression of a Value function for the resulting state.
I'm not sure what you mean by "indata". The game specification is documented, although not entirely up to date. Each bot action is represented as the next in their chain of the "moves" attribute. Playing that all forwards allows you to determine when ants die and in what game state. You're not told why, but I'm presuming not that many pro ants die by colliding into each other when next to the enemy.
I don't think DaBunny and DrClaes have had a match online. Want to send me your Bot.zip? I have an aichallenge server running in a VM. My email address is curious dot attempt dot bunny at gmail dot com.
Cheers,
Merlyn
Post a Comment