Comments/Ratings for a Single Item

Piece Values

H. G. Muller wrote on Mon, May 26, 2008 04:47 PM UTC:

Derek Nalls:
| Nonetheless, completing games of CRC (where a long, close, 
| well-played game can require more than 80 moves per player) 
| in 0:24 minutes - 36 minutes does NOT qualify as long or even, 
| moderate time controls.  In the case of your longest 36-minute games, 
| with an example total of 160 moves, that allows just 13.5 seconds per 
| move per player.  In fact, that is an extremely short time by any 
| serious standards.  

In my experience most games on the average take only 60 moves (perhaps
because of the large strength difference of the players). As early moves
are more important for the game result than late moves (even the best moves
late in the game do not help you if your position is already lost), most
engines use 2.5% of the remaining time for their next move (on average,
depending on how the iterations end compared to the target time). That
would be nearly 54 sec/move at 36 min/game in the decisive phase of the
game. That is more than you thought, but admittedly still fast. Note,
however, that I also played 60-min games in the General Championship
(without time odds), and that Joker80 confirms its lead over the
competitors it manifested at faster time controls.
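
The arithmetic behind the 54 sec/move figure can be sketched as follows. This is only an illustration, assuming a sudden-death clock with no increment and a flat 2.5% of the remaining time on every move; the function name and the 60-move game length are placeholders, and real engines vary the fraction from move to move.

```python
def time_per_move(total_seconds, fraction=0.025, moves=60):
    """Time spent on each move when a fixed fraction of the remaining
    clock is used every move (no increment assumed)."""
    remaining = total_seconds
    spent = []
    for _ in range(moves):
        t = remaining * fraction
        spent.append(t)
        remaining -= t
    return spent

schedule = time_per_move(36 * 60)   # 36 min/game = 2160 s
first_move = schedule[0]            # 2.5% of 2160 s = 54 s
print(f"move 1: {first_move:.1f} s, move 30: {schedule[29]:.1f} s")
```

The early moves get the largest share of the clock, which is the point made above about the decisive phase of the game.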

But I don't see the point: Joker80's strength increases with time as
expected, in the range from 0.4 sec to 36 sec per move, in a regular and
theoretically expected way. This is over the entire range where I tested
the dependence of the scoring percentage of various material imbalances,
which extended to only 15 sec/move, and found it to be independent of TC.
So your 'explanation' for the latter phenomenon is just nonsense. The
effect you mention is observed NOT to occur, and thus cannot explain
anything that was observed to occur.

Now if you want to conjecture that this will all miraculously become very
different at longer TC, you are welcome to test it and show us convincing
results. I am not going to waste my computer time on such a wild and
expensive goose chase. Because from the way I know the engines work, I
know that they are 'scalable': their performance at 10 ply results from
one ply being put in front of 9-ply search trees. And that extra ply will
always help. If they have good 9-ply trees, they will have even better
10-ply trees. But you don't have to take my word for it. You have the
engine, and if you don't want to believe that at 1 hour per move you will
get the same win probability as at 1 sec/move, or that at 1 hour per move
it won't beat 10 min/move, just play the games, and you will see for
yourself. It would even be appreciated if you publish the games here or on
your website. But, needless to say, one or two games won't convince anyone
of anything.

| 'since I am not a computer chess programmer, I cannot possibly 
| know what I am talking about when I dare criticize an important 
| working of your Joker80 program'
Well, you certainly make it appear that way. As, despite the elaborate
explanation I gave of why programs derive extra strength from this
technique, you still draw a conclusion that in practice was already shown
to be 100% wrong earlier. And if you think you will run into the problem
you imagine at enormously longer TC, well, very simple: don't use
Joker80, but use some other engine. You are on your own there, as I am not
specifically interested in extremely long TC. There is always a risk in
using equipment outside the range of conditions for which it was designed
and tested, and that risk is entirely yours. So better tread carefully,
and make sure you rule out the perceived dangers by concise testing.

| You must decide upon and define the primary function of your 
| Joker80 program.

I do not see the dilemma you sketch. The purpose is to play ON AVERAGE the
best possible move. If you do that, you have the best chance to win the
game. If I can achieve that through a non-deterministic algorithm better
than through a deterministic one, I go for the non-deterministic method.
That it also diversifies play, and makes me less sensitive to prepared
openings from the opponent, is a win/win situation. Not a compromise.

As I explained, it is very easy to switch this feature off. But you should
be prepared for significant loss of strength if you do that.

Derek Nalls wrote on Mon, May 26, 2008 02:54 PM UTC:

I am slightly relieved and surprised that Joker80 measurably improves the
quality of its moves as a function of time or plies completed over a range
of speed chess tournaments. Nonetheless, completing games of CRC (where a
long, close, well-played game can require more than 80 moves per player)
in 0:24 minutes - 36 minutes does NOT qualify as long or even, moderate
time controls. In the case of your longest 36-minute games, with an example total of 160 moves, that allows just 13.5 seconds per move per player. In fact, that is an extremely short time by any serious standards.

I consider 10 minutes per move a moderate time that produces results of
marginal, unreliable quality and 60-90 minutes per move a long time that
produces results of acceptable, reliable quality. Ask Reinhard Scharnagl or ET about the longest time per move they have used testing openings with their programs playing 'Unmentionable Chess'- 24 hours per move!

It is noteworthy that you are now resorting to playing dirty by using the
'exclusivist argument' that essentially 'since I am not a computer
chess programmer, I cannot possibly know what I am talking about when I
dare criticize an important working of your Joker80 program'. What you
fail to take into account is that I am a playtester with more experience
than you at truly long time controls. If you will not listen to what I am
trying to tell you, then why will you not listen to Scharnagl? After all,
he is also a computer chess programmer with a lot of knowledge in
important subject matters (such as mathematics).

You really should not be laughing. This is a serious problem. Your
sarcastic reaction does nothing to reassure my trust or confidence that
you will competently investigate it, confirm it and fix it.

Now, please do not misconstrue my remarks. My intent is not to overstate
the problem. I realize Joker80 in its present form is not a totally
random 'woodpusher'. It would not be able to win any short time control
tournaments if that were the case. In fact, I believe you when you state
that you have not experienced any problems with it but ... I think this is
strictly because you have not done any truly long time control playtesting with it.

You must decide upon and define the best primary function for your Joker80
program:

1. To pinpoint the single, very best move available from any position.
[Ideally, repeats could produce an identical move.]

2. To produce a different move from any position upon most repeats.
[At best, by randomly choosing amongst a short list of the best available
moves.]

These two objectives are mutually exclusive. It is impossible and
self-contradictory for a program to somehow accomplish both. Virtually
every AI game developer in the world except you chooses #1 as preferable
to #2 by a long shot in terms of the move quality produced on average.

If you do not even commit your AI program to TRYING to find the single
best move available because you think variety is just a whole lot more
interesting and fun, then it will be soft competition at truly long time
controls facing other quality AI programs that are frequently-sometimes
pinpointing the single, best move available and playing it against you.

H. G. Muller wrote on Mon, May 26, 2008 08:09 AM UTC:

Derek: 'I hope you can handle constructive advice.'

It gives me a big laugh, that's for sure.

Of course none of what you say is even remotely true. That is what happens
if you jump to conclusions regarding complex matters you are not
knowledgeable about, without even taking the trouble to verify your ideas.

Of course I extensively TESTED how the playing strength of Joker80, (and
all available other engines), varied as a function of time control. This
was the purpose of several elaborate time-odds tournaments I conducted,
where various versions of most engines participated that had to play their
games in 36, 12, 4, 1:30, 0:40 or 0:24 min, where handicapped engines were meeting non-handicapped ones in a full round robin. (I.e. the handicaps were factors 3, 9, 24, 54 or 90, where only the strongest engines were handicapped up to the very maximum, and the weakest only participated in an unhandicapped version).
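
The handicap factors follow directly from the listed time controls. A small check, assuming 1:30, 0:40 and 0:24 are read as minutes:seconds:

```python
from fractions import Fraction

# Time per game in minutes: 36, 12, 4, 1:30, 0:40 and 0:24 (min:sec).
controls = [Fraction(36), Fraction(12), Fraction(4),
            Fraction(3, 2), Fraction(2, 3), Fraction(2, 5)]

# Handicap factor relative to the unhandicapped 36-minute games.
factors = [controls[0] / c for c in controls[1:]]
print([int(f) for f in factors])  # [3, 9, 24, 54, 90]
```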

And of course Joker80 behaves similarly to any Shannon-type engine that is reasonably free of bugs: its playing strength measured in Elo increases monotonically in a logarithmic fashion, approximately following the formula rating = 100*ln(time). So Joker80 at 5 min/move crushes Joker80 at 1 sec per move, as you could have easily found out for yourself. So much for your nonsense about Joker80 failing to improve its move quality with time. For some discussion on one of the tournaments, see:

http://www.talkchess.com/forum/viewtopic.php?t=19764&postdays=0&postorder=asc&topic_view=flat&start=34
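
To put a number on "crushes", the quoted relation rating = 100*ln(time) can be combined with the standard logistic Elo expectation. The expectation formula is an addition for illustration and does not come from the post; the function names are placeholders.

```python
import math

def rating(seconds_per_move):
    # Approximate relation quoted above: rating = 100 * ln(time).
    return 100.0 * math.log(seconds_per_move)

def expected_score(elo_diff):
    # Standard logistic Elo expectation (assumption added here).
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

diff = rating(300) - rating(1)   # 5 min/move versus 1 sec/move
print(f"{diff:.0f} Elo, expected score {expected_score(diff):.2f}")
```

Under these assumptions the gap is about 570 Elo, which corresponds to an expected score of roughly 96% for the slower-playing version.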

At that time Fairy-Max still had a hash-table bug that made it hang (and
subsequently forfeit on time) that was striking at a fixed rate per
second, so that Fairy-Max started to forfeit more and more games at longer
TC. Since then the bug has been identified and repaired, and now also
Fairy-Max performs progressively better at longer TC.

So nice try, but next time better save your breath for telling the surgeon
how to do his job before he will perform open heart surgery on you. Because
he has no doubt much more to learn from you regarding cardiology than I
have in the area of building Chess engines...

Things are as they are, and can become known by observation and testing.
Believing in misconceptions born out of ignorance is not really helpful.
Or, more explicitly: if you think you know how to build better Chess
engines than other people, by all means, do so. It will be fun to confront
your ideas with reality. In the meantime I will continue to build them as
I think best, (and know is best, through extensive testing), so you should have every chance to surpass them. Lacking that, you could at least _use_ the engines of others to check out if your theories of how they behave have any reality value. You don't have to depend on the time-odds tourneys and other tests I conduct. You might not even be aware of them, as the developers of Chess engines hardly ever publish the thousands of games they do for testing if their ideas work in practice.

Derek Nalls wrote on Sun, May 25, 2008 10:03 PM UTC:

The reason you have never been able to find any correlation between winning
probabilities for one army and time controls [contrary to the experiences
of people using other AI programs] in asymmetrical playtests using Joker80
is that you have destructively randomized the algorithm within your program
to such an extent that it fails to measurably improve the quality of its
moves as a function of time or plies completed.  A program with serious
problems of this nature may do well in speed chess but at truly long time
controls against quality programs that improve as they should with time or
plies per move, it cannot consistently win.

I have two useful, important pieces of news for you:

1.  All of the statistical data you have generated using Joker80 (appr.
20,000+ games) is corrupt.  It must all be thrown out and started over
from scratch after you repair Joker80.

2.  All of your material values for CRC pieces are unreliable since they
are based upon and derived from #1 (corrupt statistical data).

I hope you can handle constructive advice.

H. G. Muller wrote on Sun, May 25, 2008 03:07 PM UTC:

I would have thought that 'twice the same flip in a row' was pretty
unambiguous, especially in combination with the remark about two-sided
testing. But let's not quibble about the wording.

The point was that for two-sided testing, if you suspect a coin to be
loaded, but have no idea if it is loaded to produce tails or heads, the two
flips tell you exactly nothing. They are either the same or different, and
on an unbiased coin that would occur with equal probability. So the
'confidence' of any conclusion as to the fairness of the coin drawn from
the two flips would be only 50%. I.e. not better than totally random, you
might as well have guessed if it was fair or not without flipping it at
all. That would also have given you a 50% chance of guessing correctly.

Derek Nalls wrote on Sun, May 25, 2008 02:14 PM UTC:

Well, when you said ...

'Actually the chance for twice the same flip in a row is 1/2.'

... that was vague and misleading.

I thought you meant 'heads' twice OR 'tails' twice equals a chance of
1/2 instead of the sum of 'heads' twice AND 'tails' twice equals a chance
of 1/2.

Since English is a second language to you, of course I will overlook this
minor mis-communication and even apologize for implicitly accusing you 
of incompetence.  However, you should expect that you will draw critical 
reactions from others when you have previously, falsely, explicitly
accused them of incompetence in a subject matter.

Tony Hecker wrote on Sun, May 25, 2008 01:47 PM UTC:

'Actually the chance for twice the same flip in a row is 1/2.'

H.G. is correct here.
- The probability of two heads in a row is 1/4.
- The probability of two tails in a row is 1/4.
- The probability of two same flips in a row is the sum of these two
outcomes: 1/4 + 1/4 = 1/2.

Another way to think about it:
With two coin flips, there are 4 equally likely outcomes: HH, HT, TH, TT.
In 2 of the 4 (equally likely) outcomes, the same flip result occurs twice
in a row.
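
The same enumeration can be checked mechanically. A minimal sketch for a fair coin:

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

same = [o for o in outcomes if o[0] == o[1]]
p_same = Fraction(len(same), len(outcomes))
p_two_heads = Fraction(outcomes.count(("H", "H")), len(outcomes))

print(p_same, p_two_heads)  # 1/2 1/4
```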

H. G. Muller wrote on Sun, May 25, 2008 11:13 AM UTC:

Indeed, it is a stochastic way to simulate mobility evaluation. In the presence of other terms it should of course not be made so large that it dominates the total evaluation, just as explicit mobility terms should not dominate the evaluation. But its weight should not be set to zero either: properly weighted mobility might add more than 100 Elo to an engine.

Joker has no explicit mobility in its evaluation, and relies entirely on
this probabilistic mechanism to simulate it. The disadvantage is that,
because of the probabilistic nature, it is not 100% guaranteed to always
take the best decision. On rare occasions the single acceptable end leaf
does draw a higher random bonus than one hundred slightly better positions
in another branch. OTOH it is extremely cheap to implement, while explicit
mobility is very expensive. As a result, I might gain an extra ply in
search depth. And then it becomes superior to explicit mobility, as it
only counts tactically sound moves, rather than just every move. So it is
like safe mobility verified by a full Quiescence Search.

In my assessment, the probabilistic mobility adds more strength to Joker
than changing the Rook value by 50cP would add or subtract. This can be
easily verified by play-testing. It is possible to switch this evaluation
term off. In fact, you have to switch it on, but WinBoard does this by
default. To prevent it from being switched on, one should run WinBoard
with the command-line option /firstInitString='new'. (The default
setting is 'new\nrandom'. If Joker is running as second engine, you
will of course have to use /secondInitString='new'.)

Reinhard Scharnagl wrote on Sun, May 25, 2008 10:39 AM UTC:

Harm wrote: ... 'OTOH, a program that evaluates every position as a
completely random number starts to play quite reasonable chess, once the
search reaches 8-10 ply. Because it is biased to seek out moves that lead
to pre-horizon nodes that have the largest number of legal moves, which
usually are the positions where the strongest pieces are still in its
possession.' ...

This is nothing but a probability-based heuristic simulating a mobility
evaluation component. But given a working positional evaluation,
especially one that also covers mobility, that randomizing method is not
orthogonal to the much more appropriate calculated knowledge. Thus you
will overlay a much better evaluation with a disturbing noise generator.

Nevertheless this approach might have advantages through the opening,
by preventing some otherwise effective pre-investigated killer
combinations from working.

H. G. Muller wrote on Sun, May 25, 2008 09:14 AM UTC:

'Do not you realize that forcing Joker80 to do otherwise must reduce its
playing strength significantly from its maximum potential?'

On the contrary, it makes it stronger. The explanation is that by adding a
random value to the evaluation, branches with very many equal end leaves
have a much larger probability to have the highest random bonus amongst
them than a branch that leads to only a single end-leaf of that same
score.
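
A toy simulation of this effect, under simplifying assumptions that are not from the post: all leaves have the same static score, the bonus is uniform on [0, 1), and only a single maximizing choice between two branches is modelled (no alternation of min and max levels). The function name is a placeholder.

```python
import random

def wide_branch_wins(n_leaves, trials=20000, seed=1):
    """Fraction of trials in which a branch with n_leaves equal leaves
    draws a higher best random bonus than a branch with one leaf."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        best_wide = max(rng.random() for _ in range(n_leaves))
        single = rng.random()
        if best_wide > single:
            wins += 1
    return wins / trials

print(wide_branch_wins(20))  # close to 20/21, i.e. about 0.95
```

With n equal leaves against one, the wide branch holds the highest bonus with probability n/(n+1), so the preference for the wider branch grows with the number of equivalent continuations.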

The difference can be observed most dramatically when you evaluate all
positions as zero. This makes all moves totally equivalent at any search
depth. Such a program would always play the first legal move it finds, and
would spend the whole game moving its Rook back and forth between a1 and
b1, while the opponent is eating all its other pieces. OTOH, a program
that evaluates every position as a completely random number starts to play
quite reasonable chess, once the search reaches 8-10 ply. Because it is
biased to seek out moves that lead to pre-horizon nodes that have the
largest number of legal moves, which usually are the positions where the
strongest pieces are still in its possession.

It is always possible to make the random addition so small that it only
decides between moves that would otherwise have exactly equal evaluation.
But this is not optimal, as it would then prefer a move (in the root) that
could lead (after 10 ply or so) to a position of score 53 (centiPawn),
while all other choices later in the PV would lead to -250 or worse, over
a move that could lead to 20 different positions (based on later move
choices) all evaluating as 52cP. But, as the scores were just
approximations based on finite-depth search, two moves later, when it can
look ahead further, all the end-leaf scores will change from what they
were, because those nodes are now no longer end-leaves. The 53 cP might
now be 43cP because deeper search revealed it to disappoint by 10cP. But
alas, there is no choice: the alternatives in this branch might have
changed a little too, but now all range from -200 to -300. Not much help,
we have to settle for the 43cP... 

Had it taken the root move that keeps the option open to go to any of the
20 positions of 52cP, it would now see that their scores on deeper search
would have been spread out between 32cP and 72cP, and it could now go for
the 72cP. In other words, the investment of keeping its options open
rather than greedily commit itself to going for an uncertain, only
marginally better score, typically pays off. 
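
The pay-off of keeping options open can be illustrated with the numbers from this example. The sketch below assumes that a deeper search shifts each score by a uniform amount of up to 20cP in either direction (which matches the 32cP to 72cP spread mentioned above); the uniform distribution and the function name are assumptions for illustration.

```python
import random

def average_outcome(trials=20000, spread=20, seed=2):
    """Average score reached two moves later for the narrow and wide lines."""
    rng = random.Random(seed)
    narrow_total = 0.0
    wide_total = 0.0
    for _ in range(trials):
        # Narrow line: one playable continuation at 53cP, re-scored later.
        narrow_total += 53 + rng.uniform(-spread, spread)
        # Wide line: 20 continuations at 52cP, each re-scored later;
        # the engine can then pick whichever turned out best.
        wide_total += max(52 + rng.uniform(-spread, spread)
                          for _ in range(20))
    return narrow_total / trials, wide_total / trials

narrow, wide = average_outcome()
print(f"narrow: {narrow:.1f} cP, wide: {wide:.1f} cP")
```

Under these assumptions the narrow line averages about 53cP while the wide line averages close to 70cP, because the best of 20 re-scored options is almost always well above their common starting value.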

To properly weight the expected pay-back of keeping options that at the
current search depth seem inferior, it must have an idea of the typical
change of a score from one search depth to the next. And match the size of
the random eval addition to that, to make sure that even slightly (but
insignificantly) worse end-leaves still contribute to enhancing the
probability that the branch will be chosen. Playing a game in the face of
an approximate (and thus noisy) evaluation is all about contingency
planning.

As to the probability theory, you don't seem to be able to see the math
because of the formulae...

P(hh) = 0.5*0.5 = 0.25
P(tt) = 0.5*0.5 = 0.25
______________________+
P(two equal)    = 0.5

Derek Nalls wrote on Sat, May 24, 2008 02:16 PM UTC:

'... in Joker the source of indeterminism is much less subtle: it is
programmed explicitly.'

This renders Joker80 totally unsuitable for my playtesting purposes.  [I
am just relieved that you told me this bizarre fact now before I invested
large amounts of computer time and effort.]

It is critically important that any AI program attempt (to its greatest
capability) to pinpoint the single, very best possible move in the time allowed upon every move in the game even if this means that it would
often-sometimes repeat an identical move from an identical position.

Do not you realize that forcing Joker80 to do otherwise must reduce its
playing strength significantly from its maximum potential?

Derek Nalls wrote on Sat, May 24, 2008 01:39 PM UTC:

'Actually the chance for twice the same flip in a row is 1/2.'
______________________________________________________

Really?
You obviously need a lesson on probability.
Let us start with elementary stuff.

Mathematical Ideas
fifth edition
Miller & Heeren
1986

It is an old college textbook from a class I took in the mid-90's.
[Yes, I passed the class.]
______________________

It says interesting things such as-

'The relative frequency with which an outcome happens 
represents its probability.'

'In probability, each repetition of an experiment is a trial.
The possible results of each trial are outcomes.'
____________________________________________

An example of a probability experiment is 'tossing a coin'.
Each 'toss' (trial of the experiment) has only two equally-possible 
outcomes, 'heads' or 'tails' ... assuming the condition that the 
coin is fair (i.e., not loaded).

probability = p
heads = h
tails = t
number of tosses = x
addition = +
involution = ^

[This is a substitute upon a single line for superscript representation 
of an exponent to the upper right of a base.]

probability of heads = p(h)
probability of tails = p(t)

p(h) is a base.
p(t) is a base.

x is an exponent.

p(h) = 0.5
p(t) = 0.5
_________________

What follows are examples of the chances of getting the same result
upon EVERY consecutive toss.

1 time
x = 1

p(h) ^ x = 0.5 ^ 1 = 0.5
p(t) ^ x = 0.5 ^ 1 = 0.5

Note:  In this case only ...
p(h) + p(t) = 1.0

2 times
x = 2

p(h) ^ x = 0.5 ^ 2 = 0.25
p(t) ^ x = 0.5 ^ 2 = 0.25

3 times
x = 3

p(h) ^ x = 0.5 ^ 3 = 0.125
p(t) ^ x = 0.5 ^ 3 = 0.125

Etc ...
______________________

By a function that is the inverse of successive exponents of base 2,
the chance for consecutive tosses to yield the same result rapidly
becomes extremely small.

When this occurs, there are only two possibilities- 'random good-bad
luck' or an unfair advantage-disadvantage exists (i.e., 'the coin is loaded').  The sum of these two possibilities always equals 1.

random luck (good or bad) = l
unfair (advantage or disadvantage) = u

luck (heads) = l(h)
luck (tails) = l(t)

unfair (heads) = u(h)
unfair (tails) = u(t)

p(h) ^ x = l(h)
p(t) ^ x = l(t)

l(h) + u(h) = 1
l(t) + u(t) = 1

Therefore, as the chances of 'random good-bad luck' become extremely low in the example, the chances of an advantage-disadvantage existing for 'one side of the coin' or (if you follow the analogy) 'one side of the gameboard' or 'one player' or 'one set of piece values' become likewise extremely high.

Only if it can be proven that an advantage-disadvantage does not exist for one player, then can it be accepted that the extremely unlikely event by
'random good-bad luck' is indeed the case.

It is essential to understand that random good luck or random bad luck
cannot be consistently relied upon.  From this fact alone, firm
conclusions can be responsibly drawn with a strong probability of
correctness.
____________________________________________________________

1 time
x = 1

p(h) ^ x = 0.5
u(h) = 0.5

p(t) ^ x = 0.5
u(t) = 0.5

2 times
x = 2

p(h) ^ x = 0.25
u(h) = 0.75

p(t) ^ x = 0.25
u(t) = 0.75

3 times
x = 3

p(h) ^ x = 0.125
u(h) = 0.875

p(t) ^ x = 0.125
u(t) = 0.875

Etc ...

H. G. Muller wrote on Sat, May 24, 2008 09:49 AM UTC:

Derek:
| Conclusions drawn from playing at normal time controls are
| irrelevant compared to extremely-long time controls.

First, that would only be true if the conclusions would actually depend on
the TC. Which is a totally unproven conjecture on your part, and in fact
contrary to any observation made at TCs where such observations can be
made with any accuracy (because enough games can be played). This whole thing reminds me of my friend, who always claims that stones fall upward. When I then drop a stone to refute him, he just shrugs, and says it proves nothing because the stone is 'not big enough'. Very conveniently for him, the upward falling of stones can only be observed on stones that are too big for anyone to lift...
But the main point is of course, if you draw a conclusion that is valid
only at a TC that no one is interested in playing, what use would such a
conclusion be?

| The chance of getting the same flip (heads or tails) twice-in-a-row
| is 1/4. Not impressive but a decent beginning. Add a couple or a
| few or several consecutive same flips and it departs 'luck' by a
| huge margin.

Actually the chance for twice the same flip in a row is 1/2. Unless you
are biased as to what the outcome of the flip should be (one-sided
testing). And indeed, 10 identical flips in a row would be unlikely to
occur by luck by a large margin. But that is rather academic, because you
won't see 10 identical results in a row between the subtly different
models. You will see results like 6-4 or 7-3, which will again be very
likely to be a result of luck (as that is exactly what they are the result
of, as you would realize after 10,000 games when the result is standing at
4,628-5,372).

Calculate the number of games you need to typically get a result for a
53-47 advantage that could not just as easily have been obtained from a
50-50 chance with a little luck. You will be surprised...
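
A rough version of that calculation, ignoring draws and using a normal approximation to the binomial distribution. The two-standard-error threshold and the function names are assumptions chosen for illustration.

```python
import math

def games_needed(p=0.53, z=2.0):
    """Games needed before a true score fraction p sits z standard
    errors away from 50% (draws ignored, normal approximation)."""
    sigma = math.sqrt(p * (1.0 - p))      # per-game standard deviation
    return math.ceil((z * sigma / (p - 0.5)) ** 2)

def two_sided_tail(n, k):
    """Probability that a fair coin gives one side at least k of n
    results (either side), for k above n/2."""
    tail = sum(math.comb(n, i) for i in range(k, n + 1))
    return 2 * tail / 2 ** n

print(games_needed())          # on the order of 1,100 games
print(two_sided_tail(10, 7))   # 0.34375
```

So a 7-3 result, or anything more lopsided, turns up about a third of the time between perfectly equal opponents, while separating 53-47 from 50-50 takes on the order of a thousand games.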

| I have wondered why the performance of computer chess programs is
| unpredictable and varied even under identical controls. Despite
| their extraordinary complexity, I think of computer hardware,
| operating systems and applications (such as Joker80) as deterministic.

In most engines there is always some residual indeterminism, due to timing
jitter. There are critical decision points, where the engine decides if it
should do one more iteration or not (or search one more move vs aborting
the iteration). If it would take such decisions purely on internal data,
like node count, it would play 100% reproducibly. But most engines use the
system clock, (to not forfeit on time if the machine is also running other
tasks), and experience the timing jitter caused by other processes
running, or rotational delays of the hard disk they had been using. In
multi-threaded programs this is even worse, as the scheduling of the
threads by the OS is unpredictable. Even the position where exactly the
program is loaded in physical memory might have an effect.

But in Joker the source of indeterminism is much less subtle: it is
programmed explicitly. Joker uses the starting time of the game as the
seed of a pseudo-random-number generator, and uses the random numbers
generated with the latter as a small addition to the evaluation, in order
to lift the degeneracy of exactly identical scores, and provide a bias for
choosing the move that leads to the widest choice of equivalent positions
later.
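
The mechanism described here could look roughly like the sketch below. It is not Joker's actual code: the class name, the bonus amplitude and the interface are hypothetical, and a real engine would also have to keep the bonus consistent for positions stored in its hash table.

```python
import random
import time

class NoisyEvaluator:
    """Adds a small pseudo-random bonus to a static evaluation (hypothetical)."""

    def __init__(self, seed=None, amplitude_cp=10):
        # Seed from the game's starting time unless a seed is supplied,
        # as described above; amplitude_cp is an illustrative value.
        if seed is None:
            seed = int(time.time())
        self.rng = random.Random(seed)
        self.amplitude_cp = amplitude_cp

    def evaluate(self, static_score_cp):
        # Small random addition that breaks ties between equal scores.
        return static_score_cp + self.rng.randint(0, self.amplitude_cp)

game_a = NoisyEvaluator(seed=42)
game_b = NoisyEvaluator(seed=42)
scores_a = [game_a.evaluate(100) for _ in range(5)]
scores_b = [game_b.evaluate(100) for _ in range(5)]
print(scores_a == scores_b)  # True: the same seed reproduces the same noise
```

Fixing the seed makes a game reproducible for debugging, while seeding from the clock gives a different sample of games each time.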

The non-determinism is a boon, rather than a bust, as it allows you to
play several games from an identical position, and still do a meaningful
sampling of possible games, and of the decisions that lead to their
results. If one position would always lead to the same game, with the same
result (as would occur if you were playing a simple end-game with the aid
of tablebases), it would not tell you anything about the relative strength
of the armies. It would only tell you that this particular position was won
/ drawn. But nothing about the millions of other positions with the same
material on the board. And the value of the material is by definition an
average over all these positions. So with deterministic play, you would be
forced to sample the initial positions, rather than using the indeterminism
of the engine to create a representative sample of positions before
anything is decided.

| In fact, to the extent that your remarks are true, they will
| support my case if my playtesting is successful that the
| unlikelihood of achieving the same outcome (i.e., wins or
| losses for one player) is extreme.
This sentence is too complicated for me to understand. 'Your case' is
that 'the unlikelihood of achieving the same outcome is extreme'? If the
unlikelihood is extreme, is that the same as that the likelihood is
extreme? Is the 'unlikelihood to be the same' the same as the
'likelihood to be different'? What does 'extreme' mean for a
likelihood? Extremely low or extremely high? I wonder if anything is
claimed here at all...

I think you make a mistake by seeing me as a low-quality advocate. I only
advocate minimum quantity to not make the results inconclusive.
Unfortunately, that is high, despite my best efforts to make it as low as
possible through asymmetric playtesting and playing material imbalances in
pairs (e.g. 2 Chancellors against two Archbishops, rather than one vs one).
And that minimum quantity puts limits to the maximum quality that I can
afford with my limited means. So it would be more accurate to describe me
as a minimum-(significant)-quantity, maximum-(affordable)-quality
advocate...

Derek Nalls wrote on Fri, May 23, 2008 10:22 PM UTC:

'If the result would be different from playing at a more 'normal' TC,
like one or two hours per game, it would only mean that any conclusions
you draw on them would be irrelevant for playing Chess at normal TC.'

Conclusions drawn from playing at normal time controls are irrelevant
compared to extremely-long time controls. It is desirable to see what
secrets can be discovered from a rarely viewed vantage of extremely
well-played games. Are not you interested at all to analyze move-by-move
games played better than almost any pair of human players are capable?

You do not seem to understand that I, too, am discontent with the
probability of a small number of wins or losses in a row. This is a
compensation that reduces the chance that the games were randomly
played to the greatest extent attainable and consequently, the winner
or loser randomly determined.
_____________________________

'... playing 2 games will be like flipping a coin.'

Correction-

Playing 1 game will be like flipping a coin ... once.
Playing 2 games will be like flipping a coin ... twice.

The chance of getting the same flip (heads or tails) twice-in-a-row is
1/4. Not impressive but a decent beginning. Add a couple or a few or several consecutive same flips and it departs 'luck' by a huge margin.
_______________________________________________________________

'The result, whatever it is, will not prove anything, as it would be
different if you would repeat the test. Experiments that do not give a
fixed outcome will tell you nothing, unless you conduct enough of them to
get a good impression on the probability for each outcome to occur.'

I have wondered why the performance of computer chess programs is
unpredictable and varied even under identical controls. Despite their
extraordinary complexity, I think of computer hardware, operating systems
and applications (such as Joker80) as deterministic.

The details of the differences in outcomes do not concern me. In fact,
to the extent that your remarks are true, they will support my case if my
playtesting is successful that the unlikelihood of achieving the same
outcome (i.e., wins or losses for one player) is extreme.

I am pleased to report that I estimate it will be possible, over time, to
generate enough experiments using Joker80 to have meaning for a
high-quality, low-quantity advocate (such as myself) and even a
moderate-quality, moderate-quantity advocate (such as Scharnagl). As for
a low-quality, high-quantity advocate (such as you), you will always be
disappointed as you are impossible to please.

Derek Nalls wrote on Fri, May 23, 2008 09:38 PM UTC:

I have recently been sufficiently convinced via asymmetrical playtesting
(still underway) that the 2 rooks : 1 queen advantage in material values
is appr. the same in CRC as in FRC.  [I used to think it was higher in
CRC.] Consequently, I revised my model (again) and my CRC piece values:

universal calculation of piece values
http://www.symmetryperfect.com/shots/calc.pdf

CRC
material values of pieces
http://www.symmetryperfect.com/shots/values-capa.pdf

FRC
material values of pieces
http://www.symmetryperfect.com/shots/values-chess.pdf

This change was implemented by raising the value of the queen in CRC- not
by lowering the value of the rook.

revised Joker80 values
Nalls standard CRC model
P85=268=307=518=818=835=950

H. G. Muller wrote on Fri, May 23, 2008 09:36 AM UTC:

Derek Nalls:
| This might require very deep runs of moves with a completion time 
| of a few weeks to a few months per pair of games to achieve 
| conclusive results.

It still escapes me what you hope to prove by playing at such an
excessively long Time Control. If the result would be different from
playing at a more 'normal' TC, like one or two hours per game (which
IMO will not be the case), it would only mean that any conclusions you draw
from them would be irrelevant for playing Chess at normal TC.

Furthermore, playing 2 games will be like flipping a coin. The result,
whatever it is, will not prove anything, as it would be different if you
would repeat the test. Experiments that do not give a fixed outcome will
tell you nothing, unless you conduct enough of them to get a good
impression on the probability for each outcome to occur.
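
A rough sketch of why the number of games matters, assuming each game is an independent win/loss trial with a fixed win probability (draws are ignored, and the function name is hypothetical):

```python
import math

def score_standard_error(n_games: int, p_win: float = 0.5) -> float:
    """Standard error of the observed score fraction after n_games
    independent win/loss games (binomial model, no draws)."""
    return math.sqrt(p_win * (1.0 - p_win) / n_games)

if __name__ == "__main__":
    for n in (2, 20, 200, 2000):
        print(f"{n:5d} games: score uncertain by about "
              f"{100 * score_standard_error(n):.1f} percentage points")
```

In this model the uncertainty shrinks only with the square root of the number of games, so a 2-game match leaves the score uncertain by roughly 35 percentage points.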

H. G. Muller wrote on Fri, May 23, 2008 08:16 AM UTC:

'Because of all this, I suggest evaluating entire configuration of pieces,
rather than a single piece.'

This is exactly what Chess engines do. But it is a subject that transcends
piece values. Material evaluation is supposed to answer the question:
'what combination of pieces would you rather have, without knowing where
they stand on the board'. Piece values are an attempt to approximate the
material evaluation as a simple sum of the value of the individual pieces,
making up the army.

It turns out that material evaluation is by far the largest component of
the total evaluation of a Chess position. And this material evaluation
again can be closely approximated by a sum of piece values. The most
well-known exception is the Bishop pair: having two Bishops is worth about
half a Pawn more than double the value of a single Bishop. Other
non-additive terms are those that make the Bishop and Rook value dependent
on the number of Pawns present. To account for such effects some engines
(e.g. Rybka) have tabulated the total value of all possible combinations
of material (ignoring promotions) in a 'material table'. Such tables can
then also account for the material component of the evaluation that gives
the deviation from the sum of piece values due to cooperative effects
between the various pieces.
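
As a minimal sketch of the idea (not the code of Joker80, Rybka, or any other engine), material evaluation as a sum of piece values plus one non-additive term might look like this. The piece values are hypothetical round numbers in centipawns; only the Bishop-pair bonus of about half a Pawn comes from the text above.

```python
from collections import Counter

# Hypothetical centipawn values for orthodox pieces; NOT taken from any
# engine or model discussed in this thread.
PIECE_VALUE = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}
BISHOP_PAIR_BONUS = 50  # "about half a Pawn"

def material_score(pieces: str) -> int:
    """Sum of piece values for an army given as letters, e.g. 'QRRBBNNPPPPPPPP'."""
    counts = Counter(pieces)
    total = sum(PIECE_VALUE[p] * n for p, n in counts.items())
    # Non-additive term: two Bishops are worth more than twice one Bishop.
    # Simplification: square colors of the Bishops are not checked.
    if counts["B"] >= 2:
        total += BISHOP_PAIR_BONUS
    return total

def material_balance(white: str, black: str) -> int:
    """Positive result means White is ahead in material."""
    return material_score(white) - material_score(black)
```

For example, material_balance("BBPPP", "NNPPP") gives 50 with these numbers: the armies are equal piece for piece, and the difference is the pair bonus alone. A full 'material table' would replace the formula with a lookup keyed on the complete piece counts.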

Useful as this may be, it remains true that piece values are by far the
largest contribution to the total evaluation. The only positional terms
that can compete with it are passed pawns (a Pawn on 7th rank is worth
nearly 2.5 normal Pawns) and King Safety (having a completely exposed King
in the middle game, when the opponent still has a Queen or similar
super-piece, can be worth nearly a Rook).

Rich Hutnik wrote on Fri, May 23, 2008 01:56 AM UTC:

Perhaps we need to look back to exactly why we need piece values.  Is it to
balance different armies, or just because people are curious?  Is the
objective to turn Chess Variants into a single balanced game, or something
else?  Maybe we need to think about the reason for the discussion, so that we
can perhaps find a way to cut the Gordian knot instead of trying to
untangle it.

Derek Nalls wrote on Fri, May 23, 2008 12:47 AM UTC:

Originally, I planned two 'internal playtests'.  [By this self-invented
term I mean playtests of the standard model of a person against a special
model that I have compelling reasons to think may be superior by a
provable margin.]

The first planned test involves the standard CRC model of Muller against a
special CRC model with a higher, closer-to-conventional rook value.  Upon
closer examination, I suspected that the discrepancy was possibly too
small to be detected even with very long time controls.  So, I announced
that this test was cancelled.

Notwithstanding, I may change my mind and return to this unsolved mystery
if Joker80 demonstrates unusually-high aptitude as a playtesting tool. 
This might require very deep runs of moves with a completion time of a few
weeks to a few months per pair of games to achieve conclusive results.

The second planned test involves the standard CRC model of Scharnagl
against a special CRC model with a higher, unconventional archbishop
value.

Scharnagl currently assigns the archbishop a material value of appr.
77% that of the chancellor in his standard CRC model.

Muller currently assigns the archbishop a material value of greater
than 97% that of the chancellor in his standard CRC model.

Nalls currently assigns the archbishop a material value of less
than 98% that of the chancellor in his standard CRC model.

I devised a special CRC model using identical material values for every
piece in the standard CRC model by Scharnagl except that it assigns the
archbishop a material value of exactly 95% that of the chancellor
(18% or 1.65 pawns higher).  [Note that this figure is slightly more
moderate than those by Muller & Nalls.]  A discrepancy this large should
be detectable at short-moderate time controls.  This test is now
underway.
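
The figures quoted above can be cross-checked with a little arithmetic. This sketch assumes that "18%" means 18 percentage points of the chancellor's value and that the 77% figure is exact rather than approximate; the implied chancellor value is derived here and is not a number stated by Scharnagl.

```python
# Ratios of archbishop value to chancellor value, as quoted in the post.
SCHARNAGL_RATIO = 0.77   # "appr. 77%" (treated as exact here: an assumption)
SPECIAL_RATIO = 0.95     # the special test model
GAP_IN_PAWNS = 1.65      # stated size of the difference, in pawns

# Difference between the two ratios, as a fraction of the chancellor's value.
ratio_gap = SPECIAL_RATIO - SCHARNAGL_RATIO

# Chancellor value (in pawns) that would make the gap equal 1.65 pawns.
implied_chancellor = GAP_IN_PAWNS / ratio_gap

if __name__ == "__main__":
    print(f"ratio gap: {ratio_gap:.2f} of a chancellor")
    print(f"implied chancellor value: {implied_chancellor:.2f} pawns")
```

With these inputs the gap is 0.18 of a chancellor, which matches 1.65 pawns if the chancellor is worth a little over 9 pawns in that model.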

If either of these tests is successful at establishing or implying a
probability that the special models play stronger than the standard
models, then revisions to the standard models may occur.  At that
juncture, we would be ready to begin 'external playtests'.  [By this
self-invented term I mean playtests of the standard models of different
persons against one another.]

Gary Gifford wrote on Thu, May 22, 2008 09:37 PM UTC:

Rich suggested '...evaluating entire configuration of pieces, rather than a single piece.'

I believe that is correct [that is what programs like Fritz and Chess Master seem to do... evaluating the two configurations and giving a score for the deviation] but also I would say, evaluate the pieces within the given position. The values are relative and change with every move.

The lowly pawn about to queen is a fine example. The Knight that attacks 8 spaces compared to one that attacks 4 is another, as is the 'bad' [blockaded] Bishop.

Another concept is that of brain power. For example, the late Bobby Fischer's Knights would be much more powerful than mine... not in potential, but in reality of games played. Pieces have potential, but the amount of creative power behind them is an important factor.

Rich Hutnik wrote on Thu, May 22, 2008 09:24 PM UTC:

It seems like a normal FIDE pawn, but by simply shifting all the pawns up
one row, the value of all of them changes.  In other words, their value is
dependent upon their proximity to other pawns.  In light of this, are
pieces worth the same in every configuration of Chess960?

This issue is more complicated than it appears.  Take Near vs Normal
Chess, for example.  Which side has an advantage?  The Near side moves
everything up one row, but drops castling, but has a back row to either
drop the king back or mobilize the rooks.  And, against this, Near can En
Passant the pawns of Normal, but Normal can't do the same to Near.

Because of all this, I suggest evaluating entire configuration of pieces,
rather than a single piece.

H. G. Muller wrote on Thu, May 22, 2008 07:05 PM UTC:

'Let me provide another challenge for people here regarding pawns.  How
much is a pawn that moves only one space forward (not initial 2) but
starts on the third row instead of second worth in contrast to a normal
chess pawn?  How much is it worth alone, and then in a line of pawns that
start on the third row?'

But this is a totally normal FIDE Pawn...

It would get a pretty large positional penalty if it was alone
(isolated-pawn penalty). In a complete line of pawns on the 3rd rank it
would be worth a lot more, as it would not be isolated, and not be
backward. All in all it would be fairly similar to having a line of Pawns
on second rank, as the bonus for pushing the Pawns forward 1 square is
approximately cancelled by not having Pawn control anymore over any of the
squares on the 3rd rank.

Rich Hutnik wrote on Thu, May 22, 2008 05:28 PM UTC:

I believe the value of a piece should relate to its mobility first and
foremost.  If one were to end up rating a piece, come up with a value of 1
for the most pathetic potential piece in the game, and then adjust
accordingly.  How about a pawn that starts out on the second space and
only moves backwards one as its move and doesn't capture?  That pawn has
a value of one.  How much more is an Asian chess pawn that moves only one
space forward, and doesn't promote worth in contrast?

To base it on a normal chess pawn is to not provide a full solution for
the variant community.

Let me provide another challenge for people here regarding pawns.  How
much is a pawn that moves only one space forward (not initial 2) but
starts on the third row instead of second worth in contrast to a normal
chess pawn?  How much is it worth alone, and then in a line of pawns that
start on the third row?

H. G. Muller wrote on Thu, May 22, 2008 08:13 AM UTC:

'Do you think these piece values will work smoothly with Joker80 running
under Winboard F yet remain true to all three models?'

Yes, I think these values will not conflict in any way with any of the
hard-wired value approximations that are used for pruning decisions. At
least not to the point where it would lead to any observable effect on
playing strength. (Prunings based on the piece values occur only close to
the leaves, and engines are usually quite insensitive as to how exactly
you prune there.)

H. G. Muller wrote on Thu, May 22, 2008 08:07 AM UTC:

'I cannot speak for Reinhard Scharnagl at all, though.'

This is exactly the problem. 'Base value' for Pawns is a very
ill-defined concept, as it is the smallest of all piece base values, while
the positional terms relating to Pawns are usually the largest of all
positional terms. And the whole issue of pawn-structure evaluation in
Joker is so complex that I am not even sure if the average of positional
terms (over all pawns and over a typical game) is positive or negative.
Pawns get penalties for being doubled, or having no Pawns next to or behind
them on neighboring files. They get points for advancing, but they get
penalties for creating squares that no longer can be defended by any Pawn.
My guess is that in general, the positional terms are slightly positive,
even for non-passers not involved in King Safety.
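
To make the kind of terms concrete, here is a toy sketch of such positional Pawn terms. It is not Joker80's evaluation: the weights are invented, the function name is hypothetical, and the penalty for squares that can no longer be defended by a Pawn is left out entirely.

```python
# Hypothetical weights in centipawns; the real engine terms are not given
# in this thread and are certainly more elaborate.
DOUBLED_PENALTY = 15
UNSUPPORTED_PENALTY = 10
ADVANCE_BONUS = 5  # per rank advanced beyond the 2nd

def pawn_positional_terms(pawns: set[tuple[int, int]]) -> int:
    """Sum of positional terms for one side's pawns.

    pawns holds (file, rank) pairs, with rank counted from the owner's
    side so that the usual starting rank is 2.
    """
    score = 0
    for file, rank in pawns:
        # Doubled: another friendly pawn stands on the same file.
        if any(f == file and r != rank for f, r in pawns):
            score -= DOUBLED_PENALTY
        # Unsupported: no friendly pawn next to or behind it on a
        # neighboring file.
        if not any(abs(f - file) == 1 and r <= rank for f, r in pawns):
            score -= UNSUPPORTED_PENALTY
        # Points for advancing.
        score += ADVANCE_BONUS * (rank - 2)
    return score
```

With these made-up weights a lone pawn on its starting rank scores -10 and two neighbors on the 2nd rank score 0, so whether the average over a game comes out positive or negative depends on how far the pawns advance and how well they stay connected.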

A statement like 'a Knight is worth exactly 3 Pawns' is only meaningful
after exactly specifying which kind of pawn. If the Scharnagl model
evaluates all non-passers exactly the same (except, perhaps, edge Pawns),
then the question still arises how to most-closely approximate that in
Joker80, which doesn't. And simply setting the Joker80 base value equal
to the single value of the Scharnagl model is very unlikely to do it.

Good differentiation in Pawn evaluation is likely to impact play strength
much more than the relative value of Pawns and Pieces, as Pawns are traded
for other Pawns (or such trades are declined by pushing the Pawn and
locking the chains) much more often than they can be traded for Pieces.

25 comments displayed
