Wikipedia link re: Margin of error (may be relevant to piece value studies)

Kevin Pacey wrote on Sat, Jan 14, 2017 07:48 PM UTC:

Without wishing to harp on the subject of computer or statistical studies of piece values, my thoughts on margin of error in these cases have been the same for a long time now. They're threefold. Firstly, I think it's possible that in such studies the margin of error has been estimated as, at best, half of what it should be. That is, say piece X is assumed to be superior to piece Y; then its superiority might be thought to show up as a score of 50% + superiority% + margin of error% (with the margin assumed to be no greater than 100/2, or 50%) out of a 100% total over the n games in a sample. In calculating the margin of error, I think that bound should be doubled (i.e. no greater than 100%), since in THEORY (however unlikely it seems) there could be a sample where piece X scores less than 50%. This is unlikely (though not impossible, given sufficiently weak players or a weak computer program) if X is a rook and Y is a bishop, but suppose X is a knight instead.

Secondly, another possible problem in estimating the margin of error in such studies is that if one uses a pawn as a kind of standard candle, a pawn is a much greater fraction of a minor piece (e.g. a bishop or knight, i.e. about 1/3rd of either of these) than it is of a senior major piece (e.g. an archbishop, chancellor or queen, i.e. a pawn is worth roughly 1/9th [or more] of any of these), which may deserve to be taken into consideration when calculating any sort of margin.
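
For reference, here is a minimal Python sketch of the textbook margin of error for a score fraction over n games, as described on the Wikipedia page. It is not taken from any particular piece value study. The function names, the 95% level (z = 1.96) and the assumption that games are independent are all illustrative choices. Treating each game as a plain win/loss trial is, if anything, conservative when draws count as half a point.

```python
import math


def margin_of_error(score_fraction, n_games, z=1.96):
    """Normal-approximation margin of error for an observed score fraction.

    Assumptions (illustrative, not from any specific study):
    - games are independent,
    - z = 1.96 corresponds to a 95% confidence level,
    - each game is treated as a win/loss trial (conservative with draws).
    """
    if n_games <= 0:
        raise ValueError("n_games must be positive")
    if not 0.0 <= score_fraction <= 1.0:
        raise ValueError("score_fraction must lie in [0, 1]")
    return z * math.sqrt(score_fraction * (1.0 - score_fraction) / n_games)


def is_distinguishable_from_even(score_fraction, n_games, z=1.96):
    """True if the observed score differs from 50% by more than the margin."""
    return abs(score_fraction - 0.5) > margin_of_error(score_fraction, n_games, z)


if __name__ == "__main__":
    # A 55% score for the side with piece X, at several sample sizes.
    for n in (100, 1000, 10000):
        moe = margin_of_error(0.55, n)
        verdict = is_distinguishable_from_even(0.55, n)
        print(f"n={n}: margin = {moe:.4f}, distinguishable from 50%: {verdict}")
```

Under these assumptions the margin shrinks with the square root of the number of games, so a small edge for piece X needs a large sample before it can be told apart from an even score.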

[edit: If the above does deserve to be taken into consideration, then after one calculates an 'initial' margin of error for a study, by whatever method one approves of, I can suggest multiplying it by a 'Fudge Factor' to reach a final margin of error. As a crude example guess of mine:

Fudge Factor = (assumed total value, in pawns, of the assumed superior or equivalent piece[s] being measured) squared, divided by (assumed total value, in pawns, of the assumed inferior or equivalent piece[s] being measured, plus 1).

Now for example cases. If one side has an extra pawn, Fudge Factor = (1x1)/(0+1) = 1. If one side has a bishop for a knight, Fudge Factor = (3x3)/(3+1) = 9/4. If one side has a rook for 5 pawns, Fudge Factor = (5x5)/(5+1) = 25/6. If one side has a queen and the other side has an Archbishop (or Chancellor), and we say for the sake of argument that they're equivalent, then Fudge Factor = (9x9)/(9+1) = 81/10. If one side has two bishops and the other has a knight and bishop, Fudge Factor = (6x6)/(6+1) = 36/7. If one side has 3 queens and the other side has 7 knights (which should actually beat the 3 queens, superior on paper in value though they are, at least with no pawns involved), then Fudge Factor = (27x27)/(3x7+1) = 729/22, i.e. very large.

In coming up with the Fudge Factor, I initially tried to take into account the total value of each side's Army (not counting kings) for a given setup, that is, the setup being studied in order to measure a [sub-]set of piece[s]. However, this complicated my attempts at finding a plausibly suitable formula too much (IMHO), even though it otherwise seems very desirable to take the value of the Armies into account somehow.]
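
The Fudge Factor arithmetic above can be checked with a short Python sketch. The function and variable names are hypothetical, and the piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are simply the ones assumed in the examples; exact fractions are used so the results can be compared directly with the figures quoted.

```python
from fractions import Fraction


def fudge_factor(superior_value, inferior_value):
    """(superior total value)^2 / (inferior total value + 1), values in pawns."""
    return Fraction(superior_value ** 2, inferior_value + 1)


def final_margin(initial_margin, superior_value, inferior_value):
    """Scale an 'initial' margin of error by the Fudge Factor."""
    return initial_margin * fudge_factor(superior_value, inferior_value)


# (description, superior side's total value, inferior side's total value)
EXAMPLES = [
    ("extra pawn", 1, 0),
    ("bishop for knight", 3, 3),
    ("rook for 5 pawns", 5, 5),
    ("queen vs archbishop (assumed equal)", 9, 9),
    ("two bishops vs knight and bishop", 6, 6),
    ("3 queens vs 7 knights", 3 * 9, 7 * 3),
]

if __name__ == "__main__":
    for name, superior, inferior in EXAMPLES:
        factor = fudge_factor(superior, inferior)
        print(f"{name}: {factor} (about {float(factor):.2f})")
```

Because the numerator grows with the square of the material being measured, the factor rises quickly for heavier pieces, which is the intended effect of making a pawn-sized error count for less when the pieces compared are worth many pawns.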

Thirdly, I still believe the strength of the playing sides (even if they are one and the same player, such as a computer program) can significantly affect the results of such studies (enlarging the margin of error, to put it one way). A link I gave elsewhere notes that knight odds are compensated for by a difference of 600 rating points in chess, so even a pawn's difference can be less significant in games between weaker players or computers than in games between stronger ones. An analogy I'd make: if you let kids play in a sandbox, you'd be lucky to see a somewhat competently designed sand castle at some point, while if a master sculptor played in that sandbox, we'd get masterpieces that made the best use of the material available.
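
To give a feel for what a 600-point rating gap means, here is a small Python sketch of the standard logistic Elo expected-score formula. The function name is hypothetical, and the 600-point figure is just the one quoted from the link; the sketch says nothing about how that figure was derived.

```python
def expected_score(rating_diff):
    """Standard logistic Elo expected score for the higher-rated side.

    rating_diff is (own rating - opponent rating); 400 is the usual Elo scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # With no odds given, a 600-point favourite is expected to score heavily.
    for diff in (0, 200, 400, 600):
        print(f"rating difference {diff}: expected score {expected_score(diff):.3f}")
```

On this scale a 600-point favourite would be expected to score roughly 97% in an even game, so if knight odds really do bring such a pairing back to level, the practical worth of the same material clearly depends a great deal on who is playing.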

To sum up my position as it stands now: I believe piece values from such studies could be trusted with a high degree of confidence (at least by myself) if the margin of error were convincingly accurate and, perhaps more importantly, if computer programs with widely accepted, very high chess ratings were used as the playing sides. (Such studies are intended, at least for now, for chess and rather chess-like games.)