Including Piece Values on Rules Pages [Subject Thread]
Kevin Pacey wrote on Fri, Mar 8 04:33 PM UTC:

Re: Including Piece Values on Rules Pages:

Is this something that editorial staff really feel should be done for a well-completed Rules Page (say, in the Notes section)?

In my latest batch of games put up for review I've left piece values out (I may insert them later, after possible publication). That is because Dr. Muller is currently on the editorial staff, and he often vehemently disagrees with certain piece values I might give, in spite of my continued doubts about certain aspects of computer studies (doubts that may or may not ultimately prove justified in the course of time).

As a result I do not feel I have full freedom to offer piece values (at least for the sake of my peace of mind), even though most people know to take anyone's offered piece values with a grain of salt. What should be done? Assume H.G.'s values are infallible, and wait for his next study if he has yet to offer piece values for 12x12 boards with given armies, for example?

Many people may be content not to offer piece values, simply because they want to keep their own secret, as it may affect their chances when playing a given CV (or because they feel it may take a lot of calculation). Personally, I don't think it matters too often that way.

One possibility I thought of today is to have a separate section added (to rules pages!?) just for any piece values offered. Another idea is to rate someone's offered piece values separately. The truth is not a democracy, you might say? Well, governments sometimes consult experts on matters but still put them up for a vote in democracies anyway, even on life-and-death matters such as euthanasia and abortion, as we have seen in places around the world. Short of divine intervention/retribution, perhaps, such decisions by democracies are final, subject to future governments or changes of constitution.


H. G. Muller wrote on Fri, Mar 8 06:00 PM UTC in reply to Kevin Pacey from 04:33 PM:

I am in general against spreading misinformation; there already is so much of that on the internet, no one is longing for us to add more to it. The problem with posted piece values is that they are very often not rooted in reality, but spring purely from the imagination of the author. I don't see what value that would add to an article. Every reader should be able to make unfounded guesses without any help. And it is especially bad if people post values that contradict a large body of evidence.

If I were to edit the article on orthodox Chess, claiming that the piece values are P=1, minor=2, R=4 and Q=8, based on the theory that piece values are inversely proportional to the number of those pieces you have in a game... Should it be allowed to stand? Should it be accompanied by an explanation of how these were calculated? Or by a disclaimer like "virtually everyone in the world agrees that these values are very much off, but I present them here anyway for no other reason than that I could calculate them through a method that did not require any experimental evidence"?

My brother always says:  "if you don't have anything to say, then don't do that here!". Information should flow from where there is knowledge to where there is none. It is only useful to publish information that is better / more reliable than what the reader already has. Sadly, for piece values that will usually not be the case.


Kevin Pacey wrote on Fri, Mar 8 06:26 PM UTC in reply to H. G. Muller from 06:00 PM:

The main thrust of that gives no value to human intuition when there is not yet a consensus, or evidence conclusive enough (in the eyes of the beholder). Indeed, I personally think you sometimes rely on intuition yourself when it comes to your methods for establishing or estimating piece values - that could arguably lead to misinformation unknowingly, despite your best intentions to be thorough.


🕸Fergus Duniho wrote on Fri, Mar 8 09:28 PM UTC in reply to Kevin Pacey from 04:33 PM:

I think it is easier to determine relative values than to determine precise, absolute values. Piece values are only a guide for evaluating positions, and they do not themselves determine who wins or loses. So, if you stick to relative values, you should be good, but when you attempt to give precise, numeric values to pieces, that will be far more speculative and prone to error.


Kevin Pacey wrote on Sat, Mar 9 01:26 AM UTC in reply to Fergus Duniho from Fri Mar 8 09:28 PM:

It's easier, yes. A problem can surface if ever in a game you have the choice of making a 2-for-1 or 3-for-1 trade. Then, for example, in chess it would not help you to know that P<N<=B<R<Q if you want to know with some degree of confidence (or at least an intuitive feeling) whether N+P is normally worth close to a R, or whether it's N+2P that normally comes much closer to a R - that is, with all other affected features of the position at hand being in some kind of balance after making such a trade.

In fact it's usually N+2P, maybe during any phase of a game (if that is also to be taken into account). A (more advanced/different?) tip I've read is that (if I recall right) B+2P are usually worth a R, and N+2P a shade less than a R. H.G. might already argue that computer studies (not just his) put a single N = a single B in (8x8) chess - however he might say that things are different for such 3-for-1 trades, because more units are involved, so there is no necessary loss of face for anyone in such a case, for those who intrepidly try to assign (or offer to fine-tune) fairly precise piece values. In my case, for the chess variant rules pages I've made, I add the caveat that my suggested values are tentative, hopefully waking any adult who still has a child-like faith in the written word (I'd personally make an exception for a given version of the Bible, though perhaps even then mistranslations might have happened in some cases).
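To illustrate why the precise numbers matter for such 3-for-1 trades, here is a toy calculation (the two value sets are just commonly quoted examples, not values I am endorsing):

    # Toy comparison: does giving up R for N+2P look acceptable on material alone?
    classical = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}
    kaufman_like = {"P": 1.0, "N": 3.25, "B": 3.25, "R": 5.0, "Q": 9.75}

    def trade_delta(values, gained, lost):
        """Material change (positive = profitable) when giving up `lost` to win `gained`."""
        return sum(values[p] for p in gained) - sum(values[p] for p in lost)

    for name, vals in (("classical", classical), ("Kaufman-like", kaufman_like)):
        print(name, "N+2P for R:", trade_delta(vals, ["N", "P", "P"], ["R"]))

Under the classical 1:3:3:5:9 scale the trade comes out dead even; with the minor pieces a quarter-Pawn higher it already looks slightly favorable - exactly the kind of distinction a bare ordering P<N<=B<R<Q cannot give you.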


🕸Fergus Duniho wrote on Sat, Mar 9 02:15 AM UTC in reply to Kevin Pacey from 01:26 AM:

I add the caveat that my suggested values are tentative

That should be fine. I usually ignore estimates of piece values anyway, because I don't expect them to be gospel truth, and when playing a game, I rely more on my own ability to understand and compare different pieces. But piece values could be of more interest to someone who is trying to get a program to play a game better. As someone who is programming engines to play Chess variants well, it makes sense that HG would have a keen interest in this. I recall when Steve Evans and I were working on a better Shogi ZRF together, and one thing we did, which I think was more his idea than mine, was to add code to adjust the values Zillions-of-Games assigned to different pieces.


Aurelian Florea wrote on Sat, Mar 9 06:18 AM UTC in reply to H. G. Muller from Fri Mar 8 06:00 PM:

Agreed! But I am all for publishing piece values obtained through applying your experimental method - that, and other tactical or strategic tips the author has found. For example, I have observed that in all my Apothecary games it is wrong to move the joker while there are still pawns ahead of it, because they can move forward attacking the joker while the poor sucker cannot run away, as it imitates a pawn. Conversely, if there are no pawns ahead, moving the joker to the center can be very fruitful, as it can imitate anything, making it temporarily the most powerful piece on the board.


H. G. Muller wrote on Sat, Mar 9 07:40 AM UTC in reply to Kevin Pacey from 01:26 AM:

The problem with intuition is that it is notoriously unreliable. Humans suffer from an effect called 'observational bias', because one tends to remember the exceptional better than the common. This is probably the reason that GMs/world champions have grossly overestimated the tactical value of a King (as ~4 Pawns): in games where the King plays an important role it can indeed be very strong, but there are plenty of cases where a King is of no use at all (because it cannot catch up with a passed Pawn). These tend to be dismissed, as "the King played no role here, so we could not see how strong it really is", while in fact you could see how weak it was by its lack of ability to play a role. In practice two non-royal Kings are conclusively defeated by the Bishop pair (in combination with balanced other material, and in particular sufficiently many Pawns), in games between computer programs that most humans could not beat at all. Of course none of these GMs ever played such a game even once.

Other forms of intuition often result from application of simplistic logic, rather than observation. It is 'intuitively obvious' that a BN is worth several Pawns less than RN, as B is worth several Pawns less than R, and it is their only difference. Alas, it is not true. They are almost equivalent. It ignores the effect that some moves can cooperate better than others, and in games BN + Pawn would score convincingly better than RN (and on average even beat Q).

We should also keep in mind that piece values are just an approximation. It is not a law of nature that the strength of an army can be obtained by adding the values of its individual pieces, and that the win probability can be calculated from the difference between the thus-obtained army strengths. And indeed, closer study shows that it is not true at all. The win probability depends on how well the pieces in an army cooperate and complement each other, and how effective they are against what the opponent has.

For example, A=BN and C=RN are more effective against a Queen than against a combination of lighter material (say R+N+2P) that in itself would perfectly balance a Queen. That is because all squares attacked by the latter, even though similar in number to those attacked by a single Q, are no-go areas for a C or A even when these are protected, while they would not have to shy away from a Q attack in similar situations. This causes Q+C+A < R+B+C+A in Capablanca Chess, even though Q > R+B as usual. The extra C and A on the Q side are effectively weaker pieces than their counterparts on the R+B side, so much so that it reverses the advantage. An extreme manifestation of this effect is that 7 Knights easily beat 3 Queens on an 8x8 board - something that cannot be explained by any value for N or Q that would make sense in a context with more mixed FIDE material.

Your claim that B+2P ~ R and N+2P < R, which I don't doubt, cannot be used to conclude that B > N, because of these subtleties. Piece values are not defined by how well the pieces do against a Rook, but by how well they do against a mix of opponent pieces such as typically occurs in end-games. And I have no doubt that the average performance of the Bishop suffers from the fact that there are many cases where B+2P ~ B+P, while N+2P would have done much better (namely when the Bishops are on unlike shades).

Note that the claim that a lone B ~ N was not based on what I would call a 'computer study'. I have no doubt a computer was used in the process, but just as an aid for quickly searching a huge database of human GM games, not by playing computers against each other. The fact that a computer was used thus in no way had any effect on the conclusion. In the Kaufman study the claim was detailed further by stating that the B-N difference correlated with the number of Pawns, and exact equality only occurred when each player had about 5 Pawns; for fewer Pawns the Bishop performed better, for more Pawns the Knight. It is also common knowledge that Knights typically perform more poorly in end-games where there are Pawns on different wings than when all Pawns are close together. This is of course also something that transcends piece values, which are defined as the best estimate for the chances without knowing the location of the pieces. Piece values are not the only terms that contribute to the heuristic evaluation of individual positions.

But to come back to the main topic: I don't think it would be a good idea to dismiss any form of quality standard on published piece values because "people should know that they should not believe what they read". That is an argument that could be used for publishing any form of fake news. It is already bad enough that this is the case, and we should not make it even more true by adding to the nonsense. There can also be piece values that have a more solid basis, and I think readers should have the right to distinguish the one from the other. So as far as I am concerned people can publish anything, as long as they clearly state how they arrived at those values. Like "personal experience based on N games I played with these pieces" or "based on counting their average number of moves on an NxN board" or whatever. If there is a non-trivial calculation scheme involved, it is fine to publish that as a separate article, and then refer to that.


Kevin Pacey wrote on Sat, Mar 9 03:29 PM UTC in reply to H. G. Muller from 07:40 AM:

One problem with computer studies of chess [variants] is that there has been no peer review by many mathematicians (with some chess grandmasters perhaps thrown in). For the scientific method to work in a trustworthy way, at least according to the high priests of science etc., you need that.

There are things about computer studies that already raise red flags for me personally (although I am no scientist/math wizard). The claimed margin of error could be wrong, for one thing. The armies or initial position chosen for each side of a given study could make a hugely underestimated difference. The engine(s) used (2300 FIDE at best!?) have been relatively weak so far, as far as I know - chess endgames sometimes take 2700+ human players to play optimally.

I'm not sure why such computer studies shouldn't in general just be dismissed as a pile of rubbish, if people more knowledgeable were to insist on rigorous proof of the studies being correct at this point in time - that is, if you want to play hardball about publishing standards. Such standards are reserved for scientific journals in the real world anyway, not for hobbyists who do not have (much, if any) money or life-and-death issues at stake.

More specifically for myself, I already balk at the idea that an Amazon is worth only Q+N, even on 8x8. As a chess master with the memory of a number of chess world champions' and grandmasters' views, I do not trust that a single B merely = N exactly on 8x8 on average. As for the Archbishop being almost = Chancellor, that is a bit hard to trust, but it is more alien to my intuition. They both cover 16 nearby cells within a radius of 2 cells, I give you that.

[edit: if you really want, to please/amuse you and others I could always (not just sometimes) put the calculations I use for my tentative estimated piece values in the Notes of my Rules Pages - I look at the answers I get and see if my intuition agrees (so far it has, pretty much). I have yet to do such calculations for my most recent large batch of Rules Pages. For what it's worth, sometimes I also borrow some of your rules of thumb where I lack my own formulae.]


Kevin Pacey wrote on Sat, Mar 9 05:27 PM UTC in reply to Kevin Pacey from 03:29 PM:

I've edited my previous post, for any who missed that.


H. G. Muller wrote on Sat, Mar 9 06:12 PM UTC in reply to Kevin Pacey from 03:29 PM:

Well, from what you say it appears that by 'computer study' you mean statistical data from games that computers played against each other (or themselves), as I would. But as I said, the B = N observation came from the Kaufman study, which was nothing of the sort. He just filtered positions with a B-vs-N imbalance from a huge database of human GM games, selecting those where the imbalance was stable for some number of moves (to weed out tactics in progress), and counted the number of wins, draws and losses in which these games ended. Which apparently was a 50% score.
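For concreteness, the counting idea is simple enough to sketch in a few lines of Python. This is only a toy version of my own, not Kaufman's actual code; it assumes the python-chess library and a hypothetical PGN file "games.pgn", and for brevity it only looks at the case where White has the Bishop:

    # Walk a PGN database, flag games with a stable "B for N, all else equal"
    # imbalance, and tally the results of those games.
    import chess
    import chess.pgn

    PIECES = (chess.PAWN, chess.KNIGHT, chess.BISHOP, chess.ROOK, chess.QUEEN)

    def counts(board, color):
        return {pt: len(board.pieces(pt, color)) for pt in PIECES}

    def stable_b_for_n(game, min_plies=12):
        """True if White has an extra B for N (other material equal) for min_plies plies in a row."""
        board = game.board()
        run = 0
        for move in game.mainline_moves():
            board.push(move)
            w, b = counts(board, chess.WHITE), counts(board, chess.BLACK)
            rest_equal = all(w[pt] == b[pt] for pt in (chess.PAWN, chess.ROOK, chess.QUEEN))
            b_for_n = (w[chess.BISHOP] - b[chess.BISHOP] == 1 and
                       b[chess.KNIGHT] - w[chess.KNIGHT] == 1)
            run = run + 1 if (rest_equal and b_for_n) else 0
            if run >= min_plies:          # imbalance survived long enough: not mid-tactics
                return True
        return False

    tally = {}
    with open("games.pgn") as pgn:
        while True:
            game = chess.pgn.read_game(pgn)
            if game is None:
                break
            if stable_b_for_n(game):
                result = game.headers.get("Result", "*")
                tally[result] = tally.get(result, 0) + 1
    print(tally)   # a ~50% score for the Bishop side suggests B ~ N in this sample

A real study would of course also handle the mirrored case (Black having the Bishop) and correct for White's first-move advantage, but the principle really is just filtering and counting.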

It doesn't sound like rocket science to me, but I suppose a complete idiot could bungle even the simplest of tasks. And I have met other chess-engine programmers who have done similar things for themselves. (The Kaufman study did not publish more specific things, like how the N or B would do against Rooks, or whether the difference also correlates with the presence of pieces other than Pawns, and some programmers want to make their engines aware of that too, and put a complete table of every conceivable material composition in their engine.) And they never told me they had proven Kaufman wrong.

The problem is that implying someone is a bungling idiot who cannot even do the simplest thing right, or a fraud who intentionally publishes falsehoods, is a pretty heavy accusation. Most people would hesitate to make such an accusation without having very solid evidence that the published results were indeed wrong. "It was not checked by anyone, so it must be wrong" is not really a valid line of reasoning.

You seem to have a wrong impression of the peer-review system. The 'peers' who are asked to referee a scientific publication will NOT redo the reported work. They only judge whether the described method according to which the results were obtained is a proper procedure. If the claims are in contradiction with earlier results, the manuscript gets a hard time: the referees would at the very least insist that the authors give an explanation for why their method would be more reliable than what people previously did, and even then it stands a large probability of rejection if that doesn't convince them. In a sense everyone on the internet is a peer, and could have contested what others publish there, in particular the Kaufman results. But that didn't happen, and that means much more than if he had just had to fool one or two referees. And there isn't really any need for mathematicians; people who know how to count seem sufficient. You are aware that Larry Kaufman is a GM himself?

I don't really understand your third paragraph, but I am intrigued by the term "more knowledgeable". True knowledge should of course never be dismissed. But what knowledge are you talking about, here?

I agree the Amazon result is suspect; it was only based on a couple of hundred games where the Queen and a Knight were replaced by an Amazon, and the baseline pieces were shuffled to provide more game diversity. That is a very different story from GMs not being able to convert a B-N 'advantage' into a better result in a few thousand games. The remarkable thing about computer games is that it doesn't seem to matter much what the level of play is. Errors tend to cancel out when both players make them. Even random movers systematically win more games when you give them stronger material. (Although quantitatively they don't make the most of it, as they too easily give the strong material away.)

Rather than describing the calculations in a large number of articles, which is likely to lead to a lot of duplication, you could make a separate article of it. That could lead to a more coherent presentation, and the other pages could then just refer to that.

 


Kevin Pacey wrote on Sat, Mar 9 06:34 PM UTC in reply to H. G. Muller from 06:12 PM:

By more knowledgeable (than me), I mean someone who might have better qualifications for evaluating the sorts of computer studies done both by yourself and by Kaufman (his is, naturally, a different type of computer study, if I may call it that). That is, someone who is a mathematician and/or a chess grandmaster, neither of which I am. Even then, Kaufman may qualify, especially if he is the former besides being a GM - however, other people with equally good qualifications may disagree. A body of such people, known to be interested in (or paid for) evaluating the studies/results, would really be needed to build a consensus. I've read online somewhere long ago that some GM tried to explain the result Kaufman got, maybe unconvincingly.

Most of my calculations for estimating piece values are quite short and simple, even if highly suspect to at least some readers. An article on my assortment of quick-and-dirty methods of calculation (which presently try to provide for a big range of pieces and board sizes/shapes) would not cover all the piece types that are possible, I suppose, and also I have never done a CVP article/item of that sort (perhaps you could for your computer-studies method, too, if it would not be too lengthy).


Kevin Pacey wrote on Sat, Mar 9 07:31 PM UTC in reply to Kevin Pacey from 06:34 PM:

I've edited my last post a bit, for any who missed it.


Kevin Pacey wrote on Sat, Mar 9 09:48 PM UTC in reply to H. G. Muller from 06:12 PM:

Re: "It was not checked by anyone, so it must be wrong" is not really a valid line of reasoning"... (H.G. wrote)

Something not yet being worthy of trust is a shade different from being said to be wrong (or proven to be).

In 'Secrets of Practical Chess', GM John Nunn (a doctor of mathematics) wrote of little-tested or untested sequences of opening play (in over-the-board games of strong players) that are recommended by chess authors: don't trust them in your opening repertoire (especially, I'd add, if you must rely on just such sequences to keep your repertoire from going under). Meaning, I suppose, treat them like rubbish until proven otherwise. Or let someone else be the guinea pig - the advice was probably especially meant for players well below GM level.

That was back in the 1990s, though, when commercially available computer engines were mostly still relatively weak. Nowadays maybe you can count on what you come up with at home using a chess engine as being virtually golden.


H. G. Muller wrote on Sun, Mar 10 08:07 AM UTC in reply to Kevin Pacey from Sat Mar 9 09:48 PM:

I am not sure what you are trying to demonstrate with this example. Obviously something that has never been tested in any way, but just pulled out of the hat of the one who suggests it, should be considered of questionable value, and should be accompanied by a warning. As Dr. Nunn does for untested opening lines, and as I do for untested piece values. That is an entirely different situation from mistrusting someone who reports the results of an elaborate investigation just because he is the only one so far who has done such an investigation. It is the difference between someone being a murder suspect merely because he has no alibi, and having an eyewitness who testifies under oath that he saw him do it. That seems a pretty big difference. And we are talking here about publication of results that are in principle verifiable, as they were accompanied by a description of the method used to obtain them, which others could repeat. That is like a murder in front of an audience, where so far only one of the spectators has testified. I don't think the police in that case would postpone the arrest until other witnesses were located and interviewed. But they would not arrest all the people who have no alibi.

And piece values are a lot like opening lines. It is trivial to propose them, as an educated guess, but completely non-obvious what the result would be of actually playing that opening line or using those piece values to guide your play. It is important to know whether they are merely proposed as a possibility, or whether evidence of any kind has been collected that they actually work.


Kevin Pacey wrote on Sun, Mar 10 03:26 PM UTC in reply to H. G. Muller from 08:07 AM:

Your second paragraph may be bang on, except it could be a circular argument to say a body of evidence has been found by chess studies, yet it is the methodology of those very studies that might be viewed as unproven.


H. G. Muller wrote on Sun, Mar 10 04:13 PM UTC in reply to Kevin Pacey from 03:26 PM:

Sure, methods can be wrong, and therefore have to be validated as well. This holds more for true computer studies using engines than for selecting positions from a game database and counting those. The claim that piece A does not have a larger value than piece B if good players do not score better with A than with B is not really a method; it is the definition of value. So counting the number of wins is by definition a good method. The only thing that might require validation is whether the person having applied this method is able to count. But there is a point where healthy skepticism becomes paranoia, and this seems far over the edge.

Extracting similar statistics from computer-generated games has much larger potential for being in error. Is the level of play good enough to produce realistic games? How sensitive are the result statistics to misconceptions that the engines might have had? It would be expected of someone publishing results from a new method to have investigated those issues. And the method applied to the orthodox pieces should of course reproduce the classical values.

For the self-play method of deriving empirical piece values I have of course investigated all that before I started to trust any results. I played games with Pawn odds at many different time controls, as well as with some selected imbalances (such as BB-vs-NN), to see if the number of excess wins was the same fraction of the Pawn-odds advantage. (It was.) And whether the results for a B-N imbalance were different when using an engine that thought B>N than when using one that thought N>B. (They weren't.)
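The core of such a measurement is small enough to sketch. The following is only an illustration of the idea, not my actual setup (which used Fairy-Max); it assumes the python-chess library and some UCI engine binary, e.g. "stockfish", available on the PATH:

    # Self-play from a BB-vs-NN start (White's Knights and Black's Bishops removed),
    # counting results; the excess of White wins measures the value of BB over NN.
    import chess
    import chess.engine

    def imbalanced_start():
        board = chess.Board()
        for sq in (chess.B1, chess.G1, chess.C8, chess.F8):
            board.remove_piece_at(sq)
        return board

    def play_one(engine, limit):
        board = imbalanced_start()
        while not board.is_game_over(claim_draw=True):
            board.push(engine.play(board, limit).move)
        return board.result(claim_draw=True)

    totals = {"1-0": 0, "0-1": 0, "1/2-1/2": 0}
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    try:
        for _ in range(100):                      # more games -> smaller error bars
            totals[play_one(engine, chess.engine.Limit(time=0.05))] += 1
    finally:
        engine.quit()
    print(totals)

In practice you would also randomize the openings (or shuffle the back rank) so the games are not near-duplicates, and repeat the whole run at several time controls to check that the result does not depend on the level of play.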

New methods don't become valid because more people apply them; if they all do the same wrong thing they will all confirm each other's faulty results. You validate them by recognizing their potential for error, and then test whether they suffer from this.


Kevin Pacey wrote on Sun, Mar 10 06:49 PM UTC in reply to H. G. Muller from 04:13 PM:

I guess I have to get into the specifics I personally still don't trust about computer studies, again.

First, Kaufman's type of study: saying that B=N based on large-number-of-games statistics (I only vaguely recall, but many of the players in his database may have been sub-grandmaster level - GMs are relative adults compared to 2300 players playing wargames in a sandbox against each other). If you want to establish the absolute truth of whether B=N, solving chess from the setup and then doing some sort of database win/loss count for [near-]'perfect' play would be best, but that is impossible on earth right now (perfect play, if it does not result in a draw, would probably favour White).

Today's best chess engines might be used to generate, say, a 3000+ vs. 3000+ engine-vs-engine database, if enough games could be played over time to matter statistically - that would arguably be second best, but even then there may be some element of doubt about the result being the truth that is hard to assign an exact probability to (maybe even a professional statistician who is also a GM would throw up his hands and say we simply cannot tell). In any case, the time it takes to make such a database makes it impractical for now, yet again.

Coming to the type of study used for fairy chess piece values, I don't know how the margin(s) of error for such a study can be confidently established, for one thing. Next, and more seriously, on my mind is the exact setup and armies used in a given study. For Chess960, I saw somewhere online long ago that someone figured, after their own type of study, that certain setups are roughly equal, while others favour White more than in orthodox chess, say by up to 0.4 pawns' worth over Black (you might find this somewhere on the internet, to check me). Consider also that that's just for armies that are exactly equal in strength, being identical as in chess. You may give both sides White and Black equally, but the setup and armies vary per study, and I'd guess it's hard to always be exhaustively fair to every possible setup/army, given time constraints.

Finally, you wrote earlier that errors tend to cancel each other out with lower-level play (say 2300+ vs. 2300+ engines, as opposed to 2700+ vs. 2700+). It would be very good to know how many games and studies (even roughly) you base that conclusion on, if you still recall. Also, does the cancellation ever significantly favour one side or the other with any given [sort of] study? I think the strength of the engine(s) used just might be the most underestimated factor causing possible undetected error in this type of study (along with sub-GM play within Kaufman's database study, as I alluded to above).


H. G. Muller wrote on Sun, Mar 10 09:34 PM UTC in reply to Kevin Pacey from 06:49 PM:

Well, I looked up the exact numbers, and indeed his threshold for including games in the Kaufman study was FM level (2300) for both players. That left him with 300,000 games out of an original 925,000. So what? Are FIDE masters in your eyes such poor players that the games they produce don't even vaguely resemble a serious chess game? And do you understand the consequences of such a claim being true? If B=N is only true for FIDE masters, and not for 2700+ super-GMs, then there is no such thing as THE piece values; apparently they would depend on the level of play. There would not be any 'absolute truth'. So which value in that case would you think is more relevant for the readers of this website? The values that correctly predict who is closer to winning in games of players around 1900 Elo, or those for super-GMs?

Your method to cast doubt on the Kaufman study is tantamount to denying that pieces have a well-defined value in the first place. You don't seem to have much support in that area, though. Virtually all chess courses for beginners teach values that are very similar to the Kaufman values. I have never seen a book that says "Just start assuming all pieces are equally valuable for now, and when you have learned to win games that way, you will be ready to value the Queen a bit more". If players of around 1000 Elo would not be taught the 1:3:3:5:9 rule, they would probably never be able to acquire a higher rating.

The nice thing about computer studies is that you can actually test such issues. You can make the look-ahead so shallow and so full of oversights that it plays at the level of a beginner, and still measure how much better or worse it does with a Knight instead of a Bishop. And how much the rating would suffer from using a certain set of erroneous piece values to guide its tactical decisions. And whether that is more or less than it would suffer when you improve the reliability of the search to make it a 1500 Elo player.

It would also be no problem at all to generate 300,000 games between 3000+ engines. It doesn't require a slow time control to play at that level, as engines lose only a little Elo when you make them move faster (about 30 Elo per halving of the time, so giving them 4 sec instead of an hour per game only takes some 300 points off their rating). So you can generate thousands of games per hour, and then just let the computer run for a week. This is how engines like Leela Chess Zero train themselves. A recent Stockfish patch was accepted after 161,000 self-play games showed that it led to an improvement of 1 Elo...
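The arithmetic behind that time-control claim is easy to check (my own back-of-the-envelope calculation, using the ~30 Elo per halving figure quoted above):

    import math

    elo_per_halving = 30
    halvings = math.log2(3600 / 4)      # from an hour (3600 s) down to 4 s per game
    print(f"{halvings:.1f} halvings -> roughly {elo_per_halving * halvings:.0f} Elo weaker")
    # about 9.8 halvings -> roughly 295 Elo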

And in contrast to what you believe, solving chess would not tell you anything about piece values. Solved chess positions (like we have in end-game tables when there are only a few pieces on the board) are evaluated by their distance to mate, irrespective of what material there is on the board. Piece values are a concept for estimating win probability in games between fallible players. With perfect play there is no probability, but a 100% certainty that the game-theoretical result will be reached. In perfect play there is no difference between drawn positions that are 'nearly won' or 'nearly lost'. Both are draws, and a perfect player cannot distinguish them without assuming there is some chance that the opponent will make an error. Then it becomes important whether only a tiny error would bring him into a lost position, or whether it needs a gross blunder or twenty small errors. And again, in perfect play there is no such thing as a small or a large error; all errors are equal, as they all cost 0.5 point, or they would not be errors at all.

So you don't seem to realize the importance of errors. The whole Elo model is built on the assumption that players make (small) errors that weaken their position compared to the optimal move with appreciable probability, and only seldom play the very best move, so that the advantage performs a random walk along the score scale. Statistical theory teaches us that the sum total of all these 'micro-errors' during the game has a Gaussian probability distribution by the time you reach the end, and that the difference in the average error rate per move implied by the ratings of the players determines how much luck the weaker player needs to overcome the systematic drift in favor of the stronger player, and consequently how often he will still manage to draw or win. Nearly equivalent pieces can only be assigned different values because it requires a smaller error to blunder the draw away for the side with the weaker piece than it does for the side with the stronger piece. So when the players tend to make equally large errors on average (i.e. are equally strong), it becomes less likely for the player with the strong piece to lose than for the player with the weak piece. Without the players making any errors, the game would always stay a draw, and there would be no way to determine which piece was stronger.
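If you want to see this random-walk picture in action, a toy simulation makes the point (entirely my own illustration, with made-up per-move error numbers, not a model of any real engine test):

    import random

    def play_game(moves=60, bias=0.02, noise=0.25, draw_band=1.0):
        """Accumulate per-move errors; the sign of the sum decides the game.
        bias  - average per-move drift toward the (slightly) stronger side
        noise - standard deviation of a single move's error
        """
        advantage = 0.0
        for _ in range(moves):
            advantage += bias + random.gauss(0.0, noise)
        if advantage > draw_band:
            return 1.0      # win for the stronger side
        if advantage < -draw_band:
            return 0.0      # loss despite the edge
        return 0.5          # draw

    n = 10000
    score = sum(play_game() for _ in range(n)) / n
    print(f"stronger side scores about {100 * score:.1f}% over {n} games")

With bias set to 0 the score hovers around 50%; a small systematic per-move edge (a slightly stronger player, or a slightly stronger piece) shifts it well above that, and with no noise at all every game would end in the same result.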


Kevin Pacey wrote on Sun, Mar 10 10:02 PM UTC in reply to H. G. Muller from 09:34 PM:

If you want a definition of near-perfect play that still allows for the possibility of a (small) error or two, a very long, well-played game that is a win for one side comes to mind.

You could decide to solve chess, and it could be useful for determining piece values - just tag the number of moves a given 'game' of near-perfect chess takes to reach checkmate, and also keep track of whether a B vs. N imbalance is involved. Optionally, you could have an engine assess (albeit not perfectly accurately) who has the advantage (and by how much) at every move. All of this is of course not practically possible, in today's world at least.


H. G. Muller wrote on Sun, Mar 10 10:23 PM UTC in reply to Kevin Pacey from 10:02 PM:

The problem is that an N vs B imbalance is so small that you would be in the draw zone, and if there aren't sufficiently many errors, or not a sufficiently large one, there wouldn't be any checkmate. A study like Kaufman's, where you analyze statistics of games starting from the FIDE start position, is no longer possible if the level of play gets too high. All 300,000 games would be draws, and most imbalances would not occur in any of the games, because they would be winning advantages, and the perfect players would never allow them to develop. Current top engines already suffer from this problem; the developers cannot determine what is an improvement, because the weakest version is already so good that it doesn't make sufficiently many or sufficiently large errors to ever lose, if you play from the start position with balanced openings. You need a special book that only plays very poor opening lines, which bring one of the players to the brink of losing. Then it becomes interesting to see which version has the better chances to hold the draw or not.

High level of play is really detrimental for this kind of study, which is all about detecting how much error you need to swing the result.

And then there is still the problem that, if it did make a difference, it is the high-level play that would be utterly irrelevant to the readers here. No one here is 2700+ Elo. The only thing of interest here is whether the reader would do better with a Bishop or with a Knight.


Kevin Pacey wrote on Sun, Mar 10 10:29 PM UTC in reply to H. G. Muller from 10:23 PM:

My impression was that periodically there is a leap in the strength of one engine someone is working on (e.g. AlphaZero), and then it outperforms other engines, say in 100-game matches, at least for a while, until the next such cycle begins.

edit: Chess.com is quoted by Google as saying AlphaZero lost 8 games to a version of Stockfish in a recent match, out of 1000 games, causing the loss of the match. Not many decisive results this time around, but wins are still possible at such a lofty level 'at the top' as we have right now, given enough games are played.

edit2: I haven't kept track of the progress quantum computing has been making, but that could lead to stronger engines and perhaps even open the door to solving chess.


H. G. Muller wrote on Mon, Mar 11 06:11 AM UTC in reply to Kevin Pacey from Sun Mar 10 10:29 PM:

That is a completely wrong impression. After a short period following their creation, during which all commonly used features are implemented, the bugs have been ironed out, and the evaluation parameters have been tuned, further progress requires originality, and becomes very slow, typically in very small steps of 1-5 Elo. AlphaZero was a unique revolution, using a completely different algorithm for finding moves, which up to that point had never been used and was actually designed for playing Go. Using neural nets for evaluation in conventional engines (NNUE) was a somewhat smaller revolution, imported from Shogi, which typically caused an 80 Elo jump in strength for the engines that started to use it.

There are currently no ideas on how you could make quantum computers play chess. Quantum computers are not generally faster than the computers we have now. They are completely different beasts, able to do some parallelizable tasks very fast by doing them simultaneously. Using parallelism in chess has always been very problematic. I haven't exactly monitored progress in quantum computing, but I would be surprised if they could already multiply two large numbers.

By now it should be clear that the idea of using 2700+ games is a complete bust: 

  1. It measures the wrong thing. We don't want piece values for super-GMs, but for use in our own games.
  2. It does it in a very inefficient way, because of the high draw rate, and draws tell you nothing.

So even if you believe (or would have proved) that piece values are independent of player strength, it would be very stupid to do the measurement at 2700+ level, taking 40 times as many games, each requiring 1000 times longer thinking, than if you had done it at the level you are aiming for. If you are smart you do exactly the opposite, measuring at the lowest level (= highest speed) you can afford without altering the results.

Oh, and to answer an earlier question I overlooked: I typically test for Elo-dependence of the results by playing some 800 games at each time control, varying the latter by a factor of 10. 800 games gives a statistical error in the result equivalent to some 10 Elo.
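For reference, converting a score margin to Elo is a one-liner with the standard logistic rating curve. This is just my own arithmetic as a sanity check of the ~10 Elo figure; the ~0.4/sqrt(N) score deviation used here assumes a typical draw share:

    import math

    def elo_from_score(score):
        """Invert the logistic expected-score curve E = 1 / (1 + 10**(-D/400))."""
        return -400.0 * math.log10(1.0 / score - 1.0)

    sigma = 0.40 / math.sqrt(800)                 # score deviation over 800 games
    print(f"score error   : {100 * sigma:.2f}%")  # ~1.4%
    print(f"Elo equivalent: {elo_from_score(0.5 + sigma):.1f}")   # ~10 Elo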


Kevin Pacey wrote on Mon, Mar 11 07:11 PM UTC in reply to H. G. Muller from 06:11 AM:

I'll agree that using 2700+ level play for studies is impractical at this time. Earlier I almost told you to re-check something you posted, which said that my implying 2700+ level play is different from 2300+ level play meant I thought there were no absolute piece values - yet you're going right ahead yourself and saying 2700+ play is different (besides impractical to use for studies). I thought you'd just had some sort of automatic reaction to try to say everything I write is wrong, and that you wrote inconsistently in places without knowing it. I do not know if you were still trying to be fair. In any case, I get your overall drift, certainly as of this last post of yours.

One person who earlier wrote somewhere that 'the person with the hand that holds the piece' affects its value was Betza, if you wish to argue with that, too. I personally believe the true piece values (for the average case) should be absolute (though I think we might never be able to know them for sure). I still had a couple of other things about studies that I thought were suspect (margins of error, initial setup/armies chosen, as I wrote a bit earlier) that you didn't address, but I now recall we discussed those long ago here on CVP - it's just that I was never fully convinced.


H. G. Muller wrote on Mon, Mar 11 09:59 PM UTC in reply to Kevin Pacey from 07:11 PM:

yet you're going right ahead yourself and saying 2700+ play is different

No, I said that if it was different, the 2700+ result would not be of interest, while if they are the same it would be stupid to measure it at 2700+ when it is orders of magnitude easier around 2000. Whether less accurate play would give different results has to be tested. Below some level the games will no longer have any reality value; e.g. you could not expect correct values from a random mover.

So what you do is investigate how results for a few test cases depend on Time Control, starting at a TC where the engine plays at the level you are aiming for, and then reducing the time (and thus the level of play) to see where that starts to matter. With Fairy-Max as engine there turned out to be no change in results until the TC dropped below 40 moves/min. Examining the games also showed the cause of that: many games that could be easily won ended in draws because it was no longer searching deep enough to see that its passers could promote. So I conducted the tests at 40 moves/2min, where the play did not appear to suffer from any unnatural behavior.

You make it sound like it is my fault that you make so many false statements that need correcting...

Betza was actually right: the hand that wields the piece can have an effect on its empirical value. This is why I preferred to do the tests with Fairy-Max, which is basically a knowledge-less engine that treats all pieces on an equal basis. If you used an engine that has advanced knowledge of, say, how to best position some pieces with respect to the Pawn chain but not others, it would become an unfair comparison. And you can definitely make a piece worth less by encouraging very bad handling. E.g. if I gave a large positional bonus for having Knights in the corners, Knights would become almost useless at low search depth: it would never use them. If you tell it a Queen is worth less than a Pawn, the side that starts with a Queen instead of a Rook would lose badly, as it would quickly trade Q for P and be R vs P behind.

The point is that the detrimental behavior that is encouraged here can never be stopped by the opponent. Small misconceptions tend to cancel out. E.g. if you had told the engine that a Bishop pair is worth less than a pair of Knights, the player with the Knights would avoid trading the Knights for Bishops, which is not much more difficult than avoiding the reverse trades, as the values are close. So it won't affect how often the imbalance gets traded away, and while it lasts, the Bishops will do more damage than the Knights, because the Bishop pair in truth is stronger. But there is no way you can prevent the opponent from sacrificing his Queen for a Pawn, even if you have the misconception that the Pawn is worth more.

Note that large search depth tends to correct strategic misconceptions, because it brings the tactical consequences of strategic mistakes within the horizon. Wrecking your Pawn structure will eventually lead to forced loss of a Pawn, so the engine would avoid a wrecked Pawn structure even if it has no clue how to evaluate Pawn structures. Just because it doesn't want to lose a Pawn.

Statistical margins of error are high-school stuff. For N independent games the typical deviation of the total score from its expected value will be the square root of N times the typical deviation of a single game result from the average (which is a bit below 0.5, because not all games end in a 0 or 1 score). So the typical deviation of the score percentage in a test of N games is about 40%/sqrt(N). Having to calculate a square root isn't really advanced mathematics.
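As a quick numerical check of that rule of thumb (my own sketch, with made-up but realistic win/draw/loss shares):

    import math

    def score_sigma(n_games, p_win=0.35, p_draw=0.30, p_loss=0.35):
        """Standard error of the score fraction over n_games independent games."""
        mean = p_win + 0.5 * p_draw                       # expected single-game score
        var = (p_win * (1.0 - mean) ** 2
               + p_draw * (0.5 - mean) ** 2
               + p_loss * (0.0 - mean) ** 2)              # single-game variance
        return math.sqrt(var / n_games)                   # shrinks as 1/sqrt(N)

    for n in (100, 800, 10000):
        print(n, f"-> {100 * score_sigma(n):.2f}% standard error")

With a realistic draw share the single-game deviation comes out a bit over 0.4, which is where the ~40%/sqrt(N) figure comes from; for N = 800 that is about 1.5%.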

