
Comments/Ratings for a Single Item

Game Courier Ratings. Calculates ratings for players from Game Courier logs. Experimental.
🕸📝Fergus Duniho wrote on Tue, Dec 12, 2017 05:04 PM UTC:

Aurelian, I have moved this discussion to the relevant page.

"By now I already have quite a few games played, and losing to a very high rated opponent or winning against a very low rated opponent does not mean much for the algorithm, in terms of correcting my rating. I think this is how it is supposed to work."

Yes, it is supposed to work that way.

"So, are you using a system of equations where the unknowns are the ratings, and the coefficients are based on the results :)?!..."

I'm using an algorithm, which is a series of instructions, not a system of equations, and the ratings are never treated as unknowns that have to be solved for. Everyone starts out with a rating of 1500, and the algorithm fine-tunes each player's rating as it processes the outcomes of the games. Instead of processing every game chronologically, as Elo does, it processes all games between the same two players at once.
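
The site's actual script isn't shown here, but a minimal Python sketch of that per-pair batching might look like the following. The function names, the tuple format, and the adjust_pair callback are hypothetical; only the 1500 starting value and the process-all-games-between-a-pair-at-once idea come from the comment above.

```python
# A sketch only, not the site's code.  Everyone starts at 1500, and all games
# between the same two players are handed to the adjustment rule as one batch.
from collections import defaultdict

def rate_players(finished_games, adjust_pair):
    """finished_games: iterable of (player_a, player_b, score_for_a) tuples,
    with score_for_a in {1, 0.5, 0}.  adjust_pair(r_a, r_b, scores_for_a)
    returns the pair's new ratings; its exact rule is not reproduced here."""
    ratings = defaultdict(lambda: 1500.0)        # everyone starts out at 1500
    by_pair = defaultdict(list)                  # head-to-head results per pair

    for a, b, score_a in finished_games:
        if a > b:                                # normalize the pair's ordering
            a, b, score_a = b, a, 1 - score_a
        by_pair[(a, b)].append(score_a)

    for (a, b), scores in by_pair.items():       # one batch per pair of players
        ratings[a], ratings[b] = adjust_pair(ratings[a], ratings[b], scores)

    return dict(ratings)
```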


Aurelian Florea wrote on Tue, Dec 12, 2017 03:29 AM UTC:

@Fergus,

I think I know what is going on with my ratings. By now I already have quite a few games played, and losing to a very high rated opponent or winning against a very low rated opponent does not mean much for the algorithm, in terms of correcting my rating. I think this is how it is supposed to work.

So, are you using a system of equations where the unknowns are the ratings, and the coefficients are based on the results :)?!...


🕸📝Fergus Duniho wrote on Mon, Dec 11, 2017 04:56 PM UTC:

As far as I'm aware, they do.


Aurelian Florea wrote on Mon, Dec 11, 2017 03:48 PM UTC:

I did read the rules, but I have not understood them. It seemed to me that they do not look like Elo ratings, though. Anyway, Fergus, are you saying that they work fine?


🕸📝Fergus Duniho wrote on Mon, Dec 11, 2017 02:59 PM UTC:

Ratings are calculated holistically, and they are designed to become more stable the more games you play. You can read the details on the ratings page for more on how they differ from Elo ratings.


Aurelian Florea wrote on Mon, Dec 11, 2017 10:58 AM UTC:

The rating system could be off. I'm not sure whether ratings should change instantly, meaning that once any game is finished, the ratings are recalculated for the two players in question :)! Anyway, yesterday a few games of mine (I think 3) finished, and the ratings have not changed. Mine should probably have ended up a bit below 1530.


🕸📝Fergus Duniho wrote on Fri, Jun 3, 2016 10:06 PM UTC:

I finally figured out the problem and got the logs for both of Kevin's games into the FinishedGames table. The problem was that both logs had the same name, and the table was set up to require each log to have a unique name. So I ALTERed the table to remove all keys, then made the primary key the combination of Log + Game. INSERT and REPLACE had each recorded a different one of the two logs, because INSERT would keep the first log it found with a given name, while REPLACE would overwrite any previous entry for the same log name with the last one. This change increased the size of the table from 4773 rows to 4883 rows.
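
For the curious, the re-keying described above amounts to something like the following MySQL, issued here through Python's mysql-connector purely for illustration. The table name FinishedGames and the columns Log and Game come from this comment; the connection details, the assumption that the old key was the primary key, and the sample values are hypothetical.

```python
# Illustrative sketch only; not the actual maintenance script.
import mysql.connector   # assumes the mysql-connector-python package

conn = mysql.connector.connect(user="cv", password="secret", database="gamecourier")
cur = conn.cursor()

# Re-key the table so that two finished games sharing a log name no longer
# collide: drop the old key and key each row by the (Log, Game) combination.
cur.execute("ALTER TABLE FinishedGames DROP PRIMARY KEY")
cur.execute("ALTER TABLE FinishedGames ADD PRIMARY KEY (Log, Game)")

# REPLACE overwrites any existing row with the same primary key, whereas a
# plain INSERT would stop at a duplicate-key error instead.
cur.execute("REPLACE INTO FinishedGames (Log, Game) VALUES (%s, %s)",
            ("some-log-id", "Sac Chess"))
conn.commit()
```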


🕸📝Fergus Duniho wrote on Fri, Jun 3, 2016 09:12 PM UTC:

I changed INSERT back to REPLACE and ran the script for creating the FinishedGames table again. This time, the log for the game Kevin lost got in, and the log for the game he won vanished even though I did not Truncate the table prior to doing this. Also, the total number of rows in the table did not change.


🕸📝Fergus Duniho wrote on Fri, Jun 3, 2016 08:58 PM UTC:

Things are getting weird. When I looked at Kevin Pacey's rating, I noticed it was still based on one game, not two. For some reason, the game he won was not getting added to the database. At this time, I was using the REPLACE command to populate the database, and it was failing silently. So, I Truncated the table, changed REPLACE to INSERT, and recreated the table. This time, the game he won got in, but the game he lost did not. Maybe this game didn't make it in originally because of some mysterious problem with how INSERT works. It is frustrating that the MySQL commands are not performing reliably and are failing silently; if it weren't for noticing these specific logs, I would be unaware of the problem.


🕸📝Fergus Duniho wrote on Fri, Jun 3, 2016 06:03 PM UTC:

Kevin,

I just recreated the FinishedGames table, and your Sac Chess game against Carlos is now listed there. I'm not sure why it didn't get in before, but I have been fixing up the code for entering finished games into this table, and hopefully something like this won't happen again. But if it does, let me know.


🕸📝Fergus Duniho wrote on Fri, Apr 15, 2016 02:21 AM UTC:
Your game is marked as rated, but for some reason it didn't make it into the FinishedGames database table. I will have to look into whether this problem is isolated or more systemic. Just as a quick check, the last two games I finished are in the database. I will give this more attention soon.

Kevin Pacey wrote on Fri, Apr 15, 2016 02:08 AM UTC:
Hi Fergus

I lost a game of Sac Chess to Carlos quite some time ago. I thought that it was to be rated, but as far as I can tell my rating is based on only one game (a win at Symmetric Glinski's Hexagonal Chess vs. Carlos). I don't know whether the ratings have been updated to take my Sac Chess loss into account, but I thought I'd let you know, even though I don't plan to play on Game Courier again, at least not anytime soon.

🕸📝Fergus Duniho wrote on Mon, Apr 13, 2015 11:43 PM UTC:
I have switched the ratings system to the new method, because it is fairer. Details on the new system can be found on the page. I have included a link to the old ratings system, which will let you compare them.

🕸📝Fergus Duniho wrote on Mon, Apr 13, 2015 01:48 AM UTC:
I've been more closely comparing different approaches to the ratings. One is the new approach I described at length earlier, and the other is tweaking the stability value. In tweaking the stability value, I could increase the accuracy measurement by raising the number of past games required for a high stability score. But this came at a cost. I noticed that some players who had played only a few games quickly got high ratings. Perhaps they had played a few games against high-rated players and won them all. Still, this seemed unfair. Maybe the rating really was reflective of their playing abilities, but it's hard to be sure, and their high ratings for only a few games seemed unearned. The new rating method, in contrast, put a stop to this. It made high ratings something to be earned through playing many games. Its highest-rated players were all people who had played several games. Its highest rating for someone whose game count was in the single digits was 1621, for a player who had won 8.5 out of 9 games. In contrast, the tweaked system gave 1824 to someone who won 4 out of 4 games, placing him 5th in the overall rankings. The current system, which has been in place for years, gave 1696 and 1679 to the people who won 8.5/9 and 4/4, respectively.

In the ratings for all games, the new system gets a lower accuracy score by less than 2%. That's not much of a difference. In Chess, it gets the higher accuracy score. In some other games, it gets a lower score by a few percentage points. Generally, it's close enough but has the advantage of reducing unearned high ratings, which gives it a greater appearance of fairness. So I may switch over to it soon.

🕸📝Fergus Duniho wrote on Sun, Apr 12, 2015 02:43 AM UTC:
So far, the current method is still getting higher accuracy scores than the new method I described. Maybe gravity does matter. This is the idea that if one player's rating is based on several games, and the other player's rating isn't, the rating of the player with fewer games should change even more than it would if their past number of games were equal. This allows the system to get a better fix on a player's ability by adjusting his rating more when he plays against opponents with better established ratings.

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 11:50 PM UTC:
I'm rethinking this even more. I was reading about Elo, and I realized its main feature is a self-correcting mechanism, sort of like evolution. Having written about evolution fairly extensively in recent years, I'm aware of how it's a simple self-correcting process that gets results. So I want a ratings system that is more modeled after evolution, using self-correction to get closer to accurate results.

So let's start with a comparison between expectations and results. The ratings for two players serve as a basis for predicting the percentage of games each should win against the other. Calculate this and compare it to the actual results. The GCR currently does it backward from this. Given the results, it estimates new ratings, then figures out how much to adjust present ratings to the new ratings. The problem with this is that different pairs of ratings can predict the same results, whereas any pair of ratings predicts only one outcome. It is better to go with known factors predicting a single outcome. Going the other way requires some arbitrary decision making.

If there is no difference between predicted outcome and actual outcome, adjustments should be minimal, perhaps even zero. If there is a difference, ratings should be adjusted more. The maximum difference is if one player is predicted to win every time, and the other player wins every time. Let's call this 100% difference. This would be the case if one rating was 400 points or more higher than the other. The maximum change to their scores should be 400 points, raising the lower by 400 points and decreasing the higher by 400. So the actual change may be expressed as a limit that approaches 400. Furthermore, the change should never be greater than the discrepancy between predictions and outcomes. The discrepancy can always be measured as a percentage between 0% and 100%. The maximum change should be that percentage of 400.

But it wouldn't be fair to give the maximum change for only a single game. The actual change should be a function of the games played together. This function may be described as a limit that approaches the maximum change as they play more games together. This is a measure of the reliability of the results. At this point, the decision concerning where to set different levels of reliability seems arbitrary. Let's say that at 10 games, it is 50% reliable, and at 100 games near 100% reliable. So, Games/(Games + 10) works for this. At 10, 10/20 is .5, and at 100, 100/110 is about .91. This would give 1 game a reliability of about .09, which is almost 10%. So, for one game with 100% difference between predictions and results, the change would be about 36.36. This is a bit over half of what the change currently is for two players with ratings of 1500 when one wins and the other loses. Currently, the winner's rating rises to 1564, while the loser's goes down to 1435. With both players at 1500, the predicted outcome would be that they win equally many games or draw. Any outcome where someone won all the games would differ from the predicted outcome by 50%, making the maximum change only 200, and for a single game, that change would be about 18.18. This seems like a more reasonable change for a single game between 1500-rated players.
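
As a sanity check on that arithmetic, here is the calculation in Python. The linear mapping from rating difference to predicted share (0.5 at equal ratings, 1.0 at a 400-point edge) is my reading of the comment rather than a quoted formula; the 400-point ceiling and the reliability factor are as described above.

```python
def predicted_share(r_a, r_b):
    """Player A's predicted share of the points: 0.5 at equal ratings, 1.0
    once A's rating is 400 or more points higher (linear in between, which
    is an assumption of this sketch)."""
    return min(1.0, max(0.0, 0.5 + (r_a - r_b) / 800.0))

def proposed_change(r_a, r_b, actual_share_a, games_together):
    """Maximum swing of 400 points, scaled by the prediction/result
    discrepancy and by the reliability factor games/(games + 10)."""
    discrepancy = abs(actual_share_a - predicted_share(r_a, r_b))   # 0.0 to 1.0
    reliability = games_together / (games_together + 10.0)
    return 400.0 * discrepancy * reliability

# The two worked examples from this comment:
print(proposed_change(1100, 1500, 1.0, 1))   # ~36.36: full upset over one game
print(proposed_change(1500, 1500, 1.0, 1))   # ~18.18: equal ratings, one win
```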

Now the question comes in whether anything like stability or gravity should factor into how the scores change. Apparently the USCF uses something called a K-factor, which is a measure of how many games one's current rating is based on. This corresponds to what I have called stability. Let's start with maximums. What should be the maximum amount that stability should minimize the change to a score? Again, this seems like an arbitrary call. Perhaps 50% would be a good maximum. And at what point should a player's rating receive that much protection? Or, since this may be a limit, at what point should change to a player's rating be minimized by half as much, which is 25%? Let's say 200 games. So, Games/(Games + 600) works for this. At 200, it gives 200/800. At 400, it gives 400/1000.

And what about gravity? Since gravity is a function of stability, maybe it adds nothing significant. If one player has high stability and the other doesn't, the one whose rating is less stable will already change more. So, gravity can probably be left out of the calculation.
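
Continuing the sketch above, the stability idea might damp that change roughly as follows. The Games/(Games + 600) curve and the floated 50% ceiling come from the previous paragraph, but how they combine with the per-pair change, and the cap itself, are guesses about intent.

```python
def stability_damping(past_games):
    """Fraction by which a player's own change is reduced: Games/(Games + 600),
    held to the 50% maximum floated above (the explicit cap is my reading,
    since the raw ratio would eventually exceed it)."""
    return min(0.5, past_games / (past_games + 600.0))

def damped_change(raw_change, past_games):
    # Gravity is omitted, per the conclusion that it adds little once the
    # player with the less stable rating already moves more.
    return raw_change * (1.0 - stability_damping(past_games))

print(damped_change(36.36, 200))   # 200 past games -> 25% damping -> ~27.27
```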

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 09:20 PM UTC:
I'm thinking of tweaking the way the GCR is calculated. As it is right now, the value that is going to grow the quickest is a player's past games. This affects the stability value, which is already designed to approach the limit of one more quickly than reliability ever will. Even if games with the current opponent and one's past games remained equal in number, stability would grow more quickly than reliability. But after the first opponent, one's past games will usually outnumber one's games with the current opponent. Besides this, gravity is based on stability scores, and as stability scores for both opponents quickly near the limit of one, gravity becomes fairly insignificant. Given that past games will usually outnumber games played against the current opponent, it makes sense for reliability to grow more quickly than stability.

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 10:40 AM UTC:
It is now possible to use wildcards within comma-separated lists of games. Also, Unix-style wildcards are now converted to SQL-style wildcards, so you can use either.

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 03:26 AM UTC:
It's now possible to list multiple games in the Game Filter field. Just comma-separate them and don't use wildcards.

Cameron Miles wrote on Sat, Apr 11, 2015 01:37 AM UTC:
It looks like everything's been fixed! Well done, Fergus, and thank you!

I see that the Finished Games database also allowed for the creation of a page listing Game Courier's top 50 most-played games, which is a very nice addition.

Now I guess I have to see if I can catch Hexa Sakk...  ; )

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 01:36 AM UTC:
I have also modified groups to work with MySQL, and one new feature that helps with groups is that it shows the SQL of the search it does. This lets you see which Chess variants are in a group. Most of the groups are based on the tiers I made in the Recognized variants. These may not be too helpful, since they are not groups of related games. The Capablanca group, which I just expanded, seems to be the most useful group here, since it groups together similar games. What I would like to do is add more groups of related games. I'm open to suggestions.

🕸📝Fergus Duniho wrote on Sat, Apr 11, 2015 01:02 AM UTC:
This script now reads the database instead of individual logs, and some bugs have been fixed. For one thing, it shouldn't be missing games anymore, as I was complaining about in a previous comment. Also, I found some functions for dealing with mixed character encodings in the database. Some years ago, I tried to start converting everything to UTF-8, but I never finished that. This led to multiple character encodings in the database. By using one function to detect the character encoding and another to convert whatever was detected to UTF-8, I'm now getting everyone's name to show up correctly.
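
The comment doesn't name the functions involved (the site's code is PHP), but as a rough illustration of the detect-then-convert step, a Python equivalent might look like this; the choice of CP-1252 as the legacy fallback encoding is an assumption.

```python
def to_utf8(raw: bytes) -> str:
    """Try UTF-8 first; if the bytes aren't valid UTF-8, assume a legacy
    single-byte encoding and convert from that instead."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")   # assumed legacy encoding

print(to_utf8("Glinski".encode("utf-8")))
print(to_utf8(b"Duniho\xe9"))   # a stray CP-1252 byte still converts cleanly
```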

One of the practical changes is the switch from Unix wildcards to SQL wildcards. Basically, use % instead of *, and use _ instead of ?.
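
A trivial sketch of that substitution (not the site's code, and it ignores the corner case of a name containing a literal % or _):

```python
def unix_to_sql_wildcards(pattern: str) -> str:
    """Convert shell-style wildcards to SQL LIKE wildcards: * -> % and ? -> _."""
    return pattern.replace("*", "%").replace("?", "_")

print(unix_to_sql_wildcards("Grand*"))      # Grand%
print(unix_to_sql_wildcards("Chess ??"))    # Chess __
```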

One more thing. I moved this script from play/pbmlogs/ to play/pbm/. It was in the former only because it had to read the logs. Now that it doesn't, it seems more logical to put it in play/pbm/. The old script is still at its old location if you want to compare.

🕸📝Fergus Duniho wrote on Tue, Apr 7, 2015 02:11 PM UTC:
The good news is that the reason this didn't work sometimes was not too many files but a monkey wrench thrown into one of the log files. With that file renamed, it's not being read, and this page generates ratings even when set to give ratings for all public games. The bad news is that it seems to be undercounting the games people have played. I checked out a player it said had played only one game, and the logs page listed 23 games he had finished playing. I was also skeptical that I had played only 62 games. I counted and found that I had played several more than that. So that has to be fixed. And since I have made the new FinishedGames database table, I will eventually rewrite this to use that instead of reading the files directly.

🕸📝Fergus Duniho wrote on Fri, Jul 28, 2006 04:58 PM UTC:
So far, the ratings for all public games fall within a 500 point range. Except for the top rating, all fall within a 400 point range. Most fall within a 200 point range. Ratings of people who have played only two games fall within a 300 point range. Ratings of people who have played only one game fall within a 200 point range. So there does not appear to be any deflation or inflation of ratings. There is a range of variability among players who have played few games, but you cannot get your rating very high or low without playing many games.

Stephen Stockman wrote on Fri, Jul 28, 2006 07:20 AM UTC:Excellent ★★★★★
I have a suggestion. Is it possible to have a maximum number of points that a player can gain or lose per game? I am thinking of a maximum change per game of around 10 or 20 points, because there are many players listed here who have only played one or two games but have highly inflated or deflated ratings.

Hats off to Jeremy Good, who apparently has completed more games here than anyone else. It looks like 250 completed games, and counting!
