Hey everyone.
Right, I hope this turns out to be a good idea and not a chance for people to just derail the thread with negativity, but I thought I would try to explain in some detail where the weekly Japanese sales data on my site comes from, since people still seem to view it with some negativity.
Firstly, a historical overview of the Japanese sales collection "industry" (if anyone wants to add or correct anything then please do):
- There are three major trackers in Japan: Media Create, Famitsu and Dengeki.
- Up until a few years ago (~2002/3), Media Create and Dengeki published a weekly top 50, and Famitsu a weekly top 30. Now we only get a top 10 from Media Create and a top 30 from Famitsu, with only highlights of numbers released from Dengeki.
- Between 2001 and 2003, Dengeki Top 500s for the year were a regular occurrence.
- There have been occasional leaked full Media Create reports in the past, and some members of these boards get more detailed data than the top 10, which can be found on the internet if you know where to look.
- I stumbled upon a website a few years back that had both Dengeki yearly data (from 1992 up to about 2000) and also totals for 92-2001, I think. (I will put these spreadsheets on my site and link to them for people to download if you want, or I may be able to find the original website with a bit of googling.)
edit: here's the old Dengeki data which I have used a lot for older games:
http://www.everythingandnothing.org.uk/vg/dengeki.xls
- There are a couple of excellent websites (in Japanese I'm afraid) that have Famitsu weekly top 30s (my main source of historical data, since I have no weekly info from Dengeki / Media Create before about 2000) going back to Sept 1995, with odd snippets of data from before then.
http://homepage2.nifty.com/~NOV/top301994.html (years across the top)
http://homepage2.nifty.com/~NOV/t2002tsdx.html top 500 from 2002 for example.
http://www.sfoxstudio.com/sell/
http://www.kyoto.zaq.ne.jp/dkbkq103/yso/
- If you look at the first link and go back to one of the charts from 1992 then you will see that they only have "raw" data from back then. Eg: http://homepage2.nifty.com/~NOV/d173.html. They have the "raw data" column and in this case have left the extrapolated column blank, but suggest using a scale factor of 5 to extrapolate the data up by. This will be largely the same as what the trackers do today: they may survey, for example, 50% of retailers and then just double up the raw data (see later for more on this).
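As a rough sketch of that straight linear extrapolation (the function name and the raw count of 12,000 are made up for illustration, only the scale factors of 5 and 2 come from the points above):

```python
def extrapolate(raw_units, scale_factor):
    """Scale a raw sample count up to an estimated total for all retailers."""
    if scale_factor < 1:
        raise ValueError("scale_factor should be at least 1")
    return raw_units * scale_factor

# Hypothetical raw count of 12,000 units from the surveyed shops.
raw = 12_000
print(extrapolate(raw, 5))  # roughly 1/5 of retailers surveyed -> 60000
print(extrapolate(raw, 2))  # roughly 50% of retailers surveyed -> 24000
```

The weakness is obvious from the code: one factor is applied to everything, whatever the game or platform.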
Analysing this data in more depth
Right, this is the bit where I tend to lose some people. In my personal experience, the data trackers have become much, much more accurate in the last few years than they ever were in the past. Go back to the days of the early N64 in those links above and the N64 games seem to be criminally under-tracked (as with SNES and early PS to an extent). Famitsu seemed to get its ass in gear in about 98-99 and has just continued to improve since then.
What I am talking about with accuracy is comparing the totals that for example Famitsu provide with the official numbers we get from publishers. If you follow through the old raw Famitsu data, we have games like Mario Kart 64 on about 800k when it drops off the charts, compared to 2.24m Nintendo claim to have sold. About 700k Super Mario 64s compared to 1.91m. Similarly low numbers for Donkey Kong Countries and so on on the SNES. Fast forward to 99 and we have data like 1650k reported for Smash Bros on the N64 vs 1.97m from Nintendo, so the tracking was definitely getting better.
http://ha7.seikyou.ne.jp/home/Ralva/rank.n64.htm
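To put rough numbers on that comparison, here is a small sketch using only the figures quoted above (the function and variable names are just for illustration):

```python
# (tracker total when the game left the charts, publisher-claimed total), in units
figures = {
    "Mario Kart 64": (800_000, 2_240_000),
    "Super Mario 64": (700_000, 1_910_000),
    "Smash Bros (N64)": (1_650_000, 1_970_000),
}

def tracked_share(tracked, official):
    """Fraction of the publisher's claimed total that the tracker recorded."""
    return tracked / official

for title, (tracked, official) in figures.items():
    print(f"{title}: {tracked_share(tracked, official):.0%} of the official figure")
```

The two 1996 games come out at a bit over a third of the official figure, while Smash Bros is well over 80%.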
The best example is probably Pokemon Red / Blue / Green, which I have covered in the past here (will find the link), where Famitsu realised that they were tracking way behind the figures Nintendo were claiming and just bumped up the sales by a certain amount to reflect this. The reverse also seems to have happened recently with Brain Flex, where Famitsu have been tracking too high and have had to artificially slow the sales to bring the total more in line.
I'd say this is where the tracking companies have got better. Instead of just measuring a sample of retailers and linearly multiplying everything up they now have a lot of past data, trends, info from Publishers and seem to be able to adjust their data better to give better figures.
To illustrate this with an example: Media Create may cover, say, 50% of retailers and they have their chart for the week based on this data. I doubt very much that they just double everything and say "right, this is the final data for all the retailers". They used to, but I don't think they do any more. I'd imagine they apply weightings to various games and various publishers based on past trends. They may know that they cover more specialist retailers, so maybe their figure for X360 will be skewed higher since people only buy them from specialist retailers, and so they may only extrapolate this up by 1.5 or 1.6 or some factor that they have decided from past data. Look at the X360 launch and the range of different data we had. This may well have come from the fact that there were no past trends at that point and the trackers just arbitrarily used some scale factor. It may have come from the fact that the stores one service tracks sold more than the other's, or most likely a combination of these things.
The advantage that I have (and in fact that we all have) in this is hindsight. Media Create don't have this. They have a couple of days to add up all their data, extrapolate up by some scale factors and get it out. We have the benefit of three separate organisations all doing this and we also have LTD sales numbers from the Publishers every 6-12 months to compare with. And this is exactly what is done on Everything and Nothing. Since we don't have to publish data by a set date and can go back and adjust old data, rescale it and so on, we have a massive advantage over MC / Famitsu etc. Most of the data on Everything and Nothing (we'll come to the "most" in a bit) is directly from the MC / Famitsu weekly charts we get, averaged (surely the sensible thing to do??) and scaled up or down slightly (since 2003 I don't think I've scaled a game by more than 5-6% up or down, which isn't a lot) and rounded.
Now to address a couple of those points, why average the data? Well the way I see it we get 2-3 slightly conflicting reports on the same thing so averaging makes sense. To use a comparison if you are investigating something and ask three eye-witnesses to tell you what happened and they all say virtually the same thing but with slight differences then what do you do? I want the most accurate data possible, the fact that the three sources conflict says that at least two are wrong (likely all three) so how do we deal with this? If anyone has a better suggestion then I'd be very happy to hear it!!
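A minimal sketch of the average-then-scale step, assuming a plain arithmetic mean of whichever trackers reported that week. The function name, the weekly figures and the 3% adjustment are all hypothetical, and the real scale factor would come from comparing totals against publisher LTD numbers:

```python
from statistics import mean

def combine_reports(reports, scale=1.0):
    """Average the tracker figures available for one week, then rescale.

    `reports` may contain None where a tracker published nothing that week.
    """
    available = [units for units in reports if units is not None]
    if not available:
        raise ValueError("need at least one tracker figure")
    return mean(available) * scale

# Hypothetical week: two trackers reported, the third did not.
week = [41_200, 43_900, None]
estimate = combine_reports(week, scale=1.03)  # 3% upward adjustment (assumed)
print(round(estimate))
```

A median, or a weighting by each tracker's past accuracy, would be drop-in alternatives to `mean` here if anyone thinks they are better.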
Why round the data to the nearest 250? Three reasons:
1) A personal pet peeve of mine is that people seem to think that because a number is shown to more significant figures it is more accurate, so that 123,652 is somehow more accurate than 123,500. Well that's fine if the number is correct to the nearest unit or ten maybe, but when it is only accurate to the nearest, say, 5% then it's meaningless and misleading in my opinion to show that level of detail.
2) Rounding the numbers means that they are not just a reproduction of Famitsu / MC numbers so in theory should remove me from any claims of reproducing copyrighted data etc.
3) It gives a nice cut-off point as to when a game stops selling. Super Mario Land may still be selling 5-10 copies a week, who knows? But the minute it drops below about 150 then it "falls off" the chart, so it makes life a bit easier for me since most of the sales below 1k are "estimated" anyway.
Right, so that covers present data, and data for games in the top 30 for which we have at least Famitsu data. What about when a game drops out of the top 30?
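For what it's worth, rounding to the nearest 250 could be sketched like this. It is an illustration only: with plain nearest-multiple rounding the cut-off lands at 125 units rather than the "about 150" mentioned above, so the exact rule used on the site is evidently a little different.

```python
def round_to_step(units, step=250):
    """Round to the nearest multiple of `step` (exact halves round up)."""
    return int((units + step / 2) // step) * step

print(round_to_step(123_652))  # 123750
print(round_to_step(124))      # 0   -> the game "falls off" the chart
print(round_to_step(125))      # 250 -> still just about charting
```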
Well this is where part 1 of the detective work and data analysis comes in. We have monthly top 100s, we have yearly top 100s (sometimes 500 if we do well) and this can often give enough info to track the descent of a game when it goes sub 5k. Where this data isn't available (for example with a game with short legs) then this descent will be estimated, but again this will be done to match up with yearly totals etc. For example, if a game sold 5k in its last charting week, then dropped off the chart in March and sold another 10k by December, then we can quite reasonably say that it dropped off quickly. If I guess 4k, 2k, 1.25k, 0.75k and so on, and basically fit the descent to 10k, then is this really all that bad? Again what else would you suggest, just stop tracking the game when it gets below 5k? Surely matching the total up and having to guess the last few weeks' sales (which are so small as to be largely insignificant anyway) is better than just not counting them at all?
So this covers all new games, and seriously if anyone has any suggestions / constructive criticisms etc then please let me know, because I personally cannot see a way in which the data could be made any more accurate than it is.
Now for older games we have a slightly different story. Anything pre-95 that I have tried to put up has been a lot more down to guesswork since we simply don't have the data. What we do have for some games is a week 1, a week 8 and a total, say. Not a lot of data, but enough to vaguely try and fit some numbers to. We know how large it started, we can look at similar games (sequels maybe) and see what kind of drop-offs we get in weeks 2 and 3, and then we know how low it must have got by week 8, we know how much it has to do after that, and we know what the "real" total is and so how much to scale up by. Now if there is anything people want to argue with then maybe with this you'd have a point. I don't claim that the data pre 95 is anywhere close to very accurate. As you can see from the Famitsu charts in the link at the top, we have very little data from that period, and the data we do have is likely from about 1/5 of all retailers and so riddled with errors. All these games do is serve to show the very rough trends of their sales.
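One way to picture "fitting the descent" to a known total is below. This is only a sketch under a loud assumption: it supposes sales shrink by the same ratio every week (a geometric decay), which is a stand-in for the hand-guessed numbers above and not the method actually used on the site. The 40-week span is an approximation of March to December.

```python
def fit_geometric_tail(last_week, remaining, weeks):
    """Return `weeks` weekly figures that decay by a constant ratio and sum
    to `remaining`, starting below the last charted week `last_week`.

    The constant-ratio decay is an illustrative assumption.
    """
    if not 0 < remaining < last_week * weeks:
        raise ValueError("remaining must fit a declining tail")

    def tail_sum(ratio):
        return sum(last_week * ratio ** k for k in range(1, weeks + 1))

    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection works because tail_sum grows with ratio
        mid = (lo + hi) / 2
        if tail_sum(mid) < remaining:
            lo = mid
        else:
            hi = mid
    ratio = (lo + hi) / 2
    return [last_week * ratio ** k for k in range(1, weeks + 1)]

# The example above: 5k in the last charting week, another 10k by December.
tail = fit_geometric_tail(last_week=5_000, remaining=10_000, weeks=40)
print([round(units) for units in tail[:4]])
```

For those inputs the fitted ratio works out at roughly two-thirds per week, so the game is down in the low hundreds within a couple of months, which is the same "dropped off quickly" shape as the hand guess.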
Believe it or not, Japanese sales (since they are usually so frontloaded) are actually quite easy to predict. Compare for example the last few World Soccers, they follow almost exactly the same weekly trends. Had we been given week 1, week 5, week 8 and a total for Winning Eleven 8 do you think that our detective work of guessing values for the missing weeks based on WE9 and WE7 would have given a result that was that different to the actual data?
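The "detective work" of borrowing a sibling game's weekly shape could be sketched as follows. Everything here is hypothetical: the numbers are invented rather than real Winning Eleven figures, and interpolating the target-to-reference ratio linearly between known weeks is just one plausible way to do it.

```python
def fill_missing_weeks(known, reference):
    """Estimate missing weekly sales from a similar game's weekly curve.

    known:     {week_number: units} for the game being reconstructed
    reference: weekly units for a similar game, week 1 first
    The ratio between the two games is interpolated linearly between the
    known weeks and applied to the reference curve (an assumed approach).
    """
    weeks = sorted(known)
    ratios = {w: known[w] / reference[w - 1] for w in weeks}
    filled = dict(known)
    for a, b in zip(weeks, weeks[1:]):
        for w in range(a + 1, b):
            t = (w - a) / (b - a)
            ratio = ratios[a] + t * (ratios[b] - ratios[a])
            filled[w] = reference[w - 1] * ratio
    return [filled[w] for w in range(weeks[0], weeks[-1] + 1)]

# Invented figures: a reference game and three known weeks of a bigger sequel.
reference = [100_000, 50_000, 30_000, 20_000, 10_000, 8_000, 6_000, 5_000]
known = {1: 200_000, 5: 20_000, 8: 10_000}
print(fill_missing_weeks(known, reference))
```

The filled-in weeks would then still need scaling so that the whole run matches the known total.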
Anything post 95 and certainly anything post 2000 will be very very accurate indeed.
Right, I've probably bored people long enough. There's loads more I can add, more data sources, more explanations if people are still not happy, but hopefully this will at least give people a place to start!
Thanks for your time, and thanks to all those who have sent me constructive emails and messages and stuff. And if anyone has any more data that I don't (say, a load of Famitsu magazines from 1993 sitting around) then let me know!!