Archive for the 'wiki' Category

Bug 57 laid to rest

Wednesday, May 28th, 2008

Wikipedia page size breakdown

Tuesday, May 27th, 2008

Just for kicks, I cleared my cookies & caches and loaded up Wikipedia’s “Frog” article fresh to see what the breakdown of network bandwidth would look like…

645,947 bytes of data content are transferred, not counting any HTTP headers:

  • 72.5% content images
  • 10% JavaScript code
  • 7.5% style sheets
  • 5.5% HTML web page (the important stuff!)
  • 4.5% UI images

At 56 kbit/s, this would take about 90 seconds to download. It’s easy for those of us with broadband to forget what low bandwidth feels like, but people outside cities may not have good broadband, and mobile devices are often stuck on pretty slow networks too. Compare regular Wikipedia against our mobile gateway on your phone sometime: even a fancy browser like the iPhone’s will feel like molasses loading the full site, but things come up lickety-split from the more minimal mobile gateway.

Fairly simple compression improvements could save 128 KB of that:

  • 64 KB by gzipping JS and CSS files that are currently served uncompressed
  • another 64 KB through smarter compression of thumbnails (animated GIF optimization, use of JPEG for some PNG thumbs)

That would save approximately 18 seconds of download time for our hypothetical low-bandwidth user.
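The time estimates are plain arithmetic: bytes × 8 ÷ line rate. A quick Python check, assuming an ideal 56,000 bit/s link with no protocol overhead or latency, and reading the 128k savings as 128 × 1024 bytes:

```python
# Rough download-time arithmetic for the "Frog" page figures above.
# Assumes an ideal 56,000 bit/s link with no protocol overhead or latency,
# so real-world modem times would be somewhat longer.

MODEM_BPS = 56_000  # 56 kbit/s, taken as 56,000 bits per second


def download_seconds(num_bytes: int, bits_per_second: int = MODEM_BPS) -> float:
    """Time to transfer num_bytes at the given line rate, in seconds."""
    return num_bytes * 8 / bits_per_second


page_bytes = 645_947        # total content bytes, excluding HTTP headers
savings_bytes = 128 * 1024  # 64 KB from gzipped JS/CSS + 64 KB from thumbnails

print(f"full page: {download_seconds(page_bytes):.1f} s")  # full page: 92.3 s
print(f"saved:     {download_seconds(savings_bytes):.1f} s")  # saved:     18.7 s
```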

Details at mw:Wikipedia_data_size_test
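One easy way to estimate the gzip win for a given JS or CSS file is simply to compress it and compare sizes. A minimal sketch using Python’s standard gzip module; the sample string is a made-up, highly repetitive stand-in, so the ratio it prints will be far better than a real style sheet’s:

```python
import gzip


def gzip_savings(data: bytes, level: int = 9) -> tuple[int, int]:
    """Return (original_size, gzipped_size) for a blob such as a JS or CSS file."""
    compressed = gzip.compress(data, compresslevel=level)
    return len(data), len(compressed)


# Made-up stand-in for a style sheet; for a real measurement, read the file
# from disk instead, e.g. open("main.css", "rb").read() (hypothetical name).
sample_css = b".mw-body { margin: 0 auto; padding: 1em; }\n" * 200
raw, packed = gzip_savings(sample_css)
print(f"{raw} bytes -> {packed} bytes ({100 * (1 - packed / raw):.0f}% smaller)")
```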

More mobile fixlets

Tuesday, May 27th, 2008

Magic quotes strike again!

I’ve disabled magic_quotes_gpc in our mobile transcoder’s PHP configuration, fixing access to articles with apostrophes or double quotes in their names.

Of course, the transcoder really should be fixed to detect and undo this data corruption on input. At least this misfeature is finally going to die in PHP 6… :D
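For the record, “detect and undo” could look something like this. magic_quotes_gpc runs addslashes() over incoming request data (a backslash before ', ", \ and NUL), and stripslashes() reverses it; in PHP the detection half is a check of get_magic_quotes_gpc(). Below is a rough Python model of that escape/unescape pair, with hypothetical function names, not the transcoder’s actual code:

```python
def add_magic_quotes(s: str) -> str:
    """Model of the escaping magic_quotes_gpc applies (PHP's addslashes())."""
    out = []
    for ch in s:
        if ch == "\x00":
            out.append("\\0")       # NUL becomes backslash + '0'
        elif ch in "'\"\\":
            out.append("\\" + ch)   # quote or backslash gets a backslash prefix
        else:
            out.append(ch)
    return "".join(out)


def strip_magic_quotes(s: str) -> str:
    """Undo the escaping above, as PHP's stripslashes() does."""
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            i += 1
            if i < len(s):  # a lone trailing backslash is simply dropped
                nxt = s[i]
                out.append("\x00" if nxt == "0" else nxt)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# Only undo the escaping when the setting is actually on; otherwise a title
# containing a real backslash would be corrupted in the other direction.
magic_quotes_on = True  # hypothetical flag; in PHP, get_magic_quotes_gpc()
title = strip_magic_quotes("Ender\\'s Game") if magic_quotes_on else "Ender's Game"
print(title)  # Ender's Game
```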

CentralAuth / SUL is here!

Tuesday, May 27th, 2008

Okay folks, as of a couple hours ago unified login is available opt-in for all Wikimedia accounts!

In addition, we’ve enabled the site-wide global session cookies (which have been in testing for the SSL interface on secure.wikimedia.org for a few weeks). Some people may not be able to successfully get that working across domains (we’ve got reports of Norton blocking the login-cookie-fetching images), but it seems to be working for most people so far. :)

This means that not only will your global, unified account have the same password on say English Wikipedia and Commons, but once you’ve logged in on one you’ll be logged in on the other, without having to log in a second time. Handy!

Note that to make this fully automatic, the first time you visit a new wiki it will autocreate a local account for you, linked to your global account. Initially this was spamming the Recent Changes lists with account creation entries, but I’ve now pulled those (they’re still logged in Special:Log, however).

Update: This has been disabled for now, as it was spamming logs and user lists faster than expected, even through “invisible” links like shared JS and CSS. You’ll still get your shiny local accounts by going through the regular login form, and after you’ve logged in that way once, your sessions remain shared.
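To make the autocreation behavior concrete, here is a toy Python model: the first visit creates a local account attached to the global one, the creation is recorded in the user log, and Recent Changes logging is optional. Every name in it (Wiki, ensure_local_account, the log lists) is hypothetical; it mirrors the behavior described above, not CentralAuth’s actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Wiki:
    """Hypothetical stand-in for one wiki's local accounts and logs."""
    name: str
    local_users: set = field(default_factory=set)
    user_log: list = field(default_factory=list)        # like Special:Log
    recent_changes: list = field(default_factory=list)  # like Recent Changes


def ensure_local_account(wiki: Wiki, global_user: str, log_to_rc: bool = False) -> bool:
    """Create a local account attached to global_user if none exists yet.

    Returns True if a new account was created on this wiki.
    """
    if global_user in wiki.local_users:
        return False
    wiki.local_users.add(global_user)
    entry = f"{global_user} (autocreated, attached to global account)"
    wiki.user_log.append(entry)  # still recorded in the user creation log
    if log_to_rc:
        wiki.recent_changes.append(entry)  # the noisy part that was pulled
    return True


commons = Wiki("commons")
print(ensure_local_account(commons, "Example"))  # True: first visit
print(ensure_local_account(commons, "Example"))  # False: already attached
```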

Big thanks to Tim Starling who’s done a huge amount of work on CentralAuth in the last couple months, as well as Andrew Garrett who’s helped a lot with the cross-domain cookie logins and global Steward group management.

Mobile gateway search

Saturday, May 24th, 2008

So it turns out that the search function on Wikipedia’s HawHaw-powered mobile gateway hasn’t been working for a long time, not because it wasn’t implemented, but because it was screen-scraping the search results page.

Some little detail of the results layout changed ages ago, breaking it. Nice! Well, I’ve redone it to use the MediaWiki web service API, which should be a little more stable.

Search works again, yay!

Even if the correct search result is fifth in the output *cough* :)

Hey, we’re workin’ on it. ;)
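Talking to the API instead of scraping HTML means asking for structured data. A rough Python sketch of the request and response handling, assuming the JSON output format and the standard list=search module parameters (srsearch, srlimit); the endpoint shown is English Wikipedia’s, the helper names are hypothetical, and the sample response is made up for illustration:

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def search_url(query: str, limit: int = 10) -> str:
    """Build a MediaWiki API full-text search request (action=query, list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def result_titles(response_text: str) -> list[str]:
    """Pull page titles out of a JSON search response, in ranked order."""
    data = json.loads(response_text)
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]


# Made-up response shaped like the API's JSON output.
sample = '{"query": {"search": [{"ns": 0, "title": "Frog"}, {"ns": 0, "title": "Tree frog"}]}}'
print(search_url("frog"))
print(result_titles(sample))  # ['Frog', 'Tree frog']
```

Because the API returns titles as data, a change in the HTML layout of the search page can no longer silently break the gateway.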

More CentralAuth comin’ Tuesday

Saturday, May 24th, 2008

Hey, just to give y’all a heads-up… after a couple months of good testing w/ the sysops & power users, we’re going to widen the CentralAuth rollout to allow everybody on Wikimedia sites to opt in to the system.

We’re going to keep automatic migration off for now to keep the volume down, as we may want to roll out more helper tools in response to new issues people might have.

UTF-8 support in Firefox 3 location bar

Friday, May 23rd, 2008

I don’t usually repost other blogs, but this is a big usability help for our non-Latin-script wikis… Firefox 3 is joining Safari and Opera 9 in displaying human-legible Unicode URLs in the location bar.

Woohoo!
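Under the hood, a non-Latin title travels as percent-encoded UTF-8 bytes, and the browser just decodes them for display. A minimal Python sketch of that decoding with a hypothetical display_url helper; it decodes everything, whereas real browsers are more conservative about which escapes they unescape in the location bar:

```python
from urllib.parse import quote, unquote


def display_url(url: str) -> str:
    """Show percent-encoded UTF-8 in a URL as readable characters.

    Falls back to the raw URL if the escapes are not valid UTF-8.
    """
    try:
        return unquote(url, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return url


# "Лягушка" is Russian for "frog"; quote() produces the %XX form on the wire.
encoded = "https://ru.wikipedia.org/wiki/" + quote("Лягушка")
print(encoded)
print(display_url(encoded))  # https://ru.wikipedia.org/wiki/Лягушка
```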

RecentChangesCamp!

Friday, May 9th, 2008

About to head out to RecentChangesCamp 2008 in Palo Alto, CA… see y’all there!

Diff bug fixed, hopefully

Saturday, April 26th, 2008

For a long time we’ve had intermittent problems with diffs displaying incorrectly, with lines on the left side mysteriously repeated.

Reports skyrocketed the other day when the wikidiff2 extension (our C++ reimplementation of MediaWiki’s diff algorithm, about a billion times faster than the PHP one) was upgraded to match PHP upgrades on our older, Fedora Core-based servers.

I added some logging hacks to try to track it down, but didn’t get many data points until I tried the simple expedient of running every diff twice: if the results don’t match, log the error.
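The run-it-twice trick is easy to bolt onto any diff engine, since a correct diff is deterministic. A minimal Python sketch, with difflib standing in for wikidiff2 and a hypothetical checked_diff wrapper:

```python
import difflib
import logging

log = logging.getLogger("diffcheck")


def checked_diff(diff_fn, old_lines, new_lines):
    """Run diff_fn twice on identical input; log and flag any mismatch.

    Returns (first_result, consistent). A correct diff is deterministic,
    so consistent=False means the diff engine itself is misbehaving.
    """
    first = diff_fn(old_lines, new_lines)
    second = diff_fn(old_lines, new_lines)
    consistent = first == second
    if not consistent:
        log.error("diff mismatch: %r vs %r", first, second)
    return first, consistent


def unified(old_lines, new_lines):
    # Stand-in for the real diff engine.
    return list(difflib.unified_diff(old_lines, new_lines, lineterm=""))


old = ["a", "b", "c"]
new = ["a", "B", "c"]
result, ok = checked_diff(unified, old, new)
print(ok)  # True: difflib is deterministic
```

The cost is doubling the diff work, which is cheap when the engine is fast and the goal is to catch a bug that only shows up some of the time.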

With a few hundred instances logged, it became clear that the problem was limited to servers running Fedora Core 4; the even older Fedora Core 3 boxes were unaffected, as were all our newer Ubuntu boxes. Mysterious problems caused by C++ run-time library mismatches between different Linux releases are not at all uncommon, and it looked like we’d installed an FC3 binary on all the machines, which was intermittently failing on FC4.

I recompiled the extension, this time with separate builds on FC3 and FC4, and haven’t seen any bad diffs come through my log in the last half hour… so far so good! :)

So what’s in the job queue anyway?

Tuesday, April 22nd, 2008

Here’s the current breakdown of en.wikipedia.org’s job queue by job type:

job_cmd            count(*)
htmlCacheUpdate      31,147
refreshLinks     10,106,739
renameUser              119
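A breakdown like the one above is a single GROUP BY over the job table. Here it is against an in-memory SQLite database, assuming a simplified two-column stand-in for MediaWiki’s job table (the real one has more columns) and a few made-up rows:

```python
import sqlite3


def job_breakdown(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count queued jobs per job type."""
    cur = conn.execute(
        "SELECT job_cmd, COUNT(*) FROM job GROUP BY job_cmd ORDER BY job_cmd"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the job table: just a type and a target title.
conn.execute("CREATE TABLE job (job_cmd TEXT, job_title TEXT)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?)",
    [("refreshLinks", "Talk:Union_Station_(Louisville)")] * 3
    + [("htmlCacheUpdate", "Template:Infobox")],
)
for cmd, count in job_breakdown(conn):
    print(f"{cmd:<16}{count:>12,}")
```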

Note that the current system allows duplicate entries to be put in the queue; the dupes are removed when the first one in the stack gets run. This makes the raw number of refreshLinks entries much higher than it “really” is. For example, Talk:Union Station (Louisville) is listed 9 times, presumably once for each template edit that triggered an “update me!” job.
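The duplicate handling can be modeled in a few lines: when a job is taken off the queue, any other entries with the same type and target are dropped before it runs, so nine queued copies cost one execution. A toy Python sketch with a hypothetical run_queue function, not MediaWiki’s actual job runner:

```python
def run_queue(jobs, run):
    """Run (job_cmd, title) entries, discarding duplicates as each job runs.

    Returns how many jobs were actually executed.
    """
    pending = list(jobs)
    executed = 0
    while pending:
        job = pending.pop(0)
        # Drop the remaining copies of this job before running it.
        pending = [j for j in pending if j != job]
        run(job)
        executed += 1
    return executed


jobs = [("refreshLinks", "Talk:Union Station (Louisville)")] * 9 + [
    ("htmlCacheUpdate", "Frog")
]
ran = []
print(len(jobs), run_queue(jobs, ran.append))  # 10 2
```

The raw row count (10 here) overstates the real work (2 jobs), which is exactly why the refreshLinks total above looks so alarming.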

Update: Figured out why the queues have been growing so big over the last few days: the system clock on the database master was 7 seconds slow. This made the replication lag detection see a minimum of 7 seconds of lag on every slave, so the job queue batch runners were all sitting around waiting for the lag to clear. :)

Resynced the clock (it presumably drifted during the period when some IPs were broken), and things are moving again.
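Why a slow master clock looks like lag: if the lag check compares the slave’s clock with timestamps written by the master, a master running 7 seconds slow makes even a fully caught-up slave look 7 seconds behind. A toy Python model of that arithmetic; the function names and the 5-second max_lag threshold are made-up examples, not the real settings:

```python
def apparent_lag(replica_now: float, master_event_time: float) -> float:
    """Lag as a naive checker sees it: slave clock minus master's timestamp."""
    return replica_now - master_event_time


def runners_should_wait(lag_seconds: float, max_lag: float = 5.0) -> bool:
    """Hypothetical job-runner rule: pause while replication lag is too high."""
    return lag_seconds > max_lag


true_time = 1_000_000.0    # arbitrary instant, in seconds
master_clock_error = -7.0  # master clock running 7 s slow
# A fully caught-up slave applies the master's latest write immediately,
# but that write carries the master's (slow) timestamp.
lag = apparent_lag(true_time, true_time + master_clock_error)
print(lag, runners_should_wait(lag))  # 7.0 True
```

With the clocks resynced, the same caught-up slave reports zero lag and the runners proceed.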

