From: Walter Watts (wlwatts@cox.net)
Date: Fri Mar 19 2004 - 19:37:16 MST
Google seems to have changed its caching procedures.
(Picking their cached page instead of the real one was always my
default, safe choice.) Now you can't always trust it. Clicking on the
cached page might take you to the "real" site, and all the nasty
behavior that can entail. I'm sure they're doing this to try and trim
what they have to store: by their own figures they are indexing around
400 terabytes of data at the lowest estimate and as much as 700 terabytes
(depending on exactly what they can "deep crawl")... see below.
Damn them!
Has anyone else seen this behavior on Google?
Walter
---------------------------------------------------------------------------
...
So when, in the middle of this last ???, Google claimed to be "caching"
my pages... and yet the "cached" versions were able to show my images
being loaded... I realised they had made an important change in the
way they handle data...
By their own figures they are currently indexing around 400
terabytes of data at the lowest estimate and as much as 700 terabytes
(depending on exactly what they can "deep crawl")...
Treating this data with whatever routines they run (for example via
msql or whatever) is not too complex, even if they are running hugely
discriminatory algos... however, it is very, very costly in
processing power, and up to now the data was indexed, stored and ranked
"off net", with the dance reflecting the reintroduction of the treated
data into the "publicly accessible index"... what we call "Google"...
(this, I know, is horribly simplifying what happens... but otherwise it
will get too esoteric for this forum)...
From a practical point of view it would be simpler to at least
store the data to be treated in situ (where it already is, on your
website's server)... i.e. why make a copy on Google's hard drives when it
can (by spidering much more intensively and more frequently... and by
using more spiders, each with its own function) simply treat all
spidered sites as "in RAM"? (Again, I'm simplifying horribly.)
This would require less outlay by Google and would actually result in
very much faster updates, as it is effectively now "ranking" on "the
fly"...
Sites with purely HTML would not notice that their "cached" page was now
"hotlinked" into their server, and neither would standard Java etc...
Side routines... PHP, msql etc. wouldn't be affected either, as it's not
"writing to disc" when it comes by...
However, where this gets really interesting is that up until now you
couldn't build pages in "Flash" etc. because the "bot" couldn't see
them and would just skate blindly over the top of them and probably
not index the page at all...
If it's doing what I think it is, it may not now care whether you coded
in "Flash", as long as there is the basic minimum of HTML to get you a
position...
I don't have a page currently running .swf... does anyone? When
you click on your "cached" page in Google, do you see your movie?
If so, it must be "hotlinked" to your page in real time and using the
movie player "you" have installed on your machine to show the
movie... OK.
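
One rough way to test the "hotlinked" idea without eyeballing the page: save
the HTML source of the "cached" copy and list which embedded images or
movies still resolve to your own server. This is only a sketch under stated
assumptions. It fetches nothing from Google, the function and class names
are made up for illustration, and the host names in the sample are
placeholders, not real cache addresses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collect URLs of embedded resources (images, Flash movies, scripts)."""

    # tag -> attribute that carries the resource URL
    RESOURCE_ATTRS = {"img": "src", "embed": "src",
                      "script": "src", "object": "data"}

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url   # replaced if the page carries <base href=...>
        self.raw_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            self.base = attrs["href"]
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted and attrs.get(wanted):
            self.raw_urls.append(attrs[wanted])


def hotlinked_resources(cached_html, cache_url, origin_host):
    """Return embedded resource URLs that point back at origin_host."""
    collector = ResourceCollector(cache_url)
    collector.feed(cached_html)
    # Resolve relative URLs the way a browser would, against the base URL.
    resolved = [urljoin(collector.base, u) for u in collector.raw_urls]
    return [u for u in resolved if urlparse(u).hostname == origin_host]


if __name__ == "__main__":
    # Hypothetical saved "cached" page; all hosts here are placeholders.
    sample = """<html><head><base href="http://www.example.com/gallery/"></head>
    <body><img src="pics/cat.jpg">
    <embed src="/movies/intro.swf">
    <img src="http://cache.example.net/logo.gif"></body></html>"""
    for url in hotlinked_resources(sample,
                                   "http://cache.example.net/cached-page",
                                   "www.example.com"):
        print(url)
```

If the list comes back non-empty, the browser showing the "cached" page is
pulling those files straight from your server, which would explain seeing
images (or a movie) on a page that is supposedly a stored copy.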
If this is the case, people searching will shift relatively quickly to
the pages which are "interactive"... and eventually Google and the
other engines will notice the diversion in traffic and rerank
accordingly...
Maybe those of us with "picture" or "multimedia" sites will see the
difference?
from Google's group: google.public.support.general
This archive was generated by hypermail 2.1.5 : Fri Mar 19 2004 - 19:38:02 MST