TNG Community
wombmate

Google Sitemap Updated to Eliminate Living from being Indexed


wombmate

I updated my version of createsitemap to exclude living individuals from being indexed and submitted it to Google. (This sitemap was submitted Jan 28, 2012 and processed Jan 28, 2012, per Google Webmaster Tools.) The last check of my crawler access log indicates that living individuals are still being indexed, although I have confirmed that the living IDs are not in the sitemap file.

Mon 30 Jan 2012 10:15:01 AM Individual info for Living (I4153) accessed by crawl-66-249-71-146.googlebot.com/66.249.71.146/66.249.71.146/.

Mon 30 Jan 2012 10:13:31 AM Individual info for Living (I4958) accessed by crawl-66-249-71-154.googlebot.com/66.249.71.154/66.249.71.154/.

Can anyone please explain why Google would still be indexing IDs of living individuals that are not in the latest sitemap file?
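
For anyone wanting to run the same check on their own crawler log, here is a minimal sketch in Python. It assumes the log lines look exactly like the two quoted above; the function name and the regular expression are illustrative and are not part of TNG.

```python
import re

# Pattern matched to the two log lines quoted above; other log formats are not handled.
LOG_LINE = re.compile(
    r"Individual info for (?P<name>.+?) \((?P<person_id>I\d+)\) "
    r"accessed by (?P<host>[^/\s]+)/"
)

def living_ids_crawled(log_lines):
    """Return the person IDs logged as 'Living' that a googlebot host fetched."""
    ids = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if (
            match
            and match["name"] == "Living"
            and match["host"].endswith(".googlebot.com")
        ):
            ids.add(match["person_id"])
    return ids

sample = [
    "Mon 30 Jan 2012 10:15:01 AM Individual info for Living (I4153) accessed by "
    "crawl-66-249-71-146.googlebot.com/66.249.71.146/66.249.71.146/.",
    "Mon 30 Jan 2012 10:13:31 AM Individual info for Living (I4958) accessed by "
    "crawl-66-249-71-154.googlebot.com/66.249.71.154/66.249.71.154/.",
]
print(sorted(living_ids_crawled(sample)))  # ['I4153', 'I4958']
```

The resulting set of IDs can then be compared against the IDs listed in the sitemap file.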

I have been pretty much out of the loop with recent upgrades due to family illnesses, and perhaps this has been discussed before, but I could not find the answer in the forum. I am running 6.2.0 and am in the process of upgrading to v8, in hopes that the upgrade will be complete before v9 is released. I am very excited about the next release but want to understand some of the Google nuances in the process of the upgrade.

Thanks in advance.

Darlene

theKiwi

Can anyone please explain why Google would still be indexing IDs of living individuals that are not in the latest sitemap file?

Two points:

1 - the createsitemap tool that was created doesn't attempt to filter the content of the index - it's designed (not by me) to simply list out every person, by URL to their getperson page in the database - it doesn't care if the person at that URL is living or not.

2 - altering the createsitemap code to exclude Living people from the listing of ID numbers excludes them from what you tell Google that you have, but unless the people that you've hidden are not linked AT ALL to the people who are not hidden, Google is going to find them anyhow and attempt to index their pages.
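
To make point 2 concrete, here is a rough Python sketch of what filtering the listing amounts to. It is not the actual createsitemap code; the URL pattern, the record fields (`person_id`, `living`) and the function name are all assumptions made up for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical URL pattern and record fields, for illustration only.
BASE_URL = "https://example.org/getperson.php?personID={person_id}"

def build_sitemap(people, include_living=False):
    """Return sitemap XML with one URL per person, optionally skipping living people."""
    lines = ['<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for person in people:
        if person["living"] and not include_living:
            continue  # left out of the listing, but the page itself still exists
        url = BASE_URL.format(person_id=person["person_id"])
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)

people = [
    {"person_id": "I1", "living": False},
    {"person_id": "I2", "living": True},
]
print(build_sitemap(people))
```

The filter only changes which URLs are listed. It does nothing to the pages themselves or to the links between them.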

As an example,

if your grandparents are deceased, they'll be shown in your sitemap file.

so

Google will visit their page and find all their information,

plus

it will find the links to your grandparents' parents and grandparents' children. If they're deceased it will find their names there; if they're living it will find "Living".

it will then go in turn to each of those pages and index them. The deceased ones will be indexed by their name, the Living ones will be indexed as you've indicated from the logs as Living.

And if any of those Living have children who are Living, it will in turn travel to their page and index them as Living as well.

there is nothing you can (reasonably) do to prevent this.
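
The walk described above can be modelled as a simple graph traversal. The sketch below is a deliberately simplified model in Python, not how Google's crawler actually works; the family and the page names are hypothetical.

```python
from collections import deque

def crawl(seeds, links):
    """Breadth-first walk: start from the sitemap pages and follow every link found."""
    visited = set()
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if page in visited:
            continue
        visited.add(page)
        queue.extend(links.get(page, []))
    return visited

# Hypothetical family: grandparent (deceased) -> parent (living) -> child (living).
# "loner" is a living person whose page is not linked to anyone.
links = {
    "grandparent": ["parent"],
    "parent": ["grandparent", "child"],
    "child": ["parent"],
    "loner": [],
}
sitemap = ["grandparent"]  # living people have been filtered out of the sitemap
reached = crawl(sitemap, links)
print(sorted(reached))  # ['child', 'grandparent', 'parent']
```

Only the deceased grandparent is in the sitemap, yet the living parent and child are still reached through the links. The unlinked "loner" is the only page the walk never finds.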

So hiding them from the sitemap file hasn't helped, which may well be the rationale for why the original createsitemap code simply dumps out a list of the people in the database, without the overhead of a query to check whether each one is alive first.

Roger

wombmate

Roger,

Thank you for your reply and the information. My original script did not filter out living individuals, so when I found the createsitemap that did, it made perfect sense to me, as I did not fully understand how Google traverses the records. Looking at my crawl log just now, I can see that living people attached to deceased parents, etc. are being indexed.

Out of curiosity, I Googled I568 (a living individual in my database) to see what would be returned and found a number of hits, none of which were mine but then I may not have drilled down far enough. Just seemed that finding "The details of this individual are private" wouldn't be worth the effort to index them.

Which begs the next question. Will the fact that I have filtered out living individuals from my sitemap affect my site scoring, etc. in Google, given that more records are being indexed than are reflected in my current sitemap?

I do have a number of living individuals that I created records for who are not linked to a parent, wife, or child. I suspect they are related but haven't found the proof needed to link them to anyone. Without filtering them with the living flag, is there a way to filter them out of the sitemap, or am I better off just leaving these alone?

Thanks again for the explanation and your time to answer. I can see that I have a lot to learn and hopefully, won't need to ask too many questions but honestly, want to understand how everything works!

Darlene

theKiwi

Which begs the next question. Will the fact that I have filtered out living individuals from my sitemap affect my site scoring, etc. in Google, given that more records are being indexed than are reflected in my current sitemap?

I don't know the answer to that - it might be that the only people who do know work at Google and they aren't telling <g>.

I do have a number of living individuals that I created records for who are not linked to a parent, wife, or child. I suspect they are related but haven't found the proof needed to link them to anyone. Without filtering them with the living flag, is there a way to filter them out of the sitemap, or am I better off just leaving these alone?

Personally I don't think it's worth worrying about trying to filter the sitemap file, since the sitemap file itself contains no genealogy information, just pointers to your site. So as long as your TNG is doing the filtering based on Living and Privacy rights, Google isn't going to find anything you don't want it to.
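
To see for yourself that a sitemap holds nothing but URLs, or to confirm which IDs it lists, something like the following Python sketch would do. The namespace is the standard sitemaps.org one; the sample URLs and the `personID=` pattern are assumptions for illustration and may not match your site.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace from sitemaps.org.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Return every <loc> URL listed in the sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

def ids_in_sitemap(xml_text, person_ids):
    """Return which of the given person IDs appear at the end of a sitemap URL."""
    urls = sitemap_urls(xml_text)
    return {
        pid for pid in person_ids
        if any(url.endswith("personID=" + pid) for url in urls)
    }

# Hypothetical sitemap content; real URLs will differ.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/getperson.php?personID=I1</loc></url>
  <url><loc>https://example.org/getperson.php?personID=I7</loc></url>
</urlset>"""

print(sitemap_urls(sample))
print(ids_in_sitemap(sample, {"I4153", "I4958", "I7"}))  # {'I7'}
```

Names, dates and relationships never appear in the file. Whatever Google learns about a person comes from the page it fetches, which is where the Living and Privacy filtering is applied.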

Roger

wombmate

Personally I don't think it's worth worrying about trying to filter the sitemap file, since the sitemap file itself contains no genealogy information, just pointers to your site. So as long as your TNG is doing the filtering based on Living and Privacy rights, Google isn't going to find anything you don't want it to.

Roger

I hadn't thought about it that way before, and it makes sense. Thanks again for your time and logical answers!

Darlene
