TNG Community
steve30000

TNG Sitemap Creator for Google

steve30000

I've developed a hack to create a sitemap directly from the TNG database. The page creates a simple XML file that includes URLs for all your individual people. I'm not much of a programmer, but I hacked this together and think it should work fine. I submitted my sitemap to Google last night and it verified the file; now I'm just waiting for them to google my site. Right now it is pretty limited: it only includes links to the person pages, as I thought these were the most important to get googled. I plan to add additional URLs as well if people are interested. Let me know if there are features that you can think of!
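For readers who want to see the shape of the output, here is a minimal sketch in Python (not the script's own PHP, which is not shown in this thread). It assumes person pages live at `getperson.php` with `personID` and `tree` query parameters, and it uses the sitemaps.org 0.9 namespace; the function names and the example base URL are made up for illustration.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Namespace of the current sitemap protocol (assumption: the original
# 2006 script may have used an older Google-specific schema).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def person_url(base_url, person_id, tree):
    """Build the URL of one person page (parameter names are assumed)."""
    query = urlencode({"personID": person_id, "tree": tree})
    return f"{base_url.rstrip('/')}/getperson.php?{query}"


def build_sitemap(urls):
    """Return a sitemap XML document listing each URL once."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for url in urls:
        # escape() turns "&" into "&amp;", which XML requires inside <loc>.
        lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"
```

Calling `build_sitemap([person_url("http://example.com/genealogy", "I1945", "main")])` would return a one-entry sitemap. In a real script the IDs would come from a query against the people table.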

Sitemap Creator

Steve

genelea

Sure works great! I got a full 49,999 and need it to get the other 48,000, but I bet that will come. It would be nice to do trees, if that could be done. I keep my data in trees and keep them under 20,000; the server seems to give me trouble if they get larger. Why? Who knows. No one has been able to answer that one!

Thanks Much

Gene

mwilbers

Steve,

A great idea! (Why didn't I think of that?) The program excels by its simplicity! I've been a programmer myself for almost 25 years now, but not in PHP (yet).

Anyway, I think a big thing to search for is "Places". You usually know, or can guess, a place of birth, and I often search for a name plus place of birth so I get fewer hits. It seems a good addition. Maybe you can offer checkboxes on the page, so you can choose whether or not to add places?

Maarten

www.maartenwilbers.nl

PS

I hope there are not too many spelling errors... I'm Dutch, so English is not my native language.

pjelstrup

Steve,

What a good idea! :D

I actually came up with a similar idea around the time when the sitemaps threads began.

You don't have to manually copy/paste the result into a static .xml file, since you can submit a .php file directly as a sitemap file with Google.

I have done this and it works like a charm. Google fetches the sitemap around once a day.

You can see the output of my file here: http://tng.jelstrup.dk/tngsitemap.php

I have given the pages individual priorities as follows (0.5 is the default if unspecified):

1.0 index.php

0.9 surnames-all

0.8 surnames-oneletter.php

0.7 getperson.php

This way I should ensure that the pages are presented in the above order when Googled. (However, Google doesn't guarantee anything.)

Change dates are as follows:

index: the timestamp of last request of the tngsitemap.php file

surnames-all: newest changedate in the people table

surnames-oneletter: newest changedate in the people table within the group of people that have a particular first character in their lastname

getperson: every individual's changedate
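The priority and change-date scheme above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not Peter's actual PHP: the `Entry` class and helper names are hypothetical, and change dates are assumed to be ISO-style strings (`YYYY-MM-DD`), which compare correctly as plain text.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass
class Entry:
    loc: str
    priority: float = 0.5          # sitemap default when unspecified
    lastmod: Optional[str] = None  # ISO date string, e.g. "2006-02-22"


def render_entry(entry):
    """Render one <url> element with optional <lastmod> and a <priority>."""
    parts = [f"<loc>{escape(entry.loc)}</loc>"]
    if entry.lastmod:
        parts.append(f"<lastmod>{entry.lastmod}</lastmod>")
    parts.append(f"<priority>{entry.priority:.1f}</priority>")
    return "<url>" + "".join(parts) + "</url>"


def newest_by_initial(people):
    """Map each surname initial to the newest change date in that group.

    `people` is an iterable of (lastname, changedate) pairs.
    """
    newest = {}
    for lastname, changed in people:
        key = lastname[:1].upper()
        if key not in newest or changed > newest[key]:
            newest[key] = changed
    return newest
```

The newest date overall (for the all-surnames page) is then simply `max()` over the values of `newest_by_initial(...)`, while each person page uses that person's own change date.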

I have around 3700 people in my TNG database, so the response time of the file is acceptable.

I plan to modify the page so it can function either as a sitemap file for smaller databases, say < 20000 individuals, or as a sitemap index file for larger databases approaching or exceeding the limit of 50000 links in one sitemap file.

Brgds. Peter Jelstrup

http://tng.jelstrup.dk

*** This msg. cross-posted on the tngusers2 mailing list ***

steve30000

My thought is to add checkboxes so you can select which URLs you want to add. I only have about 7000 people in my database so I don't have to worry about the URL limit but I have a feeling many people will need multiple sitemaps especially if you are adding families, places, etc.

I'll let you know when I have an updated version.

Steve

steve30000

Wow, your page looks great. Guess I'm behind the times. I emailed you about your future plans for the page.

arnold

We have over 241,000 individuals in our database. From what I think I read above, this may be far too many individuals.

I went to Google.com to read about its requirements for a sitemap, but found nothing about limitations.

Any input would be appreciated.

ADDED: I was able to find where Google does set a 50,000 limit on URLs.

Too many URLs

The list of URLs in your Sitemap exceeds the maximum allowed. A Sitemap can contain no more than 50,000 URLs. Split your Sitemap into multiple Sitemaps and ensure that each contains no more than 50,000 URLs. You can also use a Sitemap index to manage your Sitemaps. Then, submit your Sitemap index or your Sitemap files individually.
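As a rough illustration of the split Google describes, the following Python sketch (the function name is made up) cuts a URL list into chunks of at most 50,000. With 241,000 individuals that works out to five sitemap files, the last one holding 41,000 URLs.

```python
MAX_URLS_PER_SITEMAP = 50_000  # limit quoted in Google's error message


def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of `urls`, each with at most `limit` items."""
    for start in range(0, len(urls), limit):
        yield urls[start:start + limit]
```

Each chunk would then be written to its own sitemap file, with a sitemap index listing the files.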

steve30000

Yeah, the URL limits on sitemaps will definitely be an issue for a lot of TNG sites. Luckily, my piddly little site doesn't run into that problem, so I didn't have to address it in the first version.

pjelstrup

Steve,

You are by no means behind the times!

And your solution could very well turn out to be the solution for grafty people like Arnold with their very large databases.

I'm not sure the automatic solution will be able to respond in time if the number of individuals becomes too high.

I'll respond to your post on the mailing list regarding my future plans for the page and cross-post here if needed.

Brgds. Peter

cniemira

Excellent work, and I agree - great idea!

I'd like to offer one small suggestion... you might want to toss something like this near the top of your code (after you include the config file):

$tngdomain = preg_replace('/\/$/', '', trim($tngdomain)); // strip a single trailing slash

Otherwise the sitemap may generate links with a double-slash in them. I had this problem with my calendar app when using $tngdomain. Some people have a trailing slash, others don't. A double-slash in a URL is usually completely harmless, but technically incorrect.

Again, great work!

steve30000

Played around with it a little more last night. I have it writing directly to a file now, so you don't have to copy and paste. I think this will have to be the way to go, because I think Google will time out while trying to process large files online.

I'm in the process of making it so it can handle more than 50000 URLs. However, Valentine's Day and work are slowing me down a bit :wink: Hopefully I'll post an updated file tonight.

Steve

steve30000

I just uploaded a new version of the TNG Sitemap Creator file. It can be downloaded from my website at Download TNG Sitemap Creator

The new version can handle an unlimited number of people. It also writes the sitemap(s) directly to your server, so you no longer have to copy, paste and upload. It also creates a sitemap index file, so that if you have multiple sitemaps you can give Google your sitemap index file name and it will tell Google where the individual sitemaps are located. That way you do not have to list multiple sitemaps with Google. I'm still working on adding other features so that you can customize what other information is included in your sitemap (i.e. families, notes, etc.).
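The general idea of writing numbered sitemap files plus an index can be sketched like this. It is a Python illustration only, not the script's actual code: the file names `sitemapN.xml` and `sitemap_index.xml`, the function name, and the sitemaps.org 0.9 namespace are all assumptions.

```python
from pathlib import Path
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def write_sitemaps(urls, out_dir, base_url, limit=50_000):
    """Write sitemap1.xml, sitemap2.xml, ... and an index that lists them.

    Returns (path_to_index, list_of_sitemap_file_names).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for n, start in enumerate(range(0, len(urls), limit), start=1):
        name = f"sitemap{n}.xml"
        body = "".join(
            f"  <url><loc>{escape(u)}</loc></url>\n"
            for u in urls[start:start + limit]
        )
        (out / name).write_text(
            f'{XML_HEADER}<urlset xmlns="{NS}">\n{body}</urlset>\n',
            encoding="utf-8",
        )
        names.append(name)

    # The index points at the public URL of each sitemap file.
    root = base_url.rstrip("/")
    index_body = "".join(
        f"  <sitemap><loc>{escape(root + '/' + name)}</loc></sitemap>\n"
        for name in names
    )
    index_path = out / "sitemap_index.xml"
    index_path.write_text(
        f'{XML_HEADER}<sitemapindex xmlns="{NS}">\n{index_body}</sitemapindex>\n',
        encoding="utf-8",
    )
    return index_path, names
```

Only the index file's URL then needs to be submitted; the individual sitemap files are discovered through it.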

Please let me know of any other suggestions.

Steve

www.hooverfamily.com

pjelstrup

Beautiful Steve!

Just the sort of thing I was thinking of.

Did you know that you can actually let Google know automatically that you have updated your sitemap index and files by submitting an HTTP request?

If you're interested - please find more info on this page:

http://www.google.com/webmasters/sitemaps/...ubmit.html#ping
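A ping of this kind is just an HTTP GET with the sitemap URL passed as an encoded query parameter. The Python sketch below only builds such a request URL. The endpoint is left as a parameter because the link above is truncated, and the parameter name `sitemap` is an assumption; check the search engine's current documentation for the real endpoint and whether pinging is supported at all.

```python
from urllib.parse import urlencode


def build_ping_url(ping_endpoint, sitemap_url):
    """Return the GET URL that announces `sitemap_url` to `ping_endpoint`.

    The query parameter name "sitemap" is assumed, not taken from the thread.
    """
    return f"{ping_endpoint}?{urlencode({'sitemap': sitemap_url})}"
```

For example, `build_ping_url("https://search.example/ping", "http://example.com/sitemap_index.xml")` percent-encodes the sitemap URL so that it survives as a single parameter value.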

Brgds. Peter

steve30000

Thanks for this little piece of code. If you are on the listserv, you saw that this problem already came up. I inserted your code into the script and it worked great. Thanks again.

Steve

steve30000

Thanks. I have to look at the automatic requests a little more and think about how to implement it. Guess I need to get around to putting some type of front end on this script so people can choose some options instead of forcing them to do just what I want :wink:

steve30000

Looks like the sitemap has finally caught google's attention. Found a ton of these in my log today just going through individual after individual:

Wed 22 Feb 2006 06:54:03 AM Individual info for Anne Charlotte Petit (I1945) accessed by crawl-66-249-65-77.googlebot.com.

Wed 22 Feb 2006 06:05:07 AM Individual info for Angelique Huard (I439) accessed by crawl-66-249-65-77.googlebot.com.

Wed 22 Feb 2006 06:05:07 AM Individual info for Jemima Tracy (I546) accessed by crawl-66-249-65-77.googlebot.com.

steve30000

I was looking at doing some updates to this script to add additional pages to the sitemap, but was wondering what additional pages people think there should be. I've added families to the script, but beyond that I'm not sure what other pages would hold value in terms of being found in Google. I've also thought about adding a space where you could input other non-TNG pages in case you have other portions of your website. Any suggestions?

Thanks,

Steve

steve30000

Just checked my site information on google tonight... the mapped pages finally started showing up:

Results 1 - 10 of about 18,800 from hooverfamily.com/genealogy_new

:-D:-D:-D:-D:-D

I have noticed, though, that it hasn't spidered any pages other than the individual people pages in the sitemap.

Ed Barnard

I just purchased TNG today, so can't share any experiences as yet! This is a quick thanks for the sitemaps plugin. I definitely need it! If I add code to automatically ping google, I'll be glad to feed that back to you :)

steve30000

Just posted version 0.61. There was a bug in version 0.6 for users who had greater than 50000 URLs. It was not creating multiple files properly. Version 0.61 fixes it.

Download

arnold

Steve3000,

When I click on Download, the file to be downloaded is still v0.60, not v0.61. It is the same file as before which would not provide multiple files.

Thanks.

steve30000

Fixed....sorry about that.

steve30000

For those of you that have updated to TNGv6, you will need to download an update to the createsitemap script. The change date is formatted differently in version 6 and was causing google to reject the sitemaps. This version fixes that. This version is also compatible with TNGv5.
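The sitemap protocol expects `<lastmod>` in W3C datetime form, of which a plain `YYYY-MM-DD` date is the simplest valid case. The thread does not say what the stored change dates look like in either TNG version, so the input formats in this Python sketch are assumptions (an SQL-style datetime and a bare date), and the function name is made up.

```python
from datetime import datetime

# Input formats to try, in order (both are assumptions for illustration).
KNOWN_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def to_lastmod(changedate):
    """Return `changedate` as YYYY-MM-DD, or None if it cannot be parsed."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(changedate.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for an unrecognised value lets the caller leave `<lastmod>` out of that entry rather than emit a date the crawler would reject.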

Download Here

