TNG Community

SiteMap Mod issues


beckwith

I have an issue with the SiteMap results for my site. I manage my data locally using PAF and upload a GEDCOM to TNG weekly. If I recall correctly, the GEDCOM produced does not carry a modification date for family records, so the tngsitemap.xml file incorrectly lists the GEDCOM import date as the lastmod date for those records.

Any ideas how to resolve this?

Also, why aren't the sitemap files compressed? Mine end up larger than 90 MB, which could probably compress down to about 2 MB with GZip. XML is notoriously wasteful, but it is highly compressible.
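To make the problem concrete, here is a rough sketch of how such an entry could come out in tngsitemap.xml. This is illustration only, not the mod's actual code; the domain, IDs, tree name and date are made up.

```php
<?php
// Hypothetical illustration only, not the SiteMap mod's real code.
// When a family record carries no real change date, the date stamped at
// GEDCOM import is all that is available, so that is what lands in <lastmod>.
$changeDate = '2012-06-03';   // date of the weekly GEDCOM import, not an actual edit

echo "  <url>\n";
echo "    <loc>https://example.com/familygroup.php?familyID=F123&amp;tree=mytree</loc>\n";
echo "    <lastmod>" . $changeDate . "</lastmod>\n";
echo "  </url>\n";
```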

theKiwi

I don't know about the first part. I'm not sure what is in your GEDCOM file or what you have TNG set to do when importing records with no date; if the import setting is "Leave As Is", it should not assign an update date to the families.

For the second part - I've updated the config file to include gzip compression on the .xml files.

I've tested it on my site, and have had Google test the resulting .gz file and it says it's OK.
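For anyone curious, a gzipped sitemap could be produced in PHP along the lines of the sketch below. This is not necessarily how the mod does it, and the file names are only examples; Google does accept .xml.gz sitemaps directly.

```php
<?php
// Minimal sketch, not the mod's actual implementation: write a
// gzip-compressed copy of an existing sitemap file.
$xml = file_get_contents('tngsitemap.xml');   // the large plain-XML sitemap

$gz = gzopen('tngsitemap.xml.gz', 'wb9');     // 'wb9' = write binary, maximum compression
gzwrite($gz, $xml);
gzclose($gz);

// The sitemap index (or robots.txt) would then point at the .gz file.
printf("compressed size: %.1f MB\n", filesize('tngsitemap.xml.gz') / 1048576);
```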

To get this new Mod version, go to http://LisaAndRoger.com/downloads/

Cheers

Roger

beckwith

Wow, great work Roger! The total size is now a little over 3 MB.

I changed it to "Leave it as is" for the next import to see if the results are desirable.

I'll have to look at it carefully, because I think that setting may not have updated families correctly in the past. It has been so long since I set up the import that I can't recall; I know it took me a few iterations.

Thanks for the timely response!

Jay Wilpolt

Not sure if this is what you want, but after importing a new GEDCOM I run SQL code against TNG to update the change date (modified date) of my media items, for both persons and families, to match the last change date of the related person.
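For readers wondering, the kind of query being described might look roughly like the sketch below. The table and column names are assumptions based on a default TNG install, the connection details are placeholders, and the actual code may well differ, so treat it only as an outline and back up the database before trying anything like it.

```php
<?php
// Rough sketch only, NOT the actual code described above. Table and column
// names are assumed from a default TNG install and may differ on your site;
// the connection details are placeholders. Back up the database first.
$link = mysqli_connect('localhost', 'tng_user', 'password', 'tng_database');

// Set each person-linked media item's change date to the related person's
// last change date; a similar join against the families table would cover
// family-linked media.
$sql = "UPDATE tng_media m
        JOIN tng_medialinks ml ON ml.mediaID = m.mediaID AND ml.gedcom = m.gedcom
        JOIN tng_people p ON p.personID = ml.personID AND p.gedcom = ml.gedcom
        SET m.changedate = p.changedate";

mysqli_query($link, $sql);
mysqli_close($link);
```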

If this is what you're looking for, let me know and I will post the SQL code here.

Jay

beckwith

Thanks Jay, but in my case the change date that is wrong is on the family record, not the media.

Changing the setting to "Leave it as is" did not seem to solve my problem.

I understand why the change date for the family is important; it just isn't exported in my GEDCOM, and pulling the date from the individual won't really help.

My concern is that a bad change date may mess up Google's indexing; a bad sitemap may be worse than none at all.

Any other opinions or ideas?

theKiwi

I don't understand exactly what you're asking. Is it the following: are you worried that an empty change date in PAF, and therefore in TNG, translates into the date you create the sitemap index file, so in effect it says all the families were updated on the day you created the index?

My sitemapindex files have this situation, but according to Google, my site has over 50,000 URLs indexed.

Roger

beckwith

Yes, that is my concern. Isn't this telling Google the families changed on that date, which is not true? Doesn't that cause Google to re-index each of those pages for nothing? Doesn't it defeat the purpose of the sitemap?

theKiwi

I don't know the answers to this, but it seems that when there is no date to tell what you know is new from what you know is old, the choices are:

use a new date and risk that all of it gets recrawled, or

use a really old date, which would likely mean the new material gets skipped.

Roger

beckwith

I think my concerns may be misplaced. Google indicates you will NOT be penalized for using a sitemap; I would assume that holds even when the sitemap is used in error or has the date issue I'm having.

On another note, I didn't see showtree.php included in the sitemap. That page includes a description of each tree.

theKiwi

I did not create this script; the original two authors have dropped off the scene, so I'm keeping it going.

I think the original idea was to get the media items, people and families indexed. The rest, including showtree, is linked from some or all of those pages, so the crawler will get there of its own accord.

Roger

beckwith

Thanks Roger, you have been very helpful.

I was under the impression a sitemap was used in place of crawling; in other words, Google would crawl just the pages in the sitemap and no others, since crawling anything else would be redundant or irrelevant. This makes sense for genealogy and TNG in particular: there are many different ways to look at place, name and relation data, but it basically breaks down to getperson, familygroup and showmedia. (Having said that, I can't find it actually documented this way...)
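As an illustration of that breakdown, the sketch below prints sitemap entries for those three page types. The domain, IDs, tree name and parameter values are invented for the example.

```php
<?php
// Illustration only: the three TNG page types the sitemap boils down to.
// Domain, IDs, tree name and parameter values are examples, not real data.
$base = 'https://example.com/';
$pages = array(
    $base . 'getperson.php?personID=I123&tree=mytree',    // individual page
    $base . 'familygroup.php?familyID=F45&tree=mytree',   // family group sheet
    $base . 'showmedia.php?mediaID=678',                  // media item
);
foreach ($pages as $page) {
    // htmlspecialchars() escapes the & so the <loc> value is valid XML
    echo "  <url>\n    <loc>" . htmlspecialchars($page) . "</loc>\n  </url>\n";
}
```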

I suggested showtree because it is the only page of that kind I could think of that isn't included, but it's no big deal.

Google is indexing about 14,000 pages a day out of the nearly 500,000 listed in my sitemap, so it is going to take over a month.

Thanks again for your help!

beckwith

This seems to be moot for my situation, since Google stopped indexing after 150,000 of the nearly 500,000 pages. What is odd is that it continues crawling other pages, just not the ones from the sitemap.

Seems like such a waste.

Has anyone else had similar results, or does anyone know how to "kick" Google into using the sitemap?

