A broken link cleanup checklist ensures you systematically identify, prioritize, and resolve 404s, redirects, and orphaned pages that harm user experience and crawl efficiency. This guide covers discovery tools, triage logic, and remediation tactics for sites of any scale.
Start with a full-site crawl using Screaming Frog SEO Spider, Sitebulb, or OnCrawl set to follow all internal links. Configure the crawler to report 4xx and 5xx status codes, redirect chains longer than two hops, and orphaned URLs. Orphans cannot be reached by following links, so feed the crawler your XML sitemap or analytics URL list as well. Export the results and cross-reference them against the Page indexing report in Google Search Console (formerly the Coverage report) to catch pages Google has tried to crawl but found broken. Also pull the Links report to identify external domains pointing to defunct URLs on your site; those carry link equity you don't want to waste. For Canadian bilingual sites, ensure the crawler respects hreflang and checks both English and French URL variants. If you run a large portfolio or e-commerce platform, consider using the crawler's API or segmenting by subdirectory to avoid overwhelming your server. The goal is a master list: every broken link, where it's linked from, and its current HTTP status.
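The master list described above can be sketched in a few lines of Python. This is a minimal illustration, not a crawler integration: it assumes the crawl export has already been loaded into dictionaries, and the keys `source`, `target`, and `status` are hypothetical names, so map them to whatever columns your tool actually exports.

```python
from collections import defaultdict


def build_master_list(crawl_rows):
    """Group crawl rows into one entry per broken URL.

    Each row is a dict with the (assumed) keys 'source', 'target', 'status'.
    """
    master = defaultdict(lambda: {"status": None, "linked_from": set()})
    for row in crawl_rows:
        status = int(row["status"])
        if status < 400:
            continue  # keep only 4xx and 5xx targets
        entry = master[row["target"]]
        entry["status"] = status
        entry["linked_from"].add(row["source"])
    # Convert the sets to sorted lists so the result is stable and exportable.
    return {
        url: {"status": e["status"], "linked_from": sorted(e["linked_from"])}
        for url, e in master.items()
    }


# Illustrative rows only; real data would come from the crawl export.
rows = [
    {"source": "/", "target": "/old-pricing", "status": "404"},
    {"source": "/blog/a", "target": "/old-pricing", "status": "404"},
    {"source": "/", "target": "/about", "status": "200"},
    {"source": "/blog/b", "target": "/api/report", "status": "500"},
]
master_list = build_master_list(rows)
```

Each key in `master_list` is a broken URL, with its last observed status and every page that links to it, which is the shape the triage step needs.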
Not all 404s deserve the same urgency. Sort your list by inbound link count: pages with backlinks from authoritative domains or high internal PageRank should be fixed first via 301 redirects to the closest semantic match. Next, filter by historical traffic in Google Analytics or your analytics platform; URLs that drove meaningful visits in the past six to twelve months warrant redirection even if they lack external links. Pages with zero inbound links, zero traffic, and thin content can usually be left as plain 404s (or 410 Gone) and simply removed from your sitemap and internal navigation. Do not leave them as soft 404s, which return a 200 status and keep wasting crawl budget. On large Canadian e-commerce sites, seasonal or discontinued product pages fall into this category; redirect only if the product has a direct successor. Tag each URL in your spreadsheet with a priority tier: immediate (live backlinks or high traffic), medium (internal links from key pages), or low (orphaned, no equity). This triage prevents wasting hours on URLs that never mattered.
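The three tiers can be expressed as a small function. This is a sketch under stated assumptions: the parameter names are hypothetical, and the `traffic_threshold` default of 100 visits is an arbitrary placeholder for whatever "meaningful visits" means on your site.

```python
def priority_tier(backlinks, visits_12mo, key_internal_links, traffic_threshold=100):
    """Assign a triage tier to one broken URL.

    backlinks: number of external links pointing at the URL
    visits_12mo: visits over the past twelve months
    key_internal_links: internal links from important pages
    traffic_threshold: assumed cutoff for "high traffic" (tune per site)
    """
    if backlinks > 0 or visits_12mo >= traffic_threshold:
        return "immediate"
    if key_internal_links > 0:
        return "medium"
    return "low"


tier = priority_tier(backlinks=3, visits_12mo=0, key_internal_links=0)
```

Applying the function to every row of the spreadsheet gives a sortable column, and keeping the rule in one place makes the decision criteria easy to document.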
For high-priority 404s, implement server-side 301 redirects in your .htaccess, nginx.conf, or through your CMS redirect manager; never rely on JavaScript or meta-refresh. Choose the redirect target by topic relevance and user intent: if the old page was a specific service, send visitors to the updated service page or the closest category. If the content was merged into a broader guide, redirect there and include a fragment identifier in the target URL so users land near the relevant section. For redirect chains you discovered, collapse them into a single hop by updating the earliest redirect to point directly to the final destination. Pages you've decided to abandon should return a true 404 (or 410 Gone) status and be removed from your XML sitemap; resubmit the updated sitemap in Search Console afterwards. If those URLs keep generating crawl requests, a 410 signals that the removal is intentional. Adding noindex does not help here, because the directive only takes effect on pages that return 200. Document every redirect in a central spreadsheet with columns for old URL, new URL, date implemented, and reason. This log prevents future redirect loops and helps onboard new team members.
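Collapsing chains and catching loops is easy to automate once the redirect log exists. The sketch below assumes the log has been reduced to a plain dictionary of old URL to new URL; the function name and sample paths are illustrative only.

```python
def collapse_redirects(redirects):
    """Return a map where every source points at its final destination.

    redirects: dict of {old_url: new_url}
    Raises ValueError if following a redirect ever revisits a URL (a loop).
    """
    flattened = {}
    for source in redirects:
        seen = {source}
        target = redirects[source]
        # Keep following while the target is itself a redirect source.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flattened[source] = target
    return flattened


# Hypothetical log: /a -> /b -> /c is a two-hop chain.
log = {"/a": "/b", "/b": "/c", "/old": "/new"}
flat = collapse_redirects(log)
```

After the call, `/a` and `/b` both map straight to `/c`. Running the same check before adding a new row to the log is a cheap way to reject a redirect that would create a loop.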
When you find valuable external sites linking to your 404s, reach out to the webmaster or content owner with a polite request to update the link to your new URL. Draft a short email explaining the redirect, provide the correct replacement link, and mention why the update benefits their readers—most publishers will comply if the ask is reasonable and the link still fits their content. For Canadian sites, consider bilingual outreach if the linking domain is Quebec-based or francophone. If the linking page is a resource list or directory, offer to verify other links on their page as a goodwill gesture. Track these requests in your spreadsheet and follow up once after two weeks if you hear nothing. Some links you'll never reclaim—outdated forum threads, archived news articles—but even recapturing a handful of authoritative links preserves equity that would otherwise evaporate. This step is often skipped yet yields disproportionate SEO value for minimal effort.
Broken links accumulate because sites lack editorial process around URL changes. Establish a rule: any time a page is deleted, moved, or merged, the responsible editor must file a redirect request in a shared task tracker—Asana, Monday, or even a Google Sheet—before publishing the change. For agencies managing client portfolios or multi-site networks, build this into your content checklist template. Schedule quarterly crawls as a maintenance ritual, not a crisis response; set a calendar reminder and budget two to four hours per site depending on size. If you're on WordPress, plugins like Redirection or Rank Math log 404 hits in real time and let you create redirects from the dashboard, reducing friction. On larger platforms, integrate crawl monitoring via OnCrawl or ContentKing to alert you when new 404s spike week-over-week. The objective is to catch breaks early—within days, not months—so they never accumulate into the thousand-URL cleanup projects that paralyze teams.
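If your monitoring tool does not alert on spikes out of the box, the week-over-week comparison can be approximated from two crawl exports. This is a rough sketch: the thresholds `ratio` and `min_new` are assumptions chosen for illustration, not values taken from any tool.

```python
def new_404_spike(last_week, this_week, ratio=2.0, min_new=5):
    """Compare two sets of 404 URLs and flag a spike.

    Returns (spiked, new_urls). A spike is flagged when at least `min_new`
    URLs are newly broken and the total has grown by `ratio` or more.
    Both thresholds are assumed defaults; tune them per site.
    """
    last_week = set(last_week)
    this_week = set(this_week)
    new_urls = this_week - last_week
    baseline = max(len(last_week), 1)  # avoid a zero baseline
    spiked = len(new_urls) >= min_new and len(this_week) >= ratio * baseline
    return spiked, sorted(new_urls)


previous = {"/a", "/b"}
current = {"/a", "/b", "/c", "/d", "/e", "/f", "/g"}
spiked, fresh = new_404_spike(previous, current)
```

The list of newly broken URLs is usually more useful than the flag itself, since it often points at a single template or deploy that caused the breakage.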
Redirect chains occur when URL A redirects to B, which redirects to C. Search engines follow only a limited number of hops (Google documents a limit of 10 for Googlebot), and every extra hop adds latency and another point of failure, so collapse these immediately by pointing A directly to C. Soft 404s return a 200 status but serve error content, often because of a CMS misconfiguration or a custom 404 template that doesn't send the proper header. Identify them by crawling for pages with unusually low word counts or phrases like 'page not found' in the title; fix them by updating the server response to a true 404. Canonicalization conflicts arise when a broken URL is still canonicalized to itself or when multiple broken variants exist. Review your canonical tags during the crawl and ensure they point to live, accessible pages. On Canadian bilingual sites, check that both URLs in each language pair resolve correctly and that hreflang annotations don't reference 404s. These edge cases often represent only five to ten percent of your broken-link list but disproportionately confuse crawlers and waste budget.
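The soft-404 heuristic above (a 200 status combined with an error phrase in the title or a very low word count) can be sketched as follows. The phrase list and the `min_words` cutoff of 50 are assumptions for illustration; expect false positives on legitimately short pages, so treat the result as a list of candidates to review.

```python
# Assumed phrase list; extend it with the wording your own 404 template uses.
ERROR_PHRASES = ("page not found", "not found", "page introuvable")


def soft_404_candidate(status, title, body_text, min_words=50):
    """Flag pages that return 200 but look like error pages."""
    if status != 200:
        return False  # a real error status is not a *soft* 404
    title_lower = title.lower()
    if any(phrase in title_lower for phrase in ERROR_PHRASES):
        return True
    # Fall back to the word-count signal for thin, template-only pages.
    return len(body_text.split()) < min_words


flagged = soft_404_candidate(200, "Page Not Found | Example", "Sorry, nothing here.")
```

A page that already returns 404 is never flagged, which keeps this check separate from the ordinary broken-link list.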
Quarterly audits work well for most sites, catching breaks before they accumulate and preserving crawl efficiency. High-churn sites—news portals, large e-commerce catalogs, or agencies managing dozens of domains—benefit from monthly or even automated real-time monitoring via tools like ContentKing. The key is regularity: a predictable schedule prevents the backlog that makes cleanup feel overwhelming.
Redirect only URLs with demonstrable value: inbound links, historical traffic, or relevance to current content. Low-value pages with zero equity can remain 404s and be excluded from your sitemap. Over-redirecting creates bloat and maintenance headaches, while strategic redirection preserves link equity and user experience. Document your decision criteria in your triage process.
A 404 returns an error status when requested; the page doesn't exist or the URL is wrong. An orphaned page is live and returns 200 but has no internal links pointing to it, so crawlers and users are unlikely to find it unless it appears in a sitemap or has external links. Orphans often result from removed navigation or deleted parent pages. Fix orphans by adding internal links or, if they're obsolete, by deleting them or setting them to noindex.
Broken outbound links—links from your site to other domains that return 404—primarily harm user experience rather than rankings directly, though they signal poor maintenance if widespread. Prioritize fixing them on high-visibility pages like resource lists or cornerstone guides. Internal broken links are more damaging because they waste crawl budget, fragment PageRank flow, and frustrate users navigating your site. Address internal breaks first.
Crawl both language trees separately and verify that hreflang annotations on live pages don't reference 404s in the alternate language. If you redirect an English URL, check the corresponding French URL as well and redirect it to the appropriate French equivalent. Quebec users expect functional bilingual navigation, so broken cross-language links damage trust and accessibility compliance. Document language pairs in your redirect log.
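One way to run this check is to compare the hreflang annotations from the crawl against the status codes the crawler recorded. The sketch below assumes both have been exported into dictionaries; the structure and the sample URLs are hypothetical.

```python
def broken_hreflang_targets(hreflang_map, status_by_url):
    """List hreflang alternates on live pages that do not return 200.

    hreflang_map: {page_url: {lang_code: alternate_url}}  (assumed shape)
    status_by_url: {url: http_status} from the crawl
    Returns tuples of (page, lang, alternate_url, status_or_None).
    """
    problems = []
    for page, alternates in hreflang_map.items():
        if status_by_url.get(page) != 200:
            continue  # only audit annotations on live pages
        for lang, alt_url in alternates.items():
            status = status_by_url.get(alt_url)  # None means never crawled
            if status != 200:
                problems.append((page, lang, alt_url, status))
    return problems


hreflang = {
    "/en/services": {"fr-ca": "/fr/services"},
    "/en/contact": {"fr-ca": "/fr/contact"},
}
statuses = {"/en/services": 200, "/fr/services": 404,
            "/en/contact": 200, "/fr/contact": 200}
issues = broken_hreflang_targets(hreflang, statuses)
```

Each tuple names the live page, the language code, and the failing alternate, which maps directly onto the language-pair column in the redirect log.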
Pull the External Links report from Google Search Console under the Links section, then cross-reference those URLs against your crawler's 404 list. Alternatively, use Ahrefs or Majestic to see all backlinks pointing to your domain, filter by HTTP status if the tool supports it, and export URLs returning 4xx codes. These are your highest-value targets for immediate 301 redirects, as they represent live link equity at risk of loss.
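The cross-reference itself is a simple intersection. This sketch assumes the backlink export has been reduced to a dictionary of URL to linking-domain count and that both lists use the same URL form (same scheme, host, and trailing-slash convention); normalize them first if they differ.

```python
def backlinked_404s(linked_pages, crawl_404s):
    """Return broken URLs that still have external links, strongest first.

    linked_pages: {url: number_of_linking_domains}  (assumed export shape)
    crawl_404s: iterable of URLs the crawler reported as 404
    """
    broken = set(crawl_404s)
    hits = [(url, count) for url, count in linked_pages.items() if url in broken]
    # Sort by linking-domain count (descending), then URL for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


links = {"/guide": 12, "/old-pricing": 40, "/about": 7, "/retired": 12}
not_found = ["/old-pricing", "/guide", "/retired", "/tmp"]
targets = backlinked_404s(links, not_found)
```

The first entries in `targets` are the URLs where a 301 redirect recovers the most linking domains.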