Technical SEO in Next.js: Metadata, JSON-LD, and Sitemaps Done Right

Technical SEO is the part of search visibility that engineers actually control. It is not keyword guessing or link begging; it is making sure every URL ships correct metadata, valid structured data, an accurate sitemap, and clear indexing rules. Next.js gives you primitives for all of this, but the defaults are not always the right answer, and a few sharp edges quietly break indexing in ways that take weeks to surface. This guide covers how we wire technical SEO on the App Router in production, with opinions formed from running dozens of live Next.js sites.

The Metadata API is your source of truth

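On the App Router, metadata lives in code. Every layout.tsx and page.tsx can export a static metadata object or a dynamic generateMetadata function, and Next.js merges them down the route tree. The merge is shallow per field, which is the single most important thing to internalize: a child value replaces the parent value rather than extending it. A page that sets title replaces the parent's default title (the parent's template still wraps it), and a page that defines its own openGraph object replaces the parent's openGraph entirely, so root-level keys such as siteName and locale must be repeated or spread in.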

The cleanest pattern is to define a metadataBase and a title template once at the root, then let each page supply only what is unique to it.

// app/layout.tsx
import type { Metadata } from "next";

export const metadata: Metadata = {
  metadataBase: new URL("https://codeaustral.com"),
  title: {
    default: "CodeAustral — Web & AI Studio",
    template: "%s | CodeAustral",
  },
  description: "Design and software studio building web platforms and AI products.",
  openGraph: {
    type: "website",
    siteName: "CodeAustral",
    locale: "en_US",
  },
  twitter: { card: "summary_large_image" },
};

metadataBase matters more than it looks. Without it, relative URLs in openGraph.images, canonical, and other fields resolve against localhost during build and against an unpredictable origin at runtime. Set it once, use root-relative image paths everywhere, and stop thinking about it.
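
To make the resolution rule concrete, here is a minimal sketch that uses the standard URL constructor to show how a root-relative path joins to a base origin. It illustrates the idea only and is not Next.js's internal implementation; resolveAgainstBase is a hypothetical helper and the image path is made up.

```typescript
// Hypothetical helper: joins a path to a base origin with the standard URL constructor.
function resolveAgainstBase(path: string, base: string): string {
  return new URL(path, base).toString();
}

// The same root-relative path lands on different origins depending on the base.
const productionUrl = resolveAgainstBase("/og/cover.png", "https://codeaustral.com");
const localUrl = resolveAgainstBase("/og/cover.png", "http://localhost:3000");

// An already-absolute URL ignores the base entirely.
const cdnUrl = resolveAgainstBase("https://cdn.example.com/a.png", "https://codeaustral.com");
```

The point of the sketch is that the path never changes; only the base does, which is why fixing the base once removes a whole class of wrong-origin social cards.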

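For dynamic routes, generateMetadata runs on the server and can fetch the same data your page renders. Next.js memoizes identical fetch calls within a single render pass, so calling your CMS in both generateMetadata and the page component does not double the cost. That memoization applies to fetch only; if your data layer uses a database client or SDK instead, wrap the loader in React's cache function to get the same effect.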

// app/blog/[slug]/page.tsx
import type { Metadata } from "next";

export async function generateMetadata(
  { params }: { params: Promise<{ slug: string }> }
): Promise<Metadata> {
  const { slug } = await params;
  const post = await getPost(slug); // deduped with the page body
  if (!post) return { title: "Not found" };

  return {
    title: post.title,
    description: post.excerpt,
    alternates: { canonical: `/blog/${slug}` },
    openGraph: {
      type: "article",
      title: post.title,
      description: post.excerpt,
      publishedTime: post.publishedAt,
      images: [{ url: post.cover, width: 1200, height: 630 }],
    },
  };
}

Note that in Next.js 15 and later, params is a promise you must await. Forgetting this is a common upgrade-time bug that produces empty or fallback metadata.

Canonicals: the field most teams get wrong

A canonical tag tells search engines which URL is the authoritative version of a page. Get it wrong and you either split ranking signals across duplicates or, worse, point everything at the wrong URL and watch real pages drop out of the index.

Rules we hold to:

Self-referential canonicals on every indexable page. A page should declare itself canonical unless it is genuinely a duplicate.
Set them via `alternates.canonical` relative to `metadataBase` so the origin stays consistent across environments.
Strip query parameters that do not change content (utm_*, ?ref=, sort parameters that only reorder the same set). The canonical should be the clean URL.
Never canonicalize a paginated series to page one. Page two is not a duplicate of page one; canonicalize each page to itself.
Match the canonical to the indexable URL exactly — trailing slash, casing, and www must align with what your redirects produce, or you create a redirect-then-canonical mismatch that wastes crawl budget.
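
The rules above can be sketched as a small normalizer whose output is passed to alternates.canonical. This is an illustration under stated assumptions, not a drop-in: canonicalPath and the TRACKING_PARAMS list are hypothetical, and the lowercase, no-trailing-slash policy only makes sense if your redirects enforce the same thing.

```typescript
// Tracking parameters assumed not to change page content (illustrative list).
const TRACKING_PARAMS = new Set(["ref", "fbclid", "gclid"]);

// Returns a root-relative canonical path for a raw URL or path.
function canonicalPath(rawUrl: string, base = "https://codeaustral.com"): string {
  const url = new URL(rawUrl, base);

  // Drop utm_* and other tracking parameters; keep parameters that change content.
  for (const key of Array.from(url.searchParams.keys())) {
    if (key.startsWith("utm_") || TRACKING_PARAMS.has(key)) {
      url.searchParams.delete(key);
    }
  }
  url.searchParams.sort(); // stable ordering so equivalent URLs compare equal

  // Assumes the site redirects to lowercase paths without a trailing slash.
  let path = url.pathname.toLowerCase();
  if (path.length > 1 && path.endsWith("/")) path = path.slice(0, -1);

  const query = url.searchParams.toString();
  return query ? `${path}?${query}` : path;
}
```

Note that a page parameter survives normalization, so page two of a listing stays canonical to itself rather than collapsing into page one.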

For internationalized sites, pair canonicals with alternates.languages to emit hreflang. Each locale variant points its canonical at itself and lists the others as alternates, including an x-default.
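
As a sketch of that pairing, the helper below builds an object in the shape the alternates metadata field expects, with relative paths intended to resolve against metadataBase. The locale list, the /en and /es path prefixes, and the localizedAlternates name are assumptions for illustration. It lists every locale including the current one, which is the usual hreflang practice.

```typescript
// Locales are an assumption for illustration; swap in your own.
const LOCALES = ["en", "es"] as const;
type Locale = (typeof LOCALES)[number];
const DEFAULT_LOCALE: Locale = "en";

// Builds a value for the `alternates` metadata field for one locale variant.
function localizedAlternates(locale: Locale, path: string) {
  const languages: Record<string, string> = {};
  for (const l of LOCALES) {
    languages[l] = `/${l}${path}`;
  }
  // x-default tells engines which variant to serve when no locale matches.
  languages["x-default"] = `/${DEFAULT_LOCALE}${path}`;

  return {
    canonical: `/${locale}${path}`, // each variant is canonical to itself
    languages,
  };
}
```

Inside generateMetadata for a localized route, the result would be returned as the alternates value.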

OpenGraph and the social card that actually renders

OpenGraph data drives how your links look in Slack, LinkedIn, iMessage, and search-result rich previews. The failure mode is almost always the image: wrong dimensions, missing metadataBase, or a dynamically generated card that times out.

Two reliable approaches:

Static cover images referenced by absolute-from-base URL at 1200x630. Simple, cacheable, no runtime cost.
Dynamic OG images via a route that uses ImageResponse from next/og. Powerful for per-post cards, but it is often deployed on the edge runtime and has real constraints: limited CSS, no arbitrary fonts unless you ship the font file, and a payload budget. Keep these templates boring.

A subtle production gotcha: next/og ImageResponse only supports a subset of CSS (flexbox, no grid, limited filters). Designs that look fine in a browser silently fall back or fail in the OG renderer. Test the actual generated image, not the preview.

Structured data with JSON-LD

Structured data is how you hand machine-readable facts to search and answer engines. On Next.js, inject JSON-LD as a <script type="application/ld+json"> in the server-rendered output. Do not use a third-party React helper that hydrates client-side — crawlers should see it in the initial HTML.

// In a server component
export default function ArticlePage({ post }: { post: Post }) {
  const jsonLd = {
    "@context": "https://schema.org",
    "@type": "Article",
    headline: post.title,
    description: post.excerpt,
    image: `https://codeaustral.com${post.cover}`,
    datePublished: post.publishedAt,
    dateModified: post.updatedAt ?? post.publishedAt,
    author: { "@type": "Organization", name: "CodeAustral" },
    publisher: {
      "@type": "Organization",
      name: "CodeAustral",
      logo: {
        "@type": "ImageObject",
        url: "https://codeaustral.com/logo.png",
      },
    },
  };

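  // Escape "<" so a value containing "</script>" cannot end the script tag early.
  const jsonLdHtml = JSON.stringify(jsonLd).replace(/</g, "\\u003c");

  return (
    <>
      <script
        type="application/ld+json"
        dangerouslySetInnerHTML={{ __html: jsonLdHtml }}
      />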
      {/* article markup */}
    </>
  );
}

Which schema types to ship

Organization (or LocalBusiness): once, sitewide, in the root layout. Establishes the entity, logo, sameAs social profiles, and contact info. This is what feeds knowledge-panel and brand recognition.
Article / BlogPosting: per editorial page. headline must match the visible title. Keep it concise: the figure historically cited from Google's guidance is 110 characters, and long headlines may be truncated in results.
BreadcrumbList: on any page with a hierarchy. It produces the breadcrumb trail in results and reinforces site structure.
FAQPage: only when the questions and answers are genuinely visible on the page. Marking up hidden content violates Google's structured data guidelines and can trigger a manual action.
Product / Offer / Review: for commerce. Never emit aggregateRating without real, verifiable reviews — fabricated ratings are the fastest route to a structured-data penalty.
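
For the BreadcrumbList entry above, a minimal builder might look like the following. The Crumb type, the breadcrumbJsonLd name, and the default base are assumptions for illustration; the output would be serialized into a JSON-LD script tag in the same way as the Article example.

```typescript
interface Crumb {
  name: string;
  path: string; // root-relative, e.g. "/blog"
}

// Builds a schema.org BreadcrumbList from an ordered trail of crumbs.
function breadcrumbJsonLd(crumbs: Crumb[], base = "https://codeaustral.com") {
  return {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    itemListElement: crumbs.map((crumb, index) => ({
      "@type": "ListItem",
      position: index + 1, // positions are 1-based
      name: crumb.name,
      item: new URL(crumb.path, base).toString(), // absolute URL for each level
    })),
  };
}

const trail = breadcrumbJsonLd([
  { name: "Home", path: "/" },
  { name: "Blog", path: "/blog" },
]);
```

Deriving the trail from the same route data that renders the visible breadcrumb keeps the markup and the page in agreement.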

Validate everything against schema.org and the Rich Results Test. The two most common errors are missing key properties (for Article, supply headline, image, and a date) and mismatches between the JSON-LD and what the page actually shows.

Sitemaps and robots as code

Next.js generates both from convention files. A sitemap.ts exporting a default function produces /sitemap.xml; a robots.ts produces /robots.txt. Generate them from the same data source that builds your pages so they never drift.

// app/sitemap.ts
import type { MetadataRoute } from "next";

export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
  const posts = await getAllPosts();
  const base = "https://codeaustral.com";

  // Static routes: omit lastModified rather than stamping build time on every deploy.
  const staticRoutes = ["", "/work", "/blog", "/contact"].map((path) => ({
    url: `${base}${path}`,
    changeFrequency: "weekly" as const,
  }));

  const postRoutes = posts.map((p) => ({
    url: `${base}/blog/${p.slug}`,
    lastModified: new Date(p.updatedAt ?? p.publishedAt),
    changeFrequency: "monthly" as const,
  }));

  return [...staticRoutes, ...postRoutes];
}

Practical sitemap discipline:

Only include indexable, canonical, 200-status URLs. A sitemap full of redirects and noindex pages teaches crawlers to trust it less.
Make `lastModified` honest. Bumping every URL to now on every build trains engines to ignore the field. Use the real content modification date.
Split above ~50,000 URLs or 50 MB using generateSitemaps to produce an index. Most sites never hit this; pSEO factories do.
Reference the sitemap from `robots.txt` and submit it once in Search Console. Resubmission is unnecessary; engines refetch on their own cadence.
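
For the splitting rule above, the arithmetic can be sketched with two pure helpers. The names sitemapIds and urlsForSitemap are hypothetical; the idea is that generateSitemaps would return the id list, and the sitemap function would use the id it receives to pick its slice. How the id is passed differs between Next.js versions, so check that against the version you run.

```typescript
const MAX_URLS_PER_SITEMAP = 50_000;

// One id per chunk, in the { id } shape that generateSitemaps returns.
function sitemapIds(totalUrls: number, size = MAX_URLS_PER_SITEMAP): { id: number }[] {
  const count = Math.max(1, Math.ceil(totalUrls / size));
  return Array.from({ length: count }, (_, id) => ({ id }));
}

// The slice of URLs that belongs to the sitemap with the given id.
function urlsForSitemap<T>(urls: T[], id: number, size = MAX_URLS_PER_SITEMAP): T[] {
  return urls.slice(id * size, (id + 1) * size);
}
```

Keeping the URL list in a stable order matters here: if ordering shifts between builds, URLs migrate between chunks for no content reason.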

// app/robots.ts
import type { MetadataRoute } from "next";

export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: "*", allow: "/", disallow: ["/admin", "/api"] },
    sitemap: "https://codeaustral.com/sitemap.xml",
    host: "https://codeaustral.com",
  };
}

Indexing control: the difference between robots.txt and noindex

These two are constantly confused and the distinction is operationally critical:

`robots.txt disallow` blocks crawling. The page will not be fetched — but it can still appear in results as a bare URL if other sites link to it, because the engine never sees the noindex it cannot crawl.
`noindex` (via the robots metadata field) blocks indexing. The page is crawled, the directive is read, and it is dropped from results.

If you want a page gone from search, use noindex and let it remain crawlable. If you disallow it in robots.txt, the crawler can never read the noindex and the URL can linger. Reserve robots.txt disallow for things you genuinely never want fetched: admin panels, API routes, faceted-filter URL explosions.

Set per-page indexing in metadata:

export const metadata = {
  robots: { index: false, follow: true },
};

For staging environments, gate the entire site behind noindex at the root layout and flip it via an environment variable — never ship a staging deploy that is crawlable on a public domain.
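
A minimal sketch of that gate, assuming a deployment variable you define yourself: the DEPLOY_ENV name mentioned in the comment and the robotsForEnvironment helper are both hypothetical. The function takes the value as an argument so it stays easy to test.

```typescript
type RobotsDirective = { index: boolean; follow: boolean };

// `deployEnv` is a hypothetical value read from an environment variable of
// your choosing (for example process.env.DEPLOY_ENV).
function robotsForEnvironment(deployEnv: string | undefined): RobotsDirective {
  // Fail closed: anything that is not explicitly production stays out of the index.
  const isProduction = deployEnv === "production";
  return { index: isProduction, follow: isProduction };
}
```

In the root layout the result would be assigned to the robots metadata field. Because metadata merges shallowly, a page that exports its own robots object overrides this, so staging pages should not set one unconditionally.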

GEO and AEO: writing for answer engines

Generative engines (ChatGPT, Perplexity, Google's AI surfaces) and classic search increasingly draw from the same signals, but they reward slightly different structure. Generative Engine Optimization (GEO) and Answer Engine Optimization (AEO) are mostly good information architecture made explicit:

Lead with the answer. Put a direct, self-contained statement near the top of a section. Answer engines extract spans; they prefer paragraphs that stand alone without the surrounding context.
Use real headings as questions. An H2 or H3 phrased as the question a user asks maps cleanly onto how engines retrieve and cite.
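Ship FAQ schema for genuine Q&A. It gives extractors clean, labeled pairs. Since 2023 Google shows FAQ rich results only for a narrow set of authoritative government and health sites, so do not count on the rich result itself.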
Keep facts in server-rendered HTML. Answer engines crawl with limited or no JavaScript execution. Content that only appears after hydration may be invisible to them.
Be citable. Specific, verifiable claims with clear attribution get quoted; vague marketing copy does not.
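
For the FAQ schema item above, one way to keep markup and visible content in sync is to generate both from the same array. The Faq type and the faqJsonLd name below are assumptions for illustration.

```typescript
interface Faq {
  question: string;
  answer: string;
}

// Builds schema.org FAQPage markup from the same pairs that render on the page.
function faqJsonLd(faqs: Faq[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map(({ question, answer }) => ({
      "@type": "Question",
      name: question,
      acceptedAnswer: { "@type": "Answer", text: answer },
    })),
  };
}

const faqMarkup = faqJsonLd([
  {
    question: "Do I need to resubmit my sitemap after every deploy?",
    answer: "No. Submit it once and reference it from robots.txt.",
  },
]);
```

Rendering the visible FAQ list from that same array means the structured data cannot describe questions the page does not show.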

Many teams now add an llms.txt file at the root to point AI crawlers at canonical documentation. It is a convention, not a standard, and not yet honored by the major engines as a ranking input — treat it as low-cost and optional, not as a substitute for clean HTML and structured data.

A pre-launch SEO checklist

Before any Next.js site goes live, we verify:

metadataBase set; titles use a template; every page has a unique title and description.
Self-referential canonicals on indexable pages; hreflang correct on multilingual routes.
OG image renders at 1200x630 and resolves to an absolute URL.
JSON-LD present in initial HTML and passing the Rich Results Test.
sitemap.xml lists only canonical 200 URLs with honest lastModified.
robots.txt allows production, references the sitemap, and staging is noindex.
No accidental noindex left over from a template, and no disallow blocking pages you want indexed.
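
A few of these checks can be roughly automated against the raw HTML response, meaning the body before any JavaScript runs. The sketch below is a heuristic: it uses regular expressions rather than an HTML parser, assumes the name attribute precedes content in the robots meta tag, and auditInitialHtml is a hypothetical name. Treat a pass as a smoke test, not proof.

```typescript
interface SeoAudit {
  hasCanonical: boolean;
  hasJsonLd: boolean;
  hasNoindex: boolean;
}

// Rough string checks on server-rendered HTML; not a substitute for a real parser.
function auditInitialHtml(html: string): SeoAudit {
  return {
    hasCanonical: /<link[^>]+rel=["']canonical["']/i.test(html),
    hasJsonLd: /<script[^>]+type=["']application\/ld\+json["']/i.test(html),
    hasNoindex: /<meta[^>]+name=["']robots["'][^>]*content=["'][^"']*noindex/i.test(html),
  };
}

const sampleHtml = [
  '<link rel="canonical" href="https://codeaustral.com/blog"/>',
  '<meta name="robots" content="index, follow"/>',
  '<script type="application/ld+json">{"@type":"Article"}</script>',
].join("\n");

const audit = auditInitialHtml(sampleHtml);
```

Running this against the fetched HTML of a few representative URLs before launch catches the leftover-noindex and missing-JSON-LD cases from the list.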

Frequently Asked Questions

Does Next.js handle SEO automatically?

Partly. The App Router gives you server-rendered HTML, a Metadata API, and convention-based sitemap.ts and robots.ts. But it does not write your canonicals, structured data, or indexing rules. Those require deliberate configuration. The defaults produce a crawlable site; they do not produce a well-optimized one.

Should JSON-LD be server-rendered or client-side?

Server-rendered, always. Inject it as a <script type="application/ld+json"> in a server component so it appears in the initial HTML. Many crawlers and answer engines execute little or no JavaScript, so structured data added after hydration may never be read. Server rendering guarantees it is present on first fetch.

What is the difference between noindex and robots.txt disallow?

noindex removes a page from search results but keeps it crawlable, so the directive can actually be read. robots.txt disallow blocks crawling entirely, which means the engine never sees a noindex and the URL can still surface as a bare link. To remove a page from results, use noindex and leave it crawlable.

How do I make a Next.js site rank in AI answer engines?

Lead each section with a direct, self-contained answer, phrase headings as real questions, and ship FAQ and Article structured data for genuine content. Keep all facts in server-rendered HTML since answer engines often skip JavaScript. Be specific and verifiable — concrete, attributable claims get cited far more than generic marketing copy.

Do I need to resubmit my sitemap after every deploy?

No. Submit the sitemap once in Search Console and reference it from robots.txt. Search engines refetch on their own schedule based on how often your content changes. Keep lastModified dates honest so engines learn your update cadence; bumping every URL on every build trains them to ignore the signal.

Working with CodeAustral

We build and ship production Next.js applications where technical SEO is wired in from the first commit, not bolted on before launch. If you have a site that is not getting indexed, a migration that tanked rankings, or a new build that needs to be discoverable and citable from day one, send us a short brief at codeaustral.com/contact and we will tell you exactly where the leverage is.