4 Steps to Building a Content Knowledge Graph


Knowledge graphs have been central to semantic technology for decades. From healthcare to eCommerce to fraud detection to SEO, knowledge graphs empower organizations to harness the full potential of their information architecture.

But even with a long history, knowledge graphs are more relevant now than ever. According to Gartner’s Emerging Tech Impact Report, a robust knowledge graph is imperative for organizations looking to implement generative AI technologies. Knowledge graphs can help organizations ground their LLMs (e.g., internal chatbots) and machines in factual data about the organization. They can also influence how a brand is represented in search results.

You may be interested in building a knowledge graph but need help figuring out where to start. The good news is that if you have a website, you can construct a reusable content knowledge graph that supports both SEO and your internal AI initiatives.

This article will take you through the four steps of building a content knowledge graph using the Schema.org vocabulary.

Why should you use Schema.org to build your content knowledge graph?

You can create a knowledge graph using any number of ontologies, vocabularies, or glossaries. However, to maximize its SEO benefits, we recommend using the Schema.org vocabulary to create your content knowledge graph.

Help search engines clearly understand and contextualize the content on your web page

The Schema.org vocabulary was created by major search engines as an industry-standard vocabulary for translating human-readable web content into a language that machines understand. By using this vocabulary to construct your knowledge graph, you’re also reaping the SEO benefits that come with it, including:

  • Equipping search engines with an accurate understanding of your brand content
  • Helping search engines match your content to the search queries it answers best
  • Driving more targeted, engaged, and quality traffic to your site

Achieve rich results and stand out in search

You can also achieve rich results on Google when you mark up your page content using certain Schema.org types. When you annotate your web content with the required types and properties, search engines like Google may award visually enhanced search features for content like Products, Videos, Recipes, and Ratings. Presenting key information directly in the SERP as a rich result can increase click-through rates and drive more engagement and quality traffic to your pages.

Building Your Content Knowledge Graph

In the book Knowledge Graphs: Methodology, Tools and Selected Use Cases, semantic web and knowledge graph experts Fensel et al. break the process of creating a knowledge graph into four steps:

  1. Knowledge Creation,
  2. Knowledge Hosting,
  3. Knowledge Curation, and
  4. Knowledge Deployment.

We’ve applied an SEO lens to these steps to teach you how to create a robust content knowledge graph using your organization’s web content. Let’s get started.

Step 1: Knowledge Creation

The first step to building a robust content knowledge graph is having high-quality, original content on your website and marking up the content using the Schema.org vocabulary.

Have high-quality content on your website

Consider what you want your organization to be known for, whether you are providing your visitors with education, answering common questions, enhancing user experience, etc.

Beyond this, Google has shared guidelines on what it deems “helpful, reliable, people-first content.” This is an excellent resource that provides a series of questions you can use to assess the quality of your content. For example, you’ll want to ensure your content provides:

  • Original information, reporting, research, or analysis
  • A substantial, complete, or comprehensive description of the topic
  • Substantial value when compared to other pages in search results

Marking up your content using the Schema.org vocabulary

Once you align your web content with these guidelines, you must annotate it using the Schema.org vocabulary’s types and properties in the form of Schema Markup to start building your content knowledge graph. This translates the human-readable content on your website into machine-readable RDF triples. While these triples can be expressed in various formats, Google recommends using JSON-LD.

Include URIs in your Schema Markup to disambiguate your entities
To develop your content knowledge graph, you must include Uniform Resource Identifiers (URIs) in your Schema Markup.

In JSON-LD, these identifiers appear as @ids to give the entities in your markup a unique identity that disambiguates and differentiates them from all other entities – similar to how a social security number can uniquely differentiate people who may share the same name.

[Image: JSON-LD code on the left and its RDF triple equivalent on the right]

On the left, Schema Markup is expressed in JSON-LD and identifies the entity “Mark van Berkel” and his relationship with “Schema App” – Mark van Berkel worksFor Schema App.

On the right is the JSON-LD represented as an RDF triple showcasing the same statement, except “Mark van Berkel” and “Schema App” are identified by their URIs / @ids. By having these URIs, you can link the entities on your site within your markup and help search engines identify the entities within your knowledge graph.

While Schema Markup still provides SEO value without including @ids, they are a requirement for the markup to become a reusable knowledge graph.
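To make the statement above concrete, here is a minimal sketch in Python (standard library only) that builds the "Mark van Berkel worksFor Schema App" markup with @ids and flattens it into subject-predicate-object triples. The example.com URIs and the to_triples helper are assumptions made for illustration. The helper is deliberately simplified and is not a full JSON-LD processor.

```python
import json

# Hypothetical @id URIs; use the canonical URIs of your own entities.
PERSON_ID = "https://example.com/about#mark-van-berkel"
ORG_ID = "https://example.com/#organization"

markup = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": PERSON_ID,
    "name": "Mark van Berkel",
    "worksFor": {
        "@type": "Organization",
        "@id": ORG_ID,
        "name": "Schema App",
    },
}


def to_triples(node):
    """Flatten a nested JSON-LD node into (subject, predicate, object) tuples.

    Simplified: handles nested nodes that carry an @id plus plain literal
    values, and leaves property names unexpanded.
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            triples.append((subject, "rdf:type", value))
        elif isinstance(value, dict):
            # The nested entity is referenced by its @id, then flattened too.
            triples.append((subject, key, value["@id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, key, value))
    return triples


print(json.dumps(markup, indent=2))
for triple in to_triples(markup):
    print(triple)
```

Because both entities carry an @id, the worksFor triple points from one URI to another, so any other page that uses the same @ids adds facts about the same entities.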

How to Apply JSON-LD to Web Pages
There are a few options for implementing Schema Markup on your web pages. You can manually author the JSON-LD and insert it in your HTML code, or you can use a plugin to generate and deploy the markup on your site.

These options require expertise to implement and are not scalable if you have a large number of pages to mark up. If you want to customize your markup and ensure it is dynamic and connected, we recommend using the Schema App Highlighter to generate and deploy your markup at scale without having to do any manual coding.

Whatever method you choose, after adding Schema Markup to your pages, it appears as a block of code in the HTML, making it available for search engines and other web crawlers. In this state, your webpage content transforms into semantically enriched data that can be collected and stored as a knowledge graph.
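As a rough illustration of that "block of code in the HTML", the sketch below wraps a JSON-LD object in the script element that crawlers look for. The function name render_jsonld_script and the example.com URI are our own assumptions, not part of any tool mentioned here.

```python
import json


def render_jsonld_script(markup: dict) -> str:
    """Serialize markup as a JSON-LD script element for a page's HTML."""
    payload = json.dumps(markup, indent=2)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


example = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://example.com/#organization",  # hypothetical URI
    "name": "Schema App",
}
print(render_jsonld_script(example))
```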

[Image: webpage content being transformed into JSON-LD, and that JSON-LD being expressed as connected RDF triples]

Step 2: Knowledge Hosting

To reuse your Schema Markup as a knowledge graph, you must collect and host the authored markup.

There are two ways of collecting the Schema Markup once it has been applied to a website.

Collecting the Schema Markup

1. Crawling: A crawler visits each page of the website, extracts the JSON-LD that has been applied, and stores it in a knowledge graph.

2. Mapping: Many tools that map content to Schema.org will also store that markup in a knowledge graph.
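The crawling approach can be sketched with Python's standard library. This example only shows the extraction step on an HTML string. A real crawler would also fetch pages, follow links, and write the results to a store. The JsonLdExtractor class, the extract_jsonld function, and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed markup; a real crawler would log it.


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


SAMPLE_HTML = """
<html><head>
<script>console.log("not markup");</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "@id": "https://example.com/#organization", "name": "Schema App"}
</script>
</head><body><h1>About us</h1></body></html>
"""

for block in extract_jsonld(SAMPLE_HTML):
    print(block["@type"], block["@id"])
```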

But how does this storage occur?

Storing Data

Because knowledge graphs are represented as RDF triples, the best place to store them for easy retrieval is an RDF database or triplestore. There are a variety of RDF stores available, some open source, but most proprietary. Examples include:

  • OpenLink Virtuoso
  • Ontotext GraphDB
  • Amazon Neptune
  • Stardog
  • AllegroGraph

For more information and to compare the various options, check out DB-engines.com. They rank the popularity of database management systems and provide helpful analysis.

Retrieving Data

You can retrieve RDF data from a database or triplestore using SPARQL – an RDF query language. In the simplest terms, SPARQL uses known information to find unknown information (variables) using pattern matching.

For example, we could write a SPARQL query that says, “Find all the people in my database who work for Schema App and know about semantic technology.” “Mark van Berkel,” our co-founder, would return as a match, and so would all other entities in our knowledge graph that match the same criteria.
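Here is one way that question might look. The SPARQL text is a sketch that assumes hypothetical example.com URIs and assumes your store uses the http://schema.org/ namespace, so adjust both to match your data. Because running SPARQL needs a triplestore or an RDF library, the Python below does not execute the query. Instead, it uses a tiny matcher of our own to show the pattern-matching idea on a handful of made-up triples.

```python
# Illustrative only: this query is not executed by the code below.
SPARQL_QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?person WHERE {
  ?person a schema:Person ;
          schema:worksFor <https://example.com/#organization> ;
          schema:knowsAbout "semantic technology" .
}
"""

# Made-up sample data using shortened identifiers for readability.
TRIPLES = [
    ("ex:mark", "rdf:type", "schema:Person"),
    ("ex:mark", "schema:worksFor", "ex:schemaapp"),
    ("ex:mark", "schema:knowsAbout", "semantic technology"),
    ("ex:alex", "rdf:type", "schema:Person"),
    ("ex:alex", "schema:worksFor", "ex:schemaapp"),
    ("ex:alex", "schema:knowsAbout", "graphic design"),
    ("ex:sam", "rdf:type", "schema:Person"),
    ("ex:sam", "schema:worksFor", "ex:othercompany"),
    ("ex:sam", "schema:knowsAbout", "semantic technology"),
]

# The same question as triple patterns; terms starting with "?" are variables.
PATTERNS = [
    ("?person", "rdf:type", "schema:Person"),
    ("?person", "schema:worksFor", "ex:schemaapp"),
    ("?person", "schema:knowsAbout", "semantic technology"),
]


def match(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against its binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions


people = sorted({solution["?person"] for solution in match(PATTERNS, TRIPLES)})
print(people)
```

Only the entity that satisfies all three patterns is returned, which is the "known information finds unknown information" behavior described above.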

When you add Schema Markup to your website using Schema App’s authoring tools, we host that data for you in our Knowledge Graph Data Platform. You can query your own graph using the SPARQL endpoint interface in your account. You can also use our Export Data API to export your knowledge graph for reuse in other contexts.

Once you have found an appropriate way to host your knowledge graph, you can move on to curation.

Step 3: Knowledge Curation

Knowledge curation can be taken in many different directions, and there is no single, straightforward way to do it. That said, we will address three of the most important aspects of curating your data to build a robust content knowledge graph.

In the knowledge curation step, you should ensure that the data within your content knowledge graph is:

  • Accessible
  • Correct
  • Complete

Let’s break those down further.

Accessible

The data in your knowledge graph needs to be available.

For example, when extracting your content knowledge graph from your website, you’ll want to ensure that none of your web pages run into issues like “404 not found” errors. You will also want to ensure that the RDF store you’ve selected for hosting keeps your data retrievable and secure.

Correct

The markup syntax has to be free of errors
That means that your JSON-LD syntax doesn’t have errors like missing commas or brackets in the wrong places.
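A quick syntax check can be as simple as trying to parse the markup. This sketch uses Python's built-in json module, and the function name check_jsonld_syntax is our own. It only confirms that the text is well-formed JSON. It does not check that the types and properties are valid Schema.org.

```python
import json


def check_jsonld_syntax(text: str):
    """Return None if the text parses as JSON, else a short error message."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


valid = '{"@context": "https://schema.org", "@type": "Person", "name": "Mark"}'
# The comma between "@type" and "name" is missing on purpose.
broken = '{"@context": "https://schema.org", "@type": "Person" "name": "Mark"}'

print(check_jsonld_syntax(valid))
print(check_jsonld_syntax(broken))
```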

The markup must align with the content on the page
If you make content changes to your page without updating your markup, your triples could become inaccurate.

Assessing whether your triples are correct and up-to-date can be difficult depending on the size of your dataset and how you manage your Schema Markup. This is especially true if you implement your markup manually. Data cleanup becomes complex and resource-intensive over time.

Therefore, we recommend using a dynamic Schema Markup generator tool like the Schema App Highlighter to ensure your page’s markup always aligns with its content and your triples remain correct.

The markup follows the Schema.org vocabulary guidelines
You also need to ensure that your entity types use the most descriptive properties and that the properties used connect to expected types. For example, I can’t say that a Person worksFor another Person, because Schema.org states that the worksFor property expects to connect a Person to an Organization.

This connection between the entities on your site is critical to the structure of your knowledge graph, as it showcases the relationships between the entities across your site.

By connecting your entities with other entities defined on your website, you provide an extra layer of context to the data, which can help machines infer more from your knowledge graph.
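The worksFor rule above can be checked mechanically. The sketch below uses a small hand-written table of expected types, which is only an excerpt that we assume for illustration. The full definitions live on Schema.org. It also ignores subtype inheritance, so a real validator would need to accept subtypes such as Corporation wherever Organization is expected.

```python
# Hand-written excerpt for illustration; consult Schema.org for the
# full list of expected types for each property.
EXPECTED_TYPES = {
    "worksFor": {"Organization"},
}


def find_range_violations(node: dict) -> list:
    """Report nested entities whose @type does not fit the property."""
    problems = []
    for prop, value in node.items():
        if not isinstance(value, dict):
            continue
        expected = EXPECTED_TYPES.get(prop)
        actual = value.get("@type")
        if expected is not None and actual not in expected:
            problems.append(f"{prop} expects {sorted(expected)}, found {actual}")
        # Check the nested entity's own properties as well.
        problems.extend(find_range_violations(value))
    return problems


good = {
    "@type": "Person",
    "name": "Mark van Berkel",
    "worksFor": {"@type": "Organization", "name": "Schema App"},
}
bad = {
    "@type": "Person",
    "name": "Mark van Berkel",
    "worksFor": {"@type": "Person", "name": "Schema App"},
}

print(find_range_violations(good))
print(find_range_violations(bad))
```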

Complete

Ensure your knowledge graph contains enough data to answer queries relevant to your use cases. For example, if you want to know the correlation between ratings for products of specific sizes, colors, or prices, those properties must exist in your data.
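One simple way to test completeness is to list the properties your use case needs and report which entities lack them. The required list, the helper names, and the sample product below are assumptions for the ratings example above. Swap in whatever your own queries depend on.

```python
# Properties the ratings analysis would need; dots walk into nested nodes.
REQUIRED = ["color", "size", "offers.price", "aggregateRating.ratingValue"]


def get_path(node, path: str):
    """Follow a dotted path through nested dicts; None if any step is absent."""
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_properties(product: dict) -> list:
    return [path for path in REQUIRED if get_path(product, path) is None]


# Made-up product markup that lacks a size and a rating.
product = {
    "@type": "Product",
    "@id": "https://example.com/products/phone-x",
    "name": "Phone X",
    "color": "Black",
    "offers": {"@type": "Offer", "price": "499.00", "priceCurrency": "USD"},
}

print(missing_properties(product))
```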

In cases where your content references well-known entities (like brands, people, places, or concepts), you may also want to implement entity linking. Entity linking is a process that identifies entities in text and links them to corresponding known entities from external knowledge bases like Wikipedia, DBpedia, and Google’s Knowledge Graph.

You can apply entity linking:

  1. Manually for absolute precision
  2. Automatically using Natural Language Processing APIs

Once embedded in your markup, these entities provide additional SEO value by helping search engines like Google disambiguate and contextualize your content to provide more accurate results for search queries. When it comes to your content knowledge graph, entity linking makes your knowledge graph more descriptive, providing a richer data layer for you to reuse.
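As a sketch of the manual approach, the code below keeps a small hand-curated lookup from phrases to Wikipedia URLs and adds a matching entity to the markup's mentions property with a sameAs link. The lookup table, the link_entities function, and the sample article are hypothetical. An NLP API would take the place of the simple phrase match in the automatic approach.

```python
# Hand-curated phrase-to-URI lookup (manual entity linking).
KNOWN_ENTITIES = {
    "knowledge graph": "https://en.wikipedia.org/wiki/Knowledge_graph",
    "json-ld": "https://en.wikipedia.org/wiki/JSON-LD",
}


def link_entities(markup: dict, text: str) -> dict:
    """Return a copy of markup with linked entities added under mentions."""
    lowered = text.lower()
    mentions = [
        {"@type": "Thing", "name": phrase, "sameAs": uri}
        for phrase, uri in KNOWN_ENTITIES.items()
        if phrase in lowered
    ]
    linked = dict(markup)
    if mentions:
        linked["mentions"] = mentions
    return linked


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "@id": "https://example.com/blog/content-knowledge-graph",  # hypothetical
    "headline": "4 Steps to Building a Content Knowledge Graph",
}
body_text = "This article explains how to build a content knowledge graph."

print(link_entities(article, body_text))
```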

Step 4: Knowledge Deployment

During the knowledge deployment stage, your organization publishes the Schema Markup in a consumable format. This enables you to reuse the data for various aspects of your operations or strategies. In fact, it helps to think of the “Deployment” stage as the “Reuse” stage.

If you’ve authored your knowledge graph with Schema.org, ensure you publish your Schema Markup externally for search engines to consume and to reap the SEO benefits we’ve previously described. Moreover, you can reuse this data (knowledge) to support other internal and external projects.

Enhancing User Experience

You can utilize your content knowledge graph to improve website navigation and internal search functionality.

For example, imagine a user visits a product page on an eCommerce site for smartphones. You can use a knowledge graph to enhance the user experience by suggesting complementary items and products that are frequently purchased together.

The knowledge graph enables the website’s recommendation engine to dynamically generate suggestions based on the products being viewed. This can appear as a “You May Also Like” section or complementary products, like phone cases or chargers, suggested during checkout. This enhanced user experience can significantly increase engagement and conversion rates.
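A minimal sketch of that lookup: if accessories are connected to products with the Schema.org isAccessoryOrSparePartFor property, a recommendation is a short graph query. The product identifiers and names are made up, and a production recommendation engine would query the triplestore rather than a Python list.

```python
# Made-up product data using shortened identifiers for readability.
TRIPLES = [
    ("ex:phone-x", "rdf:type", "schema:Product"),
    ("ex:phone-x", "schema:name", "Phone X"),
    ("ex:case-x", "schema:name", "Phone X Case"),
    ("ex:case-x", "schema:isAccessoryOrSparePartFor", "ex:phone-x"),
    ("ex:charger", "schema:name", "USB-C Charger"),
    ("ex:charger", "schema:isAccessoryOrSparePartFor", "ex:phone-x"),
    ("ex:case-y", "schema:name", "Phone Y Case"),
    ("ex:case-y", "schema:isAccessoryOrSparePartFor", "ex:phone-y"),
]


def accessories_for(product_id: str, triples) -> list:
    """Return the names of products linked as accessories of product_id."""
    names = {s: o for s, p, o in triples if p == "schema:name"}
    return sorted(
        names.get(s, s)
        for s, p, o in triples
        if p == "schema:isAccessoryOrSparePartFor" and o == product_id
    )


print(accessories_for("ex:phone-x", TRIPLES))
```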

Content Optimization

You can use your content knowledge graph to optimize existing content or identify gaps in your content.

For instance, your organization likely publishes blog posts on various topics. With a content knowledge graph, you can analyze the connections among entities in your blog posts. This analysis helps you pinpoint clusters of related topics or categories that have more coverage. If you notice gaps in the topics your organization wants to emphasize in the knowledge graph, you can create additional content to fill those gaps.
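This kind of gap analysis can be sketched by counting how often each topic appears in the about property of your blog post markup. The posts, target topics, and threshold below are invented for illustration, and about is simplified to plain strings where real markup would usually reference entities.

```python
from collections import Counter

# Invented blog post markup; "about" is simplified to topic names.
POSTS = [
    {"@type": "BlogPosting", "headline": "Post A", "about": ["Schema Markup", "SEO"]},
    {"@type": "BlogPosting", "headline": "Post B", "about": ["Schema Markup", "Knowledge Graphs"]},
    {"@type": "BlogPosting", "headline": "Post C", "about": ["SEO"]},
]

# Topics the organization wants to be known for (assumed).
TARGET_TOPICS = ["Schema Markup", "Knowledge Graphs", "Entity Linking"]


def topic_coverage(posts) -> Counter:
    """Count how many posts are about each topic."""
    counts = Counter()
    for post in posts:
        counts.update(post.get("about", []))
    return counts


def coverage_gaps(posts, targets, minimum=2) -> list:
    """Return target topics covered by fewer than `minimum` posts."""
    counts = topic_coverage(posts)
    return [topic for topic in targets if counts[topic] < minimum]


print(topic_coverage(POSTS).most_common())
print(coverage_gaps(POSTS, TARGET_TOPICS))
```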

AI and Machine Learning Applications

Organizations can use knowledge graphs to accelerate their AI initiatives, including chatbots and other LLM-powered functions.

Knowledge graphs provide a foundation for training AI and machine learning models for tasks such as natural language processing, recommendation systems, and predictive analytics. Knowledge graph data is already structured, making it easier for machines to process than unstructured content (natural language). This makes using AI less costly as use continues to scale.

Large Language Models can also leverage knowledge graphs for Retrieval-Augmented Generation (RAG), resulting in more accurate answers to queries.
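To give a feel for the retrieval half of RAG, the sketch below picks the facts from a small set of triples that share words with the question and places them in a prompt. Everything here is a simplifying assumption: the helper names are our own, word overlap stands in for a real retrieval method such as a SPARQL query or vector search, and no model is actually called.

```python
# A few facts taken from this article, written as readable triples.
TRIPLES = [
    ("Mark van Berkel", "worksFor", "Schema App"),
    ("Mark van Berkel", "knowsAbout", "semantic technology"),
    ("Jasmine Drudge-Willson", "worksFor", "Schema App"),
    ("Jasmine Drudge-Willson", "jobTitle", "Product Manager"),
]


def retrieve_facts(question: str, triples, limit: int = 3) -> list:
    """Return up to `limit` facts that share the most words with the question."""
    words = {word.strip("?.,!").lower() for word in question.split()}
    scored = []
    for s, p, o in triples:
        text = f"{s} {p} {o}"
        overlap = len(words & set(text.lower().split()))
        if overlap:
            scored.append((overlap, text))
    # Highest overlap first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [text for _, text in scored[:limit]]


def build_prompt(question: str, facts: list) -> str:
    """Combine retrieved facts and the question into a grounded prompt."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer using only these facts:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


question = "Who works for Schema App?"
facts = retrieve_facts(question, TRIPLES)
print(build_prompt(question, facts))
```

The prompt would then be sent to the LLM of your choice, which answers from the retrieved facts rather than from its training data alone.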

Overall, knowledge deployment transforms the knowledge graph’s theoretical structure into practical applications that drive tangible benefits for your organization and its stakeholders.

Developing a Content Knowledge Graph for Your Organization

Although creating a content knowledge graph has only four steps, implementing these steps can be resource-intensive. However, with the numerous possibilities for reuse, building a content knowledge graph is a worthwhile investment that will yield a strong return as semantic search, AI, and knowledge management continue to evolve.

At Schema App, we can help you implement your Schema Markup data layer and develop a semantically enriched reusable content knowledge graph to prepare your organization for AI and support your semantic SEO efforts.

Get in touch with our team to learn more.

Jasmine Drudge-Willson is the Product Manager at Schema App. Schema App is an end-to-end Schema Markup solution that helps enterprise SEO teams create, deploy, and manage Schema Markup to stand out in search.


