Welcome to SangSue

SangSue is a Python-based web scraping and exploration tool that traverses websites, discovers links, and gathers information about the pages within a given domain. Use it to inspect a site in depth, map its structure, extract data for analysis, and classify pages by their content.

Key Features

  • Web Crawling: Automatically explore a specified domain, starting from a given URL.
  • Depth Control: Define the maximum depth of exploration to focus on specific areas of a website.
  • URL Validation: Ensure that only valid URLs are processed to maintain data accuracy.
  • Information Gathering: Collect page titles, meta tags, and discovered URLs during the exploration.
  • Interactive Visualization: Visualize the website structure as an interactive graph using Plotly.
  • Data Export: Export exploration data in various formats such as JSON or CSV for further analysis.
  • Custom Filters: Implement filters based on regular expressions, keywords, or content types.
  • Error Handling: Identify and report HTTP errors encountered during the exploration.
  • Authentication Support: Handle authentication for protected web pages.
  • Scheduled Scans: Plan and schedule explorations at specified intervals.
  • User-Friendly Interface: Run explorations from a graphical user interface (GUI) for ease of use.
  • Web Page Classification: Categorize web pages based on their content to support further analysis.
  • User-Agent Configuration: Customize the user-agent used during exploration for more versatile web crawling.
  • Proxy Support: Utilize proxy servers to enhance privacy and control IP access during web crawling.
  • Exploration Delay: Define a time delay between the exploration of two URLs to manage web traffic and prevent overloading servers.
  • Pause and Resume: Pause an exploration and resume it exactly where you left off.
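
To make a few of these features concrete, here is a minimal sketch of a depth-limited, single-domain crawl with URL validation, title and link gathering, and an optional delay between requests. It is not SangSue's actual API: the names `crawl`, `is_valid_url`, `LinkParser`, and the `fetch` callable are hypothetical and chosen only for illustration, and the sketch uses nothing beyond the Python standard library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
import time


class LinkParser(HTMLParser):
    """Collects the page title and the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_valid_url(url, domain):
    """Accept only http(s) URLs that stay on the starting domain."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and parts.netloc == domain


def crawl(start_url, fetch, max_depth=2, delay=0.0):
    """Breadth-first exploration from start_url.

    fetch is a hypothetical callable that maps a URL to its HTML text,
    so the sketch stays independent of any particular HTTP library.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(url))
        links = []
        for href in parser.links:
            # Resolve relative links and drop #fragments before validating.
            absolute, _ = urldefrag(urljoin(url, href))
            if is_valid_url(absolute, domain):
                links.append(absolute)
        pages[url] = {"title": parser.title.strip(), "depth": depth, "links": links}
        # Depth control: only enqueue children while below the limit.
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
        # Exploration delay: wait before the next request, if any remain.
        if delay and queue:
            time.sleep(delay)
    return pages
```

In practice `fetch` could wrap an HTTP client call; passing it in as a parameter also makes it easy to try the crawl against an in-memory dictionary of pages. The returned dictionary maps each explored URL to its title, depth, and in-domain links, which is the kind of data that could then be exported as JSON or CSV.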

For full documentation visit sangsue.