Q: Need to know more detailed information about "Scrape website" feature
I’m interested in understanding the scraping process in more detail. Specifically:
Is it possible to scrape images directly, or just the image URLs?
Can we selectively scrape specific data, such as only titles and text, or does the tool allow for deeper customization to extract exactly the data we need?
A detailed explanation of these capabilities will help us determine if this solution fits our needs.
Shawn_AahSheet
Sep 11, 2024
A: At this time we only scrape website text. Grabbing all images from a URL is not built yet, but I think it should be easy to add. I could support that update and have something ready in a couple of days or so if needed.
However, I will update our sitemap tool to grab the image from the sitemap if available. I have also used vision to pull text from images, which is fun, but I haven't released that to our group. Again, that is another generally simple thing to add for everyone.
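To give a feel for what "grab all images from a URL" could involve, here is a minimal Python sketch. It is not Aah Sheet's code; `ImageCollector` and `extract_image_urls` are hypothetical names, and it assumes the page HTML has already been fetched (for example, the kind of HTML that =visit("URL") returns). It collects image URLs only, not the image files themselves.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Turn relative paths such as "/a.png" into absolute URLs.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Hypothetical helper: return all image URLs found in an HTML string."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls


if __name__ == "__main__":
    sample = '<p>Hi</p><img src="/a.png"><img src="https://cdn.example.com/b.jpg"><img alt="no src">'
    print(extract_image_urls(sample, "https://example.com/page"))
```

Downloading the actual image files would be a separate step on top of this list of URLs.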
My plan is to add scrapingowl or similar to start bringing scraping into our content sheets.
The scraping is more or less a bonus we offer; it's not technically our core product. I am adding functions commonly found in other tools so it's easier to just use Aah Sheet alongside our content processes. But you will have access to the code and any scraping I add, so you can extend this functionality to meet your needs. If you need something custom, I could be open for hire to build something just for you all, but my focus is on the community.
If you go to our homepage at aahsheet and scroll down, you will see a list of the available functions. Here is the current list:
=AI(...arg) - AI Function: Combine multiple cells or direct text into one prompt for OpenAI. Ideal for generating AI-driven content or responses based on diverse inputs.
=visit("URL") - Visit: Fetches and displays the entire HTML content of a given URL. Useful for content analysis, web research, or data extraction tasks.
=serp("SEARCH QUERY") - SERP: Retrieves the top 20 DuckDuckGo search results for a query, providing URLs, titles, and descriptions. Great for SEO research or competitive analysis.
=getMetaTitle("URL") - Get Meta Title: Extracts the meta title of a specified webpage. Useful for SEO analysis and understanding how pages are titled for search engines.
=getMetaDescription("URL") - Get Meta Description: Pulls the meta description from a given URL, aiding in understanding webpage summaries as seen by search engines.
=getH1("URL") - Get H1: Returns the main H1 heading of a specified webpage, crucial for SEO and understanding the primary focus of the content.
=getH2("URL") - Get H2s: Gathers all H2 subheadings from a webpage, helpful in content structure analysis and identifying key subtopics.
=getHeadings("URL") - Get Headings: Collects all headings (H1, H2, etc.) from a page, offering a quick overview of the content structure and hierarchy.
=getp("URL") - Get Paragraphs: Extracts all text within paragraph tags from a webpage, useful for content analysis, text extraction, or readability assessments.
=visitAll("URL") - Visit All: Aggregates main content elements (H1, H2s, paragraphs, meta title, and description) from a URL and arranges them in sequential cells, starting from the function's location. Ideal for comprehensive webpage content analysis.
=imageT("Prompt") - Image Tall: Generates a tall (1024x1792) image based on the given prompt using OpenAI DALL-E, and returns the URL of the generated image.
=imageW("Prompt") - Image Wide: Generates a wide (1792x1024) image based on the given prompt using OpenAI DALL-E, and returns the URL of the generated image.
=imageS("Prompt") - Image Square: Generates a square (1024x1024) image based on the given prompt using OpenAI DALL-E, and returns the URL of the generated image.
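On the question of scraping only titles and text: functions like =getH1, =getH2, and =getp each pull one kind of element. The Python sketch below illustrates that general idea of selective extraction. It is an assumption about how such a thing could work, not the actual Aah Sheet implementation, and `SelectiveExtractor` and `extract` are hypothetical names.

```python
from html.parser import HTMLParser


class SelectiveExtractor(HTMLParser):
    """Collects the text of only the tags the caller asks for."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted = set(wanted_tags)
        self.results = {tag: [] for tag in self.wanted}
        self._current = None  # wanted tag we are currently inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted and self._current is None:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace and skip empty elements.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.results[tag].append(text)
            self._current = None


def extract(html, wanted_tags):
    """Hypothetical helper: map each wanted tag to the texts found for it."""
    parser = SelectiveExtractor(wanted_tags)
    parser.feed(html)
    return parser.results


if __name__ == "__main__":
    page = (
        "<html><head><title>Demo</title></head><body>"
        "<h1>Main</h1><h2>One</h2><p>First <b>bold</b> text.</p><h2>Two</h2>"
        "</body></html>"
    )
    print(extract(page, ["title", "h1", "h2", "p"]))
```

Passing a shorter tag list, such as only `["title", "p"]`, limits the output to titles and paragraph text.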