Need more detailed information about the "Scrape website" feature

Question

I’m interested in understanding the scraping process in more detail. Specifically:

Is it possible to scrape images directly, or just the image URLs?
Can we selectively scrape specific data, such as only titles and text, or does the tool allow for deeper customization to extract exactly the data we need?
A detailed explanation of these capabilities will help us determine if this solution fits our needs.

Shawn_AahSheet · Answer

At this time we only scrape website text. This could be extended to grab all images from a URL. It isn't built yet, but I think it should be easy to add. If needed, I could support that update and have something ready in a couple of days or so.

In the meantime, I will update our sitemap tool to grab the image from the sitemap when one is available. I have also used a vision model to pull text from images, which is fun, but I haven't released that to our group yet. Again, that is generally simple to add for everyone.

My plan is to add ScrapingOwl or a similar service to start bringing scraping into our content sheets.

Scraping is more of a bonus feature we offer; it's not technically our core product. I am adding functions commonly found in other tools so it's easier to use Aah Sheet alongside our content processes. That said, you will have access to the code and to any scraping I add, so you can extend this functionality to meet your needs. If you need something custom, I could be open to being hired to build something just for you, but my focus is on the community.

If you go to our homepage at aahsheet and scroll down, you will see a list of the available functions. Here is the current list:

=AI(...arg) - AI Function: Combines multiple cells or direct text into one prompt for OpenAI. Ideal for generating AI-driven content or responses based on diverse inputs.

=visit("URL") - Visit: Fetches and displays the entire HTML content of a given URL. Useful for content analysis, web research, or data extraction tasks.

=serp("SEARCH QUERY") - SERP: Retrieves the top 20 DuckDuckGo search results for a query, providing URLs, titles, and descriptions. Great for SEO research or competitive analysis.

=getMetaTitle("URL") - Get Meta Title: Extracts the meta title of a specified webpage. Useful for SEO analysis and understanding how pages are titled for search engines.

=getMetaDescription("URL") - Get Meta Description: Pulls the meta description from a given URL, aiding in understanding webpage summaries as seen by search engines.

=getH1("URL") - Get H1: Returns the main H1 heading of a specified webpage, crucial for SEO and understanding the primary focus of the content.

=getH2("URL") - Get H2s: Gathers all H2 subheadings from a webpage, helpful in content structure analysis and identifying key subtopics.

=getHeadings("URL") - Get Headings: Collects all headings (H1, H2, etc.) from a page, offering a quick overview of the content structure and hierarchy.

=getp("URL") - Get Paragraphs: Extracts all text within paragraph tags from a webpage, useful for content analysis, text extraction, or readability assessments.

=visitAll("URL") - Visit All: Aggregates main content elements (H1, H2s, paragraphs, meta title, and description) from a URL and arranges them in sequential cells, starting from the function's location. Ideal for comprehensive webpage content analysis.

=imageT("Prompt") - Image Tall: Generates a tall (1024x1792) image based on the given prompt using OpenAI DALL-E, and returns the URL of the generated image.

=imageW("Prompt") - Image Wide: Generates a wide (1792x1024) image based on the given prompt using OpenAI DALL-E, and returns the URL of the generated image.

=imageS("Prompt") - Image Square: Generates a square (1024x1024) image based on the given prompt using OpenAI DALL-E, and returns the URL of the generated image.
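To make the two questions in this thread more concrete (pulling only titles and text, and getting images as URLs rather than files), here is a rough sketch of how that kind of selective extraction can work on a page's HTML. It is not AahSheet's code, which runs as spreadsheet functions. It assumes Python's standard library (`html.parser`, `urllib.parse`, `xml.etree.ElementTree`). The names `PageExtractor`, `visit_all_rows`, `extract_image_urls` and `sitemap_image_urls` are hypothetical, and fetching the page is left out. The sitemap part assumes the common image-sitemap extension (`image:image` / `image:loc`). It only collects image URLs; downloading the image files would be a separate step.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class PageExtractor(HTMLParser):
    """Hypothetical extractor: keeps only selected text elements plus image URLs."""

    CAPTURED = ("title", "h1", "h2", "p")

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta_description = ""
        self.h1: list[str] = []
        self.h2: list[str] = []
        self.paragraphs: list[str] = []
        self.image_urls: list[str] = []
        self._current = None  # captured tag we are currently inside, if any
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""
        elif tag == "img" and attrs.get("src"):
            # Resolve relative src values (e.g. "/img/a.png") against the page URL.
            self.image_urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag in self.CAPTURED:
            # Simplification: assumes explicitly closed tags; an unclosed
            # captured tag is discarded when the next one starts.
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # e.g. </a> inside a <p>: keep collecting the paragraph text
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "title":
            self.title = text
        elif tag == "h1":
            self.h1.append(text)
        elif tag == "h2":
            self.h2.append(text)
        else:
            self.paragraphs.append(text)
        self._current = None


def _parse(html: str, url: str) -> PageExtractor:
    page = PageExtractor(url)
    page.feed(html)
    page.close()
    return page


def visit_all_rows(html: str, url: str) -> list[list[str]]:
    """One [label, value] pair per row, loosely like a function filling sequential cells."""
    page = _parse(html, url)
    rows = [["Meta title", page.title], ["Meta description", page.meta_description]]
    rows += [["H1", text] for text in page.h1]
    rows += [["H2", text] for text in page.h2]
    rows += [["Paragraph", text] for text in page.paragraphs]
    return rows


def extract_image_urls(html: str, url: str) -> list[str]:
    """Absolute URLs of <img> tags; the image files themselves are not downloaded."""
    return _parse(html, url).image_urls


SITEMAP_NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def sitemap_image_urls(sitemap_xml: str) -> dict[str, list[str]]:
    """Map each page <loc> in a sitemap to the <image:loc> URLs listed for it."""
    root = ET.fromstring(sitemap_xml)
    result: dict[str, list[str]] = {}
    for entry in root.findall("sm:url", SITEMAP_NS):
        page = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        result[page] = [
            loc.text.strip()
            for loc in entry.findall("image:image/image:loc", SITEMAP_NS)
            if loc.text
        ]
    return result
```

For example, `extract_image_urls('<img src="/img/a.png">', "https://example.com/post")` returns `['https://example.com/img/a.png']`. The row order in `visit_all_rows` is an arbitrary choice for the sketch, not necessarily how `=visitAll` lays out its cells.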

