The Curiosity Blog

Experiments with GPT Vision for modelling: A journey from screenshots to whiteboards

Written by James Walker | 16 November 2023

The landscape of artificial intelligence is rapidly evolving, and OpenAI's recent announcement of GPT-4 with vision capabilities stands as a groundbreaking development in multi-modal language models. Although this feature is still in its preview phase, the Curiosity team is always on the lookout for innovative solutions, and we were immediately captivated by the potential of integrating this technology into our Quality Modeller product.

Our journey with GPT-4 began with a simple, yet ambitious, goal: to integrate cutting-edge AI into model-based testing and quality assurance, realising the combined value of both. In this blog, we showcase some of our early experiments using GPT-4's vision capabilities within Quality Modeller - a tool designed for collaborative, quality-focused development.

Preview #1: Screenshots to models

Model-based design and testing provides a structured, collaborative development methodology that maximises accuracy and efficiency across the whole development lifecycle. Collaborative models help engineer better-quality requirements to drive development, while simultaneously generating optimised tests to rigorously validate the development work.

However, it's rare to start a modelling project from scratch. More often, we find ourselves needing to model existing applications or business logic. This led us to an intriguing question: How effectively could OpenAI's GPT Vision handle the task of creating models from screenshots of existing applications?

To put this to the test, we chose a practical example: a registration form from an e-commerce system. Our goal was to convert this form into a model by feeding the screenshot into GPT Vision. The co-pilot then analysed the form image and translated it into a linear model.
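For readers curious about the mechanics, the sketch below shows the kind of GPT-4 Vision call this involves, using OpenAI's gpt-4-vision-preview model. The prompt wording, file name and output handling are illustrative assumptions on our part, not Quality Modeller's actual integration.

```python
# A minimal sketch of a screenshot-to-model call, assuming the
# gpt-4-vision-preview model; the prompt and file name are
# illustrative, not Quality Modeller's internals.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the registration-form screenshot as a base64 data URL.
with open("registration_form.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "List the steps a user takes to complete this "
                     "registration form, in order, one step per line, "
                     "without numbering."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

# Each non-empty line of the reply becomes one node in a linear model.
steps = [line.lstrip("- ").strip()
         for line in response.choices[0].message.content.splitlines()
         if line.strip()]
print(steps)
```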

This model served as a starting point, but it did not fully cover the system's behaviour in response to different inputs, or the underlying validation logic that ultimately triggers error or success conditions. This is where the power of modelling comes into play: the model can be refined and developed further with SME knowledge of negative scenarios and edge cases, rapidly creating complete system specifications that can then auto-generate tests.

Preview #2: Realising the value of siloed information and artifacts

Software teams often invest time, thought and effort formulating information in static, siloed artifacts. This includes ideation sessions on whiteboards, captured in photos at best, or wireframes with no downstream integrations. Similarly, BAs often formulate valuable requirements in “static” diagrams with limited integration or downstream automation.

These siloed artifacts quickly become out-of-date, as they do not react to changes from across the SDLC. The silos additionally hinder communication and feedback, while downstream teams overlook valuable artifacts due to the time and manual effort of converting them into code and tests.

These limitations not only hamper the efficiency of workflow documentation and requirements gathering; they significantly increase manual effort, miscommunication and the potential for bugs in development and testing.

We are leveraging GPT Vision to un-silo this valuable information, converting “static” artifacts into “living documentation”. For instance, we can convert computer-generated flowcharts, wireframes and whiteboard images directly into Modeller’s flowcharts. These “active” integrated flows are then used to drive collaborative requirements engineering, accurate development and automated test generation.

For flowcharts, the process begins with visual representations of the application's processes, which are fed into the system. The co-pilot analyses these visuals, interpreting and converting them into detailed, editable flowcharts within our modelling tool.
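To make the reply machine-usable, the image can be paired with a prompt that requests a structured node-and-edge description. The schema below is a hypothetical illustration of such a target format, not Modeller's internal representation.

```python
# A hypothetical target schema for a flowchart extracted from an
# image; the field names are our illustration, not Modeller's
# internal format.
import json
from dataclasses import dataclass

@dataclass
class Node:
    id: str      # unique identifier, e.g. "n1"
    label: str   # the text inside the shape
    kind: str    # "start", "process", "decision" or "end"

@dataclass
class Edge:
    source: str  # id of the originating node
    target: str  # id of the destination node
    label: str   # branch condition, e.g. "yes"/"no", or "" if none

PROMPT = (
    "Describe this flowchart as JSON with two arrays: 'nodes' "
    "(id, label, kind) and 'edges' (source, target, label). "
    "Return only the JSON."
)

def parse_flowchart(reply: str) -> tuple[list[Node], list[Edge]]:
    """Convert the model's JSON reply into typed nodes and edges."""
    data = json.loads(reply)
    return ([Node(**n) for n in data["nodes"]],
            [Edge(**e) for e in data["edges"]])
```

Parsing into typed nodes and edges keeps any follow-up validation and layout logic straightforward.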

This approach is particularly effective for flowcharting applications that are visually rich but lack the necessary export functionality. Bypassing the need for manual data entry or complex integration work, we can swiftly convert static images into dynamic, interactive models that accurately reflect an application's intended behaviour. This process yields accurate results because the input images (flowcharts) are themselves computer-generated, and therefore lend themselves well to automated, AI-augmented analysis.

Preview #3: Wireframes to flowcharts

Wireframes are the skeletal framework of a digital application, outlining its structure and layout, without delving into the finer details of design. The transition from design wireframes to comprehensive flowcharts is a crucial step in the model-based testing process. Our challenge was to convert these wireframes into detailed flowcharts that not only represent the structure, but also encapsulate the application's flow and functionality.

The system takes the wireframes and intelligently interprets them, identifying key elements like navigation menus, input fields, and user interaction points. From these elements, Quality Modeller’s “modeller co-pilot” constructs a flowchart that maps out how these components interact and connect, turning a static layout into a dynamic flow of processes.
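As a rough illustration of that last step, the sketch below chains extracted elements into a sequence of edges. The element format (a "type" and a "label" per item) is our own assumption for the example, not the co-pilot's actual output.

```python
# A sketch of turning wireframe elements into a flow, assuming the
# vision model has already listed elements as dicts with a "type"
# ("menu", "input", "button") and a "label". Hypothetical format.
from typing import Iterable

def elements_to_flow(elements: Iterable[dict]) -> list[tuple[str, str]]:
    """Chain interaction points into (step, next_step) edges."""
    steps = [f'{e["type"]}: {e["label"]}' for e in elements]
    return list(zip(steps, steps[1:]))

demo = [
    {"type": "menu",   "label": "Account"},
    {"type": "input",  "label": "Email address"},
    {"type": "input",  "label": "Password"},
    {"type": "button", "label": "Sign up"},
]
for src, dst in elements_to_flow(demo):
    print(f"{src} -> {dst}")
```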

This process poses two particular challenges. First, wireframes are often high-level and may not include detailed information about every user interaction, so their nuances must be captured carefully in the flowchart. Second, wireframes rarely contain the business logic that sits behind a front-end design.

Preview #4: Whiteboard to flowcharts

Whiteboarding is a key collaborative approach to ideation and design for most enterprises today. Using GPT-4's vision capabilities, we capture these initial bursts of creativity and structure them into actionable models.

We achieve this by taking images of the whiteboard drawings and importing them into Quality Modeller using the modeller co-pilot. This allows us to convert these often chaotic and unstructured drawings into clear, organised flowcharts and models.

The process begins with a simple photograph of the whiteboard. Modeller co-pilot then analyses the content, deciphering text, diagrams, and even hastily drawn shapes. It intelligently recognises the relationships and hierarchies within these drawings, transforming them into a digital format that serves as the starting point for more detailed models.

While this feature has opened new avenues in capturing and digitising spontaneous ideas, it's important to acknowledge its current limitations. The accuracy of converting these drawings into models largely depends on the clarity of the whiteboard sketches. Where drawings are overly abstract or text is illegible, the system may struggle to interpret the content accurately.
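A simple preprocessing pass can sometimes help before the photo is sent. The sketch below uses Pillow to fix rotation and boost contrast; it is our own illustration of one possible mitigation, not a step in Modeller's pipeline.

```python
# A preprocessing sketch to improve whiteboard-photo legibility before
# it reaches the vision model. Uses Pillow; this is our own
# illustration, not a step in Quality Modeller's pipeline.
from PIL import Image, ImageEnhance, ImageOps

def clean_whiteboard(path: str, out_path: str) -> None:
    img = Image.open(path)
    img = ImageOps.exif_transpose(img)             # fix phone-camera rotation
    img = ImageOps.grayscale(img)                  # drop colour casts and glare
    img = ImageEnhance.Contrast(img).enhance(2.0)  # darken marker strokes
    img.save(out_path)

clean_whiteboard("whiteboard.jpg", "whiteboard_clean.png")
```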

Human-centric development, AI acceleration

In our exploration of GPT Vision through the co-pilot functionality in Modeller, we've made significant strides in enhancing the modelling process with sources of data that often exist only as images. From transforming application screenshots into detailed models and importing flowcharts from non-exportable applications, to converting wireframes and whiteboard sketches into structured flowcharts, each method has showcased the power of AI.

A key insight drawn from all of these examples is the inherent limitation of relying solely on image-based data. This underscores the critical role of human expertise in the loop, to scrutinise AI-generated results and refine them with the knowledge of Subject Matter Experts (SMEs). Given AI's current capabilities and limitations, this is essential to ensuring the quality and accuracy of any AI-generated content.

The synergy between AI capabilities and human expertise through modelling paves the way for a more accurate and accelerated approach to software quality. To join Curiosity on our journey to create this synergy at enterprise scale, book a time to speak with one of our experts today.