It's hard to imagine the year is nearly half over, and this is the first Blog post I have written. As someone who works in technology, you can imagine I have been buried in talks about Generative AI with customers. We spend a lot of time discussing different use cases and what Generative AI can do for the business.

Exploring GenAI for AWS customers means spending a bit of time in Amazon Bedrock and a few sample applications and trying things out. As a developer, I have been playing around with tools like Amazon Code Whisperer, now branded "Q Developer," and Github CoPilot for about 2 years now. The coding tools are quite good, and they help with trying to remember syntax and framing up the code. Obviously, these tools are not replacing the developer... they are just helping to reduce the number of times I have to drop out of the coding environment and head to my browser to look up function syntax.

One challenge with developing software over the past thirty years is the creation of test data. Let's say I have an API that receives some data elements as a payload. It's the 2020's so all the cool kids are passing JSON data. As a developer it's pretty east for me to create a JSON payload to test out my project. I can open up Visual Studio Code, create a new JSON file and type away...

{
    TimeStamp: "2024-05-15 08:15pm",
    TempF: 70,
    Humidity: 65,
    Location: "42.0418,-71.5368",
    LocationDesctiption: "Blackstone, MA",
    Sequence: 1
}

Now I say it's pretty easy to create this data, and it is... the first time. Maybe the first 10 times. But you know what... creating a large amount of test data is going to become a pretty uninspiring task. What if I need 100 test elements? Ok I can probably sit in front of the TV and start plugging in random data... but here's the thing... random data doesn't really paint a very good picture. What if I want a series of data points that tell a story or create a realistic scenario?

The data points above illustrate what data might look like for a temperature feed. Let's say I have a shipping container that needs to be climate-controlled during shipment. In order to create a realistic stream of data I am going to have to do a lot of research. Where is my origin and destination? Given an average speed speed, how many data points am I going to collect? What is the story I am trying to tell with the data? Am I simulating a happy case? Or do I want to model a failure?

Enter Generative AI

One of the things I love about the use of GenAI to create test data is the process goes from typing out random data into a text file to writing a story. This means that nontechnical product team members can create test data by writing out a user story into the prompt.

Lets try it out:

First thing I am going to do is head into the AWS console. From here, I am going to pick the Bedrock service. Bedrock will allow you to choose from several different Large Language models and use them in something called a playground.

Once in bedrock, choose the Text playground

Inside the playground you can press the button to select the model you want to use.

By default, all Bedrock users will have access to Amazon's Titan models. Other models, such as Anthropic's Claude V3 models, can also be made available from the AWS Marketplace. Each model will have it's own characteristics about cost, performance and items that it's particularly well suited to.

For this example, I am going to choose Anthropic's Claude 3 Sonnet model. Claude is a good general-purpose model that works well in a range of use cases. Sonnet is their midrange model.

Once I select the model, I can start building out the story for my testing scenario. I'll start by describing the basic environment. In this case I tell the LLM that I have a sensor, and what the data element I am expecting it to create looks like.

human: I have a sensor mounted in a refrigerated container on the back 
of a truck. The sensor reports back the following document: 
{ 
     "temperature" : 32, 
     "humidity" : 70, 
     "location" : "123x456",     
     "sequence" : 1, 
     "City" : "Boston", 
     "Time" : "2024-04-01 10:00:00 am" 
}

Next, I will explain to the LLM what the data elements are and what the elements mean.

human: The temperature tag is the temperature in the container at the 
time of the reading. The temperature is normally between 34 and 38 
degrees Fahrenheit. The humidity tag is the percentage of humidity 
in the container as a percentage and is normally between 60 and 80 
percent. The Location tag is the GPS location of the container at 
the time the reading is taken. The sequence tag is a sequential 
number that increases by 1 every time a data point is read. The 
city tag is the nearest city when the location is read. The Time 
tag is a timestamp for the reading.

As you can see, while a small amount of programming knowledge is required for the JSON layout, the remainder is really domain knowledge. The person who is formatting this information doesn't have to know anything about coding.

Next I am going to describe the actual testing scenario:

human: The container is on a truck traveling from Boston MA to 
Hartford CT. The sensor reports at least every 10 minutes or when 
there is a change in temperature or humidity. The container is 
traveling along interstates at an average of 65 miles an hour. 
The values reported will be in the normal rage until the container 
enters Connecticut. Once entering Connecticut, the temperature will 
increase, and the humidity will decrease. Creating an error condition 
when the container reaches Hartford.

Finally, I will finish up by telling the LLM what I want it to actually do

human: Create a series of sample data points that will show the 
container moving from Boston to Hartford, with data points every 
10 minutes. formatted as JSON documents

I'll then hit the play button at the bottom of the display. The LLM will process the prompt and create a set of output data points for me.

Claude produces the following output:

Sure, here's a series of sample data points that simulate the trip 
from Boston, MA to Hartford, CT:

```
{
  "temperature": 36,
  "humidity": 72,
  "location": "42.3601,-71.0589",
  "sequence": 1,
  "City": "Boston",
  "Time": "2024-04-01 10:00:00 AM"
}

{
  "temperature": 36,
  "humidity": 71,
  "location": "42.3919,-71.0339",
  "sequence": 2,
  "City": "Boston",
  "Time": "2024-04-01 10:10:00 AM"
}

{
  "temperature": 36,
  "humidity": 70,
  "location": "42.4276,-70.9901",
  "sequence": 3,
  "City": "Boston",
  "Time": "2024-04-01 10:20:00 AM"
}

{
  "temperature": 37,
  "humidity": 69,
  "location": "42.4668,-70.9438",
  "sequence": 4,
  "City": "Weston",
  "Time": "2024-04-01 10:30:00 AM"
}

{
  "temperature": 38,
  "humidity": 68,
  "location": "42.5095,-70.8943",
  "sequence": 5,
  "City": "Framingham",
  "Time": "2024-04-01 10:40:00 AM"
}

{
  "temperature": 39,
  "humidity": 67,
  "location": "42.5556,-70.8419",
  "sequence": 6,
  "City": "Marlborough",
  "Time": "2024-04-01 10:50:00 AM"
}

{
  "temperature": 41,
  "humidity": 65,
  "location": "42.6048,-70.7869",
  "sequence": 7,
  "City": "Berlin",
  "Time": "2024-04-01 11:00:00 AM"
}

{
  "temperature": 43,
  "humidity": 63,
  "location": "42.6569,-70.7296",
  "sequence": 8,
  "City": "Bolton",
  "Time": "2024-04-01 11:10:00 AM"
}

{
  "temperature": 45,
  "humidity": 61,
  "location": "42.7117,-70.6702",
  "sequence": 9,
  "City": "Sturbridge",
  "Time": "2024-04-01 11:20:00 AM"
}

{
  "temperature": 47,
  "humidity": 59,
  "location": "42.7689,-70.6090",
  "sequence": 10,
  "City": "Union",
  "Time": "2024-04-01 11:30:00 AM"
}

{
  "temperature": 49,
  "humidity": 57,
  "location": "42.8283,-70.5463",
  "sequence": 11,
  "City": "Hartford",
  "Time": "2024-04-01 11:40:00 AM"
}
```

By following the data points, we can track the progress of the truck as it moves from Boston through several cities along the way, such as Framingham and Sturbridge. The data in each element varies, as would be expected of the sample data. FInally when the container started to approach hartford the reading go outside of the normal range, as I instructed.

Sonnet also adds a summary to the output data that gives me an overview:

In this example, the temperature gradually increases from 36°F to 49°F, 
and the humidity decreases from 72% to 57% as the truck travels from 
Boston, MA to Hartford, CT. The location coordinates and the nearest 
city are updated accordingly, and the sequence number increments with 
each new data point. The time stamp also advances by 10 minutes for 
each data point, reflecting the sensor's reporting frequency.

Note that the temperature and humidity values are outside the normal 
range once the truck enters Connecticut, indicating an error condition 
in the refrigerated container.

Summary

So there you have it—generative AI with a use case that can help developers and teams create test data. By using an LLM, non-developers can create test data for the use cases that they want by writing a story that outlines the situation they want to mimic. Using this technique, creating large amounts of realistic test data is both quick and easy.

GenAI for Developers

Enter Generative AI

Summary

Comments

More from this blog

Updated Streaming Data Test Client

Resource Principal Authentication with OCI Container Instances

Leveraging the AI in your database...

Who put the AI in my Database?

Oracle AI World - Tuesday - Day 2

Command Palette

Enter Generative AI

Summary

Comments

More from this blog