18 Mar 2025

Editing images with Google Gemini Flash 2.0

Google’s Gemini Flash 2.0 made some waves when it came out with its ability to edit images, and it’s pretty cool. As of Mar 16, 2025, they’ve been scooped by OpenAI’s multimodal GPT-4o capabilities, but those are only available inside the ChatGPT app for now, while Google’s offering is available over their API (for free, no less!).

First things first, grab an API key from Google AI Studio—just click that big “Get API key” button, and you’re good to go.

Google provides quite a generous free tier that you’re unlikely to exceed as an individual, and you can upgrade to paid if you plan to use it in prod.
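The curl commands below read the key from a GEMINI_API_KEY environment variable, so export it first (the value here is just a placeholder):

export GEMINI_API_KEY="your-api-key-here"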

Once you’ve got your API key, you can just prompt your way to a new image from scratch:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "contents": [
      {
        "parts": [
          {
            "text": "Generate a high-quality image of a mountain landscape at sunset"
          }
        ]
      }
    ],
    "generationConfig": {
      "responseModalities": ["Text", "Image"]
    }
  }' | tee response.json

This will spit out some JSON, but since we used tee to save it to a file as well as show it onscreen, we can convert the embedded image back to a PNG file using jq and base64:

cat response.json | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -d > response.png
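Depending on the prompt, the response may also include a text part alongside the image; if you’re curious what the model said, a plain jq filter like this (nothing Gemini-specific, just selecting parts that have a text field) will print it:

cat response.json | jq -r '.candidates[0].content.parts[] | select(.text) | .text'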

On a Mac, you can open up Preview to view your new image like this (on Windows try start, and on Linux xdg-open):

open response.png
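Since we’ll be doing this decode-and-open dance a few more times, you could wrap it in a small shell function; show_image is just a name I made up, and it assumes the response layout shown above:

# Decode the first inline image in a Gemini response file and open it.
show_image() {
  local json="$1"
  local png="${json%.json}.png"
  jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' "$json" \
    | base64 -d > "$png"
  open "$png"   # use xdg-open on Linux or start on Windows
}

# e.g. show_image response.json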

So generating images is nice, but we’ve had that for years. Editing the image is where things get interesting. Let’s say you want your peaceful mountain to suddenly become a nightclub scene—here’s how:

First, encode your image back to base64 (on macOS, drop the -w 0 flag, since BSD base64 doesn’t wrap lines by default):

BASE64_ENCODED_IMAGE="$(cat response.png | base64 -w 0)"

Then, let’s send in an edit prompt and save the response to response2.json:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d @- > >(tee response2.json) << EOF
{
  "contents": [{
    "parts": [
      {"text": "Edit this image so that it's happening in the middle of a nightclub"},
      {"inlineData": {"data": "$BASE64_ENCODED_IMAGE", "mimeType": "image/png"}}
    ]
  }],
  "generationConfig": {"responseModalities": ["Text", "Image"]}
}
EOF

Decode it again to view:

cat response2.json | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -d > response2.png
open response2.png

Google is notoriously trigger-happy with their safety filters, though. I once asked it to “make an image of two dogs” and even that got blocked.

Anyway, if you run into that, try adding the following to make Google’s safety filtering less absurd:

safetySettings=$(jq -n '{
  safetySettings: [
    {
      category: "HARM_CATEGORY_HATE_SPEECH",
      threshold: "BLOCK_NONE"
    },
    {
      category: "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      threshold: "BLOCK_NONE"
    },
    {
      category: "HARM_CATEGORY_DANGEROUS_CONTENT",
      threshold: "BLOCK_NONE"
    },
    {
      category: "HARM_CATEGORY_HARASSMENT",
      threshold: "BLOCK_NONE"
    },
    {
      category: "HARM_CATEGORY_CIVIC_INTEGRITY",
      threshold: "BLOCK_NONE"
    }
  ]
}')

You can add the snippet above into your edit command’s JSON like so:

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -X POST \
  -d @- > >(tee response3.json) << EOF
{
  "model": "gemini-2.0-flash-exp-image-generation",
  "contents": [{
    "role": "user",
    "parts": [
      {
        "text": "Edit this image to add hula dancers all up and down the mountain, and change time to a bright sunny day"
      },
      {
        "inline_data": {
          "mime_type": "image/png",
          "data": "$BASE64_ENCODED_IMAGE"
        }
      }
    ]
  }],
  "safetySettings": $(echo $safetySettings | jq '.safetySettings'),
  "generationConfig": {
    "temperature": 1,
    "topP": 0.95,
    "topK": 40,
    "maxOutputTokens": 8192,
    "responseMimeType": "text/plain",
    "responseModalities": ["image", "text"]
  }
}
EOF

cat response3.json | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 -d > response3.png
open response3.png
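If you still end up with a blank or missing image, the response itself usually says why; as far as I can tell the API reports blocks via promptFeedback.blockReason and the candidate’s finishReason, so a quick jq check like this will show whether the safety filter fired:

# Print the prompt-level block reason (if any) and the candidate's finish reason.
jq -r '.promptFeedback.blockReason // empty, .candidates[0].finishReason // empty' response3.json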

All in all, Google’s Gemini Flash 2.0 is pretty cool! The images it generates aren’t as impressive as Flux’s or Midjourney’s (or even GPT-4o’s multimodal output), but being able to edit images inline definitely makes up for that.