Is string formatting in Python safe to use or not?

The mighty and powerful f-strings. Well, they are only a sugar syntax on string.format, but whatever. Here I will look at one of the usage scenarios that you should not commit - taking format string from the user. Well, I had to do it in my project, so…

Easy, what is an f-string?

Literal string prefixed with f with expression inside curly braces.

name = 'Pawel'
print(f"My name is {name}")

Next level, f-strings:

  • are evaluated at runtime
  • are parsed into literal strings and expressions
  • use the same format specifier mini-language as str.format
  • has full access to local and global variables
  • can run any valid Python expression, including function and method calls (ast.parse etc.)
  • can use lambdas.. game over.

Security issues?

Some prominent Python community personalities-celebrities observed that The F’s could be misused. That is a fact. There are a few ways to do it:

  • denial of service attack by a large string size
  • using F’s in logging in wrong way (read me)
  • accessing attributes within an object, especially if it holds some kind of secrets (web dev)
  • in general, user-supplied format string could inject unwanted code

Try that in your interpreter:

f"Goodbye! {exit()}"
$ BOOM

# on the other hand this will not work when using string.format

"{exit()}".format(exit()=1) # SyntaxError
"{exit()}".format(exit=1) # KeyError
"{exit()}".format() # KeyError

"{}".format(exit) # This will but is a different case than we talk about in this post

So is this a problem? Yes and no. Depends on the context and place where it is executed.

My application

I had a service that would read configuration files created by another team, and the clue was that they could put an array in yaml in such a form: *key: "{value}", where {value} would be used later for dynamic lookup in some data. Thanks to that I would not have to know the keys and values beforehand in my code. But what if user consciously or not will give me a bad input like one showed below in conf.yaml?

# conf.yaml
some-data:
 - key: "{value}"
 - foo: "{another} {self.secret} {exit()} {locals()}"
#app.py
from string import Formatter

formatter = Formatter()

resolved_fields_dict = dict()
external_data_source = dict(foo=1)

user_format_string = "{foo} {bar}" # This comes from conf.yaml

# Extract field from format string
fields = [field for _, field, _, _ in formatter.parse(user_format_string) if field]

# In: fields
# Out: ['foo', 'bar']

# Update my dict from external data if exists, otherwise use the default
for field in fields:
    resolved_fields_dict[field] = external_data_source.get(field, "Not Found")

# In: resolved_fields_dict
# Out: {'foo': 1, 'bar': 'Not Found'}

# Finally return
return user_format_string.format(**resolved_fields_dict)
# Out: '1 Not Found'

And while I am not afraid the other team would want to hack me, they could inject by mistake some bad input and make my Python throw various errors.

Solution

I used a customised string.Formatter in yaml config parser with overrided get_field method. This is based on an example found in stackoverflow:

from string import Formatter

class SafeFormatter(Formatter):

    def get_field(self, field_name, args, kwargs):
        if '.' in field_name or '[' in field_name:
            raise InsecureStringFormatError('Invalid format string.')
        return super().get_field(field_name,args,kwargs)

formatter = SafeFormatter()
try:
    formatter.format("{sys.exit()"}, num=1, id='hello')
except InsecureStringFormatError:
    print("Hack the world")

Now we can catch and handle bad input early.

Conclusions

  • Do not take format string from the user, and if you have to then do not use f-strings
  • string.Formatter could be used to create custom validations

Resources: