1 dev/P2P PP
TomZ edited this page 2023-04-29 23:54:01 +02:00

Introduction

Bitcoin Cash is meant to be used by everyone. The user experience of an actual payment hinges largely on which payment protocol is used. The earliest and still most used one is a simple QR code with basic information, which works well enough but does little to create a great user experience.

The payment protocol is the set of steps taken between the payer and the payee from start to finish of a payment. It is good to formalize this so that as much as possible can be done by software, which helps make payments easier, safer and faster.

The most used payment protocol is still the QR code address, even though there is a much more advanced protocol called BIP70. That protocol has some fatal flaws which have hindered its uptake. The first is the requirement for the server to use an x509 certificate, a requirement that implies asking permission from officials that mostly work in or with the financial sector.
Second, it depends on protocol-buffers, a binary format that has caused many problems: it is hard to debug, and implementations in different languages were lacking.

This paper introduces a lessons-learned approach to re-designing a payment protocol for Bitcoin Cash, aiming to improve the user experience and increase its utility.
This fits in with the overall goal of Flowee to accelerate the world towards a Bitcoin Cash economy.

Non-usecase:

It's important to make clear that this protocol isn't meant to support every imaginable usecase. The following two are out of scope.

n1) A person has an online store which has its representation on something as inflexible as Facebook. Users state by private message that they want to buy a product, and the seller replies with an old-fashioned bitcoincash:// style URL (clickable or QR style). This can include the amount of funds expected to be paid and it naturally includes a unique address.

The challenge and response here are entirely human-based, and it can take a long time before a response is given, if ever. The payment protocol, in other words, is largely based on human interaction.

n2) An organization or individual (artist) is producing products for free for the general public, but is asking for donations or other anonymous sending of funds. The receiver just posts a donation address or similar in a place either online or on a piece of paper allowing an interaction-free way to start and finish the payment.

Usecases:

  1. User buys a product over the Internet from a merchant accepting Bitcoin Cash. The user actively chooses the "buy" moment and has some website lead her into the payment process, which allows the user's wallet to create and share a custom-made payment for the merchant. After the merchant confirms various details, the goods are released.

  2. Payment over the Internet, like UC 1, but the target is a more complex script. For instance a multi-sig script.

  3. Two people come to an agreement to exchange goods for payment. The seller uses his phone to request the payment while the buyer uses her phone to complete the payment.

  4. A new service was created to do kickstarter-type drives. A server is run that is capable of combining all the users' payments into one valid BCH transaction when the total amount of funds has been reached. The payment protocol is used by the server to request an amount of funds, with a minimum and a maximum range that will be deemed acceptable to the receiver.

The design is based on lighthouse / flipstarter where a special sighash is used that allows many people to sign only their inputs and the pre-defined outputs while pledging money.

This usecase means the payer needs to receive a partial transaction to adjust, sign and return to the receiver.

  5. (Web-)service using a profit sharing or similar approach. An online service where tips are given for generated content will be much more widespread if it binds the users' spending to income for itself. Yet, this is often in conflict with the wish of the service operator to avoid operating a custodial wallet system. Any tip or purchase made on the website should go from customer to content-provider directly, with the service provider just creating the platform. The usecase here is that the website is the one operating the payment-protocol-server and it should be able to split the money paid in order to generate income for itself, all in the same transaction to avoid ever having access to the users' funds.

  6. Recurring payment, or pledge. Bitcoin Cash requires every payment to be signed by the current holder of funds. This means that a request for monthly payments can not be submitted to a bank or similar: there needs to be one transaction per period, signed by the sender. Signing probably happens every period, at the time of payment, to avoid having to own a year of funds when approving the recurring payment.

A recurring payment is thus nothing more than a promise to pay and support in the payment protocol can by extension be nothing more than a formally designed request to please OK a payment every period from now on. Yet, with this included more wallets may consider having a proper response to such a request. Anything from an "Unsupported" reply to the requester to a beautiful UX to automate this as much as possible.

Security considerations:

When Bob signs a transaction to send to Alice, there is a requirement that Bob is certain that the payment details provided to him are really from Alice and not somehow changed to pay someone else.

How this is ensured is highly dependent on the type of payment and amount of funds. The assumptions made in BIP70 are that Alice is represented by a well connected company with reliable uptime and with a recognizable name that can be found by the courts should anything go wrong.

The requirement to use X509 certificates, based on BIP70's assumptions, excludes people and small merchants from being payment-protocol receivers. The effect is that only big intermediaries implement BIP70.

The smaller merchants have traditionally been OK with SSL certificates, like HTTPS based websites use.

Users that have phones should be able to manage by just using a bluetooth channel started via NFC or QR code.

An idea from cash-accounts to use one or two pictograms may be useful too. The first two bytes of the TxId of the transaction-template can be shown as such icons, allowing a casual look on the screen to confirm that there was no modification in-flight.
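The pictogram idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ICONS` table and the function name `txid_pictograms` are hypothetical, the "first two bytes" are taken in the order they appear in the hex string, and a real wallet would need a table that all implementations share (ideally with 256 entries, so that no information is lost by the modulo).

```python
# Hypothetical icon table. All wallets would have to agree on one shared table;
# with only 8 entries, many different bytes map to the same icon.
ICONS = ["apple", "bicycle", "cactus", "balloon", "fish", "key", "moon", "anchor"]


def txid_pictograms(txid_hex: str, count: int = 2) -> list:
    """Map the first `count` bytes of a TxId to icon names."""
    raw = bytes.fromhex(txid_hex)
    if len(raw) != 32:
        raise ValueError("a TxId is expected to be 32 bytes (64 hex characters)")
    # Assumption: each byte selects one icon, reduced to the size of the table.
    return [ICONS[b % len(ICONS)] for b in raw[:count]]
```

Both the payer's and the payee's screens would show the same icons when the template was not modified in flight.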

General design of the payment protocol.

The payment protocol should be exact and detailed enough to allow different teams to implement it without talking to each other and still end up with a successful payment.

At the same time the protocol should stay flexible enough to allow future extensions and unforeseen transfer protocols to be used. We don't want to hardcode that HTTP must be used, which would exclude Bluetooth or other future solutions. For a first version, however, the HTTP solution is what we describe.

The high-level design is one where there are two pieces of information being communicated between the client and the server, where the "server" is the one requesting the funds to be sent to it.

  • Meta-data as a JSON
    This is a payment-request with various pieces of information. The amount in fiat, the amount in BCH and a comment. But also details like the suggested sighash for inputs and minimum fee.

  • A transaction-template
    The server creates a transaction with the output(s) it requires and sends this over to the client. The client should use this transaction as a base and add inputs to it in order to fund it.

    The sending of a transaction allows the server a huge amount of freedom on innovating with how they want to get paid while keeping it simple for the client which just has to add its inputs and sign them.

In terms of web-technology, the two pieces of information are requested by the client using two individual end-points. Both are simple HTTP GET calls. They are separate because one is UTF8-JSON and the other is essentially a simple file-download.

After the client has funded the template transaction, it signs it. This creates a final transaction, one that should be acceptable to the Bitcoin Cash network. The client sends this transaction to the server we wish to pay, not to the general network. This is a 3rd end-point, using POST, and the server will take up to a couple of seconds to respond and approve. In this time the server does all validation and then broadcasts the transaction. Additionally it waits 2 or 3 seconds for a double-spend proof, to make sure that the payment is most likely to be mined in the next block.

Should the client have failed to provide an acceptable transaction, the server can reject the payment with a stated reason. Almost always this is cause for the client to abort the payment and let the user decide what to do next. Reasons for cancelling could be that one of the inputs used to fund the transaction is known to be double-spent.

Handshake

If we assume the one receiving the funds is to be called the server, we need to establish a handshake between the client and the server.

This payment protocol will specify one such handshake in detail, but it should be noted that there will be future extensions that aim at different usecases which require different handshakes.

What is important is that the handshake leads to the acquisition of the meta-data JSON file.

Ideally all handshakes start with a QR code that is relatively small. Typically a URL or a Bluetooth name/mac.

HTTPS Handshake

The QR code shown to the client is a URL using the https:// protocol. The server is ready to answer HTTP GET requests on this URL.

The URL should be unique in that the request made on that URL is known to be related to the QR code shown on the screen. Others should not be able to guess the URL.

The client can do a simple HTTP GET on the URL, expecting the meta-data JSON to be returned. The client should check that the certificate provided by the webserver is valid and is the right one for the server it is used on. Skipping this check is not a protocol failure, but it opens the client up to a man-in-the-middle attack.
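A minimal sketch of this handshake step, using only the Python standard library. The function names are hypothetical and the sketch assumes the server replies with strict JSON encoded as UTF-8. The default TLS context already verifies the certificate chain and the host name, which is the check described above.

```python
import json
import ssl
import urllib.request


def make_tls_context() -> ssl.SSLContext:
    """TLS context that verifies the server certificate and the host name."""
    # create_default_context() enables CERT_REQUIRED and check_hostname.
    return ssl.create_default_context()


def fetch_metadata(url: str, timeout: float = 10.0) -> dict:
    """HTTP GET the handshake URL and return the parsed meta-data JSON."""
    if not url.lower().startswith("https://"):
        raise ValueError("the handshake URL must use https://")
    with urllib.request.urlopen(url, timeout=timeout, context=make_tls_context()) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A certificate problem surfaces as an exception from `urlopen`, which the wallet can treat as a reason to abort the payment.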

Meta-data JSON

The handshake results in the client having obtained a JSON file with something like the following content (the // comments are explanatory only and are not part of the JSON):

{
    "pp-ver": 1, // payment protocol version
    "amount-fiat": "1 euro", // to be shown to the user
    "amount-bch": 100000,  // what the wallet has to provide, in satoshis
    "memo": "Cookies",    // comment from merchant
    // the txid for the template
    "tx-template-id": "009215ce96fe7872cdf68e75feeb44818f58cdf61bb28d6064b8ceee6ca44cfa",
    // the url to download the actual template transaction
    "template-tx-url": "https://someserver.com/tx/$1",

    // the end-point to send the signed transaction to
    "response-url": "https://example.com/pay/da34123"
}

More details on each field, and on other fields, are in the Technical Specification below.

When the server sends the template-tx-url field the client follows up with a simple HTTP GET to fetch the file. This is a simple REST call that downloads a binary file of mime-type application/octet-stream.

The client should interpret this data as a transaction after doing a standard double-sha256 hash to confirm the downloaded file matches the tx-template-id as specified in the meta-data JSON.
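The download check can be sketched as follows. The function names are hypothetical. The sketch assumes that tx-template-id uses the conventional TxId notation, which is the double-sha256 digest shown in reversed byte order, and that a `$1` in the URL is replaced by that same hex string.

```python
import hashlib


def template_url(url_pattern: str, txid_hex: str) -> str:
    """Replace the $1 placeholder, if present, with the TxId."""
    return url_pattern.replace("$1", txid_hex)


def txid_of(raw_tx: bytes) -> str:
    """Double-sha256 of the raw transaction, in conventional TxId notation."""
    digest = hashlib.sha256(hashlib.sha256(raw_tx).digest()).digest()
    # Assumption: TxIds are written with the digest bytes reversed.
    return digest[::-1].hex()


def check_template(raw_tx: bytes, expected_txid: str) -> bytes:
    """Return the downloaded bytes only when they match tx-template-id."""
    if txid_of(raw_tx) != expected_txid.lower():
        raise ValueError("downloaded template does not match tx-template-id")
    return raw_tx
```

Only after `check_template` succeeds would the wallet go on to parse the bytes as a transaction.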

Submit transaction

After the client has taken the template transaction and has added inputs to fund it, the resulting transaction can be submitted to the server.

Clients shall not submit the transaction to the Bitcoin Cash network; the server is responsible for that. Clients only submit the transaction to the location specified in the response-url.

The client submits the transaction at the requested end-point to complete the payment. Optional fields like "refund_to" are included to allow the server to process returns more swiftly.

After the client submits the transaction the server processes it and validates it for correctness, for risk-management and similar items.

The server returns with a response JSON file which states if everything went OK or if there was an error.

{ "success": false, "message": "greater than 25 unconfirmed depth, rejected b/c too risky" }
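A wallet could interpret such a reply along these lines. This is a sketch only: the exception and function names are hypothetical, and it assumes the reply is strict JSON with the `success` and `message` keys shown above.

```python
import json


class PaymentRejected(Exception):
    """Raised when the server refuses the submitted transaction."""


def handle_reply(body: str) -> None:
    """Return normally on success, raise PaymentRejected with the stated reason otherwise."""
    reply = json.loads(body)
    if reply.get("success") is True:
        return
    # Anything other than an explicit success is treated as a rejection.
    raise PaymentRejected(reply.get("message", "no reason given"))
```

On `PaymentRejected` the wallet would normally abort the payment, show the message and let the user decide what to do next.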

Submit Transaction HTTP POST

The end-point specified in the meta-data file with response-url shall use the HTTPS protocol; HTTP is not allowed.

Submitting the data is done using an HTTP POST call. The key tx is to be used to upload the signed transaction.

The client may have used a funding transaction of which it can't ensure that it is available on the network, for instance because the client never sent it to the Bitcoin Cash network before. Those transactions should be provided as well, using a key in the form-data starting with funding_tx.

This allows the client to upload any number of transactions.

Additional keys may be added by the client for the server to use.

  • refund_to

    This should contain a single bitcoincash: based address that the merchant can use should the payment be cancelled and the funds returned.
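The form-data for the POST could be assembled as in the sketch below. Several details are assumptions, since the text above does not fix them: transactions are sent hex-encoded, the body is `application/x-www-form-urlencoded`, and the funding transactions get a numeric suffix after `funding_tx`. The function name is hypothetical.

```python
from urllib.parse import urlencode


def build_submission(signed_tx: bytes, funding_txs=(), refund_to=None) -> bytes:
    """Build the POST body holding the signed transaction and optional extras."""
    fields = [("tx", signed_tx.hex())]
    # Assumption: keys only need to start with "funding_tx", so a counter is appended.
    for i, funding_tx in enumerate(funding_txs):
        fields.append((f"funding_tx{i}", funding_tx.hex()))
    if refund_to is not None:
        fields.append(("refund_to", refund_to))
    return urlencode(fields).encode("ascii")
```

The resulting bytes would be sent to the response-url over HTTPS with an HTTP POST.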

Technical specification

  • pp-ver (number)

    The payment-protocol-version the server is using to serve us.

  • amount-fiat (string)

    A user-visible string of the amount of fiat that is requested to be paid. Notice that the wallet would be wise to do a conversion based on current rates from the actual amount-bch. This field is useful for tax purposes and similar.

  • amount-bch (number)

    A number that defines the amount of satoshis that are expected to be paid. Notice that this does not include network fees.

  • memo (string)

    A merchant specified comment to indicate what it is we are paying for.

  • tx-template-id (string)

    This is a 64 character, hex-encoded (little-endian) transaction-id of the template transaction.

  • template-tx-url (string)

    If this is included then the client should download the transaction from the provided URL.

    The URL may have a $1 in the string which should be replaced with the TxId prior to download.

  • template-tx (string)

    The hex-encoded transaction. Mutually exclusive with the template-tx-url field.

  • response-url (string)

    This is the URL that the client should call to finish the payment.

    The URL may have a $1 in the string which should be replaced with the TxId prior to upload.
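The field rules listed above could be checked by a wallet roughly as follows. This is a sketch under assumptions: the specification does not yet say which fields are mandatory, so treating `amount-bch`, `tx-template-id` and `response-url` as required is a choice made here for illustration, as is the function name.

```python
import json


def validate_metadata(text: str) -> dict:
    """Parse the meta-data JSON and check the rules from the field list."""
    meta = json.loads(text)
    if meta.get("pp-ver") != 1:
        raise ValueError("unsupported payment protocol version")
    amount = meta.get("amount-bch")
    if isinstance(amount, bool) or not isinstance(amount, int) or amount <= 0:
        raise ValueError("amount-bch must be a positive number of satoshis")
    txid = meta.get("tx-template-id")
    if not isinstance(txid, str) or len(txid) != 64:
        raise ValueError("tx-template-id must be 64 hex characters")
    bytes.fromhex(txid)  # raises ValueError when the string is not hex
    # template-tx-url and template-tx are mutually exclusive.
    if ("template-tx-url" in meta) == ("template-tx" in meta):
        raise ValueError("exactly one of template-tx-url and template-tx is expected")
    if not str(meta.get("response-url", "")).startswith("https://"):
        raise ValueError("response-url must use https://")
    return meta
```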

-- TODO various other properties for features

  • int: suggested sighash for inputs
  • bool: additional op_return output allowed
  • min fee per KB (in sats)
  • network (main or test)
  • timestamp
  • expires

Postscript

Undoubtedly people will question why a new protocol is needed instead of using the existing BIP70 or its JSON equivalent.

The first reason is the encoding of the transaction data. All BIP70 based payment protocols have some way to encode partial transaction data, like outputs or payment addresses.

This approach raises the question why a new format is used instead of one that everyone needs to know anyway: the actual transaction. A transaction has a list of inputs and a list of outputs. It is perfectly fine to have zero inputs in a transaction; the format supports this. The inventors of the BIP70 protocol instead invented a new way to encode partial transaction data. This overhead is unnecessary and can be removed.

The benefit of going with a plain transaction instead of something like a list of outputs or an address is that you gain a lot of future extensibility. The BIP70 way limits the amount of information that can be sent from merchant to wallet, and with it the number of usecases we can cover with this protocol.

With an actual transaction there is no filtering. The sequence, the lock-time, the tx-version and maybe some existing inputs can all be decided on by the merchant, for the wallet to fund and sign. It is simpler and more powerful to just send the template-transaction.
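To illustrate that a transaction without inputs is a well-formed byte sequence, here is a minimal sketch that serializes a template with zero inputs and one pay-to-public-key-hash output. The function names are hypothetical and the all-zero public key hash in the usage example is a placeholder, not a real address.

```python
import struct


def varint(n: int) -> bytes:
    """Variable-length integer as used in the transaction format."""
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def p2pkh_script(pubkey_hash: bytes) -> bytes:
    """OP_DUP OP_HASH160 <20-byte hash> OP_EQUALVERIFY OP_CHECKSIG."""
    if len(pubkey_hash) != 20:
        raise ValueError("a public key hash is 20 bytes")
    return b"\x76\xa9\x14" + pubkey_hash + b"\x88\xac"


def build_template(outputs, version: int = 2, lock_time: int = 0) -> bytes:
    """Serialize a transaction with zero inputs and the given (satoshis, script) outputs."""
    tx = struct.pack("<i", version)
    tx += varint(0)  # input count: the client adds its own inputs later
    tx += varint(len(outputs))
    for value, script in outputs:
        tx += struct.pack("<q", value) + varint(len(script)) + script
    tx += struct.pack("<I", lock_time)
    return tx


# Usage: a template asking for 100000 satoshis to a placeholder hash.
template = build_template([(100000, p2pkh_script(bytes(20)))])
```

The merchant is free to pick the version, the lock-time and any number of outputs. The wallet only has to add its inputs and sign.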

Second, the BIP70 design is geared very much towards web usage only. Adding an NFC or Bluetooth channel is going to be messy.

As a direct example there are various properties that the protocol stores in the http-headers. The new payment protocol this document introduces stores all of the data in one simple meta-data JSON instead. No matter what handshake is used, this metadata JSON is the result. This makes common handling much easier as we skip http or other protocol specific properties and code.

Third, signatures (x509) are embedded in the application-layer data. The signatures prove that the sender owns the private keys.
But this is already proven by the fact that the data is sent (and encrypted) by a webserver using that same private key. There is no situation where an attacker can let the data originate from a secure-http server without having those keys. So the signatures inside of the message add nothing that wasn't already proven by the certificates on the transport layer.

Forcing servers to use HTTPS and making sure that wallets actually check the certificates solves this just fine. Again, dropping a lot of complexity helps adoption.

Additionally, when people meet face to face and use a protocol like NFC to run the payment protocol through, the security is provided in the physical realm. It is good to avoid store owners having to buy a certificate, or depend on a 3rd party, in order to keep things permissionless.

The fourth problem with BIP70 is that the sending of a transaction from the wallet to the merchant uses a semi-hardcoded endpoint: the same server as the request, but a hard-coded location for the call.

This is an ugly solution which restricts dev-ops deployments. It has led to multiple ways to finish up the payment, with different calls using different protocols, and it is generally a mess.

The simple solution, where the payment request results in a meta-data file in which the server defines what end-point to call with the result, is much more elegant.

The fifth and last problem is the flexibility given in BIP70 with regards to the wallet finishing up its transaction. A lot of wallets end up broadcasting the transaction to the network themselves, which leaves the payment server to notice this only after the network has relayed the transaction back to it.

The problem that this creates is that it takes a lot of risk-management options away from the merchant. A user may, for instance, send a very low-fee transaction, or similar issues may occur that make the transaction stay in the mempool for a long time. It is hard to reject a payment after the customer has already irreversibly made it leave his own wallet.
The merchant should be able to reject a suggested payment based on what the merchant decides is too high a risk before it is shown to anyone else.

Additionally, when the payment server is the one doing the broadcast-to-network it can check the timing and validate the response from the network. For instance, double-spend proofs are a useful weapon that works better when the server is the one broadcasting the transaction.