Open Call For Coders: Please Help Me Fix Autodrama!

https://fsdfsd.net/HeyMoon/autodrama/src/branch/master/autodrama.py

hey bot-fans, you may have noticed that fan-favorite bot @autodrama stopped posting about three months ago. This is because the monkeys at pushshift screwed up the way that comments are retrieved. I call into pushshift's API here to get comments on a post, but this randomly stopped working.

PMAW is a Python wrapper for pushshift. I only use it once, to get all the comments on a particular post. The API documentation says that I can do this with "link_id" (https://pushshift.io/api-parameters/), but when I try to hit api.pushshift.io/reddit/comment/search/?link_id=10pv7qa&subreddit=ScienceUncensored, I get this response:

{
    "data": [],
    "error": null,
    "errors": {
        "error": {
            "root_cause": [
                {
                    "type": "query_shard_exception",
                    "reason": "failed to create query: For input string: \"10pv7qa\"",
                    "index_uuid": "htpvQm2RT4uCWzM53Q3zuw",
                    "index": "rc_2005-12"
                },
                <above object 200 more times>
            ],
            "type": "search_phase_execution_exception",
            "reason": "all shards failed",
            "phase": "query",
            "grouped": true,
            "failed_shards": [
                {
                    "shard": 0,
                    "index": "rc_2005-12",
                    "node": "r5briY5hS_mr4RzlALKdow",
                    "reason": {
                        "type": "query_shard_exception",
                        "reason": "failed to create query: For input string: \"10pv7qa\"",
                        "index_uuid": "htpvQm2RT4uCWzM53Q3zuw",
                        "index": "rc_2005-12",
                        "caused_by": {
                            "type": "number_format_exception",
                            "reason": "For input string: \"10pv7qa\""
                        }
                    }
                },
                <above object 200 more times>
            ]
        },
        "status": 400
    }
}

Yes, I reported this issue to the maintainers of pushshift, but have gotten no response back.

Clearly, the backend is interpreting what should be strings as numbers, because this works: api.pushshift.io/reddit/comment/search/?link_id=100000&subreddit=trees

I also tried using pushshift's /comment_ids endpoint but shit's broken too.

Fellas, any thoughts? @automeme is my only child that is not estranged, unlike my other children @bbbb (who hates me 😭) and @automeme (who is clinically retarded)

It says number format exception, so it seems that it expects link_id to be an integer rather than a string of letters and digits.

The API documentation also says that link_id is an Integer. Apparently Reddit uses base 36 IDs, so you'll probably need to convert 10pv7qa from base 36 to base 10 and use that as the link_id argument.
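A minimal sketch of that conversion in Python, assuming the ID is the bare post ID as it appears in the post URL. The helper name `base36_to_int` is made up for illustration:

```python
def base36_to_int(reddit_id: str) -> int:
    """Convert a base-36 Reddit ID (digits 0-9, letters a-z) to a base-10 integer."""
    # The built-in int() accepts any base from 2 to 36.
    return int(reddit_id, 36)


link_id = base36_to_int("10pv7qa")
print(link_id)  # 2220229090
```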

Yep. This works:

https://api.pushshift.io/reddit/comment/search?link_id=2220229090&subreddit=ScienceUncensored

Edit: Actually, no, it doesn't work. That succeeds for 12 shards but fails for the rest with errors like this:

{"total":828,"successful":12,"skipped":0,"failed":816,"failures":[{"shard":0,"index":"rc_2005-12","node":"sYpN84pmQYuHQmYijuzUeg","reason":{"type":"query_shard_exception","reason":"failed to create query: Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer","index_uuid":"htpvQm2RT4uCWzM53Q3zuw","index":"rc_2005-12","caused_by":{"type":"illegal_argument_exception","reason":"Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer"}}},

I guess it's complaining because 2,220,229,090 is out of range for a 32-bit signed integer. But maybe that's okay, because it's only failing for older shards that predate the post anyway. You might want to limit your query so that it doesn't search for comments older than the post itself.
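A quick sanity check of that guess, as a sketch. It assumes the bracketed values in the error are the hex ASCII codes of the digits that were sent, which is what they appear to be:

```python
# Largest value a 32-bit signed integer can hold.
INT32_MAX = 2**31 - 1  # 2147483647

link_id = 2220229090
print(link_id > INT32_MAX)  # True, so the value overflows a signed 32-bit field

# The bytes quoted in the error message, read as hex ASCII codes.
raw = "32 32 32 30 32 32 39 30 39 30"
decoded = bytes.fromhex(raw.replace(" ", "")).decode("ascii")
print(decoded)  # 2220229090
```

So the value being rejected is the same link_id that was passed in the query.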

Edit: Got it working-ish here

well that's the other thing, if you look in the bottom of the response you will see other errors...

{
            "shard": 0,
            "index": "rc_2009-07",
            "node": "r5briY5hS_mr4RzlALKdow",
            "reason": {
              "type": "query_shard_exception",
              "reason": "failed to create query: Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer",
              "index_uuid": "IOqjwfolQCmRF2dyKMhkUA",
              "index": "rc_2009-07",
              "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer"
              }
            }
          },

I think some of the shards are failing due to the index being out of range, so you only get an incomplete set of comments. So from this command we only get 10 comments when there are 104 comments in the original post...

that's almost certainly what's going on but idk what to do to work around it :marseydepressed:

This gets all the comments for that post:

https://api.pushshift.io/reddit/comment/search?link_id=2220229090&after=1673802423&size=150&subreddit=ScienceUncensored

The reason you only got ten comments is that ten is the default page size. The errors were unrelated; you weren't going to get any comments from those shards because they predate the post anyway. So when you query, just set the "after" parameter to the time the OP was posted, and it won't search the older shards, so you don't get any errors. Then by setting the "size" parameter, you can increase the page size. I believe the maximum is 500. There's presumably a way to page through the comments, but you'll have to check the docs for that.
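Putting the pieces together, here is a rough sketch of building that query URL with the standard library. The function name `build_comment_search_url` and its arguments are made up for illustration, it only constructs the URL without sending a request, and the 500 cap mentioned above is unverified:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pushshift.io/reddit/comment/search"


def build_comment_search_url(post_id_b36, subreddit, posted_at, size=150):
    """Build a comment search URL for one post (hypothetical helper).

    post_id_b36: the post's base-36 ID, e.g. "10pv7qa"
    posted_at:   Unix timestamp of the post, used as "after" so that
                 shards older than the post are not searched
    size:        page size (the default page size is only 10)
    """
    params = {
        "link_id": int(post_id_b36, 36),  # the API wants a base-10 integer
        "after": posted_at,
        "size": size,
        "subreddit": subreddit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_comment_search_url("10pv7qa", "ScienceUncensored", 1673802423)
print(url)
```

This should print the same URL as the one above.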

oh shit, based!

ye you can do int(x, 36) but there is another issue (commenter edited his comment, see my response to this suggestion here)

If shards are fricked then I don't think you can fix it without upstream's intervention. :marseysad:

despair

:marseydespair:
