Open Call For Coders: Please Help Me Fix Autodrama!

https://fsdfsd.net/HeyMoon/autodrama/src/branch/master/autodrama.py

hey bot-fans, you may have noticed that fan-favorite bot @autodrama stopped posting about three months ago. This is because the monkeys at pushshift screwed up the way that comments are retrieved. I call into pushshift's API here to get comments on a post; however, this randomly stopped working.

PMAW is a Python wrapper for pushshift. I only use it once, to get all the comments on a particular post. The API documentation says that I can do this with "link_id" (https://pushshift.io/api-parameters/), but when I try to hit api.pushshift.io/reddit/comment/search/?link_id=10pv7qa&subreddit=ScienceUncensored, I get this response:

{
    "data": [],
    "error": null,
    "errors": {
        "error": {
            "root_cause": [
                {
                    "type": "query_shard_exception",
                    "reason": "failed to create query: For input string: \"10pv7qa\"",
                    "index_uuid": "htpvQm2RT4uCWzM53Q3zuw",
                    "index": "rc_2005-12"
                },
                <above object 200 more times>
            ],
            "type": "search_phase_execution_exception",
            "reason": "all shards failed",
            "phase": "query",
            "grouped": true,
            "failed_shards": [
                {
                    "shard": 0,
                    "index": "rc_2005-12",
                    "node": "r5briY5hS_mr4RzlALKdow",
                    "reason": {
                        "type": "query_shard_exception",
                        "reason": "failed to create query: For input string: \"10pv7qa\"",
                        "index_uuid": "htpvQm2RT4uCWzM53Q3zuw",
                        "index": "rc_2005-12",
                        "caused_by": {
                            "type": "number_format_exception",
                            "reason": "For input string: \"10pv7qa\""
                        }
                    }
                },
                <above object 200 more times>
            ]
        },
        "status": 400
    }
}

Yes, I reported this issue to the maintainers of pushshift, but have gotten no response back.

Clearly, the backend is interpreting what should be strings as numbers, because this works: api.pushshift.io/reddit/comment/search/?link_id=100000&subreddit=trees
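A rough Python analogy of what the backend appears to be doing (Elasticsearch itself is Java, where the equivalent failure is a NumberFormatException, hence "number_format_exception" in the response): parsing link_id as a plain base-10 number works for "100000" but fails for a base-36 ID. The helper parse_as_base10 is hypothetical and only illustrates the behaviour; it is not pushshift code.

```python
def parse_as_base10(raw: str):
    """Mimic a backend that expects link_id to be a plain base-10 integer (hypothetical helper)."""
    try:
        return int(raw)  # int() defaults to base 10
    except ValueError:
        return None      # stands in for the server's number_format_exception


print(parse_as_base10("100000"))   # 100000
print(parse_as_base10("10pv7qa"))  # None
```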

I also tried using pushshift's /comment_ids endpoint but shit's broken too.

Fellas, any thoughts? @automeme is my only child that is not estranged, unlike my other children @bbbb (who hates me 😭) and @automeme (who is clinically retarded)

It says number format exception, so it seems that it expects link_id to be an integer rather than a string of letters and digits.

The API documentation also says that link_id is an Integer. Apparently Reddit uses base 36 IDs, so you'll probably need to convert 10pv7qa from base 36 to base 10 and use that as the link_id argument.
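A minimal sketch of that conversion, assuming the ID may arrive either bare ("10pv7qa") or with Reddit's "t3_" fullname prefix. The helper name link_id_to_int is hypothetical, not something from autodrama.py or PMAW.

```python
def link_id_to_int(link_id: str) -> int:
    """Convert a Reddit base-36 post ID (optionally prefixed with 't3_') to base 10."""
    # Reddit "fullnames" prefix post IDs with a type tag such as "t3_"; strip it.
    if link_id.startswith("t3_"):
        link_id = link_id[len("t3_"):]
    # int() accepts digits 0-9 and letters a-z (case-insensitive) in base 36.
    return int(link_id, 36)


print(link_id_to_int("10pv7qa"))     # 2220229090
print(link_id_to_int("t3_10pv7qa"))  # 2220229090
```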

Yep. This works:

https://api.pushshift.io/reddit/comment/search?link_id=2220229090&subreddit=ScienceUncensored

Edit: Actually, no, it doesn't work. That succeeds for 12 shards but fails for the rest with errors like this:

{"total":828,"successful":12,"skipped":0,"failed":816,"failures":[{"shard":0,"index":"rc_2005-12","node":"sYpN84pmQYuHQmYijuzUeg","reason":{"type":"query_shard_exception","reason":"failed to create query: Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer","index_uuid":"htpvQm2RT4uCWzM53Q3zuw","index":"rc_2005-12","caused_by":{"type":"illegal_argument_exception","reason":"Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer"}}},

I guess it's complaining because 2,220,229,090 is out of range for a 32-bit signed integer. But maybe that's okay, because it's only failing for older shards that predate the post anyway. You might want to limit your query so that it doesn't search for comments older than the post itself.
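A quick check of that arithmetic: the base-10 ID is larger than 2**31 - 1, the biggest value a signed 32-bit integer can hold. This only illustrates the range limit; it assumes (as guessed above) that the older shards store link_id as a 32-bit integer. INT32_MAX and fits_in_int32 are illustrative names.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def fits_in_int32(value: int) -> bool:
    """Return True if value lies within the signed 32-bit range."""
    return -(2**31) <= value <= INT32_MAX


link_id = int("10pv7qa", 36)
print(link_id, INT32_MAX, fits_in_int32(link_id))  # 2220229090 2147483647 False
```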

Edit: Got it working-ish here

well that's the other thing: if you look at the bottom of the response, you will see other errors...

{
    "shard": 0,
    "index": "rc_2009-07",
    "node": "r5briY5hS_mr4RzlALKdow",
    "reason": {
        "type": "query_shard_exception",
        "reason": "failed to create query: Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer",
        "index_uuid": "IOqjwfolQCmRF2dyKMhkUA",
        "index": "rc_2009-07",
        "caused_by": {
            "type": "illegal_argument_exception",
            "reason": "Value [[32 32 32 30 32 32 39 30 39 30]] is out of range for an integer"
        }
    }
},

I think some of the shards are failing due to the value being out of range, so you only get an incomplete set of comments. From this query we only get 10 comments when there are 104 comments in the original post...
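The bracketed numbers in that error look like hex-encoded ASCII bytes (0x32 is "2", 0x30 is "0", 0x39 is "9"). Decoding them on that assumption gives back exactly the link_id that was sent, which fits the out-of-range explanation. The regex and the helper name decode_es_value are illustrative, not part of any Elasticsearch client.

```python
import re


def decode_es_value(message: str) -> str:
    """Extract the space-separated hex bytes from 'Value [[...]]' and decode them as ASCII."""
    match = re.search(r"Value \[\[([0-9a-fA-F ]+)\]\]", message)
    if match is None:
        raise ValueError("no 'Value [[...]]' byte list found in message")
    # bytes.fromhex() skips the whitespace between the two-digit byte values.
    return bytes.fromhex(match.group(1)).decode("ascii")


msg = ("failed to create query: Value [[32 32 32 30 32 32 39 30 39 30]] "
       "is out of range for an integer")
print(decode_es_value(msg))  # 2220229090
```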

that's almost certainly what's going on but idk what to do to work around it :marseydepressed:

This gets all the comments for that post:

https://api.pushshift.io/reddit/comment/search?link_id=2220229090&after=1673802423&size=150&subreddit=ScienceUncensored

The reason you only got ten comments was that that's the default page size. The errors were unrelated; you weren't going to get any comments from those shards because they predate the post anyway. So when you query, just set the "after" parameter to the time the OP was posted, and it won't search the older shards, so you don't get any errors. Then by setting the "size" parameter, you can increase the page size. I believe the maximum is 500. There's presumably a way to page through the comments, but you'll have to check the docs for that.
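A sketch that rebuilds the working URL above using only the standard library. The function name build_comment_query is hypothetical, and the parameter meanings ("after" skipping older shards, "size" as page size with a cap of about 500) are as described in this comment, not independently verified. Paging is left out, since the thread doesn't establish how the endpoint pages.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.pushshift.io/reddit/comment/search"


def build_comment_query(post_id: str, created_utc: int, subreddit: str, size: int) -> str:
    """Build a comment-search URL for one post (hypothetical helper)."""
    params = {
        "link_id": int(post_id, 36),  # base-36 post ID -> base-10 integer
        "after": created_utc,         # only search from the post's creation time onward
        "size": size,                 # page size; default is 10, max reportedly ~500
        "subreddit": subreddit,
    }
    # urlencode keeps the dict's insertion order (Python 3.7+).
    return f"{BASE_URL}?{urlencode(params)}"


print(build_comment_query("10pv7qa", 1673802423, "ScienceUncensored", 150))
```

This prints the same URL as the one given in the comment, so it can be passed to whatever HTTP client the bot already uses.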

oh shit, based!

ye you can do int(x, 36), but there is another issue (the commenter edited his comment; see my response to this suggestion here)

If shards are fricked then I don't think you can fix it without upstream's intervention. :marseysad:

despair

:marseydespair:

Hey, you idiot. It's not our fault that your bot stopped posting. It's because you're using a shitty Python wrapper that doesn't work. So go screw yourself, and learn how to use the API properly.

cute twink, it's not the wrapper's fault that the script stopped working, it is the API's fault. Frick you queer

That's a really ignorant thing to say. Maybe if you did some actual research you would know that it's not the wrapper's fault, but the API's. But I guess that would require too much effort for you.

Lmao, I barely even do the coding I'm paid to do

I am paid to do coding but they tell me not to which is why I do it anyways :marseymad:

omg muh hecking regressions

muh scrum

muh story points

AAAAAAAAAAAAAAAAAA FRICK AGILE I JUST WANNA COOOOOOOOOOOOOOOOOOOOODE

AAAAAAAAAAAAAAAAAA FRICK AGILE I JUST WANNA COOOOOOOOOOOOOOOOOOOOODE

#relatable

yeah but this is rdrama.net bro it's actually important

jc aevann sneks !!

https://old.reddit.com/r/pushshift/comments/103k1qe/anyone_have_luck_using_the_link_id_param_in_the/j2zyjkp/

To query by link_id using the new API, you must (at least right now) convert from base 36 to base 10. So for this submission, 103k1qe converted to base 10 is:

https://api.pushshift.io/reddit/search/comment?link_id=2182756550

Python:

int('103k1qe', 36)

JavaScript:

parseInt('103k1qe', 36)

Recent submissions will generate errors from older shards; it appears that you can ignore them. They apparently occur (this is new to me) on shards containing comments from older submissions (before Dec. 2022 / before link_id 2^31).
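Going the other way is handy for sanity-checking the link_id cut-off at 2^31 that this comment describes. Python has no built-in formatter for base 36, so this is a small hand-rolled sketch (to_base36 is a hypothetical helper). It suggests that post IDs from roughly "zik0zk" onward no longer fit in a signed 32-bit integer.

```python
import string

DIGITS36 = string.digits + string.ascii_lowercase  # "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Inverse of int(s, 36) for non-negative integers."""
    if n < 0:
        raise ValueError("negative IDs are not expected")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)   # peel off the lowest base-36 digit
        out.append(DIGITS36[rem])
    return "".join(reversed(out))


print(to_base36(2182756550))  # 103k1qe
print(to_base36(2**31 - 1))   # zik0zj, the last ID that fits in a signed 32-bit int
```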

unfortunately, it's the same result for newer posts as I described here: incomplete results. but I am glad other people know about this!

Yeah, the comment addresses this at the end. I don't think there's anything you can do about it. The maintainer of the API seems to be in the middle of reindexing his ES cluster.

>tfw automeme could be back by monday and i literally just have to sit on my butt and wait for it to happen

codechads, it's a good day

interesting!!!! let me give this a try

Since when has Python had base as a second parameter for int()? TIL.
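For what it's worth, int() has accepted an optional base argument for a long time (it was already there in Python 2). Valid bases are 2 through 36, plus 0, which infers the base from a literal prefix such as 0b, 0o, or 0x. A few examples:

```python
# int(x, base) parses a string written in any base from 2 to 36.
print(int("ff", 16))       # 255
print(int("z", 36))        # 35
print(int("0b101", 0))     # 5, base 0 reads the 0b/0o/0x prefix
print(int("10pv7qa", 36))  # 2220229090
```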

Fellas, any thoughts? @automeme is my only child that is not estranged, unlike my other children @bbbb (who hates me 😭) and @automeme (who is clinically r-slurred)

:#marseythonk:

:marseyeyeroll:

Have you tried turning it off and on again?

yes.

:#marseyquestion:

Did you try adding Marseys to it?

I started reading through your code and this is the first non-trivial function I see.

def get_comment_basedness_out_of_five(basedness: int, absolute : bool):
    if (absolute):
        if basedness > 1000:
            score = 5
        elif basedness > 500:
            score = 4
        elif basedness > 100:
            score = 3
        elif basedness > 50:
            score = 2
        elif basedness > 10:
            score = 1
        else:
            score = 0
    else:
        if basedness > 100:
            score = 5
        elif basedness > 50:
            score = 4
        elif basedness > 10:
            score = 3
        elif basedness > 5:
            score = 2
        elif basedness > 1:
            score = 1
        else:
            score = 0
    return get_score_string(score, "🔥", "🔘")

:marseyretard3:

lol, hey if it works it works. easier than defining a new function every time I want to change the algorithm
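If the aim is to tweak the algorithm without writing new functions, one option is to keep the cut-offs as data. This is a minimal sketch, assuming scoring only needs the same strictly-greater-than comparisons as the quoted function. basedness_score and the threshold list names are hypothetical, and the result would still go through the script's existing get_score_string.

```python
from bisect import bisect_left

# Ascending cut-offs: score k means basedness is strictly greater than
# the first k thresholds (the same ">" checks as the original if/elif chain).
ABSOLUTE_THRESHOLDS = [10, 50, 100, 500, 1000]
RELATIVE_THRESHOLDS = [1, 5, 10, 50, 100]


def basedness_score(basedness: float, absolute: bool) -> int:
    thresholds = ABSOLUTE_THRESHOLDS if absolute else RELATIVE_THRESHOLDS
    # bisect_left returns how many thresholds are strictly below basedness.
    return bisect_left(thresholds, basedness)


print(basedness_score(1000, True))  # 4
print(basedness_score(1001, True))  # 5
print(basedness_score(1, False))    # 0
```

Changing the algorithm then means editing a list rather than an if/elif ladder.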

It looks like the API is returning an error message indicating that it failed to create a query because of an input string issue. Specifically, the error message says 'For input string: "10pv7qa"', which suggests that the value you're passing for the "link_id" parameter is not in the correct format.

According to the Pushshift API documentation, the "link_id" parameter should be in the format of a base-36 Reddit ID. This means that you should remove the "t3_" prefix from the ID and pass the remaining string as the value for "link_id".

For example, if the ID of the post you're trying to retrieve comments for is "t3_10pv7qa", you should pass "10pv7qa" as the value for "link_id".

Try modifying your API call to use the correct format for the "link_id" parameter and see if that resolves the issue.

Looks like you gotta rewrite it all in rust. Good luck :marseysoylentgrin:

Do it yourself cute twink

Me: Hey maybe I can help

Python

:marseyrain:

What kind of codecel are you that u don't even know python?

I know :marseyhesright: html :marsey404: and css.

:marsey4chan:

:#marseylaugh:

Did you try clearing out your cookies?


https://i.rdrama.net/images/17191743323420358.webp

nothing to do with cookies

I didn't ask if it had anything to do with your cookies, I asked if you tried clearing them. Maybe that's your problem, you don't read.


https://i.rdrama.net/images/17191743323420358.webp
